The Daily WTF: Curious Perversions in Information Technology

Steve_The_Cynic · 2023-06-08 Reply Admin

Frist of all, if the character in the input string isn't valid for base36 (it's a parenthesis, for example), the function provided will cause undefined behaviour, because map::find() will return mapvar.end(), and dereferencing that is undefined.
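A minimal sketch of the checked lookup, assuming the table is a `std::map<char, int>` (the article's exact type is not shown in this thread) and using a hypothetical helper name, `lookupDigit`:

```cpp
#include <map>

// Hypothetical helper: look up a digit's value without ever dereferencing end().
// The table type (std::map<char, int>) is an assumption about the original code.
int lookupDigit(const std::map<char, int>& table, char c) {
    const auto it = table.find(c);
    if (it == table.end()) {
        return -1;  // not a valid digit, e.g. '('
    }
    return it->second;
}
```

Returning -1 for "not found" is only one possible convention; throwing or returning an optional would work just as well.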

And of course this:

In this case, it likely wouldn't do a search, and still be O(1), but it's the wrong method to call when at or [] would be more correct

Is incorrect, since it's a std::map, which guarantees only O(log N) searches. (std::map is normally a red-black tree.)

And [] is ever more wrongerer, because it will create an entry with an uninitialised .second(1)(2), which will go very wrong.

(1) See https://en.cppreference.com/w/cpp/language/default_initialization especially this text:

Default initialization of non-class variables with automatic and dynamic storage duration produces objects with indeterminate values

(2) It's something of a wart on the definition of std::map that operator[](key) is not const and creates a default-initialised object at the given key if the key is not found. At least you can use at() to work around that, provided you are willing to catch a std::out_of_range exception if the key is not found.
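A sketch of the at() route, again assuming a `std::map<char, int>` table; `digitValueOrMinusOne` is a made-up name for illustration:

```cpp
#include <map>
#include <stdexcept>

// Hypothetical wrapper around at(); the map type is assumed, not taken from the article.
int digitValueOrMinusOne(const std::map<char, int>& table, char c) {
    try {
        return table.at(c);  // at() has a const overload and never inserts
    } catch (const std::out_of_range&) {
        return -1;           // key not present
    }
}
```

Because at() works on a const map, the table can be passed by const reference, which operator[] would not allow.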

Overall, I would say that the given approach is very, very, very far from the best solution.

2023-06-08 Reply Admin

"ever more wrongerer" -- u win the internetz.

I wonder if this snippet of code (like so many others) could be used to test a candidate in the course of a job interview?

"How many wrong things can you find in this snatch of code?"

2023-06-08 Reply Admin

Yes

2023-06-08 Reply Admin

I once ended up in an interview like this. They had 2 Java classes full of WTFeries. The guy asked me how I would fix this. My reply: "It would be quicker to throw it all away and rewrite from scratch." Apparently this was the wrong answer.

2023-06-08 Reply Admin

Ah, the memories. Long ago I was asked in an interview how I would determine the volume of water in the Earth's oceans. The interviewer wasn't pleased when I said I'd just look it up. (I knew they were looking for a solution that minimized error but I had already decided I wasn't interested in working there....)

2023-06-08 Reply Admin

And [] is ever more wrongerer, because it will create an entry with an uninitialised .second(1)(2), which will go very wrong.

It is not uninitialised, it is value-initialised. For "int" that means it is initialised to zero. Which is still wrong, but at least it's defined behaviour.
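A small sketch that makes the effect observable, assuming a `std::map<char, int>`; the function name and its out-parameter are hypothetical:

```cpp
#include <cstddef>
#include <map>

// Returns the map's size after an operator[] lookup of a key that is not present.
// Takes the map by value on purpose, so the caller's table is left untouched.
std::size_t sizeAfterIndexing(std::map<char, int> table, char missing, int& seen) {
    seen = table[missing];  // inserts {missing, int()}, and int() is 0
    return table.size();    // one larger than before if the key was absent
}
```

So a bad character quietly reads as 0 and grows the table by one entry: defined behaviour, but still the wrong answer.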

2023-06-08 Reply Admin

Is there a guide to the valid markup here? I keep wanting to type code but I don't know how.

2023-06-08 Reply Admin

Kind of reminds me when I was asked (in a C# role) how I'd reverse a string. I replied, "I'd use string.Reverse()".

When asked what if the framework didn't have such capabilities, I replied, "I'd start looking for a new framework."

Oddly enough, I got home, called the recruiter to let her know I'd been up there, and during that call, the company called her to say they loved me, and asked if I could come back for a second interview with the VP-type guy that afternoon.

2023-06-08 Reply Admin

Take infinite slices of the planet, determine the curve describing the surface above water and surface under water of each slice and then use integration to calculate the volume of the "wet" and "dry" earths respectively. Then subtract the two.

Useless questions get technically correct but useless answers.

Steve_The_Cynic · 2023-06-08 Reply Admin

Ugh, yes, you're right. That means that the description on cppreference.com is inconsistent, since:

Inserts value_type(key, T()) if the key does not exist.

and

mapped_type must meet the requirements of CopyConstructible and DefaultConstructible

but

If an insertion is performed, the mapped value is value-initialized (default-constructed for class types, zero-initialized otherwise) and a reference to it is returned.

Because that last part is inconsistent with "T()" as the initialisation.

MaxiTB · 2023-06-08 Reply Admin

const char* digits = "0123456789abcdefghijklmnopqrstuvwxyz";

2023-06-08 Reply Admin


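#include <string.h>

int getBase36(char c) {
  const char* digits = "0123456789abcdefghijklmnopqrstuvwxyz";
  /* strchr() would also match the terminating '\0', so reject it up front */
  const char* p = (c != '\0') ? strchr(digits, c) : NULL;
  return p ? (int)(p - digits) : -1;  /* -1: not a base-36 digit */
}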

cellocgw · 2023-06-08 Reply Admin

Coincidentally, I'm finishing up a package in the R language that converts any base to any other base in the range 2:36. And, yes, I check for illegal characters first (AND zero-length inputs) and then do some simple indexing. strtoi won't cut it for me, because I want to allow arbitrary-size integers. Thanks to the gmp lib for helping there.

2023-06-08 Reply Admin

This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

Well, actually I'd use the standard library, though, so oh well. Still, it's the thought that counts.
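For reference, a sketch of the standard-library route using `std::strtol` with base 36; the wrapper name `parseBase36` and its "whole string must be consumed" policy are assumptions for illustration:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse a whole base-36 string with std::strtol.
// Returns true and stores the value in `out` only if the entire string was consumed.
bool parseBase36(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 36);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;  // no digits, trailing junk, or out of range for long
    }
    out = value;
    return true;
}
```

Note that `std::strtol` also accepts leading whitespace, an optional sign, and both upper and lower case letters, which may or may not be what a given caller wants.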

2023-06-08 Reply Admin

Where's std::make_pair?

2023-06-08 Reply Admin

Also, where's b36.emplace?

2023-06-08 Reply Admin

It uses markdown, similar to Stack Overflow. Use single backticks around inline code, and triple-backtick code fences before/after multi-line code blocks.

LorenPechtel · 2023-06-08 Reply Admin

This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

EBCDIC support was my first thought on seeing it.

MaxiTB · 2023-06-08 Reply Admin

Code for both ASCII/EBCDIC, though I would obviously use platform defines to have two separate implementations:

int getBase36(char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  
  if(c >= 'a' && c <= 'z')
  {
    if(c <= 'i') return c - 'a' + 10;
    if(c >= 'j' && c <= 'r') return c - 'j' + 10 + 9;
    if(c >= 's') return c - 's' + 10 + 9 + 9;
  }
  
  if(c >= 'A' && c <= 'Z')
  {
    if(c <= 'I') return c - 'A' + 10;
    if(c >= 'J' && c <= 'R') return c - 'J' + 10 + 9;
    if(c >= 'S') return c - 'S' + 10 + 9 + 9;
  }

  return -1;
}

Addendum 2023-06-08 16:01: And yes, because we are talking here very niche charset, it of course works for Commodore 64 as well :-)

2023-06-08 Reply Admin

@ rosuav ref

This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

In the event you were serious ...

And that is exactly the kind of domain-mixing that produces deeply buried bugs. A char is not a number. A char cannot be meaningfully compared to a number. At least it can't be and keep your code operating within the domain of chars.

Yes, that kind of crap was commonplace 30 years ago. Doesn't mean it was smart then or correct then or now. Even if it generated the right answer, it did so for the wrong reason.

As MaxiTB demonstrates with his (I hope sarcastic) but very creative horror.

Mr. TA · 2023-06-08 Reply Admin

Obligatory joke about missing an API call and XML serialization goes here

2023-06-09 Reply Admin

Ah, people are already starting to pass CrapGPT advice off as their own.

MaxiTB · 2023-06-09 Reply Admin

Kinda, that's actually how you would efficiently implement it for EBCDIC (which also means it would work for ASCII, but would be way more inefficient because of the additional nested conditions).

While I agree with you that you should never treat a character as a number, at a low level like this you can get away with clean arithmetic like I showed above (I intentionally left the constant operations in so it becomes clear what's going on with the offset calculations). Using a hash lookup table would be insanely expensive in comparison (we are talking a factor of 100+ here) and it generates a ton of stack usage as well (so basically from zero to whatever the lookup methods need, if you make the map itself static).

So yeah, for low level operations this would be a proper implementation, but obviously the point of having low level functions is that you can separate it from the rest of the code to stay "cleaner".

2023-06-09 Reply Admin

Because that last part is inconsistent with "T()" as the initialisation.

I don't follow. "T()" means value-initialisation.

MaxiTB · 2023-06-09 Reply Admin

I had to check if ChatGPT3 can pull off that simple algorithm; Close but no:

int getBase36Value_ASCII(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else if (c >= 'A' && c <= 'Z') {
        return c - 'A' + 10;
    } else if (c >= 'a' && c <= 'z') {
        return c - 'a' + 10;
    }
    
    // Character is not alphanumeric
    return -1;
}

int getBase36Value_EBCDIC(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else if (c >= 'A' && c <= 'I') {
        return c - 'A' + 10;
    } else if (c >= 'J' && c <= 'R') {
        return c - 'A' + 9;
    } else if (c >= 'S' && c <= 'Z') {
        return c - 'A' + 8;
    } else if (c >= 'a' && c <= 'i') {
        return c - 'a' + 10;
    } else if (c >= 'j' && c <= 'r') {
        return c - 'a' + 9;
    } else if (c >= 's' && c <= 'z') {
        return c - 'a' + 8;
    }
    
    // Character is not alphanumeric
    return -1;
}

ChatGPT3 doesn't realize that its EBCDIC code is actually a subset of the ASCII code and it also doesn't nest the conditions, so you end up with way more comparisons. Plus for some weird reason the EBCDIC offsets are always calculated from the 'a'/'A' character, which results in those weird non-descriptive additions. Finally it uses else chains, which is bad in this case because you already have return conditions, so they are literally boilerplate code making everything harder to read. Interestingly it chose -1 as the error result, but in C that's very common. The naming is kinda wrong, C functions usually use lower snakecase.

Honestly I'm surprised that the code is correct; ChatGPT3 is really bad when it comes to math or heavily functional algorithms... or race conditions... or anything really beyond very basic stuff.
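The offsets in the generated EBCDIC function are worth checking by hand. In EBCDIC 'A' is 0xC1 and 'J' is 0xD1, so `c - 'A' + 9` gives 25 for 'J' rather than 19, and the 'S'..'Z' branch is off in the same way. A sketch that can be checked on any host, assuming code page 037 byte values and taking the raw byte rather than a `char` (the function name is hypothetical):

```cpp
// Sketch: base-36 digit value for an EBCDIC code point, passed as a raw byte so it
// can be tested on an ASCII machine. Byte values are the code page 037 assignments.
int base36FromEbcdic(unsigned char b) {
    if (b >= 0xF0 && b <= 0xF9) return b - 0xF0;       // '0'..'9' -> 0..9
    // Lowercase letters sit exactly 0x40 below their uppercase forms; fold them up.
    if (b >= 0x81 && b <= 0xA9) b += 0x40;
    if (b >= 0xC1 && b <= 0xC9) return b - 0xC1 + 10;  // 'A'..'I' -> 10..18
    if (b >= 0xD1 && b <= 0xD9) return b - 0xD1 + 19;  // 'J'..'R' -> 19..27
    if (b >= 0xE2 && b <= 0xE9) return b - 0xE2 + 28;  // 'S'..'Z' -> 28..35
    return -1;                                         // gaps and everything else
}
```

Measuring each run of letters from its own first character, as the earlier hand-written version does, avoids having to work out the size of the gaps.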

MaxiTB · 2023-06-09 Reply Admin

Ha, I told it to combine both functions and then this nonsense got generated:

int getBase36Value(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    } else if (c >= 'A' && c <= 'Z') {
        return c - 'A' + 10;
    } else if (c >= 'a' && c <= 'z') {
        return c - 'a' + 10;
    } else if (c >= 193 && c <= 218) { // EBCDIC uppercase
        return c - 193 + 10;
    } else if (c >= 225 && c <= 250) { // EBCDIC lowercase
        return c - 225 + 10;
    }
    
    // Character is not alphanumeric
    return -1;
}

jeremypnet · 2023-06-10 Reply Admin

Given that modern computers generally have gigabytes of memory, clearly the correct solution* is just to create a lookup table indexed by char. For any character set where the digits and letters used in a base 36 number are representable in 8 bits, the lookup table would only have to be 256 bytes. Use -1 to indicate "not a valid base 36 digit" and you're sorted.
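A rough sketch of that table, assuming 8-bit-or-wider `char` and accepting both letter cases; `makeBase36Table` is a hypothetical helper name. Filling the table from character literals means it does not depend on the letters being contiguous:

```cpp
#include <array>
#include <climits>

// Build a table with one entry per unsigned char value; -1 marks "not a digit".
// It is filled from character literals, so no ASCII ordering is assumed.
inline std::array<signed char, UCHAR_MAX + 1> makeBase36Table() {
    std::array<signed char, UCHAR_MAX + 1> table;
    table.fill(-1);
    const char lower[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    const char upper[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (int i = 0; i < 36; ++i) {
        table[static_cast<unsigned char>(lower[i])] = static_cast<signed char>(i);
        table[static_cast<unsigned char>(upper[i])] = static_cast<signed char>(i);
    }
    return table;
}

int getBase36(char c) {
    static const auto table = makeBase36Table();  // built once, on first call
    return table[static_cast<unsigned char>(c)];  // cast avoids negative indices
}
```

The cast to `unsigned char` matters on platforms where plain `char` is signed; without it, characters above 127 would index before the start of the table.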

This reminds me of a question I once answered on Stack Overflow, which was "what is a fast way to calculate Fibonacci numbers". I suggested pre-calculating all the values and creating a lookup table. Even using uint64_t you can only calculate 94 Fibonacci numbers before you experience integer overflow.
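A sketch of that table, counting from F(0) = 0 so that the 94 entries are F(0) through F(93); the function names are made up for illustration:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// F(0)..F(93) are the 94 Fibonacci numbers that fit in a uint64_t.
inline std::array<std::uint64_t, 94> makeFibTable() {
    std::array<std::uint64_t, 94> fib{};
    fib[0] = 0;
    fib[1] = 1;
    for (std::size_t i = 2; i < fib.size(); ++i) {
        fib[i] = fib[i - 1] + fib[i - 2];
    }
    return fib;
}

// Hypothetical lookup: returns false when F(n) would not fit in 64 bits.
bool fibonacci(unsigned n, std::uint64_t& out) {
    static const auto table = makeFibTable();  // computed once
    if (n >= table.size()) {
        return false;
    }
    out = table[n];
    return true;
}
```

The whole table is 752 bytes, so the lookup is a bounds check and one load.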

*if strtol() is not allowed.

dkf · 2023-06-10 Reply Admin

C functions usually use lower snakecase

That varies wildly from codebase to codebase. There's literally no common rule at all.

MaxiTB · 2023-06-10 Reply Admin

True, that's why I wrote usually and not naming convention dictates ;-) Custom codebases were always a hot mess, however if you check standard libraries, they follow the snake case naming convention for functions up to STL in C++ ;-)

2023-06-12 Reply Admin

And yes, because we are talking here very niche charset, it of course works for Commodore 64 as well :-)

OMG. My life has been so sheltered all these years.

https://en.wikipedia.org/wiki/PETSCII

Base-36 Conversion
