• (nodebb)

    Frist of all, if the character in the input string isn't valid for base36 (it's a parenthesis, for example), the function provided will cause undefined behaviour because map::find() will return mapvar.end(), and dereferencing that is undefined behaviour.

    And of course this:

    In this case, it likely wouldn't do a search, and still be O(1), but it's the wrong method to call when at or [] would be more correct

    Is incorrect, since it's a std::map, which guarantees only O(log N) searches. (std::map is normally a red-black tree.)

    And [] is ever more wrongerer, because it will create an entry with an uninitialised .second(1)(2), which will go very wrong.

    (1) See https://en.cppreference.com/w/cpp/language/default_initialization especially this text:

    Default initialization of non-class variables with automatic and dynamic storage duration produces objects with indeterminate values

    (2) It's something of a wart on the definition of std::map that operator [](key) is not const, and creates a default-initialised object at the given key if the key is not found. At least you can use at() to work around that lack, provided you are willing to catch a std::out_of_range exception if the key is not found.

    Overall, I would say that the given approach is very, very, very far from the best solution.
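
    For anyone following along, here is a minimal sketch of the two safer lookups mentioned above. The map `b36` and the helper names are made up for illustration and only hold a few entries; this is not the article's actual code.

    ```cpp
    #include <cassert>
    #include <map>
    #include <stdexcept>

    // Hypothetical stand-in for the article's table; only a few entries shown.
    static const std::map<char, int> b36 = {{'0', 0}, {'9', 9}, {'a', 10}, {'z', 35}};

    // find() returns end() for a missing key, so compare before dereferencing.
    int lookup_checked(char c) {
        auto it = b36.find(c);
        if (it == b36.end()) return -1;  // not a digit we know about
        return it->second;
    }

    // at() throws std::out_of_range for a missing key instead of inserting one.
    int lookup_at(char c) {
        try {
            return b36.at(c);
        } catch (const std::out_of_range&) {
            return -1;
        }
    }

    // Small usage example.
    void demo_lookup() {
        assert(lookup_checked('z') == 35);
        assert(lookup_checked('(') == -1);
        assert(lookup_at('(') == -1);
    }
    ```

    Either way the map can stay const, which operator [] would not allow.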

  • Prime Mover (unregistered) in reply to Steve_The_Cynic

    "ever more wrongerer" -- u win the internetz.

    I wonder if this snippet of code (like so many others) could be used to test a candidate in the course of a job interview?

    "How many wrong things can you find in this snatch of code?"

  • LZ79LRU (unregistered) in reply to Prime Mover

    Yes

  • Michael R (unregistered) in reply to Prime Mover

    I once ended up in an interview like this. They had 2 Java classes full of WTFeries. The guy asked me how I would fix this. My reply: "It would be quicker to throw it all away and rewrite from scratch." Apparently this was the wrong answer.

  • Steve (unregistered) in reply to Michael R

    Ah, the memories. Long ago I was asked in an interview how I would determine the volume of water in the Earth's oceans. The interviewer wasn't pleased when I said I'd just look it up. (I knew they were looking for a solution that minimized error but I had already decided I wasn't interested in working there....)

  • Entropy (unregistered) in reply to Steve_The_Cynic

    And [] is ever more wrongerer, because it will create an entry with an uninitialised .second(1)(2), which will go very wrong.

    It is not uninitialised, it is value-initialised. For "int" that means it is initialised to zero. Which is still wrong, but at least it's defined behaviour.
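
    A small sketch of that behaviour, with a hypothetical helper name: operator [] on a missing key inserts an entry whose int value is zero, and the map grows by one.

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <map>

    // Returns what operator[] hands back for a key that is absent from an
    // empty map, and reports the map size afterwards through size_after.
    int probe_missing_key(char key, std::size_t& size_after) {
        std::map<char, int> m;
        int v = m[key];         // inserts {key, int()} because key is absent
        size_after = m.size();  // the map has silently grown
        return v;
    }

    // Small usage example.
    void demo_probe() {
        std::size_t n = 0;
        assert(probe_missing_key('(', n) == 0);  // int() is value-initialised to zero
        assert(n == 1);
    }
    ```

    So a bad input would quietly decode as digit 0 rather than crash, which is arguably harder to notice.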

  • FTB (unregistered)

    Is there a guide to the valid markup here? I keep wanting to type code but I don't know how.

  • Scott (unregistered) in reply to Steve

    Kind of reminds me when I was asked (in a C# role) how I'd reverse a string. I replied, "I'd use string.Reverse()".

    When asked what if the framework didn't have such capabilities, I replied, "I'd start looking for a new framework."

    Oddly enough, I got home, called the recruiter to let her know I'd been up there, and during that call, the company called her to say they loved me, and asked if I could come back for a second interview with the VP-type guy that afternoon.

  • LZ79LRU (unregistered) in reply to Steve

    Take infinite slices of the planet, determine the curve describing the surface above water and surface under water of each slice and then use integration to calculate the volume of the "wet" and "dry" earths respectively. Then subtract the two.

    Useless questions get technically correct but useless answers.

  • (nodebb) in reply to Entropy

    Ugh, yes, you're right. That means that the description on cppreference.com is inconsistent, since:

    Inserts value_type(key, T()) if the key does not exist.

    and

    mapped_type must meet the requirements of CopyConstructible and DefaultConstructible

    but

    If an insertion is performed, the mapped value is value-initialized (default-constructed for class types, zero-initialized otherwise) and a reference to it is returned.

    Because that last part is inconsistent with "T()" as the initialisation.

  • (nodebb)

    const char* digits = "0123456789abcdefghijklmnopqrstuvwxyz";

  • dozer (unregistered)
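    int getBase36(char c) {
      const char* digits = "0123456789abcdefghijklmnopqrstuvwxyz";
      const char* p = c ? strchr(digits, c) : NULL; // needs <string.h>; strpos() is not standard C
      return p ? (int)(p - digits) : -1;            // -1 if c is not a base-36 digit
    }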
  • (nodebb)

    Coincidentally, I'm finishing up a package in the R language that converts any base to any other base in the range 2:36. And, yes, I check for illegal characters first (AND zero-length inputs) and then do some simple indexing. strtoi won't cut it for me, because I want to allow arbitrary size integers. Thanks to gmp lib for helping there.

  • rosuav (unregistered)

    This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

    Well, actually I'd use the standard library, though, so oh well. Still, it's the thought that counts.

  • Stella (unregistered)

    Where's std::make_pair?

  • Stella (unregistered)

    Also, where's b36.emplace?

  • Barry Margolin (github) in reply to FTB

    It uses markdown, similar to Stack Overflow. Use single backticks around inline code, and triple-backtick code fences before/after multi-line code blocks.

  • (nodebb) in reply to rosuav

    This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

    EBCDIC support was my first thought on seeing it.

  • (nodebb) in reply to LorenPechtel

    Code for both ASCII/EBCDIC, though I would obviously use platform defines to have two separate implementations:

    int getBase36(char c)
    {
      if(c >= '0' && c <= '9') return c - '0';
      
      if(c >= 'a' && c <= 'z')
      {
        if(c <= 'i') return c - 'a' + 10;
        if(c >= 'j' && c <= 'r') return c - 'j' + 10 + 9;
        if(c >= 's') return c - 's' + 10 + 9 + 9;
      }
      
      if(c >= 'A' && c <= 'Z')
      {
        if(c <= 'I') return c - 'A' + 10;
        if(c >= 'J' && c <= 'R') return c - 'J' + 10 + 9;
        if(c >= 'S') return c - 'S' + 10 + 9 + 9;
      }
    
      return -1;
    }
    

    Addendum 2023-06-08 16:01: And yes, because we are talking about very niche charsets here, it of course works for the Commodore 64 as well :-)

  • WTFGuy (unregistered)

    @ rosuav ref

    This code is awesome because, unlike what I would have done (checking ranges of ASCII values), this would most likely work on an EBCDIC system too!

    In the event you were serious ...

    And that is exactly the kind of domain-mixing that produces deeply buried bugs. A char is not a number. A char cannot be meaningfully compared to a number. At least it can't be and keep your code operating within the domain of chars.

    Yes, that kind of crap was commonplace 30 years ago. Doesn't mean it was smart then or correct then or now. Even if it generated the right answer, it did so for the wrong reason.

    As MaxiTB demonstrates with his (I hope sarcastic) but very creative horror.

  • (nodebb)

    Obligatory joke about missing an API call and XML serialization goes here

  • löchlein deluxe (unregistered)

    Ah, people are already starting to pass CrapGPT advice off as their own.

  • (nodebb) in reply to WTFGuy

    Kinda, that's actually how you would efficiently implement it for EBCDIC (which also means it would work for ASCII, but would be way more inefficient because of the additional nested conditions).

    While I agree with you that you should never treat a character as a number, at a low level like that you can get away with clean arithmetic like I showed above (I intentionally left in the constant operations so it's clear what's going on with the offset calculations). Using a hash lookup table would be insanely expensive in comparison (we are talking factor 100+ here) and it generates a ton of stack usage as well (so basically from zero to whatever the lookup methods need, if you make the map itself static).

    So yeah, for low level operations this would be a proper implementation, but obviously the point of having low level functions is that you can separate it from the rest of the code to stay "cleaner".

  • Entropy (unregistered) in reply to Steve_The_Cynic

    Because that last part is inconsistent with "T()" as the initialisation.

    I don't follow. "T()" means value-initialisation.

  • (nodebb) in reply to löchlein deluxe

    I had to check whether ChatGPT3 can pull off that simple algorithm; close, but no:

    int getBase36Value_ASCII(char c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        } else if (c >= 'A' && c <= 'Z') {
            return c - 'A' + 10;
        } else if (c >= 'a' && c <= 'z') {
            return c - 'a' + 10;
        }
        
        // Character is not alphanumeric
        return -1;
    }
    
    int getBase36Value_EBCDIC(char c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        } else if (c >= 'A' && c <= 'I') {
            return c - 'A' + 10;
        } else if (c >= 'J' && c <= 'R') {
            return c - 'A' + 9;
        } else if (c >= 'S' && c <= 'Z') {
            return c - 'A' + 8;
        } else if (c >= 'a' && c <= 'i') {
            return c - 'a' + 10;
        } else if (c >= 'j' && c <= 'r') {
            return c - 'a' + 9;
        } else if (c >= 's' && c <= 'z') {
            return c - 'a' + 8;
        }
        
        // Character is not alphanumeric
        return -1;
    }
    

    ChatGPT3 doesn't realize that its EBCDIC code is actually a subset of the ASCII code, and it also doesn't nest the conditions, so you end up with way more comparisons. Plus, for some weird reason the EBCDIC offsets are always calculated from the 'a'/'A' character, which results in those weird non-descriptive additions. Finally, it uses else chains, which is bad in this case because you already have return conditions, so they are literally boilerplate code making everything harder to read. Interestingly it chose -1 as the error result, but in C that's very common. The naming is kinda wrong, C functions usually use lower snakecase.

    Honestly I'm surprised that the code is correct; ChatGPT3 is really bad when it comes to math or heavily functional algorithms... or race conditions... or anything really beyond very basic stuff.
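
    One way to sanity-check the offsets in the EBCDIC variant is to plug in the code points directly. The sketch below assumes the usual EBCDIC letter layout (A-I at 0xC1-0xC9, J-R at 0xD1-0xD9, S-Z at 0xE2-0xE9, as on code page 037); the constant and function names are made up. With those numbers the `+ 9` and `+ 8` branches do not land on 19 and 28, so the generated function may be less correct than it looks at first glance.

    ```cpp
    #include <cassert>

    // Assumed EBCDIC code points for the first letter of each uppercase run.
    constexpr int kA = 0xC1, kJ = 0xD1, kS = 0xE2;

    // The generated variant: every offset is taken from 'A'.
    constexpr int from_a(int c) {
        return (c >= kA && c <= 0xC9) ? c - kA + 10
             : (c >= kJ && c <= 0xD9) ? c - kA + 9
             : (c >= kS && c <= 0xE9) ? c - kA + 8
             : -1;
    }

    // Offsets taken from the start of each run, as in the hand-written version.
    constexpr int from_run_start(int c) {
        return (c >= kA && c <= 0xC9) ? c - kA + 10
             : (c >= kJ && c <= 0xD9) ? c - kJ + 19
             : (c >= kS && c <= 0xE9) ? c - kS + 28
             : -1;
    }

    // Small usage example.
    void demo_offsets() {
        assert(from_run_start(kJ) == 19 && from_run_start(kS) == 28);
        assert(from_a(kJ) == 25);  // 0xD1 - 0xC1 + 9, not 19
        assert(from_a(kS) == 41);  // 0xE2 - 0xC1 + 8, not 28
    }
    ```

    The same `+ 9` branch is also off by one under ASCII, where 'J' - 'A' + 9 comes out as 18.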

  • (nodebb)

    Ha, I told it to combine both functions and then this nonsense got generated:

    int getBase36Value(char c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        } else if (c >= 'A' && c <= 'Z') {
            return c - 'A' + 10;
        } else if (c >= 'a' && c <= 'z') {
            return c - 'a' + 10;
        } else if (c >= 193 && c <= 218) { // EBCDIC uppercase
            return c - 193 + 10;
        } else if (c >= 225 && c <= 250) { // EBCDIC lowercase
            return c - 225 + 10;
        }
        
        // Character is not alphanumeric
        return -1;
    }
    
  • (nodebb)

    Given that modern computers generally have gigabytes of memory, clearly the correct solution* is just to create a lookup table indexed by char. For any character set where the digits and letters used in a base 36 number are representable in 8 bits, the lookup table would only have to be 256 bytes. Use -1 to indicate "not a valid base 36 digit" and you're sorted.

    This reminds me of a question I once answered on Stack Overflow which was "what is a fast way to calculate Fibonacci numbers". I suggested pre-calculating all the values and creating a lookup table. Even using uint64_t you can only calculate 94 Fibonacci numbers before you experience integer overflow.

    *if strtol() is not allowed.
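
    A rough sketch of that lookup table, assuming 8-bit chars and with made-up names. Filling the table from string literals, rather than from character ranges, means it does not rely on the letters being contiguous.

    ```cpp
    #include <array>
    #include <cassert>
    #include <climits>

    // One slot per possible unsigned char value; -1 marks "not a base-36 digit".
    static std::array<signed char, UCHAR_MAX + 1> make_table() {
        std::array<signed char, UCHAR_MAX + 1> t;
        t.fill(-1);
        const char* lower = "0123456789abcdefghijklmnopqrstuvwxyz";
        const char* upper = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for (int i = 0; i < 36; ++i) {
            t[static_cast<unsigned char>(lower[i])] = static_cast<signed char>(i);
            t[static_cast<unsigned char>(upper[i])] = static_cast<signed char>(i);
        }
        return t;
    }

    int base36_value(char c) {
        static const auto table = make_table();  // built once, on first call
        return table[static_cast<unsigned char>(c)];
    }

    // Small usage example.
    void demo_table() {
        assert(base36_value('7') == 7);
        assert(base36_value('Z') == 35);
        assert(base36_value('(') == -1);
    }
    ```

    The cast to unsigned char matters on platforms where plain char is signed, otherwise a negative char would index outside the table.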

  • (nodebb) in reply to MaxiTB

    C functions usually use lower snakecase

    That varies wildly from codebase to codebase. There's literally no common rule at all.

  • (nodebb) in reply to dkf

    True, that's why I wrote usually and not naming convention dictates ;-) Custom codebases were always a hot mess, however if you check standard libraries, they follow the snake case naming convention for functions up to STL in C++ ;-)


Leave a comment on “Base-36 Conversion”
