• Martin (unregistered)

    "unicode complaint" should be "unicode compliant"

  • rosuav (unregistered)

    One of them is Unicode "complaint". Yeah, that sounds about right for most people, actually.

  • Steve (unregistered) in reply to Martin

    Not true. Unicode complaint is pretty much spot on. Ah, ninja'd by rosuav!

  • (nodebb)

    Character encodings are hard,

    The hard part is that they are entirely arbitrary (really, there's nothing magic about 65 // 0x41 that makes it right for "A"), and there have therefore been a multitude of different schemas that are bad, worse, or even worse than that, and essentially none of them are, as such, good. Yes, there are standards, but when talking about standards, we should always bear https://xkcd.com/927/ in mind...

    but good documentation is even harder.

    A universal truth.

  • Anon (unregistered)

    Looks like the objective of the code is to allow only ASCII characters.

  • (nodebb) in reply to Steve_The_Cynic

    "Character encodings are hard," -- RAD-50 -- all you should ever need <grin>

  • (nodebb)

    BRING BACK BAUDOT

  • Hasseman (unregistered)

    My first programming book was about 15mm thick and told me everything I needed to program FORTRAN. There were some other books of about the same size on libraries.

    "A first introduction" to Java is about 900 pages and 60mm thick, contains a lot of words but no content. Then you need all the framework books.

  • Reminds me of... (unregistered) in reply to Steve_The_Cynic

    Reminds me of.... oh, I don't know..... cryptocurrencies?? BTC would be an exceptional case if it survived, being, as it is, the first version of a standard, with all its major flaws.

  • (nodebb) in reply to TheCPUWizard

    RAD-50 -- all you should ever need <grin>

    I'll admit that my example of "all you'll ever need" was CDC Display Code, where lower case is an MBCS bodge, and sizeof(char) would have to be 0.1 ...

    Um. 0.1 60-bit words.

    On a CDC Cyber 815, in my experience of using it.

    And ':' == '\0'...

  • I'm not a robot (unregistered)

    They don't escape characters the same way, because one of them is unicode compliant, and the other isn't.

    Not true, escape() supports Unicode just fine. It handles non-ASCII characters differently than encodeURIComponent() does, but the result is well-defined and reversible by the unescape() function, for all valid Unicode codepoints or combinations thereof.
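
    A rough sketch of that difference, for anyone who wants to try it in a browser console or Node.js. The helper name `compareEncodings` and the sample strings are made up for illustration; `escape()` and `unescape()` are legacy functions, deprecated but still available in current engines.

```javascript
// Hypothetical helper: encode the same string both ways so the outputs can be compared.
function compareEncodings(text) {
  return {
    // Legacy: code units below 0x100 become %XX, everything else becomes %uXXXX.
    legacy: escape(text),
    // Modern: the string is converted to UTF-8 and each byte becomes %XX.
    modern: encodeURIComponent(text),
    // Both encodings can be reversed by their matching decoder.
    legacyRoundTrips: unescape(escape(text)) === text,
    modernRoundTrips: decodeURIComponent(encodeURIComponent(text)) === text,
  };
}

const accented = compareEncodings("é€");
console.log(accented.legacy); // %E9%u20AC
console.log(accented.modern); // %C3%A9%E2%82%AC

// The two functions also disagree on some plain ASCII punctuation.
const punctuation = compareEncodings("a+b/c");
console.log(punctuation.legacy); // a+b/c
console.log(punctuation.modern); // a%2Bb%2Fc
```

    So the two outputs are not interchangeable (a `%u20AC` sequence makes `decodeURIComponent()` throw), but each one round-trips with its own decoder.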
  • Guest (unregistered) in reply to Steve_The_Cynic

    Display code was an enormous kludge. Even worse when the government required it to have 64 printing characters and end of line all with just 6 bits. You just can’t legislate 65 states into 6 bits. But CDC tried by having ‘:’ mean colon normally, or end of line if the rest of the word was all zeroes. Which meant that lines ending in colon, which were common, had a space after the colon tacked on, and sometimes removed on the other end. Lore was that meeting that requirement took a full year of everyone’s time. And of course it couldn’t possibly work right in every app in all cases. From such minor conniptions major companies fail.

  • (nodebb) in reply to Steve_The_Cynic

    there's nothing magic about 65 // 0x41 that makes it right for "A"

    It's much better for that than many other choices would be; it's 64 (a nice binary number) plus the one-based index of the character in the alphabet (as used in US English). Even better, the letters are contiguous and converting to lower-case (and back) is just toggling another bit in the byte. Digits start at 0x30 and are also contiguous, which is again very convenient. Symbols… were just slotted in wherever they could fit.
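
    To make the bit layout concrete, here is a small sketch. The function names are invented for the example, and each function assumes its input is a single ASCII letter (or digit, for the last one); anything else gives meaningless results.

```javascript
// Position in the alphabet: the low five bits of an ASCII letter (A/a = 1 ... Z/z = 26).
function letterIndex(ch) {
  return ch.charCodeAt(0) & 0x1f;
}

// Upper and lower case differ only in bit 0x20, so flipping that bit swaps the case.
function toggleCase(ch) {
  return String.fromCharCode(ch.charCodeAt(0) ^ 0x20);
}

// Digits are contiguous from 0x30, so subtracting 0x30 yields the numeric value.
function digitValue(ch) {
  return ch.charCodeAt(0) - 0x30;
}

console.log(letterIndex("A")); // 1
console.log(letterIndex("z")); // 26
console.log(toggleCase("A")); // a
console.log(toggleCase("q")); // Q
console.log(digitValue("7")); // 7
```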

    Unicode only became practical as computers gained a lot more memory.
