The Daily WTF: Curious Perversions in Information Technology

2021-09-13

"unicode complaint" should be "unicode compliant"

2021-09-13

One of them is Unicode "complaint". Yeah, that sounds about right for most people, actually.

2021-09-13

Not true. Unicode complaint is pretty much spot on. Ah, ninja'd by rosuav!

Steve_The_Cynic · 2021-09-13

"Character encodings are hard,"

The hard part is that they are entirely arbitrary (really, there's nothing magic about 65 // 0x41 that makes it right for "A"), and there have therefore been a multitude of different schemes that are bad, worse, or even worse than that, and essentially none of them are, as such, good. Yes, there are standards, but when talking about standards, we should always bear https://xkcd.com/927/ in mind...

"but good documentation is even harder."

A universal truth.

2021-09-13

Looks like the objective of the code is to only allow ASCII characters.

TheCPUWizard · 2021-09-13

"Character encodings are hard," -- RAD-50 -- all you should ever need <grin>

Ross_Presser · 2021-09-13

BRING BACK BAUDOT

2021-09-13

My first programming book was about 15mm thick and told me everything I needed to program FORTRAN. There were some other books of about the same size on libraries.

"A first introduction" to Java is about 900 pages and 60mm thick, and contains a lot of words but no content. Then you need all the framework books.

2021-09-13

Reminds me of.... oh, I don't know..... cryptocurrencies?? BTC would be an exceptional case if it survived, being the first version of a standard, major flaws and all.

Steve_The_Cynic · 2021-09-13

"RAD-50 -- all you should ever need <grin>"

I'll admit that my example of "all you'll ever need" was CDC Display Code, where lower case is an MBCS bodge, and sizeof(char) would have to be 0.1 ...

Um. 0.1 60-bit words.

On a CDC Cyber 815, in my experience of using it.

And ':' == '\0'...
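
The 0.1 figure follows from ten 6-bit character codes sharing one 60-bit word. A minimal sketch of that packing, assuming the first character sits in the most significant bits; the names packWord and unpackWord are made up for illustration, and BigInt is used because a JavaScript number cannot hold 60 bits exactly:

```javascript
// Hypothetical helpers: pack ten 6-bit codes into one 60-bit word and back.
// Assumption: the first character occupies the most significant 6 bits.
const CHARS_PER_WORD = 10;
const BITS_PER_CHAR = 6n;

function packWord(codes) {
  if (codes.length !== CHARS_PER_WORD) {
    throw new RangeError("a word holds exactly 10 character codes");
  }
  let word = 0n;
  for (const code of codes) {
    if (!Number.isInteger(code) || code < 0 || code > 63) {
      throw new RangeError("each code must fit in 6 bits");
    }
    // Shift the previous characters up and append the new one at the bottom.
    word = (word << BITS_PER_CHAR) | BigInt(code);
  }
  return word;
}

function unpackWord(word) {
  const codes = [];
  for (let i = 0; i < CHARS_PER_WORD; i++) {
    // Peel off the lowest 6 bits; they belong to the last character.
    codes.unshift(Number(word & 0x3fn));
    word >>= BITS_PER_CHAR;
  }
  return codes;
}
```

With ':' at code 0, a word of ten colons packs to 0n, which is the ':' == '\0' problem in a nutshell.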

2021-09-14

"They don't escape characters the same way, because one of them is unicode compliant, and the other isn't."

Not true: escape() supports Unicode just fine. It handles non-ASCII characters differently than encodeURIComponent() does, but the result is well-defined and reversible by the unescape() function, for all valid Unicode code points or combinations thereof.
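
A quick way to see the difference, as a sketch: escape() works on UTF-16 code units and emits %XX or %uXXXX, while encodeURIComponent() emits the percent-encoded UTF-8 bytes. The sample string is arbitrary; escape() and unescape() are deprecated legacy globals, though still available in browsers and Node.js.

```javascript
// U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji outside the BMP)
const sample = "\u00e9\u20ac\u{1F600}";

const legacy = escape(sample);             // per UTF-16 code unit
const modern = encodeURIComponent(sample); // per UTF-8 byte

console.log(legacy); // %E9%u20AC%uD83D%uDE00
console.log(modern); // %C3%A9%E2%82%AC%F0%9F%98%80

// Both encodings are reversible with their matching decoder.
console.log(unescape(legacy) === sample);           // true
console.log(decodeURIComponent(modern) === sample); // true
```

The two outputs are not interchangeable: each string has to be decoded with the function that matches the one that produced it.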

2021-09-15

Display Code was an enormous kludge. Even worse when the government required it to have 64 printing characters and end of line all with just 6 bits. You just can’t legislate 65 states into 6 bits. But CDC tried by having ‘:’ mean colon normally, or end of line if the rest of the word was all zeroes. Which meant that lines ending in colon, which were common, had a space after the colon tacked on, and sometimes removed on the other end. Lore was that meeting that requirement took a full year of everyone’s time. And of course it couldn’t possibly work right in every app in all cases. From such minor conniptions major companies fail.
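
The ambiguity can be shown with a deliberately simplified model, not a faithful emulation of the CDC format: a line is an array of 6-bit codes, zero-padded to a multiple of ten, and the reader treats trailing zeroes as end of line. The helper names are hypothetical, the codes 1 and 2 stand in for two letters, and the value used for the space code is an assumption (any nonzero code makes the same point).

```javascript
const COLON = 0;    // from the thread: ':' shares the all-zero code
const SPACE = 0o55; // assumed code for a blank; any nonzero code would do
const WORD = 10;    // ten 6-bit codes per 60-bit word

// Zero-fill to the end of the word; a full line gets an extra all-zero word.
function storeLine(codes) {
  const padding = WORD - (codes.length % WORD); // always 1..10 zero codes
  return codes.concat(new Array(padding).fill(0));
}

// The reader cannot tell padding from colons, so it drops every trailing zero.
function loadLine(stored) {
  let end = stored.length;
  while (end > 0 && stored[end - 1] === 0) end--;
  return stored.slice(0, end);
}

// The workaround described above: tack a space onto lines ending in a colon.
function storeLineSafely(codes) {
  const endsInColon = codes.length > 0 && codes[codes.length - 1] === COLON;
  return storeLine(endsInColon ? codes.concat(SPACE) : codes);
}

console.log(loadLine(storeLine([1, 2, COLON])));       // [ 1, 2 ]  (colon lost)
console.log(loadLine(storeLineSafely([1, 2, COLON]))); // [ 1, 2, 0, 45 ]
```

A colon in the middle of a line survives in this model; only a trailing one is swallowed, which is why the extra space had to be added and then, sometimes, stripped again by the reader.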

dkf · 2021-09-19

"there's nothing magic about 65 // 0x41 that makes it right for "A""

It's much better for that than many other choices would be; it's 64 (a nice binary number) plus the one-based index of the character in the alphabet (as used in US English). Even better, the letters are contiguous and converting to lower-case (and back) is just toggling another bit in the byte. Digits start at 0x30 and are also contiguous, which is again very convenient. Symbols… were just slotted in wherever they could fit.
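
Those three properties fit in a few lines. This is only a sketch with made-up helper names, and each helper assumes its input is already in the stated ASCII range:

```javascript
const codeOf = (ch) => ch.charCodeAt(0);

// 64 plus the one-based position in the alphabet: 1 -> 'A', 26 -> 'Z'.
function letterFromIndex(n) {
  return String.fromCharCode(64 + n);
}

// Upper and lower case differ only in bit 0x20. Valid for A-Z and a-z only.
function toggleCase(ch) {
  return String.fromCharCode(codeOf(ch) ^ 0x20);
}

// Digits are contiguous from 0x30, so the value is a plain subtraction.
function digitValue(ch) {
  return codeOf(ch) - 0x30;
}

console.log(letterFromIndex(1)); // A
console.log(toggleCase("A"));    // a
console.log(toggleCase("z"));    // Z
console.log(digitValue("7"));    // 7
```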

Unicode only became practical as computers gained a lot more memory.

A Coded Escape
