Admin
"unicode complaint" should be "unicode compliant"
Admin
One of them is Unicode "complaint". Yeah, that sounds about right for most people, actually.
Admin
Not true. Unicode complaint is pretty much spot on. Ah, ninja'd by rosuav!
Admin
The hard part is that they are entirely arbitrary (really, there's nothing magic about 65 // 0x41 that makes it right for "A"), and there have therefore been a multitude of different schemes that are bad, worse, or even worse than that, and essentially none of them are, as such, good. Yes, there are standards, but when talking about standards, we should always bear https://xkcd.com/927/ in mind...
A universal truth.
Admin
Looks like the objective of the code is to only allow ASCII characters.
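For what it's worth, if "ASCII only" really is the goal, it can be stated directly. This is a minimal sketch and not the article's code; the function name `is_ascii_only` is made up for illustration, and it assumes the input is already a decoded Python string.

```python
def is_ascii_only(text: str) -> bool:
    """Return True when every character is in the 7-bit ASCII range.

    Hypothetical helper for illustration; equivalent to str.isascii()
    on Python 3.7+.
    """
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    for sample in ("plain text", "na\u00efve", ""):
        print(repr(sample), is_ascii_only(sample))
```

Note that this only answers "is it ASCII"; it says nothing about whether rejecting everything else is a good idea.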
Admin
"Character encodings are hard," -- RAD-50 -- all you should ever need <grin>
Admin
BRING BACK BAUDOT
Admin
My first programming book was about 15mm thick and told me everything I needed to program FORTRAN. There were a few other books of about the same size covering libraries.
"A first introduction" to Java is about 900 pages and 60mm thick, and contains a lot of words but no content. Then you need all the framework books.
Admin
Reminds me of.... oh, I don't know..... cryptocurrencies?? BTC would be an exceptional case if it survived, being the first version of a standard as it is with its major flaws.
Admin
I'll admit that my example of "all you'll ever need" was CDC Display Code, where lower case is a MBCS bodge, and sizeof(char) would have to be 0.1 ...
Um. 0.1 60-bit words.
On a CDC Cyber 815, in my experience of using it.
And
':' == '\0'
...
Admin
Display code was an enormous kludge. Even worse when the government required it to have 64 printing characters and end of line all with just 6 bits. You just can’t legislate 65 states into 6 bits. But CDC tried by having ‘:’ mean colon normally, or end of line if the rest of the word was all zeroes. Which meant that lines ending in colon, which were common, had a space after the colon tacked on, and sometimes removed on the other end. Lore was that meeting that requirement took a full year of everyone’s time. And of course it couldn’t possibly work right in every app in all cases. From such minor conniptions major companies fail.
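To make the colon problem concrete, here is a toy sketch of the scheme as described in these comments: ten 6-bit characters per 60-bit word, ':' as code 0, and a run of zero codes at the end of a word treated as end of line. Only those three points come from the thread; the `ALPHABET` table and the function names are invented for illustration, and the real end-of-line rule was more involved than this.

```python
CHARS_PER_WORD = 10  # 60-bit word / 6-bit characters

# Toy code table. Only ':' == 0 comes from the comments above;
# the rest of the ordering is an assumption made for this sketch.
ALPHABET = ":ABCDEFGHIJKLMNOPQRSTUVWXYZ "
ENCODE = {ch: code for code, ch in enumerate(ALPHABET)}


def pack_line(text: str) -> list[int]:
    """Pack a line into 60-bit words, zero-filling the tail as end of line."""
    codes = [ENCODE[ch] for ch in text]
    # Always add at least one zero code so the line has a terminator.
    codes += [0] * (CHARS_PER_WORD - len(codes) % CHARS_PER_WORD)
    words = []
    for start in range(0, len(codes), CHARS_PER_WORD):
        word = 0
        for code in codes[start:start + CHARS_PER_WORD]:
            word = (word << 6) | code
        words.append(word)
    return words


def unpack_line(words: list[int]) -> str:
    """Unpack words into text, treating trailing zero codes as end of line."""
    codes = []
    for word in words:
        for shift in range(54, -1, -6):
            codes.append((word >> shift) & 0o77)
    while codes and codes[-1] == 0:
        codes.pop()  # a trailing ':' is indistinguishable from padding
    return "".join(ALPHABET[code] for code in codes)


if __name__ == "__main__":
    for line in ("HELLO", "LABEL:", "LABEL: "):
        print(repr(line), "->", repr(unpack_line(pack_line(line))))
```

In this toy model a line ending in ':' comes back without its colon, while the same line with a space tacked on survives, which is the workaround the comment describes.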
Admin
It's much better for that than many other choices would be; it's 64 (a nice binary number) plus the one-based index of the character in the alphabet (as used in US English). Even better, the letters are contiguous and converting to lower-case (and back) is just toggling another bit in the byte. Digits start at 0x30 and are also contiguous, which is again very convenient. Symbols… were just slotted in wherever they could fit.
Unicode only became practical as computers gained a lot more memory.
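The ASCII layout described above (letters at 64 plus alphabet position, case as a single bit, digits from 0x30) is easy to check. A minimal sketch follows; the helper names `letter_code`, `toggle_case`, and `digit_value` are made up for illustration.

```python
def letter_code(index: int, upper: bool = True) -> int:
    """ASCII code of the index-th letter (1-based) of the English alphabet."""
    if not 1 <= index <= 26:
        raise ValueError("index must be in 1..26")
    code = 0x40 + index  # 64 plus position in the alphabet
    return code if upper else code | 0x20  # bit 0x20 selects lower case


def toggle_case(ch: str) -> str:
    """Flip the case of a single ASCII letter by toggling bit 0x20."""
    if not (ch.isascii() and ch.isalpha()):
        return ch  # leave digits, symbols, and non-ASCII alone
    return chr(ord(ch) ^ 0x20)


def digit_value(ch: str) -> int:
    """Numeric value of a single ASCII digit; digits start at 0x30."""
    if not "0" <= ch <= "9":
        raise ValueError("not an ASCII digit")
    return ord(ch) - 0x30


if __name__ == "__main__":
    print(letter_code(1), chr(letter_code(26, upper=False)))
    print(toggle_case("A"), toggle_case("z"), digit_value("7"))
```

The guards matter: toggling bit 0x20 on anything other than an ASCII letter produces an unrelated character.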