The Daily WTF: Curious Perversions in Information Technology

Lawrence · 2016-01-07 Reply Admin

And once restore works, test it regularly.

$PARENT is not computer engineer but knew she needed backups, contracted someone to set them up, got instruction sheet, followed instruction sheet religiously, all was well. Several years later some other contractor upgraded something, the tape backup continued to work through the tapes without any errors at all, but it was only writing zeroes... then some third contractor installed a new CD-ROM reader (yes this was some time ago) and managed to overwrite the hard disk . . . it cost several months' salary to get the data back.

Spectre · 2016-01-07 Reply Admin

rc4:
Because different countries have different "standards"?

I don't see why these "standards" should affect one's data format. CSV files are made for computers, and computers don't have nationalities.

rc4 · 2016-01-07 Reply Admin

CSV files also contain languages, which differ in how various things are represented. Can you imagine trying to represent european currency values with comma delimiters?

Spectre · 2016-01-07 Reply Admin

rc4:
Can you imagine trying to represent european currency values with comma delimiters?

Are you talking about currency values that are part of arbitrary text in a natural language, or standalone currency values? In the former case, you'd have more or less the same amount of pain with culture-dependent delimiters, because arbitrary text can contain any delimiter. In the latter case, it's just a number, and whether it's European or a currency value is irrelevant. You should just encode it in a culture-independent way.

rc4 · 2016-01-08 Reply Admin

Should and what actually happens are different. Thus, different delimiters. Why don't we all use comma dot instead of dot comma in currency? Why don't we all speak the same language? You are failing to account for the fact that people can be different.

Dreikin · 2016-01-08 Reply Admin

Protoman:
How about ASCII 0x1E (record separator) or 0x1F (unit separator)?

I was going to suggest those myself, along with their fellows:

Dec	Hex	Acronym	Symbol	Name	Usage
25	0x19	EM	␙	End of Medium	Intended as means of indicating on paper or magnetic tapes that the end of the usable portion of the tape had been reached. Not needed, but may be useful in the case of files with multiple ␜.
26	0x1A	SUB	␚	Substitute	Need to keep a particular symbol (or set of symbols) from appearing in data? Use substitute to mark their replacement character(s)!*
27	0x1B	ESC	␛	Escape	What better to start escape sequences with than the actual, official, "Escape"?*
28	0x1C	FS	␜	File Separator	End of file. Or between a concatenation of what might otherwise be separate files.
29	0x1D	GS	␝	Group Separator	Between sections of data. Not needed in simple data files.
30	0x1E	RS	␞	Record Separator	End of a record or row.
31	0x1F	US	␟	Unit Separator	Between fields of a record, or members of a row.
32	0x20	SP	␠	~~Word Separator~~Space	Between words of a field.

*: Not necessarily official usage semantics. (Mis)Use at your own risk.

Unicode also has a line separator (U+2028) and paragraph separator (U+2029) so one could even embed multi-line and multi-paragraph fields in a record in a platform-independent way! I might be somewhat unserious here.
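To make the US/RS idea concrete, here is a minimal Python sketch. It assumes fields never contain the separator characters themselves, and the names `encode_records` and `decode_records` are made up for illustration rather than taken from any library.

```python
US = "\x1f"  # Unit Separator: goes between fields of a record
RS = "\x1e"  # Record Separator: terminates each record


def encode_records(records):
    """Join fields with US and terminate every record with RS."""
    for record in records:
        for field in record:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return "".join(US.join(record) + RS for record in records)


def decode_records(text):
    """Split on RS, then on US; the final RS leaves one empty chunk to drop."""
    return [chunk.split(US) for chunk in text.split(RS)[:-1]]


# Commas, tabs and newlines inside fields need no escaping at all.
rows = [["1,50", "Hello, world"], ["2,75", "tabs\tand\nnewlines are fine"]]
encoded = encode_records(rows)
decoded = decode_records(encoded)
```

The obvious cost is that anything producing such a file has to reject or escape fields that already contain 0x1E or 0x1F, which is why the sketch raises instead of guessing.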

Protoman:
Everyone loves unprintable control characters in their text files, right?

No reason for them to be unprintable. For one, they have official symbols (see above), and for another an editor could show them specially (e.g., colored barriers between sections).

rc4:
You are failing to account for the fact that people can be different.

Maybe out there. Around here, we're all just @boomzilla.

rc4 · 2016-01-08 Reply Admin

Well, except for Fox.

da_Doctah · 2016-01-08 Reply Admin

Dreikin:

I was going to suggest those myself, along with their fellows:

Dec	Hex	Acronym	Symbol	Name	Usage
25	0x19	EM	␙	End of Medium	Intended as means of indicating on paper or magnetic tapes that the end of the usable portion of the tape had been reached. Not needed, but may be useful in the case of files with multiple ␜.
26	0x1A	SUB	␚	Substitute	Need to keep a particular symbol (or set of symbols) from appearing in data? Use substitute to mark their replacement character(s)!*
27	0x1B	ESC	␛	Escape	What better to start escape sequences with than the actual, official, "Escape"?*
28	0x1C	FS	␜	File Separator	End of file. Or between a concatenation of what might otherwise be separate files.
29	0x1D	GS	␝	Group Separator	Between sections of data. Not needed in simple data files.
30	0x1E	RS	␞	Record Separator	End of a record or row.
31	0x1F	US	␟	Unit Separator	Between fields of a record, or members of a row.
32	0x20	SP	␠	~~Word Separator~~Space	Between words of a field.

No no no no no no no!

FS doesn't stand for "File Separator". Since time in a memorial, it has stood for "Field Separator", which makes it the only reasonable choice to delimit values within a single data record.

Dreikin · 2016-01-08 Reply Admin

da_Doctah:
FS doesn't stand for "File Separator". Since time in a memorial, it has stood for "Field Separator", which makes it the only reasonable choice to delimit values within a single data record.

Blame ASCII ("most recent update during 1986"). "Page Separator" might've been a better choice, but "Unit Separator" and "File Separator" aren't bad.

cheong · 2016-01-08 Reply Admin

chubertdev:
Hong Kong

I didn't know we were a country now.

And sad to see Macau is MIA. (Both places are of equal status in China)

flabdablet · 2016-01-08 Reply Admin

dkf:

Yazeran:
I usually prefer ; as it is rare that that could show up in data other than text.

If you've got control over what is being generated, tabs are a really good choice

It pisses me off that ASCII defines four perfectly good control codes specifically for delimiting structured text and nobody ever uses them.

flabdablet · 2016-01-08 Reply Admin

da_Doctah:
Since time in a memorial, it has stood for "Field Separator"

Never seen that. Eggcorn suspected. Cite required.

Cheater · 2016-01-08 Reply Admin

You missed South Africa

Erik · 2016-01-08 Reply Admin

It's Wikipedia... you're supposed to use a little bit of skepticism. With or without any banners about content quality! :smiley:

Erik · 2016-01-08 Reply Admin

Yes! You're not even supposed to have a backup strategy... you're supposed to have a restore strategy!

Gurth · 2016-01-08 Reply Admin

Protoman:
I've never used Apple Numbers, but I suspect if you tried launching it from the console after setting the environment variable `LC_ALL` to something like `en_US.utf-8`, it would probably work (and just `LC_NUMERIC` may also be sufficient, too).

I suspect not, because according to printenv, neither of those variables are defined in OS X. Even if they were, I doubt many native applications refer to them.

It’s a WTF in any case: the program assuming that because formulas in the spreadsheet don’t use , as a separator, neither do files with comma-separated values. Far better would be to pop up a dialog before opening the file to ask what separator should be used to parse the file, or provide some other (simple) method of selecting that.

rc4:
Why don't we all speak the same language?

At least that is explained by history ;)

Maciejasjmj · 2016-01-08 Reply Admin

Yazeran:
Especially since some country-settings specify to use , as the decimal point (we do in Denmark) causing all manner of fun when some rows contain floating point values

The most fun is when the app gets confused and just uses a comma as both the separator and the decimal point. Have fun untangling that mess!

Dlareg · 2016-01-08 Reply Admin

Simple solution. Just replace the delimiter comma by c0cb5f0fcf239ab3d9c1fcd31fff1efc and you are done.

Dlareg · 2016-01-08 Reply Admin

you mean those around 30?

Quite · 2016-01-08 Reply Admin

Yazeran:
Actually the worst is that people use , blindly as a separator.
I usually prefer ; as it is rare that that could show up in data other than text.

Especially since some country-settings specify to use , as the decimal point (we do in Denmark) causing all manner of fun when some rows contain floating point values (other places it is used as a 1000 - separator which also can result in hilarity).....

Only last year I was using a new extension of our flagship product that had i18n added to it, where the various language variants were handled in a home-rolled csv parser. No escaping for commas, so if you happened to have a string with a comma in, it interpreted the comma as a separator.
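For comparison, a small Python sketch of what a quoting-aware parser does differently from a naive split. It uses the standard `csv` module; the sample rows are invented for illustration.

```python
import csv
import io

# Two fields per row; the second field contains a comma in both rows.
rows = [["greeting", "Hello, world"], ["price", "1,50"]]

# The writer quotes any field that contains the delimiter.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# A real CSV reader honours the quotes and recovers the original fields.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# A home-rolled split on "," treats the embedded comma as a separator.
naive = [line.split(",") for line in text.splitlines()]
```

Here `parsed` matches `rows`, while the first row of `naive` comes back with three pieces instead of two.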

Zemm · 2016-01-08 Reply Admin

Gurth:
Far better would be to pop up a dialog before opening the file to ask what separator should be used to parse the file, or provide some other (simple) method of selecting that.

You mean like how LibreOffice does it? (I'm ignoring its ancestors on purpose)
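Short of a dialog, a program can at least try to guess. A rough Python sketch using the standard library's `csv.Sniffer`, which is only a heuristic and can guess wrong on short or irregular samples; the sample data is made up.

```python
import csv

# Semicolon-delimited sample with decimal commas, as a Danish export might look.
sample = "name;price;qty\nwidget;1,50;3\ngadget;2,75;10\n"

# Restrict the guess to the separators we are prepared to handle.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")

# Parse with whatever delimiter the sniffer settled on.
rows = list(csv.reader(sample.splitlines(), dialect))
```

A guess like this is probably best treated as the default in the dialog rather than as a replacement for asking.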

Steve_The_Cynic · 2016-01-08 Reply Admin

accalia:
the kraw and the lac....or however you spell them....

Conventionally, crore and lakh in the Latin alphabet.

Or करोड़ and लाख in Hindi.

accalia · 2016-01-08 Reply Admin

Steve_The_Cynic:
Conventionally, crore and lakh in the Latin alphabet.

hmm....

completely off base for one, but surprisingly close for the other...

still no wonder google was no help in getting the correct spellings given how far off i was on the first one.

Spectre · 2016-01-08 Reply Admin

rc4:
You are failing to account for the fact that people can be different.

You are failing to account for the fact that computers are not people. If you represent data in a culture-neutral format, it's easier to write, easier to read, and can be transferred between users with different locale settings.

If you're saying that some systems already use culture-dependent CSV files and you may have to be compatible with those - yeah, sure, that's an argument. But it's not The Right Thing, and you should not do it in the absence of compatibility constraints.
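A minimal Python sketch of that approach: parse whatever the user typed according to their conventions at the edge, and store only a culture-neutral form. The helper names `parse_localized` and `to_neutral` are hypothetical, and the separators are passed in explicitly rather than read from the system locale.

```python
from decimal import Decimal


def parse_localized(text, decimal_sep=",", group_sep="."):
    """Parse a number written with the given (assumed) separators."""
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


def to_neutral(value):
    """Serialize with '.' as the decimal point and no digit grouping."""
    return format(value, "f")


danish = parse_localized("1.234,50")                # Danish-style input
american = parse_localized("1,234.50", ".", ",")    # US-style input
stored = to_neutral(danish)                         # what goes into the file
```

Both inputs end up as the same value, and the stored text can be read back by any consumer regardless of its locale settings.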

LB_ · 2016-01-08 Reply Admin

This. Don't serialize in a locale-dependent way. Otherwise you may as well ignore endianness in your binary files and network streams...

flabdablet · 2016-01-08 Reply Admin

Dlareg:
you mean those around 30?

31 = 0x1F = ctrl-_ = US = Unit Separator: what should be used instead of tabs, commas, pipes or *ing asterisks
30 = 0x1E = ctrl-^ = RS = Record Separator: what should be used instead of CR, CRLF or LF
29 = 0x1D = ctrl-] = GS = Group Separator: for delimiting groups of records within a file
28 = 0x1C = ctrl-\ = FS = File Separator: for delimiting files within an archive or stream (CP/M and DOS should have used this instead of ctrl-Z = SUB, and Unix should have used it instead of ctrl-D = EOT, for marking EOF for data entered from the keyboard)
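The ctrl-key column follows from a simple rule: the control code is the key's ASCII code with the top bits masked off. A small Python sketch of that mapping, where the function name `ctrl` is just an illustrative choice:

```python
def ctrl(key):
    """Return the ASCII control code conventionally produced by Ctrl+key."""
    return ord(key.upper()) & 0x1F


# The four separators, plus the two EOF markers mentioned above.
codes = {
    "US": ctrl("_"),
    "RS": ctrl("^"),
    "GS": ctrl("]"),
    "FS": ctrl("\\"),
    "SUB": ctrl("Z"),
    "EOT": ctrl("D"),
}
```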

rc4 · 2016-01-08 Reply Admin

You somehow fail to understand what I'm saying in one paragraph and then understand it in the next, but dismiss it immediately afterwards because it's :doing_it_wrong:. Okay, Jeff.

MockMyBeret · 2016-01-08 Reply Admin

So, I'm an Active Directory Engineer and my name is Tommy and yeah, this is about me. I like all of your arguments, but here's a good solution.

http://blogs.technet.com/b/activedirectoryua/archive/2015/01/19/ad-magic-restore-script-published-on-codeplex.aspx

or

http://blogs.technet.com/b/ashleymcglone/archive/2014/04/24/oh-snap-active-directory-attribute-recovery-with-powershell.aspx

Just my two cents.

Yazeran · 2016-01-08 Reply Admin

Oh yea, I remember having an issue like that some 15 years ago when I tried to import some data (might have been into origin) and had some WTF moments when the data looked all weird until I noticed all those big integers in the raw data tables.....

I think i ended with having to do a series of search-replace on the data prior to import before i got it to work.....

hungrier · 2016-01-08 Reply Admin

Dlareg:
Simple solution. Just replace the delimiter comma by ```c0cb5f0fcf239ab3d9c1fcd31fff1efc``` and you are done.

Your solution is to Discourse the separator value? *

Dlareg · 2016-01-08 Reply Admin

so yes those around thirty

hungrier:
Your solution is to Discourse the separator value? *dae361af79b04c9c8e7057f60cc6**

that was the Bad Idea

Protoman · 2016-01-08 Reply Admin

Gurth:
I suspect not, because according to printenv, neither of those variables are defined in OS X. Even if they were, I doubt many native applications refer to them.

No, they're not, but they fall back to $LANG by default when unset (see locale(1)), which should be your system locale setting (mine is en_US.UTF-8). Most applications don't refer to the locale variables directly, but a number of functions in the C runtime library do, like printf.
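The lookup order can be sketched in a few lines of Python. This mirrors the usual POSIX precedence (LC_ALL, then the specific category, then LANG, then the "C" locale); the function `effective_locale` is a hypothetical helper for illustration, not something the C library exposes.

```python
import os


def effective_locale(env, category="LC_NUMERIC"):
    """Return the locale name that would apply to one category."""
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"


# Only LANG set, as on a default OS X shell: it is used as the fallback.
only_lang = effective_locale({"LANG": "en_US.UTF-8"})

# LC_ALL overrides everything else.
forced = effective_locale(
    {"LANG": "da_DK.UTF-8", "LC_NUMERIC": "da_DK.UTF-8", "LC_ALL": "en_US.UTF-8"}
)

# What applies to the current process environment.
current = effective_locale(os.environ)
```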

Gurth:
It’s a WTF in any case

Completely agree.

BaconBits · 2016-01-08 Reply Admin

As someone who does BASH/Unix and PowerShell scripting regularly this looks to me like a Unix guy trying to program PowerShell without learning how it works first. The problem here is not PowerShell, the problem is whoever wrote that mess.

Agreed. I was also immediately suspicious when the article claimed PowerShell was "executed poorly". I have my complaints about PowerShell and the PowerShell community, but if you think it was poorly executed you're just making yourself look like a fool. This script author clearly doesn't understand what an object is.

The difference between Unix and Windows is: in Unix, everything is a character string, in Windows, everything is an object. That's it. If you can grok what that means, you're 95% of the way to understanding how the two systems work. The biggest problem Unix people have when they work in Windows is to try to make everything into a character string, and then they blame the OS when that doesn't work. If you think Windows administration involves installing Cygwin, you're doing it wrong.

BaconBits · 2016-01-08 Reply Admin

Blaming PowerShell because Active Directory and LDAP property names are both numerous and long is hardly fair.

Zylon · 2016-01-08 Reply Admin

The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.

Gurth · 2016-01-08 Reply Admin

I guess that makes me a non-adult [spoiler]as I've never seen a single second of that show.[/spoiler]

antiquarian · 2016-01-08 Reply Admin

Zylon:
The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.

[image]

accalia · 2016-01-08 Reply Admin

antiquarian:

Zylon:
The RWTF is people who somehow transition into adulthood without coming to the realization that Animaniacs sucked.
[image]

i always add a second 't' into that and giggle......

antiquarian · 2016-01-08 Reply Admin

[image]

accalia · 2016-01-08 Reply Admin

that's the bunny

Scarlet_Manuka · 2016-01-08 Reply Admin

dkf:
tabs are a really good choice. They virtually never turn up in structured data or user input.

Ah, if only that were the case. Had to modify a couple of reports recently to explicitly remove tabs from selected fields, because someone had entered data with a tab included. (The really evil one was the one with a trailing tab on the last field of the record - you're not going to find that from a visual inspection.)

Still, tabs are often one of the best available choices and I generally do prefer them.

Tsaukpaetra · 2016-01-08 Reply Admin

Scarlet_Manuka:
someone had found a field that wasn't sanitized and entered data with a tab included.

FTFF

Scarlet_Manuka · 2016-01-09 Reply Admin

Well, I don't know the specifications of that field offhand, not being an Oracle developer, but it's very possible that they decided to allow the user to put in whatever they like and that it works perfectly well within the application. (No doubt it's equally possible that they just didn't bother thinking about it.) It's just that when we then include that field in a tab-delimited report, parsing the output becomes harder.
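Roughly the kind of fix that goes into the report, sketched in Python rather than whatever the real report uses. The names `clean_field` and `tsv_line` are hypothetical, and replacing tabs with spaces is an assumption about what the report's consumers can tolerate.

```python
import re

# Tabs and line breaks are the characters that break a tab-delimited line.
_BREAKERS = re.compile(r"[\t\r\n]+")


def clean_field(value):
    """Replace runs of tabs/line breaks with one space, then trim the ends."""
    return _BREAKERS.sub(" ", value).strip()


def tsv_line(fields):
    """Build one tab-delimited line from already-stringified fields."""
    return "\t".join(clean_field(field) for field in fields)


# The last field has the invisible trailing tab described above.
line = tsv_line(["Smith", "Main\tStreet", "note\t"])
```

The stored data stays untouched; only the exported copy is altered.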

anotherusername · 2016-01-09 Reply Admin

Most fields don't even allow you to enter a tab as data... normally pressing tab just shifts the focus to the next element in the tab index.

Tsaukpaetra · 2016-01-09 Reply Admin

anotherusername:
don't even allow you to use the keyboard's Tab key to enter a tab as data

PTFY. Many text-entry controls that allow pasting, especially those that allow multi-line, allow you to paste in a Tab character from the clipboard.

dkf · 2016-01-09 Reply Admin

Scarlet_Manuka:
Ah, if only that were the case. Had to modify a couple of reports recently to explicitly remove tabs from selected fields, because someone had entered data with a tab included.

You're giving your users text fields? Are you mad?

;)

ben_lubar · 2016-01-09 Reply Admin

flabdablet:
ctrl-\

On Unix, that's short for "crash the program".

Gurth · 2016-01-09 Reply Admin

There already are two Ts in that.

PleegWat · 2016-01-09 Reply Admin

ben_lubar:

flabdablet:
ctrl-\

On Unix, that's short for "crash the program".

That's because Ctrl+Letter on a terminal doesn't code for a character by default. It codes for a signal. In the case of Ctrl+\, I believe the signal is SIGABRT, which generates a coredump, though I'm not sure of that or what in general the mapping is between letters and signals. I do know Ctrl+V is escape, so Ctrl+V, Ctrl+\ will input FS.

ben_lubar · 2016-01-09 Reply Admin

PleegWat:
I believe the signal is SIGABRT

It's actually SIGQUIT.

Good Idea, Bad Idea
