Admin
\nFrist line
Edit Admin
It's usually called "comma separated values" because that's what it originally was, a file where values were separated by commas.
Then the localising folks working on Excel got hold of it, and it became ceparator-separated values.
Edit Admin
I had to test this out, because I was curious what csv.reader would do for line terminators with delimiter="\n". Turns out that the representative line does work, though there's one important difference. With something like f.readlines, each line in the list is a string, but with csv.reader, each line in the list is a list of strings (in this case, they're all one-element lists of strings).
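For anyone who wants to repeat the experiment, here is a minimal sketch of the comparison. The sample text and helper names are made up for illustration, and the `ValueError` guard is there because some Python versions reportedly reject "\n" as a delimiter outright.

```python
import csv
import io

# Hypothetical sample data, not taken from the article.
SAMPLE = "alpha\nbeta\ngamma\n"


def read_with_readlines(text):
    """Each element is a plain string, newline included."""
    return io.StringIO(text).readlines()


def read_with_csv(text):
    """Each element is a list of strings (one field per row here).

    Returns None if this Python version refuses "\n" as a delimiter.
    """
    try:
        return list(csv.reader(io.StringIO(text), delimiter="\n"))
    except ValueError:
        return None


lines = read_with_readlines(SAMPLE)
rows = read_with_csv(SAMPLE)
print(type(lines[0]))                              # str
print(type(rows[0]) if rows else "delimiter rejected")
```

So code written against the readlines version would need row[0] everywhere once it is fed the csv.reader output.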
Edit Admin
"Then the localising folks working on Excel got hold of it, and it became ceparator-separated values."
Once upon a time there was a system which used Excel COM interop to read files from a third party, and all was well until the company expanded across the border. Then some users just got garbage out of the import system, because the CSV files were comma-separated but the new users had MS Office installed in Spanish, and for whatever reason the geniuses at Microsoft decided that the default separator for COMMA-separation in Spanish was going to be a semicolon instead.
Admin
That reason is that in Spanish, like everywhere(?) in continental Europe, the decimal separator is a comma, and for some reason they thought that allowing that in the file was important. Users could see and edit numbers as they were used to. But who edits csv by hand? Not your average Joe. So, just like storing localized function names, and interpreting strings as dates, that lofty thought was a mistake.
Admin
and then unicode happened
Admin
Well, that's been fixed in 3.13, because when I tried it, it raised "ValueError: bad delimiter value" which is exactly what I would do if told to read a table where the cell and row delimiters were the same.
Interestingly, this change is not mentioned in either the csv module docs or the what's new in 3.13? so if they ever upgrade (hah) that's a breakage they won't see coming.
Edit Admin
Well, they had to do something to resolve the conflict between 4,7 being a 4 and a 7 separated by a VSC(1) and 4,7 being 4.7 written European-style, with a decimal comma instead of a decimal point.
And of course they chose the "the content of a ceparator-separated values file is for humans to read" version of the solution, which is 99.999% nonsense.
(1) Value-separating comma
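To make the conflict concrete, here is a small sketch using Python's csv module. It only illustrates the ambiguity and is not how Excel works internally; the variable names are invented for the example.

```python
import csv

line = "4,7"

# Read with a comma as the value separator: two fields.
as_two_values = next(csv.reader([line], delimiter=","))

# Read with a semicolon as the value separator: one field,
# which a European-style reader would treat as a decimal number.
as_one_value = next(csv.reader([line], delimiter=";"))
decimal_value = float(as_one_value[0].replace(",", "."))

print(as_two_values)   # ['4', '7']
print(as_one_value)    # ['4,7']
print(decimal_value)   # 4.7
```

Same bytes, two different tables, and nothing in the file says which reading was intended.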
Edit Admin
Everyone wants to blame Excel for CSV being garbage,[0] but localization is HARD, and CSV gives you the ILLUSION that it's easy. Plain CSV isn't even easy - you can't naively split on commas; you MUST handle quotes, especially the weird "escaped" quotes. CSV tricks you into thinking structured/tabular data is easy, but it's not.
[0]: Excel has a lot of flaws, like trying to say any number is a date, but that's unrelated to CSVs.
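A quick sketch of why the naive split falls over, using an invented sample line with a quoted comma and doubled ("escaped") quotes:

```python
import csv

# Hypothetical line: three fields, one containing a comma,
# one containing doubled quotes.
line = 'a,"b,c","say ""hi"""'

naive = line.split(",")            # splits inside the quoted field
parsed = next(csv.reader([line]))  # honours quotes and doubled quotes

print(naive)    # ['a', '"b', 'c"', '"say ""hi"""']
print(parsed)   # ['a', 'b,c', 'say "hi"']
```

The naive version gets both the field count and the field contents wrong.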
Admin
The other difference between the two versions is that the csv based version will support quoting of "line"s, so you could have a "line" that actually contains multiple lines if they were all wrapped in quote characters.
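The multi-line quoting behaviour can be sketched like this. Note the assumption: the example uses the default comma delimiter rather than the article's "\n", so that it runs on any Python version, and the data is made up.

```python
import csv
import io

# Hypothetical data: the second record has a quoted field spanning two lines.
text = 'id,note\n1,"first line\nsecond line"\n'

physical_lines = text.splitlines()
records = list(csv.reader(io.StringIO(text, newline="")))

print(len(physical_lines))  # 3 physical lines
print(len(records))         # 2 logical records
print(records[1])           # ['1', 'first line\nsecond line']
```

A plain line-based read would hand you three "lines" here, one of them starting in the middle of a field.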
Edit Admin
"And of course they chose the "the content of a ceparator-separated values file is for humans to read" version of the solution, which is 99.999% nonsense."
And thanks to that CSV went from a very useful way to exchange data between systems to something nobody uses unless they absolutely have to, leaving most of us with the choice between XML and binary until JSON joined the party a while later.
Heck, the issue with the dot vs comma numbers was so annoying that some text-only systems dealt with it by just writing all numbers as integers with fixed decimals, i.e. 6.54 written as 65400. I still have to deal with one of those from time to time.
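For the curious, the fixed-decimals trick looks roughly like this. The function names and the choice of four implied decimal places are assumptions picked to match the 6.54 -> 65400 example, not the actual system's format.

```python
from decimal import Decimal


def to_fixed(value: str, places: int = 4) -> str:
    """Encode a decimal string as an integer with `places` implied decimals."""
    scaled = Decimal(value).scaleb(places)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {places} decimal places")
    return str(int(scaled))


def from_fixed(text: str, places: int = 4) -> Decimal:
    """Decode the integer form back into a Decimal."""
    return Decimal(int(text)).scaleb(-places)


print(to_fixed("6.54"))     # 65400
print(from_fixed("65400"))  # 6.5400
```

No separator character ever appears in the file, so there is nothing for a locale to reinterpret; the price is that both sides must agree on the number of implied decimals out of band.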
Admin
Brilliant! Or should I say Brillant?
Admin
So TRWTF is the existence of decimal commas and some people's stubborn insistence on using them, thereby complicating everything.
Admin
0x1f. There. Fixed it.
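0x1f is the ASCII unit separator. A rough sketch of using it as the delimiter with Python's csv module, with invented sample values:

```python
import csv
import io

US = "\x1f"  # ASCII unit separator

# Write one row whose fields contain commas and semicolons.
buf = io.StringIO(newline="")
csv.writer(buf, delimiter=US).writerow(["4,7", "plain", "semi;colon"])
encoded = buf.getvalue()

# Read it back with the same delimiter.
decoded = next(csv.reader(io.StringIO(encoded, newline=""), delimiter=US))

print(repr(encoded))
print(decoded)
```

Since neither commas nor semicolons are special any more, none of the fields need quoting. Whether anything else can open the file is another matter.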
Admin
Literally no CSV file I ever needed to open opened right in Excel. I thought it was just Excel being shit in general; it's a bummer that it's specifically shit to me because I live in a particular country. Thank you for thinking about me, valiant Microsoft localizers, please think of me less next time.
Edit Admin
"some text-only systems dealt with it by just writing all numbers as integers with fixed decimals, i.e. 6.54 written as 65400." Tell me you've never programmed in COBOL without telling me you've never programmed in COBOL.
Admin
Some customers of ours got round the problem not by making everything decimals but simply by "quoting" every single field. Every. Single. Field.
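In Python terms that approach is roughly csv.QUOTE_ALL. A minimal sketch with made-up values (this is not the customers' actual tooling):

```python
import csv
import io

buf = io.StringIO(newline="")
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["4,7", "abc", 12])  # numbers get quoted too
quoted_line = buf.getvalue()

# Reading it back yields strings only; the decimal comma survives intact.
round_trip = next(csv.reader(io.StringIO(quoted_line, newline="")))

print(quoted_line.strip())  # "4,7","abc","12"
print(round_trip)           # ['4,7', 'abc', '12']
```

It keeps the decimal comma from being mistaken for a separator, though the reader still has to know that "4,7" is meant as a number.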
Edit Admin
LOL. I haven't touched COBOL in a while. That example was from a 3rd party system that uses TXT files with fixed-length fields to exchange data. And you might not believe it but it's a relatively modern accounting system that uses a relational DB as backend with very well normalized tables, so no COBOL there.
That said, about two years ago one of our clients wanted us to reverse engineer and modify an old COBOL system, but fortunately we managed to convince them it would be simpler and cheaper to just implement the whole thing from scratch without the RE, since the old system is still functional.
Work as a dev long enough and anyone can open their own WTF site. :D