• Chris (unregistered) in reply to Franz Kafka
    Anonymous:
     

    The quote problem is a fundamental design flaw: you've got two separate standards for escaping things, and the edge cases caused by this complexity mean that there are lots of incompatible CSV standards, so you may as well chuck it all and start over.

    I propose the following standard:

    1. pipe separated values, one row per line

    2. all escaping is done with \, so \r, \n, \p (pipe) and \" behave as expected

    3. any line starting with | is a meta line. It can describe column headings or author info or whatever you like. Not much defined here

     

    The problem, as always, is getting people to agree on the same thing. 
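
    The three rules proposed above could be sketched roughly as follows. This is only an illustration: the function names are made up, and the proposal does not say how a literal backslash is written (the sketch assumes \\) or how a row with an empty first field is told apart from a meta line (the sketch does not handle that case).

```python
# Hypothetical helpers for the proposed pipe-separated format.
# Assumption: a literal backslash is written as "\\" (not stated in the proposal).
ESCAPES = {"\\": "\\\\", "\r": "\\r", "\n": "\\n", "|": "\\p", '"': '\\"'}
UNESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "p": "|", '"': '"'}


def escape_field(value):
    """Replace every special character with its backslash escape."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def encode_row(fields):
    """One row per line: escaped fields joined by pipes."""
    return "|".join(escape_field(f) for f in fields)


def decode_row(line):
    """Split on unescaped pipes and undo the backslash escapes."""
    fields, current, chars = [], [], iter(line)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")          # character after the backslash
            current.append(UNESCAPES.get(nxt, nxt))
        elif ch == "|":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


row = ["a|b", 'say "hi"', "line1\nline2", "back\\slash"]
encoded = encode_row(row)
decoded = decode_row(encoded)
```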

     
    Finally, after so many "endorsements" for CSV, somebody mentions what bothers me about it. I am only an amateur coder, so I usually hesitate to comment on the WTFs to avoid exposing my ignorance. However, this is an issue for me as a user rather than as a hobby coder.

    There are a couple of problems:

    1. In its MS Office flavour, CSV has a locale-specific delimiter (!!!) and number format (I blogged about it at http://christianflury.com/blog/2006/11/the_huge_csv_internationalizat.html - in fact my post is merely a more verbose way of saying WTF).
    2. CSV makes it easy to mess up. A lot of applications get it wrong. For example, log files in Trados, the leading CAT (computer-assisted translation) suite and a tool that would qualify for an entire week of exclusive TDWTF coverage by itself, btw, can be exported as CSV. These log files only contain integers and file names, and supposedly the developers thought at the time that they did not have to deal with escaping because neither could contain the delimiter.
      Nowadays, file names under Windows sometimes do contain commas or semicolons, and guess what? The software still does not escape them, so the rest of the line is shifted by one column in this case (I have seen erroneous translation quotes caused by this problem; it does happen and it does cost money). Of course, this should be picked up by a sanity check in the importing application (if it does not blindly trust the numbers provided), but sometimes it isn't. With (meaningful) XML, ideally along with a strict DTD, it's just much harder to produce this kind of WTF.
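
    A small sketch of the column shift described above, using Python's standard csv module. The log row is made up for illustration; the actual Trados log layout is not shown in this thread.

```python
import csv
import io

# Hypothetical log entry: a file name containing a comma, then two counts.
row = ["report, final.doc", 120, 45]

# Naive export: join the values without any quoting or escaping.
naive_line = ",".join(str(v) for v in row)
naive_fields = naive_line.split(",")   # the file name is split in two

# csv.writer quotes the field that contains the delimiter,
# so csv.reader gets the original three columns back.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
```

    With the naive export the importer sees four fields instead of three, so each count lands one column to the right.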

     

  • Mr. A Nonny Mousse (unregistered) in reply to Jouni K. Seppänen
    Anonymous:
    This reminds me of Apple's XML representation of Property Lists. You see, OpenStep had a simple serialization format for simple data, roughly similar to JSON. [...]
    Of course, this wasn't buzzword compliant, so when OpenStep became Cocoa, Apple had to change the format:
     
    There are a couple of nice things about the XML plist format. You can put a few more data types in (like NSDate). And there's no ambiguity about character encoding, like there is with the OpenStep-style plists. OTOH, they're hard to read and really bloaty. 
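
    For a rough idea of what the date type looks like in the XML format, here is a sketch using Python's standard plistlib rather than Cocoa itself. The dictionary contents are made up; the date is written as a <date> element, which is what an NSDate serializes to.

```python
import datetime
import plistlib

# Hypothetical settings dictionary with a date value.
settings = {"name": "demo", "created": datetime.datetime(2006, 12, 1, 9, 30)}

xml_bytes = plistlib.dumps(settings)    # XML is plistlib's default format
restored = plistlib.loads(xml_bytes)    # the date comes back as a datetime
```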
  • Eam (unregistered) in reply to Anonymous

    Whitespace may be code too, but I typically wouldn't think of this post as "words interspersed with code."

  • (cs) in reply to Chris

    It'd be much nicer if we could just use YAML.

    Easy to write by hand with a text editor, unambiguous representations of lists, tables, hashes, attributes, and nested versions thereof.

    Lots of bindings for lots of languages, even a fledgling one for .NET: 

    http://yaml-net-parser.sourceforge.net/

  • DF (unregistered)

    So they used XML as a message wrapper. Whoopee.

    I know XML is a bit of a whipping boy around here, but this is barely a "hrmmm", much less a "WTF".

  • Dave (unregistered) in reply to Anonymous
    Anonymous:
    John Bigboote:

    XML is markup, not code.


    What exactly would you call OpenLazlo or XSLT then? XML may be code.

    Captcha: clueless

    sed "s/XML/ASCII/;s/markup/text/;s/OpenLazlo/C/;s/XSLT/Java/"

    "ASCII is text, not code."

    "What exactly would you call C or Java then? ASCII may be code."

     Sure, XML may be code, but XML is not code. The XML in this WTF is not code.

  • (cs) in reply to Dave
    Anonymous:

    sed "s/XML/ASCII/;s/markup/text/;s/OpenLazlo/C/;s/XSLT/Java/"

    "ASCII is text, not code."

    "What exactly would you call C or Java then? ASCII may be code."

     Sure, XML may be code, but XML is not code. The XML in this WTF is not code.

    American Standard Code for Information Interchange

    QED

  • (cs) in reply to emurphy
    emurphy:

    OneFactor:

    Bravo. Justifying design decisions by appealing to performance. The last resort of the incompetent. Also known as: "turbo-might manure-ver".

    How is that FizzbinSQL project coming along, anyhow?

    Oh... it was outsourced to some developers on Beta Antares IV. Excuse me while I go explain to my science officer why he knows of no such projects being developed in that sector.

  • A chicken passeth by (unregistered) in reply to enterprisey!

    Any particular reason why the code appears as "example@example.com" (yes, that's the full text) in my mobile phone's RSS reader?

    The one I'm using is this. http://pda.jasnapaka.com/prssr/

     

  • dgm (unregistered) in reply to Chris Travers
    Anonymous:
    notromda:
    Anonymous:

    Come on guys, who are the lucky ones who do not work in an environment filled with potential WTF around them?

     

    /me raises hand...   sheepishly.... 

     

    You aren't looking hard enough...

     

    Captcha:  Quality 

     

    I'm a self employed contractor... and I'll fire any client that gets that stupid.  :)

     

    BTW, the nice WTF is the fact that the forum software translates "slash me" or /me into the poster's name... which made it appear that Chris Travers raised his hand too. And now, dgm...

  • makessense (unregistered) in reply to asuffield

    This is the most reasonable comment I have read on this page.

  • Hognoxious (unregistered) in reply to PseudoNoise

    Anonymous:
    Wait till he finds out about the "tab problem" in TSV. 
    At least tabs are always tabs.  In a French CSV file, the delimiter is actually a semicolon.
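
    A sketch of what that means for an importer, assuming a made-up export that uses a semicolon delimiter and a decimal comma, as a French-locale spreadsheet typically would:

```python
import csv
import io

# Hypothetical export: semicolon delimiter, decimal comma.
french_export = "nom;prix\nstylo;1,50\ncahier;12,00\n"

# Reading with the default comma delimiter splits the prices, not the columns.
wrong = list(csv.reader(io.StringIO(french_export)))

# Reading with the delimiter the file actually uses.
rows = list(csv.reader(io.StringIO(french_export), delimiter=";"))

# The decimal comma still has to be converted by hand.
prices = [float(r[1].replace(",", ".")) for r in rows[1:]]
```

    The reader has to be told the delimiter (or guess it), which is exactly the locale dependence being complained about.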

  • Gil32 (unregistered) in reply to JD

    No no no! Saying "XML is code" is the same as saying "HTML is code".

    Even though it may be a method of encoding data, it's not code the way we usually think of the term.

  • annoynimous (unregistered) in reply to OJ

    Why DOM? There are systems that work on infinite XML streams (example: XMPP). They just use SAX-like parsers.

    DOM is only good when you need to make a number of random changes to the XML and then save it, something like MS XML Notepad. DOM, however, is bad for just reading the XML!
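
    A minimal sketch of that SAX-style approach with Python's standard xml.sax, feeding the parser in chunks instead of building a tree. The <stream> and <message> element names are invented for the example and are not real XMPP stanzas. A real stream would stay open; the sketch closes it only so that the example terminates.

```python
import xml.sax


class MessageLogger(xml.sax.ContentHandler):
    """Records the id of every <message> element the parser reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        if name == "message":
            self.seen.append(attrs.get("id"))


handler = MessageLogger()
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Chunks as they might arrive from a socket; tags can be split anywhere.
chunks = [b"<stream><mess", b"age id='1'/><message id=", b"'2'/></stream>"]
for chunk in chunks:
    parser.feed(chunk)   # no tree is built; only the handler keeps state
parser.close()           # flushes whatever the parser was still holding
```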

    CAPTCHA: stfu - Search the (freaking) Universe ?
    
  • JLuc (unregistered) in reply to JoeB

    But as soon as you try and send large amounts of data then it really is better to use csv or fixed width because of the speed to import the data.

    How so? Depends on what you mean by large. If you insist on using DOM, sure, things slow down real quick. But SAX-based parsing isn't that slow. I wrote quite a few of those in Python, which isn't exactly assembly language, speed-wise, and my programs worked fine on 5-10 MB files.

    IMHO XML is the way to go for not-totally-trivial data, though you should always try to keep it as simple as possible. And CSV files with commas in the data start looking quite ugly if you are parsing them by hand.

  • Crusty Parser (unregistered) in reply to dan s.
    dan s.:
    Tim Gallagher:

    <?xml version="1.0" encoding="iso8859-1" ?><import tag="1stTEST" type="data" mode="update"><options> <dateformat mmddyyyy="true"/> <notification> <EMail>example@example.com</EMail> </notification> </options> <fields> <field name="name" type="char" mapsto="person.data"/> <field name="officeid" type="char" mapsto="custom.locationid"/> <field name="startyear" type="char" mapsto="person.yearstarted"/> <field name="personelid" type="int" mapsto="person.id"/> <field name="dob" type="date" mapsto="person.dateofbith"/> <field name="sex" type="char" mapsto="person.sex"/> <field name="modified" type="date" mapsto="record.modified"/> </fields> <csvdata columnheaders="false"> <![CDATA["Jack Wade",214,2002,111012,07/04/1975,"M",02/11/2006"Sam Davidson",214,1999,104841,10/15/1967,"M",02/10/2006"Denise V Law",214,1998,104660,01/21/1971,"F",02/17/2006"Lisa Blake",214,1989,100987,08/01/1982,"F",01/21/2006"Andrew Match",214,1991,101074,12/25/1980,"M",02/28/2006]]> </csvdata> </import>

    The real WTF is that it's not properly indented. 

    XML is a freeform format -- there is no right or wrong indentation.

  • Crusty Parser (unregistered) in reply to Crusty Parser

    ...and why doesn't the quoting mechanism work properly in this forum?

  • tinkerghost (unregistered)

    You know, with the addition of a 'position' attribute in the field tags, this would be pretty decent: <field name="name" position="1" type="char" mapsto="person.data"/>. Parse the options section to generate your data mapping, followed by processing the CSV file. For large datasets, it should be both faster to run & smaller to transmit than the equivalent document done in fully expressed XML. In addition, it would be much more flexible than a straight CSV file, because you can add or remove fields as needed, change the order to suit your whim, or alter date styles with a single tweak of the output format. So, overall, it's actually a good first use of the technology; 1 tweak & it's a technical improvement over both CSV & XML for this implementation.
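
    A rough sketch of that two-step idea using Python's standard library. The 'position' attribute is the hypothetical addition suggested above, the document is a cut-down version of the one quoted earlier, and one CSV record per line is assumed.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Cut-down document; 'position' is the suggested (hypothetical) attribute.
doc = """<import>
  <fields>
    <field name="name" position="1" type="char" mapsto="person.data"/>
    <field name="officeid" position="2" type="char" mapsto="custom.locationid"/>
    <field name="startyear" position="3" type="char" mapsto="person.yearstarted"/>
  </fields>
  <csvdata><![CDATA["Jack Wade",214,2002
"Sam Davidson",214,1999
]]></csvdata>
</import>"""

root = ET.fromstring(doc)

# Step 1: build the column-to-target mapping from the field declarations.
mapping = {int(f.get("position")) - 1: f.get("mapsto") for f in root.iter("field")}

# Step 2: run the CDATA payload through an ordinary CSV reader.
payload = root.findtext("csvdata").strip()
rows = [
    {target: record[col] for col, target in mapping.items()}
    for record in csv.reader(io.StringIO(payload))
]
```

    Reordering or dropping columns then only means changing the field declarations, not the code that reads the data.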

  • boogerfish (unregistered) in reply to Ishai Sagi
    Ishai Sagi:
    I didnt even blink. "well, we could use xpath, but we would lose a lot on preformance!" Untill today its my favorite personal WTF. Like my captcha says - perfection!

    prefectly stated, ford.

  • mp (unregistered) in reply to Crusty Parser

    I thought you were proving the point ad absurdum :)

    mp

    “It’s not enough that we do our best; sometimes we have to do what’s required.” ~ Sir Winston Churchill ~

Leave a comment on “XML vs CSV : The Choice is Obvious”
