The Daily WTF: Curious Perversions in Information Technology

2010-06-08

Dazed:
Jay:
I fail to understand the fascination with XML. Yes, there are cases where it's useful, but it seems to me that they are a very tiny number of cases.
To cut a longish story very short, XML is useful whenever the data you are transferring is hierarchically structured rather than a simple table. Which is hardly a tiny number of cases.

No, that's what JSON and YAML are for.

Dazed:
Though even so I will agree that it is somewhat over-used.

The number of XML documents which couldn't also have been encoded in JSON (or YAML if you want some pretty-printing) is most likely somewhere between tiny and non-existent.

2010-06-08

The problem with CSV is: data is dirty.

Repeat it to yourself each time you develop a system.

The first time a clerk types "Winston, Salem" instead of "Winston-Salem" into your system, it will fail.

And trust me, it will happen more than once ... a day.

2010-06-08

Why does no one know the quoting rules for CSV?

    <person><name>Joe Bloggs</name>...</person>
    <person><name>Winston, Salem</name>...</person>
    <person><name>Joe "Big dog" McCoy</name>...</person>

=>

    name,...
    Joe Bloggs,...
    "Winston, Salem",...
    "Joe ""Big dog"" McCoy",...
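For what it's worth, Python's standard `csv` module applies exactly these rules by default (`csv.QUOTE_MINIMAL`: quote a field only when it contains the delimiter, a quote character or a line break, and double any embedded quotes). A minimal sketch; the single `name` column and the function names are illustrative assumptions:

```python
import csv
import io


def names_to_csv(names):
    """Write a header plus one 'name' column using the csv module's default quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name"])
    for name in names:
        writer.writerow([name])  # quoting and quote-doubling happen here
    return buf.getvalue()


def csv_to_names(text):
    """Parse the text back, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[0] for row in rows[1:]]


names = ["Joe Bloggs", "Winston, Salem", 'Joe "Big dog" McCoy']
text = names_to_csv(names)
print(text)
```

Reading the output back with `csv.reader` returns the three original names unchanged, comma and embedded quotes included.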

The only real problem with CSV is that many editors for it don't handle Unicode text well (Excel will open UTF-16LE but not UTF-8, and will not save without asking to convert to ISO-8859-1).

Arancaytar · 2010-06-08

Neville Flynn:
I don't get it. Is the WTF that he searched for Jacob but got Marlene?
Or is it that although he can get an XML response, he still has to submit the request in a fixed-width format?

Well, it's that the XML response is nothing but delimited data dressed up as XML. Better than fixed-width, but no better than a CSV file.

Good XML actually tells you what the data means, using a meaningful name for each element instead of a sequence of "<field>" elements where you have to remember which order they're supposed to be in.

2010-06-08

CrazyBomber:
Actually, a certain "Enterprise" Application Integration software (whose name begins with a "T" and ends in "IBCO"; I don't know which version) has the healthy habit of parsing XML by field position.

Schemas do actually specify the ordering of elements. Sure, that might make generating new documents a bit harder, but I imagine it will also speed up validation (and possibly parsing) a whole lot.

2010-06-08

Yardik:
Hah, I know a very large supplier of credit files (yes, one of the big 3) who had a very very similar request format to this. It was called fixed-input format and there were entire huge documents on how to request different products using it. That's what happens when you have old mainframe systems and no proper middleware to handle requests.

Could that be clieop3? That's a format still used by all Dutch banks, e.g. for dance schools whose members have agreed to have their subscription fee transferred automatically. Clieop3 request files are built from fixed-width lines. The service breaks without giving a reason when you feed it \r\n, one whitespace character too many, etc.

Maybe XML is not so bad after all.
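Much of the pain described here is about error reporting rather than the fixed-width idea itself. A sketch of a strict fixed-width reader that rejects CRLF endings and wrong line lengths with an explicit message; the field layout below is entirely hypothetical and is not the real clieop3 record layout:

```python
# Hypothetical layout: (field name, width in characters). NOT the real clieop3 layout.
LAYOUT = [("record_type", 4), ("account", 10), ("amount", 12), ("name", 24)]
RECORD_WIDTH = sum(width for _, width in LAYOUT)


def parse_fixed_width(line):
    """Split one fixed-width line into fields, failing loudly on bad input."""
    if line.endswith("\r"):
        raise ValueError("line ends with '\\r' (CRLF line ending?)")
    if len(line) != RECORD_WIDTH:
        raise ValueError(f"expected {RECORD_WIDTH} characters, got {len(line)}")
    record, pos = {}, 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].rstrip()  # drop the space padding
        pos += width
    return record


def read_records(text):
    # Split on "\n" only: str.splitlines() would silently accept "\r\n".
    return [parse_fixed_width(line) for line in text.split("\n") if line]
```

Fed the same data with `\r\n` line endings, or with one trailing space, this raises a `ValueError` saying what is wrong instead of failing silently.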

2010-06-08

Seen worse. The names have been changed to protect the innocent, but:

    <Customer>
        <StartDetails/>
        <StartName/>
        <StartFirst/>
        <Value>Bob</Value>
        <EndFirst/>
        <StartLast/>
        <Value>Smith</Value>
        <EndLast/>
        <EndName/>
        <EndDetails/>
    </Customer>
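To a parser, the Start*/End* markers above are just empty sibling elements: the hierarchy exists only in the naming convention, so every consumer has to rebuild it by hand. A sketch using Python's `xml.etree.ElementTree`, assuming the final marker is the self-closing `<EndDetails/>` (the `rebuild_paths` helper is hypothetical):

```python
import xml.etree.ElementTree as ET

DOC = """<Customer> <StartDetails/> <StartName/> <StartFirst/> <Value>Bob</Value>
 <EndFirst/> <StartLast/> <Value>Smith</Value> <EndLast/> <EndName/> <EndDetails/> </Customer>"""

root = ET.fromstring(DOC)

# Every element is a direct child of <Customer>: the markers create no nesting.
flat_tags = [child.tag for child in root]


def rebuild_paths(customer):
    """Hypothetical helper: recover the intended hierarchy from Start*/End* markers."""
    stack, values = [], {}
    for child in customer:
        if child.tag.startswith("Start"):
            stack.append(child.tag[len("Start"):])
        elif child.tag.startswith("End"):
            stack.pop()
        elif child.tag == "Value":
            values["/".join(stack)] = child.text
    return values


print(rebuild_paths(root))
```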

2010-06-08

It's not entirely dissimilar to what DBUnit uses, or to what MySQL dumps if you ask it for "XML".

2010-06-08

test

2010-06-08

From page 58 of Growing Better Software: "From a maintenance perspective, using [...] fixed field length and fixed record length is asking for trouble, because field lengths keep changing, requiring us to change our [code] over and over again."

2010-06-08

yh:
test

failed

2010-06-08

ITT: people who don't get XML, can't use it, and wave their collective crucifixes (crucifii?) at it whenever they see it. Get with the now, luddites, XML is here to stay, you might as well get used to it

dkf · 2010-06-08

Dufus:
Schemas do actually specify the ordering of elements.

That depends on the schema. With:

    <xsd:sequence>...</xsd:sequence>

it is indeed in order, but with:

    <xsd:all>...</xsd:all>

you can use any order and it doesn't matter. (The DOM tree should, of course, reflect the order actually used in the document.) Mind you, some software is just plain broken. It's usually marked by the use of regular expressions for parsing XML…
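Python's `xml.etree.ElementTree` does not validate against XSD, so this sketch only shows the consequence for consumers: under `<xsd:all>` both documents below would be acceptable, a reader that takes children by position silently swaps the fields, and a reader that looks them up by name does not. Element names and functions are illustrative:

```python
import xml.etree.ElementTree as ET

# Same content, different child order: an <xsd:all> content model accepts both.
DOC_A = "<person><first>Joe</first><last>Bloggs</last></person>"
DOC_B = "<person><last>Bloggs</last><first>Joe</first></person>"


def read_by_position(xml_text):
    """Fragile: assumes the children always arrive as <first>, then <last>."""
    first, last = list(ET.fromstring(xml_text))
    return {"first": first.text, "last": last.text}


def read_by_name(xml_text):
    """Robust: looks each child up by its element name."""
    root = ET.fromstring(xml_text)
    return {"first": root.findtext("first"), "last": root.findtext("last")}


print(read_by_position(DOC_B))  # {'first': 'Bloggs', 'last': 'Joe'}
print(read_by_name(DOC_B))      # {'first': 'Joe', 'last': 'Bloggs'}
```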

2010-06-08

Obie Server:
ITT: people who don't get XML, can't use it, and wave their collective crucifixes (crucifii?) at it whenever they see it. Get with the now, luddites, XML is here to stay, you might as well get used to it

XML is a solution in search of a problem that hasn't already been solved, repeatedly. But yeah, I have come to terms with the fact that it isn't going anywhere anytime soon. That doesn't mean I have to like it.

2010-06-08

meh:
Jay:
Looks to me like that output was somebody's attempt to simulate CSV with XML. And by the way, wouldn't CSV output be much simpler in a case like this:
It would be a whole lot simpler to parse than XML, take less bandwidth, and eliminate whole categories of possible errors to check for (missing tags, mis-spelled tags, extra tags, improper nesting, etc etc).

CSV is not easier to parse than XML. Just try to open a CSV file with Excel and find that it expects semicolons instead of commas. And how do you escape a comma/semicolon within fields? Do you use single or double-quote characters? And how do you escape those? Say, would that be an ANSI file or CP1251?

With XML, all you have to do is use your friendly neighborhood XML parser and hey presto.

XML has its clusterfucks (such as XSD, which negates the whole 'extensible' idea and is context insensitive) and you can argue a number of technologies which would be better to use in different circumstances (ASN.1 FTW), but CSV is almost always the worst possible choice.

Excel is not a good thing to use to prove that something is bad, with the justification that it does not work well in Excel. Excel itself does not work well on its own.
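The delimiter and encoding questions raised above can at least be made explicit in code. A sketch, assuming the producer tells you the encoding (a CSV file carries no encoding declaration, unlike XML's `<?xml ... encoding="..."?>`) and that the delimiter is either `,` or `;`. `csv.Sniffer` is a heuristic and can guess wrong on small or irregular samples:

```python
import csv
import io


def read_delimited(raw, encoding):
    """Decode with an explicitly stated encoding, then guess ',' versus ';'.

    csv.Sniffer raises csv.Error if it cannot decide on a delimiter.
    """
    text = raw.decode(encoding)
    dialect = csv.Sniffer().sniff(text, delimiters=",;")
    return list(csv.reader(io.StringIO(text), dialect))


# A semicolon-separated export in Windows-1251 (cp1251), as in the question above.
raw = "name;city\nJoe;Winston-Salem\nИван;Москва\n".encode("cp1251")
rows = read_delimited(raw, encoding="cp1251")
print(rows)
```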

2010-06-08

To be fair to XML, it is designed as a generalized text markup (hence the M), not for serializing an object model, as it is almost always abused for.

To be mean to it myself, though, I'm not really sure how useful a generalized text markup language is, as there are not that many different types of mainly text documents. Web pages, obviously, language specs, and legal documents, but I'm struggling now.

2010-06-08

ObiWayneKenobi:
Enrique:
Man, you do not understand the concept behind that; it is easy to transform it to HTML.
See the following algorithm

remove first line change <result> to
and </result> to

change <fields> to and </fields> to

change <field> to and </field> to

change <data> to and </data> to

change <row> to and </row> to

change <value> to and </value> to

And that's it!!! and HTML returned output!!!

What an interesting perspective of why this might have been done this way.

Friends, I think we found the lead developer for the vendor!

Um, no. There's a very good reason that won't work, if you look closely at the data.

2010-06-08

Luis Espinal:
da chicken:
You shouldn't be doing XML with the latter method. It's not proper XML. Tag properties are for metadata, not data.
Hmmm, I'm gonna have to disagree with you on this one.

Just for the record, I agree with the original poster. Attributes are for metadata, values are for data. There's nothing more annoying than a HUGE XML element that's actually empty because the implementor decided to put all the actual data in a string of attributes. The main problem with this is that you end up with lots of data in a single XML element. This defeats many of the benefits of XML. Elements should be "single purpose", that is, they should hold a single bit of information, as defined by the element name, or they should wrap further elements in the case of hierarchical data. So creating an XML element with a whole bunch of attributes, each containing a different piece of data, is just wrong in my opinion. Obviously there is no rigidly defined way of using XML, but I would also say that putting all the data in attributes is very much against the spirit of XML and SGML before it. Read the spec and you'll get a very good feel for how the authors intended it to be used.
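Both styles parse equally easily; the sketch below (hypothetical element and attribute names, Python's `xml.etree.ElementTree`) reads the same record either way. One concrete structural difference does follow from the XML spec: an attribute name may appear only once per element, so repeated values such as several phone numbers need child elements.

```python
import xml.etree.ElementTree as ET

# One hypothetical record: data in attributes versus data in child elements.
AS_ATTRIBUTES = '<customer id="42" first="Joe" last="Bloggs"/>'
AS_ELEMENTS = '<customer id="42"><first>Joe</first><last>Bloggs</last></customer>'


def from_attributes(xml_text):
    root = ET.fromstring(xml_text)
    return {"id": root.get("id"), "first": root.get("first"), "last": root.get("last")}


def from_elements(xml_text):
    # Only the identifier (metadata) stays an attribute; the data lives in child elements.
    root = ET.fromstring(xml_text)
    return {"id": root.get("id"), "first": root.findtext("first"), "last": root.findtext("last")}


# Child elements can repeat; a duplicated attribute name is a well-formedness error.
phones = [p.text for p in ET.fromstring(
    "<customer><phone>555-0100</phone><phone>555-0199</phone></customer>").findall("phone")]
```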

2010-06-08

swedish tard:
XML is a solution in search of a problem that hasn't already been solved, repeatedly. But yeah, I have come to terms with the fact that it isn't going anywhere anytime soon. That doesn't mean I have to like it.

Can you explain to me what's wrong with having a solution to a problem that hasn't already been solved, repeatedly?

Putting aside your oh-so-Freudian slip, I put it to you that it solves several problems that have been solved already, but does so in an easy to communicate, easy to understand manner. Me, I quite like being able to glance at a snippet of XML and already have a pretty good idea what it means, without having to look it up. Most of the problems with XML that you can come up with are problems with how someone else has used it. I hear bank robbers are using "roads" to make their "getaways" on. Let's ban these evil "roads" forthwith, and solve the problem of bank robbery forever.

2010-06-08

Anonymous:
Luis Espinal:
da chicken:
You shouldn't be doing XML with the latter method. It's not proper XML. Tag properties are for metadata, not data.
Hmmm, I'm gonna have to disagree with you on this one.
Just for the record, I agree with the original poster. Attributes are for metadata, values are for data. There's nothing more annoying than a HUGE XML element that's actually empty because the implementor decided to put all the actual data in a string of attributes. The main problem with this is that you end up with lots of data in a single XML element. This defeats many of the benefits of XML. Elements should be "single purpose", that is, they should hold a single bit of information, as defined by the element name, or they should wrap further elements in the case of hierarchical data. So creating an XML element with a whole bunch of attributes, each containing a different piece of data, is just wrong in my opinion. Obviously there is no rigidly defined way of using XML, but I would also say that putting all the data in attributes is very much against the spirit of XML and SGML before it. Read the spec and you'll get a very good feel for how the authors intended it to be used.

Oh but FYI, I don't agree that "it's not proper XML" as the original poster said; there's no such thing as improper XML as long as it's well formed. I do think that it is an unintelligent and illogical way to use XML and very much against the spirit of what XML is trying to achieve and the benefits it provides.
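Well-formedness is the one property a parser checks for you. A sketch using `xml.etree.ElementTree`, where `is_well_formed` is a hypothetical helper: both the attribute-heavy and the element-based documents pass, while a missing end tag or a second root element fails.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts the text; says nothing about style or schema."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<customer first="Joe" last="Bloggs"/>'))   # True
print(is_well_formed("<customer><first>Joe</customer>"))         # False
```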

2010-06-08

Larry:
bannedfromcoding:
Mainframe mindset. *sigh*
Hey, I f***ed your mother on a mainframe, sonny, let's see you do that with your pee cee.

Rule #34 never fails http://tinyurl.com/23u6dgu (NSFW if your policy is really strict)

2010-06-08

Obie Server:
[...]I put it to you that it solves several problems that have been solved already, but does so in an easy to communicate, easy to understand manner.

"Easy to understand" usually stops as soon as you throw in namespaces. Also, there's a kind of verbosity inherent in XML that makes documents grow beyond a size you can easily grasp without tool support, and quite quickly. I think XML as a format might actually have benefited if it did NOT pretend you could easily process it as a text file.
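An illustration of what namespaces add when processing XML with Python's `xml.etree.ElementTree`: the prefix written in the document disappears after parsing, element names become `{uri}local` strings, and lookups need a reader-side prefix map. The namespace URI and element names are made up:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names.
DOC = """<c:customer xmlns:c="urn:example:customer">
  <c:name>Joe Bloggs</c:name>
</c:customer>"""

root = ET.fromstring(DOC)

# The document's prefix "c" is gone: names are reported as "{uri}local".
print(root.tag)  # {urn:example:customer}customer

# A plain local name no longer matches ...
unprefixed = root.findtext("name")  # None
# ... so the reader supplies its own prefix-to-URI map (any prefix will do).
NS = {"x": "urn:example:customer"}
name = root.findtext("x:name", namespaces=NS)
```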

Me, I quite like being able to glance at a snippet of XML and already have a pretty good idea what it means, without having to look it up.

Because you have some element names besides the actual content? Well, you get the exact same benefit from using JSON, whose notation is a lot more intuitive to (at least) everyone who used a scripting language in the last 10 years. It's not as powerful as XML, granted, but the point is that this is rarely needed.
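The same small record in both notations, read with Python's standard `xml.etree.ElementTree` and `json` modules (the record contents and function names are illustrative). Both carry the field names next to the content; the JSON version maps directly onto a dictionary:

```python
import json
import xml.etree.ElementTree as ET

XML_DOC = "<person><name>Joe Bloggs</name><city>Winston-Salem</city></person>"
JSON_DOC = '{"person": {"name": "Joe Bloggs", "city": "Winston-Salem"}}'


def xml_person(text):
    # Flat record: one child element per field.
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def json_person(text):
    return json.loads(text)["person"]
```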

Most of the problems with XML that you can come up with are problems with how someone else has used it. I hear bank robbers are using "roads" to make their "getaways" on. Let's ban these evil "roads" forthwith, and solve the problem of bank robbery forever.

A better analogy would be railroad crossings. It looks simple, but people keep using them 'wrong'. In many cases the solution is in fact a completely different approach (a bridge, here).

2010-06-08

There sure are a lot of XML haters in here today. For the record, I've seen just as much XML abuse as the rest of you guys. But in my mind that doesn't undermine the usefulness of XML per se.

I've heard the music of Britney Spears and it makes me want to stab myself in the face. But just because she's screwing it up so badly doesn't destroy my faith in music. There will always be a lot of folks who do a piss-poor job of it but the underlying concept is extremely good and has a lot of merit when done properly.

2010-06-08

TRWTF is that I don't see why the story is WTF-worthy.

So they had some "proprietary TCP protocols". What's wrong with that? The story doesn't say they were bad.

I don't really understand the leap from "web service" to method call, but be that as it may, I reckon that the complaint is that the parameters have to be encoded in a single string, rather than passed as separate parameters. OK, so write your own wrapper to do that encoding. It will still become a single string eventually, but that happens anyway: your HTTP request goes into TCP as a single string, too.

Then you get your response in fixed-width format. Is that bad? If it causes data to get corrupted or truncated, yes, that would be bad. But if that isn't the case, the format has its advantages. It's easy and quick to parse, for example.
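A sketch of the wrapper suggested above and of slice-based parsing, assuming a hypothetical layout of three left-aligned, space-padded fields (the vendor's real layout is not given in the article). The encoder refuses over-long values rather than truncating them, which addresses the corruption/truncation risk mentioned above:

```python
# Hypothetical request layout: (field name, width). Not the vendor's real format.
LAYOUT = [("last_name", 20), ("first_name", 15), ("zip", 5)]


def encode_request(**fields):
    """Wrapper: pack named parameters into one space-padded fixed-width string.

    Raises instead of silently truncating values that do not fit.
    """
    parts = []
    for name, width in LAYOUT:
        value = str(fields.get(name, ""))
        if len(value) > width:
            raise ValueError(f"{name} longer than {width} characters: {value!r}")
        parts.append(value.ljust(width))
    return "".join(parts)


def decode_fixed_width(line):
    """Parsing is just slicing at known offsets and trimming the padding."""
    record, pos = {}, 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record


request = encode_request(last_name="Bloggs", first_name="Joe", zip="12345")
```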

Also, if you for some reason don't want the fixed format, you can get XML, too. Sure, perhaps you would have preferred XML with meaningful element names, but what you get is pretty easy to transform to a key-value map. Be happy. Most XML formats I've seen are much worse, especially the ones where people went out of their way to make it semantically meaningful with schemas and namespaces and whatnot.
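A sketch of that key-value transformation. The response shape is an assumption pieced together from the element names quoted earlier in the thread (`<result>`, `<fields>`, `<field>`, `<data>`, `<row>`, `<value>`), and the sample values are invented:

```python
import xml.etree.ElementTree as ET

# Assumed shape of the vendor's response; sample values are made up.
RESPONSE = """<result>
  <fields><field>FIRST</field><field>LAST</field></fields>
  <data>
    <row><value>Joe</value><value>Bloggs</value></row>
  </data>
</result>"""


def rows_as_dicts(xml_text):
    """Pair the n-th <value> of each row with the n-th <field> name."""
    root = ET.fromstring(xml_text)
    names = [field.text for field in root.find("fields")]
    result = []
    for row in root.find("data"):
        values = [value.text for value in row]
        if len(values) != len(names):  # zip() alone would silently drop extras
            raise ValueError("row has a different number of values than fields")
        result.append(dict(zip(names, values)))
    return result
```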

Seriously, you're getting simple formats. Count yourself lucky.

I'm starting to think that perhaps Diego is TRWTF.

2010-06-08

A. Coward:
Also, if you for some reason don't want the fixed format, you can get XML, too. Sure, perhaps you would have preferred XML with meaningful element names, but what you get is pretty easy to transform to a key-value map.

YOU ARE THE REASON WHY PEOPLE HATE XML! It's morons like you that screw it up for the rest of us decent developers. If you're not going to use it properly then don't bloody use it. You're complaining that you don't see the WTF in today's article, well congratulations - YOU ARE THE REAL WTF.

2010-06-08

I replied to your comment, but fucking Akismet insists that every combination of it I used was spam, and I can't be bothered, but the net result was that I irrefutably and conclusively won. You buying this?

2010-06-08

Anonymous:
There sure are a lot of XML haters in here today. For the record, I've seen just as much XML abuse as the rest of you guys. But in my mind that doesn't undermine the usefulness of XML per se.
I've heard the music of Britney Spears and it makes me want to stab myself in the face. But just because she's screwing it up so badly doesn't destroy my faith in music. There will always be a lot of folks who do a piss-poor job of it but the underlying concept is extremely good and has a lot of merit when done properly.

Meh. Music is over-rated http://www.theonion.com/articles/pitchfork-gives-music-68,2278/

2010-06-08

The vendor must have mistaken the FMPXMLRESULT grammar of Apple's FileMaker Pro (I'm ashamed to admit I know this) for a "clever idea."

Except that Apple's implementation requires you to specify a bunch of values which it does nothing with (except throw an error if they're missing)...

ggeens · 2010-06-08

Dirge:
FWIW, I've done development in everything from assembly to Java and C# for almost twenty years, and I've never had the need (or the desire) to mount an expedition into the Indiana Jones-esque ancient ruins of fixed-width fields.

One time I worked at a bank, and they required all internal data to be in fixed-length formats. They had a lot of COBOL running and they decided everyone else had to suffer the same way.

2010-06-08

Anonymous:
YOU ARE THE REASON WHY PEOPLE HATE XML! It's morons like you that screw it up for the rest of us decent developers. If you're not going to use it properly then don't bloody use it. You're complaining that you don't see the WTF in today's article, well congratulations - YOU ARE THE REAL WTF.

Thank you for the rare honor. I've always wanted to be the Real WTF.

For the record, I do follow your suggestion: I avoid XML, but when I do use it, I make an effort to do it properly.

Usually that means I end up pointing out things that don't validate or errors in processing, both in our code and in what we get from parties we communicate with. This has helped resolve numerous interoperability issues.

It is from this experience that I am saying you should be happy if you're getting simple formats. My experience is that the simpler the format, the fewer interoperability issues will arise. The formats in the article are pretty simple. Some of the XML schemas I've worked with were behemoths, and implementors invariably got things wrong - schemas and meaningful element names notwithstanding.

Better yet, more often than not, we have been given erroneous schemas and/or WSDLs. Usually the problem is that people know what they want, but they don't know all the ins and outs of XML, so you end up with things like generators spitting out cruddy schemas, restrictions that are assumed by the system you're talking to but not expressed in the schema, or things that aren't allowed by the schema still being sent from the system talking to you. I'd honestly rather have the simple formats from the article.

davedavenotdavemaybedave · 2010-06-08

swedish tard:
meh:
Jay:
Looks to me like that output was somebody's attempt to simulate CSV with XML. And by the way, wouldn't CSV output be much simpler in a case like this:
It would be a whole lot simpler to parse than XML, take less bandwidth, and eliminate whole categories of possible errors to check for (missing tags, mis-spelled tags, extra tags, improper nesting, etc etc).

CSV is not easier to parse than XML. Just try to open a CSV file with Excel and find that it expects semicolons instead of commas. And how do you escape a comma/semicolon within fields? Do you use single or double-quote characters? And how do you escape those? Say, would that be an ANSI file or CP1251?

With XML, all you have to do is use your friendly neighborhood XML parser and hey presto.

XML has its clusterfucks (such as XSD, which negates the whole 'extensible' idea and is context insensitive) and you can argue a number of technologies which would be better to use in different circumstances (ASN.1 FTW), but CSV is almost always the worst possible choice.

Excel is not a good thing to use to prove that something is bad, with the justification that it does not work well in Excel. Excel itself does not work well on its own.

I never understand why otherwise computer-literate geeks fall over and turn all helpless like the users when confronted with MS software. Is it cultivated incompetence? If this was a unix system, you'd google for the problem you're having and find the answer, so why not with Excel? Are people just not aware that this is so far beneath the capabilities of Excel that it's trivial?

Of course Excel can split on any character you want. Being Excel, of course it's also incredibly obscure as to how you actually do so - rename your CSV to .txt and 'import' - but it's very good once you find the option.

2010-06-08

Do not give that developer SQL

2010-06-08

Anonymous:
Luis Espinal:
da chicken:
You shouldn't be doing XML with the latter method. It's not proper XML. Tag properties are for metadata, not data.
Hmmm, I'm gonna have to disagree with you on this one.
Just for the record, I agree with the original poster. Attributes are for metadata, values are for data. There's nothing more annoying than a HUGE XML element that's actually empty because the implementor decided to put all the actual data in a string of attributes. The main problem with this is that you end up with lots of data in a single XML element.

I can see this being a problem in some contexts, but not all. Are we talking about potentially large documents with nested structures? Then you are right.

Are we talking about very large numbers of homogeneous flat n-tuples (with n being relatively small) which are automatically marshalled and unmarshalled? Then I fail to see the problem. Whether a value goes in an element or in an attribute has almost no consequence at all unless you have a concrete context in which it does not work.

This defeats many of the benefits of XML. Elements should be "single purpose", that is, they should hold a single bit of information, as defined by the element name, or they should wrap further elements in the case of hierarchical data.

This is an excellent point... for the type of data this is applicable.

So creating an XML element with a whole bunch of attributes, each containing a different piece of data, is just wrong in my opinion. Obviously there is no rigidly defined way of using XML, but I would also say that putting all the data in attributes is very much against the spirit of XML and SGML before it.

SGML was never intended for data transfer (it was simply intended for implementing markup languages), and XML was purely intended for ease of parsing for electronic exchange. Now we use it for all types of data transfer, for storage, graphics representation, and even programmatic configuration of systems. Each of these will call for exceptions to the original "spirit" of things...

... you just have to hope the engineers/designers know what they are doing (and this is true of every other piece of reusable, extensible technology.)

There has never been a set of rules indicating what type of semantics to implement for a document written in XML. Their "spirit" can only hope to be a generalized one; otherwise it would constrain many legitimate uses of it.

You raise good points, and just as we can bring up an example that shows it is a bad idea to put all data in attributes, you can also find examples where doing so is legitimate. When it comes to designing systems, each decision will land somewhere along that spectrum.

Read the spec and you'll get a very good feel for how the authors intended it to be used.

I've read it, been working with XML in many ways for over a decade. I've seen good uses of it and bad. I respect the original spirit, but if a case elicits a deviation from it, then so be it (so long as it gets documented and the trade-offs of deviating are well-understood.) That's how we do engineering.

Just to be clear, since on the Internet it is easy for people to take things the wrong way: I'm disagreeing with you, but I'm enjoying the exchange.

2010-06-08

A. Coward:
Anonymous:
YOU ARE THE REASON WHY PEOPLE HATE XML! It's morons like you that screw it up for the rest of us decent developers. If you're not going to use it properly then don't bloody use it. You're complaining that you don't see the WTF in today's article, well congratulations - YOU ARE THE REAL WTF.

Thank you for the rare honor. I've always wanted to be the Real WTF.

For the record, I do follow your suggestion: I avoid XML, but when I do use it, I make an effort to do it properly.

In my experience, the people who decry XML, and avoid it wherever they can, seldom actually use it properly on the rare occasions they do use it. If they were using it properly, they wouldn't be so averse to it. As it stands, they've seen some abuses of it, blamed that on the tool itself, and avoid it. How one can profess to be proficient with something one admittedly doesn't use is a bit of a mystery. In certain respects, XML is like driving: everybody assumes it's everyone else who's Doing It Wrong.

2010-06-08

Another Coward:
In my experience, the people who decry XML, and avoid it wherever they can, seldom actually use it properly on the rare occasions they do use it. If they were using it properly, they wouldn't be so averse to it. As it stands, they've seen some abuses of it, blamed that on the tool itself, and avoid it. How one can profess to be proficient with something one admittedly doesn't use, is a bit of a mystery.

I absolutely agree. Hence why I said that when I use XML, I make an effort to do it properly. I don't claim to know the One Proper Way to use XML. All I can do is follow the specifications, point out errors with respect to the specifications, and point out things that I know from experience to often cause interoperability issues.

As for blaming the tool for the errors of its users, I have a somewhat different view on that. When I think about XML, I am thinking not only about the language proper, but also about the culture that surrounds it. From where I sit, it seems there is a lot of propaganda and dogma for using XML and using it in certain ways, and that is the stuff of which WTFs are made: applying the same solution to every problem in the same way, without considering if there is a better solution or a better approach.

Certainly, applying XML when it isn't the best solution isn't the fault of the technology, and I'm not blaming the tool for being applied in cases where another tool would have been better. However, the reason that XML isn't always the best tool has everything to do with the technology, and, in my consideration, XML has never been the best tool for any job I have taken on (except, of course, interfacing with a system that uses XML). The reason I avoid XML, then, is not that I don't like it or don't understand it, but rather that I think another format would be better for the task at hand.

2010-06-08

davedavenotdavemaybedave:
swedish tard:
meh:
Jay:
Looks to me like that output was somebody's attempt to simulate CSV with XML. And by the way, wouldn't CSV output be much simpler in a case like this:
It would be a whole lot simpler to parse than XML, take less bandwidth, and eliminate whole categories of possible errors to check for (missing tags, mis-spelled tags, extra tags, improper nesting, etc etc).

CSV is not easier to parse than XML. Just try to open a CSV file with Excel and find that it expects semicolons instead of commas. And how do you escape a comma/semicolon within fields? Do you use single or double-quote characters? And how do you escape those? Say, would that be an ANSI file or CP1251?

With XML, all you have to do is use your friendly neighborhood XML parser and hey presto.

XML has its clusterfucks (such as XSD, which negates the whole 'extensible' idea and is context insensitive) and you can argue a number of technologies which would be better to use in different circumstances (ASN.1 FTW), but CSV is almost always the worst possible choice.

Excel is not a good thing to use to prove that something is bad, with the justification that it does not work well in Excel. Excel itself does not work well on its own.

I never understand why otherwise computer-literate geeks fall over and turn all helpless like the users when confronted with MS software. Is it cultivated incompetence? If this was a unix system, you'd google for the problem you're having and find the answer, so why not with Excel? Are people just not aware that this is so far beneath the capabilities of Excel that it's trivial?

Of course Excel can split on any character you want. Being Excel, of course it's also incredibly obscure as to how you actually do so - rename your CSV to .txt and 'import' - but it's very good once you find the option.

I think a lot of otherwise capable geeks simply fall into an e-rage/mental block when it comes to a tool (any tool) that comes from M$. They are very apt at whipping up an awk script for it, but when it comes to clicking a checkbox for delimiters (which Excel shows first thing when it opens a CSV file), they freeze, like a dumb shark turned belly up.

Even hacking up a VBA script that makes Excel load a CSV programmatically, with commas and nothing else selected as delimiters, is as trivial as using a DOM parser.

I don't understand either why people keep saying Excel "doesn't work on its own". What does that mean anyways?

Boggles the mind:)

2010-06-08

meh:
Jay:
Looks to me like that output was somebody's attempt to simulate CSV with XML. And by the way, wouldn't CSV output be much simpler in a case like this:
It would be a whole lot simpler to parse than XML, take less bandwidth, and eliminate whole categories of possible errors to check for (missing tags, mis-spelled tags, extra tags, improper nesting, etc etc).

CSV is not easier to parse than XML. Just try to open a CSV file with Excel and find that it expects semicolons instead of commas. And how do you escape a comma/semicolon within fields? Do you use single or double-quote characters? And how do you escape those? Say, would that be an ANSI file or CP1251?

With XML, all you have to do is use your friendly neighborhood XML parser and hey presto.

XML has its clusterfucks (such as XSD, which negates the whole 'extensible' idea and is context insensitive) and you can argue a number of technologies which would be better to use in different circumstances (ASN.1 FTW), but CSV is almost always the worst possible choice.

Hmm, CSV stands for "Comma Separated Values". If someone creates a file format where they separate values with semi-colons, it is not "CSV" but, I guess "SCSV" (Semi-Colon Separated Values). By the same reasoning, I could say that XML is unreliable because someone might decide to use square brackets instead of angle brackets and still call it XML, and then your standard tools couldn't successfully parse it.

In standard CSV, if a field contains a comma, you enclose the entire field in quotes. If a field contains quotes, you enclose the field in quotes and double any embedded quotes. This is simple and unambiguous. Again, the same problem arises with XML: What if someone wants to include a quote inside a tag parameter? What if someone wants a left angle bracket in text? They must be escaped. Every data storage format has to deal with the issue that there must be a way to escape special characters. In CSV, this can be handled with two simple rules, which I just completely explained above. It's more complex in XML.
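The two rules are indeed short enough to write by hand. A sketch (`csv_field` is a hypothetical helper); note that RFC 4180 also requires quoting a field that contains a line break, which the sketch treats the same way as a comma. For comparison, the XML side of the escaping problem is handled by `xml.sax.saxutils.escape` from Python's standard library:

```python
from xml.sax.saxutils import escape


def csv_field(value):
    """Apply the two CSV rules to a single field.

    A field containing a comma or a quote (or, per RFC 4180, a line break)
    is wrapped in quotes, and each embedded quote is doubled.
    """
    if any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


print(",".join(csv_field(v) for v in ["Joe Bloggs", "Winston, Salem", 'Joe "Big dog" McCoy']))
# The XML counterpart: '&', '<' and '>' become entity references.
print(escape("Profit & Loss < 2010"))
```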

There are off-the-shelf CSV parsers just as there are off-the-shelf XML parsers. They are less well-known and less widely used because they have only marginal value. CSV is so easy to parse that an off-the-shelf parser is barely necessary. To say that this is an argument against CSV is a little strange. It's bad because it's so easy to use that there's no big market for tools to help use it?
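To illustrate how small such a parser can be, here is a sketch of a reader for a single record that applies the two quoting rules described above. It is a simplification: it does not handle quoted fields that span lines, which a full RFC 4180 reader such as Python's `csv` module does.

```python
def parse_csv_line(line):
    """Split one CSV record (no embedded line breaks) into fields."""
    fields, field, in_quotes, i = [], [], False, 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')  # doubled quote stands for one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas inside quotes are ordinary text
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            fields.append("".join(field))
            field = []
        else:
            field.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field")
    fields.append("".join(field))
    return fields


print(parse_csv_line('"Joe ""Big dog"" McCoy","Winston, Salem",42'))
```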

I'm not saying that XML is useless. I'm just saying that XML is complex and way over-used. There are far simpler formats that make a lot more sense for simple data requirements.

2010-06-08

Dazed:
Jay:
I fail to understand the fascination with XML. Yes, there are cases where it's useful, but it seems to me that they are a very tiny number of cases.
To cut a longish story very short, XML is useful whenever the data you are transferring is hierarchically structured rather than a simple table. Which is hardly a tiny number of cases.
Though even so I will agree that it is somewhat over-used.

I'd say XML is good for cases where data can come in essentially unpredictable order, especially if there can be complex nesting. Like a word-processing document, where at any time the user could insert a heading or switch to italics, or could have italics within a footnote or a footnote within italics. Etc. XML is great for cases like that.

But how often do you deal with such cases? For me, almost never. I've been familiar with XML and related formats (HTML, SGML) for, let's see, at least 15 years. In that time I've had exactly one case where it was up to me what format to use for data storage, and I chose XML. That was an application that required a mildly complex configuration file with hierarchical data.

Most of the time, my data requirements fall into two types:

(a) A set of values each of which will occur exactly once. In this case, my favorite format is a Java properties file, where each line says "name=value" and parsing simply means looking for the equal sign, with a little extra complexity if names can include equals signs or values can span multiple lines.

(b) A set of records with one or more fields each. In this case, I prefer CSV.
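For case (a), a minimal Python sketch of the "look for the equal sign" approach. `parse_simple_properties` is a hypothetical helper, and it only covers a simplified subset of the real format: `java.util.Properties` also accepts `:` or whitespace as separators, backslash escapes, and backslash line continuations, which is the extra complexity mentioned above.

```python
def parse_simple_properties(text: str) -> dict[str, str]:
    """Parse 'name=value' lines; a simplified subset of Java .properties syntax."""
    props: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        # Split at the FIRST '=' so values may contain '=' (names may not).
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: no '=' found in {raw!r}")
        props[name.strip()] = value.strip()
    return props

config = """\
# database settings
db.host=localhost
db.url=jdbc:example://localhost/app?ssl=true
"""
print(parse_simple_properties(config))
# {'db.host': 'localhost', 'db.url': 'jdbc:example://localhost/app?ssl=true'}
```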

(Well, I'm not considering cases where you'd use a database here. I'd consider that a slightly different category.)

Maybe your experience is different. Maybe you routinely create applications that have complex, hierarchical data. Personally, I don't. I certainly don't have any objection to using XML in cases with such complex data. My objection is to using it in cases where the data is simpler, and XML just adds more complexity and potential for errors.

For example, say I have a set of records all of which have the same fields. Sure, I could put that in XML like:

<customers>
<customer>
<name>Fred Smith</name>
<city>New York</city>
<state>NY</state>
<balance>147.23</balance>
</customer>
...etc...

Compare this to the CSV representation. With the XML, I have to deal with whole categories of potential problems that don't exist in CSV. What if there is no "<name>" tag? What if there are two "<name>" tags? What if there's a tag I don't recognize? What if the "<customers>" tag is missing? What if the city tag is nested within the name tag, what does that mean? Etc etc. These questions don't even come up with a simpler format like CSV. Sure, there could be too many fields on a line, or too few. Those are the only problems that are even possible.
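A sketch of the CSV side of that comparison, using Python's standard `csv` module. The four column names are assumed to mirror the `<customer>` example, and `read_customers_csv` is a hypothetical helper; the only structural check left to write is the field count.

```python
import csv
import io

# Assumed column order, mirroring the <customer> example above.
COLUMNS = ["name", "city", "state", "balance"]

def read_customers_csv(text: str) -> list[dict[str, str]]:
    reader = csv.reader(io.StringIO(text))
    customers = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # The only structural problem CSV allows: too many or too few fields.
        if len(row) != len(COLUMNS):
            raise ValueError(
                f"line {reader.line_num}: expected {len(COLUMNS)} fields, got {len(row)}"
            )
        customers.append(dict(zip(COLUMNS, row)))
    return customers

data = 'Fred Smith,New York,NY,147.23\n"Smith, Jane",Winston-Salem,NC,0.00\n'
for customer in read_customers_csv(data):
    print(customer["name"], customer["balance"])
```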

Sure, an off-the-shelf parser can handle deciphering all the complexity for me; I don't need to rewrite that. But I still need to analyze the results. I have to check for missing tags, unexpected nesting, and many other categories of problems.
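For the XML side, here is a comparable sketch using Python's `xml.etree.ElementTree`, with the same four assumed field names and a hypothetical `read_customers_xml` helper. The library takes care of well-formedness, but every check in the loop (wrong root, unrecognized tag, duplicate tag, unexpected nesting, missing tag) is still application code.

```python
import xml.etree.ElementTree as ET

FIELDS = ("name", "city", "state", "balance")  # assumed field set

def read_customers_xml(text: str) -> list[dict[str, str]]:
    root = ET.fromstring(text)  # the parser only guarantees well-formedness
    if root.tag != "customers":
        raise ValueError(f"expected <customers> root, got <{root.tag}>")
    customers = []
    for cust in root:
        if cust.tag != "customer":
            raise ValueError(f"unexpected element <{cust.tag}>")
        record: dict[str, str] = {}
        for child in cust:
            if child.tag not in FIELDS:
                raise ValueError(f"unrecognized tag <{child.tag}>")
            if child.tag in record:
                raise ValueError(f"duplicate <{child.tag}>")
            if len(child):  # child elements nested inside a field
                raise ValueError(f"unexpected nesting inside <{child.tag}>")
            record[child.tag] = (child.text or "").strip()
        missing = [f for f in FIELDS if f not in record]
        if missing:
            raise ValueError(f"missing tags: {missing}")
        customers.append(record)
    return customers

doc = """<customers>
  <customer>
    <name>Fred Smith</name>
    <city>New York</city>
    <state>NY</state>
    <balance>147.23</balance>
  </customer>
</customers>"""
print(read_customers_xml(doc))
```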

In cases where the data is complex, this is just how it is. You have no choice. But why add complexity when you don't need it?

2010-06-08

iagorubio:
The problem with CSV is: data is dirty.
Repeat it to yourself each time you develop a system.

The first time a clerk fills in "Winston, Salem" instead of "Winston-Salem" in your system, it will fail.

And trust me, it will happen more than once ... a day.

And of course, XML-based applications never have to deal with data that includes brackets or quotes, so there's no issue there.

CSV provides a method for escaping commas: Enclose the fields in quotes. So no, the system will not fail. I do this more than once ... per day.

Wow, did I suddenly become the "CSV warrior" here? I didn't realize this was such a controversial topic!

Satanicpuppy · 2010-06-08

Sylver:
Ouch!
Still I would prefer dealing with the "XML" than with the..., hum what can we call this thing anyway?

"Fixed Width Data." There are a number of systems in the world that still store their data that way, mostly old crusty COBOL.

The example doesn't convey the true horror. Let me give an example: ASDGAEVDVCIOSDF723N812NSDASD912NDASC812DCA82EDQAXNX8N12EDA82

Now, position one sets the security group; positions 2-8 are filler, but only if position one isn't numeric. Positions 32-44 are... etc. Now the only thing that could make it more fun is some binary-coded decimals, yum.
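To show why this gets ugly, here is a Python sketch that slices the sample record by 1-based, inclusive positions. Only the rules stated above are encoded; `parse_record` and the key `field_32_44` are hypothetical names, since the comment doesn't say what positions 32-44 hold or what positions 2-8 mean when position one is numeric.

```python
def slice_field(record: str, start: int, end: int) -> str:
    """Return the characters at 1-based, inclusive positions start..end."""
    return record[start - 1:end]

def parse_record(record: str) -> dict[str, str]:
    if len(record) < 44:
        raise ValueError(f"record too short: {len(record)} characters")
    fields = {"security_group": slice_field(record, 1, 1)}
    # The layout depends on the data itself: positions 2-8 are filler only
    # when position 1 is not numeric (their meaning otherwise isn't stated).
    if not fields["security_group"].isdigit():
        fields["filler"] = slice_field(record, 2, 8)
    # Hypothetical name: what positions 32-44 hold is elided in the comment.
    fields["field_32_44"] = slice_field(record, 32, 44)
    return fields

sample = "ASDGAEVDVCIOSDF723N812NSDASD912NDASC812DCA82EDQAXNX8N12EDA82"
print(parse_record(sample))
# {'security_group': 'A', 'filler': 'SDGAEVD', 'field_32_44': 'NDASC812DCA82'}
```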

2010-06-08

You think that was bad? I have received hand-written, invalid XML saved as a Word document (yes, that's right, someone typed in XML using Word).

2010-06-08

Obie Server:
Larry:
bannedfromcoding:
Mainframe mindset. *sigh*
Hey, I f***ed your mother on a mainframe, sonny, let's see you do that with your pee cee.

Rule #34 never fails http://tinyurl.com/23u6dgu (NSFW if your policy is really strict)

Oh mom, I'm so ashamed....

2010-06-08

Jay:

Compare this to the CSV representation. With the XML, I have to deal with whole categories of potential problems that don't exist in CSV. What if there is no "<name>" tag? What if there are two "<name>" tags? What if there's a tag I don't recognize? What if the "<customers>" tag is missing? What if the city tag is nested within the name tag, what does that mean? Etc etc.

These questions don't even come up with a simpler format like CSV. Sure, there could be too many fields on a line, or too few.

Yeah, with CSV at this point you will be totally fucked up

2010-06-08

The <field>/<value> response is a standard mainframe pattern; I've seen it several times. It comes from a Micro Focus/IBM product for converting COBOL data definitions to XML.

2010-06-08

beldred:
I'm not an expert, but I believe the xml return should have been more like this:
  <name>MARLENE</name>
  <last name>RUTH</last name>
  <mother maiden name>DE MARCO</mother maiden name>
  <birthdate>1973-02-24 00:00:00</birthdate>
And not with the descriptors of each field as their own values.

No. No, you're not an expert. You forgot to include values for the attributes name and maiden.
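For the record, an XML parser agrees with the reply: `<last name>` reads as an element `last` carrying an attribute `name` with no value, so the document is not well-formed. A quick check with Python's `xml.etree.ElementTree`; the underscore element names in `good` are just one assumed way to fix it, and `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# beldred's suggestion, with spaces in the element names.
bad = "<person><last name>RUTH</last name></person>"
# One assumed fix: element names without spaces.
good = (
    "<person><name>MARLENE</name><last_name>RUTH</last_name>"
    "<mother_maiden_name>DE MARCO</mother_maiden_name>"
    "<birthdate>1973-02-24 00:00:00</birthdate></person>"
)

def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(bad), is_well_formed(good))  # False True
```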

2010-06-09

I sincerely hope, for the sake of the people who would otherwise have been involved, that this story is fictional...

2010-06-09

The real WTF is that Diego didn't just write a wrapper around both the function calls and the return value and get on with his life.
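What such a wrapper might look like for the return value, sketched in Python. The actual response isn't shown here, so the shape assumed below (a flat list of alternating `<field>` and `<value>` elements under one root, with the field descriptor as element text) is a guess; `fields_to_dict` and the descriptor strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def fields_to_dict(xml_text: str) -> dict[str, str]:
    """Pair each <field> descriptor with the <value> that follows it."""
    root = ET.fromstring(xml_text)
    result: dict[str, str] = {}
    pending = None  # descriptor still waiting for its <value>
    for elem in root:
        if elem.tag == "field":
            if pending is not None:
                raise ValueError(f"<field> {pending!r} has no <value>")
            pending = (elem.text or "").strip()
        elif elem.tag == "value":
            if pending is None:
                raise ValueError("<value> without a preceding <field>")
            result[pending] = (elem.text or "").strip()
            pending = None
        # any other element is ignored in this sketch
    if pending is not None:
        raise ValueError(f"<field> {pending!r} has no <value>")
    return result

# Hypothetical response; the descriptor strings are invented for illustration.
response = """<response>
  <field>NAME</field><value>MARLENE</value>
  <field>LAST NAME</field><value>RUTH</value>
  <field>BIRTHDATE</field><value>1973-02-24 00:00:00</value>
</response>"""
person = fields_to_dict(response)
print(person["LAST NAME"])  # RUTH
```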

2010-06-09

Jay:
Hmm, CSV stands for "Comma Separated Values". If someone creates a file format where they separate values with semi-colons, it is not "CSV" but, I guess "SCSV" (Semi-Colon Separated Values).

Actually, it would be a DSV, "delimiter-separated values". Any format that encodes data as a delimited list of values is a DSV. Naturally, this means that a CSV is a DSV in which the delimiter is a comma.
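In practice one parser covers the whole DSV family by making the delimiter a parameter. Python's `csv.reader` takes a `delimiter` argument, and the quoting rules work the same with any delimiter; `read_dsv` is just an illustrative wrapper.

```python
import csv
import io

def read_dsv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Parse delimiter-separated values; quoting rules are the same as for CSV."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

print(read_dsv('name;city\n"Smith; Jane";Winston-Salem\n', delimiter=";"))
# [['name', 'city'], ['Smith; Jane', 'Winston-Salem']]
print(read_dsv("a\tb\n1\t2\n", delimiter="\t"))
# [['a', 'b'], ['1', '2']]
```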

2010-06-09

FIA:
Heh, I can do better, I've recently had to deal with:
<content> .... Base 64 encoded XML file... </content>

Annoyingly, both the containing XML and the XML it contains are well defined and well thought out.

I wonder if we work at the same place :)

<DataObject>Base 64 encoded and compressed xml </DataObject> oh yeah, I went there
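Unwrapping that kind of nested payload takes only a few standard-library calls. This Python sketch assumes the outer element's text is the Base64 payload and that "compressed" means gzip; the actual compression scheme isn't stated, so swap in `zlib.decompress` or whatever the sender really used. `unwrap_embedded_xml` is a hypothetical helper.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

def unwrap_embedded_xml(outer_xml: str, compressed: bool = False) -> ET.Element:
    """Decode XML that was Base64-encoded (and optionally gzipped) inside another element."""
    outer = ET.fromstring(outer_xml)
    raw = base64.b64decode(outer.text or "")  # non-alphabet chars such as newlines are discarded
    if compressed:
        raw = gzip.decompress(raw)  # assumption: gzip; the real payload may use another scheme
    return ET.fromstring(raw)  # ElementTree accepts bytes directly

# Build a sample wrapper the way the sender presumably did.
inner = b"<person><name>Joe Bloggs</name></person>"
wrapped = "<DataObject>" + base64.b64encode(gzip.compress(inner)).decode("ascii") + "</DataObject>"
print(unwrap_embedded_xml(wrapped, compressed=True).find("name").text)  # Joe Bloggs
```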

2010-06-10

bl@h:
The WTF is that Diego did not ask Dora to get her handy dandy backpack and map to find directions to the vendors location and beat the living snot out of them.

M An... You are now my new daily internet hero! ;D

Well-Formed XML
