Admin
No, that's what JSON and YAML are for.
The number of XML documents which couldn't also have been encoded in JSON (or YAML if you want some pretty-printing) is most likely somewhere between tiny and non-existent.
Admin
The problem with CSV is: data is dirty.
Repeat it to yourself each time you develop a system.
The first time a clerk types "Winston, Salem" instead of "Winston-Salem" into your system, it will fail.
And trust me, it will happen more than once ... a day.
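Whether "Winston, Salem" breaks anything depends on how the file is read. Here is a minimal Python sketch, assuming the value was written with standard CSV quoting. A naive `split(",")` produces the wrong number of fields, while the standard library's `csv` module does not. (If the writer never quoted the field in the first place, no reader can recover it.)

```python
import csv
import io

# A CSV line whose first field legitimately contains a comma,
# quoted according to the usual CSV convention.
line = '"Winston, Salem",NC\n'

# Naive approach: split on every comma and ignore quotes.
naive = line.rstrip("\n").split(",")

# Quote-aware approach: the standard library's csv module.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"Winston', ' Salem"', 'NC']
print(parsed)  # ['Winston, Salem', 'NC']
```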
Admin
Why does no one know the quoting rules for CSV?
<person><name>Joe Bloggs</name>...</person>
<person><name>Winston, Salem</name>...</person>
<person><name>Joe "Big dog" McCoy</name>...</person>
=>
name,...
Joe Bloggs,...
"Winston, Salem",...
"Joe ""Big dog"" McCoy",...
The only real problem with CSV is that lots of editors for it don't like it when the text is Unicode-encoded (Excel will open UTF-16LE, but not UTF-8, and will not save without asking to convert to ISO-8859-1).
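The quoting rules above are what Python's `csv` module applies by default, so one quick way to check them is to write the three names and look at the output. This sketch covers only the name column and leaves out the `...` fields.

```python
import csv
import io

# The three names from the example above.
rows = [
    ["Joe Bloggs"],
    ["Winston, Salem"],
    ['Joe "Big dog" McCoy'],
]

buf = io.StringIO()
# The default QUOTE_MINIMAL quotes only fields that contain the delimiter,
# a quote, or a line break; embedded quotes are doubled (doublequote=True).
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["name"])
writer.writerows(rows)
output = buf.getvalue()

print(output)
# name
# Joe Bloggs
# "Winston, Salem"
# "Joe ""Big dog"" McCoy"
```

Reading `output` back with `csv.reader` gives the original rows, which is the whole point of the two rules.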
Admin
Well, it's that the XML response is nothing but delimited data dressed up as XML. Better than fixed-width, but no better than a CSV file.
Good XML actually tells you what the data means, using a meaningful name for each element instead of a sequence of "<field>" elements where you have to remember which order they're supposed to be in.
Admin
Could that be clieop3? That's a format still used by all Dutch banks, e.g. for dancing schools whose members agree to have their subscription fee transferred automatically. Clieop3 request files are built from fixed-width lines. The service breaks without giving a reason when you offer it \r\n, one whitespace too many, etc.
Maybe XML is not so bad after all.
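The failure mode described here, rejecting a file without saying why, is easy to avoid on the sending side by validating before upload. Here is a minimal Python sketch, assuming LF-only line endings and a fixed record width. The width of 50 is a placeholder for illustration and is not taken from the clieop3 specification.

```python
# Placeholder record width (assumption); use the value from the real spec.
RECORD_WIDTH = 50


def validate_records(data: bytes, width: int = RECORD_WIDTH) -> list[str]:
    """Split a fixed-width file into records and check each one strictly.

    Raises ValueError describing the first problem found, instead of
    silently rejecting the file.
    """
    if b"\r" in data:
        raise ValueError("file contains CR characters; expected bare LF line endings")
    text = data.decode("ascii")  # non-ASCII bytes raise UnicodeDecodeError (a ValueError)
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # allow a single trailing newline
    for number, line in enumerate(lines, start=1):
        if len(line) != width:
            raise ValueError(f"line {number}: expected {width} characters, got {len(line)}")
    return lines


good = b"A" * 50 + b"\n" + b"B" * 50 + b"\n"
print(len(validate_records(good)))  # 2
```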
Admin
Seen worse. The names have been changed to protect the innocent, but:
<Customer> <StartDetails/> <StartName/> <StartFirst/> <Value>Bob</Value> <EndFirst/> <StartLast/> <Value>Smith</Value> <EndLast/> <EndName/> <!-- and so it went on --> <EndDetails> </Customer>
Admin
It's not entirely dissimilar to what DBUnit uses, or to what MySQL dumps if you ask it for "XML".
Admin
test
Admin
From page 58 of Growing Better Software: "From a maintenance perspective, using [...] fixed field length and fixed record length is asking for trouble, because field lengths keep changing, requiring us to change our [code] over and over again."
Admin
failed
Admin
ITT: people who don't get XML, can't use it, and wave their collective crucifixes (crucifii?) at it whenever they see it. Get with the now, luddites: XML is here to stay, so you might as well get used to it.
Admin
XML is a solution in search of a problem that hasn't already been solved, repeatedly. But yeah, I have come to terms with the fact that it isn't going anywhere anytime soon. That doesn't mean I have to like it.
Admin
Excel is not a good thing to use to prove that something is bad, with the justification that it does not work well in Excel. Excel itself does not work well on its own.
Admin
To be fair to XML, it is designed as a generalized text markup (hence the M), not for serializing an object model, as it is almost always abused for.
To be mean to it myself, though, I'm not really sure how useful a generalized text markup language is, as there are not that many different types of mainly text documents. Web pages, obviously, language specs, and legal documents, but I'm struggling now.
Admin
Um, no. There's a very good reason that won't work, if you look closely at the data
Admin
Can you explain to me what's wrong with having a solution to a problem that hasn't already been solved, repeatedly?
Putting aside your oh-so-Freudian slip, I put it to you that it solves several problems that have been solved already, but does so in an easy-to-communicate, easy-to-understand manner. Me, I quite like being able to glance at a snippet of XML and already have a pretty good idea what it means, without having to look it up. Most of the problems with XML you can come up with are problems with how someone else has used it. I hear bank robbers are using "roads" to make their "getaways" on. Let's ban these evil "roads" forthwith, and solve the problem of bank robbery forever.
Admin
Rule #34 never fails http://tinyurl.com/23u6dgu (NSFW if your policy is really strict)
Admin
There sure are a lot of XML haters in here today. For the record, I've seen just as much XML abuse as the rest of you guys. But in my mind that doesn't undermine the usefulness of XML per se.
I've heard the music of Britney Spears and it makes me want to stab myself in the face. But just because she's screwing it up so badly doesn't destroy my faith in music. There will always be a lot of folks who do a piss-poor job of it, but the underlying concept is extremely good and has a lot of merit when done properly.
Admin
TRWTF is that I don't see why the story is WTF-worthy.
So they had some "proprietary TCP protocols". What's wrong with that? The story doesn't say they were bad.
I don't really understand the leap from "web service" to method call, but be that as it may, I reckon that the complaint is that the parameters have to be encoded in a single string, rather than passed as separate parameters. OK, so write your own wrapper to do that encoding. It will still become a single string eventually, but that happens anyway: your HTTP request goes into TCP as a single string, too.
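A minimal sketch of such a wrapper in Python. The story does not describe how the parameters are packed, so the fixed-width layout and the field names (`account`, `surname`, `city`) are hypothetical, and `send` stands in for whatever actually talks to the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    width: int


# Hypothetical request layout; the story does not give the real one.
REQUEST_LAYOUT = (Field("account", 10), Field("surname", 20), Field("city", 15))


def encode_request(params: dict, layout=REQUEST_LAYOUT) -> str:
    """Pack named parameters into one fixed-width string (left-justified, space-padded).

    Raises ValueError instead of silently truncating an over-long value.
    """
    unknown = set(params) - {f.name for f in layout}
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    parts = []
    for field in layout:
        value = str(params.get(field.name, ""))
        if len(value) > field.width:
            raise ValueError(f"{field.name!r} is {len(value)} chars, max {field.width}")
        parts.append(value.ljust(field.width))
    return "".join(parts)


def call_service(send, **params) -> str:
    """The wrapper callers use; `send` is a hypothetical stand-in for the real transport."""
    return send(encode_request(params))


request = encode_request({"account": "12345", "surname": "Smith", "city": "Winston-Salem"})
print(len(request))  # 45
```

Callers then write `call_service(send, account="12345", surname="Smith")` and never see the single string.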
Then you get your response in fixed-width format. Is that bad? If it causes data to get corrupted or truncated, yes, that would be bad. But if that isn't the case, the format has its advantages. It's easy and quick to parse, for example.
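Parsing a fixed-width record really is just slicing. Here is a Python sketch with a hypothetical layout (the field names and offsets are invented). The length check is there because a record of the wrong length is the usual sign of truncation or a changed layout. Keeping the layout in one table also means that a changed field length, the maintenance problem quoted earlier in the thread, touches only one place.

```python
# Hypothetical response layout: (field name, start, end), 0-based, end exclusive.
RESPONSE_LAYOUT = [
    ("status", 0, 2),
    ("surname", 2, 22),
    ("balance", 22, 32),
]


def parse_fixed_width(line: str, layout=RESPONSE_LAYOUT) -> dict[str, str]:
    """Slice one fixed-width record into a dict, trimming the space padding."""
    expected = max(end for _, _, end in layout)
    if len(line) != expected:
        # A wrong length usually means the layout changed or data was truncated.
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    return {name: line[start:end].strip() for name, start, end in layout}


record = "OK" + "Smith".ljust(20) + "0000012345"
print(parse_fixed_width(record))
# {'status': 'OK', 'surname': 'Smith', 'balance': '0000012345'}
```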
Also, if you for some reason don't want the fixed format, you can get XML, too. Sure, perhaps you would have preferred XML with meaningful element names, but what you get is pretty easy to transform to a key-value map. Be happy. Most XML formats I've seen are much worse, especially the ones where people went out of their way to make it semantically meaningful with schemas and namespaces and whatnot.
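Here is a sketch of that transformation using Python's `xml.etree.ElementTree`. It assumes the XML variant is a list of positional `<field><value>...</value></field>` elements; both that exact shape and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shape: ordered <field> elements whose meaning
# depends only on their position.
RESPONSE = """<response>
  <field><value>OK</value></field>
  <field><value>Smith</value></field>
  <field><value>0000012345</value></field>
</response>"""

# The field order has to come from the vendor's documentation; these names are made up.
FIELD_NAMES = ["status", "surname", "balance"]


def fields_to_map(xml_text: str, names: list[str] = FIELD_NAMES) -> dict[str, str]:
    """Pair each positional <field>'s <value> text with its documented name."""
    fields = ET.fromstring(xml_text).findall("field")
    values = [f.findtext("value", default="") for f in fields]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return dict(zip(names, values))


print(fields_to_map(RESPONSE))
# {'status': 'OK', 'surname': 'Smith', 'balance': '0000012345'}
```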
Seriously, you're getting simple formats. Count yourself lucky.
I'm starting to think that perhaps Diego is TRWTF.
Admin
I replied to your comment, but fucking Akismet insists that every combination of it I used was spam, and I can't be bothered, but the net result was that I irrefutably and conclusively won. You buying this?
Admin
Meh. Music is over-rated http://www.theonion.com/articles/pitchfork-gives-music-68,2278/
Admin
The vendor must have mistaken the FMPXMLRESULT grammar of Apple's FileMaker Pro (I'm ashamed to admit I know this) for a "clever idea."
Except that Apple's implementation requires you to specify a bunch of values which it does nothing with (except throw an error if they're missing)...
Admin
One time I worked at a bank, and they required all internal data to be fixed-length formats. They had a lot of COBOL running and they decided everyone else had to suffer the same way.
Admin
[quote user=Anonymous] YOU ARE THE REASON WHY PEOPLE HATE XML! It's morons like you that screw it up for the rest of us decent developers. If you're not going to use it properly then don't bloody use it. You're complaining that you don't see the WTF in today's article, well congratulations - YOU ARE THE REAL WTF. [/quote]
Thank you for the rare honor. I've always wanted to be the Real WTF.
For the record, I do follow your suggestion: I avoid XML, but when I do use it, I make an effort to do it properly.
Usually that means I end up pointing out things that don't validate or errors in processing, both in our code and in what we get from parties we communicate with. This has helped resolve numerous interoperability issues.
It is from this experience that I am saying you should be happy if you're getting simple formats. My experience is that the simpler the format, the fewer interoperability issues will arise. The formats in the article are pretty simple. Some of the XML schemas I've worked with were behemoths, and implementors invariably got things wrong - schemas and meaningful element names notwithstanding.
Better yet, more often than not, we have been given erroneous schemas and/or WSDLs. Usually the problem is that people know what they want, but they don't know all the ins and outs of XML, so you end up with things like generators spitting out cruddy schemas, restrictions that are assumed by the system you're talking to but not expressed in the schema, or things that aren't allowed by the schema still being sent from the system talking to you. I'd honestly rather have the simple formats from the article.
Admin
I never understand why otherwise computer-literate geeks fall over and turn all helpless like the users when confronted with MS software. Is it cultivated incompetence? If this was a unix system, you'd google for the problem you're having and find the answer, so why not with Excel? Are people just not aware that this is so far beneath the capabilities of Excel that it's trivial?
Of course Excel can split on any character you want. Being Excel, of course it's also incredibly obscure as to how you actually do so - rename your CSV to .txt and 'import' - but it's very good once you find the option.
Admin
Do not give that developer SQL
Admin
I can see this being a problem in some contexts, but not all. Are we talking about potentially large documents with nested structures? Then you are right.
Are we talking about very large numbers of homogeneous flat n-tuples (with n being relatively small) which are automatically marshalled and unmarshalled? Then I fail to see the problem. Whether a given value goes in an element or in an attribute has almost no consequence unless you have a concrete context in which it does not work.
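For flat records, the element-versus-attribute choice really can be a wash. Here is a small Python sketch showing that both encodings parse to the same dictionary with `xml.etree.ElementTree`. The `customer`, `name`, and `city` names are illustrative.

```python
import xml.etree.ElementTree as ET

# The same flat record written two ways.
AS_ATTRIBUTES = '<customer name="Joe Bloggs" city="Winston-Salem"/>'
AS_ELEMENTS = "<customer><name>Joe Bloggs</name><city>Winston-Salem</city></customer>"


def from_attributes(xml_text: str) -> dict[str, str]:
    """Read a record whose fields are attributes of a single element."""
    return dict(ET.fromstring(xml_text).attrib)


def from_elements(xml_text: str) -> dict[str, str]:
    """Read a record whose fields are child elements."""
    return {child.tag: child.text or "" for child in ET.fromstring(xml_text)}


assert from_attributes(AS_ATTRIBUTES) == from_elements(AS_ELEMENTS)
print(from_elements(AS_ELEMENTS))  # {'name': 'Joe Bloggs', 'city': 'Winston-Salem'}
```

The difference only starts to matter once a value needs to repeat or carry its own structure, which attributes cannot do.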
This is an excellent point... for the type of data this is applicable.
SGML was never intended for data transfer (it was simply intended for implementing markup languages), and XML was purely intended for ease of parsing for electronic exchange. Now we use it for all types of data transfer, for storage, graphics representation, and even programmatic configuration of systems. Each of these will call for exceptions to the original "spirit" of things...
... you just have to hope the engineers/designers know what they are doing (and this is true of every other piece of reusable, extensible technology.)
There has never been a set of rules indicating what type of semantics to implement for a document written in XML. Their "spirit" can only hope to be a generalized one; otherwise it would constrain many legitimate uses of it.
You raise good points, and just as we can bring up an example showing that it is a bad idea to put all data in attributes, you can also find examples where doing so is legitimate. When it comes to designing systems, each decision will land somewhere along that spectrum.
I've read it, and I've been working with XML in many ways for over a decade. I've seen good uses of it and bad. I respect the original spirit, but if a case calls for a deviation from it, then so be it (so long as it gets documented and the trade-offs of deviating are well understood). That's how we do engineering.
Just to be clear, since on the Internet it is easy for people to take things the wrong way: I'm disagreeing with you, but I'm enjoying the exchange.
Admin
[quote user="A. Coward"][quote user=Anonymous] YOU ARE THE REASON WHY PEOPLE HATE XML! It's morons like you that screw it up for the rest of us decent developers. If you're not going to use it properly then don't bloody use it. You're complaining that you don't see the WTF in today's article, well congratulations - YOU ARE THE REAL WTF. [/quote]
Thank you for the rare honor. I've always wanted to be the Real WTF.
For the record, I do follow your suggestion: I avoid XML, but when I do use it, I make an effort to do it properly[/quote]
In my experience, the people who decry XML, and avoid it wherever they can, seldom actually use it properly on the rare occasions they do use it. If they were using it properly, they wouldn't be so averse to it. As it stands, they've seen some abuses of it, blamed that on the tool itself, and avoid it. How one can profess to be proficient with something one admittedly doesn't use is a bit of a mystery. In certain respects, XML is like driving: everybody assumes it's everyone else who's Doing It Wrong.
Admin
I absolutely agree. Hence why I said that when I use XML, I make an effort to do it properly. I don't claim to know the One Proper Way to use XML. All I can do is follow the specifications, point out errors with respect to the specifications, and point out things that I know from experience to often cause interoperability issues.
As for blaming the tool for the errors of its users, I have a somewhat different view on that. When I think about XML, I am thinking not only about the language proper, but also about the culture that surrounds it. From where I sit, it seems there is a lot of propaganda and dogma for using XML and using it in certain ways, and that is the stuff of which WTFs are made: applying the same solution to every problem in the same way, without considering if there is a better solution or a better approach.
Certainly, applying XML when it isn't the best solution isn't the fault of the technology, and I'm not blaming the tool for being applied in cases where another tool would have been better. However, the reason that XML isn't always the best tool has everything to do with the technology, and, in my consideration, XML has never been the best tool for any job I have taken on (except, of course, interfacing with a system that uses XML). The reason I avoid XML, then, is not that I don't like it or don't understand it, but rather that I think another format would be better for the task at hand.
Admin
I think a lot of otherwise capable geeks simply fall into this e-rage/mental block when it comes to a tool (any tool) that comes from M$. They are very apt at whipping up an awk script for it, but when it comes to clicking a checkbox for delimiters (which Excel shows the first thing when it opens a CSV file), they freeze, like a dumb shark flipped belly up.
Even hacking together a VBA script that makes Excel programmatically load a CSV, with commas and nothing else selected as delimiters, is as trivial as using a DOM parser.
I don't understand either why people keep saying Excel "doesn't work on its own". What does that mean, anyway?
Boggles the mind :)
Admin
Hmm, CSV stands for "Comma Separated Values". If someone creates a file format where they separate values with semi-colons, it is not "CSV" but, I guess "SCSV" (Semi-Colon Separated Values). By the same reasoning, I could say that XML is unreliable because someone might decide to use square brackets instead of angle brackets and still call it XML, and then your standard tools couldn't successfully parse it.
In standard CSV, if a field contains a comma, you enclose the entire field in quotes. If a field contains quotes, you enclose the field in quotes and double any embedded quotes. This is simple and unambiguous. Again, the same problem arises with XML: What if someone wants to include a quote inside a tag parameter? What if someone wants a left angle bracket in text? They must be escaped. Every data storage format has to deal with the issue that there must be a way to escape special characters. In CSV, this can be handled with two simple rules, which I just completely explained above. It's more complex in XML.
There are off-the-shelf CSV parsers just as there are off-the-shelf XML parsers. They are less well-known and less widely used because they have only marginal value. CSV is so easy to parse that an off-the-shelf parser is barely necessary. To say that this is an argument against CSV is a little strange. It's bad because it's so easy to use that there's no big market for tools to help use it?
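To put a number on "easy to parse", here is a sketch of a CSV reader in Python that implements exactly the two rules above, cross-checked against the standard `csv` module. It assumes `\n` line endings and does no error reporting on malformed input, so treat it as an illustration rather than a replacement for a library parser.

```python
import csv
import io


def parse_csv(text: str) -> list[list[str]]:
    """Parse CSV text using two rules: a field containing a comma, quote,
    or newline is wrapped in quotes, and inside a quoted field a literal
    quote is written as two quotes."""
    records: list[list[str]] = []
    record: list[str] = []
    field: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')  # doubled quote -> literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # commas and newlines are literal inside quotes
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            record.append("".join(field))
            field = []
        elif ch == "\n":
            record.append("".join(field))
            records.append(record)
            record, field = [], []
        else:
            field.append(ch)
        i += 1
    if field or record:  # last record without a trailing newline
        record.append("".join(field))
        records.append(record)
    return records


rows = parse_csv('"Winston, Salem",NC\n"Joe ""Big dog"" McCoy",TX\n')
print(rows)  # [['Winston, Salem', 'NC'], ['Joe "Big dog" McCoy', 'TX']]

# Cross-check against the standard library on a trickier sample.
sample = 'name,city\n"Winston, Salem",NC\n"Joe ""Big dog"" McCoy","Line one\nline two"\n'
assert parse_csv(sample) == list(csv.reader(io.StringIO(sample)))
```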
I'm not saying that XML is useless. I'm just saying that XML is complex and way over-used. There are far simpler formats that make a lot more sense for simple data requirements.
Admin
I'd say XML is good for cases where data can come in essentially unpredictable order, especially if there can be complex nesting. Like a word-processing document, where at any time the user could insert a heading or switch to italics, or could have italics within a footnote or a footnote within italics. Etc. XML is great for cases like that.
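Python's `xml.etree.ElementTree` shows how XML models this kind of mixed content. Each element's `.text` is the text before its first child, and its `.tail` is the text after it closes. Here is a small sketch with made-up element names (`p`, `i`, `footnote`):

```python
import xml.etree.ElementTree as ET

# Mixed content: running text with markup nested inside it.
DOC = "<p>Plain <i>italic <footnote>note</footnote> more</i> end.</p>"


def plain_text(elem: ET.Element) -> str:
    """Concatenate all text in document order, dropping the markup."""
    return "".join(elem.itertext())


def outline(elem: ET.Element, depth: int = 0) -> None:
    """Print each element with its .text (before the first child) and .tail (after it closes)."""
    print("  " * depth + f"<{elem.tag}> text={elem.text!r} tail={elem.tail!r}")
    for child in elem:
        outline(child, depth + 1)


root = ET.fromstring(DOC)
print(plain_text(root))  # Plain italic note more end.
outline(root)
# <p> text='Plain ' tail=None
#   <i> text='italic ' tail=' end.'
#     <footnote> text='note' tail=' more'
```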
But how often do you deal with such cases? For me, almost never. I've been familiar with XML and related formats (HTML, SGML) for, let's see, at least 15 years. In that time I've had exactly one case where it was up to me what format to use for data storage and I chose XML. That was an application that required a mildly complex configuration file with hierarchical data.
Most of the time, my data requirements fall into two types:
(a) A set of values each of which will occur exactly once. In this case, my favorite format is a Java properties file, where each line says "name=value" and parsing simply means looking for the equal sign, with a little extra complexity if names can include equals signs or values can span multiple lines.
(b) A set of records with one or more fields each. In this case, I prefer CSV.
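Here is a sketch of case (a) in Python. It is a deliberately simplified subset of the Java properties format, assumed for illustration: `#` and `!` comment lines, the first `=` splits name from value, and a trailing backslash continues a value. It does not handle escapes or the `:` and whitespace separators that `java.util.Properties` also accepts, and names cannot contain `=`.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simplified name=value lines (see the assumptions above)."""
    props: dict[str, str] = {}
    logical = ""  # accumulates a value split across continuation lines
    for raw in text.splitlines():
        line = raw.strip()
        if not logical and (not line or line[0] in "#!"):
            continue  # blank line or comment
        if line.endswith("\\"):
            logical += line[:-1]  # continuation: keep reading
            continue
        logical += line
        name, sep, value = logical.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"missing '=' in {logical!r}")
        props[name.strip()] = value.strip()
        logical = ""
    if logical:
        raise ValueError("file ends with a dangling line continuation")
    return props


config = parse_properties("""
# connection settings
host = example.com
greeting = hello, \\
    world
""")
print(config)  # {'host': 'example.com', 'greeting': 'hello, world'}
```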
(Well, I'm not considering cases where you'd use a database here. I'd consider that a slightly different category.)
Maybe your experience is different. Maybe you routinely create applications that have complex, hierarchical data. Personally, I don't. I certainly don't have any objection to using XML in cases with such complex data. My objection is to using it in cases where the data is simpler, and XML just adds more complexity and potential for errors.
For example, say I have a set of records all of which have the same fields. Sure, I could put that in XML like:
Compare this to the CSV representation. With the XML, I have to deal with whole categories of potential problems that don't exist in CSV. What if there is no "<name>" tag? What if there are two "<name>" tags? What if there's a tag I don't recognize? What if the "<customers>" tag is missing? What if the city tag is nested within the name tag? What does that mean? Etc., etc. These questions don't even come up with a simpler format like CSV. Sure, there could be too many fields on a line, or too few. Those are the only problems that are even possible.
Sure, an off-the-shelf parser can handle deciphering all the complexity for me; I don't need to rewrite that. But I still need to analyze the results. I have to check for missing tags and unexpected nesting and many other categories of problems.
In cases where the data is complex, this is just how it is. You have no choice. But why add complexity when you don't need it?
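To make the comparison concrete, here is a Python sketch of the checks each format needs. The `<customers>`, `<name>`, and `<city>` elements come from the comment; the `<customer>` wrapper around each record and the headerless two-column CSV are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

EXPECTED = ("name", "city")


def customers_from_xml(xml_text: str) -> list[dict[str, str]]:
    """Reject a wrong root, unexpected records, and unknown, duplicate,
    nested, or missing child elements."""
    root = ET.fromstring(xml_text)
    if root.tag != "customers":
        raise ValueError(f"expected <customers> root, got <{root.tag}>")
    rows = []
    for n, cust in enumerate(root, start=1):
        if cust.tag != "customer":
            raise ValueError(f"record {n}: unexpected <{cust.tag}>")
        row = {}
        for child in cust:
            if child.tag not in EXPECTED:
                raise ValueError(f"record {n}: unknown <{child.tag}>")
            if child.tag in row:
                raise ValueError(f"record {n}: duplicate <{child.tag}>")
            if len(child):  # any child elements mean unexpected nesting
                raise ValueError(f"record {n}: nesting inside <{child.tag}>")
            row[child.tag] = child.text or ""
        missing = [t for t in EXPECTED if t not in row]
        if missing:
            raise ValueError(f"record {n}: missing {missing}")
        rows.append(row)
    return rows


def customers_from_csv(csv_text: str) -> list[dict[str, str]]:
    """The only structural check CSV needs: the field count."""
    rows = []
    for n, fields in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(fields) != len(EXPECTED):
            raise ValueError(f"record {n}: expected {len(EXPECTED)} fields, got {len(fields)}")
        rows.append(dict(zip(EXPECTED, fields)))
    return rows


print(customers_from_csv('"Bloggs, Joe",Winston-Salem\n'))
# [{'name': 'Bloggs, Joe', 'city': 'Winston-Salem'}]
```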
Admin
And of course, XML-based applications never have to deal with data that includes brackets or quotes, so there's no issue there.
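For the record, here is what that escaping looks like with Python's `xml.sax.saxutils` helpers: `escape` for text content and `quoteattr` for attribute values, with a round trip through `ElementTree` to confirm the parser undoes it. The sample name is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

name = 'Joe "Big dog" <McCoy> & Sons'

# Element text: escape() replaces &, < and > with entities.
element = f"<name>{escape(name)}</name>"

# Attribute value: quoteattr() escapes the same characters, then wraps the value
# in quotes, choosing single quotes here because the value contains double quotes.
attribute = f"<customer name={quoteattr(name)}/>"

print(element)    # <name>Joe "Big dog" &lt;McCoy&gt; &amp; Sons</name>
print(attribute)  # <customer name='Joe "Big dog" &lt;McCoy&gt; &amp; Sons'/>

# Round trip: the parser turns the entities back into the original string.
assert ET.fromstring(element).text == name
assert ET.fromstring(attribute).get("name") == name
```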
CSV provides a method for escaping commas: Enclose the fields in quotes. So no, the system will not fail. I do this more than once ... per day.
Wow, did I suddenly become the "CSV warrior" here? I didn't realize this was such a controversial topic!
Admin
"Fixed Width Data." There are a number of systems in the world that still store their data that way, mostly old crusty COBOL.
The example doesn't convey the true horror. Let me give an example: ASDGAEVDVCIOSDF723N812NSDASD912NDASC812DCA82EDQAXNX8N12EDA82
Now, position 1 sets the security group, and positions 2-8 are filler, but only if position 1 isn't numeric. Positions 32-44 are... etc. The only thing that could make it more fun is some binary coded decimals, yum.
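The "binary coded decimals" in such records are usually COBOL packed decimal (`COMP-3`), where each byte holds two decimal digits and the last nibble is a sign. Here is a Python sketch of a decoder. It assumes the common IBM convention (sign nibbles `B`/`D` negative, `A`/`C`/`E`/`F` positive) and an implied decimal scale supplied by the caller; other BCD variants exist.

```python
from decimal import Decimal


def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COBOL COMP-3) field.

    Each byte holds two 4-bit digits; the low nibble of the last byte is the
    sign. `scale` is the number of implied decimal places from the PIC clause.
    """
    if not data:
        raise ValueError("empty field")
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    sign = nibbles.pop()  # the last nibble is the sign, not a digit
    if sign < 0xA:
        raise ValueError(f"invalid sign nibble {sign:#x}")
    if any(d > 9 for d in nibbles):
        raise ValueError("invalid digit nibble")
    value = int("".join(map(str, nibbles)))
    if sign in (0xB, 0xD):
        value = -value
    return Decimal(value).scaleb(-scale)


print(unpack_comp3(b"\x12\x34\x5C", scale=2))  # 123.45
print(unpack_comp3(b"\x12\x34\x5D", scale=2))  # -123.45
```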
Admin
You think that was bad? I have received hand-written invalid XML saved as a Word document (yes, that's right, someone typed XML in Word).
Admin
Yeah, with CSV at this point you will be totally fucked up
Admin
The <field>/<value> response is a standard mainframe pattern, I've seen it several times. It's a Microfocus/IBM product for converting Cobol data definitions to XML.
Admin
I sincerely hope, for the sake of the people who would otherwise have been involved, that this story is fictional...
Admin
The real WTF is that Diego didn't just write a wrapper around both the function calls and the return value and get on with his life.
Admin
I wonder if we work at the same place :)
<DataObject>Base64-encoded and compressed XML</DataObject> Oh yeah, I went there.
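For anyone who has not had the pleasure, the round trip looks roughly like this in Python. The comment doesn't say which compression was used, so `zlib` is an assumption here, and the inner document is made up.

```python
import base64
import xml.etree.ElementTree as ET
import zlib

inner = "<customer><name>Joe Bloggs</name></customer>"

# Wrap: compress, then base64-encode so the bytes survive as XML text.
payload = base64.b64encode(zlib.compress(inner.encode("utf-8"))).decode("ascii")
outer = f"<DataObject>{payload}</DataObject>"

# Unwrap: every consumer must reverse both steps before any XML tool can see the data.
recovered = zlib.decompress(base64.b64decode(ET.fromstring(outer).text)).decode("utf-8")
assert recovered == inner
print(ET.fromstring(recovered).findtext("name"))  # Joe Bloggs
```

The outer document is perfectly valid XML; it just hides everything useful from any XML tool.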
Admin
M An... You are now my new daily internet hero! ;D