Admin
Hmm ... okay, that's going to cost &50000 to fix.
Admin
Isn't that how the light cycles broke out of the grid?
Admin
The real problem is that since humans can kind of read XML they think that it is a string. It isn't. XML is usually encoded into a string, and you have to follow those rules, even if it doesn't "read" nicely to you.
It was never meant for you; it is for transferring data between computers first. Whether or not you like the way the encoding looks means nothing.
Admin
Wow, we have the same issue with our vendor. In fact, my boss used it as an interview question to see what the applicants' thought process for resolving it is.
Admin
The ERP system our company bought (written in VB.NET with random parts in Access) uses XML all over the place, for little reason other than enterprisiness, but the only way they generate it is by simple string concatenation, with no escaping whatsoever, completely ignoring all the classes in .NET for working with XML.
My ugly workaround for parsing some of it:
Admin
Did anyone suggest changing the name of the company?
Admin
One of the guys here who wrote the company's first bit of XML data transfer code didn't bother changing "&" into "&amp;" or anything like that. That was far too difficult to do, naturally.
There was simply a massive warning on the input screen (which was the source for the data being transferred) that said, in big, flashing, yellow and red characters "Don't type "&" into any of the boxes!!!!"
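For contrast with the unescaped string concatenation described in this thread, here is a minimal Python sketch of the two usual ways to get the ampersand right: escape the text by hand with `xml.sax.saxutils.escape`, or let a serializer such as `xml.etree.ElementTree` do it. Python is used purely for illustration (nobody in the thread says they use it), and the element name `company` and the sample value are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def company_xml_by_hand(name: str) -> str:
    """Build the element by string formatting, escaping &, < and > first."""
    return "<company>{}</company>".format(escape(name))


def company_xml_by_library(name: str) -> str:
    """Let ElementTree apply the escaping when it serializes the element."""
    elem = ET.Element("company")  # hypothetical element name
    elem.text = name
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(company_xml_by_hand("Foo & Co"))
    print(company_xml_by_library("Foo & Co"))
```

Either way the user can type "&" into the box, and the receiving parser gets the original text back.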
Admin
Yes, the sequence ]]> is disallowed inside a CDATA section, but there is an official way to escape it. You replace the string "]]>" with the string "]]]]><![CDATA[>". This works because there is no prohibition on consecutive CDATA sections, and the escape sequence effectively breaks the CDATA section into two while splitting the illegal sequence across the boundary between the two.
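The replacement rule above can be sketched in a few lines of Python. The helper name `to_cdata` and the sample text are assumptions for illustration; the round trip through `xml.etree.ElementTree` is only there to show that a parser joins the two adjacent sections back into the original string.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    sample = "if (a[b[0]]> 1 && c) {}"  # made-up text containing "]]>"
    doc = "<code>" + to_cdata(sample) + "</code>"
    # The parser concatenates the adjacent CDATA sections on the way back in.
    assert ET.fromstring(doc).text == sample
    print(doc)
```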
Admin
FTFY :-)
Admin
Change the schema to use an attribute instead of an element node.
You do not need to escape the ampersands in attribute values.
Admin
Also, I did not say my right, I said your right, and quite deliberately I might add. Now pay attention!
Admin
Or maybe he just died!
Admin
Anything that is not supposed to have linebreaks or other XML element nodes in its content should be written as attributes, not as element nodes.
In fact, if there are no element nodes that are supposed to have linebreaks or other XML element nodes in them, the file should not be written in XML format at all.
Admin
So does that mean & is now converted to $amp; on the server?
Admin
They would not replace & with <![CDATA[&]]> but rather add <![CDATA[ to the beginning of all fields in whatever app is creating the XML. Thus no parsing and no replacing.
Captcha: nulla. That is a female null, like -0.
Admin
What the guy should have done is fix the XML parser to handle the invalid state.
Future proof against idiot vendors.
Admin
I have yet to see a real-world example in which the real data contains such a string. The only contrived example that I can think of is an article about XML (like your comment), and if someone is writing one then he already knows about the issue. In that case, how should ]]> be escaped?
Admin
<Foo]]>Co/>'); drop table vendors;--
Admin
קיימים משפטים עם הנקודה בצד שמאל.
There do exist sentences with the period on the left. Most of mine are like that.
Admin
Bob, my condolences. Unfortunately, many members of this group seem proud to display complete insensitivity even on serious matters. As someone who has dealt with similar issues (although not family members) in various situations [I used to be very involved with an AHRC youth group among other things], I have seen firsthand (and too many times) a complete lack of understanding by so many.
Admin
+1
Admin
His timestamp: 2012-01-11 11:00 Your timestamp: 2012-01-11 11:05
You are a damn fast poster.
Admin
I was once called out publicly, in a legal deposition, for being at fault for "interpreting the specification literally."
Admin
XML is simply a data format - it is never a WTF. How it is used and viewed by people is the WTF.
Admin
If I had a dime for every time this type of problem had popped up in my career.... Let's just say I could buy a country, bankrupt it, and then the EU would ask me for a bailout and I'd have millions left to buy an island.
Admin
DING! DING! DING! DING! DING! We have a WINNER!
Let the vendor know that their inability to keep up with late 20th century computing specifications will cause you to re-evaluate the use of their services.
Admin
Using it as a data format is TRWTF!
Admin
Yeah, we're a bunch of retards, aren't we?
Admin
My God that customer conversation just summed up the last 7 years of my life; I need a new job...
Admin
Simple, then you add a check to see which client is sending the data and only replace the dollar signs with ampersands for them. Hard coded of course.
Or extracted from an XML file containing replacement pairs per client. Properly escaped of course, and base64 encoded for good measure. But then gzipped to compensate for the extra space required, and then escaped again.
Admin
Can I have you BOTH for a &?
Admin
I'd go for something along the lines of </html> or just.
Admin
I was working with a vendor to send us records in real time in an agreed-upon XML format. It was very obvious they were hand-crafting the XML. Open tag casing and spelling didn't match the close tag, unescaped characters, etc.
After rejecting several drafts over a few weeks, they still were not getting that it had to first be VALID XML before we could use it or process it, and they did not understand what valid XML was.
I finally told them to save the contents as a .xml file and double-click it so that their Internet Explorer would open and try to parse it. If it showed an error, we would reject it.
Two days later I received a valid XML file.
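The same accept-or-reject gate can be scripted instead of relying on a browser. The following is a minimal Python sketch, assuming the payload arrives as a string; the function name `well_formedness_error` is made up. It only checks that the document parses (tags match, special characters are escaped), which is roughly what the double-click test amounts to; it does not validate against any schema.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(payload: str):
    """Return None if payload parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Made-up samples: mismatched tag casing and an unescaped ampersand.
    for sample in ("<Order><Name>ok</Name></Order>",
                   "<Order><Name>oops</name></Order>",
                   "<Order><Name>Foo & Co</Name></Order>"):
        print(sample, "->", well_formedness_error(sample) or "accepted")
```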
Admin
That reminds me of the day I tried to subscribe to the Phishtank RSS-Feed for my employer:
http://rss.phishtank.com/rss/asn/?asn=8560
You'll need a raw view, such as wget's, to see the bug, as most browsers will happily let you subscribe and then fail to update.
Obviously, I opened a trouble ticket, but it must have gotten lost with all the Post-Its and emails.
Next thing, I will try to convince my employer to change the company name ;)
Admin
"The real problem is that since humans can kind of read XML"
Then why the f..k does so much Java software use XML for configuration files that humans are required to edit? I'm looking at Tomcat, Jetty and Solr right now.
Worse than trying to deal with old time sendmail configs.
Admin
BTW: am I the only one to see XML-Errors in the error-console while editing tdwtf-Comments?
Admin
I dealt with this crap on a project last year.
Don't blame the "EDI team", blame BizTalk, which is probably what they're using to convert your pseudo-XML into their crusty-ass format. EDI was probably "great" back in the late seventies when it congealed out of primordial goo. The rest of the world has moved on, leaving that abomination of a file format behind.
For those that haven't had the misfortune of dealing with EDI, it's an ASCII-only text format with character delimiters and field widths (both!). The "spec" allows anyone to modify said "spec" in any way imaginable. It's less of a spec and more of a set of parsing rules and some rough guidelines about what data should be collected. Except it claims to be the exact data format "everyone" uses for a given industry's needs.
In a note closely related to the WTF today, I went through similar shenanigans converting XML output to EDI 210. The EDI 210 has a sub-field delimiter, usually a colon. It's not actually used anywhere in this particular variation of the 210, but it has to be defined. So NONE of the fields can contain a colon, even things that are data passed through from imported EDI 211 files (which apparently doesn't disallow colons). So anytime someone using a third-party system enters a colon into a text field, that colon eventually screws up the invoicing process.
Captcha: nulla. EDI's benefit to humanity adds up to nulla.
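One way to catch the colon problem before it reaches invoicing is to scan the outgoing field values for reserved delimiter characters. This is only a Python sketch under assumptions: the function name, the field names, and the default delimiter set are all hypothetical, and the real delimiters should come from whatever the trading partner's format actually defines.

```python
def find_delimiter_conflicts(fields, delimiters=(":",)):
    """Map each field name to the reserved characters found in its value."""
    conflicts = {}
    for name, value in fields.items():
        hits = [d for d in delimiters if d in value]
        if hits:
            conflicts[name] = hits
    return conflicts


if __name__ == "__main__":
    # Hypothetical record: a free-text note typed into a third-party system.
    record = {"reference": "A123", "note": "Deliver by 5:00"}
    print(find_delimiter_conflicts(record))
```

Whether a conflicting character should then be rejected, stripped, or replaced depends on the receiving system.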
Admin
You do only one pass, so there is no infinite recursion problem here.