• cox (unregistered) in reply to Ken B.
    Ken B.:
    Noread:
    Blue Leader:
    Sweet. I'm naming my next company <FooCo/> -- I wonder how many XML apps will choke on that.
    Why not go all the way and name your company <Foo]]>Co/>.
    ITYM ]]><![CDATA[</b>. And, of course, Bobby Tables would be the CEO.
    Joe:
    Noread:
    Blue Leader:
    Sweet. I'm naming my next company <FooCo/> -- I wonder how many XML apps will choke on that.
    Why not go all the way and name your company <Foo]]>Co/>.
    <Foo]]-->Co;/>'or 1=1;drop table customers

    And yes, we've all seen the XKCD reference, no need to repeat it.

    --Joe

  • (cs) in reply to Andrei Rinea
    Andrei Rinea:
    What if the content legitimately contains a dollar sign? Wouldn't that be wrongly converted to an ampersand? Bunch of losers..

    Hmm ... okay, that's going to cost &50000 to fix.

  • (cs) in reply to Joe
    Joe:
    <Foo]]-->Co;/>'or 1=1;drop table customers

    And yes, we've all seen the XKCD reference, no need to repeat it.

    --Joe

    Isn't that how the light cycles broke out of the grid?

  • me (unregistered)

    The real problem is that since humans can kind of read XML they think that it is a string. It isn't. XML is usually encoded into a string, and you have to follow those rules, even if it doesn't "read" nicely to you.

    It was never meant for you, it is for transferring data between computers first, whether or not you like the way the encoding looks means nothing.
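
To illustrate the point about XML not being "just a string", here is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` module stands in for whatever generator the vendor should have used. The `build_customer` helper is hypothetical and the `customer_name` element is borrowed from the example elsewhere in this thread; the idea is only that the serializer, not the caller, decides how `&` and `<` are encoded.

```python
import xml.etree.ElementTree as ET

def build_customer(name: str) -> str:
    """Serialize a customer name as XML and let the library do the escaping."""
    elem = ET.Element("customer_name")  # element name taken from the thread's example
    elem.text = name                    # raw text goes in, no manual escaping
    return ET.tostring(elem, encoding="unicode")

xml_text = build_customer("Brandon & Sons")
print(xml_text)

# Parsing the result gives back the original text unchanged.
round_tripped = ET.fromstring(xml_text).text
print(round_tripped)
```

The caller never sees the encoded form unless it prints the serialized document, which is roughly the separation the comment above is arguing for.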

  • Omego2K (unregistered)

    Wow, we have the same issue with our vendor. In fact my boss used it as an interview question to see what the applicants' thought process in resolving it is.

  • Ken B. (unregistered) in reply to cox
    cox:
    Ken B.:
    ... an XKCD reference ...
    Joe:
    And yes, we've all seen the XKCD reference, no need to repeat it.
    Check the timestamps. His "no need to repeat it" hadn't yet been posted when I started to type mine.
  • (cs)

    The ERP system our company bought (written in VB.NET with random parts in Access) uses XML all over the place, for little reason other than enterprisiness, but the only way they generate it is by simple string concatenation, with no escaping whatsoever, completely ignoring all the classes in .NET for working with XML.

    My ugly workaround for parsing some of it:

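    Public Shared Sub LoadXMLFixed(doc As XmlDocument, text As String)
    	Try
    		' First attempt: the text may already be well-formed.
    		doc.LoadXml(text)
    	Catch ex As XmlException
    		' Fallback: escape every ampersand and retry.
    		' Note: this double-escapes any entity that was already correct.
    		doc.LoadXml(text.Replace("&","&amp;"))
    	End Try
    End Sub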
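
A slightly less blunt variant of the workaround above, sketched in Python rather than VB.NET and assuming the vendor output may mix bare ampersands with entities that are already correct. The `load_xml_lenient` name and the regular expression are illustrative assumptions, not anything from the ERP system; the pattern only escapes an `&` that does not already start a named or numeric reference.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" unless it already begins a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference.
_BARE_AMP = re.compile(r"&(?![A-Za-z_][\w.\-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)")

def load_xml_lenient(text: str) -> ET.Element:
    """Parse text as XML, retrying once with bare ampersands escaped."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Heuristic repair: text such as "AT&T;" would still be read as an
        # entity reference, so this remains a workaround, not a real fix.
        return ET.fromstring(_BARE_AMP.sub("&amp;", text))

print(load_xml_lenient("<n>Brandon & Sons</n>").text)
```

It is still a guess about what the sender meant, so the proper fix remains on the generating side.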
  • (cs) in reply to Omego2K
    Omego2K:
    Wow, we have the same issue with our vendor. In fact my boss used it as an interview question to see what the applicants' thought process in resolving it is.

    Is anyone suggest change name of the company?

  • (cs)

    One of the guys here who wrote the company's first bit of XML data transfer code didn't bother changing "&" into "&amp;" or anything like that. That was far too difficult to do, naturally.

    There was simply a massive warning on the input screen (which was the source for the data being transferred) that said, in big, flashing, yellow and red characters "Don't type "&" into any of the boxes!!!!"

  • Nagesh (unregistered) in reply to Nagesh
    Nagesh:
    Omego2K:
    Wow, we have the same issue with our vendor. In fact my boss used it as an interview question to see what the applicants' thought process in resolving it is.

    Is anyone suggest change name of the company?

    you sem to strugle to much with this simpal concept. visa revoke!

  • harperska (unregistered) in reply to Noread

    Yes, the sequence ]]> is disallowed inside a CDATA section, but there is an official way to escape it. You replace the string "]]>" with the string "]]]]><![CDATA[>". This works because there is no prohibition on consecutive CDATA sections, and the escape sequence effectively breaks the CDATA section into two while splitting the illegal sequence across the boundary between the two.
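
The replacement described above can be sketched in a few lines of Python. The `to_cdata` helper is a hypothetical name, and the standard-library parser is used only to check that the two adjacent CDATA sections read back as one piece of text.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Company name taken from the jokes earlier in the thread.
doc = "<name>" + to_cdata("<Foo]]>Co/> & Sons") + "</name>"
print(doc)

# The parser joins the adjacent CDATA sections back into the original text.
print(ET.fromstring(doc).text)
```

The first section ends with `]]` and the second begins with `>`, so the forbidden sequence never appears inside a single section.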

  • (cs) in reply to QJo
    QJo:
    Hmm ... okay, that's going to cost &50000 to fix.

    FTFY :-)

  • Cbuttius (unregistered)

    Change the schema to use an attribute instead of an element node.

    You do not need to escape the ampersands in attribute values.

  • (cs) in reply to Ken B.
    Ken B.:
    boog:
    Your right, I forgot the period at the end of my sentence, silly me.
    Yes, you're supposed to have the period placed so that it's on your right, not my right.
    Unless you look at your monitor from an odd angle or have your head on backwards, I fail to see the difference.

    Also, I did not say my right, I said your right, and quite deliberately I might add. Now pay attention!

  • (cs) in reply to boog
    boog:
    Bob:
    Please attempt some sensitivity: I had a son who was retarded...
    Did he get better?
    I am become Death, destroyer of worlds...

    Or maybe he just died!

  • (cs) in reply to no laughing matter
    no laughing matter:
    boog:
    Bob:
    Please attempt some sensitivity: I had a son who was retarded...
    Did he get better?
    I am become Death, destroyer of worlds...

    Or maybe he just died!

    Nope, still retarded...

  • (cs) in reply to Cbuttius
    Cbuttius:
    Change the schema to use an attribute instead of an element node.

    You do not need to escape the ampersands in attribute values.

    THIS!

    Anything that is not supposed to have linebreaks or other XML element nodes in its content should be written as attributes, not as element nodes.

    In fact if there are no element nodes that are supposed to have linebreaks or other XML element nodes in it, the file should not be written in XML format at all.

  • corroded (unregistered)

    So does that mean & is now converted to $amp; on the server?

  • Bob (unregistered) in reply to C-Octothorpe
    C-Octothorpe:
    no laughing matter:
    boog:
    Bob:
    Please attempt some sensitivity: I had a son who was retarded...
    Did he get better?
    I am become Death, destroyer of worlds...

    Or maybe he just died!

    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?

  • Dotan Cohen (unregistered) in reply to Anon
    Anon:
    CDATA to the rescue!
    Exactly. I came to say this.

    They would not replace & with <![CDATA[&]]> but rather add <![CDATA[ to the beginning of all fields in whatever app is creating the XML. Thus no parsing and no replacing.

    Captcha: nulla. That is a female null, like -0.

  • (cs) in reply to Bob
    Bob:
    C-Octothorpe:
    no laughing matter:
    boog:
    Bob:
    Please attempt some sensitivity: I had a son who was retarded...
    Did he get better?
    I am become Death, destroyer of worlds...

    Or maybe he just died!

    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!

  • Jim (unregistered)

    What the guy should have done is fixed the XML Parser to handle the invalid state.

    Future proof against idiot vendors.

  • Dotan Cohen (unregistered) in reply to dkf
    dkf:
    Anon:
    CDATA to the rescue!
    Which is fine as long as your data doesn't contain the sequence “]]>”…

    I have yet to see a real-world example in which the real data contains such a string. The only contrived example that I can think of is an article about XML (like your comment), and if someone is writing one then he already knows about the issue. In that case, how should ]]> be escaped?

  • Dotan Cohen (unregistered) in reply to Noread
    Noread:
    Blue Leader:
    Sweet. I'm naming my next company <FooCo/> -- I wonder how many XML apps will choke on that.
    Why not go all the way and name your company <Foo]]>Co/>.

    <Foo]]>Co/>'); drop table vendors;--

  • Gadfly (unregistered) in reply to anonymous_coder()
    anonymous_coder():
    I've torn out like 10 different ones in our codebase, all written differently. It seems like a rite of passage for crap programmers to try and reimplement an XML parser or generator. Badly.
    That is because most XML libraries suck as badly as XML itself...
  • Bananas (unregistered) in reply to Pim
    Pim:
    I'll buy that for a &.
    Thanks for sharing your &0.02 worth.
  • Dotan Cohen (unregistered) in reply to boog
    boog:
    Ken B.:
    boog:
    Your right, I forgot the period at the end of my sentence, silly me.
    Yes, you're supposed to have the period placed so that it's on your right, not my right.
    Unless you look at your monitor from an odd angle or have your head on backwards, I fail to see the difference.

    Also, I did not say my right, I said your right, and quite deliberately I might add. Now pay attention!

    קיימים משפטים עם הנקודה בצד שמאל.‏

    There do exist sentences with the period on the left. Most of mine are like that.

  • (cs) in reply to Bob
    Bob:
    What part of "attempt some sensitivity" don't you understand?

    Bob, my condolences. Unfortunately many members of this group seem proud to display complete insensitivity even on serious matters. As someone who has dealt with similar issues (although not family members) in various situations [I used to be very involved with an AHRC youth group among other things], I have seen firsthand (and too many times) a complete lack of understanding by so many.

  • Jupiter (unregistered) in reply to MeesterTurner
    MeesterTurner:
    One of the guys here who wrote the company's first bit of XML data transfer code didn't bother changing "&" into "&amp;" or anything like that. That was far too difficult to do, naturally.

    There was simply a massive warning on the input screen (which was the source for the data being transferred) that said, in big, flashing, yellow and red characters "Don't type "&" into any of the boxes!!!!"

    +1

  • Bananas (unregistered) in reply to boog
    boog:
    frits:
    boog:
    frits:
    But what about the infinite recursion problem when replacing the leading character in "&amp;" with "&amp;"? Did you think about that?
    I'm pretty sure you'd replace the leading character in "&" with ".
    You're lack of understanding of English Language syntax surprises me.
    Your right, I forgot the period at the end of my sentence, silly me. Sorry.

    Sorry, everyone.

    If this symmetry is intentional, I applaud you. If not, back to grammar school for both of you!

  • tox (unregistered) in reply to Ken B.
    Ken B.:
    cox:
    Ken B.:
    ... an XKCD reference ...
    Joe:
    And yes, we've all seen the XKCD reference, no need to repeat it.
    Check the timestamps. His "no need to repeat it" hadn't yet been posted when I started to type mine.

    His timestamp: 2012-01-11 11:00 Your timestamp: 2012-01-11 11:05

    You are a damn fast poster.

  • StefonTheMadman (unregistered) in reply to Jaime

    I was once called out publicly, in a legal deposition, for being at fault for "interpreting the specification literally."

  • (cs) in reply to Nagesh
    Nagesh:
    String Nagesh = "<customer_name>Brandon & Sons</customer_name>";
    System.out.println("Substitution string = " + Nagesh.replace('&', '&');
    
    I take it your editor doesn't highlight omitted parenthesis?
    Smitt-Tay:
    The real WTF is XML.
    Incorrect. TRWTF is people who don't understand how XML works (or the importance of well-formed, parseable XML), people who won't validate their XML against an agreed schema before data interchange, or people that believe XML is magical gold dust that solves all interoperability issues when sprinkled over disparate systems.

    XML is simply a data format - it is never a WTF. How it is used and viewed by people is the WTF.

  • (cs) in reply to C-Octothorpe
    C-Octothorpe:
    Bob:
    C-Octothorpe:
    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!
    Please show some sensitivity. I had a son who was a sociopath and... oh, never mind. Apparently, this meme will never die.
  • (cs) in reply to Dotan Cohen
    Dotan Cohen:
    boog:
    Ken B.:
    Yes, you're supposed to have the period placed so that it's on your right, not my right.
    Unless you look at your monitor from an odd angle or have your head on backwards, I fail to see the difference.
    קיימים משפטים עם הנקודה בצד שמאל.‏

    There do exist sentences with the period on the left. Most of mine are like that.

    I'm not sure how that's relevant to the discussion. Maybe you can explain?

  • Nagesh (unregistered) in reply to Cassidy
    Cassidy:
    Nagesh:
    String Nagesh = "<customer_name>Brandon & Sons</customer_name>";
    System.out.println("Substitution string = " + Nagesh.replace('&', '&');
    
    I take it your editor doesn't highlight omitted parenthesis?
    Java 1.3 is not permision for string replacement, only character. This is stil being used at my comp for version of max stabality.
  • Daniil S. (unregistered)

    If I had a dime for every time this type of problem had popped up in my career.... Let's just say I could buy a country, bankrupt it, and then the EU would ask me for a bailout and I'd have millions left to buy an island.

  • WthyrBendragon (unregistered) in reply to BentFranklin
    BentFranklin:
    Instead of replacing the ampersand, try replacing the vendor.

    DING! DING! DING! DING! DING! We have a WINNER!

    Let the vendor know that their inability to keep up with late 20th century computing specifications will cause you to re-evaluate the use of their services.

  • (cs) in reply to Cassidy
    Cassidy:
    XML is simply a data format - it is never a WTF. How it is used and viewed by people *is* the WTF.
    XML is a document markup language!

    Using it as a data format is TRWTF!

  • (cs) in reply to TheCPUWizard
    TheCPUWizard:
    Bob:
    What part of "attempt some sensitivity" don't you understand?

    Bob, my condolences. Unfortunately many members of this group seem proud to display complete insensitivity even on serious matters. As someone who has dealt with similar issues (although not family members) in various situations [I used to be very involved with an AHRC youth group among other things], I have seen firsthand (and too many times) a complete lack of understanding by so many.

    Yeah, we're a bunch of retards, aren't we?

  • Kerrash (unregistered)

    My God that customer conversation just summed up the last 7 years of my life; I need a new job...

  • Tractor (unregistered) in reply to Andrei Rinea
    Andrei Rinea:
    What if the content legitimately contains a dollar sign? Wouldn't that be wrongly converted to an ampersand? Bunch of losers..

    Simple, then you add a check to see which client is sending the data and only replace the dollar signs with ampersands for them. Hard coded of course.

    Or extracted from an XML file containing replacement pairs per client. Properly escaped of course, and base64 encoded for good measure. But then gzipped to compensate for the extra space required, and then escaped again.

  • Darth Paula (unregistered) in reply to Pim

    Can I have you BOTH for a &?

  • Darkstar (unregistered) in reply to Blue Leader
    Blue Leader:
    Sweet. I'm naming my next company <FooCo/> -- I wonder how many XML apps will choke on that.

    I'd go for something along the lines of </html> or just

    .

  • (cs)

    I was working with a vendor to send us records in real-time in an agreed upon XML format. It was very obvious they were hand crafting the XML. Open tag casing and spelling didn't match the close tag, unescaped characters, etc.

    After rejecting several drafts over a few weeks, they still were not getting that it had to first be VALID XML before we could use it or process it, and they did not understand what valid XML was.

    I finally told them to save the contents as a .xml file, double click it so that their internet explorer would open and try to parse it. If it showed an error, we would reject it.

    Two days later I received a valid XML file.
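
The "open it in a browser and see if it errors" test above can be approximated in code. This is a rough sketch in Python, assuming the standard-library parser is an acceptable stand-in for the browser; the `is_well_formed` helper is a hypothetical name. Strictly speaking it checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# The kinds of mistakes described above: mismatched tag casing, unescaped characters.
print(is_well_formed("<Name>Foo</name>"))
print(is_well_formed("<name>Brandon & Sons</name>"))
print(is_well_formed("<name>Brandon &amp; Sons</name>"))
```

A check like this could run on the sending side before anything is transmitted, which is essentially what the double-click test achieved.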

  • [email protected] (unregistered)

    That reminds me of the day I tried to subscribe to the Phishtank RSS-Feed for my employer:

    http://rss.phishtank.com/rss/asn/?asn=8560

    You'll need a raw view, such as wget, to see the bug, as most browsers will happily let you subscribe and then fail to update.

    Obviously, I opened a trouble ticket but it must have gotten lost with all the Post-Its and emails

    Next thing I will try to convince my employer to change the company name ;)

  • caper (unregistered)

    "The real problem is that since humans can kind of read XML"

    Then why the f..k does so much java software use xml as human required configuration files. I'm looking at tomcat, jetty and solr right now.

    Worse than trying to deal with old time sendmail configs.

  • [email protected] (unregistered)

    BTW: am I the only one to see XML-Errors in the error-console while editing tdwtf-Comments?

    Error: unexpected end of XML source Source file: http://thedailywtf.com/tizes/a.aspx?ZoneID=2&Task=Get&IFR=False&PageID=59266&SiteID=2&Random=1326308675548 Line: 1, Column: 24 Source text: <body bgcolor="#FFFFFF">
  • A Real Fart Smeller (unregistered) in reply to Jaime
    Jaime:
    Our EDI team did exactly this to us a few years ago. We negotiated XML as an interchange format and in testing we started getting files with unescaped ampersands.

    Also, when we sent them data, they choked on all of our data. The hand-built test files had each element on a separate line, and our chosen parser didn't add line breaks between elements. We were blamed for "changing the format".

    I sent them a link to the XML spec and they responded "we can't do all that". To this day, we exchange pseudo-XML with a pre-processor on our end.

    I dealt with this crap on a project last year.

    Don't blame the "EDI team", blame BizTalk, which is probably what they're using to convert your pseudo-XML into their crusty-ass format. EDI was probably "great" back in the late seventies when it congealed out of primordial goo. The rest of the world has moved on, leaving that abomination of a file format behind.

    For those that haven't had the misfortune of dealing with EDI, it's an ASCII-only text format with character delimiters and field widths (both!). The "spec" allows anyone to modify said "spec" in any way imaginable. It's less of a spec and more of a set of parsing rules and some rough guidelines about what data should be collected. Except it claims to be the exact data format "everyone" uses for a given industry's needs.

    In a note closely related to the WTF today, I went through similar shenanigans converting XML output to EDI 210. The EDI 210 has a sub-field delimiter, usually a colon. It's not actually used anywhere in this particular variation of the 210, but it has to be defined. So NONE of the fields can contain a colon, even things that are data passed through from imported EDI 211 files (which apparently doesn't disallow colons). So anytime someone using a third-party system enters a colon into a text field, that colon eventually screws up the invoicing process.

    Captcha: nulla. EDI's benefit to humanity adds up to nulla.

  • seriously? (unregistered) in reply to frits
    frits:
    Fr3nchie:
    ParkinT:
    Why not replace all ampersands with the three characters AND ?!
    If they could have done that, they also could have replaced all ampersands with the five characters "&amp;".
    That's cool. But what about the infinite recursion problem when replacing the leading character in "&amp;" with "&amp;"? Did you think about that?

    you do only one pass, there is no infinite recursion problem here.
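
A small Python sketch of the single-pass point, assuming a plain string replacement is all that is being done. The `escape_ampersands` helper is a hypothetical name; `str.replace` scans the input once and does not rescan the text it has just inserted, so there is nothing to recurse on.

```python
def escape_ampersands(text: str) -> str:
    """Escape every ampersand in one left-to-right pass."""
    return text.replace("&", "&amp;")

once = escape_ampersands("AT&T & Co")
print(once)

# The only way to get a runaway result is to run the whole pass again
# on text that has already been escaped.
twice = escape_ampersands(once)
print(twice)
```

The double-escaping in the second call is the practical hazard: the pass itself terminates, but it must be applied exactly once to raw text.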
