• (cs) in reply to A Real Fart Smeller
    A Real Fart Smeller:
    Jaime:
    Our EDI team did exactly this to us a few years ago. We negotiated XML as an interchange format and in testing we started getting files with unescaped ampersands.

    Also, when we sent them data, they choked on all of our data. The hand-built test files had each element on a separate line, and our chosen parser didn't add line breaks between elements. We were blamed for "changing the format".

    I sent them a link to the XML spec and they responded "we can't do all that". To this day, we exchange pseudo-XML with a pre-processor on our end.

    Don't blame the "EDI team", blame BizTalk, which is probably what they're using to convert your pseudo-XML into their crusty-ass format.

    I blame the EDI Team for accepting XML as an interchange format when they knew they couldn't support it. If they had said Fixed Field, CSV, X12, or pretty much anything else, I would have simply done it. BTW, they're using Cyclone.

  • (cs) in reply to PedanticCurmudgeon
    PedanticCurmudgeon:
    C-Octothorpe:
    Bob:
    C-Octothorpe:
    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!
    Please show some sensitivity. I had a son who was a sociopath and... oh, never mind. Apparently, this meme will never die.
    Please show some sensitivity. I had a son who would never die, and once the Twilight series comes to an end he'll be unemployable for the rest of eternity.
  • (cs) in reply to Bob
    Bob:
    C-Octothorpe:
    no laughing matter:
    boog:
    Bob:
    Please attempt some sensitivity: I had a son who was retarded...
    Did he get better?
    I am become Death, destroyer of worlds...

    Or maybe he just died!

    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?

    Who gives a fuck? Has he fixed his fucking fence yet?

  • (cs) in reply to PedanticCurmudgeon
    PedanticCurmudgeon:
    C-Octothorpe:
    Bob:
    C-Octothorpe:
    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!
    Please show some sensitivity. I had a son who was a sociopath and... oh, never mind. Apparently, this meme will never die.

    That's because it never fails to be funny.

    Okay, it's not the meme itself which is funny so much as the reaction of all the people complaining about it. Warms the heart of my cockles.

  • (cs) in reply to trtrwtf
    trtrwtf:
    TheCPUWizard:
    Bob:
    What part of "attempt some sensitivity" don't you understand?

    Bob, my condolences. Unfortunately many members of this group seem proud to display complete insensitivity even on serious matters. As someone who has dealt with similar issues (although not family members) in various situations [I used to be very involved with an AHRC youth group among other things], I have seen firsthand (and too many times) a complete lack of understanding by so many.

    Yeah, we're a bunch of retards, aren't we?

    Speak for yourself, retard.

  • (cs) in reply to Demo
    How about:
    • U+FE60 ﹠​ small ampersand
    • U+FF06 ＆ fullwidth ampersand
    That's mighty optimistic of you to think the tool that couldn't emit proper XML and couldn't substitute one character with more than one replacement would be able to emit extra-ASCII characters. =)
  • Bert Glanstron (unregistered)

    Dear vendor,

    In case you can’t tell, this is a grown-up place. The fact that you insist on using your invalid XML characters clearly shows that you’re too young and too stupid to be doing logistical analysis.

    Go away and grow up.

    Sincerely, Bert Glanstron

  • me (unregistered) in reply to PedanticCurmudgeon
    PedanticCurmudgeon:
    C-Octothorpe:
    Bob:
    C-Octothorpe:
    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!
    Please show some sensitivity. I had a son who was a sociopath and... oh, never mind. Apparently, this meme will never die.
    Please show some sensitivity. I was a meme once, then I took an arrow to the knee.
  • coyo (unregistered)

    Manpower-wise, technological development must have been held back 2-5 years due to XML not even counting the bandwidth bloat.

  • Jim (unregistered) in reply to renewest
    renewest:
    The only proper way out of this mess is to change the name of the customer.
    That's it, kill Dad - then you can rename "Brandon's Sons" oh wait, now the apostrophe's a problem, right?
  • Mick (unregistered) in reply to Smitt-Tay
    Smitt-Tay:
    The real WTF is XML.

    The whole 'self-describing' thing is stupid, unnecessary, and nearly impossible to implement, so, avoid the hassle. Write data protocols which match your data, don't squeeze your data into a generic protocol.

    Although I think you're a troll, I actually tend to agree. We seem to obsess over generic solutions to things to the point where we have significantly increased development effort to keep things generic. I know people like to believe that the awesome app they make today will survive forever, but here's a newsflash: It won't, no matter how you try to future proof it. Changes in technology mean that our jobs will always exist as we constantly have to replace existing software to suit new hardware (how many people 10 years ago could see that there would be a reasonably large shift toward mobile devices? who knows what the next big move will be?), or to support business functions that no one ever thought would be needed in an application. Despite the best future-proofing efforts now, it is amazing how, come the time to expand (potentially years down the track), reconfigure, redesign, improve, (whatever), the future-proofing rarely seems to have worked the way it was intended, and the effort to add new functionality is not really any less than it would have been had such standards not been followed.
  • Nagesh (unregistered) in reply to The Bytemaster
    The Bytemaster:
    ... I finally told them to save the contents as a .xml file, double click it so that their internet explorer would open and try to parse it. If it showed an error, we would reject it. ...
    Here in Hyderabad, you are having convenient to aces Anil who having side-busines in XML validator. [image]
  • Boogie Boo (unregistered) in reply to Ken B.
    Ken B.:
    cox:
    Ken B.:
    ... an XKCD reference ...
    Joe:
    And yes, we've all seen the XKCD reference, no need to repeat it.
    Check the timestamps. His "no need to repeat it" hadn't yet been posted when I started to type mine.
    I certainly enjoyed it. What's nicer than being flooded with the same XKCD references daily? Watching people get upset about it. Please, PLEASE....keep them coming!!
  • (cs)
    Why is dumbass fake-nagesh posting photo of Kolkata and trying to palm it off as Hyderabad?

    Fake-nagesh big geogprahy fail.

  • Nagesh (unregistered)

    here is my profesor in office at Hyderabad unitversiry. he teche that XML is advance understanding only to be lerned at ms degree or above.

    [image]
  • MArtijn (unregistered) in reply to DonaldK

    but then again ä or ë is perfectly valid unescaped in an UTF-8 encoded XML document
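    A quick way to check this is to feed a UTF-8 document with bare umlauts to a parser. This is only a sketch using Python's standard library; the company name "Müller & Söhne" is made up for the example. The umlauts pass through untouched, while the ampersand still has to be written as an entity.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: umlauts appear literally, the ampersand is escaped.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<customer_name>Müller &amp; Söhne</customer_name>"
).encode("utf-8")

name = ET.fromstring(doc).text
assert name == "Müller & Söhne"
```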

  • Joe (unregistered)

    Question: Why is it that everyone is surprised at how badly people deal with XML, but don't seem to realise that people are equally bad at (the simpler) HTML?

    BTW it seems on StackOverflow there's a few chappies who parse XML using regex (mentioned StackOverflow, XML and regex that should get the wars started as sure as my name's not Joe).

  • (cs) in reply to Jaime
    Jaime:
    A Real Fart Smeller:
    Don't blame the "EDI team", blame BizTalk, which is probably what they're using to convert your pseudo-XML into their crusty-ass format.
    I blame the EDI Team for accepting XML as an interchange format when they knew they couldn't support it. If they had said Fixed Field, CSV, X12, or pretty much anything else, I would have simply done it. BTW, they're using Cyclone.
    WTF is wrong with so-called "developers" who can't properly implement a parser. I mean, come on, it's in every intro compiler text. A reasonable XML parser would be implementable in an automaton that can be designed completely by hand on a couple pages of 11x17 or A3 paper, at most. It's not magic, even if you pretend that there are no tools available to produce the automaton's state transition tables for you (in reality, there's plenty of them). You should be able to make a nicely working prototype over a weekend -- by "nicely working" I mean something that should mostly pass a test suite or two.

    If you use any half-sane scripting language, even something from a weirdo system, you can still do state machines in it. Heck, I've done a state-machine (automaton) based parser in ANSYS scripting -- that's a horrible scripting language used by a dinosaur finite element modelling package. To give you an idea of how bad it is: originally, it was a FORTRAN package with only a command-line interface using FORTRAN IO syntax. Lived on mainframes. Then they tacked on a GUI (what's an undo?), and the scripting grew apparently "organically", with no single mind giving its growth any sense of direction. I'm sure that in spite of whatever monstrosity that Cyclone is, it can't be worse to implement parsers in it than in ANSYS.

    PS. Yes, it was for XML-driven FEM testing. And it worked just fine.

  • (cs) in reply to Nagesh
    Nagesh:
    here is my profesor in office at Hyderabad unitversiry. he teche that XML is advance understanding only to be lerned at ms degree or above.

    [image]

    That's a very depressing picture.

    "Books, books everywhere - nor any words to read."

  • usoer (unregistered) in reply to Matt Westwood
    Matt Westwood:
    Nagesh:
    here is my profesor in office at Hyderabad unitversiry. he teche that XML is advance understanding only to be lerned at ms degree or above.

    [image]

    That's a very depressing picture.

    "Books, books everywhere - nor any words to read."

    I want to be a probationary Officer

  • (cs) in reply to Jim
    Jim:
    renewest:
    The only proper way out of this mess is to change the name of the customer.
    That's it, kill Dad - then you can rename "Brandon's Sons" oh wait, now the apostrophe's a problem, right?
    Apostrophe's never a problem. Replace it with ` and you don't even have to convert it back on the flip side.

    Or you change it to double quotes like the closed captions on The Big Bang Theory do.

  • Sigh - (unregistered)

    TRWTF is that I've got a customer who really asked for that, damn!

  • Uhh (unregistered)

    ...is there some reason they couldn't just escape the XML before parsing it? Aside from the fix-my-problem-for-me-i'm-a-god-programmer-that's-better-than-you arrogance? They should be verifying the data before parsing it anyway, you'd have to have no sense of security at all to just blindly process an external data source without a thought for vulnerabilities, even if it's from a trusted source. XML is a terrible format anyway, why any vendor would insist on using it in 2012 is just baffling 0_o Unless it's a 2001-vintage application.

  • Earp (unregistered) in reply to Bob

    So was my brother, are you saying I should not say someone is retarded?

    What about if we called them a bunch of stupids? isn't that offensive to stupid people? Or perhaps called them a bunch of wankers? Isn't that offensive to masturbators?

    You have every right to be offended, however, don't expect other people to give a damn about you being so.

    Calling a person with intellectual disabilities a retard is cruel. Doing the same to your mate when he does something dumb, is not. It does NOT mean that you think of all intellectually handicapped people as retards, or that you would ever call one of them that.

  • usoer (unregistered) in reply to Uhh
    Uhh:
    ...is there some reason they couldn't just escape the XML before parsing it? Aside from the fix-my-problem-for-me-i'm-a-god-programmer-that's-better-than-you arrogance? They should be verifying the data before parsing it anyway, you'd have to have no sense of security at all to just blindly process an external data source without a thought for vulnerabilities, even if it's from a trusted source. XML is a terrible format anyway, why any vendor would insist on using it in 2012 is just baffling 0_o Unless it's a 2001-vintage application.
    uhm.....
  • Anon (unregistered) in reply to DonaldK

    What is M$? Is it some dumb as fuck way to write MS or MicroSoft that retards use?

  • Zhimp (unregistered) in reply to Anon
    Anon:
    What is M$? Is it some dumb as fuck way to write MS or MicroSoft that retards use?
    all the people who think Linux is great because it's free and no one has any right to make money, so Microsoft must be a bunch of greedy somethings write M$, Micro$oft and Windoze

    Frankly, I'm obviously not enough of a geek to appreciate the deep level of humour here....

  • (cs) in reply to Smitt-Tay
    Smitt-Tay:
    The real WTF is XML.

    The whole 'self-describing' thing is stupid, unnecessary, and nearly impossible to implement, so, avoid the hassle. Write data protocols which match your data, don't squeeze your data into a generic protocol.

    The problem with that idea in this instance is that it means you'll be developing to a protocol written by a company whose own developers don't know how to do string substitutions.
  • (cs) in reply to Cbuttius
    Cbuttius:
    Change the schema to use an attribute instead of an element node.

    You do not need to escape the ampersands in attribute values.

    I have a couple of lines in the XML specification that would disagree with you.
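    For anyone who wants to see those lines enforced, here is a small check with Python's standard-library parser. The element and attribute names are made up for illustration. A literal ampersand in an attribute value is rejected just like one in element content.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Hypothetical schema with the name moved into an attribute.
raw = '<customer name="Brandon & Sons"/>'
escaped = '<customer name="Brandon &amp; Sons"/>'

print(is_well_formed(raw))      # False: bare "&" is not allowed here either
print(is_well_formed(escaped))  # True
print(ET.fromstring(escaped).get("name"))  # Brandon & Sons
```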
  • Nick (unregistered) in reply to Blue Leader
    Blue Leader:
    Sweet. I'm naming my next company <FooCo/> -- I wonder how many XML apps will choke on that.
    Great. Mine's going to be called "NickCo'); DROP TABLE Invoices;--" With a little luck and some bad programming, I'll never have to pay a dime.
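    The "bad programming" this relies on is SQL assembled by string concatenation. As a sketch (using SQLite from Python's standard library; the column names are assumptions, only the table name comes from the joke), a bound parameter stores the whole company name as plain data and the table survives:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; only the table name "Invoices" is from the comment above.
conn.execute("CREATE TABLE Invoices (company TEXT, amount REAL)")

name = "NickCo'); DROP TABLE Invoices;--"

# The "?" placeholders pass the values separately from the SQL text,
# so the quote and semicolon in the name are never interpreted as SQL.
conn.execute("INSERT INTO Invoices (company, amount) VALUES (?, ?)", (name, 10.0))

rows = conn.execute("SELECT company FROM Invoices").fetchall()
assert rows == [(name,)]
```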
  • (cs) in reply to ParkinT

    I got burned on this one also. I don't remember seeing any restriction in the XML specs I have read, but using the Java DOM Object reader it barfs, apparently thinking every XML file is a web document. My solution? The XML file writer replaces "&" with "&amp;" and "<" with "&lt;" (in that order); the XML file processor does the reverse (in the reverse order). Now I can pass an ASCII table 0x20 - 0x7E through the data channel.
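    The ordering matters, and a short sketch makes it visible. This is Python rather than the Java the comment mentions, and the function names are made up; it handles only element text, not attribute values. Escaping does the ampersand first so the "&" it introduces for "<" is not escaped a second time, and unescaping does the ampersand last for the same reason.

```python
def escape_text(s: str) -> str:
    # Ampersand first, otherwise the "&" in "&lt;" would be escaped again.
    return s.replace("&", "&amp;").replace("<", "&lt;")

def unescape_text(s: str) -> str:
    # Reverse order: "&lt;" first, "&amp;" last.
    return s.replace("&lt;", "<").replace("&amp;", "&")

# Round-trip the printable ASCII range 0x20 - 0x7E mentioned above.
printable = "".join(chr(c) for c in range(0x20, 0x7F))
assert unescape_text(escape_text(printable)) == printable

# Text that already looks like an entity also survives the round trip.
assert unescape_text(escape_text("&lt;")) == "&lt;"

print(escape_text("Brandon & Sons <Ltd>"))  # Brandon &amp; Sons &lt;Ltd>
```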

  • (cs) in reply to Earp
    Earp:
    So was my brother, are you saying I should not say someone is retarded?

    What about if we called them a bunch of stupids? isn't that offensive to stupid people? Or perhaps called them a bunch of wankers? Isn't that offensive to masturbators?

    You have every right to be offended, however, don't expect other people to give a damn about you being so.

    Calling a person with intellectual disabilities a retard is cruel. Doing the same to your mate when he does something dumb, is not. It does NOT mean that you think of all intellectually handicapped people as retards, or that you would ever call one of them that.

    When I was a teenager in the UK, the usual insult on those lines was "spastic": "Oh good grief, Ponsonby-Smythe, you're such a spastic." Frequently abbreviated to "spazz" or "spacko". Lovely.

  • box (unregistered) in reply to Boogie Boo
    Boogie Boo:
    Ken B.:
    cox:
    Ken B.:
    ... an XKCD reference ...
    Joe:
    And yes, we've all seen the XKCD reference, no need to repeat it.
    Check the timestamps. His "no need to repeat it" hadn't yet been posted when I started to type mine.
    I certainly enjoyed it. What's nicer than being flooded with the same XKCD references daily? Watching people get upset about it. Please, PLEASE....keep them coming!!

    http://xkcd.com/386/

  • (cs) in reply to da Doctah
    da Doctah:
    PedanticCurmudgeon:
    C-Octothorpe:
    Bob:
    C-Octothorpe:
    Nope, still retarded...
    What part of "attempt some sensitivity" don't you understand?
    The "sensitivity" part... It's not my fault I'm a sociopath!
    Please show some sensitivity. I had a son who was a sociopath and... oh, never mind. Apparently, this meme will never die.
    Please show some sensitivity. I had a son who would never die, and once the Twilight series comes to an end he'll be unemployable for the rest of eternity.
    Please show some sensitivity. I have a wife who likes Twilight, and compared to all those extremely good-looking and well-built strapping young men, I'm not much of a comparison.
  • Planar (unregistered) in reply to philz
    Jaime:
    Plus you cannot use shorthands like <field/> for an empty field.

    Why would you ever want to do that in an XML generator? Do you enjoy writing redundant code?

  • +9 (unregistered) in reply to BentFranklin
    BentFranklin:
    Instead of replacing the ampersand, try replacing the vendor.
    Better try replacing some Brillant IT-guys at vendor side.
  • (cs) in reply to Kuba
    Kuba:
    A reasonable XML parser would be implementable in an automaton that can be designed completely by hand on a couple pages of 11x17 or A3 paper, at most.
    XML is surprisingly difficult to parse correctly, at least if you're going to support all of it. If you don't comprehend just how much mischief you can get into with DTDs and external entities, you should count yourself lucky and leave parsing XML to a specialist library that Gets It Right. Heck, that's also the lazy thing to do.

    Generating XML is simpler (since you don't ever spit out the nasty parts if you don't actually need their exotic bizarreness) but you've got to deal with quoting certain key characters in string content and attribute values. Write it correctly once and reuse thereafter.
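    As a sketch of "write it correctly once", Python's standard library already ships the two quoting helpers: `escape` for string content and `quoteattr` for attribute values (it also picks the surrounding quote character). The `customer_element` function and its element and attribute names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

def customer_element(name: str, note: str) -> str:
    # quoteattr() returns the value already wrapped in quotes;
    # escape() handles &, < and > in element content.
    return f"<customer name={quoteattr(name)}>{escape(note)}</customer>"

doc = customer_element('Brandon & "Sons"', "a < b && c")

# Parsing the generated text gives back exactly what went in.
elem = ET.fromstring(doc)
assert elem.get("name") == 'Brandon & "Sons"'
assert elem.text == "a < b && c"
```

    Every writer in the code base can then go through that one function instead of concatenating strings by hand.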

  • ThomasX (unregistered) in reply to Uhh
    Uhh :
    ...is there some reason they couldn't just escape the XML before parsing it? Aside from the fix-my-problem-for-me-i'm-a-god-programmer-that's-better-than-you arrogance? They should be verifying the data before parsing it anyway, you'd have to have no sense of security at all to just blindly process an external data source without a thought for vulnerabilities, even if it's from a trusted source. XML is a terrible format anyway, why any vendor would insist on using it in 2012 is just baffling 0_o Unless it's a 2001-vintage application.
    Are you really that stupid?
  • TheJonB (unregistered)

    "When I reported the problem to them, they responded rather curiously: technically, we are sending back valid XML as the ampersand does not necessarily need to be escaped."

    Technically they're right, it depends on the DTD.

  • (cs) in reply to TheJonB
    TheJonB:
    "When I reported the problem to them, they responded rather curiously: technically, we are sending back valid XML as the ampersand does not necessarily need to be escaped."

    Technically they're right, it depends on the DTD.

    No, ampersand is the escape character for entities and so it always needs to be escaped itself. The DTD can't change that (unlike with SGML, which is much more complex).

  • (cs) in reply to TheJonB
    TheJonB:
    "When I reported the problem to them, they responded rather curiously: technically, we are sending back valid XML as the ampersand does not necessarily need to be escaped."

    Technically they're right, it depends on the DTD.

    You're wrong. Look up the definition of XML. These entities and the way to handle them are defined in the SGML declaration that makes up XML, not in an extra DTD. < and & need to be escaped but > and ' don't.
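    This is easy to try against a real parser. A minimal check with Python's standard library (the element name is arbitrary): in element content, a literal > and ' are accepted, while a literal < or & is not.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

assert is_well_formed("<n>it's 2 > 1</n>")          # > and ' are fine as-is
assert not is_well_formed("<n>1 < 2</n>")           # bare < is rejected
assert not is_well_formed("<n>Brandon & Sons</n>")  # bare & is rejected
assert is_well_formed("<n>Brandon &amp; Sons, 1 &lt; 2</n>")
```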
  • Brendan (unregistered)

    XML: A technology designed to be almost as readable as binary data to humans, while also being almost as readable as plain English to computers; whose primary purpose is to reduce the number of incompetent programmers on unemployment benefits.

  • Ton (unregistered) in reply to Boeboe
    Boeboe:
    We once asked a customer to properly xmlencode special characters when sending over xml. This is what we got: <customer_name>Brandon & Sons</customer_name>

    FTW!

  • Daniel Migowski (unregistered) in reply to Demo
    Demo:
    How about:
    • U+FE60 ﹠​ small ampersand
    • U+FF06 &​ fullwidth ampersand
    • U+214B ⅋​ inverted ampersand

    This is just great! However, I doubt they can send something else than Win1250 encoded characters...

  • (cs) in reply to Daniel Migowski

    And what makes you think Brandon & Sons are an Eastern European company?

  • TheJonB (unregistered) in reply to Pim
    Pim:
    TheJonB:
    "When I reported the problem to them, they responded rather curiously: technically, we are sending back valid XML as the ampersand does not necessarily need to be escaped."

    Technically they're right, it depends on the DTD.

    You're wrong. Look up the definition of XML. These entities and the way to handle them are defined in the SGML declaration that makes up XML, not in an extra DTD. < and & need to be escaped but > and ' don't.

    http://www.w3.org/TR/REC-xml/#syntax

    2.4 Character Data and Markup

    Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).]

    [Definition: All text that is not markup constitutes the character data of the document.]

    The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section.

    If the DTD says it's CDATA then ampersands needn't be escaped.
  • (cs)

    They should just use a RegEx-based approach to parse this.

  • (cs) in reply to TheJonB
    TheJonB:
    If the DTD says it's CDATA then ampersands needn't be escaped.
    You're cute, but wrong. It doesn't need to be escaped in a CDATA section, which is a construct that explicitly begins with <![CDATA[ and finishes with ]]>. You can't omit those bits in XML. (Normal HTML is not XML.)
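    To make the difference concrete, here is a small sketch with Python's standard-library parser, reusing the element name from the article's example. The ampersand is only accepted when the explicit CDATA delimiters are present in the document itself.

```python
import xml.etree.ElementTree as ET

with_cdata = "<customer_name><![CDATA[Brandon & Sons]]></customer_name>"
without_cdata = "<customer_name>Brandon & Sons</customer_name>"

# Inside an explicit CDATA section the ampersand is plain character data.
print(ET.fromstring(with_cdata).text)  # Brandon & Sons

# Without the delimiters the same text is not well-formed.
try:
    ET.fromstring(without_cdata)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```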
  • Mike (unregistered) in reply to BentFranklin
    BentFranklin:
    Instead of replacing the ampersand, try replacing the vendor.

    My first thought. If their solution was legacy last century I could understand the quick and dirty fix, but that is where that vendor should have died, in the last century.

    Captcha = ideo

  • eMBee (unregistered) in reply to harperska
    harperska:
    Yes, the sequence ]]> is disallowed inside a CDATA section, but there is an official way to escape it. You replace the string "]]>" with the string "]]]]><![CDATA[>". This works because there is no prohibition on consecutive CDATA sections, and the escape sequence effectively breaks the CDATA section into two while splitting the illegal sequence across the boundary between the two.
    nice, that was the most educational i have read in any comment here. i'll apply that for the cdata sections our code generates. thanks!

    greetings, eMBee.
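
    For reference, the splitting trick quoted above fits in one line. This is a sketch in Python; the `cdata` helper name and the sample payload are made up. It wraps text in a CDATA section and breaks any "]]>" across two consecutive sections, and a parser reads the result back as the original text.

```python
import xml.etree.ElementTree as ET

def cdata(text: str) -> str:
    # Split every "]]>" so that "]]" ends one section and ">" starts the next.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]>c && d) {}"  # contains the forbidden sequence "]]>"
doc = f"<code>{cdata(payload)}</code>"

# The two adjacent CDATA sections are read back as one piece of text.
assert ET.fromstring(doc).text == payload
```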
