• eMBee (unregistered) in reply to The Bytemaster
    The Bytemaster:
    [...] they did not understand what valid XML was.

    I finally told them to save the contents as a .xml file, double click it so that their internet explorer would open and try to parse it. If it showed an error, we would reject it.

    Two days later I received a valid XML file.

    there is nothing like helping someone by giving them proper tools to work with.

    greetings, eMBee.

  • Omego2K (unregistered) in reply to Nagesh
    Nagesh:
    Omego2K:
    Wow, we have the same issue with our vendor. In fact my boss used it as an interview question to see what the applicants' thought process in resolving it is.

    Is anyone suggest change name of the company?

    It was actually in the description field. Not sure what was suggested during the interviews.

  • (cs) in reply to Demo
    Demo:
    How about:
    • U+FE60 ﹠ small ampersand
    • U+FF06 ＆ fullwidth ampersand
    • U+214B ⅋ inverted ampersand

    Trust me: If they can't do "&", they can't do unicode either.

  • Jay (unregistered) in reply to frits
    frits:
    Fr3nchie:
    ParkinT:
    Why not replace all ampersands with the three characters AND ?!
    If they could have done that, they also could have replaced all ampersands with the five characters "&amp;".
    That's cool. But what about the infinite recursion problem when replacing the leading character in "&amp;" with "&amp;"? Did you think about that?

    I believe that problem can be solved with the recent advanced IT technique called, "Not calling the encode function again on the string that you just encoded." It's listed in many software development textbooks under the heading, "Foot, shooting yourself in the, not".

    There are related techniques called "Updating the variable you intended to update rather than another one with a similar name" and "Using the plus sign when you want to add two numbers instead of using the ampersand".
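For anyone who wants to see the "encode once" rule in action, here is a minimal sketch in Python. The company name is made up for illustration; `escape` and `unescape` come from the standard library's `xml.sax.saxutils`. It also shows why there is no infinite recursion: a plain string replace makes a single pass and never rescans its own output.

```python
from xml.sax.saxutils import escape, unescape

raw = "Smith & Sons"      # hypothetical company name, not from the article

once = escape(raw)        # encode exactly once, at the point where XML is written
twice = escape(once)      # the foot-shooting case: encoding an already encoded string

# A plain replace is a single left-to-right pass; the "&" inside the
# freshly inserted "&amp;" is not visited again, so nothing recurses.
by_hand = raw.replace("&", "&amp;")

print(once)    # Smith &amp; Sons
print(twice)   # Smith &amp;amp; Sons
print(unescape(once) == raw)
```

The double-encoded form is still well-formed XML, which is what makes the bug easy to miss: the parser is happy, and the reader sees a literal "&amp;" in the company name.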

  • (cs) in reply to dkf
    dkf:
    Kuba:
    A reasonable XML parser would be implementable in an automaton that can be designed completely by hand on a couple pages of 11x17 or A3 paper, at most.
    XML is surprisingly difficult to parse correctly, at least if you're going to support all of it. If you don't comprehend just how much mischief you can get into with DTDs and external entities, you should count yourself lucky and leave parsing XML to a specialist library that Gets It Right. Heck, that's also the lazy thing to do.
    When you generate an app-specific XML parser, it makes sense to embed the DTD in the parser itself. External entities are probably a non-issue if all you deal with is a stand-alone XML file. Sure you're leaving stuff out, but those are sane things that you can agree to between vendors. You simply agree to exactly what clauses of the specs are left out of the subset. It'd be obviously and at first sight braindead if you tried to enforce particular white-space sensitivity (or ignored entities entirely).
  • Pedantry are us (unregistered) in reply to eMBee
    The Bytemaster:
    [...] they did not understand what valid XML was.

    I finally told them to save the contents as a .xml file, double click it so that their internet explorer would open and try to parse it. If it showed an error, we would reject it.

    Two days later I received a valid XML file.

    You mean "Well Formed".

    Internet Explorer can't tell if it's valid unless you've got a DTD or schema to validate against. I would be stunned if they had that, as very few users of XML bother with those.

    • Certainly the data sources I process have no DTD or schema, they just have a long Word document describing what everything really means (with diagrams) and a few example files ("use all features" and "use minimum of features").

    Which is fine by me, as DTD/schema only contain a structure and no "meaning", while the Word document and "maximal example" files defines both structure and meaning.

    My parsers for the data file from them have the effective "schema" hard-coded (based on a streaming XML parser. DOM is too slow for this, and SAX is just awful.)
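The well-formed versus valid distinction is easy to demonstrate. The sketch below assumes Python and its standard `xml.etree.ElementTree` module, which is a non-validating parser: it can tell you whether a document is well-formed, but it does not check anything against a DTD or schema. The sample documents and the `is_well_formed` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML; says nothing about validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical vendor payloads: the second has a bare ampersand.
good = "<vendor><name>Smith &amp; Sons</name></vendor>"
bad = "<vendor><name>Smith & Sons</name></vendor>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This is roughly the check the "open it in the browser" trick performs: a pass only means the parser did not choke, not that the structure matches what the receiving side expects.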

  • undefined (unregistered) in reply to DonaldK
    DonaldK:
    Wait until they get other special characters like "ë" etc... they also sometimes throw off home-grown (and M$, I mean M& developed) XML parsers.

    "ё" is not a special character, it's a regular Cyrillic letter. It needs to be escaped only in non-Unicode XML. And if you use non-Unicode XML, then that's the real WTF.

  • Gibbon1 (unregistered) in reply to Omego2K
    Omego2K:
    Wow, we have the same issue with our vendor. In fact my boss used it as an interview question to see what the applicants' thought process in resolving it is.

    I'd get the aspie neckbread down the hall to write a pearl script to deal with the unescaped ampersands.

    Do I get the job?

  • tego (unregistered) in reply to Smitt-Tay
    Smitt-Tay:
    The real WTF is XML.

    The whole 'self-describing' thing is stupid, unnecessary, and nearly impossible to implement, so, avoid the hassle. Write data protocols which match your data, don't squeeze your data into a generic protocol.

    Almost. XML does have its uses and it's good for some things. But, like most things, it is not a panacea.

    The real WTF is being so f...ing stupid as to use XML for everything, everywhere, regardless of its applicability -- just because XML is so cool and everyone else uses it. Thus if you use XML, you show your customers what a knowledgeable professional you are.

    It doesn't matter whether XML is well-suited or whether your customers who may not have an understanding of the non-obvious rules run into problems like the one mentioned here. Clearly, if someone has a & character in their company name, and they don't realize that the underlying encapsulation chokes on that, they must be total idiots.

    A similar issue comes from the human-readable property of XML. In many cases it is not necessary or even desirable that data be human-readable and editable (e.g. config files). Nevertheless, XML is still used, because it's there and hey, it's cool. It then comes as a total surprise when end-users actually edit data (some people feel COMPELLED to do that, because it's XML) and programs exhibit unexplainable behaviour.

  • (cs) in reply to Pedantry are us
    Pedantry are us:
    [...] My parsers for the data file from them have the effective "schema" hard-coded (based on a streaming XML parser. DOM is too slow for this, and SAX is just awful.)

    Would you please show us that code? May I suggest using the "Submit your wtf" button above?

  • Shinobu (unregistered) in reply to harperska
    harperska:
    "]]]]><![CDATA[>"
    What pre-cambrian horror is that?!
    Dotan Cohen:
    There do exist sentences with the period on the left. Most of mine are like that.
    So you're from D'ni? Pull the other one.
    Cassidy:
    XML is simply a data format - it is never a WTF.
    XML tries to be two things: human readable and machine readable. In an attempt to tackle this, it uses a swathe of meta-characters and a convoluted file format. As it happens, for most non-trivial datasets XML isn't human-readable. Finding your way in a big XML file is a constant hunt for corresponding tags and trying to guess your nesting level. And in the cases where it is human-readable, it thus encourages human XML writing, which is usually disastrous due to the aforementioned meta-characters and special cases. Like mines, waiting for someone to step on them. And because of aforementioned meta-characters and lovecraftian syntax, it is also tremendously hard for computers to read. (If you don't believe me, you haven't been here long enough. Or you could read up on the number of security vulnerabilities found in standard XML parsing libraries.) It's hard to the point that if you don't have aforementioned library, you might as well give up. And if you do have one, you'll discover that the code necessary to interface between the library and the rest of the project is a bug-farm in itself, especially in the hands of less experienced programmers. That is why XML is a WTF. XML is not just a data format - it's a data format so bad, it has to be an April Fool's joke spun out of control.
    Brendan:
    XML: A technology designed to be almost as readable as binary data to humans, while also being almost as readable as plain English to computers; whose primary purpose is to reduce the number of incompetent programmers on unemployment benefits.
    QFT
  • fool (unregistered)

    Yes, actually, there is something you could replace ampersands with "on your side". Please replace all ampersands with &amp; Thank you.

  • RN (unregistered) in reply to harperska

    That is sick and ingenious at the same time. Could XML be any uglier?

  • Luiz Felipe (unregistered) in reply to harperska
    harperska:
    Yes, the sequence ]]> is disallowed inside a CDATA section, but there is an official way to escape it. You replace the string "]]>" with the string "]]]]><![CDATA[>". This works because there is no prohibition on consecutive CDATA sections, and the escape sequence effectively breaks the CDATA section into two while splitting the illegal sequence across the boundary between the two.

    Whooot! Careful with this. It's dangerous.
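The CDATA trick quoted above can be sketched in a few lines of Python. The `to_cdata` helper name and the sample payload are assumptions for illustration; the replacement string is the one harperska gave. Parsing the result with the standard `xml.etree.ElementTree` shows that the two adjacent CDATA sections come back as the original text.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical payload containing both the forbidden "]]>" and a bare "&".
payload = "if (a[b[0]]> c) { x = y & z; }"
doc = "<code>" + to_cdata(payload) + "</code>"

print(doc)
print(ET.fromstring(doc).text == payload)
```

The first section ends right after "]]" and the second one starts with ">", so the forbidden three-character sequence never appears inside a single section.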

  • Luiz Felipe (unregistered) in reply to coyo
    coyo:
    Manpower-wise, technological development must have been held back 2-5 years due to XML not even counting the bandwidth bloat.

    mtom is what xml encoding should have been.

    http://docs.oracle.com/cd/E15523_01/web.1111/e15184/mtom.htm

    no "human readable" bullshit and bloat, it's like binary xml.

    i use it, because i develop both client and server programs (not same language).

  • Luiz Felipe (unregistered) in reply to Zhimp
    Zhimp:
    Anon:
    What is M$? Is it some dumb as fuck way to write MS or MicroSoft that retards use?
    all the people who think Linux is great because it's free and no one has any right to make money, so Microsoft must be a bunch of greedy somethings write M$, Micro$oft and Windoze

    Frankly, I'm obviously not enough of geek to appreciate the deep level of humour here....

    Fools, open source only exists because hardware manufacturers like IBM want software to be a commodity, but Microsoft has turned hardware into a commodity. Live with it. When all software is free, you will need to work in hardware, and everyone knows the price of hardware only drops, because software is a complement of hardware. We are really screwed. Thanks, Microsoft, for stopping this nonsense. Before long, we (developers) won't be engineers anymore; we will be artists who need donations to be sustained, like painters.

  • Ram Ed (unregistered)

    TRWTF is the writer's inability to convince.

  • Roger (unregistered) in reply to Jim
    Jim:
    What the guy should have done is fixed the XML Parser to handle the invalid state.

    Future proof against idiot vendors.

    Not sure if this was a joke, but I had to do that. So many vendors connecting to the system could not master sending valid XML that a cleanup process was added to the stream that would sanitize the input before parsing the "XML".

    Let's not even get started on the problems when the vendors should implement valid HTTP cache headers.

  • (cs) in reply to RN
    RN:
    That is sick and ingenious at the same time. Could XML be any uglier?

    Yep. Thanks to a kind offer from the W3C. The advantage to it is that it gets rid of the "bloat" that seems to be the chief complaint of most XML critics.

    (inb4 Akismet says out)

  • a highly-placed source (unregistered) in reply to Matt Westwood
    Matt Westwood:
    Earp:
    So was my brother, are you saying I should not say someone is retarded?

    What about if we called them a bunch of stupids? isn't that offensive to stupid people? Or perhaps called them a bunch of wankers? Isn't that offensive to masturbators?

    You have every right to be offended, however, don't expect other people to give a damn about you being so.

    Calling a person with intellectual disabilities a retard is cruel. Doing the same to your mate when he does something dumb, is not. It does NOT mean that you think of all intellectually handicapped people as retards, or that you would ever call one of them that.

    When I was a teenager in the UK, the usual insult on those lines was "spastic": "Oh good grief, Ponsonby-Smythe, you're such a spastic." Frequently abbreviated to "spazz" or "spacko". Lovely.

    I think the more accurate description of Ponsonby-Smythe is that he is a Joey.

    Any biters?

  • F**kingMike (unregistered)

    Many comments seem to flirt with sarcasm and sound like jokes, but too many suggestions sound so serious that it makes me feel there is way too much inspiration for all those WTF articles! :-)

  • gabs (unregistered)

    why not just run a regex with this pattern: &(?![\w]*;)

    would find the &s in bold: & & &hello daily & wtf daily & wtf this < > &; & abc &abc

  • Paul Neumann (unregistered) in reply to gabs
    gabs:
    why not just run a regex with this pattern: &(?![\w]*;)
    But fails for &this;

    &this; is an invalid entity and should therefore be failed.

    Code review: pass!

  • gabs (unregistered) in reply to Paul Neumann

    Then I'd be more concerned about the guy who writes hard-coded invalid ampersand entities, but I think that this would do the job in this case.
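To make the regex discussion concrete, here is a minimal sketch in Python using the pattern exactly as gabs posted it. The function names, the sample strings, and the stricter second pattern are assumptions added for illustration, not something proposed in the thread.

```python
import re

# Pattern from the thread: an "&" that is NOT followed by word characters and ";".
BARE_AMP = re.compile(r"&(?![\w]*;)")

# Hypothetical stricter variant: only named, decimal, or hex references count
# as "already escaped", so "&#38;" is left alone and "&;" gets fixed.
STRICT_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def escape_bare_ampersands(text: str) -> str:
    """Escape ampersands that do not look like the start of an entity."""
    return BARE_AMP.sub("&amp;", text)

def escape_bare_ampersands_strict(text: str) -> str:
    """Same idea with the stricter (assumed) definition of a reference."""
    return STRICT_BARE_AMP.sub("&amp;", text)

print(escape_bare_ampersands("AT&T &amp; Smith & Sons"))
print(escape_bare_ampersands("&this;"))   # left untouched, as Paul Neumann noted
print(escape_bare_ampersands("&#38;"))    # numeric reference gets escaped again
print(escape_bare_ampersands_strict("&#38;"))
```

Neither pattern can know whether "&this;" is a declared entity, so anything that merely looks like a reference is passed through and left for the parser to reject. The posted pattern also treats "#" as a non-word character, so numeric references such as "&#38;" end up double-escaped.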

  • psc (unregistered)

    So... what do you do if there's an actual dollar-sign in the company name?

  • psc (unregistered) in reply to psc
    psc:
    So... what do you do if there's an actual dollar-sign in the company name?

    Nvm, asked before...

  • Okay then? (unregistered) in reply to ThomasX
    ThomasX:
    Uhh :
    ...is there some reason they couldn't just escape the XML before parsing it? Aside from the fix-my-problem-for-me-i'm-a-god-programmer-that's-better-than-you arrogance? They should be verifying the data before parsing it anyway, you'd have to have no sense of security at all to just blindly process an external data source without a thought for vulnerabilities, even if it's from a trusted source. XML is a terrible format anyway, why any vendor would insist on using it in 2012 is just baffling 0_o Unless its a 2001-vintage application.
    Are you really that stupid?
    So it >is< the arrogance?

Leave a comment on “The XML Escape”
