• Brianary (unregistered) in reply to blindman
    blindman:
    XML is a virus that needs to be stamped out.
    So, you're volunteering to do all the work it's saved me?
  • AdT (unregistered) in reply to Brianary
    Brianary:
    blindman:
    XML is a virus that needs to be stamped out.
    So, you're volunteering to do all the work it's saved me?

    Nice shot!

  • blindman (unregistered) in reply to Brianary
    Brianary:
    blindman:
    XML is a virus that needs to be stamped out.
    So, you're volunteering to do all the work it's saved me?
    If you are willing to make up all the time it has cost me, sure.
  • Brianary (unregistered) in reply to blindman
    blindman:
    Brianary:
    blindman:
    XML is a virus that needs to be stamped out.
    So, you're volunteering to do all the work it's saved me?
    If you are willing to make up all the time it has cost me, sure.
    What's the problem?
  • Steve (unregistered)

    What happened to using a real text editor for programming tasks? emacs, vi, or even edlin, fer Pete's sake, all show you the actual text. Word processors are for writing letters and manuscripts (and even there, I'll take emacs and LaTeX any day -- better math layout).

    Now, if you'll excuse me, I need to go out and tend to my herd of brontosaurs. . .

  • AnonHacker (unregistered) in reply to FredSaw
    FredSaw:
    Bejesus:
    Dear Client

    It is with regret that we note our ironic humour did not hit the mark when we replied to your idiotic bug report with an equally idiotic response.

    I wish you luck finding a vendor willing to provide you with a system that automagically works around your obviously flawed software that is incapable of parsing correct XML.

    regards Competent People

    Dear Clueless People,

    you are overhead. I am profit. You are unwilling to edit two characters in your XML, thereby supplying a document with a consistent rather than random delimiter, in order to keep my business. In addition to your priorities, your concept of competence is strangely skewed. Good luck in the soup kitchen line.

    Former Client

    This "conversation" is an excellent example of why programmers rarely get to choose the vendors for their company

  • Victim (unregistered) in reply to Nelle

    I actually think I know who these people are.

    I have at least one of these types of conversations every week with them.

  • Shinobu (unregistered) in reply to Brianary

    The first two are .NET things, and as such irrelevant. Also, they could easily have used better formats, but Microsoft (the publisher of this "why XML rules" list) chose not to do so. The second is almost a canonical example of why XML sucks. The third is also a pretty good example of something that would have been better without XML. Perhaps using XML will pay back when XHTML becomes common, but then again, using XML for that is also a historical curiosity. The answer to the last two involves databases.

    For some people, like me, speed and compactness are important. First, we don't understand why you'd want to give up these things when the only thing you get in return is the ability to say "It uses XML!" But what I really detest is that, of all conceivable hierarchical data formats, people standardized on something as horrid as XML. Of course, I know why it happened, but I still hate it, and I will never use it in any protocol or file format of my design.

  • Brianary (unregistered) in reply to Shinobu
    Shinobu:
    The first two are .NET things, and as such irrelevant. Also, they could easily have used better formats, but Microsoft (the publisher of this "why XML rules" list) chose not to do so. The second is almost a canonical example of why XML sucks. The third is also a pretty good example of something that would have been better without XML. Perhaps using XML will pay back when XHTML becomes common, but then again, using XML for that is also a historical curiosity. The answer to the last two involves databases.

    For some people, like me, speed and compactness are important. First, we don't understand why you'd want to give up these things when the only thing you get in return is the ability to say "It uses XML!" But what I really detest is that, of all conceivable hierarchical data formats, people standardized on something as horrid as XML. Of course, I know why it happened, but I still hate it, and I will never use it in any protocol or file format of my design.

    • Irrelevant until your boss asks for them!
    • You can't just call something a "canonical example" without some support for this viewpoint.
    • Databases are a backend, not an interchange solution.

    Look, XML is not a bullet point, it's the lingua franca of the Internet. If you want someone to be able to access your data, XML is the best option available. Your sacrifice of speed and compactness is a false dichotomy, since most of the time there isn't an ideally fast, compact, binary solution available. Can you point to a competitor to XML for general-purpose data exchange? You can't just completely ignore a technology because you think there might have been a better alternative. That's just hubris.

    If you tell people they have to learn a new binary format, and your competitor offers SOAP or POX, you lose. And barring it entirely? Mention that in all of your future interviews, and let me know how that goes. Have you told your current employer that you are an anti-XML extremist (not namecalling here, just by definition)?

  • Anon (unregistered) in reply to Brianary
    Brianary:
    So you're the one I keep having to learn new pet formats because of!
    Right tool for the right job. Just because you don't want to learn how to use a screwdriver doesn't mean you should use a hammer.
    Here's an exercise: show me how to do the following without XML. * ASP.NET config files
    Choose one: Windows INI files, or Java properties files - basically the same solution, but both pre-date XML, and both have actual error recovery if you screw up a setting.
    * .NET serialization
    Wait, seriously, they use XML for serialization?! I'll say JSON, but you really want a binary solution for something like that.
    * SOAP
    REST with JSON, to add some more meaningless XTLAs (eXtended Three Letter Acronym) to the mix.
    * RSS/Atom feeds
    JSON would work.
    * transforms of hierarchical data
    JSON again.
    * transferring complex data between systems, some of which you may not control
    Who knows? Depends on the specifics. I'll go with JSON for no good reason.

    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner. It's a far better "platform agnostic" format, although it has some problems itself. (Mostly dealing with character encodings, something XML never really solved in any case.)

    The real answer to anything you bring up is "there's a better solution, but it depends on the specifics." XML is like taking a hammer to everything, because it's big and heavy and it does sort of work.
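
    To make the comparison concrete, here is a minimal Python sketch that writes one small record both ways using only the standard library. The `order` record and its field names are hypothetical, chosen purely for illustration, and putting the fields in XML attributes is just one of several possible XML layouts.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical order record, used only for illustration.
order = {"id": "42", "customer": "O'Brien & Sons"}

# JSON: the dict maps directly onto the format.
as_json = json.dumps(order)

# XML: the same fields as attributes; the library escapes '&' itself.
elem = ET.Element("order", attrib=order)
as_xml = ET.tostring(elem, encoding="unicode")

print(as_json)
print(as_xml)

# Both encodings round-trip back to the same data.
assert json.loads(as_json) == order
assert ET.fromstring(as_xml).attrib == order
```

    Either way, the point is to let a library do the escaping and parsing; which syntax reads better is the part people argue about.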

  • (cs) in reply to anonymous
    anonymous:
    Zylon:
    When the XML document is tiny and/or guaranteed to have a static structure, it's perfectly reasonable to decline involving a lumbering XML library.

    I've written XML by hand, but never parsed it that way, as our poster is obviously doing. That's stupid, and this single quote bug is an excellent example of why. There are other problems, of course: as soon as there's an encoded entity or unusual character encoding, the homebrew parser will barf. Using a homebrew parser shows a total lack of understanding of XML's design goals. Rick should either be using a standard parser or not using XML.

    Depends on the library / language... if this was .NET and you wanted to write XML to some kind of stream, it's simply a case of doing something like:

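    // Assumes "using System.Text;" and "using System.Xml;" plus an Orders collection.
    StringBuilder sb = new StringBuilder();
    using (XmlWriter xw = XmlWriter.Create(sb))
    {
        xw.WriteStartElement("orders");

        foreach (Order o in Orders)
        {
            xw.WriteStartElement("order");
            // Writes id="..." and escapes the value where needed.
            xw.WriteAttributeString("id", o.OrderId.ToString());
            xw.WriteEndElement();
        }

        xw.WriteEndElement();
    }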
    

    or just use XmlSerializer

    which is preferable to string manipulation... whatever floats ya boat really...

    On the parsing side, I once got my hands on some vendor code written in C#, where the developer decided to parse very large XML files using a combination of regular expressions and substring calls... <shudder>
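
    As a rough illustration of why the regex-and-substring approach goes wrong, this Python sketch reads the same attribute with a regular expression and with the standard library parser. The document and the `customer` attribute are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing an encoded entity.
doc = '<order id="7" customer="Smith &amp; Sons"/>'

# Homebrew approach: grab whatever sits between the quotes.
naive = re.search(r'customer="([^"]*)"', doc).group(1)

# Standard parser: entity references are decoded on the way in.
parsed = ET.fromstring(doc).get("customer")

print(naive)   # Smith &amp; Sons
print(parsed)  # Smith & Sons
```

    The regex hands back the raw markup, so every caller has to remember to decode entities by hand. The parser returns the actual value.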

  • SNF (unregistered) in reply to Heron
    Heron:
    SNF:
    Shinobu:
    About the Notepad-does-change-text issue: Here you go.

    How does that explain this?

    You obviously didn't read the MSDN article Shinobu linked to. It's quite clear.

    It is? The MSDN article states that Notepad uses IsTextUnicode(), which is just a punt. The documentation for IsTextUnicode() isn't very helpful either, but does point out that given its statistical methods "some ASCII strings can slip through", providing the somewhat pathological example "A\n\r^Z".

    Bottom line, I still don't see a reasonable explanation of how a 4-word, 16-letter sentence, all in 7-bit ASCII, with no control characters, can fail to be recognized as such.

  • (cs) in reply to Brianary
    Brianary:
    http://www.google.com/search?q=microsoft%20bets%20xml So you're the one I keep having to learn new pet formats because of! Not sure how you missed this, but nearly every major, successful corporation is using XML pretty extensively. Every (living) programming language supports XML, some natively. Here's an exercise: show me how to do the following without XML.
    * ASP.NET config files
    * .NET serialization
    * SOAP
    * RSS/Atom feeds
    * transforms of hierarchical data
    * transferring complex data between systems, some of which you may not control
    ASN.1 with a good editor - you can't break the schema, format or encoding - especially with hierarchical data and data exchange (binary too). SNMP, H.323 and LDAP are there already. They could use XML as well.
  • The Masked Director of Development (unregistered)

    Sigh. Windows "programmers". (There's an appropriate use of doublequotes!)

  • (cs) in reply to AnonHacker
    AnonHacker:
    This "conversation" is an excellent example of why programmers rarely get to choose the vendors for their company
    TRWTF(tm) is you thinking this conversation would ever take place in the real world.
  • Bejesus (unregistered) in reply to FredSaw
    FredSaw:
    Bejesus:
    Dear Client

    It is with regret that we note our ironic humour did not hit the mark when we replied to your idiotic bug report with an equally idiotic response.

    I wish you luck finding a vendor willing to provide you with a system that automagically works around your obviously flawed software that is incapable of parsing correct XML.

    regards Competent People

    Dear Clueless People,

    you are overhead. I am profit. You are unwilling to edit two characters in your XML, thereby supplying a document with a consistent rather than random delimiter, in order to keep my business. In addition to your priorities, your concept of competence is strangely skewed. Good luck in the soup kitchen line.

    Former Client

    Customers that insist a vendor break standards based interfaces to work around their shortcomings do not represent profit in the longer term.

    But the point was, WTF was that idiot doing raising any sort of defect with his vendor in the first place? Exactly what sort of answer was he expecting to such a stupid question?

  • CoyneT (unregistered) in reply to bpsm
    ... Insisting otherwise is just standardization-by-hearsay ...
    No kidding. There's a lot of that, too. I ran across some edit code in our system that insisted that email addresses containing apostrophes (john.o'[email protected]) are invalid. I thought that was odd (since RFC 2822 clearly permits this).

    Turns out the reason is that there are approximately one-bazillion web-based "email name references", regexes, edit code snips, etc., that incorrectly say this character is not legal.

    Apparently someone made an error a decade or two ago and "all" that's left are hearsay references.


    For the curious:

    The characters permitted within an email name without quoting are:

    A-Z a-z 0-9 ! # $ % & ' * + - / = ? ^ _ ` { | } ~

    (If you quote the name, you can basically use any printable character.)
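
    A minimal Python sketch of that rule, for anyone who wants to check their own edit code against it. It only covers the unquoted (dot-atom) form from RFC 2822, where runs of the permitted characters may be separated by single dots. The function name and the sample local parts are hypothetical.

```python
import re

# Characters RFC 2822 (section 3.2.4, "atext") allows in an unquoted name.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"

# dot-atom: one or more runs of atext separated by single dots.
DOT_ATOM = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")

def is_unquoted_local_part(local: str) -> bool:
    """True if `local` (the part before the @) is valid without quoting."""
    return DOT_ATOM.fullmatch(local) is not None

print(is_unquoted_local_part("john.o'malley"))  # True: apostrophe is allowed
print(is_unquoted_local_part("john..doe"))      # False: empty run between dots
print(is_unquoted_local_part("john doe"))       # False: space needs quoting
```

    This deliberately ignores quoted names, comments, and the domain part, so treat it as a check on the character list above and not as a full address validator.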

  • Mike (unregistered) in reply to Nacho
    Nacho:
    The WTF is that they wrote "away" instead of "a way".

    Nope, that's just another notepad translation error.

  • Brianary (unregistered) in reply to Anon
    Anon:
    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner.
    I'll buy it. JSON's often a good fit, though I'm not sure how you'd support namespaces and external includes in advanced applications. Now, there's the chicken-and-egg problem of market support, though JSON support is clearly growing. I don't mind learning how to use a screwdriver, I just don't need a different screwdriver for every screw. Of course the funny thing is, JSON would also take either quote type. :)
  • AdT (unregistered) in reply to Anon
    Anon:
    Here's an exercise: show me how to do the following without XML. * ASP.NET config files
    Choose one: Windows INI files

    Minimal structure, no support for encodings, no standardization effort. Probably does not support all character data supported by XML.

    Anon:
    or Java properties files

    Which is not standardized at all making Sun's implementation the only reliable one.

    Anon:
    * SOAP
    REST with JSON, to add some more meaningless XTLAs (eXtended Three Letter Acronym) to the mix.

    JSON does not handle encodings at all and does not offer support for a concept similar to namespaces, making extensibility difficult (something which is essential for web services).

    Anon:
    * RSS/Atom feeds
    JSON would work.

    Again, the lack of encoding and extensibility support makes JSON a poor contender in this area.

    Anon:
    * transforms of heirarchical data
    JSON again.

    Please elaborate on the stylesheet transformation technologies available for JSON.

    Anon:
    * transferring complex data between systems, some of which you may not control
    Who knows? Depends on the specifics. I'll go with JSON for no good reason.

    Good luck processing the Japanese and Korean files and the ones containing incompatible extensions by three different vendors.

    Anon:
    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner.

    Speaking of error prone, could you please elaborate on the validation solutions available for JSON?

    Anon:
    It's a far better "platform agnostic" format, although it has some problems itself.

    Right - it's so platform agnostic it does not even standardize any mapping from character to octet stream...

    Anon:
    Mostly dealing with character encodings, something XML never really solved in any case.

    That, my friend, is complete and utter bullshit. I'll stop short of calling it a shameless lie. XML encodings are well-supported by every major implementation. If you had trouble with the i18n, the problem was probably in front of the keyboard.

  • Franz Kafka (unregistered) in reply to Brianary
    Brianary:
    Anon:
    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner.
    I'll buy it. JSON's often a good fit, though I'm not sure how you'd support namespaces and external includes in advanced applications. Now, there's the chicken-and-egg problem of market support, though JSON support is clearly growing. I don't mind learning how to use a screwdriver, I just don't need a different screwdriver for every screw. Of course the funny thing is, JSON would also take either quote type. :)

    JSON is a fairly decent lightweight interchange format, but I want named values and other fancy things for some of what I use XML for - named parameters beat positional notation when dealing with anything complex.

  • (cs) in reply to Brianary
    Brianary:
    Zylon:
    Grovesy:
    The worrying thing is, I can't think of any xml library in any language that would generate some attribute values enclosed in ' and others in "

    Which can only mean they are hand crafting the xml... errk

    When the XML document is tiny and/or guaranteed to have a static structure, it's perfectly reasonable to decline involving a lumbering XML library.

    In fact, that's why XML is text, and not binary.

    No it is not. The choice is logically arbitrary, and an accident of XML's inherited precedents.

    The only possible reason to make a computer protocol human-readable is that it will be read by humans, not written by humans. And, as Terrence Parr argues so eloquently (and correctly), XML is a disastrous choice for a human-readable API.

    Brianary:
    Anon:
    Although the ultimate WTF can be obtained by asking people what they think XML is appropriate for. Any answer that isn't "nothing" will be a WTF, and some will just be priceless.

    Yeah, you're smarter than the whole industry. XML, it's totally useless, yesiree. People that use it shure are stoopid.

    Thank you. I was, until now, unaware that XML had achieved true world dominance. (Unless, by "industry," you refer to the "XML industry," which is too self-referential for me.)

    I think that, given the assumed universality of the protocol, it is perfectly reasonable to suggest that Anon might be significantly smarter than a substantial proportion thereof. That's the thing about universality.

    I imagine that Anon, like me, would grit his teeth and use XML in many situations, because it's just there. Which doesn't make it "appropriate." And it is true: XML zealots do tend to fly off the handle at even mild criticism of their ugly, warty little baby.

    Priceless.

    On a slightly more reasoned note:

    mikecd:
    Grovesy:
    The worrying thing is, I can't think of any xml library in any language that would generate some attribute values enclosed in ' and others in "
    Only reason I can think of would be a library smart enough to change the quote type to avoid entity encoding embedded quotes. This would cost a little more at runtime for the extra pass but would result in slightly smaller XML in some cases.
    I hardly think that this is in the spirit of the alarmingly verbose XML. What's a few megabytes between friends? Can you say "ludicrous premature optimisation?"

    Well, hitting on XML loonies is dull. Let's try:

    Franz Kafka:
    AnotherVictim:
    This reminds me of an incident I faced a few months ago... <snip/>

    It's not because she's a tester. She's just a moron. Testers are just as necessary as developers, but the mindset is totally different.

    Ipso facto, testers are not "just as necessary as developers." Without developers, there would be nothing to test.

    I think what you mean is something like "a truly competent tester is at least as valuable as a competent developer, and perhaps more so." The mindset isn't particularly important: it's the honey-pot instinct. All sorts of idiots flock to both development and QA jobs because, let's face it, they're comfortable. I used to think that you get more incompetent, dangerous idiots in testing than you do in development, although these days I'm not so sure. However, it remains a truism that a significant proportion of truly great testers will aspire to be developers, whereas the opposite is just so never going to happen. (Unit-testing excluded for the purposes of this rant.)

    I do think, however, that you miss one further point.

    Just as testers need developers as the essential feed to their work-flow, so developers need morons in turn.

    Imagine a world with a tragically short supply of morons. How would developers spend their time? (Open source, I know.) Who would fund our opulent existence? If morons could do simple programming jobs, we'd all be in deep shit.

    And this, I think, is the shining case for supporting XML in all its many, unnecessary usages.

    Viva Los Morones!

  • Brianary (unregistered) in reply to Brianary
    Brianary:
    Anon:
    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner.
    I'll buy it. JSON's often a good fit, though I'm not sure how you'd support namespaces and external includes in advanced applications. Now, there's the chicken-and-egg problem of market support, though JSON support is clearly growing. I don't mind learning how to use a screwdriver, I just don't need a different screwdriver for every screw. Of course the funny thing is, JSON would also take either quote type. :)
    Hmm... also not sure how to do a formal specification to indicate the expected format (DTD/XSD).
  • Maarten (unregistered) in reply to Nelle

    Notepad actually DOES parsing and/or conversion, but I doubt this was such a case. An example of a string that will be parsed and causes automated insertion of text by notepad is the string ".LOG" on the beginning of the first line of a file.

  • (cs) in reply to Bejesus
    Bejesus:
    But the point was, WTF was that idiot doing raising any sort of defect with his vendor in the first place? Exactly what sort of answer was he expecting to such a stupid question?
    As I understand it, he wasn't expecting an answer; rather, he was expecting an XML document which consistently used double-quotes for delimiters. This would involve editing two characters in the originating document. Argue all day long about how stupid he is; the bottom line is, he represents the company's source of revenue. Given that the company has already engaged in a tug of war with him over this, their resistance has already cost them more in terms of profit than simply making the change would have.

    Sometimes you have to step back from your personal ideals and remind yourself exactly why it is you're in business and what it is you're trying to accomplish.

    Addendum (2007-11-29 18:09): Sorry, forgot to include this: since double-quotes work just fine as delimiters, I'd like to hear how it is that using them in place of single-quotes would "break standards based interfaces".

  • (cs) in reply to Anon
    Anon:
    Wait, seriously, they use XML for serialization?! I'll say JSON, but you really want a binary solution for something like that. ... REST with JSON, to add some more meaningless XTLAs (eXtended Three Letter Acronym) to the mix. ... JSON would work. ... JSON again. ... Who knows? Depends on the specifics. I'll go with JSON for no good reason.

    For almost any real use of XML, JSON provides a simpler syntax for encoding the same data in a less verbose and less error-prone manner.

    Well I guess I know who I'll be meeting at the next JSON convention...

    I can't quite bring myself to agree with you though... As far as debugging goes, it's a lot easier to find a missing xml tag than a missing brace, and a lot easier for me to explain to non-tech folks how to update an XML file than JSON notation (believe me, I've tried both approaches), so... would JSON work? Yes. Is it a better tool in all cases? An emphatic 'No'.

  • (cs) in reply to SNF
    SNF:
    Bottom line, I still don't see a reasonable explanation of how a 4-word, 16-letter sentence, all in 7-bit ASCII, with no control characters, can fail to be recognized as such.
    The file as such is not letters and words, it's bytes. You don't get letters and words UNTIL you've chosen an encoding. Choosing an encoding based on the bytes is guesswork, and the shorter the file is, the less information you have to base your guess on.

    If you have Chinese language support installed, you'll see that in the encoding that Notepad guessed wrongly, the file contains a string of perfectly valid Chinese characters.

  • Brianary (unregistered) in reply to real_aardvark
    real_aardvark:
    No it is not. The choice is logically arbitrary, and an accident of XML's inherited precedents.
    Yeah, I bet these specs are just randomly generated. Text vs. binary seems like the very first decision when creating a format. Your assertion is that, what, it was an "accident"? Explain.
    real_aardvark:
    The only possible reason to make a computer protocol human-readable is that it will be read by humans, not written by humans.
    Read: yes, written: no (fallacy).
    real_aardvark:
    And, as Terrence Parr argues so eloquently (and correctly), XML is a disastrous choice for a human-readable API.
    Must be true, then. Can you contrast it to other choices? You know, to show what it *should* look like?
    real_aardvark:
    Brianary:
    Anon:
    Although the ultimate WTF can be obtained by asking people what they think XML is appropriate for. Any answer that isn't "nothing" will be a WTF, and some will just be priceless.
    Yeah, you're smarter than the *whole* industry. XML, it's totally useless, yesiree. People that use it shure are stoopid.
    Thank you. I was, until now, unaware that XML had achieved true world dominance. (Unless, by "industry," you refer to the "XML industry," which is too self-referential for me.)
    I'm sure that's not all you're unaware of. Seriously, what alternative are you suggesting? The only alternative I've heard so far is JSON, which is OK though incomplete, and has very little market presence currently.
    real_aardvark:
    I think that, given the assumed universality of the protocol, it is perfectly reasonable to suggest that Anon might be significantly smarter than a substantial proportion thereof. That's the thing about universality.
    I thought you didn't like self-referential arguments?
    real_aardvark:
    I imagine that Anon, like me, would grit his teeth and use XML in many situations, because it's just there. Which doesn't make it "appropriate." And it is true: XML zealots do tend to fly off the handle at even mild criticism of their ugly, warty little baby.
    Oh goody, semantics. Call me a pragmatist, but I do tend to prefer tools that exist to ones that do not. They seem more "appropriate". Oh, and "XML is appropriate for" "nothing" isn't mild criticism, BTW. But you are right: XML zealots (i.e. anti-XML extremists like yourself) do tend to fly off the handle.
    real_aardvark:
    Priceless.

    On a slightly more reasoned note:

    We certainly are full of ourselves, aren't we?

    real_aardvark:
    mikecd:
    Grovesy:
    The worrying thing is, I can't think of any xml library in any language that would generate some attribute values enclosed in ' and others in "
    Only reason I can think of would be a library smart enough to change the quote type to avoid entity encoding embedded quotes. This would cost a little more at runtime for the extra pass but would result in slightly smaller XML in some cases.
    I hardly think that this is in the spirit of the alarmingly verbose XML. What's a few megabytes between friends? Can you say "ludicrous premature optimisation?"
    Bull. Typical XML transactions (SOAP, e.g.) have just about as much HTTP overhead as XML overhead. So, if you really have this problem, rather than yammering what you read on somebody's blog, compress the data or submit your own standard. If it's good, and has reasonable market support, I'll use it.
    real_aardvark:
    Well, hitting on XML loonies is dull. <snip/>

    Viva Los Morones!

    Wow, you are truly the greatest commentator ever!
  • Huh? (unregistered)

    Given that single quotes are as valid as double quotes for XML attribute values, does anyone else get the impression that Rick's XML "parsing" consisted of simple string searches?
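
    For illustration, here is a small Python sketch of how a simple string search could trip over exactly this. The `naive_attr` helper is hypothetical (nobody outside Rick's shop knows what his code actually looks like), and the document is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing both quote styles, which XML allows.
doc = "<order id='7' status=\"open\"/>"

def naive_attr(xml: str, name: str):
    """Simple string search that only understands double-quoted values."""
    marker = name + '="'
    start = xml.find(marker)
    if start == -1:
        return None
    start += len(marker)
    return xml[start:xml.index('"', start)]

print(naive_attr(doc, "status"))  # open
print(naive_attr(doc, "id"))      # None: the single-quoted value is missed

# A conforming parser reads both attributes the same way.
print(ET.fromstring(doc).attrib)
```

    The document is fine; the string search is simply looking for one spelling of it.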

  • jeffcityjon (unregistered)

    The funny thing is that there are translation issues in some versions of notepad...

    The Windows NT version of Notepad, installed by default on Windows 2000 and Windows XP, has the ability to detect Unicode files even when they are missing a byte order mark. To do this, it utilizes a Windows API function called IsTextUnicode(). This function is, however, imperfect, incorrectly identifying some all-lowercase ASCII text as UTF-16. As a result, Notepad interprets a file containing a phrase like "aaaa aaa aaa aaaaa" as a two-byte Unicode text file and attempts to display it as such. If a font with support for Chinese is installed, Chinese characters are displayed.

    A few people misinterpreted this issue as an easter egg. Many phrases that fit the pattern (including "this app can break" and "bush hid the facts") appeared on the web as hoaxes. Experts correctly attributed it to the Unicode detection algorithm.

    This issue has been resolved in the Windows Vista version of Notepad.

  • (cs) in reply to real_aardvark

    I gotta agree with Brianary, real_aardvark... you're kinda being a dink about the whole XML thing.

    XML bugs me for its verbosity, but makes up for it in its widespread (native) acceptance and accessibility to the average person.

    Being an anti-XML jihadist is no different from being an XML zealot. You believe in black and white where there are many more shades of grey.

  • igitur (unregistered)

    Sounds a bit like my problem...

    Been arguing for days with the webmaster. I'm trying to prove that https://ib.absa.co.za/ib/mb.do isn't valid XHTML.

    No wonder it doesn't display on my mobile phone's Opera browser. Their response: Get a new phone.

  • (cs) in reply to SNF
    SNF:

    Bottom line, I still don't see a reasonable explanation of how a 4-word, 16-letter sentence, all in 7-bit ASCII, with no control characters, can fail to be recognized as such.

    How do you know it is 7bit ascii? HOW?

    I give you 19 bytes (the spaces count too!): 74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B 00

    QUICK! What does that say? What? Are you sure it's English? I didn't tell you what format it was in. Does this start with two ASCII characters 74 and 68, or the Unicode character 6874?

    Now I don't know exactly how the algorithm works, or what makes it think this stream of characters is Unicode and a slightly different stream is ANSI. But I can see that it can happen. And without something explicitly saying what the format is, it will always be possible to misinterpret the data.
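
    The ambiguity is easy to reproduce. This Python sketch takes the bytes listed above (minus the trailing 00, so the length is even) and decodes them twice. It says nothing about how Notepad's detection actually decides; it only shows that both readings are valid.

```python
# The first 18 bytes from the listing above.
raw = bytes.fromhex("74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B")

# Reading 1: one byte per character.
as_ascii = raw.decode("ascii")

# Reading 2: two bytes per character, low byte first (UTF-16LE).
as_utf16 = raw.decode("utf-16-le")

print(as_ascii)                                  # this one can break
print([f"U+{ord(ch):04X}" for ch in as_utf16])   # starts with U+6874
```

    All nine code points from the second reading fall in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is why the result shows up as Chinese characters when a suitable font is installed.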

  • (cs) in reply to FredSaw

    FredSaw:
    As I understand it, he wasn't expecting an answer; rather, he was expecting an XML document which consistently used double-quotes for delimiters. This would involve editing two characters in the originating document.

    This isn't the first time someone said this. HOW do you know it is as simple as editing two characters in a file? Looking at the data in the file, I am guessing this file is generated. Changing the file means it will be "broken" next time it is generated.

  • AdT (unregistered) in reply to Brianary
    Brianary:
    Brianary:
    Of course the funny thing is, JSON would also take either quote type. :)
    Hmm... also not sure how to do a formal specification to indicate the expected format (DTD/XSD).

    I would like to know from the "XML suxx" crowd whether they can provide any alternative that provides the feature set of XML (including namespaces and encodings), XML Schema, (E)XSLT, and XPath in an all interoperable way, but I honestly don't expect too much of an answer. All I hear is that JSON or S-Expressions or binary (it's so easy to be vague!) are much better than XML, but upon closer examination they do not even offer the feature set of the XML standard itself, not to mention XSD, XSLT, XPath and related standards.

    Because the XML haters have obvious problems coming up with any alternative that is

    a) actually superior to XML in at least a majority of applications and b) available now

    they will frequently resort to name-calling and other sorts of ad hominem attacks like this:

    real_aardvark:
    XML zealots do tend to fly off the handle at even mild criticism of their ugly, warty little baby
    real_aardvark:
    XML loonies

    It takes a gullible fool to fall for the suggestion that braggadocio and baseless insults make a rational argument.

  • Bejesus (unregistered) in reply to FredSaw
    FredSaw:
    As I understand it, he wasn't expecting an answer; rather, he was expecting an XML document which consistently used double-quotes for delimiters.
    Well since he agreed on XML he should have been expecting XML not XML with additional constraints placed on it by his shitty implementation.
    FredSaw:
    This would involve editing two characters in the originating document. Argue all day long about how stupid he is; the bottom line is, he represents the company's source of revenue. Given that the company has already engaged in a tug of war with him over this, their resistance has already cost them more in terms of profit than simply making the change would have.
    It's an unsupportable assumption that it would involve merely editing two characters. It would seem to me equally likely that part of the document is generated by another system, or that there are numerous places where delimiter use is inconsistent.

    It shouldn't matter - it's still XML. It only needs to be consistent with the standard, that's what standards are for.

    The chances are that if such a basic bug exists in his implementation then a myriad of other ones do. Using a compliant parser would have avoided them.

    Consider why they are using XML. Most of the arguments for that will be based on making numerous systems (not just these two) compatible with the same message format. Broken parsers that introduce extra constraints completely undermine that.

    FredSaw:
    Sometimes you have to step back from your personal ideals and remind yourself exactly why it is you're in business and what it is you're trying to accomplish.
    It's got nowt to do with personal ideals. I have never seen using a standard correctly cost more than not doing so for either party.
    FredSaw:
    Addendum (2007-11-29 18:09): Sorry, forgot to include this: since double-quotes work just fine as delimiters, I'd like to hear how it is that using them in place of single-quotes would "break standards based interfaces".
    They are supposed to be communicating based on a standard, that standard being XML. That's what they agreed. Both delimiters are legal in the standard, coping with them isn't optional. If you can't cope with everything in that standard that is mandatory you have a broken interface.

    The cause is that Rick was ignorant of the standard. He seems to be making the usual basic error of thinking that XML is just something that sorta looks a bit like HTML.

    Ironically, Terry's first response "that they were processing the file wrong" was correct, though probably not in the way he meant it.
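
To illustrate the point that both delimiters are legal, here is a minimal sketch using Python's standard-library parser. The document content (element and attribute names) is made up for the example and is not the file from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that mixes single- and double-quoted attributes.
doc = """<order id='42' status="open"><item sku='A1' qty="2"/></order>"""

# A conforming parser accepts both delimiter styles without complaint.
root = ET.fromstring(doc)

print(root.attrib)     # attributes of <order>
print(root[0].attrib)  # attributes of <item>
```

Once parsed, the quote style is gone entirely; the consumer only ever sees attribute names and values.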

  • Bejesus (unregistered) in reply to SNF
    SNF:
    Bottom line, I still don't see a reasonable explanation of how a 4-word, 16-letter sentence, all in 7-bit ASCII, with no control characters, can fail to be recognized as such.
    How are you supposed to know these bytes are 7-bit ASCII?
  • Bejesus (unregistered) in reply to igitur
    igitur:
    Sounds a bit like my problem...

    Been arguing for days with the webmaster. I'm trying to prove that https://ib.absa.co.za/ib/mb.do isn't valid XHTML.

    No wonder it doesn't display on my mobile phone's Opera browser. Their response: Get a new phone.

    It's not valid anything without a DOCTYPE.

  • UTU (unregistered) in reply to chrismcb
    chrismcb:
    How do you know it is 7bit ascii? HOW?

    I check if any of the bytes has their 8th bit set. If not, I'd assume it to be 7bit.

    chrismcb:
    I give you 19 bytes (the spaces count too!): 74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B 00

    QUICK! What does that say?

    No idea, don't really read hex that well. However, since all of the values (thanks for providing them in hex; determining the byte order from pure binary representation would've been more difficult) are equal to or less than 116 in decimal, it would appear that the bytes only use the bottom 7 bits.

    chrismcb:
    What? Are you sure it's English?

    No. I didn't try to figure out the actual words (if there are any).

    chrismcb:
    I didn't tell you what format it was in.

    All I figured out was that it only uses 7 bits. Whether that means it is ASCII or not, I don't know. The only control character appears to be the null character at the end of the line, so it would be plausible that this is indeed ASCII.

    chrismcb:
    Does this start with two ascii characters 74 and 68, or the unicode character 6874?

    With two distinct "characters" 74 and 68. If it were unicode, every other byte would've been null. If it were UTF8, it still would've been two distinct characters, since multibyte characters are constructed in a way that prevents two singlebyte characters from being mistaken as one multibyte character. Of other encodings I don't know enough, but I'd still hazard a guess of 7bit ASCII. Oh, and I don't really know how it could be unicode character 6874; are the multibyte characters really read in that order?

    chrismcb:
    Now I don't know exactly how the algorithm works, or what makes it think this stream of characters is unicode and a slightly different stream is ansi. But I can see that it can happen. And without something explicitly saying what the format is, it will always be possible to misinterpret the data.

    Everything is always possible.

    And now that I took my hexeditor out and looked at the provided string, it seems like it reads "this one can break". As said previously, null terminated.
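
For anyone without a hex editor handy, the checks discussed here can be sketched in a few lines of Python. The helper name `is_seven_bit` is made up for the example. The last step assumes "unicode" means UTF-16 with little-endian byte order, which is one way to arrive at the "character 6874" reading of the first two bytes.

```python
# The 19 bytes from the post, exactly as given.
data = bytes.fromhex(
    "74 68 69 73 20 6F 6E 65 20 63 61 6E 20 62 72 65 61 6B 00"
)

def is_seven_bit(buf: bytes) -> bool:
    """True if no byte has its 8th (top) bit set."""
    return all(b < 0x80 for b in buf)

# Reading 1: one byte per character (ASCII).
as_ascii = data.decode("ascii")

# Reading 2: two bytes per character (UTF-16, little-endian).
# UTF-16 needs an even byte count, so the trailing byte is dropped.
as_utf16le = data[:18].decode("utf-16-le")

print(is_seven_bit(data))
print(repr(as_ascii))
print([f"U+{ord(c):04X}" for c in as_utf16le])
```

Both decodes succeed, which is the crux of the argument: the bytes alone do not say which reading was intended.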

  • fletch (unregistered)
    AdT:
    He must have been right because you'd expect a browser that actually costs money to understand the damn standard application/xhtml+xml MIME type instead of requiring server administrators to come up with the most insane workarounds.
    Haha! Awesome I thought the same thing about the undue IE praise. IIRC, all IE needed was for the URL to end with '.xml'. Did not matter that it was in the CGI query portion of the url, so http://somewhere.com/get/the/thing?.xml worked ;)

    It's because they are (or were, I have not checked in IE7) using their file extension mechanism, mapping their internal thing to MIME. Brilliant.

  • (cs) in reply to UTU
    UTU:
    ... If it were unicode, every other byte would've been null ...

    Um. Assuming by "unicode" you mean UTF-16, you're saying that UTF-16 inserts a null against every single character for the sheer fun of it? Just because it's nice to double the size of every file?

    Those nulls are there to indicate that the characters are in the Latin range (range 00). If the characters are not Latin, such as Greek or Chinese, the range bytes will be of different values, and then you're back to not being able to easily tell whether it's UTF-16 or not.
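
A quick sketch of that point in Python, assuming UTF-16 little-endian and three sample characters picked for the example (a Latin letter, a Greek letter, a Chinese character). The helper name `utf16le_hex` is hypothetical.

```python
def utf16le_hex(ch: str) -> str:
    """Hex bytes of one character encoded as UTF-16, little-endian."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-16-le"))

# Latin capital A, Greek capital omega, CJK character U+4E2D.
for ch in ("A", "\u03a9", "\u4e2d"):
    print(f"U+{ord(ch):04X} -> {utf16le_hex(ch)}")
```

Only the Latin letter produces a null byte; the other two have a non-zero high byte, so "every other byte is null" stops working as a test as soon as the text leaves that range.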

  • Colyn (unregistered)

    I have a similar XML story.

    We use a third-party vendor for certain kinds of data and I had requested that they send me a list of ids via HTTP request. They sent the ids as a comma separated list as the response:

    1,2,3,4,5,6,...

    I requested that they send it in XML instead. They changed the response to the following:

    <xml> <ids attr"1,2,3,4,5,6,..."> </xml>

    Gee thanks!!

  • John Cowan (unregistered)

    A possibility nobody seems to have considered is that the single quotes were curly quotes, which are not permitted by the XML standard (which does indeed permit either ASCII single quotes or ASCII double quotes). This might happen, for example, if someone inserted these quotes (and it is unusual, though perfectly legal, for them to be the only single quotes in the whole document) using a quote-replacing editor such as WordPad.
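
A small sketch of that scenario, using Python's standard-library parser. The element and attribute names are invented, and nothing here confirms that the article's file actually contained curly quotes; it only shows how a conforming parser reacts if it did.

```python
import xml.etree.ElementTree as ET

def parses(doc: str) -> bool:
    """True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# ASCII single quotes: allowed as attribute delimiters.
straight = "<item name='widget'/>"

# Typographic quotes U+2018/U+2019: not attribute delimiters at all.
curly = "<item name=\u2018widget\u2019/>"

print(parses(straight))
print(parses(curly))
```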

  • Gollum (unregistered)

    You punks, we never used to have any problems with double quotes and single quotes, we would just open a socket and send a byte stream across captcha: ewww (someone get me out from under this desk)

  • Hinek (unregistered)

    The XML shown in the article is valid XML. Single quotes are in the spec. Please add this to the article.

    Still, mixing single and double quotes is not considered best practise and the answer of the vendor is hilarious. So, it stays a WTF.

  • amazed (unregistered)

    i love dumb people

  • (cs)

    Gwaaaaaaahhh.... (Dilbert scream style's)

  • (cs) in reply to AdT
    AdT:
    I would like to know from the "XML suxx" crowd whether they can provide any alternative that provides the feature set of XML (including namespaces and encodings), XML Schema, (E)XSLT, and XPath in an all interoperable way, but I honestly don't expect too much of an answer.

    Ah, but the thing is that how often are those all needed?

    Maybe some people need them. But, I've come across lots of XML, and the vast majority of it could be encoded in half the space, and parsed far more easily using something like ASN.1 or JSON or even INI file format. In fact, I can't think of anything I've seen which couldn't have been done better in an alternative way.

    I'm not, in any way, saying I've seen all possible uses of XML, but I am saying that I believe it's vastly overused. ISTM that some people see a data interchange problem, and immediately use XML without even considering alternatives. That is what I don't like. To me XML should be the last resort because it's so verbose and complex to parse.

    I suspect that if the LDAP standard was written today, it'd use XML rather than ASN.1 - as a result LDAP queries would be more than double the size they are now - with no benefit other than to be able to say it uses XML...
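
As a rough illustration of the size argument only, here is a sketch that serialises the same short list of ids both ways. The data and the element-per-value XML layout are assumptions chosen for the example; other XML layouts would give different numbers, so this is not a general benchmark.

```python
import json
import xml.etree.ElementTree as ET

ids = [1, 2, 3, 4, 5, 6]

# JSON with no optional whitespace.
as_json = json.dumps({"ids": ids}, separators=(",", ":"))

# XML with one <id> element per value (an assumed layout).
root = ET.Element("ids")
for i in ids:
    ET.SubElement(root, "id").text = str(i)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```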

  • (cs) in reply to Daniel Beardsmore
    Daniel Beardsmore:
    UTU:
    ... If it were unicode, every other byte would've been null ...

    Um. Assuming by "unicode" you mean UTF-16, you're saying that UTF-16 inserts a null against every single character for the sheer fun of it? Just because it's nice to double the size of every file?

    Those nulls are there to indicate that the characters are in the Latin range (range 00). If the characters are not Latin, such as Greek or Chinese, the range bytes will be of different values, and then you're back to not being able to easily tell whether it's UTF-16 or not.

    Who cares what happens to people who don't use English? They should just adapt and use the standard already!

    Seriously though, it would be more sensible to base the choice of encoding on the OS's locale settings, i.e. if someone has configured their OS to en_US, assume that text files are ASCII encoded unless something in the file directly contradicts this.

  • (cs) in reply to Zygo
    Zygo:
    iToad:
    A hex dump shows reality.

    OK, but is that a big-endian hex dump, or a little-endian hex dump?

    Endianness is about converting between numbers bigger than one byte, and bytes.

    A hex dump displays one byte at a time. There is no endianness involved.
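
A short sketch of the distinction, using the first two bytes from the earlier example. The byte-by-byte dump is the same either way; endianness only enters when those two bytes are combined into one 16-bit number.

```python
data = bytes.fromhex("74 68")

# Byte-at-a-time dump: no byte order involved.
byte_dump = " ".join(f"{b:02X}" for b in data)

# Combining the bytes into one number is where the order matters.
big = int.from_bytes(data, "big")
little = int.from_bytes(data, "little")

print(byte_dump)
print(hex(big), hex(little))
```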
