• Mhendren (unregistered) in reply to /b Zap Brannigan b/ (Deleted as Troll)
    /b Zap Brannigan b/ (Deleted as Troll):
    SpasticWeasel:
    Wouldn't that be backslashdot
    That's the place where everyone is evil and wears a goatee.

    No, that's slashdot you're thinking of there. backslashdot would be the place where everyone is helpful and informative, and never condescending. It is a scary, scary place.

  • Beej (unregistered) in reply to fruey

    However, the XML spec says that you may use an empty-element tag for any empty element regardless of whether it can contain anything else, so it looks like browsers are in violation if they don't allow <script/>.

    (The spec also notes that you SHOULD only use empty-element tags for tags specifically declared EMPTY so you can be compatible with old parsers.)
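
    A quick way to see the equivalence Beej describes is to feed both spellings to a conforming parser. This is a minimal sketch using Python's standard xml.etree.ElementTree; the sample documents and the helper name canonical are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same empty element (illustrative documents).
short_form = "<page><script/></page>"
long_form = "<page><script></script></page>"


def canonical(doc: str) -> bytes:
    """Parse a document and serialize it again with ElementTree's defaults."""
    return ET.tostring(ET.fromstring(doc))


# A conforming parser builds the same tree for both spellings,
# so the re-serialized bytes compare equal.
same_tree = canonical(short_form) == canonical(long_form)
print(same_tree)
```

    Once parsed, nothing in the tree records which spelling the sender used, which is why a receiver cannot reasonably demand one over the other.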

  • ButtomCod3r (unregistered) in reply to TopCod3r
    TopCod3r:
    One BIG reason I can think for them making this design decision is for performance. I mean if you have to look for closed tags two different ways, then that is twice as much processing time. Another reason might be cost for development, I mean why would you want to write extra unnecessary code.

    So you might be saying, .NET (or Java) already has an XML library included standard. But again this has distinct disadvantages. First, you have to ask can you trust this implementation. I personally have found more bugs in the .NET framework than I can remember. I wish I had kept a list of them so I could tell you. Second, .NET gives you only a general-purpose XML implementation, so the developer still has to write a lot of processing code... or use XPath, but don't get me started on XPath, it has burned us on multiple occasions, so we have banned it.

    I agree with you that performance can really be an issue, but they should have kept the possibility to understand an empty tag.

    Last week, the company I'm working at had a performance problem with their web service - it took 4-5 seconds to perform when not having many clients, and slowed down considerably under load. We used a lot of XML for data transfer between the tiers, so I decided that the .Net XML parser was way too slow. I built a custom parser in a night just for testing. My use of hash-tables and stack-buffers not only improved the speed, but also made it much more scalable in a multi-cpu environment. Also, when I encountered a "/>", some custom goto-table hijacked the whole process of parsing the empty tag and went into a lightning fast ASM subroutine that pops characters out of the buffer, making using <ABC/> MUCH MUCH more performant than reading <ABC></ABC>. You can't imagine how much time parsers lose on those...

    Turns out, in only one night of intense coding, I've saved almost 50 ms from the 10-20 parsing I did in my tests (against .Net parser), on only one CPU (my machine isn't dual-core yet). That's 1 percent right there, in one day, and it will scale way better on a multi-core processor so maybe even more than 1% in the production environment.

  • Sitten Spynne (unregistered) in reply to Machtyn
    Machtyn:
    Simple notepad++ regexp find and replace for those pesky empty tags... Find what: <([a-zA-Z0-9]*) \/> Replace with: <\1></\1>
    Excellent, now they don't have to change their program code to comply with the non-standard XML parser!

    I'm sure they'll be eager to hire you to hand-edit dozens (possibly hundreds) of files daily! It's so easy!
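
    For anyone who would rather script the workaround than hand-edit, the same find-and-replace can be sketched in Python. The function name expand_empty_tags is hypothetical, and like the Notepad++ pattern it only handles attribute-free tags; it is a stopgap, not a substitute for a real parser.

```python
import re

# Matches an empty-element tag with no attributes, e.g. <Interceptor /> or <Interceptor/>.
EMPTY_TAG = re.compile(r"<([A-Za-z0-9]+)\s*/>")


def expand_empty_tags(xml_text: str) -> str:
    """Rewrite <tag/> as <tag></tag>; tags with attributes are left untouched."""
    return EMPTY_TAG.sub(r"<\1></\1>", xml_text)


print(expand_empty_tags("<Carrier><Interceptor /><Interceptor/></Carrier>"))
```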

  • Paula (unregistered)
    <bean wtf="paula" />
  • Machtyn (unregistered) in reply to Sitten Spynne

    oops... I forgot my handy dandy /sarcasm tag

  • Tanj (unregistered) in reply to Keith

    know!

  • (cs) in reply to Robert S. Robbins
    Robert S. Robbins:
    I can think of a valid reason not to use a XML parser. A malicious DTD can cause a DoS attack. It is shocking how many web services and how much software are vulnerable. Any code that parses XML and attempts to process the DTD can be bogged down with the million laughs exploit. For example, SVG is an XML image format and the Opera browser is vulnerable to malicious SVG images.

    So either restrict the DTD that you accept or don't validate the xml in prod (in the parser, anyway). I generally take the second approach, but I also tend to have a limited set of clients or servers to contend with.
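
    One way to take the "restrict the DTD" route to its extreme is to refuse any document that carries a DOCTYPE at all. This is a minimal sketch with Python's standard xml.parsers.expat module; DoctypeForbidden and element_names are hypothetical names, and a real service would do more than collect element names.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def element_names(data: bytes) -> list[str]:
    """Return element names in document order, refusing any DOCTYPE."""
    names: list[str] = []
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declarations inside the DTD are processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not accepted")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names
```

    Because the handler fires as soon as the DOCTYPE starts, a document with nested entity definitions is rejected before any expansion is attempted.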

  • js (unregistered) in reply to Nerf Herder
    Nerf Herder:
    Kermos:
    Dood:
    Michael responds to let them no that their software hasn't changed in a month

    no?

    You know what is sad? I've gotten so used to people writing "no" for "know" that I don't even notice anymore...

    Its the same with people writing "you" instead of "your". I realize its a different case because they are not homophones, but so many people for whatever reason cannot type "your" correctly.

    I cannot count how many times a week I read something like: "Don't forget to submit you timesheets by Friday"

    Drives me nuts

    How about seperate and then instead of separate and than. "My idea to seperate them was better then his" shudder

  • (cs) in reply to mauhiz
    mauhiz:
    <Carrier> <Interceptor /> <Interceptor /> <Interceptor /> <Interceptor /> <Interceptor /> </Carrrier>

    Crap, now I have to go play some starcraft; carrier rush is a beautiful thing.

  • (cs) in reply to TopCod3r
    TopCod3r:
    Look. The way I understand XML, the way I was taught, it is up to the receiving party to define the standard they are willing to accept, so this means things like which conventions from the spec will apply with any XML conversation you have with them.

    One BIG reason I can think for them making this design decision is for performance. I mean if you have to look for closed tags two different ways, then that is twice as much processing time. Another reason might be cost for development, I mean why would you want to write extra unnecessary code.

    So you might be saying, .NET (or Java) already has an XML library included standard. But again this has distinct disadvantages. First, you have to ask can you trust this implementation. I personally have found more bugs in the .NET framework than I can remember. I wish I had kept a list of them so I could tell you. Second, .NET gives you only a general-purpose XML implementation, so the developer still has to write a lot of processing code... or use XPath, but don't get me started on XPath, it has burned us on multiple occasions, so we have banned it.

    But the bottom line is any decent developer should be able to write a specialized XML library that is easier to use and out-performs the generic XML library included in .NET.

    <snort /> I love you.
  • (cs) in reply to js
    js:
    How about seperate and then instead of separate and than. "My idea to seperate them was better then his" *shudder*
    Can you be more pacific? Never mind, I could care less. And you can take that for granite.
  • (cs) in reply to nikki9696
    nikki9696:
    TopCod3r:
    Look. The way I understand XML, the way I was taught, it is up to the receiving party to define the standard they are willing to accept, so this means things like which conventions from the spec will apply with any XML conversation you have with them.

    One BIG reason I can think for them making this design decision is for performance. I mean if you have to look for closed tags two different ways, then that is twice as much processing time. Another reason might be cost for development, I mean why would you want to write extra unnecessary code.

    So you might be saying, .NET (or Java) already has an XML library included standard. But again this has distinct disadvantages. First, you have to ask can you trust this implementation. I personally have found more bugs in the .NET framework than I can remember. I wish I had kept a list of them so I could tell you. Second, .NET gives you only a general-purpose XML implementation, so the developer still has to write a lot of processing code... or use XPath, but don't get me started on XPath, it has burned us on multiple occasions, so we have banned it.

    But the bottom line is any decent developer should be able to write a specialized XML library that is easier to use and out-performs the generic XML library included in .NET.

    <snort /> I love you.

    All hail topcod3r, king of the trolls!

  • John (unregistered) in reply to Addison
    Addison:
    akatherder:
    That probably means they wrote their own homebrew XML parser. That's a smart idea because no languages have any built-in XML support.

    lol you said "homebrew". I love that word.

    Wrong thread, you want the Root beer one

    CAPTCHA: nibh National Institute of Beer Homes?

  • Andrew (unregistered) in reply to JD
    JD:
    You see, this is exactly why we have standards - so idiot vendors can break them and make our lives ten times harder.

    But Bill built an empire by . . .

  • Andrew (unregistered) in reply to js

    Dudes come on

    Its "u" and "ur". Get with the now!

  • (cs)

    I just found out the other day that we have data like this:

    <Element id="whatever"> <Name "some value"/> <OtherName "some other value"/> </Element>

    And, yes, the guy that came up with that also made sure we used our own parser, built on boost::spirit.

    My head nearly exploded.

  • tbrown (unregistered) in reply to Steve
    Steve:
    Ledward:
    In Soviet Russia, tags close you!
    Win.

    Thread over.

    Seconded, this had me rolling when I heard it in my mind with a Boris Badenov accent!

  • tbrown (unregistered) in reply to NM

    NM:
    "I've seen official parsers barf, or behave in ways you wouldn't expect, on empty tags."

    I call <bs/>

    Empty tags have been part of XML from the beginning. There is no reason but ignorance / stupidity to barf on them.

    It's even more stupid than that braindead developer in the other room who doesn't understand that you should use 'void *' and not 'char *' when a pointer goes to various types. At least, there was a time (1979?) when that wasn't completely retarded, because 'void' wasn't a part of the C language yet.

    Sorry, but <bs> is a content required tag. You must call <bs></bs>!! :-)

  • tbrown (unregistered) in reply to Code Dependent
    Code Dependent:
    Kermos:
    You no what is sad? I've gotten so used to people writing "no" for "know" that I don't even knowtice anymore...
    Fixed that for you.

    Forgot to fix the first one! But, this was well done, I might even go so far as to label it a clbuttic!

  • (cs) in reply to jonnyq
    jonnyq:
    You really shouldn't be allowed to use XML unless you at least know the DOM Level 1 API (that's implemented in every language I've seen) and maybe even some XPath.
    This is not nearly stringent enough. In my ideal development environment, nobody would be allowed to use XML unless they know all of this and also only have one testicle.

    There are many advantages to this stricture. It's gender-neutral, because monorchism is a physical anomaly either way. It would create a fun diversion in the office when crazed XML developers (there is no other kind) either hack off one of their testicles with a pruning fork or else superglue one on. And it would keep the fearsome troll in HR happy. ("Drop those pants! Now! It's Company Policy!")

    But mostly, the advantage is that it would stop people using XML.

  • peterb (unregistered) in reply to fruey
    fruey:
    Native speakers are much more likely to make errors based on things being the same phonetically speaking. Non native speakers are likely to have learned, via translation with their language.

    Example in French:

    their = leur they're = ils sont there = là

    If they translate back, they know which is which without working it out.

    Example of English person speaking French as foreign language

    French mistake phonetically identical "noté" and "noter"

    noté = wrote down noter = to write down

    An Englishman would translate back, and realise which is which by the same token.

    When you translate while speaking or writing a foreign language, you're simply not doing it right.

  • (cs)

    I worked on a project where a vendor was supplying an "XML" feed. I put XML in quotes because it wasn't actually well-formed at all, but it looked kinda like XML so that was good enough for them.

    We actually had a conference call with them and our client where we agreed that the feed would be well-formed XML -- and to verify the standard we all agreed that expat would be the parser of choice.

    I'll never understand how people make invalid XML, given all of the open source and built-in solutions for all modern development platforms.

    I can only imagine they're writing code like this:

    $xml = "<name>$name</name>

    $address
    ";

  • voyou (unregistered) in reply to myname
    myname:
    DES:
    HTML is not XML, it's SGML. XML is (mostly) a subset of SGML. The XML version of HTML, which practically no one uses, is called XHTML.

    from this site's source:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    From this site's HTTP headers:

    Content-Type: text/html; charset=utf-8

    This site uses HTML, not XHTML. This is just as well, because it's not well-formed, and, if it were XHTML, conforming browsers would have to stop processing the document at the first error.

  • iToad (unregistered)

    This may be a silly question, but is there a single standard set of test case XML files somewhere that could be used to test XML parsers for compliance with standards? For example, NIST has test files for MD5 and SHA1 hash generators. If your md5sum utility generates the correct hash for each of these test files, then it is correct.

    (http://csrc.nist.gov/archive/ipsec/papers/rfc2202-testcases.txt)

    Use of some sort of industry-standard XML test suite would end a lot of arguments. If your parser handles all of the test cases correctly, it is by definition, correct. If a parser can't handle an XML file that a compliant parser can handle, then the problem is in the parser, and vice versa.
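
    The W3C does publish an XML conformance test suite for this purpose. The idea can be sketched in a few lines: a table of documents paired with the expected verdict, run against whatever parser is under test. The five cases below are made up for illustration and are not taken from the W3C suite; is_well_formed and failures are hypothetical helper names, with Python's xml.etree.ElementTree standing in for the parser being checked.

```python
import xml.etree.ElementTree as ET

# (document, should the parser accept it?) -- illustrative cases only.
CASES = [
    ("<Carrier/>", True),
    ("<Carrier></Carrier>", True),
    ("<Carrier><Interceptor /></Carrier>", True),
    ("<Carrier><Interceptor></Carrier>", False),         # mismatched end tag
    ('<Element><Name "some value"/></Element>', False),  # attribute value without a name
]


def is_well_formed(doc: str) -> bool:
    """Report whether ElementTree accepts the document."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


def failures(parse_ok=is_well_formed):
    """Return the documents on which the parser disagrees with the expected verdict."""
    return [doc for doc, expected in CASES if parse_ok(doc) != expected]


print(failures())
```

    Swapping a homebrew parser in for parse_ok shows exactly which documents it gets wrong, for example one that rejects every empty-element tag.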

  • tbrown (unregistered) in reply to ThingGuy McGuyThing
    ThingGuy McGuyThing:
    Shouldn't that be:

    Michael responds to them "no, that there software hasn't changed in a month"?

    They're never going to forgive your not using "their" there.

    (captcha: erat -- indeed)

  • Paweł (unregistered) in reply to Me

    When parsing in XML mode they are exactly the same. The fact that XHTML is often served and parsed as plain HTML (or the fact that IE is broken and always parses as HTML) doesn't mean it's according to the spec.

  • kl (unregistered) in reply to Me

    You've confused XHTML with text/html tagsoup sprinkled with slashes.

    Set proper MIME type and try again.

  • G (unregistered)

    XML on its own is useless, there has to be a schema that defines the structure. And the schema might require that instead of <tag /> you write <tag></tag>. Also that someTag must come before otherTag and no other way.

    So the real WTF is they changed the schema without notice (even though it's more probable they have an inbreed something-like-XML parser)

  • Blah (unregistered) in reply to G
    G:
    And the schema might require that instead of <tag /> you write <tag></tag>.

    Is this possible to require in standard XSD? I've never seen it done...

  • Xythar (unregistered)

    Does this mean I have to start writing <br></br> in my HTML? :(

  • (cs) in reply to RTFM
    RTFM:
    I have to hand it to TopCod3r... This guy is the best long-term troll (in the best sense of the term) I've seen since Usenet in the 90s.

    He knows exactly how far to go to get people riled up but not going so far as to give the game away blatantly.

    So, thank you, TopCod3r, for reviving an art that I thought was practically dead!

    I second that!

  • gak (unregistered) in reply to Keith
    Keith:
    I now no why their is know way to no that there are know intelligent programmers working four you're vendor until its to late.

    Does you're head hurt?

    you missed one:

    "... until it's to late."

    I work with people who write like this every day. Its going to kill me at some point.

  • Dave (unregistered)

    So Somebody implemented another <XML Pronounced Less than eXtensible Markup Language

  • voyou (unregistered) in reply to Xythar
    Xythar:
    Does this mean I have to start writing <br></br> in my HTML? :(

    That would be ill-formed; <br> is defined in HTML as an empty element, so no end tag is allowed. OTOH, <br></br> is valid XHTML.

  • G (unregistered) in reply to Blah

    XSD doesn't allow that restriction to be defined. Ever heard of DTD?

  • voyou (unregistered) in reply to G
    G:
    XSD doesn't allow that restriction to be defined. Ever heard of DTD?

    I'm pretty sure a DTD can't specify either that you must use <tag></tag>, or that you must use <tag />. How do you think you'd express that?

  • Blank (unregistered)

    I'm working on a project right now with two subcontractors. Our API basically says "We exchange data with protocol <X> and the data satisfies xml schema <Y>". How many of our subcontractors do you think can actually deliver a client capable of satisfying both these requirements at, say, the third attempt? (Hint: it's not a positive integer.)

  • captain obvious (unregistered)

    Fuck the vendor for not realising <Carrier /> is in fact invalid XML on two counts. Typical developer incompetence

    And to a lower extent, fuck the client for thinking they know enough to mention their "solution" which is also invalid.

  • (cs) in reply to voyou
    voyou:
    G:
    XSD doesn't allow that restriction to be defined. Ever heard of DTD?
    I'm pretty sure a DTD can't specify either that you must use <tag></tag>, or that you must use <tag />. How do you think you'd express that?
    You could do it if it was an SGML DTD, as that allows control over that sort of thing. But an XML DTD does not (nor does a schema; support for that sort of thing is required as part of the well-formedness requirements which DTDs and schemas can't touch). The only thing anywhere that says "use <tag></tag>" is a bit of the spec that is not normative and which is just a bunch of recommendations on how to write XHTML that old school HTML parsers can probably do something sensible with. All of which means nothing at all to any code that is not doing XHTML.

    Hence it's a genuine WTF, as the tags being talked about most certainly weren't part of the XHTML definition.

  • Insider (unregistered)

    Having worked in the Logistics software business for a while, let me just say that there is a good bit too little understanding of IT standards. In fact, most smaller carriers still seem to think of "this computer thing" as a nuisance that they hope will go away soon again.

    The carriers are more concerned that you don't put any spray cans into your shipment than they are with ensuring their data is consistent. Security? Their drivers get security training, that should do.

    I just wait for the day that somebody is going to badly misuse this. I won't give you any help here, but shipment data submission (via FTP) might be a good place to start looking... ;-)

  • (cs) in reply to John
    John:
    Umm, what's the problem with whitespace before the slash?

    That it doesn't add anything - I understand people doing it for the br tag in XHTML (although I don't understand why people would want to use XHTML), but in XML it really has no value whatsoever (it doesn't even add to the readability).

    (All IMHO, of course.)

  • (cs) in reply to peterb
    peterb:
    fruey:
    Native speakers are much more likely to make errors based on things being the same phonetically speaking. Non native speakers are likely to have learned, via translation with their language.

    Example in French:

    their = leur they're = ils sont there = là

    If they translate back, they know which is which without working it out.

    Example of English person speaking French as foreign language

    French mistake phonetically identical "noté" and "noter"

    noté = wrote down noter = to write down

    An Englishman would translate back, and realise which is which by the same token.

    When you translate while speaking or writing a foreign language, you're simply not doing it right.

    Yes, but while you're learning you generally translate. Even now, when I'm proofreading, I might translate back & forth just to check this kind of error. Mostly it comes naturally, but I'm now fluent in French.

    So when you translate while speaking or writing a foreign language, you're just not fluent yet.

  • (cs) in reply to gak
    gak:
    Keith:
    I now no why their is know way to no that there are know intelligent programmers working four you're vendor until its to late.

    Does you're head hurt?

    you missed one:

    "... until it's to late."

    I work with people who write like this every day. Its going to kill me at some point.

    Actually "its to late" would be correctly incorrect, given that it's supposed to be a contraction of "it is", and should therefore have an apostrophe. FWIW my pet hates at the moment are:

    • 'plez' (no, not 'plz' though that would be just as annoying)
    • 'u' for 'you'
    • lower-case 'i' for 'I' (suggesting u dont think much of urself or ur just 2 lazy 2 press shift).
  • (cs) in reply to fruey
    fruey:
    peterb:
    When you *translate* while speaking or writing a foreign language, you're simply not doing it right.

    Yes, but while you're learning you generally translate. Even now, when I'm proofreading, I might translate back & forth just to check this kind of error. Mostly it comes naturally, but I'm now fluent in French.

    So when you translate while speaking or writing a foreign language, you're just not fluent yet.

    Quite. This process probably helps you to use the right version of "there" etc. reliably too. Although you're no longer consciously translating once you have become fluent, the extra brain-links are still there, not just the links between spellings and sounds that you form when learning your mother tongue.

  • Jobsworth (unregistered) in reply to Nerf Herder
    Nerf Herder:
    Kermos:
    Dood:
    Michael responds to let them no that their software hasn't changed in a month

    no?

    You know what is sad? I've gotten so used to people writing "no" for "know" that I don't even notice anymore...

    Its the same with people writing "you" instead of "your". I realize its a different case because they are not homophones, but so many people for whatever reason cannot type "your" correctly.

    I cannot count how many times a week I read something like: "Don't forget to submit you timesheets by Friday"

    Drives me nuts

    Don't get me started on people that don't know the difference between there/their/they're. And English is only my third language!

    Captcha: praesent

  • Jimmy Wiper (unregistered)

    Wow, dont you just love Substandard, standards! LOL

    Jiff www.anonweb.eu.tc

  • Erik (unregistered)

    Wait until you've worked with companies that think it's a good idea to not just make their own XML-Parser in Java, but also throw in one made in JavaScript for good measure... none of which handles all the aspects of XML of course... Oh, and both mixed freely...

    I can only imagine what they'll come up with when they upgrade to Java 6 (where you can run JavaScript inside of a Java program, wee hoo!)

    Well, well, one day when the period for prosecution has expired on those crimes, I might just post it to this site...

  • Dan (unregistered) in reply to Keith

    That was actually painful to read

  • (cs)

    I fear that some code over here would break in a similar manner. One of my coworkers seemed to be oblivious to the fact that Java has XML parsers, so his genious (sic) solution was worthy of the main page. Long story short, let's say that even changing the tag order will break his code. Or adding a new attribute.

    Hell, I might actually submit the whole thing as a WTF!

Leave a comment on “The Substandard Standard”
