• (cs)

    Brillant.

  • (cs)

    But, but, what if someone uses gasp ...

    [ <tag> s p a c e s </tag> ]

    ?

  • Nick J (unregistered)

    <comment>This will never work</comment>

  • floorpie (unregistered)

    My eyes! The goggles do nothing!

  • PACE (unregistered)

    Not only on one line, they also have to be in order. "But... we've got XML!"

  • anon (unregistered)

    this is arguably better than the second link in the article. UGH

    captcha: consequence

  • ex-Pizza delivery man (unregistered)

    <xmlcomment1>Efficient XML '</xmlcomment1> <xmlcomment2></xmlcomment2> <xmlcomment3></xmlcomment3> <xmlcomment4></xmlcomment4> <xmlcomment5></xmlcomment5> <xmlcomment6></xmlcomment6> <xmlcomment7></xmlcomment7> <xmlcomment8></xmlcomment8> <xmlcomment9></xmlcomment9> <xmlcomment10></xmlcomment10> <xmlcomment11></xmlcomment11> <xmlcomment12></xmlcomment12> <xmlcomment13></xmlcomment13> <xmlcomment14></xmlcomment14> <xmlcomment15></xmlcomment15> <xmlcomment16></xmlcomment16> <xmlcomment17></xmlcomment17> <xmlcomment18></xmlcomment18> <xmlcomment19></xmlcomment19> <xmlcomment20></xmlcomment20>

  • (cs)

    I particularly liked the sNot variable, although here I think they've got the case-sensitivity wrong.

  • occasional reader (unregistered) in reply to real_aardvark

    How can you like the sNot variable and miss the sExact variable entirely!!!

    It's far more promiscuous

  • cobratbq (unregistered)

    Is it me, or is this thing not even valid XML since it's missing the root node?

  • cobratbq (unregistered) in reply to cobratbq

    Sorry, I missed the sLine... that one's probably the root, and the closing tag is just ignored :P

  • (cs) in reply to cobratbq
    cobratbq:
    Is it me, or is this thing not even valid XML since it's missing the root node?

    If I'm grokking the code right, then sLine is only read in order to get the "root" element.

  • (cs) in reply to cobratbq
    cobratbq:
    Is it me, or is this thing not even valid XML since it's missing the root node?

    It's sNot XML...

  • (cs) in reply to occasional reader
    occasional reader:
    How can you like the sNot variable and miss the sExact variable entirely!!!

    It's far more promiscuous

    They mistpeled sLime and sUction...

  • (cs)

    This was clearly done with the intention of being more efficient than a regular xml parser. So the real wtf is:

    sExact = sExact.Replace("<exact>", "")
    sExact = sExact.Replace("</exact>", "")

    instead of

    sExact = sExact.Substring(8, Len(sExact)-9);

    Way more efficient, and more readable.
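
    For anyone tempted by the index-based variant, here is a minimal Java sketch that derives the offsets from the tag lengths instead of hard-coding them. The class and method names (`TagStrip`, `stripTag`) are made up for illustration, and it assumes the line starts with the opening tag and ends with the closing tag. Since `<exact>` is 7 characters and `</exact>` is 8, the content runs from index 7 to the line length minus 8.

```java
public class TagStrip {
    // Strips a known wrapper such as <exact>...</exact> by offset.
    // Assumption: the line starts with the opening tag and ends with the closing tag.
    static String stripTag(String line, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        if (!line.startsWith(open) || !line.endsWith(close)
                || line.length() < open.length() + close.length()) {
            throw new IllegalArgumentException("not a <" + tag + "> line: " + line);
        }
        // Java's substring takes (beginIndex, endIndex), not (start, length).
        return line.substring(open.length(), line.length() - close.length());
    }

    public static void main(String[] args) {
        System.out.println(stripTag("<exact>maybe</exact>", "exact"));
    }
}
```

    It is still every bit as rigid as the original: leading whitespace, attributes, or different casing all make it throw.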

  • FDF (unregistered) in reply to java.lang.Chris;
    java.lang.Chris;:
    cobratbq:
    Is it me, or is this thing not even valid XML since it's missing the root node?

    If I'm grokking the code right, then sLine is only read in order NOT to get the "root" element.

    fixed that for ya
  • Proof (unregistered)

    "If you consider yourself a senior level xml developer" - you'll be happy to know you can go for a promotion - they have a junior level VB developer position opening tomorrow.

  • (cs) in reply to occasional reader
    occasional reader:
    How can you like the sNot variable and miss the sExact variable entirely!!!

    It's far more promiscuous

    In which case, sPosition = 69?

  • (cs)

    To be fair, all XML-parsing techniques are Case-Sensitive, because XML itself is.

  • fruey (unregistered)

    Looks to me like a classic case of no-time-left programming. You've tried to use XML libraries, haven't understood them, or been hit by strangeness (PHP XML libraries, anyone?). So you realise that the file is standard, and is only in XML because someone thought it should be. So you write a piece of crap which parses the file as you need it.

    CSV would have been adequate here, since there probably isn't anyone else trying to parse the same XML file.

    TRWTF is that the file to be parsed was in XML in the first place.

  • Paul (unregistered)

    Hey, thanks for posting that. I've got to tell you that this guy was a relative of the big boss, and we had some difficulty letting him go, but we eventually did. He keeps calling me, trying to get his job back.

    The whole program never worked and was rewritten from scratch a few months later. sNot as a variable name is cool.

  • SlyEcho (unregistered)

    It just needs a little more work to be perfect:

    sNot = sNot.Replace("&amp;", "&");
    sNot = sNot.Replace("&lt;", "<");
    sNot = sNot.Replace("&gt;", ">");
    sNot = sNot.Replace("&apos;", "'");
    sNot = sNot.Replace("&quot;", "\"");
    

    And so on...
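
    One thing chained replacements of this kind are sensitive to is ordering: if `&amp;` is decoded first, an input like `&amp;lt;` turns into `&lt;` and then gets decoded a second time into `<`. A minimal Java sketch that decodes `&amp;` last (the `Unescape` class name is just for illustration, and it only covers the five predefined entities, not numeric character references):

```java
public class Unescape {
    // Decodes the five predefined XML entities.
    // &amp; goes last so that "&amp;lt;" becomes "&lt;" and is not decoded twice.
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&apos;", "'")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescape("a &lt; b &amp;&amp; c &gt; d"));
        System.out.println(unescape("&amp;lt;"));
    }
}
```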

  • (cs)

    </exact>Do<exact> </not>valid<not> </case>xml<case> </condition>,<condition> </position>Yoda<position> </action>can..<action>

  • (cs) in reply to Paul
    Paul:
    Hey, thanks for posting that. I've got to tell you that this guy was a relative of the big boss, and we had some difficulty letting him go, but we eventually did. He keeps calling me, trying to get his job back.

    The whole program never worked and was rewritten from scratch a few months later. sNot as a variable name is cool.

    MERYL, how do I do XML? grumble

  • (cs)
  • MooseBrains (unregistered)

    I still prefer the method my ex-employer used to parse XML files. A "spider" would crawl the document tree, parsing "unwieldy" XML like this:

    <there>
        <was>Something</was>
        <in the="air">tonight</in>
    </there>
    

    into the much more readable:

    there.was=Something
    there.in.the=air
    there.in=tonight
    
  • Proof (unregistered) in reply to MooseBrains
    MooseBrains:
    I still prefer the method my ex-employer used to parse XML files. A "spider" would crawl the document tree, parsing "unwieldy" XML like this:
    <there>
        <was>Something</was>
        <in the="air">tonight</in>
    </there>
    

    into the much more readable:

    there.was=Something
    there.in.the=air
    there.in=tonight
    

    "there was Something there in the air there in tonight"

    is not more readable than

    "there was Something was in the air tonight in there"

  • (cs) in reply to occasional reader
    occasional reader:
    How can you like the sNot variable and miss the sExact variable entirely!!!

    It's far more promiscuous

    It makes me wonder where the sHit variable is. Oh yeah, it's the design method.

  • (cs) in reply to MooseBrains
    MooseBrains:
    I still prefer the method my ex-employer used to parse XML files. A "spider" would crawl the document tree, parsing "unwieldy" XML like this:
    <there>
        <was>Something</was>
        <in the="air">tonight</in>
    </there>
    

    into the much more readable:

    there.was=Something
    there.in.the=air
    there.in=tonight
    
    (this is to fix the stupid broken forum software)
    I have a problem with this example (apart from the obvious, that is). The output is ambiguous. While it could describe the xml snippet, surely it could also describe:
    <there>
      <was>Something</was>
      <in>
        <the>air</the>
        tonight
      </in>
    </there>
    Shouldn't an XML parsing process be reversible? I mean, this would be a bizarre way to deal with xml anyway, but if it didn't distinguish between attributes and content then that would seem a bigger WTF...
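
    The ambiguity is easy to demonstrate. Below is a rough Java sketch of what such a "spider" might look like; it is a guess at the behaviour described, not the ex-employer's actual code, and the `Flatten` class and its methods are hypothetical. Every attribute and every non-blank text node becomes a dotted `key=value` entry, and both XML snippets end up with the same three entries.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class Flatten {
    // Hypothetical reconstruction: attributes and non-blank text nodes
    // are both written out as dotted key=value entries.
    static Set<String> flatten(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Set<String> out = new TreeSet<>();
            walk(root, root.getNodeName(), out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    private static void walk(Node element, String path, Set<String> out) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            out.add(path + "." + a.getNodeName() + "=" + a.getNodeValue());
        }
        for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk(c, path + "." + c.getNodeName(), out);
            } else if (c.getNodeType() == Node.TEXT_NODE
                    && !c.getNodeValue().trim().isEmpty()) {
                out.add(path + "=" + c.getNodeValue().trim());
            }
        }
    }

    public static void main(String[] args) {
        String withAttribute =
                "<there><was>Something</was><in the=\"air\">tonight</in></there>";
        String withChild =
                "<there><was>Something</was><in><the>air</the>tonight</in></there>";
        // Attribute and child element are indistinguishable after flattening.
        System.out.println(flatten(withAttribute).equals(flatten(withChild)));
    }
}
```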
  • Dan (unregistered) in reply to fruey

    The SimpleXML library is pretty awesome, though.

  • (cs)

    This one wins hands-down on WTF-iness over my first "SOAP Parser". It actually used the XML Parser implementation in Java ... then discarded the whole SOAP metadata and extracted only the contents. And expected only certain tags. Look at it cross-eyed and it would barf. Ow!

  • (cs) in reply to danixdefcon5
    danixdefcon5:
    This one wins hands-down on WTF-iness over my first "SOAP Parser". It actually used the XML Parser implementation in Java ... then discarded the whole SOAP metadata and extracted only the contents. And expected only certain tags. Look at it cross-eyed and it would barf. Ow!
    You can go even more WTF-y than that. Just implement the whole of WS-Addressing, WS-Security, and WS-MetadataExchange. And use it for high-performance video streaming...
  • al3 (unregistered)

    I'll bet the guy who created XPATH is feeling pretty silly right about now

  • MooseBrains (unregistered) in reply to JimM
    JimM:
    MooseBrains:
    I still prefer the method my ex-employer used to parse XML files. A "spider" would crawl the document tree, parsing "unwieldy" XML like this:
    <there>
        <was>Something</was>
        <in the="air">tonight</in>
    </there>
    

    into the much more readable:

    there.was=Something
    there.in.the=air
    there.in=tonight
    
    (this is to fix the stupid broken forum software)
    I have a problem with this example (apart from the obvious, that is). The output is ambiguous. While it could describe the xml snippet, surely it could also describe:
    <there>
      <was>Something</was>
      <in>
        <the>air</the>
        tonight
      </in>
    </there>
    Shouldn't an XML parsing process be reversible? I mean, this would be a bizarre way to deal with xml anyway, but if it didn't distinguish between attributes and content then that would seem a bigger WTF...

    Which was pretty much my point. It's a pointless parsing step that, if anything, makes parsing slower and just provides more opportunities for things to go wrong.

  • Jay (unregistered)

    If this piece of code worked, then surely the real WTH is why the file is created using XML to begin with. If the file always consists of the same five data elements in a fixed order, what possible gain was there from wrapping them in XML? Besides, that is, to satisfy some simpleminded rule that all data streams must be in XML.

    This is the same objection I have to about 90% of the uses of XML. Yes, if you have a very complex data stream, where data elements can occur in unpredictable order, and data elements can be embedded inside other data elements, then XML is beautiful. Like, say, in a word processing document, where the user could at any point want to insert italics or a footnote or a footnote with italics.

    But how did we get from, "This tool is useful for a small class of complex problems, where there is no simple solution and so we must use a complex solution," to "Let's use an awkward, complex solution for even the most simple problems!" Yes, it can be made to work, but the price is huge for what benefit?

    When Exxon writes a multi-billion dollar oil contract with Saudi Arabia, I'm sure they need a team of engineers and geologists and lawyers and linguists and cross-cultural specialists to get it all hammered out. It's a complicated process that requires a complicated solution. Does this mean that I should bring such a team with me to McDonald's to write a contract for me to buy a hamburger? If it works for Exxon, it should work for me, right?

  • jl (unregistered)
    <Fail />
  • Edward Royce (unregistered) in reply to al3
    al3:
    I'll bet the guy who created XPATH is feeling pretty silly right about now

    Don't forget XQuery.

  • (cs)

    Even if we give this guy the benefit of the doubt and we assume that this XML was forced on him, it's still a really stupid way to do this. For the record, I suggest:

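    private static String extract(String rawXML)
    {
      // Content starts just after the first '>' (the end of the opening tag)
      int start = rawXML.indexOf(">") + 1;
      // and ends at the next '<' (the start of the closing tag)
      int end = rawXML.indexOf("<", start);
      return rawXML.substring(start, end);
    }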
    
  • (cs)

    This is why you should store data in an XLS file, not XML.

  • (cs) in reply to MooseBrains
    MooseBrains:
    I still prefer the method my ex-employer used to parse XML files. A "spider" would crawl the document tree, parsing "unwieldy" XML like this:
    <there>
        <was>Something</was>
        <in the="air">tonight</in>
    </there>
    

    into the much more readable:

    there.was=Something
    there.in.the=air
    there.in=tonight
    
    The.stars.were.bright=Fernando
  • Mr.'; Drop Database -- (unregistered)
    <comment>
    	<sentence type="question">
    		<interjection>
    			<word><character>W</character><character>h</character><character>a</character><character>t</character></word>
    			<word><character>t</character><character>h</character><character>e</character></word>
    			<word><character>f</character><character>u</character><character>c</character><character>k</character></word>
    			<punctuation>?</punctuation>
    		</interjection>
    	</sentence>
    </comment>
  • Franz Kafka (unregistered) in reply to Jay
    Jay:
    But how did we get from, "This tool is useful for a small class of complex problems, where there is no simple solution and so we must use a complex solution," to "Let's use an awkward, complex solution for even the most simple problems!" Yes, it can be made to work, but the price is huge for what benefit?

    For a lot of cases, it's easier to use an XML parser/emitter than it is to come up with some sort of file format.

  • Anonymouse (unregistered)

    So if they were to create a string with the exact position, would they name it sExPosition?

  • (cs)

    Okay, I'm feeling very stupid right now! I don't understand what the goal of this program was.

    From what I can see, it's just stripping out XML tags. How does that help parse the data within? Wouldn't this, in fact, make it impossible to parse the data?

    Or did I forget to take my common-sense pills again?

  • phexitol (unregistered) in reply to Paul
    Paul:
    I got to tell you that this guy was a relative of the big boss...

    I'm sure he'll do ok, I heard he's helping a computer system called GW, for a company named Outer Haven...

  • phexitol (unregistered) in reply to phexitol

    ...helping develop

  • Paul (unregistered) in reply to donniel
    donniel:
    Okay, I'm feeling very stupid right now! I don't understand what the goal of this program was.

    From what I can see, it's just stripping out XML tags. How does that help parse the data within?

    It IS parsing the XML

    Imagine XML input

    <Rule>
     <Exact>maybe</Exact>
     <Not>so</Not>
     <Case>eh?</Case>
     <Condition>chronic</Condition>
     <Position>onTop</Position>
     <Action>ooh,err?</Action>
    </Rule>

    After running the program, you have:

    sExact = "maybe"
    sNot = "so"
    sCase = "eh?"
    sCondition = "chronic"
    sPosition = "onTop"
    sAction = "ooh,err?"

    So, it's an XML parser. Much more compact than the other things you'll find around, and it works perfectly...

    ;)
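
    For comparison, here is a minimal Java sketch that pulls the same values out with the JDK's built-in DOM parser, so that element order and line breaks stop mattering. `RuleReader` and `readRule` are illustrative names, and it assumes each field is a direct child of the root element with plain text content, as in the sample above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RuleReader {
    // Reads each child element of the root into a name -> trimmed text map.
    static Map<String, String> readRule(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> fields = new LinkedHashMap<>();
            for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(c.getNodeName(), c.getTextContent().trim());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse rule", e);
        }
    }

    public static void main(String[] args) {
        // Elements are out of order and split across lines on purpose.
        String xml = "<Rule>\n <Not>so</Not>\n <Exact>\n  maybe\n </Exact>\n"
                   + " <Action>ooh,err?</Action>\n</Rule>";
        Map<String, String> rule = readRule(xml);
        System.out.println(rule.get("Exact") + " / " + rule.get("Not")
                + " / " + rule.get("Action"));
    }
}
```

    Element names are still case-sensitive here, because XML itself is: `<Exact>` and `<exact>` end up under different keys.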

  • (cs) in reply to Paul
    Paul:
    donniel:
    Okay, I'm feeling very stupid right now! I don't understand what the goal of this program was.

    From what I can see, it's just stripping out XML tags. How does that help parse the data within?

    It IS parsing the XML

    ... </Rule>

    After running the program, you have sExact = "maybe" ... sAction = "ooh,err?"

    So, it's an XML parser. Much more compact than the other things you'll find around, and it works perfectly...

    ;)

    That's...brillant.

    I'm gonna gouge out my eyes and go cry myself to sleep now.

  • Robert S. Robbins (unregistered)

    You could just convert the XML to JSON and then strip out the brackets and curly braces.

  • Jay (unregistered) in reply to Franz Kafka
    Franz Kafka:
    Jay:
    But how did we get from, "This tool is useful for a small class of complex problems, where there is no simple solution and so we must use a complex solution," to "Let's use an awkward, complex solution for even the most simple problems!" Yes, it can be made to work, but the price is huge for what benefit?

    For a lot of cases, it's easier to use an XML parser/emitter than it is to come up with some sort of file format.

    I'm not suggesting that one should invent an entirely new formatting scheme for every problem. Rather, that one should use simple formats for simple problems and reserve complex formats for complex problems.

    I think a very large set of problems could be solved with CSV, Java properties files, and Web form variables. And those are all way simpler for a human reader to comprehend, and can be parsed with a few lines of code. Sure, I can use a Saxon library or whatever to parse XML, it's not likely I have to write a parser from scratch. But that's like saying, Hey, it doesn't matter that the door to our store is 30 feet off the ground: we're happy to provide anyone who asks with a ladder.
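
    As a rough illustration of the "few lines of code" point, here is a Java sketch that reads a rule from properties-file text using `java.util.Properties`. The key names and values are borrowed from the sample rule posted earlier in the thread and are only illustrative; the `RuleProperties` class is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class RuleProperties {
    // Parses key=value lines; Properties takes care of comments and whitespace.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        // Illustrative content; in practice this would come from a file.
        String text = "exact=maybe\nnot=so\ncase=eh?\n"
                    + "condition=chronic\nposition=onTop\naction=ooh,err?\n";
        Properties rule = load(text);
        System.out.println(rule.getProperty("exact") + " " + rule.getProperty("action"));
    }
}
```

    Order does not matter and a missing key simply comes back as `null`, which is about as much flexibility as a flat list of six fields needs.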
