• Prime Mover (unregistered)

    I do so love a story where Perl and regexps are the heroes.

  • LCrawford (unregistered)

    When you work for Nunich and are on a tight deadline, the frist tool in the toolbox is single-element-SOAP.

  • RFlaum (unregistered)

    Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.

  • RFlaum (unregistered)

    Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.

  • RFlaum (unregistered) in reply to RFlaum

    And because of stupidity issues on my part, there's an extraneous copy of this comment.

  • (nodebb)

    TRWTF is taking 5 minutes to parse a 20 megabyte XML file. The order of magnitude would be correct for a 20 GB file instead.

  • (nodebb)

    This is terrible, but it can be solved with a text reader which un-escapes the inner XML and forwards it to an XML reader. I doubt Perl has those facilities, though.
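
    For what it's worth, any conforming XML parser already un-escapes entities when you read an element's text content, so the "forward it to an XML reader" step is nearly a one-liner, Perl included. A minimal sketch with XML::LibXML, assuming a hypothetical <payload> element holds the escaped inner document:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use XML::LibXML;

        # Parse the outer wrapper document first.
        my $outer = XML::LibXML->load_xml(location => 'response.xml');

        # Reading the text content un-escapes &lt; &gt; &amp; for us,
        # leaving the inner document as a plain XML string...
        my $inner_string = $outer->findvalue('//payload');

        # ...which can then simply be parsed a second time.
        my $inner = XML::LibXML->load_xml(string => $inner_string);
        print $inner->documentElement->nodeName, "\n";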

  • (nodebb)

    "TRWTF is taking 5 minutes to parse a 20 megabyte XML file." ... when I started, data was 10 characters per inch...63,360 inches per mile... so 315.65 miles of data.... with the PC09 (high speed reader, doing 300 bytes per second.... 18.51 hours would be about right.....

    [Of course, there was no such thing as XML at the time.... but still a measure of how far we have come....]

  • (author) in reply to RFlaum

    TIL that our CMS will eat the encoded characters inside of HTML comments for some reason. That's… a choice.

  • sokobahn (unregistered) in reply to Planar

    In 2003, when XML was all the rage and Windows XP ran on 512 MB of RAM, 20 MB was a pretty decent file size.

    The JVM would get 64-128 MB (it looks like Yazeran was benchmarking his code on his workstation); byte buffers, ASCII bytes inflating to 16-bit Java chars (20 MB of ASCII becomes 40 MB the moment it lands in a String, before any tree is built)... I can hear the swapping begin. Yes, a 20 MB XML file could take 5 minutes to parse.

  • richarson (unregistered) in reply to Prime Mover

    Obligatory XKCD :)

    https://xkcd.com/208/

  • (nodebb)

    This gives me flashbacks to a SOAP API I used to work with for a Big Expensive SaaS Product.

    It had a single method, into which you passed various XML strings depending on what you wanted it to do, and it returned various other strings of XML when it worked. When it didn't work, it returned a .NET stack trace and an HTTP 500 status.

  • ZZartin (unregistered)

    "XML is large and bureaucratic and complicated, but that complexity comes with benefits - namespaces, schemas, validation, and so on."

    Yep, so beneficial that XML is being replaced with JSON.

  • (nodebb) in reply to Planar

    "TRWTF is taking 5 minutes to parse a 20 megabyte XML file. The order of magnitude would be correct for a 20 GB file instead."

    That really depends on when we're talking about; computers are a lot faster now than they used to be. Also, most of the speed in modern XML handling comes from not needing to hold the whole lot in memory at once; having a big document in encoded form inside another one is just horrible. (It may well have been further wrapped in the SOAP envelope stuff, but that at least isn't using encoded payloads.)

    To cap it all, some of the older SOAP implementations were awfully slow due to doing some really stupid things; for example, the original version of Apache Axis was truly awful as soon as you started doing any kind of DOM writing. Fortunately, nobody sane uses it any more.
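
    For anyone who hasn't seen what "not holding the whole lot in memory" looks like: a pull parser visits one node at a time and discards it afterwards, so memory stays flat however big the file gets. A rough Perl sketch with XML::LibXML::Reader (the file name and Coordinate element are made up for illustration):

        use strict;
        use warnings;
        use XML::LibXML::Reader;

        # Stream through the document node by node instead of building a DOM.
        my $reader = XML::LibXML::Reader->new(location => 'big.xml')
            or die "cannot open big.xml: $!";
        while ($reader->read) {
            next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
                     && $reader->name eq 'Coordinate';
            printf "x=%s y=%s\n",
                $reader->getAttribute('x'), $reader->getAttribute('y');
        }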

  • Tim (unregistered)

    If you live in the UK and want to know about inefficient data encoding methods, just scan one of those COVID check-in QR codes with a standard QR code reader.

  • Randal L. Schwartz (google) in reply to Prime Mover

    I've made quite a good living over the decades because of this.

  • Randal L. Schwartz (google)

    If you ever ask a question, and part of the answer is "SOAP" or "XML", you've asked the wrong question. :)

  • Duke of New York (unregistered)

    I'm glad when a story that involves Perl begins "Many years ago"

  • (author) in reply to ZZartin

    To be fair, most of the protocols and technologies around JSON are basically attempts to reinvent most of XML's functionality badly. XML wasn't good, but doing ad-hoc XML in JSON isn't strictly better.

    Well, at least it's not YAML, I suppose.

  • (nodebb)

    Original submitter here.

    It was not 'many years ago', but only 5 years ago, and the computer I used was a minimum-configuration virtual server (the minimum configuration at the time I requested it was plenty for the intended workload: a small PostgreSQL database with an Apache frontend, used only for development work). In production it went a lot faster, as that server was a more powerful virtual server configuration.

    And to add insult to injury, that web service is still running in production and still returning that XML-in-XML monstrosity (although now I can parse and load the response a lot faster when I need to update my data once a week, as the server hardware has improved in the last 5 years)....

    Yazeran

  • Kythyria (unregistered) in reply to ZZartin

    I wondered when one of these comments would happen (I'm amazed nobody has posted that rant about how you can't recognise a regular language using regular expressions if it's part of a context-sensitive language).

    The advantages JSON has over XML are 1) lower overhead before compression, and 2) a data model that fits automatic serialisation better, specifically JS serialisation at that.

    The lack of tooling is only an "advantage" if you're the kind of person who is scared of type annotations (but then writes comments that formalise things anyway). So because schema languages, validators, serialisers for !JS, API frameworks, etc. are all useful things, they got reinvented.

  • (nodebb) in reply to Yazeran1

    Well ... five years ago you could have used XML::Rules for the inner XML. You set a few rules and you get a trimmed-down data structure with a way lower memory footprint than DOM-style parsers and way less complexity than SAX-style parsers. (Yes, the original author here.)
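
    For readers who haven't met XML::Rules, a minimal sketch of the idea (the element names here are invented, not from Yazeran's actual feed): declare a rule per tag, and the parser keeps only what the rules ask for, discarding the rest as it goes.

        use strict;
        use warnings;
        use XML::Rules;

        my $parser = XML::Rules->new(rules => {
            # Keep each <Coordinate .../> as a hash of its attributes.
            Coordinate => 'as array',
            # Everything else just passes its children up the tree.
            _default   => 'pass',
        });

        my $data = $parser->parse(
            '<doc><Coordinate x="48.14" y="11.58"/><Coordinate x="45.73" y="11.40"/></doc>');
        print "x=$_->{x} y=$_->{y}\n" for @{ $data->{Coordinate} };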

  • Staticsan (unregistered)

    Reminds me of working with another dev in a previous job to build a SOAP RPC call between two systems, his and mine. It was going well until I had to escape < and > characters. Took a little bit of explaining that he needed to run the de-escape mechanism after pulling the data out of the XML, not before... I lost quite some respect for my colleague's skillset when that happened.
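
    The classic symptom of getting that order wrong is double escaping: pre-escape the text yourself and the XML layer dutifully escapes the ampersands again, so < goes over the wire as &amp;lt;. A tiny Perl illustration (XML::LibXML here, but any XML library behaves the same way):

        use strict;
        use warnings;
        use XML::LibXML;

        my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

        # Wrong: escaping by hand, then letting the serializer escape again.
        my $bad = $doc->createElement('expr');
        $bad->appendText('1 &lt; 2');
        print $bad->toString, "\n";   # <expr>1 &amp;lt; 2</expr>

        # Right: hand over raw text; the XML layer escapes it exactly once,
        # and the parser on the receiving end un-escapes it on extraction.
        my $good = $doc->createElement('expr');
        $good->appendText('1 < 2');
        print $good->toString, "\n";  # <expr>1 &lt; 2</expr>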

  • xtal256 (unregistered) in reply to Kythyria

    "So because schema languages, validators, serialisers for !JS, API frameworks, etc, are all useful things, they got reinvented."

    It is my opinion that every time something gets re-invented it ends up worse than before.

  • aalien (unregistered)

    I'd say it more closely corresponds to writing a letter, cutting out the individual letters and gluing them back together like you're some serial killer from a movie, and then stuffing everything into a box, just so that the receiver can cry out: "What's in the boooox?!"

  • Kayaman (unregistered)

    So I wrapped my XML in XML, which was the style at the time. Now, to get the data I needed required a SOAP call, and in those days, webservices were poorly designed. "Gimme all the data in an XML wrapped in an XML," you'd say.

  • (nodebb)

    Poor Munich gets misspelled as Nunich at least once, and moved far to the south - Coordinate x="45.73155848600081" y="11.395289797465072"

  • Charles (unregistered) in reply to Randal L. Schwartz

    If you ever ask a question, and part of the answer is "SOAP" or "XML", you may have asked the right question but addressed it to the wrong person.

  • Officer Johnny Holzkopf (unregistered) in reply to BernieTheBernie

    If you mean München, use "München", not "Munich". It's not just a matter of Unicode / UTF-8 being available everywhere and being the de-facto standard encoding for everything; it's surely also a matter of respect. Or do you see Germans running around saying (and spelling!) things like "Voar-shing-tonn" or "Noy York" or "President Bee-den" or "Shtar Varrs"? NB: The cat in bag.

  • (nodebb) in reply to Officer Johnny Holzkopf

    How about you first convince the people at Das offizielle Stadtportal:

    https://www.muenchen.de/int/en/traffic/public-transport.html

  • (nodebb) in reply to Officer Johnny Holzkopf

    Right, except for all the languages in the world whose keyboards don't have that u with dots. It's not just Unicode. Also, most languages have names for foreign geographic entities that differ from those in the entity's mother tongue. Or do you expect the whole world to spell and pronounce Deutschland the German, sorry, the Deutsche way? We can't say Germany any more either?

  • (nodebb) in reply to Officer Johnny Holzkopf

    It's called an exonym, and it's totally not an English-only phenomenon: Germans call, for example, the city of Lviv (Львів), often in the news in recent days, "Lemberg", and if you look at the same section of the world on maps in different languages, you'll find the habit of coining names is quite universal.

  • (nodebb) in reply to tjahns

    I'm almost afraid to tell Mr. Officer that we call that city Mnichov, the capital Berlín and I'll let the kind reader find Brémy, Norimberk, Drážďany and Lipsko on his or her own. :-)

    But of course there is no Prag and no Pilsen!
