• Prime Mover (unregistered)

    I do so love a story where Perl and regexps are the heroes.

  • LCrawford (unregistered)

    When you work for Nunich and are on a tight deadline, the frist tool in the toolbox is single-element-SOAP.

  • RFlaum (unregistered)

    Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.

  • RFlaum (unregistered)

    Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.

  • RFlaum (unregistered) in reply to RFlaum

    And because of stupidity issues on my part, there's an extraneous copy of this comment.

  • (nodebb)

    TRWTF is taking 5 minutes to parse a 20 megabyte XML file. The order of magnitude would be correct for a 20 GB file instead.

  • (nodebb)

    This is terrible, but it can be solved with a text reader which un-escapes the inner XML and forwards it to an XML reader. I doubt Perl has those facilities, though.
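
    For what it's worth, any conforming XML parser already un-escapes entities when you read an element's text content, so the "forward it to an XML reader" step is nearly a one-liner, Perl included. A minimal sketch with XML::LibXML, assuming a hypothetical <payload> element holds the escaped inner document:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use XML::LibXML;

        # Parse the outer wrapper document first.
        my $outer = XML::LibXML->load_xml(location => 'response.xml');

        # Reading the text content un-escapes &lt; &gt; &amp; for us,
        # leaving the inner document as a plain XML string...
        my $inner_string = $outer->findvalue('//payload');

        # ...which can then simply be parsed a second time.
        my $inner = XML::LibXML->load_xml(string => $inner_string);
        print $inner->documentElement->nodeName, "\n";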

  • (nodebb)

    "TRWTF is taking 5 minutes to parse a 20 megabyte XML file." ... when I started, data was 10 characters per inch...63,360 inches per mile... so 315.65 miles of data.... with the PC09 (high speed reader, doing 300 bytes per second.... 18.51 hours would be about right.....

    [Of course, there was no such thing as XML at the time.... but still a measure of how far we have come....]

  • (author) in reply to RFlaum

    TIL that our CMS will eat the encoded characters inside of HTML comments for some reason. That's… a choice.

  • sokobahn (unregistered) in reply to Planar

    In 2003, when XML was all the rage and Windows XP ran on 512 MB of RAM, 20 MB was a pretty decent file size.

    The JVM would get 64-128 MB (it looks like Yazeran was benchmarking his code on his workstation); byte buffers, ASCII bytes inflating to 16-bit Java chars (20 MB of ASCII becomes 40 MB the moment it lands in a String, before any tree is built)... I can hear the swapping begin. Yes, a 20 MB XML file could take 5 minutes to parse.

  • richarson (unregistered) in reply to Prime Mover

    Obligatory XKCD :)

    https://xkcd.com/208/

  • (nodebb)

    This gives me flashbacks to a SOAP API I used to work with for a Big Expensive SaaS Product.

    It had a single method, into which you passed various XML strings depending on what you wanted it to do, and it returned various other strings of XML when it worked. When it didn't work, it returned a .NET stack trace and an HTTP 500 status.

  • ZZartin (unregistered)

    "XML is large and bureaucratic and complicated, but that complexity comes with benefits - namespaces, schemas, validation, and so on."

    Yep, so beneficial that XML is being replaced with JSON.

  • (nodebb) in reply to Planar

    "TRWTF is taking 5 minutes to parse a 20 megabyte XML file. The order of magnitude would be correct for a 20 GB file instead."

    That really depends on when we're talking about; computers are a lot faster now than they used to be. Also, most of the speed in modern XML handling comes from not needing to hold the whole lot in memory at once; having a big document in encoded form inside another one is just horrible. (It may well have been further wrapped in the SOAP envelope stuff, but that at least isn't using encoded payloads.)

    To cap it all, some of the older SOAP implementations were awfully slow due to doing some really stupid things; for example, the original version of Apache Axis was truly awful as soon as you started doing any kind of DOM writing. Fortunately, nobody sane uses it any more.
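
    For anyone who hasn't seen what "not holding the whole lot in memory" looks like: a pull parser visits one node at a time and discards it afterwards, so memory stays flat however big the file gets. A rough Perl sketch with XML::LibXML::Reader (the file name and Coordinate element are made up for illustration):

        use strict;
        use warnings;
        use XML::LibXML::Reader;

        # Stream through the document node by node instead of building a DOM.
        my $reader = XML::LibXML::Reader->new(location => 'big.xml')
            or die "cannot open big.xml: $!";
        while ($reader->read) {
            next unless $reader->nodeType == XML_READER_TYPE_ELEMENT
                     && $reader->name eq 'Coordinate';
            printf "x=%s y=%s\n",
                $reader->getAttribute('x'), $reader->getAttribute('y');
        }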

  • Tim (unregistered)

    If you live in the UK and want to know about inefficient data encoding methods, just scan one of those COVID check-in QR codes with a standard QR code reader.

  • Randal L. Schwartz (google) in reply to Prime Mover

    I've made quite a good living over the decades because of this.

  • Randal L. Schwartz (google)

    If you ever ask a question, and part of the answer is "SOAP" or "XML", you've asked the wrong question. :)

  • Duke of New York (unregistered)

    I'm glad when a story that involves Perl begins "Many years ago"

  • (author) in reply to ZZartin

    To be fair, most of the protocols and technologies around JSON are basically attempts to reinvent most of XML's functionality badly. XML wasn't good, but doing ad-hoc XML in JSON isn't strictly better.

    Well, at least it's not YAML, I suppose.

  • (nodebb)

    Original submitter here.

    It was not 'many years ago', but only 5 years ago, and the computer I used was a minimum-configuration virtual server (the minimum configuration at the time I requested it was plenty for the intended workload: a small PostgreSQL database with an Apache frontend, used only for development work). In production it went a lot faster, as that server was a more powerful virtual server configuration.

    And to add insult to injury, that web service is still running in production and still returning that XML-in-XML monstrosity (although now I can parse and load the response a lot faster when I need to update my data once a week, as the server hardware has improved in the last 5 years)....

    Yazeran

  • Kythyria (unregistered) in reply to ZZartin

    I wondered when one of these comments would happen (I'm amazed nobody has posted that rant about how you can't recognise a regular language using regular expressions if it's part of a context-sensitive language).

    The advantages JSON has over XML are 1) lower overhead before compression, and 2) a data model that fits automatic serialisation better, specifically JS serialisation at that.

    The lack of tooling is only an "advantage" if you're the kind of person who is scared of type annotations (but then writes comments that formalise things anyway). So because schema languages, validators, serialisers for !JS, API frameworks, etc. are all useful things, they got reinvented.

  • (nodebb) in reply to Yazeran1

    Well ... five years ago you could have used XML::Rules for the inner XML. You set a few rules and you get a trimmed-down data structure with a way lower memory footprint than DOM-style parsers and way less complexity than SAX-style parsers. (Yes, the original author here.)
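
    For readers who haven't met XML::Rules, a minimal sketch of the idea (the element names here are invented, not from Yazeran's actual feed): declare a rule per tag, and the parser keeps only what the rules ask for, discarding the rest as it goes.

        use strict;
        use warnings;
        use XML::Rules;

        my $parser = XML::Rules->new(rules => {
            # Keep each <Coordinate .../> as a hash of its attributes.
            Coordinate => 'as array',
            # Everything else just passes its children up the tree.
            _default   => 'pass',
        });

        my $data = $parser->parse(
            '<doc><Coordinate x="48.14" y="11.58"/><Coordinate x="45.73" y="11.40"/></doc>');
        print "x=$_->{x} y=$_->{y}\n" for @{ $data->{Coordinate} };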

  • Staticsan (unregistered)

    Reminds me of working with another dev in a previous job to build a SOAP RPC call between two systems, his and mine. It was going well until I had to escape < and > characters. Took a little bit of explaining that he needed to run the de-escape mechanism after pulling the data out of the XML, not before... I lost quite some respect for my colleague's skillset when that happened.
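
    The classic symptom of getting that order wrong is double escaping: pre-escape the text yourself and the XML layer dutifully escapes the ampersands again, so < goes over the wire as &amp;lt;. A tiny Perl illustration (XML::LibXML here, but any XML library behaves the same way):

        use strict;
        use warnings;
        use XML::LibXML;

        my $doc = XML::LibXML::Document->new('1.0', 'UTF-8');

        # Wrong: escaping by hand, then letting the serializer escape again.
        my $bad = $doc->createElement('expr');
        $bad->appendText('1 &lt; 2');
        print $bad->toString, "\n";   # <expr>1 &amp;lt; 2</expr>

        # Right: hand over raw text; the XML layer escapes it exactly once,
        # and the parser on the receiving end un-escapes it on extraction.
        my $good = $doc->createElement('expr');
        $good->appendText('1 < 2');
        print $good->toString, "\n";  # <expr>1 &lt; 2</expr>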

  • xtal256 (unregistered) in reply to Kythyria

    "So because schema languages, validators, serialisers for !JS, API frameworks, etc, are all useful things, they got reinvented."

    It is my opinion that every time something gets re-invented it ends up worse than before.

  • aalien (unregistered)

    I'd say it more closely corresponds to writing a letter, cutting out the individual letters and gluing them back together like you're some serial killer from a movie, and then stuffing everything into a box, just so that the receiver can cry out: "What's in the boooox?!"

  • Kayaman (unregistered)

    So I wrapped my XML in XML, which was the style at the time. Now, to get the data I needed required a SOAP call, and in those days, webservices were poorly designed. "Gimme all the data in an XML wrapped in an XML," you'd say.

  • (nodebb)

    Poor Munich gets misspelled as Nunich at least once, and moved far to the south - Coordinate x="45.73155848600081" y="11.395289797465072"

  • Charles (unregistered) in reply to Randal L. Schwartz

    If you ever ask a question, and part of the answer is "SOAP" or "XML", you may have asked the right question but addressed it to the wrong person.

  • Officer Johnny Holzkopf (unregistered) in reply to BernieTheBernie

    If you mean München, use "München", not "Munich". It's not just a matter of Unicode / UTF-8 being available everywhere and being the de-facto standard encoding for everything; it's surely also a matter of respect. Or do you see Germans running around saying (and spelling!) things like "Voar-shing-tonn" or "Noy York" or "President Bee-den" or "Shtar Varrs"? NB: The cat in bag.

  • (nodebb) in reply to Officer Johnny Holzkopf

    How about you first convince the people at Das offizielle Stadtportal:

    https://www.muenchen.de/int/en/traffic/public-transport.html

  • (nodebb) in reply to Officer Johnny Holzkopf

    Right, except for all the languages in the world whose keyboards don't have that u with dots. It's not just Unicode. Also, most languages have names for foreign geographic entities that differ from those in the entity's mother tongue. Or do you expect the whole world to spell and pronounce Deutschland the German, sorry, the Deutsche way? We can't say Germany any more either?

  • (nodebb) in reply to Officer Johnny Holzkopf

    It's called an exonym, and it's totally not an English-only phenomenon: Germans call, for example, the city of Lviv (Львів), often in the news in recent days, "Lemberg", and if you look at the same section of the world on maps in different languages, you'll find the habit of coining names is quite universal.

  • (nodebb) in reply to tjahns

    I'm almost afraid to tell Mr. Officer that we call that city Mnichov, the capital Berlín and I'll let the kind reader find Brémy, Norimberk, Drážďany and Lipsko on his or her own. :-)

    But of course there is no Prag and no Pilsen!
