Admin
I do so love a story where Perl and regexps are the heroes.
Admin
When you work for Nunich and are on a tight deadline, the frist tool in the toolbox is single-element-SOAP.
Admin
Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.
Admin
Because of escaping issues in the "easy reader version" comment, there's an extraneous --> at the end of the article.
Admin
And because of stupidity issues on my part, there's an extraneous copy of this comment.
Admin
TRWTF is taking 5 minutes to parse a 20 megabyte XML file. The order of magnitude would be correct for a 20 GB file instead.
Admin
This is terrible but can be solved with a text reader which un-escapes the inner XML and forwards it to an XML reader. I doubt Perl has those facilities, though.
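A minimal sketch of that two-pass approach, in Python rather than Perl (purely illustrative): the outer parser already un-escapes text nodes, so the recovered text can simply be fed to a second parse.

```python
import xml.etree.ElementTree as ET

# Outer document: the payload is an escaped XML document stored as text.
outer = """<response>
  <payload>&lt;data&gt;&lt;item id="1"&gt;hello&lt;/item&gt;&lt;/data&gt;</payload>
</response>"""

# Pass 1: parsing the outer envelope un-escapes the text node for free.
inner_text = ET.fromstring(outer).findtext("payload")

# Pass 2: the un-escaped text is itself a well-formed XML document.
inner = ET.fromstring(inner_text)
print(inner.find("item").get("id"))   # -> 1
print(inner.findtext("item"))         # -> hello
```

(Perl's XML modules can of course do the same thing; the point is only that no special "un-escaping reader" is needed, because the first parse does the un-escaping.)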
Admin
"TRWTF is taking 5 minutes to parse a 20 megabyte XML file." ... when I started, data was 10 characters per inch... 63,360 inches per mile... so about 31.6 miles of data... with the PC09 (high-speed reader, doing 300 bytes per second)... 18.5 hours would be about right...
[Of course, there was no such thing as XML at the time.... but still a measure of how far we have come....]
Admin
TIL that our CMS will eat the encoded characters inside of HTML comments for some reason. That's… a choice.
Admin
In 2003, when XML was all the rage and Windows XP ran on 512 MB of RAM, a 20 MB file was a pretty decent size.
The JVM would get 64-128 MB (it looks like Yazeran was benchmarking his code on his workstation), byte buffers, ASCII bytes widened to 16-bit Java chars... I hear swapping beginning... Yes, a 20 MB XML file could take 5 minutes to parse.
Admin
Obligatory XKCD :)
https://xkcd.com/208/
Admin
This gives me flashbacks to a SOAP API I used to work with for a Big Expensive SaaS Product.
It had a single method, into which you passed various different XML strings depending on what you wanted it to do, and it returned various other strings of XML when it worked. When it didn't work, it returned a .net stack trace and an HTTP 500 status.
Admin
Yep, so beneficial that XML is being replaced with JSON.
Admin
That really depends on when we're talking about; computers are a lot faster now than they used to be. Also, most of the speed in modern XML handling comes from not needing to hold the whole lot in memory at once; having a big document in encoded form inside another one is just horrible. (It may well have been further wrapped in the SOAP envelope stuff, but that at least isn't using encoded payloads.) To cap it all, some of the older SOAP implementations were awfully slow due to doing some really stupid things; for example, the original version of Apache Axis was truly awful as soon as you started doing any kind of DOM writing. Fortunately, nobody sane uses it any more.
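The "not holding the whole lot in memory" point can be sketched like this (Python's `iterparse` here stands in for any streaming/SAX-style parser; the tag names are made up):

```python
import xml.etree.ElementTree as ET
from io import StringIO

# A large-ish document streamed element by element instead of
# being built into a full in-memory DOM.
doc = "<log>" + "".join(f"<entry>{i}</entry>" for i in range(1000)) + "</log>"

total = 0
# iterparse yields events as tags close; clearing each handled element
# keeps memory roughly constant regardless of document size.
for event, elem in ET.iterparse(StringIO(doc), events=("end",)):
    if elem.tag == "entry":
        total += int(elem.text)
        elem.clear()          # drop the subtree we no longer need

print(total)  # -> 499500
```

An XML-encoded document nested inside another defeats exactly this: the streaming parser sees the inner document as one giant text node that must be materialised before the second parse can start.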
Admin
If you live in the UK and want to know about inefficient data encoding methods, just scan one of those COVID check-in QR codes with a standard QR code reader.
Admin
I've made quite a good living over the decades because of this.
Admin
If you ever ask a question, and part of the answer is "SOAP" or "XML", you've asked the wrong question. :)
Admin
I'm glad when a story that involves Perl begins with "Many years ago".
Admin
To be fair, most of the protocols and technologies around JSON are basically attempts to reinvent most of XML's functionality badly. XML wasn't good, but doing ad-hoc XML in JSON isn't strictly better.
Well, at least it's not YAML, I suppose,
Admin
Original submitter here.
It was not "many years ago", but only 5 years ago, and the computer I used was a minimum-configuration virtual server (the minimum configuration at the time I requested it was plenty for the intended workload: a small PostgreSQL database with an Apache frontend, and only for development work; in production it went a lot faster, as that server was a more powerful virtual server configuration).
And to add insult to injury, that web service is still running in production and still returning that XML-in-XML monstrosity (although now, when I need to update my data once a week, I can parse and load the response a lot faster, as the server hardware has improved in the last 5 years)...
Yazeran
Admin
I wondered when one of these comments would happen (I'm amazed nobody has posted that rant about how you can't recognise a regular language using regular expressions if it's part of a context-sensitive language).
The advantages JSON has over XML are 1) lower overhead before compression, 2) data model that fits automatic serialisation better, and specifically JS serialisation at that.
The lack of tooling is only an "advantage" if you're the kind of person who is scared of type annotations (but then writes comments that formalise the types anyway). So because schema languages, validators, serialisers for !JS, API frameworks, etc, are all useful things, they got reinvented.
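The "lower overhead before compression" point is easy to see with a toy record (the field names here are made up): every XML field name appears twice, once in the opening tag and once in the closing tag.

```python
import json

# A hypothetical record serialised both ways.
record = {"id": 1, "name": "widget", "price": 9.99}

as_json = json.dumps(record)
as_xml = ("<record><id>1</id><name>widget</name>"
          "<price>9.99</price></record>")

# The XML form repeats each field name in its closing tag,
# so it is longer before any compression is applied.
print(len(as_json), len(as_xml))
```

(Compression largely eats that difference, which is why the overhead argument matters most for uncompressed transports.)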
Admin
Well ... five years ago you could have used XML::Rules for the inner XML. You set a few rules and you get a trimmed down datastructure with way lower memory footprint than DOM style parsers and way less complexity than SAX style parsers. (Yes, the original author here.)
Admin
Reminds me of working with another dev in a previous job to build a SOAP RPC call between two systems, his and mine. It was going well until I had to escape < and > characters. Took a little bit of explaining that he needed to run the de-escape mechanism after pulling the data out of the XML, not before... I lost quite some respect for my colleague's skillset when that happened.
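The order bug described above can be demonstrated in a few lines (Python here, purely illustrative): un-escaping before parsing injects bare `<` and `&` into the markup, while parsing first lets the parser do the un-escaping itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

# A payload containing characters that must be escaped inside XML.
msg = "<payload>a &lt; b &amp;&amp; c &gt; d</payload>"

# Wrong order: un-escaping the raw message first makes it
# no longer well-formed XML.
try:
    ET.fromstring(unescape(msg))
    wrong_order_ok = True
except ET.ParseError:
    wrong_order_ok = False

# Right order: parse first; the parser un-escapes the text node itself.
text = ET.fromstring(msg).text
print(wrong_order_ok, text)  # -> False a < b && c > d
```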
Admin
"So because schema languages, validators, serialisers for !JS, API frameworks, etc, are all useful things, they got reinvented."
It is my opinion that every time something gets re-invented it ends up worse than before.
Admin
I'd say it more closely corresponds to writing a letter, cutting out the individual letters and gluing them back together like you're some serial killer from a movie, and then stuffing everything into a box, just so that the receiver can cry out: "What's in the boooox?!"
Admin
So I wrapped my XML in XML, which was the style at the time. Now, to get the data I needed required a SOAP call, and in those days, webservices were poorly designed. "Gimme all the data in an XML wrapped in an XML," you'd say.
Admin
Poor Munich gets misspelled as Nunich at least once, and moved far to the south - Coordinate x="45.73155848600081" y="11.395289797465072"
Admin
If you ever ask a question, and part of the answer is "SOAP" or "XML", you may have asked the right question but addressed it to the wrong person.
Admin
If you mean München, use "München", not "Munich". It's not just a matter of Unicode / UTF-8 being available everywhere and also being the de-facto standard encoding for everything, but surely a matter of respect, or do you see Germans running around and saying (and spelling!) things like "Voar-shing-tonn" or "Noy York" or "President Bee-den" or "Shtar Varrs"? NB: The cat in bag.
Admin
How about you first convince the people at Das offizielle Stadtportal:
https://www.muenchen.de/int/en/traffic/public-transport.html
Admin
How about the authors of München's "Das offizielle Stadtportal" looked up their official URL (muenchen.de) as well as what they've already written on the German pages? Same for Nürnberg - "Nuremberg" (nuernberg.de). But hey, let's arbitrarily exchange letters in cities of other countries, for example Londem and Liwerpul in Grät Britain, Waschingten and Neu Jorg in the Junited States of Amerika, and provide them with a German-language variant of their official homepages where the inhabitants can read how different countries like to translate their city names. Maybe it makes them feel more international?
Admin
Right, except for all the languages in the world whose keyboards don't have that u with dots. It's not just Unicode. Also, most languages have names for foreign geographic entities that differ from those in the entity's mother tongue. Or do you expect the whole world to spell and pronounce Deutschland the German, sorry, the Deutsche, way? We can't say Germany anymore either?
Admin
It's called an exonym, and it's totally not an English-only phenomenon: Germans call e.g. the city of Lviv (Львів), often in the news in recent days, Lemberg, and if you look at the same section of the world via different language maps, you'll find the habit of coining names is quite universal.
Admin
I'm almost afraid to tell Mr. Officer that we call that city Mnichov, the capital Berlín and I'll let the kind reader find Brémy, Norimberk, Drážďany and Lipsko on his or her own. :-)
But of course there is no Prag and no Pilsen!