• bvs23bkv33 (unregistered)

    Doesn't look like SAX parsing to me, and for enterpriseness they should use Oracle XML Developer's Kit.

  • someone (unregistered)

    "XML is TRWTF"

    For once, I agree with a "TRWTF" accusation (the other one being JSON requiring keys to be quoted in all circumstances).

  • (nodebb)

    Your first tests use small files containing every possible type of element, to make sure everything is handled properly and your Case Else branch never gets called, or at least logs the unexpected elements. Then you test with big files, like the 10,000 or so records of the client database. If nothing else, it will show you how much your application slows down under that kind of stress test. Of course, "crash the application and take the operating system with it" is the epitome of "slow down under heavy load", but I only did that once.
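A minimal Python sketch of that testing idea, assuming a hypothetical export format with <customer> and <order> elements: each known element type is dispatched to a handler, and anything else falls through to the equivalent of Case Else, where it is logged and collected rather than silently skipped. A small fixture containing one unexpected element makes the fallback easy to check.

```python
import logging
import xml.etree.ElementTree as ET
from typing import List, Tuple

logger = logging.getLogger("import")

# Hypothetical element names and handlers, for illustration only.
def handle_customer(elem: ET.Element, out: list) -> None:
    out.append(("customer", elem.get("id")))

def handle_order(elem: ET.Element, out: list) -> None:
    out.append(("order", elem.get("id")))

HANDLERS = {"customer": handle_customer, "order": handle_order}

def process(root: ET.Element) -> Tuple[List[tuple], List[str]]:
    """Dispatch each child element; log and record anything unexpected."""
    handled: List[tuple] = []
    unexpected: List[str] = []
    for child in root:
        handler = HANDLERS.get(child.tag)
        if handler is None:
            # The equivalent of a Case Else branch: never ignore silently.
            logger.warning("unexpected element <%s>", child.tag)
            unexpected.append(child.tag)
        else:
            handler(child, handled)
    return handled, unexpected

# Small fixture: every element type we expect, plus one we don't.
SAMPLE = '<export><customer id="c1"/><order id="o1"/><invoice id="i1"/></export>'
handled, unexpected = process(ET.fromstring(SAMPLE))
```

Once the small fixtures pass with an empty unexpected list, the same process function can be pointed at the large files for the stress test.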

  • (nodebb) in reply to bvs23bkv33

    XmlReader is the .NET equivalent to StAX, and the right tool for this job, albeit a clumsy one. And Oracle XDK hasn't been updated in a decade and a half.

    ... you're right, let's go with MSXML4.
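For readers outside .NET, a rough Python analogue of pull parsing with XmlReader/StAX is xml.etree.ElementTree.XMLPullParser: the caller feeds input in chunks and drains completed events, so only one chunk of raw text is held at a time. The function name, the chunk size and the <row> tag below are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

def count_records(stream, tag: str, chunk_size: int = 64 * 1024) -> int:
    """Pull-parse `stream` chunk by chunk, counting <tag> elements."""
    parser = ET.XMLPullParser(events=("end",))
    count = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        # Drain every event completed by this chunk.
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                count += 1
                # Drop the element's children and text; the empty shell
                # stays attached to its parent.
                elem.clear()
    parser.close()
    for _event, elem in parser.read_events():  # any events flushed by close()
        if elem.tag == tag:
            count += 1
    return count
```

A deliberately small chunk_size is useful in tests, because it forces tags to be split across chunk boundaries.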

  • Zach (unregistered)

    Your HTTPS certificate expired 3 minutes ago

  • Andrew (unregistered)

    TRWTF is certificate invalidation

  • Brian (unregistered)

    You'd think a site dedicated to cataloging various IT-related WTFs would be able to avoid a rather basic one themselves by proactively renewing their certs.

  • Björn Tantau (unregistered)

    TRWTF is using Let's Encrypt without automation.

  • Chris H. (unregistered)

    Your TLS certificate expired this morning; maybe your Let's Encrypt scripts are failing? D'oh.
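Since an expired certificate is easy to miss even with renewal automation, one option is a separate check against what the server actually presents. This is a minimal Python sketch using the standard ssl module; the helper names and any alert threshold are assumptions. Checking the live endpoint, rather than the renewed file on disk, would also catch the case the author describes below, where renewal succeeds but the web server keeps using the old certificate.

```python
import socket
import ssl
import time
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan 15 09:34:43 2018 GMT".
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400

def fetch_not_after(host: str, port: int = 443) -> str:
    """Return the notAfter field of the certificate a live server presents.

    The default context verifies the chain, so an already-expired
    certificate makes the handshake itself fail with an ssl.SSLError.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, a scheduled job could alert when days_until_expiry(fetch_not_after("example.com")) drops below 14, or when fetch_not_after raises.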

  • TMMITW (unregistered)

    Irony: The one site devoted to writing about failures in computing... has a failure. Your site certificate expired.

  • Gumpy Gus (unregistered)

    Been there! The OpenStreetMap files are XML, many gigabytes of XML. Reading them by any simple canned method results in many hours of CPU chugging. I finally bit the bullet and wrote some x86 asm code, using the 512-bit vector ops, to search and extract elements. Not foolproof, but like 50 times faster than anything else.
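The commenter's AVX-512 assembly is out of scope here, but for comparison, the usual standard-library baseline for multi-gigabyte OSM files is a streaming iterparse that clears the tree as it goes. This Python sketch assumes the standard OSM XML layout (<node> elements with id, lat and lon attributes and <tag k v/> children, alongside <way> and <relation>); it makes no attempt to match the speed of hand-written SIMD.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

def find_nodes(source, key: str, value: str) -> Iterator[Tuple[str, float, float]]:
    """Yield (id, lat, lon) for every <node> carrying <tag k=key v=value/>."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of <osm>
    for event, elem in context:
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.iter("tag")}
            if tags.get(key) == value:
                yield elem.get("id"), float(elem.get("lat")), float(elem.get("lon"))
        # Detach every finished top-level element from <osm>, so the tree
        # does not grow with the file.
        root.clear()
```

Because it is a generator, results can be written out or aggregated as they arrive instead of being collected into one big list.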

  • YMMV (unregistered)

    TRWTF is multi-gigabyte XML files. Get an ETL tool from any reputable vendor and put the data into a database.
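Without a vendor ETL tool, the same idea can be sketched with the Python standard library: stream the XML with iterparse and let sqlite3 consume the rows through executemany, so the whole document never has to be in memory. The <record> element, its attributes and the table layout are hypothetical, and a real ETL tool adds scheduling, validation and error reporting that this omits.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

def load_records(source, conn: sqlite3.Connection) -> int:
    """Stream <record id=".." name=".."/> elements into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, name TEXT)")

    def rows():
        # Generator: executemany pulls one row at a time from the parser.
        for _event, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == "record":
                yield elem.get("id"), elem.get("name")
                elem.clear()

    with conn:  # commits on success, rolls back if anything raises
        cur = conn.executemany("INSERT INTO records (id, name) VALUES (?, ?)", rows())
    return cur.rowcount  # summed over all executemany iterations
```

Once the data is in a table, queries like "how many records per customer" become SQL instead of another pass over gigabytes of XML.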

  • doubting_poster (unregistered)

    For completeness' sake, there are three approaches in C#, the third being to map the XML data onto predefined C# classes (much as Java libraries map JSON to POJOs). It requires making sure everything is properly serialisable, but once you do, it's a convenient way of dumping object state to disk.
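In .NET this third approach is typically XmlSerializer. Python has no direct standard-library equivalent, but a minimal sketch of the same idea maps child elements onto the fields of a dataclass by name. The Customer class and from_element helper are hypothetical, and the conversion assumes each field's type can be built from a string (int, float, str).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

@dataclass
class Customer:  # hypothetical record type
    id: int
    name: str
    balance: float

def from_element(cls, elem: ET.Element):
    """Build a `cls` instance from child elements whose tags match its field names."""
    kwargs = {}
    for f in fields(cls):
        text = elem.findtext(f.name)
        if text is None:
            raise ValueError(f"<{elem.tag}> is missing <{f.name}>")
        kwargs[f.name] = f.type(text)  # assumes simple constructor types
    return cls(**kwargs)
```

Failing loudly on a missing field mirrors the "properly serialisable" requirement: the class definition, not the parser, decides what a valid record looks like.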

  • (author) in reply to Björn Tantau

    The renewal worked perfectly. It's just that IIS 7 doesn't like updating the certificate it's supposed to use without me going in and forcing it to.

  • Ozz (unregistered)

    RAM is cheap. Just get moar RAMs.

  • OnceUponATime (unregistered)

    Once upon a time, Microsoft forgot to renew/auto-renew microsoft.com. Some bloke did it for them and very kindly let them know. As I recall, it was a teenager or someone like that who noticed when he couldn't get through to something at Microsoft.

  • WTFGuy (unregistered)

    I wonder whether the spec mentioned anything about the size of XML files this thing was expected to handle. In fact, I wonder whether there was a spec at all.

  • (nodebb)

    Gigabyte-sized XMLs? Is this what the cool kids call "big data"?

  • K. (unregistered) in reply to Ben Lubar

    TRWTF is setting up IIS.

  • UK Pedant (unregistered) in reply to OnceUponATime

    "Microsoft forgot to renew/auto-renew microsoft.com" - seems it was hotmail.com in 1999 and, not learning their lesson, hotmail.co.uk in 2003: https://whoapi.com/blog/1582/5-all-time-domain-expirations-in-internets-history/

  • UK Pedant (unregistered)

    Christ, can't even get it right this morning when correcting someone else! It was passport.com, not hotmail.com!

  • Barf4Eva (unregistered)

    parsing very large XML files? well, I haven't quit my job yet... :P


  • Rocky (unregistered)

    I've had to support REST services that take XML inside a JSON string. And the clients don't make sure that the XML's declared encoding is UTF-8, which is the default JSON encoding.

    Apparently it's not a problem that the services barf out errors in production because of mixed encodings, especially when someone enters a € sign on a form, since the only two officially supported encodings are UTF-8 and ISO-8859-1, and ISO-8859-1 has no € sign.

    I'm not allowed to fix the client code since it belongs to another department. Instead, we regularly handle production error tickets and fix all the encoding problems by hand.
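A minimal Python sketch of a defensive fix on the service side, assuming the XML arrives in a JSON field called xml (a hypothetical name): after json.loads the embedded XML is already Unicode text, so whatever encoding its declaration claims is stale. Rewriting the declaration to UTF-8 before handing UTF-8 bytes to the parser avoids the € sign being misread as ISO-8859-1.

```python
import json
import re
import xml.etree.ElementTree as ET

# Matches the encoding="..." (or '...') part of a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')

def parse_embedded_xml(payload: str, field: str = "xml") -> ET.Element:
    """Parse XML carried as a string inside a JSON payload."""
    xml_text = json.loads(payload)[field]  # already decoded Unicode text
    # Make the declaration match the bytes we are about to produce.
    xml_text = _DECL_ENCODING.sub(r'\1\2UTF-8\2', xml_text, count=1)
    return ET.fromstring(xml_text.encode("utf-8"))
```

Payloads without a declaration pass through unchanged, since UTF-8 is already the XML default.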

Leave a comment on “To Read or Parse”
