When JSON started to displace XML as the default data format for the web, my initial reaction was, "Oh, thank goodness." Time passed, and people reinvented schemas for JSON and RPC APIs in JSON and wrote tools which turn JSON schemas into UIs and built databases which store BSON, which is JSON with extra steps, and… it makes you wonder what it was all for.
Then people like Mark send in some code with the subject line "WHY??!??!". It's code which handles some XML, in C#.
Now, a useful fact: C# has a rich set of APIs for handling XML, and like most XML APIs, they implement two approaches.
The simplest and most obvious is the DOM-style approach, where you load an entire XML document into memory and construct a DOM out of it. It's easy to manipulate, but for large XML documents it can strain the available memory.
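For the unfamiliar, a minimal sketch of the DOM-style approach might look like this; the file name "orders.xml" and the element names are invented purely for illustration:

using System;
using System.Xml.Linq;

class DomApproach
{
    static void Main()
    {
        // Parse the entire document into an in-memory tree in one shot.
        XDocument doc = XDocument.Load("orders.xml");

        // Once it's loaded, querying and rewriting the tree is trivial.
        foreach (XElement order in doc.Descendants("Order"))
        {
            Console.WriteLine(order.Attribute("Id")?.Value);
        }
    }
}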
The other is the "reader" approach, where you treat the document as a stream, and read through the document, one element at a time. This is a bit trickier for developers, but scales better to large XML files.
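The reader equivalent, sketched here with the same hypothetical file and element names, never holds more than the current node in memory:

using System;
using System.Xml;

class ReaderApproach
{
    static void Main()
    {
        using XmlReader reader = XmlReader.Create("orders.xml");

        // Walk the document node by node; only the reader's current
        // position lives in memory, no matter how big the file is.
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Order")
            {
                Console.WriteLine(reader.GetAttribute("Id"));
            }
        }
    }
}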
So let's say that you're reading a multi-gigabyte XML file. You'd want to quit your job, obviously. But assuming you didn't, you'd want to use the "reader" approach, yes? There's just one problem: the reader approach requires you to go through the document element-by-element, and you can't skip around easily.
public void ReadXml(XmlReader reader)
{
    string xml = reader.ReadOuterXml();
    XElement element = XElement.Parse(xml);
    …
}
Someone decided to give us the "best of both worlds". They load the multi-gigabyte file using a reader, but instead of going elementwise through the document, they use ReadOuterXml to pull the entire document in as a string. Once they have the multi-gigabyte string in memory, they then feed it into the XElement.Parse method, which turns the multi-gigabyte string into a multi-gigabyte DOM structure.
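For the record, a genuine "best of both worlds" does exist: stream through the file with the reader and materialize only one record's subtree at a time. A sketch of that pattern, again using the hypothetical "Order" elements from the earlier examples:

using System;
using System.Xml;
using System.Xml.Linq;

class StreamedSubtrees
{
    static void Main()
    {
        using XmlReader reader = XmlReader.Create("orders.xml");
        reader.MoveToContent();

        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Order")
            {
                // Materialize just this one element's subtree; ReadFrom
                // also advances the reader past it, so memory use is
                // bounded by a single record, not the whole document.
                XElement order = (XElement)XNode.ReadFrom(reader);
                Console.WriteLine(order.Attribute("Id")?.Value);
            }
            else
            {
                reader.Read();
            }
        }
    }
}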
You'll be shocked to learn that this code was tested with small test files, not multi-gigabyte ones, worked fine under those conditions, and thus ended up in production.