Every industry has information that needs to be moved back and forth between disparate systems. If you've lived a wholesome life, those systems are just different applications on the same platform. If you've strayed from the Holy Path, those systems are written using different languages on different platforms running different operating systems on different hardware with different endian-ness. Imagine some Java app on Safari under some version of Mac OS needing to talk to some version of .NET under some version of Windows needing to talk to some EBCIDIC-speaking version of COBOL running on some mainframe.

Long before anyone envisioned the above nightmare, we used to work with SGML, which devolved into XML, which was supposed to be a trivial tolerable way to define the format and fields contained in a document, with parsers on every platform, so that information could be exchanged without either end needing to know anything more than the DTD and/or schema for purposes of validation and parsing.

In a hopelessful attempt at making this somewhat easier, wrapper libraries were written on top of XML.

Sadly, they failed.

A hand holding a large pile of pills, in front of a background of pills

In the health care industry, some open-source folks created the (H)ealthcare (API), or HAPI project, which is basically an object oriented parser for text-based healthcare industry messages. Unfortunately, it appears they suffered from Don't-Know-When-To-Stop-Syndrome™.

Rather than implementing a generic parser that simply splits a delimited or fixed-format string into a list of text-field-values, the latest version implements 1205 different parsers, each for its own top-level data structure. Most top level structures have dozens of sub-structures. Each parser has one or more accessor methods for each field. Sometimes, a field can be a single instance, or a list of instances, in which case you must programmatically figure out which accessor to use.

That's an API with approximately 15,000 method calls! WTF were these developers thinking?

For example, the class: EHC_E15_PAYMENT_REMITTANCE_DETAIL_INFO can have zero or more product service sections. So right away, I'm thinking some sort of array or list. Thus, instead of something like:

    List<EHC_E15_PRODUCT_SERVICE_SECTION> prodServices = info.getProductServices();
    // iterate

... you need to do one of these:

    // Get sub-structure
    // Get embedded product-services from sub-structure

    // ...if you know for certain that there will be exactly one in the message:
    // ...if you don't know how many there will be:
    int n = infos.getPRODUCT_SERVICE_SECTIONReps();
    for (int i=0; i<n; i++) {
        // use it

    // ...or you can just grab them all and iterate

...and you need to call the correct one, or risk an exception. But having multiple ways of accomplishing the same thing via the API leads to multiple ways of doing the same thing in the code that is using the API, which invariably leads to problems.

So you might say, OK, that's not SO bad; you just use what you need. Until you realize that some of these data structures are embedded ten+ levels deep, each with dozens of sub-structures and/or fields, each with multiple accessors. With those really long names. Then you realize that the developers of the HAPI got tired of typing and just started using acronyms for everything, with such descriptive data structure names as: LA1, ILT and PCR.

The API does attempt to be helpful in that if it doesn't find what it's expecting in the field you ask it to parse, it throws an exception and it's up to you to figure out what went wrong. Of course, this implies that you already know what is being sent to you in the data stream.

Anonymous worked in the healthcare industry and was charged with maintaining a library that had been wrapped around HAPI. He was routinely assigned tasks (with durations of several weeks) to simply parse one additional field. After spending far too much time choking down the volumes of documentation on the API, he wrote a generic single-class 300 line parser with some split's, substring's, parseDate's and parseInt's to replace the whole thing.

Now adding an additional field takes all of ten minutes.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!