An Enterprise API

There’s a simple rule about “enterprise” software: if the word “enterprise” is used in any way to describe the product, it’s a terrible product. Enterprise products are usually pretty special purpose and serve a well-capitalized but usually relatively small market. You aren’t going to sell licenses to millions of companies, but maybe tens of thousands. Often hundreds. There are some extremely niche enterprise products that have only two or three customers.

Lots of money, combined with an actually small market in terms of customers, guarantees no real opportunity for competition. Couple that with the fact that each of your customers wants the off-the-shelf product you’re selling them to have every feature they need for their business case, you’re on a fast track to bloated software, inner platforms, and just general awfulness.

Today, we’re not talking about Oracle, though. Jacek is in the healthcare industry. Years ago, his company decided to shove off their patient scheduling and billing to a Software-as-a-Service enterprise product. Integrating that into all of their other systems has kept the development teams busy ever since, and each successive revision of the API has created a boatload of new work.

Currently, the API is on version six. V6 is notable because this is the first version which offers a REST/JSON API. V5 was a SOAP-based API.

Here’s an example message you might send to their scheduling services:

{
    "Id": 12,               
    "StartDateTime": "2018-06-16T11:00:00",
    "EndDateTime": "2018-06-16T11:30:00",             
    "Description": "Lunch"
}

That schedules a blackout window during which a service provider might be unavailable. Nothing about that seems too bad, does it?

What if, on the other hand, we wanted to schedule a wellness class, on say proper nutrition?

{
  "siteId": 7,
  "locationId": 2,
  "classScheduleId": 9,
  "classDescriptionId": 15,
  "resources": [
    {
      "id": 2,
      "name": "Conference Room 3"
    }
  ],
  "maxCapacity": 24,
  "webCapacity": 20,
  "staffId": 5,
  "staffName": "Sallybob Bobsen",
  "isActive": true,
  "startDate": "2018-07-22",
  "endDate": "2020-07-22",
  "startTime": "10:30:00",
  "endTime": "11:30:00",
  "daysOfWeek": [
    "Tuesday",
    "Thursday"
  ]
}

Well, that message on its own doesn’t look terrible either. But when you combine the two, you’ll notice that whether you use PascalCase or camelCase varies based on which endpoint you’re invoking.

If you want to create a client, you’ll post a document like:

{"FirstName": "Suebob", "LastName": "Bobsen"…}

I’ve skipped over a giant pile of fields- address, relationship to other clients, emergency contacts, etc. They’re not relevant, because I simply want to compare creating a client to updating a client:

{
  "Client": {
    "Id": "123456789",
    "LastName": "Bobsdottir",
    "CrossRegionalUpdate": false
  }
}

So we send different shaped documents to perform different operations on the same resource.

And it’s important to note, none of these resources have their own endpoints. In a RESTful webservice, I’d want to send my updates as a post to https://theservicehost/clients/123456789, while a create would just be a post to https://theservicehost/clients.

Here, we have different routes for addclient and updateclient. So this isn’t a RESTful API at all- it’s a shoddily assembled RPC API, probably just a JSON wrapper around their SOAP API. It has no concept of resources. Fine, we can accept that and move on, and hey, the documentation at least lays out the correct shape of the documents, and what the fields mean, right?

Well, look back at that update message. See the CrossRegionalUpdate field? Well, per the documentation:

When CrossRegionalUpdate is omitted or set to true, the client’s updated information is propagated to all of the region’s sites.

So, it sounds like if you have a region that contains multiple sites, you want to set that to true (or omit it), right? Wrong. Most customers should set the flag explicitly to false. When Jacek talked to a technical support rep from the company, they said: “You would know if this should be set to true. Just set it to false.”

With all that, you can imagine how their error handling works, which is to say, it doesn’t. There are two things which happen when an error occurs: it either returns a 400 status code with no body, or it returns a 302 status code which redirects to another endpoint which serves up a plain-text version of your request. Your request, not the response to the request.

This nearly created a serious problem. The requests contain personally identifiable information about patients. Their initial naive logging approach was just to dump error responses to a log file, assuming that the API would never return an error with PII in it. At first, it looked like they should just treat a 302 as an error, and log the response, but they discovered in testing that would basically be dumping PII into an insecure store reviewed by IT support personnel.

Jacek is “eagerly” looking forward to v7 of this API, which is certain to contain more “improvements”. The real question is: if this is their sixth attempt to get it right, what did v1 look like, and did anyone survive attempting to use it?