Felix caught a ticket about their OpenId authentication. For some mysterious reason, it had started failing around 30% of the time, specifically because the access token returned by the service was invalid.

Felix had originally written the code, but there was one problem: he wasn’t the last one to touch it. Another development team needed their own versions of the code, organized a bit differently, for infrastructure reasons. Eventually, the whole thing was turned into a drop-in library component that was used by all applications which depended on OpenId. The failures started after they made their changes, so obviously their changes caused the failures.

Since the errors were intermittent, their first guess was that the bug was something intermittent- perhaps an infrastructure problem, or a race condition between interacting services? They couldn’t reliably reproduce the error, so Felix spent a lot of time eliminating possibilities. Trawling through the code wasn’t very helpful. The other team had been operating under unrealistic deadlines and hacked together something that worked and wasn’t too worried about how or why it worked. The result included lots of un-patterns (like anti-patterns, but without having a pattern to them), inheritance trees that desperately needed pruning, and old-fashioned SQL injection vulnerabilities copy-pasted everywhere.

Eventually, buried deep in a common service adapter base class, no where near the code that was supposed to be responsible for managing authentication, he found this code for fetching the OpenId token:

    public async Task<T> GetAsync(string key)
    {
        T result = default(T);

        using (var httpClient = await CreateHttpClient())
        {
            HttpResponseMessage response = await httpClient.GetAsync(Route + "/" + key);
            if (response.IsSuccessStatusCode)
            {
                result = _serializer.Deserialize(
                    WebUtility.UrlDecode(response.Content.ReadAsStringAsync().Result)
                );
            }
        }

        return result;
    }

Felix cringed a little at seeing a call to CreateHttpClient with each execution- there was a lot of overhead there, and the whole thing would be more efficient if the code only did that once. Still, the problem wasn’t performance. Calling response.Content.ReadAsStringAsync().Result could actually cause deadlocks, and should have been await response.Content.ReadAsStringAsync(), but Felix wasn’t tracking down a deadlock.

In fact, Felix almost eliminated this code as the source of his problem, until he looked at the call to WebUtility.UrlDecode. The body of the response was a JSON object. There was no logical reason to do that. And the OpenId token was Base–64 encoded, using a mapping that included the “+” character. In URL terms, “+” might be known as %2B, thus mangling the token any time a "+" sign appeared in it, which coincidentally, happened about 30% of the time.

Felix fixed up this function, removing the nonsensical attempt to decode something which wasn’t encoded in the first place, and the bug went away.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!