A long, long time ago, in a phone company long since gone and resurrected, if Aunt Bee wanted to call Sheriff Andy, she picked up the phone, pressed the receiver a couple of times, the operator picked up, Bee said to connect her to Andy, and the operator shoved a jack into a hole to complete the circuit. For long distance calls, two or more operators and switchboards were involved. It left something to be desired, but it worked.

Fast forward a while, and Vint Cerf and Bob Kahn got the idea to automate it all with a packet-switched network. David Reed designed UDP, which helped eliminate the operators sitting at switchboard jack panels, but didn't guarantee delivery of the packets, and if the packets did arrive, there was no particular ordering to them. This fire-and-forget mechanism was fine for the sender. Not so much for the recipient.

Fast forward some more and they created TCP, which provided guaranteed delivery and reassembly of the logical message. This worked well for decades. Eventually, Tim Berners-Lee built the web on top of it all, and it still worked remarkably well.

At some point, people started building games to play on this wonderful web of worlds. Eventually, they realized that the games would be a lot more fun if players could share the field of battle and play with/against each other...

Stefan is a client engineer at a company that builds client-server real-time networking middleware for game development. They support more than 20 different platforms and 6 different programming languages. New features are implemented in the primary language and then ported to the other languages.

Since they are chronically short of developers and there are at least two years' worth of priority-one tasks in the queue, they often outsource any significant porting work that doesn't require much in the way of communicating requirements. The consulting companies hired to do the work are required to supply programmers who each have about a decade of experience doing the relevant type of work in the requisite language.

Typically, they give the consulting company the code for the feature with instructions to make it work in the same way (e.g., pass the same set of unit tests, but in the desired programming language).

The library communicates primarily via UDP, and that implementation is extremely well tested and fairly bulletproof. An alternate mechanism that uses TCP was also available, but was only offered on platforms that didn't support UDP. Eventually, some customers wanted to run in the cloud and use services that didn't support UDP. Thus was born the need to port the client-side TCP implementation to C++.

This task was outsourced to a company in a land recently liberated from an oppressive overlord. The consulting company assessed the work and provided a written estimate of time and cost.

By the time more than triple the estimated time had passed, Stefan had spent more than half as long helping the consultants as it would have taken him to do the port himself. Since Stefan was already overloaded at that time, he didn't check every single line of the thousands of lines of code and tests that the consultants had produced.

Sadly, he subsequently wished that he had.

One of the test cases for handling a large UDP message would set up and validate that the correct thing happened when both sending and receiving a message that was beyond certain thresholds. Specifically, it would be broken down into smaller messages, transmitted, and reassembled on the receiving end. Since TCP is higher level than UDP and handles breaking up and reassembling messages automatically, this test case should succeed for TCP. That is, unless something has gone horribly wrong.
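To make the fragment-and-reassemble behavior concrete, here is a minimal C++17 sketch of the idea. It is not the library's code: the `Fragment` layout and the names `fragmentMessage` and `reassembleMessage` are hypothetical, and a real datagram implementation would also need message IDs, timeouts, and retransmission.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fragment layout, invented for this sketch.
struct Fragment {
    std::uint32_t index;                 // position of this fragment in the message
    std::uint32_t count;                 // total number of fragments in the message
    std::vector<std::uint8_t> payload;   // this fragment's slice of the message
};

// Split a message into fragments carrying at most maxPayload bytes each.
std::vector<Fragment> fragmentMessage(const std::vector<std::uint8_t>& message,
                                      std::size_t maxPayload) {
    assert(maxPayload > 0);
    const std::size_t count = (message.size() + maxPayload - 1) / maxPayload;
    std::vector<Fragment> fragments;
    fragments.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t begin = i * maxPayload;
        const std::size_t end = std::min(begin + maxPayload, message.size());
        fragments.push_back(Fragment{
            static_cast<std::uint32_t>(i),
            static_cast<std::uint32_t>(count),
            std::vector<std::uint8_t>(
                message.begin() + static_cast<std::ptrdiff_t>(begin),
                message.begin() + static_cast<std::ptrdiff_t>(end))});
    }
    return fragments;
}

// Rebuild the message from fragments that may have arrived in any order.
// Returns std::nullopt if a fragment is missing, duplicated, or inconsistent.
std::optional<std::vector<std::uint8_t>> reassembleMessage(std::vector<Fragment> fragments) {
    if (fragments.empty()) return std::nullopt;
    const std::uint32_t count = fragments.front().count;
    if (fragments.size() != count) return std::nullopt;
    std::sort(fragments.begin(), fragments.end(),
              [](const Fragment& a, const Fragment& b) { return a.index < b.index; });
    std::vector<std::uint8_t> message;
    for (std::uint32_t i = 0; i < count; ++i) {
        if (fragments[i].index != i || fragments[i].count != count) return std::nullopt;
        message.insert(message.end(),
                       fragments[i].payload.begin(), fragments[i].payload.end());
    }
    return message;
}
```

A test along the lines described would fragment a message larger than the threshold, deliver the fragments out of order, and check that the reassembled bytes match the original. Over TCP the byte stream already arrives in order, so the same round trip ought to pass without any of this machinery.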

However, the first couple of lines in the test case for this scenario provided by the consultants were:

if (useTcp) {
   return;
}

At first, Stefan thought that this had been added just to save a bit of time when running the test cases. Then the TCP implementation started spamming the logs in an endless loop, repeatedly spewing forth the same error codes. At this point, he decided to dive a bit deeper into the TCP implementation.

Unlike the authors of the original TCP implementation, the consultants had decided to reuse the UDP fragmentation code for TCP. When they realized that this wasn't working, they decided to fix it in a unique way. Rather than just porting the correct lines of code, they simply set a using-TCP flag and checked it whenever they dove into the UDP code being used to implement TCP, thus masking the problem instead of fixing it.

Stefan went in search of an assault rifle and a plane ticket...
