JR earned a bit of a reputation as the developer who could solve anything. Like most reputations, this was worse than it sounded, and it meant he got the weird heisenbugs. The weirdest and the worst heisenbugs came from Gerry, a developer who had worked for the company for many, many years, and left behind many, many landmines.
Once upon a time, in those Bad Old Days, Gerry wrote a C++ socket-server. In those days, the socket-server would crash any time there was an issue with network connectivity. Crashing services were bad, so Gerry “fixed” it. Whatever Gerry did fell into the category of “good enough”, but it had one problem: after any sort of network hiccup, the server wouldn’t crash, but it would take a very long time to start servicing requests again. Long enough that other processes would sometime fail. It was infrequent enough that the bug had stuck around for years, but finally, someone wanted Something Done™.
JR got Something Done™, and he did it by looking at the CreatSocket
method, buried deep in a "God" class of 15,000 lines.
void UglyClassThatDidEverything::CreatSocket() {
while (true) {
try {
m_pSocket = new Socket((ip + ":55043").c_str());
if (m_pSocket != null) {
// LOG.info("Creat socket");
m_pSocket->connect();
break;
} else {
// LOG.info("Creat socket failed");
// usleep(1000);
// sleep(1);
sleep(5);
}
} catch (...) {
if (m_pSocket == null) {
// LOG.info("Creat socket failed");
sleep(5);
CreatSocket();
sleep(5);
}
}
}
}
The try
portion of the code provides an… interesting take on handling socket creation. Create a socket, and grab a handle. If you don’t get a socket for some reason, sleep for 5 seconds, and then the infinite while loop means that it’ll try again. Eventually, this will hopefully get a socket. It might take until the heat death of the universe, or at least until the half-created-but-never-cleaned-up sockets consume all the filehandles on the OS, but eventually.
Unless of course, there’s an exception thrown. In that case, we drop down into the catch, where we sleep for 5 seconds, and then call CreatSocket
recursively. If that succeeds, we still have that extra call to sleep
which guarantees a little nap, presumably to congratulate ourselves for finally creating a socket.
JR had a simple fix for this code: burn it to the ground and replace it with a more normal approach to creating sockets. Unfortunately, management was a bit gun-shy about making any major changes to Gerry’s work. That recursive call might be more important than anyone imagined.
JR had a simpler, if stupider fix: remove the final call to sleep(5)
after creating the socket in the exception handler. It wouldn’t make this code any less terrible, but it would mean that it wouldn’t spend all that time waiting to proceed even after it had created a socket, thus solving the initial problem: that it takes a long time to recover after failure.
Unfortunately, management balked at removing a line of code. “It wouldn’t be there if it weren’t important. Instead of removing it, can you just comment it out?”
JR commented it out, closed VIM, and hoped never to touch this service again.