Properly used, version history can easily help you track down and identify the source of a bug. Improperly used, it still can. As previously established, the chief architect Dana works with has some issues with source control.
Dana works on a large, complex embedded system. “Suddenly”, her team started to spot huge piles of memory corruption problems. Something was misbehaving, but it was hard to see exactly what.
They ported Valgrind to their platform, just so they could try and figure out what was going wrong. Eventually, they tracked the problem down to a pair of objects.
In the flow of the code, the correct path was that object A, which we’ll call Monster
would be allocated. Then a second object would be allocated. Somehow, Monster
instances were corrupting the memory of the second object.
How does an object allocated earlier corrupt the memory of an object allocated later? Well, “before” and “after” have different meaning when your code is multi-threaded, which this was. Worse, the Monster
class was katamari of functionality rolled up across thousands of lines of code. Obviously, there had to be a race condition- but a quick glance at all the Monster
methods showed that they were using a mutex to avoid the race condition.
Or were they? Dana looked more closely. One of the methods called during the initialization process, doSomething
, was marked const
. In C++, that should mean that the method doesn’t change any property values. But if it doesn’t change any property values, how can it lock the mutex?
This is where walking through the commit history tells a story. “Fortunately” this was before Jerry learned you could amend a commit, so each step of his attempts to get the code to compile are recorded for posterity.
The chain of commits started with one labeled “Add Feature $X”, and our doSomething
method looked like this.
void doSomething() const {
Mutex::ScopedLock lock(mutex);
// Dozens of lines of code
}
Now, the intent here was to create a ScopedLock
object based off a mutex
property. But that required the mutex
property to change, which violated const
, which meant this didn’t even compile.
Which brings up our next commit, labeled “Fix compile failure”:
void doSomething() const {
Mutex::ScopedLock lock(mutex) const;
// Dozens of lines of code
}
Surprisingly, just slapping the const
declaration on the variable initialization didn’t do anything. The next commit, also helpfully labeled “Fix compile failure”:
void doSomething() const {
Mutex::ScopedLock lock(const mutex);
// Dozens of lines of code
}
Again, this didn’t work. Which brings us to the last “Fix compile failure” commit in this chain:
void doSomething() const {
Mutex::ScopedLock lock(const Mutex mutex);
// Dozens of lines of code
}
By randomly adding and subtracting symbols, Jerry was able to finally write a function which compiles. Unfortunately, it also doesn’t work, because this time, the line of code is a function declaration for a function with no implementation. It takes a mutex as a parameter, and returns a lock on that mutex. Since the declaration has no implementation, if we ever tried to call this in doSomething
, we’d get an error, but we don’t, because this was always meant to be a constructor.
The end result is that nothing gets locked. Thus, the race condition means that sometimes, two threads contend with each other and corrupt memory. Dana was able to fix this method, but the root cause was only fixed when Jerry left Initech to be a CTO elsewhere.