Rebecca inherited some code that’s responsible for gathering statistical data from a network interface. It was originally written a decade ago by one of those developers who wanted to make certain their influence was felt long after they left the company.
The code was supposed to write to one of two log files: a “quick log”, with 2-second resolution (but only for the last minute’s data), and a “full log”, with 1-minute resolution.
Unfortunately, it would often fail to write anything to the full log. Frustrated that this code, which had lived in a shipping product for over a decade, was so unreliable, Rebecca dug in to see what the problem was.
    #define FULL_SAMPLE_DELAY 60
    void dostats(FILE *quicklog, FILE* mainlog) {
        //[code omitted for brevity]
        while(!done) {
            sleep(2);
            if(!(times.now.t.tv_sec % FULL_SAMPLE_DELAY)) {
                // main samples
                stats.save(mainlog, times);
            }
            else {
                // quick samples
                quickstats.summarise(quicklog, quickstats_top_n);
            }
        }
    }
Rebecca’s predecessor had the good sense to use sleep() to keep the loop from spinning the CPU, but made one major error: he assumed that calling quickstats.summarise took no time. Even if the loop started at exactly the right time, the time spent executing quickstats.summarise guaranteed that eventually the current time wouldn’t line up with an even minute, and the full log would become unreliable. Once the loop drifts so that it wakes at second 59 and then at second 61, the modulo check never sees a multiple of 60, and that minute’s full sample is silently skipped.
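One way to avoid depending on the loop waking up on an exact second is to track a deadline and fire whenever the clock has reached or passed it. The sketch below is only an illustration, not the fix Rebecca actually shipped: full_sample_due and its next_full argument are hypothetical names, and it assumes the timestamp is a plain time_t in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define FULL_SAMPLE_DELAY 60

/* Hypothetical helper (not part of the original code).
 * Returns true when a full sample is due. *next_full holds the next
 * deadline in seconds; a value of 0 means "not scheduled yet". */
static bool full_sample_due(time_t now, time_t *next_full) {
    assert(next_full != NULL);

    /* The minute boundary strictly after `now`. */
    time_t following = now - (now % FULL_SAMPLE_DELAY) + FULL_SAMPLE_DELAY;

    if (*next_full == 0) {
        /* First call: schedule the upcoming boundary, nothing is due yet. */
        *next_full = following;
        return false;
    }
    if (now < *next_full) {
        return false;
    }
    /* Deadline reached or overshot: fire once and reschedule past `now`,
     * so a long stall yields a single sample rather than a burst. */
    *next_full = following;
    return true;
}
```

In the loop above, the modulo test could then become something like if(full_sample_due(times.now.t.tv_sec, &next_full)), with next_full initialised to 0 before the loop. A wake-up at second 61 still triggers the sample that was due at second 60, no matter how long quickstats.summarise takes.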