Our Anonymous submitter's first job was helping to support a distributed system running on a low-energy embedded platform. Interesting on its face, the platform was actually a bloated, outdated monstrosity made worse by the decision to use C++ in conjunction with a homemade (read: unsafe) binary data format.
The platform had one mysterious, catastrophic flaw in particular: once a cluster had been up and running for a few weeks, it would sometimes fail with a series of random segfaults, with several nodes crashing at once. Serial port debugging showed that many of these failures were preceded by an ominous log message, a single line with the number 10001 and nothing else.
After pondering the mystery for some time, someone thought to search for the number 10000 within the code base. Thus they found their culprit within the very core of their application: the code that read each record stored in their homemade binary files and copied it to memory.
#define END 0xFFFF
while (header->recordType != END) {
    // unknow (sic) loop times
    if (recordNum > 10000) {
        log(recordNum);
        return 0;
    } else {
        *header = *recordPtr;
        memcpy(memoryPtr, recordPtr + HEADER_SIZE, header->recordSize)
        *recordPtr += header->recordSize;
    }
    recordNum++;
}
Whenever one of the binary files got corrupted in some way, this code never found an END record, and thus proceeded to copy random memory chunks to other random memory chunks until its incredible security measures kicked in after just 10000 iterations. Even worse, the code would also alter other files read into memory. Those corrupted files were then transferred to other nodes, cutting a swath of destruction and fail across the entire platform.
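For contrast, here is a minimal sketch of what a bounds-checked version of that loop might look like. It is not the submitter's code or their eventual fix. The record layout (a 16-bit type followed by a 16-bit payload size, both in native byte order), the names readRecords, Record, kHeaderSize and kMaxRecords, and the choice to reject the whole file on any inconsistency are all assumptions made for illustration. The key difference is that the reader knows how large its input is and refuses to step past the end of it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical layout, assumed for illustration only:
// [u16 recordType][u16 recordSize][recordSize payload bytes] ... [END header]
// Both header fields are read in the machine's native byte order.
constexpr std::uint16_t kEndRecord = 0xFFFF;
constexpr std::size_t kHeaderSize = 4;      // two 16-bit fields
constexpr std::size_t kMaxRecords = 10000;  // same cap as the original loop

struct Record {
    std::uint16_t type;
    std::vector<std::uint8_t> payload;
};

// Returns every record before the END marker, or std::nullopt if the buffer
// is truncated, has no END marker, or claims more payload than it contains.
std::optional<std::vector<Record>> readRecords(const std::uint8_t* data,
                                               std::size_t size) {
    std::vector<Record> records;
    std::size_t offset = 0;  // invariant: offset <= size

    while (true) {
        // A full header must fit in the bytes that remain.
        if (size - offset < kHeaderSize) return std::nullopt;

        std::uint16_t type = 0;
        std::uint16_t length = 0;
        std::memcpy(&type, data + offset, sizeof type);
        std::memcpy(&length, data + offset + sizeof type, sizeof length);
        offset += kHeaderSize;

        if (type == kEndRecord) return records;

        // Refuse absurdly long files instead of logging and carrying on.
        if (records.size() >= kMaxRecords) return std::nullopt;

        // The declared payload must fit in the bytes that remain.
        if (length > size - offset) return std::nullopt;

        records.push_back(
            {type, std::vector<std::uint8_t>(data + offset,
                                             data + offset + length)});
        offset += length;
    }
}
```

A caller would pass the file's bytes and their length, then treat std::nullopt as "this file is corrupt, do not load it and do not forward it to other nodes". Because each payload is copied into its own vector only after its size has been checked against the remaining input, a bad header can at worst cause the file to be rejected, not overwrite neighboring memory.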
Our submitter and his cohorts resolved immediately to switch to an industry-standard, resilient, checksum-protected data format in the near future. And then, knowing full well what that implied ... they added a log message.