There's plenty of room for debate about what specific poor choices in history lead to the most bugs today. Was it the billion-dollar mistake of allowing null pointers? Is it the absolute mess that is C memory management? Or is it C-style strings and all the attendant functions and buffer-overruns they entail?
A developer at Jay's company had been porting some C++ code to a new platform. That developer left, and the wheel-of-you-own-this-now spun and landed on Jay. The code was messy, but mostly functional. Jay was able to get it building and running, and then added a new feature. It was during testing that Jay noticed that some fields in the UI weren't being populated.
Jay broke out a memory analyzer tool, and it popped out warnings on lines where strlcpy was being called. Now that was odd, as strlcpy is the "good" way to copy strings, with guarantees that it would never allow buffer overruns. The buffers were all correctly sized, which left Jay wondering what exactly was wrong with the calls to strlcpy?
A quick grep through the code later, and Jay knew exactly what was wrong:
#define strlcpy strncpy
The code originally had been targeting a platform which had strlcpy available, but the port was moving to a platform which did not. The previous developer, out of laziness, ignorance, carelessness, or some combination of all of those, decided that since strlcpy and strncpy take the same arguments in the same order, a macro could solve all their problems.
If you haven't had to deal with C-strings, or just general C-style conventions, recently, it's important to note a few things. First, C doesn't actually have strings as a datatype, it just has arrays of characters. Second, an array decays to a pointer to its first item as soon as you pass it to a function, and C doesn't do anything to enforce the length, which means you're free to access element 11 in a 10 element array, and C will let you. Finally, since "knowing how long a string is" might actually be important, the way C-strings address the problems above is that the last character in the string should be a null terminator. All the string handling functions know that if they see a null terminator, that's the end of the string, and that keeps your code from reading off the end of the array into some other block of memory, or worse, writing to that arbitrary block of memory.
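To make the null terminator convention concrete, here is a minimal sketch. The names greeting and greeting_length are made up for illustration; only strlen comes from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* A "string" is just a char array. The '\0' marks where the text ends,
 * even though the array itself has room for 8 characters.
 * (greeting is a hypothetical example buffer.) */
const char greeting[8] = {'h', 'e', 'l', 'l', 'o', '\0', '?', '?'};

/* strlen() walks the array until it meets the first '\0'.
 * It never consults the size of the array. */
size_t greeting_length(void)
{
    return strlen(greeting);
}
```

The array is 8 bytes long, but greeting_length() reports 5, because the terminator is the only thing telling strlen where to stop. Take the terminator away and strlen keeps walking into whatever memory comes next.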
Which brings us to the key difference between strlcpy and strncpy: the first one is "safer" and guarantees that the output buffer ends up null terminated (as long as the buffer size isn't zero), truncating the copy if it has to. strncpy makes no such guarantee; if the source string is at least as long as the size you pass in, there isn't room for a null terminator, and it just doesn't put one in.
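A small sketch of that strncpy behavior, assuming nothing beyond the standard library. The helper names has_terminator and strncpy_terminates are hypothetical, made up for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if a null terminator appears somewhere
 * within the first `size` bytes of buf. */
bool has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Copies src into dst the way the macro'd code effectively did,
 * then reports whether dst ended up null terminated. */
bool strncpy_terminates(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size);
    return has_terminator(dst, size);
}
```

With a 4-byte destination, copying "abc" leaves a terminated string, but copying "abcd" or anything longer fills all four bytes with characters and leaves no terminator at all. Any later strlen or printf on that buffer reads past its end.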
In other words, with one macro, Jay's predecessor had created hundreds of buffer-overrun vulnerabilities. Jay removed the macro, properly updated the calls to safely copy strings, and the errors went away.
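The story doesn't say what Jay's fix actually looked like. As one possible approach on a platform without strlcpy, here is a sketch of a truncating copy that always terminates. The name copy_string is hypothetical, and this is an illustration of the idea, not the code Jay wrote.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with strlcpy-like behavior: copies at most
 * size - 1 characters from src into dst and always null terminates
 * when size is nonzero. Returns strlen(src), so the caller can detect
 * truncation by checking whether the result is >= size. */
size_t copy_string(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        /* Leave one byte free for the terminator. */
        size_t n = (src_len < size - 1) ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "abcdefgh" into a 4-byte buffer with this helper yields "abc" and a return value of 8, which tells the caller that the string was truncated. Whether silent truncation is acceptable is a separate decision for each call site.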
In any case, let's close with this quote, from the "Bugs" section of the strncpy/strcpy manpage, which is just a fun read:
If the destination string of a strcpy() is not large enough, then anything might happen. Overflowing fixed-length string buffers is a favorite cracker technique for taking complete control of the machine. Any time a program reads or copies data into a buffer, the program first needs to check that there's enough space. This may be unnecessary if you can show that overflow is impossible, but be careful: programs can get changed over time, in ways that may make the impossible possible.