We return to Stefano Z, who once upon a time was a lowly junior developer surrounded by Very Smart™ and Very Senior™ developers.

Some of their code was written in a not-C language. It had to process many hundreds of millions of data points, and while it was fast, it wasn't fast enough. The senior devs had a solution: rewrite it in C, the fastest of all languages.

The rewrite happened, and there was just one little problem: the performance was significantly worse, and it now crashed consistently. The senior developers had a solution for this, too: they blamed Stefano.

"Some of the code you wrote has to be breaking it, my code is highly optimized. Look at how few memory allocations it makes!"

Let's walk through those highly optimized memory allocations and see how this code serializes things. We'll start with a rough sketch of the structure being serialized:

struct ImportantThing
{
	int someValue1;
	int someValue2;
	int someValue3;
	int someValue4;
	char* details;
};

The structure is obviously anonymized; the key thing is that there are a large number of small fields and one string that could be any length. Usually the string is NULL, and when it isn't, the average length is only a handful of characters.

Function names and variable names have been editorialized by Stefano.

char* VerySmartSerialize(const ImportantThing* arrayOfThings, size_t numberOfThings)
{
	const int AssumedTypicalStringSize = 50;
	size_t assumedNumberOfThingsWithNonNullString = numberOfThings;
	size_t cleverInitialBufferSize = (sizeof(ImportantThing) * numberOfThings)
		+ (AssumedTypicalStringSize * assumedNumberOfThingsWithNonNullString);

	char* buffer = (char*)malloc(cleverInitialBufferSize);
	char* positionInBuffer = buffer;
	size_t currentSize = cleverInitialBufferSize;

	for (size_t i = 0; i < numberOfThings; i++)
	{
		size_t itemSize = sizeof(ImportantThing);
		size_t stringLen = 0;
		if (arrayOfThings[i].details != NULL)
		{
			stringLen = strlen(arrayOfThings[i].details);
			itemSize += sizeof(size_t) + stringLen;
		}

		if (positionInBuffer + itemSize > buffer + currentSize)
		{
			// OMITTED for brevity: code that DOUBLES the buffer size!
			// (the double the size, the double the fun)
		}

		memcpy(positionInBuffer, &arrayOfThings[i], sizeof(ImportantThing));
		positionInBuffer += sizeof(ImportantThing);
		if (arrayOfThings[i].details != NULL)
		{
			*((size_t*)positionInBuffer) = stringLen;
			positionInBuffer += sizeof(size_t);
			memcpy(positionInBuffer, arrayOfThings[i].details, stringLen);
			positionInBuffer += stringLen;
		}
	}

	return buffer;
}

We start with a few assumptions at the top of our function. We assume that the typical string is going to hold 50 characters, and we assume that every instance of the structure has a non-null string.

These assumptions are false. They represent a worst case, which isn't inherently a bad choice, but they're not based in reality. We then allocate a buffer to hold all of this data in memory. Since that buffer might not be the right size, as we iterate over each item we check whether the buffer still has room, and if not, we double its size.
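
The growth code itself was omitted above, but a typical doubling step looks something like this sketch (my reconstruction of the usual pattern, not the original, omitted code):

#include <stdlib.h>

/* Double the capacity until itemSize more bytes fit, then realloc and
   re-anchor the write cursor into the new allocation. */
static char* GrowBuffer(char* buffer, char** positionInBuffer, size_t* currentSize, size_t itemSize)
{
	size_t used = (size_t)(*positionInBuffer - buffer);
	size_t newSize = *currentSize;
	while (used + itemSize > newSize)
	{
		newSize *= 2;	/* double the size, double the fun */
	}

	char* grown = (char*)realloc(buffer, newSize);
	if (grown == NULL)
	{
		return NULL;	/* the caller still has to cope with running out of memory */
	}

	*positionInBuffer = grown + used;
	*currentSize = newSize;
	return grown;
}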

So yes, this "optimized" function guarantees very few memory allocations. And at first, Stefano assumed that it was the doubling which was causing the crash. But this was itself a bad assumption, and Stefano had to go back to the original assumptions the senior devs had made.

They assumed that every entry had 50 characters of string, and they were frequently processing hundreds of millions of these entries. But at 50 bytes a pop, a mere 43 million entries already demand a buffer larger than 2GB before you even count the struct fields, and this was running on a 32-bit machine. That is not a thing that was possible.
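
As a back-of-the-envelope check (the 20-byte struct size is my guess for a 32-bit build with 4-byte ints and a 4-byte pointer; the 50 bytes and the 43 million come straight from the story):

#include <stdio.h>

int main(void)
{
	unsigned long long entries = 43000000ULL;		/* "a mere 43 million" */
	unsigned long long assumedStringBytes = 50ULL;	/* the hard-coded guess */
	unsigned long long structBytes = 20ULL;			/* assumed 32-bit struct size */

	unsigned long long stringsOnly = entries * assumedStringBytes;
	unsigned long long total = entries * (assumedStringBytes + structBytes);

	printf("strings alone: %llu bytes (~%.2f GiB)\n", stringsOnly, stringsOnly / (1024.0 * 1024 * 1024));
	printf("with structs:  %llu bytes (~%.2f GiB)\n", total, total / (1024.0 * 1024 * 1024));
	/* Roughly 2.0 GiB and 2.8 GiB: both past the 2GB wall for this 32-bit process. */
	return 0;
}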

Simply running this function on a realistically sized dataset caused the crash instantly. Which means the senior devs just never did that.

There was also no requirement that all the data had to be serialized into a single memory buffer. The rational, normal solution to a problem like this would be to work in chunks. Copy some meaningful subset of the data to a buffer, flush the buffer to your destination, copy the next chunk, repeat. This even hits the goal of "very few memory allocations", since you can just reuse the buffer.
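
A minimal sketch of that chunked approach, keeping the same struct-plus-length-plus-string layout as above (the chunk size and the FILE* destination are my inventions, not anything from the original system):

#include <stdio.h>
#include <string.h>

void ChunkedSerialize(const ImportantThing* arrayOfThings, size_t numberOfThings, FILE* destination)
{
	enum { ChunkBufferSize = 64 * 1024 };	/* one reusable buffer, allocated once */
	static char buffer[ChunkBufferSize];
	size_t used = 0;

	for (size_t i = 0; i < numberOfThings; i++)
	{
		size_t stringLen = arrayOfThings[i].details ? strlen(arrayOfThings[i].details) : 0;
		size_t itemSize = sizeof(ImportantThing) + (stringLen ? sizeof(size_t) + stringLen : 0);

		/* Flush before the buffer overflows; assumes no single item is larger
		   than the chunk, which holds when strings are a handful of characters. */
		if (used + itemSize > ChunkBufferSize)
		{
			fwrite(buffer, 1, used, destination);
			used = 0;
		}

		memcpy(buffer + used, &arrayOfThings[i], sizeof(ImportantThing));
		used += sizeof(ImportantThing);
		if (stringLen)
		{
			memcpy(buffer + used, &stringLen, sizeof(size_t));
			used += sizeof(size_t);
			memcpy(buffer + used, arrayOfThings[i].details, stringLen);
			used += stringLen;
		}
	}

	fwrite(buffer, 1, used, destination);	/* flush whatever is left */
}

Memory use stays flat no matter how many hundreds of millions of entries come through, and the buffer really is allocated exactly once.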

Unfortunately for Stefano, he was just a junior. A redesign like that would require changing a lot of the dependent code, and the seniors would never allow that to happen. Plus, he didn't quite have the experience to exactly see how such a solution would end up working. So he came up with a compromise solution: he scanned the input dataset and measured the length of all of the strings, and then allocated a buffer that was exactly the right size. It might "hurt performance", but it also didn't crash, so all in all, it was a net win.
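
Roughly, the measure-then-allocate version looks like this (a sketch of the approach as described, not Stefano's actual code):

#include <stdlib.h>
#include <string.h>

char* ExactSizeSerialize(const ImportantThing* arrayOfThings, size_t numberOfThings, size_t* serializedSize)
{
	/* First pass: measure exactly how many bytes the output needs. */
	size_t totalSize = 0;
	for (size_t i = 0; i < numberOfThings; i++)
	{
		totalSize += sizeof(ImportantThing);
		if (arrayOfThings[i].details != NULL)
		{
			totalSize += sizeof(size_t) + strlen(arrayOfThings[i].details);
		}
	}

	/* One allocation of exactly the right size: no guessing, no doubling. */
	char* buffer = (char*)malloc(totalSize);
	if (buffer == NULL)
	{
		return NULL;
	}

	/* Second pass: copy the data, same layout as the original serializer. */
	char* positionInBuffer = buffer;
	for (size_t i = 0; i < numberOfThings; i++)
	{
		memcpy(positionInBuffer, &arrayOfThings[i], sizeof(ImportantThing));
		positionInBuffer += sizeof(ImportantThing);
		if (arrayOfThings[i].details != NULL)
		{
			size_t stringLen = strlen(arrayOfThings[i].details);
			memcpy(positionInBuffer, &stringLen, sizeof(size_t));
			positionInBuffer += sizeof(size_t);
			memcpy(positionInBuffer, arrayOfThings[i].details, stringLen);
			positionInBuffer += stringLen;
		}
	}

	*serializedSize = totalSize;
	return buffer;
}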
