A More Unique Identifier

"Oh for crying out loud," Jeremy heard his cubicle-neighbor Andy shout, followed by a string of not-so-family-friendly expletives. "It's yet another duplicate GUID!"

Jeremy was intrigued. "Duplicate" is perhaps the least likely problem for a Globally Unique Identifier. With more than 340 billion trillion quadrillion (and that's no typo) possible values, the probability of having two identical GUIDs is basically non-existent. The probability of having multiple duplicate GUIDs is smaller than winning the lottery twice. On the same day. For every lottery held in the world.

"Duplicate GUIDs?" Jeremy stood up and asked over the cubicle wall. "How is that even possible?"

"Obviously it's bound to happen sooner or later," Andy responded. "I mean, we generate a lot of GUIDs. And I mean a lot. We really should have used a more unique identifier, like I had suggested earlier."

That last sentence, especially delivered with the told-you-so inflection, was the only clue Jeremy needed to know exactly what Andy was referring to. Months earlier, the development team had been presented with a bit of a unique challenge.

Unique Requirement

An automated data collection and processing application they were building required that a "dataset ID" be returned for every dataset that was uploaded to the Web service. This "dataset ID" could then be used by the consuming application to check on the processing status, cancel the processing request and, once processing was completed, retrieve the "processed dataset ID."

The tricky part in all this was that the processing application would never know how many IDs were issued or what IDs had been issued: It would somehow have to provide an ID that was always unique.

Given the globally unique requirement, the solution was obvious to Jeremy: Simply generate a GUID using the Windows API. Andy, on the other hand, hadn't used GUIDs in the past and didn't quite trust an algorithm to be smart enough to generate such an identifier. He didn't have a better idea, but was confident that, given enough time, he could cobble something together that utilized the computer's serial number, CPU footprint and a number of other factors.

"We're not generating that many GUIDs," Jeremy defended. "A thousand a day, tops. Statistically, we'd need to generate a hundred trillion every day for a mill-"

Andy cut him off. "Yeah, yeah, I remember your whole spiel. A billion, gazillion, fafillion, shabolubalu jillion zillion yillion ... Whatever. The fact is, we've got duplicates. It's causing all sorts of problems, and I'm going to have to spend all afternoon cleaning the mess for just this one duplicate."

Shifty Characters

Baffled, Jeremy decided to peek at the source code to see the problem. Perhaps it was a variable that was getting reused? Or maybe something in the cache?

After all of 10 minutes, Jeremy discovered the root of the problem:

// Swap two chars of dataset ID 
// to create processed ID
var dsID = dataSetGuid.ToString();
var pdsID = new StringBuilder();
pdsID.Append(dsID[1]); 
pdsID.Append(dsID[0]); 
pdsID.Append(dsID.Substring(2));
return new Guid(pdsID.ToString());

The code was checked in by Andy. In fairness, it will generate a new, unique GUID, provided that the first two characters of the GUID aren't the same.
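That proviso fails more often than it sounds. As a rough sketch (the class name, method names and the sample GUID suffix below are made up for illustration and are not from the original codebase), the following C# reproduces the swap and counts how many of the 256 possible leading hex pairs come back unchanged:

```csharp
using System;

public static class SwapCheck
{
    // Mirrors the swap from the snippet above: exchange the first two
    // characters of the GUID's string form and parse the result back.
    public static Guid SwapFirstTwo(Guid id)
    {
        string s = id.ToString();
        return new Guid($"{s[1]}{s[0]}{s.Substring(2)}");
    }

    // Tries every possible pair of leading hex digits and counts the
    // ones where the "processed" GUID equals the original GUID.
    public static int CountCollidingPairs()
    {
        const string hex = "0123456789abcdef";
        int collisions = 0;
        foreach (char a in hex)
        {
            foreach (char b in hex)
            {
                // The rest of the GUID is an arbitrary, hypothetical value.
                var original = new Guid($"{a}{b}000000-0000-4000-8000-000000000000");
                if (SwapFirstTwo(original) == original)
                {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void Main()
    {
        // Prints "16 of 256 leading pairs collide".
        Console.WriteLine($"{CountCollidingPairs()} of 256 leading pairs collide");
    }
}
```

Sixteen of 256 is one in 16. Assuming the leading hex digits are uniformly random, as they are for random (version 4) GUIDs, and taking Jeremy's figure of a thousand datasets a day, that works out to roughly 60 self-colliding IDs daily.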

Jeremy explained the problem to Andy, who was still working on cleaning up 664591c8-1985-4071-a4ab-ec87f1e9af1.

"Oh," Andy said, embarrassed. "I see. But what are the chances of that?"
