One of the skills I think programmers should develop is not directly programming related: you should be comfortable reading RFCs. If, for example, you want to know what actually constitutes an email address, you may want to brush up on your BNF grammars. Reading and understanding an RFC is its own skill, and while I wouldn’t suggest getting in the habit of reading RFCs for fun, it’s something you should do from time to time.

To build the skill, I recommend picking a simple one, like UUIDs. There’s a lot of information encoded in a UUID, and five different ways to define UUIDs- though usually we use type 1 (timestamp-based) and type 4 (random). Even if you haven’t gone through and read the spec, you already know the most important fact about UUIDs: they’re unique. They’re universally unique in fact, and you can use them as identifiers. You shouldn’t have a collision happen within the lifetime of the universe, unless someone does something incredibly wrong.

Dexen encountered a database full of collisions on UUIDs. Duplicates were scattered all over the place. Since we’re not well past the heat-death of the universe, the obvious answer is that someone did something entirely wrong.

use Ramsey\Uuid\Uuid;
 
$model->uuid = Uuid::uuid5(Uuid::NAMESPACE_DNS, sprintf('%s.%s.%s.%s', 
    rand(0, time()), time(), 
    static::class, config('modelutils.namespace')))->toString();

This block of PHP code uses the type–5 UUID, which allows you to generate the UUID based on a name. Given a namespace, usually a domain name, it runs it through SHA–1 to generate the required bytes, allowing you to create specific UUIDs as needed. In this case, Dexen’s predecessor was generating a “domain name”-ish string by combining: a random number from 0 to seconds after the epoch, the number of seconds after the epoch, the name of the class, and a config key. So this developer wasn’t creating UUIDs with a specific, predictable input (the point of UUID–5), but was mixing a little from the UUID–1 time-based generation, and the UUID–4 random-based generation, but without the cryptographically secure source of randomness.

Thus, collisions. Since these UUIDs didn’t need to be sortable (no need for UUID–1), Dexen changed the generation to UUID–4.

[Advertisement] Keep the plebs out of prod. Restrict NuGet feed privileges with ProGet. Learn more.