The Daily WTF: Curious Perversions in Information Technology

2018-09-26 Reply Admin

Because GUIDs are too complicated/too clear/too numerical, I suppose.

2018-09-26 Reply Admin

Pfft, obviously the codes should be sequential. But it skips every fourth one; that's the devious part.

2018-09-26 Reply Admin

Depending on how the UUID is generated, there can be a difference between virtually no possibility of collision, and zero chance of collision. But the race conditions can be problematic (albeit extremely unlikely to ever be an issue in the lifetime of the application).

2018-09-26 Reply Admin

At 25 alphanumeric characters, they've actually got less chance of a collision (approx half) than with a GUID... assuming their random source is good. So yeah, either someone doesn't understand the astronomical odds of a collision, or they've stuffed up their random source. Or more likely, both.

2018-09-26 Reply Admin

should a bit simplier using a do-while function instead of dublicating code for generation:

private string GetUniqueVerificationCode() { do { var code = RandomStringGenerator.GetRandomAlphanumericString(50); } while (this.userVerificationCodeRepository.CodeExists(code));

return code;

}

2018-09-26 Reply Admin

It only takes 32 bytes to represent a UUID as a textual hexadecimal string. 33 if your programming language uses null-terminated string types.

Woo-hoo - savings for everyone!

2018-09-26 Reply Admin

It's not great, but it's fit for purpose, as it would still be if it didn't even have the loop.

In the real world, security is about risk. the chances that a hacker attacker would even attempt to attack here, let alone succeed, are vanishingly small

2018-09-26 Reply Admin

Aha! I see TRWTF -- it should be:

private string GetUniqueVerificationCode(int strLen)
{
    var code = RandomStringGenerator.GetRandomAlphanumericString(strLen);
    while (this.userVerificationCodeRepository.CodeExists(code))
    {
        code = RandomStringGenerator.GetRandomAlphanumericString(strLen);
    }
    return code;
}

BernieTheBernie · 2018-09-26 Reply Admin

The reason fo rpotential failure is far simpler. The guy who wrote GetRandomAlphanumericString initializes the RandomGenerator with a hard-coded seed value. Problems would arise with the original version when they had more than 10 users, but fortunately the new solution will work for any number of users. Late users just need to wait a little longer...

2018-09-26 Reply Admin

And if the comment is to believed, the probable use is -- generate a unique 50 character string, then take the first 25 characters of the result. Which means, of course, that the first 25 characters could be duplicate of some other result.

Also, as more and more "unique" strings are generated, that lookup (to determine if there is a duplicate string) is going to take longer and longer.

2018-09-26 Reply Admin

If you go from 25 to 50 characters that's a bit more than "Ten times as unique"...

Medinoc · 2018-09-26 Reply Admin

For this kind of things, I always use the thread ID and timestamp (and if necessary, the computer name too) -- Essentially a version 1 UUID that doesn't need inter-thread synchronization. And if the value need to be unpredictable in addition to unique, then I append a random string.

2018-09-26 Reply Admin

The entire business design is a WTF. Even my Bank/Cell/Broker/etc accounts never ask for more than 6 decimal digits for a two-factor authorization code. 25 mixed case alphanumerics is ridiculous. Who is going to get it right in under 30 seconds? I would immediately terminate business with any financial services company that laid a bomb like that in my lap.

2018-09-26 Reply Admin

Perhaps even more important, it doesn't matter how many characters are in your "random" string, if they come from a seed-based random number generator. If the random number generator uses a 32-bit seed, and nothing else uses your instance of the seed (such as if it is local to your thread or process) such that all your random numbers are sequential, it is impossible to generate more than 2^32 different combinations. You might as well store seed.ToString in your database.

2018-09-26 Reply Admin

Well, I've seen similar in a codebase I'm currently maintaining.

The point in that one is that the random string should be

on the format "ABCD-EFGH-JKLM"
No "easily mistaken glyphs", so no I or l or O or 0

UUIDs really don't help with that.

2018-09-26 Reply Admin

Ugh.

Web-based support app I used to look after had a similar thing to generate "unique" ticket codes. It would generate one and then check to see if it was in use. This got noticeably slower over time. Unfortunately, I couldn't change the format of the ticket codes (they were created by the business owners). And they were smaller than a UUID. And I wasn;'t au fait with UUIDs, anyway. So I did a hack to do the generation in two steps; the first was the prefix and fished all the matching codes out of the database before generating a non-colliding second part. Much faster as the collisions gre more frequent.

I really wanted to get rid of those silly codes, but knew that wasn't going to happen before I left.

2018-09-26 Reply Admin

facepalm,

I have written this as well.

thought i was being really clever

2018-09-26 Reply Admin

The random number generator is probably seeded with a 32-bit value, thus ensuring there can be only 2^32 random results, that gets into the range where the birthday paradox probably bit them.

2018-09-26 Reply Admin

"There are a number of tools which generate universally unique identifiers, and there are hashing algorithms"

Hashing algorithms do have collisions too. Less frequent, but they do too. That's because a way to generate collision for MD5 was found that MD5 has been disqualified for cryptography uses.

2018-09-27 Reply Admin

"The best part is that regardless of which version of the code you use, since it’s part of a multiuser web application, there’s a race condition- the same code could be generated twice before being logged as an existing code in the database. That’s arguably more likely, depending on how the random generation is implemented."

Race condition is trivially dealt with by an ambient database transaction. You realise that solves 99% of race conditions and locking in real-world middle tier application code, right?)

2018-09-27 Reply Admin

"The best part is that regardless of which version of the code you use, since it’s part of a multiuser web application, there’s a race condition- the same code could be generated twice before being logged as an existing code in the database. That’s arguably more likely, depending on how the random generation is implemented."

Race condition is trivially dealt with by an ambient database transaction. You realise that solves 99% of race conditions and locking in real-world middle tier application code, right?)

2018-09-28 Reply Admin

Please don 't be proud of that. Best case you've reimplemented UUID generation, worst case your codes are not unique. Don't reinvent the wheel when there are perfectly good round ones available.

2018-09-28 Reply Admin

mmmmm, copy-pasta!! Anyone want seconds??

        using (RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider())
        {
            for (int i = 0; i < 10; i++)
            {
                // Fille the buffer with the generated data
                rng.GetBytes(data);
            }
        }

Jenda · 2018-10-02 Reply Admin

How do you know there isn't a perfectly good round generator inside the GetRandomAlphanumericString()? Besides they are not generating a specified number of random bytes to be used by the computer, they are generating something users will have to type out so they need control over that characters are used. Restricting the generated strings to just to 0-9,A-F is silly ... you are ignoring plenty of usable characters and have to make the codes way longer to get the same number of possibilities. OTOH, you do not want to use all alphanumeric characters. You want to skip capital I and lower case l (or maybe I should write capital i and lower case L?), if your users happen to live in Czech Republic you will also want to skip z, Z, y and Y (some idiot decades ago decided to swap Z and Y on the typewriter, some users use QWERTZ, others use QWERTY, yet others switch between Czech and US keyboard layout, ...) and I'm pretty sure other areas have their own characters they might like to skip. Whatever the local list of allowed characters is, it's better to hide it inside a common method.

2019-01-04 Reply Admin

Wow, so many smug comments, and all of you are wrong.

Just some examples of how wrong you are: "According to the comment, we expect 25 characters, but according to the call, it looks like it’s actually 50- GetRandomAlphanumericString(50)"

there is no indication in the snippet that the argument of function GetRandomAlphanumericString() is the length of the string
even if it is, it is more than likely that the code was changed later, without changing the comment (and probably not by the original author)

"Because GUIDs are too complicated/too clear/too numerical"

JavaScript does not have standard means to generate random GUIDs.

"At 25 alphanumeric characters, they've actually got less chance of a collision (approx half) than with a GUID" - only if the distributions of both are uniform, which is not indicated by the code (and generally unlikely). You assume that the string generator 1) perfect 2) will never change. Both assumptions are generally wrong. The original code is actually resilient against bad random string generation and fails gracefully, with an error message, if it becomes completely bad ("bad" as in "not fit for the purpose" and not some random statistical parameter).

"It only takes 32 bytes to represent a UUID as a textual hexadecimal string. 33 if your programming language uses null-terminated string types."

Or twice that if it is Java with its stupid "almost-UTF-16" strings.

"while", "do while" instead of for - congratulations, you have just introduced a potential infinite loop in case of a bad GetRandomAlphanumericString() function. If you think hanging is better than an exception with an error message, you should never ever be programming again.

2019-01-04 Reply Admin

continuing...

"Perhaps even more important, it doesn't matter how many characters are in your "random" string, if they come from a seed-based random number generator. If the random number generator uses a 32-bit seed, and nothing else uses your instance of the seed (such as if it is local to your thread or process) such that all your random numbers are sequential, it is impossible to generate more than 2^32 different combinations"

You mistake "seed" and "state".
The STATE of the PRNG is unlikely 32 bit in any modern version of JS and the implementation using Math.random() returning FP64.

"Best case you've reimplemented UUID generation"

UUID is not best case even for 16-byte strings, as it has type/compatibility bits, and its conformant textual representation is very wasteful. And early versions, with MAC address and low-precision timestamp in them, were atrocious for any transactional use.

"Problems would arise with the original version when they had more than 10 users"

Total non-sequitur.

And finally a correct comment dealing with "race condition" BS: "Race condition is trivially dealt with by an ambient database transaction. You realise that solves 99% of race conditions and locking in real-world middle tier application code, right?)"

Ten Times as Unique

Leave a comment on “Ten Times as Unique”