The Daily WTF: Curious Perversions in Information Technology

kt_ · 2016-02-21

Frist!

kt_ · 2016-02-21

Hey, duplicate post/article! Ban Paula, ban Paula!

:police_car: :oncoming_police_car: :passport_control:

ChrisH · 2016-02-22

Because random and unique totally are the same things.

giammin · 2016-02-22

unique ids created with random??? 2 wtf in a row!

Yazeran · 2016-02-22

Yeah, I was just going to say the same thing: if you need unique IDs, use a sequence-type generator (most DBs have one; if there's no DB, you could do it with a small program that uses file locking or similar to make sure only one request runs at a time, keeping an integer and incrementing it).

To be fair though, depending on the number of requests, using a random number can work, but you have to keep it large enough to make sure that collisions are sufficiently rare.
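
For illustration, the file-locking counter described in this comment could look roughly like the sketch below. The method name `next_id` and the counter file name are assumptions made up for the example; nothing in the thread specifies them.

```ruby
# Hypothetical helper: hands out increasing integers backed by a counter file.
# The method name and default file name are illustrative assumptions.
def next_id(path = 'id_counter.txt')
  File.open(path, File::RDWR | File::CREAT, 0o644) do |f|
    f.flock(File::LOCK_EX)   # wait until this process holds the exclusive lock
    id = f.read.to_i + 1     # an empty file reads as "", which converts to 0
    f.rewind
    f.truncate(0)
    f.write(id.to_s)
    f.flush
    id                       # the lock is released when the file is closed
  end
end
```

Each call returns the next integer, and concurrent callers are serialized by the lock, so no two of them should see the same value.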

PWolff · 2016-02-22

finding clever solutions

If her first solutions did really work, "clever" is the wrong word.

At first, I expected something along an IH UUID generator, but this is even better.

MikeWoodhouse · 2016-02-22

Unusually, I'm not sure which is the bigger WTF here.

Picking 8 letters from 26 gives 26×25×24×23×22×21×20×19 combinations. A touch under 63 billion.

rand(1_000_000_000), on the other hand (I added the underscores, which Ruby allows, to more easily see the value) can only produce 1 billion results. So Peri wins, doesn't she?

Of course, allowing duplicates, something like (1..8).map{('a'..'z').to_a[rand(26)]}.join, gets 26^8 possible values (about 209bn), and including uppercase characters bumps that to 53.5 trillion. So Peri clearly missed a trick there.
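
As a rough check of the lowercase counts in this comment, the arithmetic can be reproduced in a few lines of Ruby. The constant `LETTERS` and the method `random_letters` are names invented for this sketch.

```ruby
# Ordered picks of 8 distinct letters out of 26 (what shuffle + first(8) yields).
without_repeats = (19..26).reduce(:*)
# 8 lowercase letters when repeats are allowed.
with_repeats = 26**8

puts without_repeats  # 62990928000
puts with_repeats     # 208827064576

# One way to draw 8 letters with repeats allowed (hypothetical helper).
LETTERS = ('a'..'z').to_a
def random_letters(length = 8)
  Array.new(length) { LETTERS.sample }.join
end

puts random_letters
```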

isthisunique · 2016-02-22

What's wrong with id++?

Medinoc · 2016-02-22

Synchronization issues, I'd guess.

Medinoc · 2016-02-22

Aren't various kinds of web session IDs (PHP, ASP.NET, etc.) random?

David_Taylor · 2016-02-22

So, the WTF is Steven?

Troels_Larsen · 2016-02-22

What about just:

require 'securerandom'; SecureRandom.uuid

Since this seems to be what they're doing anyway (generating GUIDs).

martin · 2016-02-22

I love these "seniors" who cannot stand solutions different from theirs!

Programming is fun, and the best part is that there is no single correct solution to a problem. There is an infinite space of possible solutions. Some short, some long, some creative, some stupid. You can choose anything you like.

And some people are so silly that they like only their own solution! And others' solutions they mark as WRONG!

Code review should be about the code, not the person. Steve wanted to show that he is an alpha male and that a woman is no good at programming. He should be fired.

BruceW · 2016-02-22

David_Taylor:
So, the WTF is Steven?

Agreed. And as MikeWoodhouse stated, her solution creates more ids. She fully met the stated requirement.

ixvedeusi · 2016-02-22

BruceW:
She fully met the stated requirement.

ahem...

TFA:
generate some **unique** IDs

(emphasis mine)

Neither of the two proposed solutions fully meets the spec.

</pendanterie>

PJH · 2016-02-22

cellocgw · 2016-02-22

Just pointing out that, at least in some languages, there's a builtin option something like "rand(N,replace=FALSE)" so that what you get is a shuffle of the inputs, not a random selection with possible repeats.
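
In Ruby specifically, `Array#sample` with a count does this kind of selection without replacement. A minimal sketch, with `shuffle_id` being a name invented for the example:

```ruby
# Picks 8 distinct letters in random order, equivalent in effect to
# shuffling the alphabet and keeping the first 8 (hypothetical helper name).
def shuffle_id
  ('a'..'z').to_a.sample(8).join
end

puts shuffle_id
```

Because `sample(8)` never returns the same element twice, the resulting string has no repeated letters.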

Dragnslcr · 2016-02-22

Troels_Larsen:
What about just:
require 'securerandom'; SecureRandom.uuid

Because that wouldn't be clever.

noah · 2016-02-22

Came to make the same point. Neither solution is truly unique. 8 random letters is going to be more user friendly than a large random number. The original was a perfectly fine solution to a simple, unimportant problem. The real WTF is that a "senior" developer created strife within his team, duplicated work, and then went to brag about it without realizing his "improved" solution was not any better.

I wouldn't want to work with Steve. Peri might have some promise.

EatenByAGrue · 2016-02-22

Predictability is a bad idea from a security standpoint. For example, if these were session IDs, and I log in as user John with session ID 100054, I could modify my session cookie to session ID 100053 and I'm now logged in as whoever logged in just before I did.

martin · 2016-02-22

The reason is different. The app should be protected against this simple session ID modification. But you could be able to see the total number of users or the number of new user registrations per hour. That could be a problem.

Masaaki_Hosoi · 2016-02-22

Sorta agree. Neither of their solutions generates a unique ID, but Steve's solution has a smaller space of possible values so it seems like it would be more likely than Peri's to result in duplicate IDs, just from a statistical point of view. They both should have googled the subject, but seems like Peri might have at least been trying to reduce the possibility of ID collision.

RFoxmich · 2016-02-22

How is the ruby randomizer seeded? Might be a very non-unique id generator if the seed is a run time constant.

dkf · 2016-02-22

RFoxmich:
How is the ruby randomizer seeded?

Unless you're using an explicit seed, it does a reasonable thing (pulling from a lower-level RNG, possibly the one provided by the OS).

Ragnax · 2016-02-22

ixvedeusi:
Neither of the two proposed solutions fully meets the spec.

Neither do GUIDs. Theoretically. It's just that the result space is big enough that the statistical odds of encountering a dupe are low enough to discard for (most) practical use(s).

anotherusername · 2016-02-22

Masaaki_Hosoi:
Neither of their solutions generates a unique ID, but Steve's solution has a smaller space of possible values so it seems like it would be more likely than Peri's to result in duplicate IDs, just from a statistical point of view.

rand accomplishes basically the same exact thing with far less overhead than the complicator's gloves solution. It's better in pretty much every other way; if a larger space of values is needed, then the number can simply be increased.

Matt_Westwood · 2016-02-22

Come on. Even Knuth made mistakes programming a random number generator.

Captain · 2016-02-22

rand accomplishes basically the same exact thing with far less overhead than the complicator's gloves solution. It's better in pretty much every other way; if a larger space of values is needed, then the number can simply be increased.

I agree completely. In principle, a shuffle on 26 objects consumes at least 88 bits of entropy (log₂ 26! ≈ 88.4), but by taking only the first 8 chars, the function outputs about 36 bits of entropy. The rest is literally lost as heat.

A single random Int64 requires exactly 64 bits of entropy to produce (and contains exactly that many bits of entropy).

blakeyrat · 2016-02-22

Matt_Westwood:
Even Knuth

[image]

Salamander · 2016-02-22

You get about 51% chance of having duplicates with rand after ~38,000 values. The a-z shuffle version gets to that point at ~300,000 values. They both suck, but rand sucks way more if you want unique values, and the shuffle one can be increased massively by allowing duplicates and taking more than 8 characters.
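
These figures can be reproduced with the usual birthday-problem approximation, p ≈ 1 − exp(−n(n−1)/2N). The sketch below assumes uniformly distributed IDs; `collision_probability` is a name made up for the example.

```ruby
# Approximate probability of at least one duplicate after drawing n values
# uniformly from a space of the given size (birthday approximation).
def collision_probability(n, space)
  1.0 - Math.exp(-n * (n - 1) / (2.0 * space))
end

rand_space    = 10**9                  # rand(1_000_000_000)
shuffle_space = (19..26).reduce(:*)    # 8 distinct letters out of 26

puts collision_probability(38_000, rand_space).round(2)      # 0.51
puts collision_probability(300_000, shuffle_space).round(2)  # 0.51
```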

Anthony_McLin · 2016-02-22

WTF #1 - Steven's "better solution" is NOT UNIQUE, so he failed the requirements as well.

WTF #2 - Steven replaced the code instead of using this as a learning experience for Peri, exhibiting bad team leadership.

WTF #3 - I hope these aren't IDs for HTML elements, where an ID cannot start with a number. At least Peri's randomizer would have generated valid HTML IDs. (... for HTML4. HTML5 is more permissive, but browsers may behave unexpectedly...)

Steven is an ass and should rightfully be made fun of for submitting his own WTF.

anotherusername · 2016-02-23

Salamander:
You get about 51% chance of having duplicates with `rand` after ~38,000 values. The a-z shuffle version gets to that point at ~300,000 values.

Did you even read what I posted? Yes, the overcomplicated version gives around 6.3×10¹⁰ possibilities, but that's really irrelevant. If you need the larger value space, just change the rand argument from 10**9 to 6.3*10**10... now they're mathematically equivalent, and the rand version is still a lot simpler and less complex.

Salamander:
the shuffle one can be increased massively by allowing duplicates

The shuffle one can't allow duplicates, unless you simply repeat the original range so the array contains more than one of each character; then your "pick 8" can contain duplicates, but that'll compound the inefficiency problem; you're just shuffling even more characters only to throw them away in the end.

Salamander:
and taking more than 8 characters

That will increase the range of results, but it's fundamentally no different than increasing the argument to rand, and it's still an over-complicated, less-efficient way to do it.

Really, the best way to do this has to just use rand, because that is the simplest. In fact... you want letters? Letters are fine. Base 36 has letters. Here, 64 bits of entropy in base-36, that should be adequate: rand(2**64).to_s(36)

If not, concatenate two of them.

anotherusername · 2016-02-23

Note that, regardless of how big or small your possible result space is from your function, any code would be incomplete unless in some way it would detect a collision and handle that gracefully. Her code, his code, even my code... no matter how unlikely a collision is, you really need to be able to handle it if one occurs.
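
One way to picture the collision handling described here is a retry loop around whatever generator is in use. This is only a sketch: the `IdRegistry` class, its `max_attempts` parameter, and the in-memory `Set` are all assumptions for the example, and a real system would more likely lean on a unique constraint in its data store.

```ruby
require 'set'

# Hypothetical in-memory registry that retries when a generated ID is taken.
class IdRegistry
  def initialize(max_attempts: 10, &generator)
    @generator = generator
    @max_attempts = max_attempts
    @seen = Set.new
  end

  def next_id
    @max_attempts.times do
      candidate = @generator.call
      # Set#add? returns nil when the value was already present.
      return candidate if @seen.add?(candidate)
    end
    raise "no unused ID found after #{@max_attempts} attempts"
  end
end

registry = IdRegistry.new { rand(2**64).to_s(36) }
puts registry.next_id
```

The generator block is interchangeable, so the same wrapper would work with the shuffle version, the plain `rand` version, or anything else.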

Salamander · 2016-02-23

anotherusername:
Did you even read what I posted

Yes, and I disagree that it's the *best* solution. It's *a* solution, with its own set of cons: its IDs get significantly longer than 8 characters, and they are arbitrary-length rather than fixed-length. Whether or not those are relevant depends entirely on how they are using them.

anotherusername:
The shuffle one can't allow duplicates, unless you simply repeat the original range so the array contains more than one of each character

So it can't, except that it can. Right. If you actually find that you have a performance issue, you can write it so that instead of shuffling the array, it randomly selects each character from the range. Or just use an incrementing counter. Or not use Ruby.

anotherusername:
If not, concatenate two of them.

Is `AAB` = `A+AB` or `AA+B`?

anotherusername · 2016-02-23

Salamander:
So it can't, except that it can. Right. If you actually find that you have a performance issue, you can write it so that instead of shuffling the array, it randomly selects each character from the range. Or just use an incrementing counter. Or not use Ruby.

Oooooorrrrrrrr... and I know, this is a real stretch of logic... or you could just do it the right way from the very outset.

Salamander:
Is `AAB` = `A+AB` or `AA+B`?

Yes, of course.

Seriously, though, you're right, but it’s hardly significant. You're still going to have well over 100 bits of entropy, far beyond even that 26-pick-8... no, correction: far beyond even 26-pick-26 (all 26!, or 4×10²⁶, combinations)... in fact, I think that's beyond even 26-pick-26 with duplication (26²⁶)...

...well, basically, if that's not enough, I don't think bigger numbers are going to help you. As I said before, the code still needs to handle collisions gracefully.

anotherusername · 2016-02-23

And, of course, you could always just use hexadecimal instead of base-36, and then the 64-bit number corresponds to exactly 16 characters of hex. Pad it to 16 just to make sure, and there's no risk of ambiguous overlap when you concatenate two of them.

Or, heck, you could even use base-256. But I'm not sure if you could still do that in a one-liner.
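
The padded-hex idea from this comment might be sketched as follows. The method names `hex_id` and `long_hex_id` are invented for the example.

```ruby
# 64 random bits rendered as exactly 16 lowercase hex characters.
# rjust pads with leading zeros when the number is small.
def hex_id
  rand(2**64).to_s(16).rjust(16, '0')
end

# Two fixed-width halves concatenate without ambiguity: always 32 characters.
def long_hex_id
  hex_id + hex_id
end

puts hex_id
puts long_hex_id
```

Note that `Kernel#rand` is not a cryptographically secure generator; where that matters, `SecureRandom.hex(8)` from the standard library also yields 16 hex characters.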

ixvedeusi · 2016-02-23

anotherusername:
you could just do it the right way from the very outset

For my part, I find Peri's solution preferable in every way except performance. It is concise, very easily readable and understandable, and the code says exactly what it does: "Take the letters from a to z, shuffle them and then pick the first eight". It is immediately obvious that this generates a string of 8 random letters from a to z. Keep in mind that I haven't seen more than a few trivial examples in Ruby in my life, and can still easily understand it.

In contrast, the "improved" version says "generate a random number between 0 and 999'999'999" (or is it 1'000'000'000? Who knows? Time to check the docs!). But wait, for an ID we need a string, so why did we generate a number? How do we map that number to a string? Will leading zeros be included? What length will that string have? ...

And finally, I find the IDs generated by Peri's code much more convenient. I find series of letters of fixed length much more easily recognizable and memorable than arbitrary very large numbers.

Zemm · 2016-02-23

PaulaBean:
Peri

Is it just me or does anyone else keep reading this name as Perl?

ixvedeusi · 2016-02-23

Zemm:
does anyone else keep reading this name as Perl?

:raised_hand:

dkf · 2016-02-23

anotherusername:
Or, heck, you could even use base-256.

That's fine internally, but can't be used as an ID string in an interface; many of those characters are not happy when passed through a GUI or URL.

FTB · 2016-02-23

"rand(1_000_000_000), on the other hand (I added the underscores, which Ruby allows..."

WTF?

MikeWoodhouse · 2016-02-23

Simples: which is easier to read and understand, 1000000000 or 1_000_000_000?

Ruby just has slightly enhanced numeric constant parsing, which is occasionally useful.

Shoreline · 2016-02-23

I feel as though if MySQL can come up with UUID and MAX, then we can do a bit better than RANDOM in Ruby.

ixvedeusi · 2016-02-23

MikeWoodhouse:
which is easier to read and understand, 1000000000 or 1_000_000_000?
Ruby just has slightly enhanced numeric constant parsing, which is ~~occasionally~~ tremendously useful.

FTFM. It seems like such a small thing, but it makes reading large integer constants so much easier. I wish python and C++ supported it :cry:

accalia · 2016-02-23

MikeWoodhouse:
Simples: which is easier to read and understand, 1000000000 or 1_000_000_000

the answer is: 1e9

that's a one, followed by nine zeros.

easy!

ixvedeusi · 2016-02-23

Yes, but that's a float not an int

EDIT: Also, if you have something else than zeros after the first digit, or want it in hex, you lose.

accalia · 2016-02-23

ixvedeusi:
Yes, but that's a float not an int

fun fact, if you are assigning that to an integer type variable, it works.

if you are in a duck typed language, well, either deal with it(tm) or slap an int cast around it, it's still easier to read.

ixvedeusi:
if you have something else than zeros after the first digit, or want it in hex, you lose.

which is best?

9223372036854775807

or

9_223_372_036_854_775_807

or

2**63-1?

they're all the same number, but i'm pretty sure you will agree that the last one is easiest to read.

ixvedeusi · 2016-02-23

accalia:
which is best?
9223372036854775807

or

9_223_372_036_854_775_807

or

2**63-1?

0x7FFF_FFFF_FFFF_FFFF

accalia · 2016-02-23

ixvedeusi:
0x7FFF_FFFF_FFFF_FFFF

and that's better...... why? :wtf:

ixvedeusi · 2016-02-23

It's more obvious where the '63' came from, and why that number is "special", for one thing; at least to me.

I might be a bit biased on this, because I write device drivers for a living, so the underlying bit pattern is often significant to me.

Also, your variant is :moving_goal_post:: as it's not a literal but an expression, and I don't have a "power" operator for literals in C++, so it would be (1<<63)-1 for me.

Also, counter-example:

12_004_624_821_123

Find a fancy, readable expression to write this?

Random Ruby