• Anonymous (unregistered)

    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.

  • Sir Robin-The-Not-So-Brave (unregistered)

    Frist! And this is to convince Akismet that I'm not a spammer.

  • Geert (unregistered)

    Quite odd that the amount of duplicate data was still reduced with this hash function. :)

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    What the??? It said 0 comments at the first attempt. Akismet, I'll kill you!
  • Visage (unregistered)

    'he just implemented a quick SHA-1'

    That's the real WTF, right there.

  • Bill Frist (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.
  • Anonymous (unregistered) in reply to Anonymous
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!
  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Bill Frist
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

  • Hashcoder (unregistered)

    any_number % 1 == 0

    So this function always returns a string of zeroes.

  • Bill Frist (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

    So you're saying that you never had a badge to begin with? That's fraud, son, and it's a much more serious crime. I'm going to have to ask you to accompany me down to the station.

  • AA (unregistered) in reply to Anonymous
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Anonymous
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.
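
    A rough sketch of what I mean; globalCounter would have to live on the server rather than in each visitor's browser, and none of this is from the article:

    function createToken()
    {
      // Unique by construction: no randomness, no seeding, no collisions.
      return (globalCounter++).toString(16);
    }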

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Hashcoder
    Hashcoder:
    any_number % 1 == 0

    So this function always returns a string of zeroes.

    any_int % 1 == 0, sure, but Math.random() returns a double with a value between 0 and 1. The remainder is the part after the decimal point (for a value below 1, that's the value itself), which is a bit silly, but isn't wrong.
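
    You can check the no-op in any JS console; nothing here is specific to the article:

    var r = Math.random();   // e.g. 0.7316...
    r % 1 === r;             // true: for 0 <= r < 1 the remainder mod 1 is the value itself
    5.25 % 1;                // 0.25: % keeps the fractional part of non-integers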

  • QJ (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

    You keep telling yourself that, maybe you'll start believing it - but we won't.

  • mangobrain (unregistered)

    The previous two commenters, talking about how incrementing a global counter fails due to concurrency, or whatever nonsense Sir Robin-the-Not-So-Brave is on about, are completely missing the point of a hash.

    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.
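
    For contrast, even a toy deterministic hash lets genuine duplicates collide on purpose. A djb2-style string hash, sketched here purely as an illustration and nowhere near SHA-1 quality:

    function contentHash(text)
    {
      var h = 5381;
      for (var i = 0; i < text.length; i++) {
        // Same input always yields the same 32-bit value.
        h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
      }
      return h.toString(16);
    }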

  • Anonymous (unregistered) in reply to AA
    AA:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

    Oh my God, I'm exactly that sort of person! I also like long walks on the beach and people with a good sense of humour. Oh, I guess we're not compatible after all.

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to mangobrain
    mangobrain:
    The point of a hash is (...) to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.
    I stand corrected and you are 100% correct, Sir!
  • Gary (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.

    Sir Robin, run away. The entire point of the exercise is to generate hash collisions so you can see if the data is duplicated.

  • Damien (unregistered)

    If only there were a Math.seed() function that could take arbitrary input. Then you could feed everything you wanted hashed into that, and this function would do something approximately correct.
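
    There is no such function, but a sketch of the idea looks roughly like this; both helpers are invented for illustration, not a real Math API:

    // Turn arbitrary text into a 32-bit seed (any stable mixing function would do).
    function seedFrom(text)
    {
      var h = 0;
      for (var i = 0; i < text.length; i++) {
        h = ((h * 31) + text.charCodeAt(i)) >>> 0;
      }
      return h;
    }

    // Tiny seeded generator: the same seed always produces the same sequence,
    // which is what would make the article's "hash" reproducible.
    function seededRandom(seedText)
    {
      var state = (seedFrom(seedText) % 2147483646) + 1;   // force state into [1, 2^31 - 2]
      return function () {
        state = (state * 48271) % 2147483647;              // Park-Miller style LCG
        return state / 2147483647;
      };
    }

    Swap Math.random() for a call to seededRandom() seeded with the form contents, and identical input finally produces an identical number.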

  • TheSHEEEP (unregistered)

    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

  • Kempeth (unregistered) in reply to Visage
    Visage:
    'he just implemented a quick SHA-1'

    That's the real WTF, right there.

    Not really, unless he really did implement the SHA algorithm himself. He probably meant he "implemented a quick SHA-1"-based hashing function, i.e. took a bunch of inputs and fed them to the SHA function.

    Bad function though. He forgot to seed the random number generator...
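
    In a browser that supports the Web Crypto API, "feed them to the SHA function" really is only a few lines. A sketch only; the function name and field handling are made up, not the article's code:

    async function hashSubmission(fields)
    {
      var text = fields.join('\u0000');                 // separator so ["ab","c"] != ["a","bc"]
      var bytes = new TextEncoder().encode(text);
      var digest = await crypto.subtle.digest('SHA-1', bytes);
      // Render the digest as the usual 40-character hex string.
      return Array.from(new Uint8Array(digest))
                  .map(function (b) { return b.toString(16).padStart(2, '0'); })
                  .join('');
    }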

  • (cs)

    Without proper context, I could agree that the createHash function is a WTF. However, imagine that createHash is called when the page is loaded, and the result is then passed during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit, thinking that would make things go faster). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed, or adding an interstitial page, would certainly be better. So this is still a WTF, but not for the reason other posters have stated.

  • mangobrain (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It didn't, but once in a while, a new submission would (at random) be assigned the same "hash" as an existing submission. That, and a healthy dose of placebo effect.

  • (cs) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It wasn't a hash to check the data; it was a hash to check that the data was only posted once.

  • psuedonymous (unregistered) in reply to mangobrain
    mangobrain:
    The previous two commenters, talking about how incrementing a global counter fails due to concurrency, or whatever nonsense Sir Robin-the-Not-So-Brave is on about, are completely missing the point of a hash.

    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.

    Does anyone else hear that odd whooshing sound?

  • Anonymous (unregistered) in reply to dohpaz42
    dohpaz42:
    Without proper context, I could agree that the createHash function is a WTF. However, imagine that createHash is called when the page is loaded, and the result is then passed during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit, thinking that would make things go faster). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed, or adding an interstitial page, would certainly be better. So this is still a WTF, but not for the reason other posters have stated.
    The article clearly states in the first paragraph that "users from different parts of the world aggregate and enter all sorts of data from different sources". That is more than enough context to understand that they were trying to hash the data, not a single page instance. Besides, a hash is a hash - what you are describing would not constitute a "hash", merely a way of assigning an ID to a page instance.
  • Just Me (unregistered)

    The objective is to remove duplicates. A hash of the text will provide a quick way to be sure two texts are different. Now, to make sure they are equal (after you got equal hashes) you will have to compare the texts. Using a "hash" that does not depend on the text will allow duplicates to slip through, but will not lose texts.

    If they are doing the comparison...
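
    As a sketch of that flow (store is just an in-memory object here, and anyHash stands in for whichever deterministic hash you pick):

    function insertIfNew(text, store)
    {
      var key = anyHash(text);                      // placeholder: SHA-1, djb2, whatever
      var bucket = store[key] || (store[key] = []);
      if (bucket.indexOf(text) !== -1) {
        return false;                               // same hash AND same text: a real duplicate
      }
      bucket.push(text);                            // new text, or a harmless hash collision
      return true;
    }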

  • (cs) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    http://s260.photobucket.com/albums/ii12/REexpert44/%3Faction%3Dview%26current%3Dthats_the_joke.jpg

  • C-Octothorpe (unregistered) in reply to mangobrain
    mangobrain:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It didn't, but once in a while, a new submission would (at random) be assigned the same "hash" as an existing submission. That, and a healthy dose of placebo effect.

    Further to that, they would be losing data because of said placebo effect.

    management was satisfied with the reduction in duplicate data

    I'm curious whether there was an actual measurable improvement or, as you stated, simply a placebo effect... Hmm, fewer records means it's working, right? Ahh, I should've looked closer: it was management that noticed the reduction in duplicate data; who better to vet this type of metric than someone who would pay someone like Nagesh to write their apps for them...

  • grover_the_great (unregistered)

    I recognise that code - it's part of the knobworks v11.78 patch - so SamF's predecessor was that small shaven poodle called Earsmus Pink. Man, that blitch's code output was legendary: 10 LOC per day, all of it based on her canine logic that even our experts could not argue with...

  • kktkkr (unregistered)

    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.

  • (cs)

    If anything, this thread has taught me not to post too soon (or too often).

    It's interesting how the term "hash" automatically biases us to think "lookup". The code in the article works fine and fills the requirement. Using a UUID or even a single 32-bit random number would have been slightly more elegant.

    I would like to see how long they cache the "hash" values.

  • Uninformed Opinion (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.
    Depending on how it was used, it may have been there to prevent multiple submissions. If the data entry person clicked Submit multiple times before the page reloaded, the same random number would be submitted with each form. On the server side, you can check for that.

    It would have the effect of reducing the amount of "duplicate" data, and it would occasionally tell a data entry operator that "duplicate" data was found.

    And yeah, used this way, it's still a WTF.
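
    The sketch version, with invented names on both sides:

    // Client: one token per page load, posted along with the form data.
    var formToken = Math.floor(9999999999999 * Math.random());

    // Server: remember tokens you've already accepted and drop repeats.
    var seenTokens = {};
    function acceptSubmission(token, data)
    {
      if (seenTokens[token]) {
        return false;                // the same page instance hit Submit twice
      }
      seenTokens[token] = true;
      saveToDatabase(data);          // hypothetical persistence call
      return true;
    }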

  • Shiva (unregistered) in reply to kktkkr
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!
  • (cs) in reply to mangobrain
    mangobrain:
    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.

    So...you're saying...that the code example in the original post isn't very good?

    Have the admins been informed of this?

  • (cs) in reply to Shiva
    Shiva:
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!

    OK. That's great and all, but what if you want to support multiversalization?

  • (cs) in reply to Bumble Bee Tuna
    Bumble Bee Tuna:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    http://s260.photobucket.com/albums/ii12/REexpert44/%3Faction%3Dview%26current%3Dthats_the_joke.jpg

    No, that isn't the joke. The article clearly says that duplication was reduced. This has yet to be explained.

  • Shiva (unregistered) in reply to frits
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

  • C-Octothorpe (unregistered) in reply to Shiva
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Wow, someone bit! Good job, frits... I always thought you were too highbrow to troll.

  • Shiva (unregistered) in reply to C-Octothorpe
    C-Octothorpe:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Wow, someone bit! Good job, frits... I always thought you were too highbrow to troll.

    Damn it, you guys, I expect the trolls round here to stick to the typical "your not too smart" nonsense. OK, well done, well done.

  • (cs) in reply to Shiva
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

  • Shiva (unregistered) in reply to frits
    frits:
    Shiva:
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!

    OK. That's great and all, but what if you want to support multiversalization?

    Hmm... I'll need to identify the differences that distinguish each instance of the universe from the next, then feed that into the hashing algorithm. This might take some time, although logically I've already figured it out in one of the alternative universes, so problem solved!
  • C-Octothorpe (unregistered) in reply to frits
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Sorry, but where in the article does it say this? I realize it doesn't explicitly say the hash is used to compare what is on the form with the DB, but that's implied, since the article says the hash was used to alert the user if they were entering duplicate data. Nowhere does it imply that it was used to prevent moron users from being click-happy...

    Aw shit, I just got trolled... DAMNIT!

  • (cs)

    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!
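
    For the record, all of these are equally unrelated to the data being "hashed":

    Math.floor(9999999999999 * (Math.random() % 1));   // the article's version
    Math.floor(9999999999999 * Math.random());         // identical: % 1 is a no-op on [0, 1)
    Math.random();                                     // same level of FAIL, fewer keystrokes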

  • C-Octothorpe (unregistered) in reply to ubersoldat
    ubersoldat:
    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!

    Go big or go home, maybe?

  • Billy Goat Gruff #1 (unregistered)

    After smoking all that hash... a random number generated by the client's browser seems a perfectly sensible way of detecting whether Input A == Input B, even if the two are separated by time, space, and, thanks guys, the re-instantiation of the multiverse.

  • Shiva (unregistered) in reply to frits
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Am I missing something here? Sure, it was "a hash to check that the data was only posted once" - but posted once by any user anywhere in the world at any time:
    The Article:
    Users from different parts of the world aggregate and enter all sorts of data from different sources.
    It wasn't designed to prevent one user from entering the data twice - it was designed to prevent global duplicates, across all users all over the world. At least, that's what I understood from the article. Am I wrong?
  • Shiva (unregistered)

    Wait, did I just get double-trolled?

  • Billy Goat Gruff #1 (unregistered) in reply to C-Octothorpe
    C-Octothorpe:
    ubersoldat:
    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!

    Go big or go home, maybe?

    Might be more enterprisey to find the supposed duplicate ratio and block every "n"th insert; 100% success rate!

  • (cs) in reply to Shiva
    Shiva:
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Am I missing something here? Sure, it was "a hash to check that the data was only posted once" - but posted once by any user anywhere in the world at any time:
    The Article:
    Users from different parts of the world aggregate and enter all sorts of data from different sources.
    It wasn't designed to prevent one user from entering the data twice - it was designed to prevent global duplicates, across all users all over the world. At least, that's what I understood from the article. Am I wrong?

    Wouldn't that be a business practice WTF? Why would different users from all over the world have a data-entry race condition?

    I'm not trolling, but I think the OP may be.
