• (cs) in reply to Shiva
    Shiva:
    Damn it you guys, I expect the trolls round here to be the typical "your not too smart" nonsense. OK, well done, well done.
    Your not too smart, are you?

    There, are we happy now?

  • F (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    "Management was satisfied with the reduction ... " should not be taken as meaning that there actually was a reduction.

  • Anonymous (unregistered) in reply to F
    F:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    "Management was satisfied with the reduction ... " should not be taken as meaning that there actually was a reduction.

    Exactly. The fake hash (basically just a random number) would make it seem like there were fewer duplicate entries. It wouldn't be true, but it would certainly appear that way.
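
    For the record, a minimal sketch of that sort of input-ignoring "hash". The chunk count, the 9999999999999 bound, and the 48-character width are guesses pieced together from later comments, not the article's actual code:

    // A "hash" that never looks at its input.
    function createHash(record) {
      var hash = "";
      for (var i = 0; i < 3; i++) {
        var chunk = String(Math.floor(Math.random() * 9999999999999));
        while (chunk.length < 16) {  // zero-pad each chunk to 16 digits
          chunk = "0" + chunk;
        }
        hash += chunk;
      }
      return hash;  // same record in, different "hash" out, every time
    }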

  • anon (unregistered) in reply to frits

    well you'd obviously need a UUID

    that is, a Unique Universe IDentifier

  • (cs) in reply to Shiva

    Actually, it's (nTrolled % 1) times.

  • drusi (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    Broken clock effect. Every so often, the function would generate a duplicate hash by chance, and occasionally this would correspond to actual duplicate data.

  • C-Octothorpe (unregistered) in reply to anon
    anon:
    well you'd obviously need a UUID

    that is, a Unique Universe IDentifier

    I think it's implemented something like this: http://rubbishsoft.com/longguid/

    I hate akismet with a white-hot passion! Piece of crap-ware!

  • Math (unregistered) in reply to mangobrain

    Once in a while? You mean once in every 9999999999999 * 9999999999999 * 9999999999999 = 1.0E39 data entries?

  • CoderHero (unregistered) in reply to Anonymous
    Anonymous:
    AA:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

    Oh my God, I'm exactly that sort of person! I also like long walks on the beach and people with a good sense of humour. Oh, I guess we're not compatible after all.
    Wow, that has so much win I can barely speak!

  • Mijzelf (unregistered) in reply to Damien

    It's client side. You can never guarantee that Math.Seed() and Math.Rand() have the same implementation on all clients.

  • Design Pattern (unregistered) in reply to Math
    Math:
    Once in a while? You mean once in every 9999999999999 * 9999999999999 * 9999999999999 = 1.0E39 data entries?
    Pseudorandom number generator
    wikipedia:
    A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's state.
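
    The "completely determined" part is easy to demonstrate with even a toy generator. Below is a textbook LCG (constants from Numerical Recipes, purely illustrative): seed it twice with the same value and you get the identical "random" sequence twice.

    // Toy linear congruential generator: same seed, same sequence, every time.
    function makeLcg(seed) {
      var state = seed >>> 0;
      return function () {
        state = (Math.imul(1664525, state) + 1013904223) >>> 0;
        return state / 4294967296;  // scale to [0, 1)
      };
    }
    var a = makeLcg(42), b = makeLcg(42);
    console.log(a(), a());  // two "random" numbers
    console.log(b(), b());  // the exact same two numbers again
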
  • Tharg (unregistered)

    So, nobody has the wit to simply put a unique constraint on the data field in question - ah, I know, the application can maintain data integrity, silly me.

  • C-Octothorpe (unregistered) in reply to Tharg
    Tharg:
    So, nobody has the wit to simply put a unique constraint on the data field in question - ah, I know, the application can maintain data integrity, silly me.

    Isn't that the "Right Way (TM)"?

  • (cs) in reply to bertram
    bertram:
    No, that isn't the joke. The article clearly says that duplication was reduced. This has yet to be explained.

    My guess: Sam's colleague automated the de-duplication script and set it to run once per day. The "hash" code was a smokescreen to let him slough off for a week or so.

  • KRG (unregistered) in reply to bertram

    Let's say that the hashes randomly match 10% of the time without regard to any other property of the entry. Basic statistics would suggest that you'd see a 10% reduction in the absolute number of duplicate entries, just as long as you didn't bother to look to see if there was any corresponding decline in valid entries compared to the total number of records processed.

    That's assuming that they even bothered to count how many duplicate records were still slipping through (which would have been a sign to anyone with a sense of what the system was doing that something wasn't right) and didn't simply count the number of rejections as the number of duplicate entries prevented, ignoring actual duplication.
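
    A quick simulation makes the point; the 20% duplicate rate and 10% rejection rate below are made-up numbers:

    // Random rejection hits duplicates and valid entries alike, so the
    // duplicate count drops about 10% while real data quietly drops 10% too.
    var total = 100000, dupRate = 0.2, rejectRate = 0.1;
    var dupsStored = 0, validStored = 0;
    for (var i = 0; i < total; i++) {
      var isDup = Math.random() < dupRate;
      if (Math.random() < rejectRate) continue;  // bogus "hash collision"
      if (isDup) { dupsStored++; } else { validStored++; }
    }
    console.log(dupsStored, validStored);  // roughly 18000 and 72000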

  • C-Octothorpe (unregistered) in reply to KRG
    KRG:
    Let's say that the hashes randomly match 10% of the time without regard to any other property of the entry. Basic statistics would suggest that you'd see a 10% reduction in the absolute number of duplicate entries, just as long as you didn't bother to look to see if there was any corresponding decline in valid entries compared to the total number of records processed.

    That's assuming that they even bothered to count how many duplicate records were still slipping through (which would have been a sign to anyone with a sense of what the system was doing that something wasn't right) and didn't simply count the number of rejections as the number of duplicate entries prevented, ignoring actual duplication.

    ...right, but you're ignoring a key piece of information here: management noticed the decline in duplicate entries...

    You have a group of people who get distracted by shiny things analyzing the statistics of their data.

  • Anonymous (unregistered) in reply to Tharg
    Tharg:
    So, nobody has the wit to simply put a unique constraint on the data field in question - ah, I know, the application can maintain data integrity, silly me.
    The data field? Oh, you idealistic DB types! The "database" is flat-file and your only means of interacting with it is via simple text read/writes (this is TDWTF after all). What do you do now?
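
    (For the pedants: you reinvent the index by hand, something like the Node.js sketch below. The file name and the one-record-per-line format are assumptions; the article never says.)

    // Flat-file dedup: hash each record and skip the ones already seen.
    var fs = require('fs');
    var crypto = require('crypto');
    var seen = {};
    var lines = fs.readFileSync('entries.txt', 'utf8').split('\n');
    var unique = lines.filter(function (line) {
      var key = crypto.createHash('sha1').update(line.trim()).digest('hex');
      if (seen[key]) { return false; }  // a real duplicate, really detected
      seen[key] = true;
      return true;
    });
    fs.writeFileSync('entries.txt', unique.join('\n'));
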
  • (cs) in reply to AA
    AA:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

    Incrementing an integer can be done in an atomic fashion...

  • (cs) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.

    Wow, that comment was an even bigger WTF than the original post! Congratulations!

  • C-Octothorpe (unregistered) in reply to hoodaticus
    hoodaticus:
    Sir Robin-The-Not-So-Brave:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.

    Wow, that comment was an even bigger WTF than the original post! Congratulations!

    As if you're surprised! This is TDWTF after all...

  • (cs)
    And therein laid the problem

    This is the most maddening thing about this whole article. It should be "lay."

  • Anon (unregistered) in reply to hoodaticus
    hoodaticus:
    Incrementing an integer can be done in an atomic fashion...
    Except "return globalCounter++;" isn't just reading an integer.
  • (cs)

    Regarding the "post twice" vs. "multiple entry of the same data".....

    I vote for the former. If people in different areas have different (unique) sources of data, and each only knows about their own source, then the latter is unlikely to happen.

    Does not excuse this being a poor way to handle it though...

  • Ralph (unregistered) in reply to dohpaz42
    dohpaz42:
    ...during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit thinking that would make things go faster...). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed...
    How do you plan to disable my submit button when I don't choose to give you control over my browser? And oh by the way you do realize that you are attempting client-side input control, which I can easily defeat, which means you have to implement server-side input control too, and at that point, why bother duplicating the effort on the client where it is only effective some of the time? Because your employer, perhaps, has money to waste?? People who suffer from such sloppy thinking make me long for a device that will reach out of your monitor and slap your face to wake you up.

    I really do wish all you losers who use client side scripts to validate data would just dry up and blow away.

  • Jeff (unregistered) in reply to Just Me
    Just Me:
    A hash of the text will provide a quick way to be sure two texts are different.
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?
  • (cs) in reply to hoodaticus
    hoodaticus:
    Incrementing an integer can be done in an atomic fashion...
    Yes, but "++" is not generally one of those fashions.
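
    Where shared state is genuinely involved, you want an explicit atomic fetch-and-add instead. A sketch with SharedArrayBuffer and Atomics, purely illustrative (in single-threaded browser JavaScript the whole question is moot):

    // Atomics.add returns the old value and increments in one
    // indivisible step, unlike "counter++" on shared state.
    var counter = new Int32Array(new SharedArrayBuffer(4));
    function createHash() {
      return Atomics.add(counter, 0, 1);  // safe across worker threads
    }
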
  • C-Octothorpe (unregistered) in reply to Ralph
    Ralph:
    dohpaz42:
    ...during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit thinking that would make things go faster...). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed...
    How do you plan to disable my submit button when I don't choose to give you control over my browser? And oh by the way you do realize that you are attempting client-side input control, which I can easily defeat, which means you have to implement server-side input control too, and at that point, why bother duplicating the effort on the client where it is only effective some of the time? Because your employer, perhaps, has money to waste?? People who suffer from such sloppy thinking make me long for a device that will reach out of your monitor and slap your face to wake you up.

    I really do wish all you losers who use client side scripts to validate data would just dry up and blow away.

    Did anybody else hear the whooshing sound? It gets louder every time I hear it...

    Huh, weird. The way things are going today, I'm sure we'll hear it again, and again...
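
    The point was convenience, not security. The client-side half is a few lines (sketched below, assuming a plain HTML form), and the server still validates everything regardless:

    // Convenience only: stop impatient double-clicks at the source.
    // The server must still reject duplicate submissions on its own.
    document.querySelector('form').addEventListener('submit', function (e) {
      var btn = e.target.querySelector('input[type=submit], button[type=submit]');
      if (btn) { btn.disabled = true; }
    });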

  • Mike (unregistered)

    There's no WTF here. OK, except maybe that the OP and his colleague were working with unclear requirements. And that the unique id generator is called a hash. And languages that define modulo arithmetic on non-integers. And that the modulo operator was a no-op due to the input data. And the unnecessary use of integers more than 32 bits wide. But yeah, other than that stuff, not a WTF.

  • Vaca Loca (unregistered) in reply to Ralph
    Ralph:
    dohpaz42:
    ...during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit thinking that would make things go faster...). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed...
    How do you plan to disable my submit button when I don't choose to give you control over my browser? And oh by the way you do realize that you are attempting client-side input control, which I can easily defeat, which means you have to implement server-side input control too, and at that point, why bother duplicating the effort on the client where it is only effective some of the time? Because your employer, perhaps, has money to waste?? People who suffer from such sloppy thinking make me long for a device that will reach out of your monitor and slap your face to wake you up.

    I really do wish all you losers who use client side scripts to validate data would just dry up and blow away.

    Validating client side AND server side saves electrons. Not everyone has high-speed internet access, and not everyone is happy with an 800 KB page refresh on each incorrect form submission.

  • C-Octothorpe (unregistered) in reply to Jeff
    Jeff:
    Just Me:
    A hash of the text will provide a quick way to be sure two texts are different.
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?

    I think the "idea" was to create a hash of all the field values and pass around a single hash value rather than comparing each field each time (or passing around possibly 30, 50, etc. values). In fact I would store the hash of the record in an indexed column for easy comparison (assuming they have a DB)... Of course this would require a trigger to recompute the hash whenever the data changes, etc. Conceptually, it's a good idea, but the implementation was an epic fail.

    I once read that the only thing worse than inaccurate data is inaccurate data that you think is right...
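
    The sane version of that idea is tiny. A sketch with Node's crypto; the field list and the NUL separator are assumptions:

    // Content-derived record hash: same fields in, same hash out,
    // which is exactly the property the article's version was missing.
    var crypto = require('crypto');
    function recordHash(fields) {
      // Joining on "\u0000" assumes the data never contains NUL bytes.
      return crypto.createHash('sha1')
                   .update(fields.join('\u0000'), 'utf8')
                   .digest('hex');
    }
    // e.g. store recordHash([name, address, phone]) in an indexed column
    // and recompute it whenever the row changes.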

  • (cs) in reply to ShatteredArm
    ShatteredArm:
    And therein laid the problem

    This is the most maddening thing about this whole article. It should be "lay."

    It must annoy you to no end whenever anyone "looses" a personal item.

  • Mike (unregistered) in reply to Jeff
    Jeff:
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?
    It's not about comparing two strings. It's about comparing one string against many strings. A hash table or an indexed db hash field can be used to accomplish this much more quickly than brute force.
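
    Concretely, something like this, with a toy string hash standing in for a real one:

    // One-against-many: each lookup is O(1) against a hash-keyed table,
    // instead of comparing the new string against every stored string.
    function djb2(str) {  // tiny illustrative hash; use a real one in practice
      var h = 5381;
      for (var i = 0; i < str.length; i++) {
        h = ((h * 33) ^ str.charCodeAt(i)) >>> 0;
      }
      return h;
    }
    var seen = {};  // hash -> array of originals (collisions happen)
    function isDuplicate(text) {
      var key = djb2(text);
      var bucket = seen[key] || (seen[key] = []);
      if (bucket.indexOf(text) !== -1) { return true; }
      bucket.push(text);
      return false;
    }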

    (Feeling a bit trolled somehow...)

  • moi (unregistered) in reply to mangobrain

    You sure it would randomly happen? because Math.random() % 1 isn't a random value - it will always be 0; so the hashed value will always be "000000000000000000000000000000000000000000000000"

  • moi (unregistered) in reply to mangobrain
    mangobrain:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It didn't, but once in a while, a new submission would (at random) be assigned the same "hash" as an existing submission. That, and a healthy dose of placebo effect.

    You sure it would randomly happen? because Math.random() % 1 isn't a random value - it will always be 0; so the hashed value will always be "000000000000000000000000000000000000000000000000"

  • John Hardin (unregistered)

    Oh, god, the pain. It is blinding.

    I have got to stop coming by TDWTF.

  • aptent (unregistered) in reply to mangobrain

    Once in 1000000000000000000000000000000000000000000000000. Yeah, I really don't think you'll ever have the same hash twice.

  • (cs)

    Yes, this code is a massive WTF. But the biggest WTF is their business process. They should be splitting up the work when it's assigned, to keep any two people from ever performing duplicate work in the first place.

  • coyo (unregistered) in reply to Visage
    Visage:
    'he just implemented a quick SHA-1'

    Thats the real WTF, right there.

    Yup! This is no guarantee against collisions.

  • C-Octothorpe (unregistered) in reply to cod3_complete
    cod3_complete:
    Yes, this code is a massive WTF. But the biggest WTF is their business process. They should be splitting up the work when it's assigned, to keep any two people from ever performing duplicate work in the first place.

    No, actually this code is a larger WTF, because duplicate data can be handled, but coordinating the efforts of users across the world could be a physical impossibility...

    Management trusted the developer to build them something to prevent this problem; instead they got something that actually makes them lose data and only randomly may or may not prevent duplicate data from being entered.

    I'd rather have duplicate data than no/missing data.

  • (cs) in reply to PedanticCurmudgeon
    PedanticCurmudgeon:
    ShatteredArm:
    And therein laid the problem

    This is the most maddening thing about this whole article. It should be "lay."

    It must annoy you to no end whenever anyone "looses" a personal item.
    This may well be, but I am more than happy that someone laid this issue to rest.

  • (cs) in reply to Anon
    Anon:
    hoodaticus:
    Incrementing an integer can be done in an atomic fashion...
    Except "return globalCounter++;" isn't just reading an integer.
    Good point!
  • airdrik (unregistered) in reply to Mike
    Mike:
    Jeff:
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?
    It's not about comparing two strings. It's about comparing one string against many strings. A hash table or an indexed db hash field can be used to accomplish this much more quickly than brute force.

    (Feeling a bit trolled somehow...)

    You replied to a comment on TDWTF, therefore you've been trolled.
    I replied to a comment on TDWTF, therefore I've been trolled. Anybody who disagrees...

  • ÃÆâ€℠(unregistered)

    Nagesh and his outsourcing office strike again!

  • Abso (unregistered) in reply to airdrik
    airdrik:
    Mike:
    Jeff:
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?
    It's not about comparing two strings. It's about comparing one string against many strings. A hash table or an indexed db hash field can be used to accomplish this much more quickly than brute force.

    (Feeling a bit trolled somehow...)

    You replied to a comment on TDWTF, therefore you've been trolled.
    I replied to a comment on TDWTF, therefore I've been trolled. Anybody who disagrees...

    Hey, that's not tr—

    Oh. Oops.

  • C-Octothorpe (unregistered) in reply to ÃÆâ€â„Â
    ÃÆâ€â„Â:
    Nagesh and his colleagues from his outsourcing mud hut strike again!

    FTFY

  • Gunslnger (unregistered)

    Sam wanted to use the hashing logic for a similar problem.

    TRWTF is code reuse in an enterprise situation, amirite?

  • ÃÆâ€℠(unregistered) in reply to John Hardin
    John Hardin:
    Oh, god, the pain. It is blinding.

    I have got to stop coming by TDWTF.

    You're in luck! You're no longer coming to TDWTF, you're coming to the TOEFDWTF.

    (That's the once every few days WTF)

  • Sudo (unregistered) in reply to aptent
    aptent:
    Once in 1000000000000000000000000000000000000000000000000. Yeah, I really don't think you'll ever have the same hash twice.
    >>> from random import randrange
    >>> randrange(1000000000000000000000000000000000000000000000000)
    479866125362889704564601059308345722216664906502L
    >>> randrange(1000000000000000000000000000000000000000000000000)
    479866125362889704564601059308345722216664906502L
    >>> 
    Well, I never... What are the odds?!
  • blarg (unregistered) in reply to Jeff
    Jeff:
    Just Me:
    A hash of the text will provide a quick way to be sure two texts are different.
    How is it faster to compute two hashes and compare them vs. just comparing the two strings?

    You only need to compute the hash for each string once (expensive); after that, comparing hashes is very cheap.

    Comparing the strings unhashed would be moderately expensive for every comparison.

    If the data were guaranteed to be short, then perhaps you would not benefit much from the hash.

  • blarg (unregistered) in reply to psuedonymous
    psuedonymous:
    Does anyone else hear that odd whooshing sound?

    Is that how it sounded to you?
