• Anonymous (unregistered)

    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.

  • Sir Robin-The-Not-So-Brave (unregistered)

    Frist! And this is to convince Akismet that I'm not a spammer.

  • Geert (unregistered)

    Quite odd that the amount of duplicate data was still reduced with this hash function. :)

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    What the??? It said 0 comments at the first attempt. Akismet, I'll kill you!
  • Visage (unregistered)

    'he just implemented a quick SHA-1'

    That's the real WTF, right there.

  • Bill Frist (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.
  • Anonymous (unregistered) in reply to Anonymous
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!
  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Bill Frist
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

  • Hashcoder (unregistered)

    any_number % 1 == 0

    So this function always returns a string of zeroes.

  • Bill Frist (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

    So you're saying that you never had a badge to begin with? That's fraud, son, and it's a much more serious crime. I'm going to have to ask you to accompany me down to the station.

  • AA (unregistered) in reply to Anonymous
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Anonymous
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.
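
    A rough sketch of what I mean; globalCounter would have to live on the server rather than in each visitor's browser, and none of this is from the article:

    function createToken()
    {
      // Unique by construction: no randomness, no seeding, no collisions.
      return (globalCounter++).toString(16);
    }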

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to Hashcoder
    Hashcoder:
    any_number % 1 == 0

    So this function always returns a string of zeroes.

    any_int % 1 == 0, sure, but Math.random() returns a double with a value between 0 and 1. The remainder is the part after the decimal point (for a value below 1, that's the value itself), which is a bit silly, but isn't wrong.
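
    You can check the no-op in any JS console; nothing here is specific to the article:

    var r = Math.random();   // e.g. 0.7316...
    r % 1 === r;             // true: for 0 <= r < 1 the remainder mod 1 is the value itself
    5.25 % 1;                // 0.25: % keeps the fractional part of non-integers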

  • QJ (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Bill Frist:
    Sir Robin-The-Not-So-Brave:
    Frist! And this is to convince Akismet that I'm not a spammer.
    Not only are you not frist, but you got beaten by someone who actually had a relevant comment. I'm afraid I'm going to have to ask you to turn in your "loser who replies 'frist' to every new post" badge.

    Well then it's a good thing that this was only my first attempt at a "first" post; in other words, the word "every" doesn't apply, therefore logically I am not a loser.

    ∀ p ∊ P etc... ∎ (qed)

    You keep telling yourself that, maybe you'll start believing it - but we won't.

  • mangobrain (unregistered)

    The previous two commenters, talking about how incrementing a global counter fails due to concurrency, or whatever nonsense Sir Robin-the-Not-So-Brave is on about, are completely missing the point of a hash.

    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.
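
    For contrast, even a toy deterministic hash lets genuine duplicates collide on purpose. A djb2-style string hash, sketched here purely as an illustration and nowhere near SHA-1 quality:

    function contentHash(text)
    {
      var h = 5381;
      for (var i = 0; i < text.length; i++) {
        // Same input always yields the same 32-bit value.
        h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
      }
      return h.toString(16);
    }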

  • Anonymous (unregistered) in reply to AA
    AA:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    I guess, if you're the kind of person who likes to introduce concurrency bottlenecks into arbitrary nonconcurrent functions.

    Oh my God, I'm exactly that sort of person! I also like long walks on the beach and people with a good sense of humour. Oh, I guess we're not compatible after all.

  • Sir Robin-The-Not-So-Brave (unregistered) in reply to mangobrain
    mangobrain:
    The point of a hash is (...) to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.
    I stand corrected and you are 100% correct, Sir!
  • Gary (unregistered) in reply to Sir Robin-The-Not-So-Brave
    Sir Robin-The-Not-So-Brave:
    Anonymous:
    Anonymous:
    And there was me thinking that a hash was supposed to somehow relate to the item being hashed. This will make it much easier to implement hashing algorithms.
    function createHash()
    {
      return globalCounter++;
    }
    
    Hey, that was easy!

    It's data entry by users. By hand. One wouldn't even need a really long hash. (globalCounter++).toString(16) only once would be more than enough. OTOH 10^48 random numbers is also more than enough to avoid a hash collision in most cases of manual data entry, provided that the random generator is properly seeded. It's a really stupid implementation, but it will probably work provided that you never have to regenerate the same hash from the same source. And it's fewer lines of code than a complete SHA implementation.

    So yeah, in theory it's a WTF and I would never write something like this myself, but in practice it works well enough.

    Sir Robin, run away. The entire point of the exercise is to generate hash collisions so you can see if the data is duplicated.

  • Damien (unregistered)

    If only there were a Math.seed() function that could take arbitrary input. Then you could feed everything you wanted hashed into that, and this function would do something approximately correct.
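
    There is no such function, but a sketch of the idea looks roughly like this; both helpers are invented for illustration, not a real Math API:

    // Turn arbitrary text into a 32-bit seed (any stable mixing function would do).
    function seedFrom(text)
    {
      var h = 0;
      for (var i = 0; i < text.length; i++) {
        h = ((h * 31) + text.charCodeAt(i)) >>> 0;
      }
      return h;
    }

    // Tiny seeded generator: the same seed always produces the same sequence,
    // which is what would make the article's "hash" reproducible.
    function seededRandom(seedText)
    {
      var state = (seedFrom(seedText) % 2147483646) + 1;   // force state into [1, 2^31 - 2]
      return function () {
        state = (state * 48271) % 2147483647;              // Park-Miller style LCG
        return state / 2147483647;
      };
    }

    Swap Math.random() for a call to seededRandom() seeded with the form contents, and identical input finally produces an identical number.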

  • TheSHEEEP (unregistered)

    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

  • Kempeth (unregistered) in reply to Visage
    Visage:
    'he just implemented a quick SHA-1'

    That's the real WTF, right there.

    Not really, unless he really did implement the SHA algorithm himself. He probably meant he "implemented a quick SHA-1"-based hashing function, i.e. took a bunch of inputs and fed them to the SHA function.

    Bad function though. He forgot to seed the random number generator...
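
    In a browser that supports the Web Crypto API, "feed them to the SHA function" really is only a few lines. A sketch only; the function name and field handling are made up, not the article's code:

    async function hashSubmission(fields)
    {
      var text = fields.join('\u0000');                 // separator so ["ab","c"] != ["a","bc"]
      var bytes = new TextEncoder().encode(text);
      var digest = await crypto.subtle.digest('SHA-1', bytes);
      // Render the digest as the usual 40-character hex string.
      return Array.from(new Uint8Array(digest))
                  .map(function (b) { return b.toString(16).padStart(2, '0'); })
                  .join('');
    }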

  • (cs)

    Without proper context, I could agree that the createHash function is a WTF. However, imagine that createHash is called when the page is loaded, and the result is then passed during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit, thinking that would make things go faster). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed, or adding an interstitial page, would certainly be better. So this is still a WTF, but not for the reason other posters have stated.

  • mangobrain (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It didn't, but once in a while, a new submission would (at random) be assigned the same "hash" as an existing submission. That, and a healthy dose of placebo effect.

  • (cs) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It wasn't a hash to check the data; it was a hash to check that the data was only posted once.

  • psuedonymous (unregistered) in reply to mangobrain
    mangobrain:
    The previous two commenters, talking about how incrementing a global counter fails due to concurrency, or whatever nonsense Sir Robin-the-Not-So-Brave is on about, are completely missing the point of a hash.

    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.

    Does anyone else hear that odd whooshing sound?

  • Anonymous (unregistered) in reply to dohpaz42
    dohpaz42:
    Without proper context, I could agree that the createHash function is a WTF. However, imagine that createHash is called when the page is loaded, and the result is then passed during form submission to check against a user mistakenly hitting submit more than once (which could happen if the submission was taking a while and an impatient user kept pressing submit, thinking that would make things go faster). Granted, there are better ways to guard against that sort of WTFry; simply disabling the submit button when it is pressed, or adding an interstitial page, would certainly be better. So this is still a WTF, but not for the reason other posters have stated.
    The article clearly states in the first paragraph that "users from different parts of the world aggregate and enter all sorts of data from different sources". That is more than enough context to understand that they were trying to hash the data, not a single page instance. Besides, a hash is a hash - what you are describing would not constitute a "hash", merely a way of assigning an ID to a page instance.
  • Just Me (unregistered)

    The objective is to remove duplicates. A hash of the text will provide a quick way to be sure two texts are different. Now, to make sure they are equal (after you got equal hashes) you will have to compare the texts. Using a "hash" that does not depend on the text will allow duplicates to slip through, but will not lose texts.

    If they are doing the comparison...
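
    As a sketch of that flow (store is just an in-memory object here, and anyHash stands in for whichever deterministic hash you pick):

    function insertIfNew(text, store)
    {
      var key = anyHash(text);                      // placeholder: SHA-1, djb2, whatever
      var bucket = store[key] || (store[key] = []);
      if (bucket.indexOf(text) !== -1) {
        return false;                               // same hash AND same text: a real duplicate
      }
      bucket.push(text);                            // new text, or a harmless hash collision
      return true;
    }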

  • (cs) in reply to TheSHEEEP
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    http://s260.photobucket.com/albums/ii12/REexpert44/%3Faction%3Dview%26current%3Dthats_the_joke.jpg

  • C-Octothorpe (unregistered) in reply to mangobrain
    mangobrain:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    It didn't, but once in a while, a new submission would (at random) be assigned the same "hash" as an existing submission. That, and a healthy dose of placebo effect.

    Further to that, they would be losing data because of said placebo effect.

    management was satisfied with the reduction in duplicate data

    I'm curious whether there was an actual measurable improvement or, as you stated, simply a placebo effect... Hmm, fewer records means it's working, right? Ahh, I should've looked closer: it was management that noticed the reduction in duplicate data; who better to vet this type of metric than someone who would pay someone like Nagesh to write their apps for them...

  • grover_the_great (unregistered)

    I recognise that code - it's part of the knobworks v11.78 patch - so SamF's predecessor was that small shaven poodle called Earsmus Pink. Man, that blitch's code output was legendary: 10 LOC per day, all of it based on her canine logic that even our experts could not argue with...

  • kktkkr (unregistered)

    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.

  • (cs)

    If anything, this thread has taught me not to post too soon (or too often).

    It's interesting how the term "hash" automatically biases us to think "lookup". The code in the article works fine and fills the requirement. Using a UUID or even a single 32-bit random number would have been slightly more elegant.

    I would like to see how long they cache the "hash" values.

  • Uninformed Opinion (unregistered) in reply to TheSHEEEP
    TheSHEEEP:
    Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.
    Depending on how it was used, it may have been there to prevent multiple submissions. If the data entry person clicked Submit multiple times before the page reloaded, the same random number would be submitted with each form. On the server side, you can check for that.

    It would have the effect of reducing the amount of "duplicate" data, and it would occasionally tell a data entry operator that "duplicate" data was found.

    And yeah, used this way, it's still a WTF.
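
    The sketch version, with invented names on both sides:

    // Client: one token per page load, posted along with the form data.
    var formToken = Math.floor(9999999999999 * Math.random());

    // Server: remember tokens you've already accepted and drop repeats.
    var seenTokens = {};
    function acceptSubmission(token, data)
    {
      if (seenTokens[token]) {
        return false;                // the same page instance hit Submit twice
      }
      seenTokens[token] = true;
      saveToDatabase(data);          // hypothetical persistence call
      return true;
    }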

  • Shiva (unregistered) in reply to kktkkr
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!
  • (cs) in reply to mangobrain
    mangobrain:
    The point of a hash is not, in this case, to assign a globally unique identifier to each new submission. It is to detect and identify duplicate (i.e. non-unique) submissions. Therefore not only is a "hashing function" which doesn't generate the same output when given the same input not a hashing function, it doesn't even come close to being applicable to the problem.

    So...you're saying...that the code example in the original post isn't very good?

    Have the admins been informed of this?

  • (cs) in reply to Shiva
    Shiva:
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!

    OK. That's great and all, but what if you want to support multiversalization?

  • (cs) in reply to Bumble Bee Tuna
    Bumble Bee Tuna:
    TheSHEEEP:
    Ehrm...

    I feel kinda stupid. Can anyone tell me how that "hash" helped reduce duplicate entries? Because I really don't get how it could do that.

    http://s260.photobucket.com/albums/ii12/REexpert44/%3Faction%3Dview%26current%3Dthats_the_joke.jpg

    No, that isn't the joke. The article clearly says that duplication was reduced. This has yet to be explained.

  • Shiva (unregistered) in reply to frits
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

  • C-Octothorpe (unregistered) in reply to Shiva
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Wow, someone bit! Good job, frits... I always thought you were too highbrow to troll.

  • Shiva (unregistered) in reply to C-Octothorpe
    C-Octothorpe:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Wow, someone bit! Good job, frits... I always thought you were too highbrow to troll.

    Damn it, you guys, I expect the trolls round here to stick to the typical "your not too smart" nonsense. OK, well done, well done.

  • (cs) in reply to Shiva
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

  • Shiva (unregistered) in reply to frits
    frits:
    Shiva:
    kktkkr:
    This function would make a good hash of the current time, provided the random number generator is reseeded with the time for every use.
    That would be really useful. Date/time just isn't unique enough, every time the universe resets it just starts again from the beginning. But hash it with an external seed and you could identify changes between iterations of the universe! Brillant!

    OK. That's great and all, but what if you want to support multiversalization?

    Hmm... I'll need to identify the differences that distinguish each instance of the universe from the next, then feed that into the hashing algorithm. This might take some time, although logically I've already figured it out in one of the alternative universes, so problem solved!
  • C-Octothorpe (unregistered) in reply to frits
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Sorry, but where in the article does it say this? I realize it doesn't explicitly say the hash is used to compare what is on the form with the DB, but that's implied, since the article says the hash was used to alert the user if they were entering duplicate data. Nowhere does it imply that it was used to prevent moron users from being click-happy...

    Aw shit, I just got trolled... DAMNIT!

  • (cs)

    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!
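
    For the record, all of these are equally unrelated to the data being "hashed":

    Math.floor(9999999999999 * (Math.random() % 1));   // the article's version
    Math.floor(9999999999999 * Math.random());         // identical: % 1 is a no-op on [0, 1)
    Math.random();                                     // same level of FAIL, fewer keystrokes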

  • C-Octothorpe (unregistered) in reply to ubersoldat
    ubersoldat:
    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!

    Go big or go home, maybe?

  • Billy Goat Gruff #1 (unregistered)

    After smoking all that hash... a random number generated by the client's browser seems a perfectly sensible way of detecting whether Input A == Input B, even if the two are separated by time, space, and, thanks guys, the re-instantiation of the multiverse.

  • Shiva (unregistered) in reply to frits
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Am I missing something here? Sure, it was "a hash to check that the data was only posted once" - but posted once by any user anywhere in the world at any time:
    The Article:
    Users from different parts of the world aggregate and enter all sorts of data from different sources.
    It wasn't designed to prevent one user from entering the data twice - it was designed to prevent global duplicates, across all users all over the world. At least, that's what I understood from the article. Am I wrong?
  • Shiva (unregistered)

    Wait, did I just get double-trolled?

  • Billy Goat Gruff #1 (unregistered) in reply to C-Octothorpe
    C-Octothorpe:
    ubersoldat:
    "Math.floor(9999999999999 * (Math.random() % 1));"

    Why bother? With a simple Math.random() he could have achieved the same level of FAIL!

    Go big or go home, maybe?

    Might be more enterprisey to find the supposed duplicate ratio and block every "n"th insert; 100% success rate!

  • (cs) in reply to Shiva
    Shiva:
    frits:
    Shiva:
    frits:
    The code in the article works fine
    No it doesn't.
    frits:
    and fills the requirement.
    No it doesn't.

    How does this code validate that data entered by different users at different times is not equal? Each piece of data is assigned a random number and then the random numbers are checked against each other - how does that fulfil the requirement for checking that no two pieces of data are the same?

    Ahem.

    Am I missing something here? Sure, it was "a hash to check that the data was only posted once" - but posted once by any user anywhere in the world at any time:
    The Article:
    Users from different parts of the world aggregate and enter all sorts of data from different sources.
    It wasn't designed to prevent one user from entering the data twice - it was designed to prevent global duplicates, across all users all over the world. At least, that's what I understood from the article. Am I wrong?

    Wouldn't that be a business practice WTF? Why would different users from all over the world have a data-entry race condition?

    I'm not trolling, but I think the OP may be.
