• (cs) in reply to CodeCow
    CodeCow:
    Ok... so the guy was fired not because he was a shitty coder, but because he smoked weed?

    Considering he's not a pilot or something like that, that seems pretty fucked up

    Not at all. Consider that programmers are knowledge workers. If I hired someone for their brains, I'd throw them out for using mind-altering substances too! If you owned a sports team, wouldn't you fire an athlete for deliberately screwing up his body?

  • (cs)

    TRWTF is not ignoring user input for file names and just generating a unique one with the timestamp.

  • jay (unregistered)

    Hmm, I don't see how you can say that this was a bad solution without knowing the requirement.

    For starters, why did he have to generate a new name for uploaded files? Why not just save them with the original name? Was the issue that if two files were uploaded both named "scan1.jpg" or "track01.avi" that he couldn't assume that the second was an updated version of the first, and so it shouldn't overwrite? If that's not true, if the original file name is indeed to be interpreted as a unique identifier of the file, then there's no reason to replace the file name with a randomly-generated name or a hash or anything else: just use the original file name. If the original file name CANNOT be assumed to be a unique identifier, then hashing the file name would not work, because duplicate names would generate duplicate hashes. That would completely fail the requirement.

    Without seeing the rest of the code, we don't know if he considered the possibility of collisions. He might have generated a name, then checked if it already existed and if so generated a new name. Sure, the set of possible results is too small if there are really tens of millions of uploads. But that's easily fixed by changing the upper bound on the loop.

    Personally, if I had to generate fake file names, I'd prefer to use a sequence number or maybe a guid than generate names randomly and check for duplicates. On the other hand, sequence numbers could generate duplicates if there are multiple threads running simultaneously. Random names might actually be better in such a case.

    Actually, without knowing the requirement, how do we know that the client did not say, "I want all file names replaced with a randomly generated name so as to obfuscate the contents"?

    If someone tells me that the answer to an arithmetic problem is 37, it's impossible to say whether that answer is right or wrong without KNOWING WHAT THE QUESTION WAS!
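
    To make the duplicate-name point concrete, here is a minimal Python sketch. The function name and the sample uploads are made up for illustration; nothing here comes from the article's code. It only shows that a hash of the file name cannot tell apart two different uploads that share a name.

```python
import hashlib


def stored_name_from_filename(original_name: str) -> str:
    """Hypothetical scheme: the stored name is a hash of the uploaded file NAME only."""
    return hashlib.sha256(original_name.encode("utf-8")).hexdigest()


# Two different uploads that happen to share a file name (illustrative data).
uploads = [
    ("scan1.jpg", b"first image bytes"),
    ("scan1.jpg", b"second, different image bytes"),
]

names = [stored_name_from_filename(name) for name, _content in uploads]
print(names[0] == names[1])  # prints True: the second upload would overwrite the first
```

    So if the original names can't be trusted to be unique, the hash of those names can't be either.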

  • JustMe (unregistered) in reply to Math
    "an sha512 hash of the data in the file gives you 99.9% chance yu won't have duplicates for up to 5.2e75 (different) files."
    "99.9%"

    So, once out of every 1000 it will fail?

    Yes, it's going to fail once out of every 1000 times you upload 5.200.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000.000 files.
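
    For anyone who wants to check the quoted figure, a rough sketch using the standard birthday-bound approximation, assuming the hash behaves like a uniformly random 512-bit function (an idealization, and `collision_probability` is just an illustrative name):

```python
import math


def collision_probability(num_items: float, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n^2 / (2 * 2^bits))."""
    space = 2.0 ** hash_bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-(num_items ** 2) / (2.0 * space))


# Chance of at least one collision among 5.2e75 distinct files with a 512-bit hash.
p = collision_probability(5.2e75, 512)
print(f"{p:.4%}")
```

    The result comes out at roughly 0.1%, i.e. about a 99.9% chance of no duplicates across the whole set, not one failure per 1000 uploads.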

  • jay (unregistered)

    I am deeply disappointed that this story did not make SOME pun on the word "hash".

  • (cs) in reply to Tea
    Tea:
    UUID anyone?
    Anyone?
  • jay (unregistered) in reply to CodeCow
    CodeCow:
    Ok... so the guy was fired not because he was a shitty coder, but because he smoked weed?

    Considering he's not a pilot or something like that, that seems pretty fucked up

    The underlying assumption in that statement is that being a bad coder is totally unrelated to using drugs.

    Perhaps your statement is like saying, "Wait, did you not hire Mr Jones as a music critic because he was unable to distinguish good music from bad, or did you not hire him just because he's completely deaf?"

    Funny that you apparently accept the idea that using drugs would interfere with a pilot's ability to fly a plane, but would not interfere with a programmer's ability to write good code.

    (I refrain from expressing an opinion on whether marijuana use really is linked to bad coding, having no personal experience in the matter. I'm an old man, I don't need drugs. I get high just by standing up fast.)

  • (cs)

    Apparently somebody hasn't heard of GUIDs.

  • jay (unregistered) in reply to chubertdev
    chubertdev:
    TRWTF is not ignoring user input for file names and just generating a unique one with the timestamp.

    What happens if your timestamp is accurate to the second, and two files are uploaded within one second of each other? (Adding digits to the timestamp does not, of course, solve the problem. It makes it less likely, but no matter how precise it is, collisions are still possible. Especially if you have multiple threads. Any time a programmer says, "Well, yes, that could happen theoretically, but the chances of that are so small that we don't need to worry about it" ... well, when I hear that, I hide under the table until after the explosion.)

    You'd still need to check for duplicates and either add a sequence number or have some other means to ensure uniqueness.
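
    One way to do that check without a race, sketched in Python. This assumes a local filesystem where exclusive creation is atomic; `claim_unique_path` and the naming pattern are invented for the example.

```python
import os
import tempfile
import time


def claim_unique_path(directory: str, suffix: str = ".dat") -> str:
    """Create and return a new, previously nonexistent file path in `directory`."""
    attempt = 0
    while True:
        stamp = time.time_ns() // 1_000_000  # millisecond timestamp
        candidate = os.path.join(directory, f"{stamp}-{attempt}{suffix}")
        try:
            # O_CREAT | O_EXCL makes "check and create" one atomic step:
            # the call fails if the name already exists.
            fd = os.open(candidate, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            attempt += 1  # name taken: bump the sequence number and retry
            continue
        os.close(fd)
        return candidate


with tempfile.TemporaryDirectory() as upload_dir:
    paths = [claim_unique_path(upload_dir) for _ in range(1000)]
    print(len(set(paths)) == len(paths))
```

    The point is that the existence check and the creation happen in the same call, so two uploads landing in the same millisecond can't both win the same name.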

  • (cs) in reply to jay
    jay:
    chubertdev:
    TRWTF is not ignoring user input for file names and just generating a unique one with the timestamp.

    What happens if your timestamp is accurate to the second, and two files are uploaded within one second of each other? (Adding digits to the timestamp does not, of course, solve the problem. It makes it less likely, but no matter how precise it is, collisions are still possible. Especially if you have multiple threads. Any time a programmer says, "Well, yes, that could happen theoretically, but the chances of that are so small that we don't need to worry about it" ... well, when I hear that, I hide under the table until after the explosion.)

    You'd still need to check for duplicates and either add a sequence number or have some other means to ensure uniqueness.

    Yeah, you use millisecond granularity PLUS a GUID to guarantee enough entropy that you may see one or two collisions before the heat death of the universe.

  • neminem (unregistered) in reply to jay
    jay:
    Funny that you apparently accept the idea that using drugs would interfere with a pilot's ability to fly a plane, but would not interfere with a programmer's ability to write good code.
    I personally accept the idea that using drugs would impair my ability to write good code for the period of time that said drugs were affecting me, but not forever. (I also accept that there's a taper-off, so I would probably want a pilot to have refrained from doing anything psychoactive for longer than I would care if a programmer had, though I wouldn't really care if a 35 year old pilot had done some pot when he was 30, either.)

    In any case, while I have nothing against pot per se, if an employee goes so far as to bring the accoutrements of smoking the substance to work, that probably implies something about the frequency of their use that would raise some flags. If I were hiring, I wouldn't care what my employees did on their off time, as long as it didn't impair their ability to work when they were supposed to. But I would also expect them to leave their personal hobby items at home :p. (Booze, for instance, is totally legal, but it'd raise similar flags if they kept a handle of vodka on their desk, even if they weren't drinking it during work hours...)

  • n_slash_a (unregistered) in reply to FragFrog
    FragFrog:
    Tim:
    but using a hash function instead would be guaranteed to make a collision if the original filenames were the same.
    Agreed; especially if people uploaded pictures taken with their cameras, which all tend to follow the same naming scheme, or similar automatically generated filenames. A somewhat better approach is to hash over the file itself, although the effectiveness of that hinges on the hashing algorithm used.
    Presumably he generated a hash using the file name and a timestamp, thus ensuring there were no collisions.
  • neminem (unregistered)

    Also, yeah, the lack of any references to "hash" is pretty disappointing. Especially the lack of any references to the fact that he was, in fact, rolling his own. (I take no credit for that; all credit goes to SHA. Why was that not the title?)

  • (cs) in reply to chubertdev

    Well, really you'd use a lib, but if I had to do it myself... (practicing for interviews) If it's a service, it's easy to increment a counter after the millisecond timestamp. You could reserve blocks of ids to avoid hitting the service too often. If you're not multi-tenanted already, you could use customer id and MAC address. From a function, I'd be tempted to check if the name exists in a list of recent uploads. Or block for 1 ms, depending.

    Sound sane?
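
    A minimal sketch of the "counter after the millisecond timestamp" idea, in Python. The class name is made up, and this only guarantees uniqueness inside one process; across processes or hosts you would still need something like the customer id or MAC mentioned above.

```python
import itertools
import threading
import time


class TimestampCounterIds:
    """Hypothetical generator: millisecond timestamp plus a process-wide counter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count()

    def next_id(self) -> str:
        with self._lock:  # serialize access so two threads never share a counter value
            sequence = next(self._counter)
        millis = time.time_ns() // 1_000_000
        return f"{millis}-{sequence}"


generator = TimestampCounterIds()
results = []


def worker() -> None:
    results.extend(generator.next_id() for _ in range(500))


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(results), len(set(results)))
```

    The counter alone carries the uniqueness here; the timestamp is just there to keep names roughly sortable.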

  • Fedaykin (unregistered)

    So replace one WTF with another huh? We can only hope you just left out the part where you were somehow guaranteeing that the input data would be unique (salting the hash with a timestamp perhaps), since a hash certainly does NOT generate unique values for non-unique input. It doesn't even guarantee unique output for unique input.

    This is a simple problem with a simple set of workable solutions (some only workable with certain constraints of course): uuid, timestamp, database sequence based names, and various other ways to generate a unique value for a name that you can associate with the original name.
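
    As a rough illustration of the "unique value you can associate with the original name" approach, here is a Python sketch. The dictionary is a hypothetical stand-in for whatever database table would really hold the mapping, and `register_upload` is an invented name.

```python
import uuid

# Hypothetical in-memory stand-in for a database table: stored name -> original name.
stored_to_original: dict[str, str] = {}


def register_upload(original_name: str) -> str:
    """Assign a random UUID-based stored name and remember the original name."""
    stored_name = uuid.uuid4().hex
    stored_to_original[stored_name] = original_name
    return stored_name


a = register_upload("track01.avi")
b = register_upload("track01.avi")
print(a != b, stored_to_original[a], stored_to_original[b])
```

    Two uploads with the same original name get different stored names, and the original name is still recoverable for display.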

  • (cs)

    Hurr durr, if imgur can get away with using (seemingly) random file names, it must be good enough for us!

  • Robert (unregistered)

    I am frequently frustrated here at work with folks that assume that because MD5 has a high degree of uniqueness that that is the same as being unique. A true WTF which yielded frequent collisions was the generation of a 'unique' key by hashing the data record to an MD5, then passing that through the Oracle hash to get it down to a manageable size. This was done to avoid using a sequence number for reasons that still have not been made clear to me.
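
    A small Python sketch of why squeezing a hash down invites collisions. The 16-bit width is an assumption picked to make the effect easy to see; the actual size of the second hash isn't stated above, and `short_key` is only a stand-in for that step.

```python
import hashlib


def short_key(record: bytes, bits: int = 16) -> int:
    """Hypothetical stand-in for 'MD5, then squeeze down to a manageable size'."""
    digest = hashlib.md5(record).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)  # keep only the top `bits` bits


def first_collision(bits: int = 16):
    """Return two distinct records sharing a short key (pigeonhole guarantees one exists)."""
    seen = {}
    for i in range(2 ** bits + 1):
        record = f"record-{i}".encode("ascii")
        key = short_key(record, bits)
        if key in seen:
            return seen[key], record
        seen[key] = record
    return None  # unreachable: 2**bits + 1 distinct records cannot all get distinct keys


first, second = first_collision()
print(first, second, short_key(first) == short_key(second))
```

    However good the first hash is, the final key can only take as many values as the smaller hash allows, so the collision rate is set by the smaller one.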

  • (cs) in reply to jay
    jay:
    If someone tells me that the answer to an arithmetic problem is 37, it's impossible to say whether that answer is right or wrong without KNOWING WHAT THE QUESTION WAS!
    It's wrong since the correct answer is 42
  • Valued Service (unregistered) in reply to SHA
    SHA:
    Tankster:
    ZoomST:
    Tankster:
    Better titles for this article:
    • Pass the Hash
    • Hash Table Collision
    • Smokin' Hash
    • 420 Error

    'Re-Inventing the Alphabet'? ... c'mon guys!

    As pointed by Davor at 407007, you missed TRWTF: the alphabet is incomplete.

    • Joint WTF?

    Come on, "Rolling your own Hash" would still have been good.

    I would have gone for "0-way encryption" or "random-way encryption".

  • jayjay (unregistered)

    I'm with jay. We don't know enough about how this thing is supposted to work.

    From the description; the original client should have found the name of the file (on the assumption that the renaming is part of the product or else it would not have worked at all) but would have complained that the file did not contain the updated contents.

    I do understand using hashes to convert file names because it makes spaces, special characters, upper/lower, etc. portable across O/S along with having the client file name stored in a database or sidebar text file or something.

  • trtrwtf (unregistered) in reply to AllGeneralizationsAreBad
    AllGeneralizationsAreBad:

    It mentions he had rolling papers in his desk. Circumstantial evidence, yes, but come on, why would you have the paraphernalia at work?

    I have rolling papers in my desk. I roll cigarettes with them.

  • AllGeneralizationsAreBad (unregistered) in reply to trtrwtf
    trtrwtf:
    AllGeneralizationsAreBad:

    It mentions he had rolling papers in his desk. Circumstantial evidence, yes, but come on, why would you have the paraphernalia at work?

    I have rolling papers in my desk. I roll cigarettes with them.

    You didn't say if you smoke them too, but I assume that's what you mean. Good luck with that COPD! Or are you the type that says rolling your own natural tobacco won't be as bad for your health? If so, good luck with COPD!

  • Romojo (unregistered) in reply to Xyon

    My stepmother did a nice embroidery once of letters, numbers, flowers and suchlike. She missed out the number zero and the letter 'J' (speaking from memory). I never had the heart to tell her.

    Maybe some people are character-blind?

    Captcha: populus - as is, vox populus, vox deus.

  • (cs) in reply to Hannes
    Hannes:
    one method has 45! nested if-else-statements

    Don't know if I want to see the code, and one line per statement that is 1.1962222086548019456196316149566e+56 lines of code!

    Willing to bet there is not a compiler in the world that could handle that! [if you disagree, create file that size and test it!]
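
    For the curious, the factorial reading is easy to check in Python (a throwaway sketch, nothing more):

```python
import math

# 45! read as a factorial, i.e. one line of code per nested statement.
lines = math.factorial(45)
print(f"{lines:.4e}")  # prints 1.1962e+56
print(len(str(lines)))  # number of decimal digits
```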

  • (cs) in reply to TheCPUWizard
    TheCPUWizard:
    Hannes:
    one method has 45! nested if-else-statements

    Don't know if I want to see the code, and one line per statement that is 1.1962222086548019456196316149566e+56 lines of code!

    Willing to bet there is not a compiler in the world that could handle that! [if you disagree, create file that size and test it!]

    Amazement factor, not factorial o_O

  • Draughon (unregistered) in reply to Davor
    Davor:
    I wonder why he skipped h, l, u and v?
    Because if his random number went into a standard alphabet, then someone who worked out his seed and knew the random number algorithm could predict his filenames. This way, he makes sure that the glitch in the alphamabet further obfuscates the filename. Randomly dropping letters adds security - security through obscurity...

    Gee, you guys are so dumb!!!

  • Mitch (unregistered) in reply to chubertdev
    chubertdev:
    TheCPUWizard:
    Hannes:
    one method has 45! nested if-else-statements

    Don't know if I want to see the code, and one line per statement that is 1.1962222086548019456196316149566e+56 lines of code!

    Willing to bet there is not a compiler in the world that could handle that! [if you disagree, create file that size and test it!]

    Amazement factor, not factorial o_O

    I would say lots more than that.....what if EVERY if-else had a nested if-else

    <disclaimer> Didn't do the math, so maybe the OP took that into account (rather than saying 45! is a rather large number) </disclaimer>
  • Captain Oblivious (unregistered) in reply to Math
    Math:
    "99.9%"

    So, once out of every 1000 it will fail?

    Uh, no. Once out of every thousand times you generate 5x10^17 filenames (or whatever the original figure was), you can expect to have a single collision.

  • Captain Oblivious (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    CodeCow:
    Ok... so the guy was fired not because he was a shitty coder, but because he smoked weed?

    Considering he's not a pilot or something like that, that seems pretty fucked up

    Not at all. Consider that programmers are knowledge workers.

    Ha!

    If I hired someone for their brains, I'd throw them out for using mind-altering substances too! If you owned a sports team, wouldn't you fire an athlete for deliberately screwing up his body?

    You do realize athletes are not allowed to use drugs (including marijuana) because they unfairly enhance the effectiveness of training, right?

  • foo (unregistered) in reply to bjolling
    bjolling:
    jay:
    If someone tells me that the answer to an arithmetic problem is 37, it's impossible to say whether that answer is right or wrong without KNOWING WHAT THE QUESTION WAS!
    It's wrong since the correct answer is 42
    But what is the question? This is the question!
  • (cs)

    Not that the "fix" was any better. To uniquely identify a file, the checksum should be generated from its content.
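
    A short Python sketch of naming a file after a checksum of its content. SHA-256 and the chunk size are assumptions for the example, not something the article specifies, and `content_name` is an invented helper.

```python
import hashlib
import io
from typing import BinaryIO


def content_name(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Name a file after the SHA-256 of its bytes, reading in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


same_a = content_name(io.BytesIO(b"holiday photo"))
same_b = content_name(io.BytesIO(b"holiday photo"))
other = content_name(io.BytesIO(b"a different photo"))
print(same_a == same_b, same_a == other)
```

    Note that identical content uploaded twice maps to the same name. Whether that counts as free de-duplication or as an unwanted overwrite depends on the requirement.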

  • (cs) in reply to Mike

    Dear Mike, it took you 10 minutes to read it?

  • DV (unregistered) in reply to jay

    Are you doing the same for UUIDs as well, i.e. checking whether it's already there, and so on?

  • Gibbon1 (unregistered) in reply to Robert
    Robert:
    I am frequently frustrated here at work with folks that assume that because MD5 has a high degree of uniqueness that that is the same as being unique. A true WTF which yielded frequent collisions was the generation of a 'unique' key by hashing the data record to an MD5, then passing that through the Oracle hash to get it down to a manageable size. This was done to avoid using a sequence number for reasons that still have not been made clear to me.

    For work I spent a bunch of time reading 'shit cryptographers say'. Basically, most programmers have a cargo cult mentality when using cryptographic primitives.

    Essentially, doubling down like that, using two hash functions to be 'safe', is a bad smell because it's a sign that they don't understand what they are doing. For instance, some hash functions have good collision resistance and some do not.

    Which brings up the point. The Stoner at least knew that he didn't understand hash functions. Michael on the other hand...

  • Jeff Grigg (unregistered) in reply to FragFrog
    FragFrog:
    Hannes:
    For Draughon to be fired for being "a shitty coder" you'd need someone in charge to KNOW that he's a "shitty coder". Very unlikely.

    Wait, you never talk to your managers? They never ask why maintenance of some modules takes three times as long as work done by the other devs? I mean, it might not go all the way up the chain, but presumably someone was in charge of the development team, would they not find out eventually?

    ...

    Quite often, the worst code was written by the boss.

    :-(

  • Jeff Grigg (unregistered) in reply to neminem
    neminem:
    In any case, while I have nothing against pot per se, if an employee goes so far as to bring the accoutrements of smoking the substance to *work*, that probably implies something about the frequency of their use that would raise some flags. If I were hiring, I wouldn't care what my employees did on their off time, as long as it didn't impair their ability to work when they were supposed to. But I would also expect them to leave their personal hobby items at home :p. (Booze, for instance, is totally legal, but it'd raise similar flags if they kept a handle of vodka on their desk, even if they weren't drinking it during work hours...)

    Likewise, I don't care what they do on their own time as long as they're sober and rested when they get to work on time. They need to do good work.

    And culture makes a big difference. Here in this overseas office, we have bottles of wine on top of the kitchen cabinets here at the office. We had beer and vodka mixers, but they only lasted a few evenings. ;-> But it would be odd if anyone got into the alcohol in the middle of a work day.

  • Aris (unregistered) in reply to Salami
    Salami:
    Captcha: acsi:
    random drug screenings
    TRWTF right there. I wouldn't want to live in a country with so little freedom as the US.

    In the USA, we see this as the freedom of the business owner to demand a drug test any time he wants.

    And the worker has the "freedom" of refusing it. I would, because I'm lucky enough to work in a field where I can afford losing my job to protect my integrity

  • Pero perić (unregistered)

    WTF? TDWTF article on holiday???

  • miko (unregistered) in reply to Jeff Grigg
    Jeff Grigg:

    Quite often, the worst code was written by the boss.

    Like a baws...

  • (cs)

    Even if a collision is unlikely, never say never.

    Often a collision doesn't matter (password hashes), or you can detect it and bucket it (instead of hash.file, create a dir named hash, move hash.file to hash/1.file, and put the new file in as hash/2.file).

    Where you really want to avoid it is where you determine that improbability aside, if it does occur, you are screwed or have big problems. Either way, improbable is not as good as impossible.

    Other solutions can be arrived at for this problem aside from hashes. Still, a hash would have probably been better than what's here.
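
    The bucketing idea could look roughly like this in Python. It is a sketch under assumptions: SHA-256 of the content as the bucket name, and a numbered slot per distinct content. `store_in_bucket` is an invented helper, and concurrent writers may at worst store the same bytes in two slots.

```python
import hashlib
import os
import tempfile


def store_in_bucket(root: str, data: bytes) -> str:
    """Store `data` under root/<hash>/<n>.file, one numbered slot per distinct content."""
    bucket = os.path.join(root, hashlib.sha256(data).hexdigest())
    os.makedirs(bucket, exist_ok=True)
    slot = 1
    while True:
        path = os.path.join(bucket, f"{slot}.file")
        try:
            with open(path, "xb") as handle:  # "x" mode fails if the slot already exists
                handle.write(data)
            return path
        except FileExistsError:
            with open(path, "rb") as existing:
                if existing.read() == data:
                    return path  # identical bytes already stored: reuse the slot
            slot += 1  # same hash, different bytes: a real collision, try the next slot


with tempfile.TemporaryDirectory() as root:
    first = store_in_bucket(root, b"hello")
    again = store_in_bucket(root, b"hello")
    print(first == again)
```

    With this layout a collision costs an extra file in the bucket instead of silently replacing someone's data.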

  • nmare (unregistered)

    You don't need a hash...

    just add a timestamp.

  • foo (unregistered) in reply to Pero perić
    Pero perić:
    WTF? TDWTF article on holiday???
    Labour Day on 1 May is an international holiday. Therefore, in the US, "Labor Day" is celebrated in September.
  • (cs) in reply to neminem
    neminem:
    Also, yeah, the lack of any references to "hash" is pretty disappointing. Especially the lack of any references to the fact that he was, in fact, rolling his own. (I take no credit for that; all credit goes to SHA. Why was that not the title?)
    Because he wasn't rolling his own hash. If you read the code, you'd see that the generated filename is completely random rather than depending in any way on the input filename.
  • (cs) in reply to foo
    foo:
    bjolling:
    jay:
    If someone tells me that the answer to an arithmetic problem is 37, it's impossible to say whether that answer is right or wrong without KNOWING WHAT THE QUESTION WAS!
    It's wrong since the correct answer is 42
    But what is the question? This is the question!
    This? No, we've tried it, didn't work. "This? 42". See?
  • (cs) in reply to CodeCow
    CodeCow:
    Ok... so the guy was fired not because he was a shitty coder, but because he smoked weed?

    Considering he's not a pilot or something like that, that seems pretty fucked up

    The real WTF is working for a company that doesn't have a policy along the lines of the following:

    1. If you do drugs, please try to make sure it does not become a problem such that it will cause your work to deteriorate seriously.

    2. If it does become a problem, have a word with your line manager who should be able to arrange for you to get help.

    Not telling you where I work.

  • (cs)

    ... no, the real WTF is cannabis a) being illegal, and b) being called marijuana, which is as insulting as calling alcoholic beverages "piss".

  • Norman Diamond (unregistered) in reply to Jeff Grigg
    Jeff Grigg:
    And culture makes a big difference. Here in this overseas office, we have bottles of wine on top of the kitchen cabinets here at the office. We had beer and vodka mixers, but they only lasted a few evenings. ;-> But it would be odd if anyone got into the alcohol in the middle of a work day.
    Stock brokers used to have three martini lunches at lunch time, right? After lunch, did they do better or worse than the flash crashes generated by algorithms?
  • (cs) in reply to Matt Westwood
    Matt Westwood:
    ... no, the real WTF is cannabis a) being illegal, and b) being called marijuana, which is as insulting as calling alcoholic beverages "piss".

    I thought calling cannabis marijuana would be like calling beer cerveza?

  • (cs) in reply to nmare
    nmare:
    You don't need a hash...

    just add a timestamp.

    A timestamp is not great. It has to be at least milliseconds. The best thing is a counter, if predictable filenames aren't a problem.
