Admin
TRWTF is not ignoring user input for file names and just generating a unique one with the timestamp.
Admin
Hmm, I don't see how you can say that this was a bad solution without knowing the requirement.
For starters, why did he have to generate a new name for uploaded files? Why not just save them with the original name? Was the issue that, if two files were uploaded both named "scan1.jpg" or "track01.avi", he couldn't assume that the second was an updated version of the first, and so it shouldn't overwrite? If that's not true, if the original file name is indeed to be interpreted as a unique identifier of the file, then there's no reason to replace the file name with a randomly-generated name or a hash or anything else: just use the original file name. If the original file name CANNOT be assumed to be a unique identifier, then hashing the file name would not work, because duplicate names would generate duplicate hashes. That would completely fail the requirement.
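To make the duplicate-hash point concrete: a hash function is deterministic, so two different uploads that share a file name get the same hashed name. A minimal Python sketch using `hashlib.sha256`; the helper name `hashed_name` is hypothetical.

```python
import hashlib


def hashed_name(original_name: str) -> str:
    """Hypothetical helper: derive the stored file name by hashing the uploaded name."""
    return hashlib.sha256(original_name.encode("utf-8")).hexdigest()


# Two different uploads that both arrive as "scan1.jpg" map to the same stored
# name, so the second one would overwrite the first.
first = hashed_name("scan1.jpg")
second = hashed_name("scan1.jpg")
print(first == second)  # True
```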
Without seeing the rest of the code, we don't know if he considered the possibility of collisions. He might have generated a name, then checked if it already existed and if so generated a new name. Sure, the set of possible results is too small if there are really tens of millions of uploads. But that's easily fixed by changing the upper bound on the loop.
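A sketch of the generate-then-check loop described above, assuming Python and a local upload directory (the function name and the 128-bit name length are arbitrary choices, not anything from the article). Checking with `os.path.exists` and then creating the file would race with other threads, so this version uses `O_CREAT | O_EXCL`, which makes "does it exist?" and "create it" a single atomic step.

```python
import os
import secrets
import tempfile


def claim_random_name(directory: str, suffix: str = "", attempts: int = 10) -> str:
    """Hypothetical helper: pick a random file name in `directory`, retrying on collision."""
    for _ in range(attempts):
        name = secrets.token_hex(16) + suffix  # 128 random bits as 32 hex characters
        path = os.path.join(directory, name)
        try:
            # O_EXCL makes the open fail if the name is already taken.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # collision: generate a new name and try again
        os.close(fd)
        return path
    raise RuntimeError("could not find a free file name")


with tempfile.TemporaryDirectory() as d:
    print(claim_random_name(d, ".jpg"))
```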
Personally, if I had to generate fake file names, I'd prefer to use a sequence number or maybe a GUID rather than generate names randomly and check for duplicates. On the other hand, sequence numbers could generate duplicates if there are multiple threads running simultaneously. Random names might actually be better in such a case.
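The multiple-threads problem with sequence numbers only bites if the counter is read and incremented without protection. Within a single process a lock closes that gap; a minimal Python sketch (the class and method names are hypothetical). It does not help across several processes or servers, which would need a shared counter such as a database sequence.

```python
import threading


class SequenceNamer:
    """Hypothetical namer: hands out sequential file names under a lock."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def next_name(self, suffix: str = "") -> str:
        # Reading and incrementing under the lock means no two threads see the same number.
        with self._lock:
            n = self._next
            self._next += 1
        return f"{n:012d}{suffix}"


namer = SequenceNamer()
names = []


def worker() -> None:
    for _ in range(1000):
        names.append(namer.next_name(".dat"))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(names), len(set(names)))  # 8000 8000
```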
Actually, without knowing the requirement, how do we know that the client did not say, "I want all file names replaced with a randomly generated name so as to obfuscate the contents"?
If someone tells me that the answer to an arithmetic problem is 37, it's impossible to say whether that answer is right or wrong without KNOWING WHAT THE QUESTION WAS!
Admin
I am deeply disappointed that this story did not make SOME pun on the word "hash".
Admin
The underlying assumption in that statement is that being a bad coder is totally unrelated to using drugs.
Perhaps your statement is like saying, "Wait, did you not hire Mr Jones as a music critic because he was unable to distinguish good music from bad, or did you not hire him just because he's completely deaf?"
Funny that you apparently accept the idea that using drugs would interfere with a pilot's ability to fly a plane, but would not interfere with a programmer's ability to write good code.
(I refrain from expressing an opinion on whether marijuana use really is linked to bad coding, having no personal experience in the matter. I'm an old man, I don't need drugs. I get high just by standing up fast.)
Admin
Apparently somebody hasn't heard of GUIDs.
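In Python the standard `uuid` module covers this: `uuid.uuid4()` produces a random (version 4) UUID with 122 random bits. A minimal sketch that keeps the original extension; keeping it is an assumption, and some systems drop it entirely. The helper name `uuid_name` is hypothetical.

```python
import os
import uuid


def uuid_name(original_name: str) -> str:
    """Hypothetical helper: replace the uploaded name with a random UUID, keeping the extension."""
    _, ext = os.path.splitext(original_name)
    return f"{uuid.uuid4()}{ext.lower()}"


print(uuid_name("Track01.AVI"))  # a random 36-character UUID followed by ".avi"
```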
Admin
What happens if your timestamp is accurate to the second, and two files are uploaded within one second of each other? (Adding digits to the timestamp does not, of course, solve the problem. It makes it less likely, but no matter how precise it is, collisions are still possible, especially if you have multiple threads. Any time a programmer says, "Well, yes, that could happen theoretically, but the chances of that are so small that we don't need to worry about it" ... well, when I hear that, I hide under the table until after the explosion.)
You'd still need to check for duplicates and either add a sequence number or have some other means to ensure uniqueness.
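A sketch of "timestamp plus a sequence number on collision", assuming Python, a local upload directory, and second-resolution timestamps (the naming format and function name are hypothetical). The existence check and the file creation are one atomic `O_EXCL` open, so two threads in the same second cannot both claim the same name.

```python
import os
import tempfile
import time


def timestamped_name(directory: str, suffix: str = "") -> str:
    """Hypothetical helper: name a file after the current second, appending -1, -2, ... on collision."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    seq = 0
    while True:
        name = stamp + (f"-{seq}" if seq else "") + suffix
        path = os.path.join(directory, name)
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            seq += 1  # same second already taken: try the next sequence number
            continue
        os.close(fd)
        return path


with tempfile.TemporaryDirectory() as d:
    a = timestamped_name(d, ".jpg")
    b = timestamped_name(d, ".jpg")
    print(a != b)  # True
```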
Admin
Yeah, you use millisecond granularity PLUS a GUID to guarantee enough entropy that you may see one or two collisions before the heat death of the universe.
Admin
In any case, while I have nothing against pot per se, if an employee goes so far as to bring the accoutrements of smoking the substance to work, that probably implies something about the frequency of their use that would raise some flags. If I were hiring, I wouldn't care what my employees did on their off time, as long as it didn't impair their ability to work when they were supposed to. But I would also expect them to leave their personal hobby items at home :p. (Booze, for instance, is totally legal, but it'd raise similar flags if they kept a handle of vodka on their desk, even if they weren't drinking it during work hours...)
Admin
Also, yeah, the lack of any references to "hash" is pretty disappointing. Especially the lack of any references to the fact that he was, in fact, rolling his own. (I take no credit for that; all credit goes to SHA. Why was that not the title?)
Admin
Well, really you'd use a library, but if I had to do it myself (practicing for interviews): if it's a service, it's easy to append an incrementing counter to the millisecond timestamp. You could reserve blocks of IDs to avoid hitting the service too often. If you're not already multi-tenanted, you could use the customer ID and MAC address. From a function, I'd be tempted to check whether the name exists in a list of recent uploads, or block for 1 ms, depending.
Sound sane?
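The block-reservation idea can be sketched like this, assuming Python; `CentralCounter` is a hypothetical in-process stand-in for a real ID service, and the `<milliseconds>-<id>` name format is an arbitrary choice. Each allocator reserves a range of IDs in one call and hands them out locally, so the service is contacted once per block rather than once per name. Uniqueness comes from the reserved IDs; the timestamp only makes names sortable.

```python
import threading
import time


class CentralCounter:
    """Hypothetical stand-in for an ID service that hands out ranges of IDs."""

    def __init__(self) -> None:
        self._next = 0
        self._lock = threading.Lock()

    def reserve(self, size: int) -> range:
        with self._lock:
            start = self._next
            self._next += size
        return range(start, start + size)


class BlockAllocator:
    """Caches a reserved block locally; calls the service once per `block_size` names."""

    def __init__(self, service: CentralCounter, block_size: int = 100) -> None:
        self._service = service
        self._block_size = block_size
        self._ids = iter(())
        self._lock = threading.Lock()
        self.service_calls = 0

    def next_name(self) -> str:
        with self._lock:
            n = next(self._ids, None)
            if n is None:  # current block exhausted: reserve a new one
                self._ids = iter(self._service.reserve(self._block_size))
                self.service_calls += 1
                n = next(self._ids)
        ms = time.time_ns() // 1_000_000
        return f"{ms}-{n}"


service = CentralCounter()
alloc = BlockAllocator(service, block_size=100)
names = [alloc.next_name() for _ in range(250)]
print(len(set(names)), alloc.service_calls)  # 250 3
```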
Admin
So replace one WTF with another, huh? We can only hope you just left out the part where you somehow guaranteed that the input data would be unique (salting the hash with a timestamp, perhaps), since a hash certainly does NOT generate unique values for non-unique input. It doesn't even guarantee unique output for unique input.
This is a simple problem with a simple set of workable solutions (some only workable with certain constraints of course): uuid, timestamp, database sequence based names, and various other ways to generate a unique value for a name that you can associate with the original name.
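One of those options, database sequence based names that you associate with the original name, can be sketched with Python's built-in `sqlite3` module. The table layout, the `.bin` extension, and the helper names are all hypothetical; a production database would use its own sequence or identity column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " original_name TEXT NOT NULL)"
)


def register_upload(original_name: str) -> str:
    """Hypothetical helper: insert a row; the database-assigned id becomes the stored file name."""
    with conn:  # commits the insert
        cur = conn.execute(
            "INSERT INTO uploads (original_name) VALUES (?)", (original_name,)
        )
    return f"{cur.lastrowid}.bin"


def original_name_for(stored_name: str) -> str:
    """Look the client's original file name back up from the stored name."""
    row_id = int(stored_name.split(".")[0])
    (name,) = conn.execute(
        "SELECT original_name FROM uploads WHERE id = ?", (row_id,)
    ).fetchone()
    return name


a = register_upload("scan1.jpg")
b = register_upload("scan1.jpg")  # same original name, different stored name
print(a, b)                       # 1.bin 2.bin
print(original_name_for(b))       # scan1.jpg
```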
Admin
Hurr durr, if imgur can get away with using (seemingly) random file names, it must be good enough for us!
Admin
I am frequently frustrated here at work with folks who assume that because MD5 has a high degree of uniqueness, it is the same as being unique. A true WTF, which yielded frequent collisions, was generating a 'unique' key by hashing the data record to an MD5, then passing that through the Oracle hash to get it down to a manageable size. This was done to avoid using a sequence number, for reasons that still have not been made clear to me.
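Shrinking a digest shrinks the space of possible keys, and once there are more records than keys, collisions are guaranteed (pigeonhole principle). The Oracle hash step isn't reproduced here; as an assumed stand-in, this Python sketch simply truncates the MD5 digest, to a deliberately tiny 1 byte so the effect is obvious.

```python
import hashlib


def truncated_key(record: bytes, n_bytes: int) -> bytes:
    """MD5 the record, then keep only the first n_bytes (a stand-in for 'squeezing' the digest)."""
    return hashlib.md5(record).digest()[:n_bytes]


records = [f"record-{i}".encode() for i in range(300)]
full = {hashlib.md5(r).digest() for r in records}
short = {truncated_key(r, 1) for r in records}
print(len(full))   # 300: no collisions among the full 128-bit digests here
print(len(short))  # at most 256 distinct keys, so at least 44 records share one
```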
Admin
I would have gone for "0-way encryption" or "random-way encryption".
Admin
I'm with jay. We don't know enough about how this thing is supposed to work.
From the description, the original client should have found the name of the file (assuming the renaming is part of the product, or else it would not have worked at all) but would have complained that the file did not contain the updated contents.
I do understand using hashes to convert file names, because it makes spaces, special characters, upper/lower case, etc. portable across operating systems, along with storing the client's file name in a database or sidecar text file or something.
Admin
I have rolling papers in my desk. I roll cigarettes with them.
Admin
You didn't say if you smoke them too, but I assume that's what you mean. Good luck with that COPD! Or are you the type that says rolling your own natural tobacco won't be as bad for your health? If so, good luck with COPD!
Admin
My stepmother did a nice embroidery once of letters, numbers, flowers and suchlike. She missed out the number zero and the letter 'J' (speaking from memory). I never had the heart to tell her.
Maybe some people are character-blind?
Captcha: populus - as is, vox populus, vox deus.
Admin
I don't know if I want to see the code; at one line per statement, that is 1.1962222086548019456196316149566e+56 lines of code!
Willing to bet there is not a compiler in the world that could handle that! [If you disagree, create a file that size and test it!]
Admin
Amazement factor, not factorial o_O
Admin
Gee, you guys are so dumb!!!
Admin
Uh, no. Once out of every thousand times you generate 5x10^17 filenames (or whatever the original figure was), you can expect to have a single collision.
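Odds like these are usually estimated with the birthday-bound approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n random names drawn from N possible values. The name space behind the original figure isn't given in this thread, so the Python sketch below only shows how to plug numbers in; the 2^122 case corresponds to random version-4 UUIDs.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Birthday-bound approximation: chance that n uniformly random names
    drawn from `space` possible values contain at least one duplicate."""
    # -expm1(-x) computes 1 - exp(-x) accurately even when x is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


print(round(collision_probability(23, 365), 3))  # 0.5 (the classic birthday paradox)
print(collision_probability(10**9, 2**122))      # about 9.4e-20
```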
Admin
Ha!
You do realize athletes are not allowed to use drugs (including marijuana) because they unfairly enhance the effectiveness of training, right?
Admin
Not that the "fix" was any better. To uniquely identify a file, the checksum should be generated from its content.
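A minimal Python sketch of deriving the name from the file's bytes with `hashlib.sha256`, reading in chunks so large uploads don't have to fit in memory (the helper name and chunk size are arbitrary). Note the trade-off: identical uploads get identical digests, which gives you deduplication, and that may or may not be what the requirement wants.

```python
import hashlib
import os
import tempfile


def content_digest(path: str, chunk_size: int = 1 << 16) -> str:
    """Hypothetical helper: SHA-256 of the file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


with tempfile.TemporaryDirectory() as d:
    paths = []
    for name, data in [("a.jpg", b"same bytes"), ("b.jpg", b"same bytes"), ("c.jpg", b"other")]:
        p = os.path.join(d, name)
        with open(p, "wb") as f:
            f.write(data)
        paths.append(p)
    digests = [content_digest(p) for p in paths]

print(digests[0] == digests[1], digests[0] == digests[2])  # True False
```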
Admin
Dear Mike, it took you 10 minutes to read it?
Admin
Are you doing the same for UUIDs as well, i.e., checking whether each one already exists, and so on?
Admin
For work I spent a bunch of time reading 'shit cryptographers say'; basically, most programmers have a cargo-cult mentality when using cryptographic primitives.
Essentially, doubling down like that, using two hash functions to be 'safe', is a bad smell, because it's a sign that they don't understand what they are doing. For instance, some hash functions have good collision resistance and some 'do not'.
Which brings up the point. The Stoner at least knew that he didn't understand hash functions. Michael on the other hand...
Admin
Quite often, the worst code was written by the boss.
:-(
Admin
Likewise, I don't care what they do on their own time as long as they get to work on time, sober and rested. They need to do good work.
And culture makes a big difference. Here in this overseas office, we have bottles of wine on top of the kitchen cabinets. We had beer and vodka mixers, but they only lasted a few evenings. ;-> But it would be odd if anyone got into the alcohol in the middle of a work day.
Admin
And the worker has the "freedom" of refusing it. I would, because I'm lucky enough to work in a field where I can afford to lose my job to protect my integrity.
Admin
WTF? TDWTF article on holiday???
Admin
Like a baws...
Admin
Even if a collision is unlikely, never say never.
Often a collision doesn't matter (password hashes), or you can detect it and bucket it (instead of hash.file, create a directory named hash, move hash.file to hash/1.file, and put the new file in as hash/2.file).
Where you really want to avoid it is where you determine that improbability aside, if it does occur, you are screwed or have big problems. Either way, improbable is not as good as impossible.
Other solutions can be arrived at for this problem aside from hashes. Still, a hash would have probably been better than what's here.
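The bucketing scheme can be sketched in Python. As a simplifying assumption, this version always stores files inside a per-hash directory instead of migrating an existing hash.file into hash/1.file on the first collision; the function name is hypothetical. Opening with mode `"x"` fails if the file already exists, so the collision check and the write happen together.

```python
import hashlib
import os
import tempfile


def store_bucketed(root: str, name_key: str, data: bytes) -> str:
    """Hypothetical helper: store data as root/<hash>/<n>.file, bumping n on each collision."""
    digest = hashlib.sha256(name_key.encode("utf-8")).hexdigest()
    bucket = os.path.join(root, digest)
    os.makedirs(bucket, exist_ok=True)
    n = 1
    while True:
        path = os.path.join(bucket, f"{n}.file")
        try:
            with open(path, "xb") as f:  # "x" raises FileExistsError if the slot is taken
                f.write(data)
            return path
        except FileExistsError:
            n += 1  # same hash already stored: use the next slot in the bucket


with tempfile.TemporaryDirectory() as root:
    p1 = store_bucketed(root, "scan1.jpg", b"first")
    p2 = store_bucketed(root, "scan1.jpg", b"second")
    print(os.path.basename(p1), os.path.basename(p2))  # 1.file 2.file
```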
Admin
You don't need a hash...
just add a timestamp.
Admin
The real WTF is working for a company that doesn't have a policy along the lines of the following:
If you do drugs, please try to make sure it does not become a problem such that it will cause your work to deteriorate seriously.
If it does become a problem, have a word with your line manager who should be able to arrange for you to get help.
Not telling you where I work.
Admin
... no, the real WTF is cannabis a) being illegal, and b) being called marijuana, which is as insulting as calling alcoholic beverages "piss".
Admin
I thought calling cannabis marijuana would be like calling beer cerveza?
Admin
A timestamp is not great. It has to be at least milliseconds. The best thing is a counter, if predetermining file names isn't a problem.