Admin
Michael Bolton?
Admin
Sure, the random filename function could usefully have generated longer names to reduce the possibility of a collision, but hashing the original filename instead is guaranteed to produce a collision whenever two original filenames are the same. Presumably that's not what was intended.
Admin
Add a salt of the username/date&time/extrarandomcharacters
Admin
His solution was to make it more likely for collisions to occur? A genius like that should feel right at home at Oracle, Adobe or Microsoft!
Admin
An SHA-512 hash of the data in the file gives you a 99.9% chance that you won't have duplicates for up to 5.2e75 (different) files.
Sure, if you have the same file multiple times, it will collide with itself. But does it really matter whether a hash refers to the first or second copy of exactly the same data?
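To make the content-hash idea concrete, here is a minimal Python sketch. The function name `content_name` and the chunk size are assumptions for illustration only, not anything from the article's code. It derives a storage name from the SHA-512 of the uploaded bytes, so identical data always maps to the same name and different data (almost certainly) does not.

```python
import hashlib
import io

def content_name(stream, chunk_size=1 << 20):
    """Return a storage name derived from the SHA-512 of the stream's bytes.

    `stream` is any binary file-like object; it is read in chunks so that
    large uploads do not have to fit in memory. (Hypothetical helper.)
    """
    digest = hashlib.sha512()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Two uploads with the same bytes get the same name; different bytes differ.
first = content_name(io.BytesIO(b"same song"))
second = content_name(io.BytesIO(b"same song"))
other = content_name(io.BytesIO(b"another song"))
print(first == second, first == other)
```

The trade-off is the one raised above: a duplicate upload "collides" with itself, which is harmless (and saves space) as long as nothing depends on the two copies being stored separately.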
Admin
Sometimes that's what you want...for the same file uploaded by the same user to make a hash collision and overwrite or prompt if that's intended. In such cases it's good not to use the date/time in the salt but only the username.
The problem, however, is that as good as MD5 is, you can still have collisions on different names (see, for example, http://merlot.usc.edu/csac-f06/papers/Wang05a.pdf for a paper that describes an algorithm for generating collisions to an existing MD5 hash value). So you've got to have some way to recover the original name, and MD5 doesn't give that to you.
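A rough sketch of that approach in Python, under stated assumptions: the names `storage_name`, `register`, and the in-memory `index` dict are hypothetical, and SHA-256 is swapped in for MD5 purely for illustration. The username acts as the salt, so the same user re-uploading the same filename maps to the same storage name, and a side table keeps the original name recoverable.

```python
import hashlib

def storage_name(username, original_name):
    """Hash the username together with the original filename.

    A NUL byte separates the two parts so that ("ab", "c") and ("a", "bc")
    do not produce the same input. SHA-256 is an illustrative choice.
    """
    data = username.encode("utf-8") + b"\0" + original_name.encode("utf-8")
    return hashlib.sha256(data).hexdigest()

# A hash cannot be reversed, so keep a lookup table (a real system would
# presumably use a database) from storage name back to the original.
index = {}

def register(username, original_name):
    """Record the upload and return its storage name."""
    name = storage_name(username, original_name)
    index[name] = (username, original_name)
    return name

alice = register("alice", "track01.mp3")
bob = register("bob", "track01.mp3")
print(alice != bob, index[alice])
```

Leaving the date and time out of the salt is what makes the repeat upload deliberately collide; adding them would give every upload its own name.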
Admin
And then Daenerys said, "Draughon, dracarys" and Draughon breathed fire and burnt the building down.
Admin
UUID anyone?
Admin
The original hash only gave 33.5 million possible hashes, so "tens of millions of files" not causing a collision seems very very unlikely...
Admin
Ok... so the guy was fired not because he was a shitty coder, but because he smoked weed?
Considering he's not a pilot or something like that, that seems pretty fucked up
Admin
Here be Draughons.
Admin
TRWTF is that both are idiots.
Admin
His valid chars also didn't include the letter "H". I guess he doesn't trust that either.
Admin
I wonder why he skipped h, l, u and v?
Admin
hluv I know...
Admin
We had a simple public-facing but in-house-developed system for tftp configs that would take requests and generate files from a db. Because of the dangers of putting anything from the internet into db queries, it was a requirement that we hashed every request and did a lookup on the hash, and because of the danger of collisions we hashed to both md5 and sha and did the lookup on both hashes.
TRWTF? The system at its peak handled fewer than 50 devices total, and they simply sent their mac address as the tftp param.
(also when I said we I meant I >.>)
Admin
How do you know that Oracle, Adobe or Microsoft haven't already hired such a genius? Maybe all three of them at the same time.
Admin
Better titles for this article:
'Re-Inventing the Alphabet'? ... c'mon guys!
Admin
At first I called it fake, seeing the crapper being fired: not likely to happen in real life. But then I realized that he was storing data files with random names (which, I should point out, is not hashing in any sense), and such files look suspicious to some anti-virus systems... So I think he was fired for inserting a virus into the server. Now it makes sense to me.
Admin
Let's see.
abcdefgijkmnopqrstwxyz1234567890 - that's 32 characters.
The filenames consist of 5 characters randomly chosen out of those 32. That means there are 32^5 = 33,554,432 possible filenames.
So, if there were tens of millions of files, the chance would be quite high (or even certain) that there must have been collisions.
Maybe, when there was a collision, the software would just overwrite an existing file with the same name?
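The count above can be checked empirically. The following Python simulation is only a sketch: it assumes names are drawn uniformly at random from the 32 characters quoted above, which may not match how the original function actually picked them. It keeps generating 5-character names until one repeats and reports how many uploads that took.

```python
import random

ALPHABET = "abcdefgijkmnopqrstwxyz1234567890"  # the 32 characters quoted above
NAME_LENGTH = 5

def uploads_until_collision(rng):
    """Generate random names until one repeats; return the upload count."""
    seen = set()
    while True:
        name = "".join(rng.choice(ALPHABET) for _ in range(NAME_LENGTH))
        if name in seen:
            return len(seen) + 1  # this upload is the first duplicate
        seen.add(name)

rng = random.Random(2013)  # fixed seed so runs are repeatable
trials = [uploads_until_collision(rng) for _ in range(20)]

print(len(ALPHABET) ** NAME_LENGTH)  # 33554432 possible names
print(sum(trials) / len(trials))     # average uploads before a repeat
```

Under that uniform assumption, the first repeat typically shows up after a few thousand uploads, long before the name space is anywhere near full.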
Admin
With so many files, I could imagine any clients using the service would just "store and forget". Kind of like an archive in a bureaucratic environment.
Admin
"99.9%"
So, once out of every 1000 it will fail?
Admin
No, if you generate 5.2e75 different files 1000 times, one will cause a collision.
Admin
Do not MD5 in the files of Draughons, for you are crunchy, and good with ketchup.
Woah man, I got the munchies sooooo bad...
Admin
An MD5 hash is 128 bits (16 bytes), and a typical MP3 file name may be in the 15-25 byte range. But four of those bytes (".mp3") have no entropy, and the rest have about 5 bits of entropy each. So that's about 50-100 bits of entropy, which means that MD5 hashes of MP3 file names are really unlikely to have accidental collisions. To make an intentional collision, you would likely have to use the full range of 8-bit characters.
Admin
You can use whatever hash algorithm you want, if you hash the same file name you get the same hash
Admin
Come on, "Rolling your own Hash" would still have been good.
Admin
It seems that he wanted to overwrite files with the same name (hmmm, what if they belong to different users?) So why not just use the original file name?
Also, it's not clear how this was supposed to work. A client uploads a file, then browses all existing file names? Or are they stored in a DB? How did this work originally?
Admin
Beyond unlikely. The birthday paradox says that although there are on the order of 2^26 possible filenames, you'd expect, on average, to get the first collision around file 2^13, or after having uploaded about 10k files.
After that, collisions get more and more likely. To get to "tens of millions of files" with zero collisions is astronomically unlikely.
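The birthday estimate can be put into numbers with the standard approximations. This is a sketch that assumes uniformly random names over the 32^5 = 2^25 name space worked out elsewhere in the thread; the function names are made up for illustration. The probability that n names are all distinct is roughly exp(-n(n-1)/(2N)), and the expected position of the first collision is roughly sqrt(pi*N/2).

```python
import math

def p_no_collision(n_files, n_names):
    """Approximate probability that n_files uniformly random names are all distinct.

    Uses the usual birthday approximation exp(-n(n-1)/(2N)), which is
    accurate when n_files is much smaller than n_names.
    """
    if n_files > n_names:
        return 0.0  # pigeonhole: more files than names forces a repeat
    return math.exp(-n_files * (n_files - 1) / (2 * n_names))

def expected_first_collision(n_names):
    """Approximate expected number of uploads until the first repeat."""
    return math.sqrt(math.pi * n_names / 2)

N = 32 ** 5  # 33,554,432 possible 5-character names

print(round(expected_first_collision(N)))  # a little over 7,000 uploads
print(p_no_collision(10_000, N))           # already well below one half
print(p_no_collision(10_000_000, N))       # too small to represent; prints 0.0
```

So "about 10k files" is the right order of magnitude for the first collision, and ten million collision-free uploads is not something the arithmetic allows for.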
Admin
Oh. I recognize the solution. The architect in my previous company did that. Guess who was escorted, though.
Admin
With 60.4 million possible filenames (36^5) generated by this function, and tens of millions of uploads, there's almost certainly a name collision.
Michael just hasn't found them yet.
Admin
l is missing because it looks like I and 1; u is missing so that none of the files end up with "fuck" in their names; h and v are missing because... errr...
OK, I got nothing.
As for using a hash over the file contents instead: I believe Dropbox does this very thing, which is why if you upload a movie that somebody else has already stolen and uploaded before you, your own upload will complete impossibly quickly. It's a reasonably effective de-duplication scheme that has to be saving Dropbox a shitload of disk space, and is probably the main reason why they don't automatically encrypt your stuff client-side.
Addendum (2013-05-01 11:31): On further thought: h is missing so none of the files end up with "shit" in their names, and v is missing because it looks too much like u.
Admin
What are you talking about, you pinko commie asshole? There is no country in the world whose corporations have more freedom to oppress, exploit and enslave their workers than the USA.
USA! USA! USA!
Admin
Dear flabdablet, it took you 10 minutes to write that comment?
Admin
In the USA, we see this as the freedom of the business owner to demand a drug test any time he wants.
Admin
I worked through some code that one of my pre-pre-pre-decessors at my current workplace wrote... boy, that's pure TDWTF material there. I will submit some of it in the future (one method has 45! nested if-else-statements...). But he wasn't fired for being "a shitty coder". In fact, his programs do work. They are slow as hell and can't be maintained (because nobody wants to dig through the 1000 lines of uncommented code that guy wrote for a task as simple as reading a CSV-file into a DataGrid). But since the programs do what they should do, nobody who could fire him knew that he was a shitty coder.
TL;DR: For Draughon to be fired for being "a shitty coder" you'd need someone in charge to KNOW that he's a "shitty coder". Very unlikely.
Admin
The hash is over the file name, not contents. It's totally possible that you could have two files (e.g. track01.mp3) with the same name and totally different contents, and those would collide.
Admin
It mentions he had rolling papers in his desk. Circumstantial evidence, yes, but come on, why would you have the paraphernalia at work? So you could roll one when you get to the car where you have the actual weed? That guy's a disaster waiting to happen. Maybe he signed a drug-free workplace agreement, or more likely this was in a right-to-work state.
Captcha: esse esse marlboro officer, I swear man. I love you man.
Admin
While I will not deny that some (okay, many) suffer from SQL injection vulnerabilities, when it concerns just a single field, I would think that even the person who wrote that requirement should have been able to find one of the dozen ways to safely insert internet data into a database.
Wait, you never talk to your managers? They never ask why maintenance of some modules takes three times as long as work done by the other devs? I mean, it might not go all the way up the chain, but presumably someone was in charge of the development team; would they not find out eventually? Heck, I have a few components that through years of change requests have become somewhat less clear than they could be, and I make sure my boss knows this when he asks for a quote on yet another change: until I get time allotted to clean it up, changes to these modules will take longer than similar changes in the rest of our code-base.