Admin
Wow, so what you have just told me is that PHP is so advanced it can perform a sort-of reverse inlining where it detects that an object is used in a single place or the same way in all places and can then optimize the code by executing that code during compile time and storing the output? AMAZING!
Admin
What really amazes me is that they've taken the time to split the converted image data into lines, and then concatenate them...
It's (encoded ... or decored) binary data for pete's sake, it doesn't need to be presented -h [:|]
And of course, the clever programmer should've used commas, not dots, when he was unfolding his (or hers...? no....) amazing cleverness [A]
P.S. This forum (software) is a small WTF in itself btw, trying its best to emulate a M$ program. It's interesting trying to get smilies to work in Opera, I must say [8-|]
(Luckily I'm clever, so I opened IE, and did some highly cunning copy-paste operations [:P] )
Admin
Hm, I'm unable to say if I pwned (sorry 'bout the leet lingo...) the forum software or if it pwned me.
I'd like to consider myself the moral winner :-p
Admin
I generally recommend storing these in the database along with the rest of the data. Kept in their own table (which may or may not be in its own storage area), they have no impact on the non-binary data.
There are a lot of difficulties with keeping user binary data in the file system. Some bite you now, some bite you later ...
Admin
For the record, I wasn't defending the use of MD5.
I agree with Alex's post above; my opinion is that if you've got images running into unique-naming issues, then DB it all the way.
Admin
MD5 hashes are unique only in the probabilistic sense (collisions are remotely possible, but only remotely). This is why they're used for data integrity checks that have to withstand intentional, deliberate attack. SHA-1 has a longer output than MD5 (160 bits vs. 128) and is suspected of having fewer known weaknesses.
The SHA-1 hash function is much, much stronger than the CRC and ECC algorithms and error tolerances used by typical storage and network hardware. Suppose you don't use hashes as file names, and just put each image in a database with sequential unique IDs or file names (similar enough that flipping a single bit gets you a different but still valid image ID).
Statistically, it's more likely that your database will get corrupted and point to the wrong picture without detecting the error (about 1 in 2^92 images according to EMC's specs and some quick math... about 1 in 2^50 or so for plain IDE disks, and if you're not using ECC RAM all bets are off) than it is that you'll have two legitimate but distinct images with the same SHA-256 (about 1 in 2^128 for a birthday attack).
How much more likely? The best enterprise-class database systems will fail silently in a way that can swap images at least 16 billion times before you will expect to find one SHA hash collision.
Admin
Of course that assumes said EC-DBMS doesn't use SHA1 internally for data integrity...
Admin
And I'm the guy who stole his stapler.......
Admin
LOL, this is cool!! Clever code is teh rulez!
Admin
> I should note that I once worked with a PHP-based OSS photo album that worked in a similar manner. Every image in the album was one PHP script that loaded the image from disk and echo'd it to the browser with a faked MIME type.
There are a number of reasons why somebody might do this, and security is but one. Other reasons:
2) The ability to pull images from different locations based on environment constraints, such as the logged-in user.
3) Resizing the image to exactly fit width/height constraints while conserving bandwidth.
Performance would only be a problem if it wasn't written well. The parent WTF, though, is truly, utterly, a WTF.
Admin
So your web server / file system has issues dealing with security levels of objects, and the solution is to put the objects into source code?
That's like using a guillotine to treat an abscessed tooth. I'm glad you're not a doctor.
Admin
Dang! Just one short of infinity..!
Admin
The chances of getting a collision are pretty small, but never use any hash algorithm without being prepared for a collision, because it can happen. I've seen plenty with weak algorithms like CRC, and none yet with MD5, but it WILL happen; it's just a matter of time.
Now, your app in particular doesn't sound so critical that you have to worry about hash collisions. But if you are generating a lot of hashes (say, at the per-second level) and/or a collision could really hurt your application, you HAVE to take it into account. Here the only thing you have to worry about is that you might one day lose an old image (and you will probably never lose another one after that): not such a big deal.
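In that spirit, here's a minimal sketch of what "being prepared" can look like (Python rather than the thread's PHP, and all names hypothetical): on a hash hit, compare the actual bytes instead of assuming the content is identical.

```python
import hashlib

class HashStore:
    """Content-addressed store that verifies bytes on a hash hit
    instead of silently assuming equality."""
    def __init__(self):
        self._by_hash = {}  # hex digest -> list of distinct byte strings

    def put(self, data: bytes) -> str:
        digest = hashlib.md5(data).hexdigest()
        bucket = self._by_hash.setdefault(digest, [])
        if data not in bucket:   # a true collision lands here as a 2nd entry
            bucket.append(data)
        return digest

    def collisions(self) -> int:
        return sum(1 for b in self._by_hash.values() if len(b) > 1)

store = HashStore()
store.put(b"image-1")
store.put(b"image-2")
store.put(b"image-1")            # duplicate content, not a collision
```

The point is only that the code has somewhere sensible to go when two different byte strings ever share a digest, instead of overwriting one image with the other.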
Admin
Mathematically:
1 : 340,282,366,920,938,463,463,374,607,431,768,211,456 (i.e. 1 in 2^128)
chance of hitting the same hash twice
Admin
W.T.F.
Admin
W.T.F.
Admin
Have a look at the FlashGot Firefox plugin build gallery option :)
Admin
Concerning MD5, collisions have already happened. In fact, an algorithm was published a few months ago to create such collisions easily. However, IIRC, all the colliding data had the same length. So here is the solution:
Never have images of the same length.
(That said, with a fixed-size hash, collisions are by definition impossible to avoid.)
Admin
> There have been relatively few MD5 collisions found. Your chances are essentially nothing.
buy now to get your own md5 collision TODAY :)
http://www.stachliu.com/md5coll.c
Admin
Oh, you're the author of curl?
CAPTCHA: BOZO (!!!)
Admin
Damn, why didn't I think of that. Oh, that's right....
Admin
Are you sure this baby will be finished today? In a few minutes, my laptop will be hot enough to make an omelet :p
Admin
Don't be too sure. Google for "birthday paradox". It applies here, because for every additional image, its hash must be different from ALL the others, and with every addition the chance increases that that's not the case.
Still, IIRC the average number of hashes you can add before there's a collision is roughly the square root of the total number of different hashes, and 2^64 is not a number you'll ever approach.
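The square-root rule of thumb is easy to sanity-check numerically; a quick Python sketch using the standard exponential approximation for the birthday probability:

```python
import math

def birthday_collision_prob(n, bits):
    """Approximate probability of at least one collision among n
    uniformly random values of the given bit width, using
    p = 1 - exp(-n(n-1) / (2 * 2^bits))."""
    d = 2.0 ** bits
    return -math.expm1(-n * (n - 1) / (2 * d))

# For a 128-bit hash, ~2^64 values puts you near coin-flip odds (~39%),
# while a mere million values leaves the probability vanishingly small.
p_sqrt = birthday_collision_prob(2 ** 64, 128)
p_million = birthday_collision_prob(10 ** 6, 128)
```

So "roughly the square root of the total number of hashes" is exactly where the collision probability stops being negligible, which is the point being made above.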
Admin
Since the others haven't addressed b), I will.
Pretty much the only absolutely certain way is to assign artificially unique IDs to the files. Have a counter that gives each new file a number and then increases by one. The important part is that *every* file must use that counter, and it must be absolutely certain that it works correctly and doesn't, e.g., assign the same number to two files because they arrived at exactly the same time and the code isn't threadsafe.
But actually, MD5 is "good enough" for all practical purposes. The chances for a collision are lower than for winning the lottery jackpot twice in a row.
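A rough sketch of that counter approach (Python rather than PHP, names hypothetical), with a lock standing in for "absolutely certain it's threadsafe":

```python
import threading

class FileIdAllocator:
    """Hands out strictly increasing IDs; the lock guarantees that two
    simultaneous uploads can never receive the same number."""
    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def allocate(self) -> int:
        with self._lock:
            n = self._next
            self._next += 1
            return n

# 100 concurrent "uploads" all pulling from the same counter
alloc = FileIdAllocator()
ids = []
threads = [threading.Thread(target=lambda: ids.append(alloc.allocate()))
           for _ in range(100)]
for t in threads: t.start()
for t in threads: t.join()
```

Without the lock, the read-increment-write in allocate() is exactly the race the post warns about.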
Admin
I find it strange that a system supposedly designed to handle data on a very large scale (a big hard drive is not small) is not designed like a Very Large Database.
Admin
Um... what do you mean with images being "turned to text files"? The method used in the WTF is Base64 encoding, and that's certainly NOT different for different image formats; in fact it's the same whatever the input is.
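That's easy to demonstrate: the encoder takes arbitrary bytes and round-trips them exactly, regardless of what "format" they supposedly are (Python sketch):

```python
import base64

png_header = b"\x89PNG\r\n\x1a\n"   # binary-looking bytes
gif_header = b"GIF89a"              # ASCII-looking bytes

for blob in (png_header, gif_header, bytes(range(256))):
    encoded = base64.b64encode(blob)            # same call, any input
    assert base64.b64decode(encoded) == blob    # exact round trip
```

Base64 never inspects the content; it just maps every 3 input bytes to 4 output characters.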
Admin
(I hope this will look nice on the forum... note that I use a**b for raising a to power b... just because ^ looks VB-ish and everyone knows it actually means xor.)
1 in 2**128 for the same hash twice *in a row*. The chances of finding at least one collision in N hashes are much greater. To calculate this, you'd use the probability of not finding even a single collision in a row of N hashes and subtract that from 1. The resulting probability for finding a collision in N hashes equals 1 - (1 - 1 / 2**128) ** (N - 1). If you were to handle just over a million files, the chances of a collision would be:
N = 2**20 + 1, which is about a million and easy to use here
1 - (1 - 1 / 2**128) ** (2**20 + 1 - 1)
1 - (1 - 2**-128) ** (2**20)
Which is really beyond me to simplify any further... so I just calculated it in 1024-digit precision. And as it turns out, it's approximately 2**-108, or 1 / 324,518,553,658,426,726,783,156,020,576,256.
What can we then conclude? That for every file we hash into an MD5, our chances of finding a collision increase greatly. Now stop reading and start finding collisions!
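For what it's worth, the 2**-108 figure above checks out even in ordinary floating point, if you use log1p/expm1 to keep the tiny probabilities from underflowing (Python):

```python
import math

n = 2 ** 20                                   # about a million files
# p = 1 - (1 - 2**-128) ** n, computed stably via logs
p = -math.expm1(n * math.log1p(-2.0 ** -128))
exponent = math.log2(p)                       # should land near -108
```

No 1024-digit precision required: since n * 2**-128 is tiny, p is essentially n / 2**128 = 2**20 / 2**128 = 2**-108.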
Admin
Putting images in databases is pretty hideous (really bad performance overhead) already and ONLY justified when you actually NEED the transactional safety and other DB benefits for the image data, which is rarely the case.
But this crap has NO advantage over just putting the images in the filesystem. None at all. Only disadvantages, and lots of them.
Admin
(Heh, forgot I had an account, so I'm logged in now.)
All our data is stored on a ~25 year old database server that had gone from a nifty place to store a little bit of interesting data to a mission critical server for 10+ departments. The way the site was set up before, there were between 2 and 8 total pages of reports on a single coil of steel. Each time you loaded a page of the report the webserver would hit the database for the info. Each page would take between 10 and 25 seconds to load. Around 95% of that is purely waiting for the queries to execute. These are relatively simple queries too, they can't be simplified any further than they already are. After my modifications there would be one 20-25 second wait up front, but from then on, loading a page of the report took less than 2 seconds so long as the cookie remained.
Looking back, what I probably should have done is cached the data in text files on the webserver, holding on to them for, say, 72 hours or so. The problem is that this would have required significantly more work so I went with the simplistic route.
All in all though, the modifications I made to the site vastly improved code maintainability, clarity, and re-use. Before I touched it we had wonderful mini-wtfs such as a page which included two other pages, one of which referenced functions contained in a fourth page which was included by the other included page. We also had about a dozen obsolete functions spread out over multiple pages. Among them, GoodToBad, BadToGood, and BadToUgly. These three had something to do with date formatting, with "Good" being what should be displayed to the user, "Bad" being what vbscript liked using, and "Ugly" being the format the database used. Because why bother treating query parameters as Date/Times when you can search for them as text strings.
We also had one page for each database we had to connect to, each one almost a copy/paste of the other, when all that needed to be different was the connection string. But worse yet, sometimes things were added, deleted, or modified from page to page, meaning each page was capable of behaving differently. Oh, plus I solved an SQL injection vulnerability, because after all, real men concatenate query parameters right into the query and don't use ADODB.Command objects.
And yeah, I had to teach myself VB and website design from these very same pages. =/
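The "cache it for 72 hours" idea is only a few lines in any language; a hypothetical in-memory sketch (Python, with a stub standing in for the 20-second queries):

```python
import time

class ReportCache:
    """Caches expensive query results, re-fetching after max_age seconds."""
    def __init__(self, fetch, max_age=72 * 3600):
        self._fetch = fetch      # the slow function (e.g. the report queries)
        self._max_age = max_age
        self._store = {}         # key -> (timestamp, value)

    def get(self, key):
        hit = self._store.get(key)
        if hit and time.time() - hit[0] < self._max_age:
            return hit[1]        # fast path: no database round trip
        value = self._fetch(key)
        self._store[key] = (time.time(), value)
        return value

calls = []
cache = ReportCache(lambda coil: calls.append(coil) or f"report-{coil}")
first = cache.get("coil-42")     # slow: hits the "database"
second = cache.get("coil-42")    # fast: served from the cache
```

The one wrinkle a real version needs is eviction, so stale coils don't pile up for 25 years like the server itself.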
Admin
The difference is that DBs are designed to handle extremely large numbers of relatively small data items (rows), while file systems are designed to handle comparatively moderate numbers of data items (files) of all sizes, often extremely large, but grouped in ways (directories) suited mainly for human browsing.
It used to be impossible to put data items of more than a few KB in size into a DB, and even now AFAIK even Oracle has a 4GB limit on BLOB size, while modern file systems have eliminated that restriction. And many of the same modern file systems also use DB-like structures (B-trees) to manage their directories and can handle directories with hundreds of thousands of files quite well.
Admin
I am smarter now than I was before.
Admin
I can see a blonde, brunette, redhead.....
Admin
it's ok as long as you're not "cleverer" than you were before
Admin
I can be clever.
I once wrote a largestValue(int, int, int [, int...]) function, sort of not knowing the language supported that natively.
Very clever.
Admin
$_GET is an array in php, so you can use the same method on any other array as well. It also works on plain variables. In my opinion it is horribly underused in php...people just take advantage of the fact that php doesn't halt on those kinds of errors to make sloppy declarations.
Admin
That is because I have not submitted some of the Python WTFs I work with. Trust me, you can write bad code in python.
Actually I submitted one example and never saw a reply, I wonder if it went through. Maybe I should re-submit. (I have had replies from other submissions, but Alex has too many other excellent WTFs to publish them)
I can see already that this forum will screw up this post, so I'll just scream WTF here to save me the bother of replying to it.
Admin
It would work just fine if the guy had base64_decoded them *before* putting them in the array.
So perhaps that's the WTF !!
Yesterday, I couldn't remember the name of the compiler I'd used, but I remembered it in my sleep:
http://www.php-accelerator.co.uk/
That would make this gunk fly !!
And it would all cache just fine on the client if they added
header("Cache-control: public");
header("Expires: Thu, 01 Dec 2094 16:00:00 GMT");
Admin
Yet you don't need 2^64 operations to create an MD5 collision (hence why it's "broken"). Likewise with SHA-1: the first "break" was the ability to generate collisions in under 2^69 operations instead of the theoretical 2^80 (the last attack recorded managed to lower the complexity of finding an SHA-1 collision to 2^63 operations). In fact, you'll find at this page that MD5 collisions can now be generated extremely fast (8 hours on a 1.6 GHz Pentium notebook for the first complete collision).
Admin
??? [:O]
Admin
I'm gonna go out on a limb here and defend if not the implementation, then at least the basic idea of using base64 encoded inline images. With good PHP caching/acceleration, which has already been pointed out in previous posts, there is no re-parsing of the images file -- the array is maintained in memory and easily accessible by the webserver without going to disk. Furthermore, because the images are already base64 encoded in the array, there is no need to base64 encode them for delivery.
Yes, it appears that most folks here are forgetting that HTTP base64 encodes binary files for transport. It's a text protocol, folks. You can't send an image with binary data over HTTP.
So, properly done, this technique *could* have a net increase in speed. The question that needs to be answered is "can apache read an image from disk, determine mime type, base64 encode it, send http headers, then send the image data faster than mod_php can lookup a hash for the image name, send http headers, then send the image data?" I don't know, I haven't benchmarked it. Plus, I'm ignoring *apache* caching the image file... but this is theory, right? :)
Still, it's a horribly difficult way to work *with* the images. What happens if you want to change the image? Yep, ya gotta base64 encode it yourself, then modify the images.php file. I suppose for a relatively small set of highly-frequently accessed, but rarely modified images, this might be a good technique. But it earns a WTF for most cases, for sheer time-wasting.
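For the record, the technique being debated fits in a dozen lines; a hypothetical Python sketch (in-memory base64 array, decoded once per request, with a real 404 for missing names):

```python
import base64

# Pre-encoded images held in memory, as in the WTF (here just a fake
# GIF-style payload; real entries would be whole image files)
IMAGES = {
    "dot.gif": base64.b64encode(b"GIF89a" + bytes(10)).decode("ascii"),
}

def serve_image(name):
    """Return a (status, headers, body) triple for an image request."""
    encoded = IMAGES.get(name)
    if encoded is None:
        return 404, {}, b""                 # better than a bogus MIME type
    body = base64.b64decode(encoded)        # decode once, send raw bytes
    return 200, {"Content-Type": "image/gif"}, body

status, headers, body = serve_image("dot.gif")
```

Even in this form, the maintenance objection stands: changing an image still means re-encoding it by hand and editing the source.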
Admin
Yes, but that's if you deliberately WANT to create collisions and don't need to find a collision for a specific given hash. I can't really imagine a scenario in which that results in an exploitable weakness, let alone when it's about image data.
Admin
Two comments, for no particular reason other than that I'm surprised nobody has mentioned them yet:
Admin
It's (encoded ... or decored) binary data for pete's sake,
you mean decorated ... don't ya?
Admin
All good points, but you have to weigh those against the difficulties inherent in maintaining massively-sized databases (backups and recovery, partitioning, storage spanning, etc.). I'm sure that different database systems have some good solutions for these, but you'll need DBAs with solid experience maintaining large database structures.
I normally only have to maintain databases under 1 GB, which means I don't have to worry about the more difficult aspects of this.
Admin
Yea, you should always base 64 encode your images before putting them in a database. That way you can query for them with LIKE.
Admin
isset() returns false when applied to existing array elements whose value is null. If you really want to know whether a key actually exists, array_key_exists() is the ONLY reliable method.
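The same trap exists outside PHP; a rough Python analogue of the isset() vs. array_key_exists() distinction (a null-valued key "looks missing" to one test but not the other):

```python
data = {"key": None}

# get(...) is not None behaves like PHP's isset(): a None/null value
# looks as if the key were absent...
assert data.get("key") is None

# ...while membership testing behaves like array_key_exists(): the key
# really is present, it just holds a null.
assert "key" in data
```

Whenever "key exists" and "key holds a usable value" are different questions in your code, you need the second kind of test.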
Admin
HTTP does NOT normally encode binary files in any way.
(Have a look at http://www.ietf.org/rfc/rfc2616.txt if you really care.)
If the browser knew how to show base64-encoded images, there would be no need to do a base64_decode.
Admin
I actually found this quote more of a WTF than the original WTF. Wouldn't it make sense to replace the server with something a little more powerful... (I'm tempted to suggest a 386SX but..) like a modern PC, and port any apps that stay tied to something so incredibly legacy? I know I'm being naive here, but ~25 YEARS is really pushing it........
Admin
Yea, I agree. At that point it absolutely has to be easier to graft external systems onto the legacy system than to try to expand the legacy system to meet modern needs. I pretty much do legacy add-ons for a living... They're never elegant... Always some new crap bolted onto some obsolete crap, with none of the clean flow you'd expect of a new system. But they get the job done.
Admin
You are incorrect. The isset() is checking that the form was posted with a value to use as the key; it only looks at $_POST[]. It will not verify that the key maps to something populated in the array, which will result in the mime type always being returned.
The very least this abomination could do is hand back a 404...