Admin
This story is like Alanis Morissette's "Ironic". The story is only a WTF in the sense that there are no WTFs in the story.
It's like 10,000 servers, when all you need is a competent developer...
Admin
A sign of the Apocalypse, that's what it is...
Admin
damn complicators...
Admin
[quote]Chicken or the Egg?
Scott noticed that, after a TIFF was created, it passed through a fairly complex hashing function to generate a 64-byte long "ImageHash", which was then used as the TIFF's unique identifier.[/quote]
Piece of cake to optimize. All you need to do is solve the halting problem.
Admin
<quote>Just hash the input data using two different hash functions and store both hashes. Two hashes, to reduce the likelihood of a collision to a minimum.</quote>
Just denorm the price info into a string and use that: no collisions, and if you get fewer prices than SKUs, bonus!
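A minimal Python sketch of both suggestions, assuming a sign is fully determined by a hypothetical SKU, price in cents, and template name. The canonical string is the "denormalized" key on its own; the two digests (SHA-256 and BLAKE2b from the standard `hashlib` module) are the two-hash variant. With a 256-bit hash alone, accidental collisions are already vanishingly unlikely, so the second digest mostly buys peace of mind.

```python
import hashlib
import json

def sign_key(sku: str, price_cents: int, template: str) -> str:
    """Identifier built from a sign's inputs (the field names are hypothetical)."""
    # Canonical "denormalized" string of the inputs; sort_keys keeps it stable.
    canonical = json.dumps(
        {"sku": sku, "price_cents": price_cents, "template": template},
        sort_keys=True,
    ).encode("utf-8")
    # Two independent hash functions: a false match needs both to collide at once.
    h1 = hashlib.sha256(canonical).hexdigest()
    h2 = hashlib.blake2b(canonical, digest_size=16).hexdigest()
    return f"{h1}-{h2}"

a = sign_key("012345", 199, "shelf-small")
b = sign_key("012345", 199, "shelf-small")
c = sign_key("012345", 249, "shelf-small")
print(a == b, a == c)  # True False
```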
Admin
It seems to me TIFF generation is kind of a dumb solution for the problem. If the whole problem is indeed getting up-to-date signage printed at each store, I think it would make more sense to send the data for items that have changed across the wire and print the signs at the store using a report template. Faster, and they can send the now-unused servers to me.
Admin
captcha: praesent - no time like it
Admin
Leaving aside the question of whether it was better to throw hardware at the problem or switch to using PostScript instead of TIFF, does it strike anyone else that he could have made a respectable speed improvement by either eliminating the hash entirely or, if the ramifications were too great, at least replacing it with a cheaper method which returns a GUID?
Admin
This could have easily been done as a performance optimization project.
1st step:
2nd step:
3rd step:
4th step:
Result: measurable speed increase in each step, and at least 4 days of "work" ;-)
captcha: letatio
Admin
Of course, the obvious question is: is this actually what was going on? Normally, even in a story like this, they'd elaborate a bit: at least describe that codethulu rather than assume we all know what it means, and explain why the solution was what it was.
As it is, we're left wondering if there wasn't some simple fix that would've reduced the needed servers by a factor of ten, and "just throw hardware at it" the rest of the way (but at significantly less cost).
No, that really can't be assumed from context. Remember what site you're reading -- there's more than enough Paula Beans who would declare it "too hard" and throw dozens of times more hardware at the problem than is needed.
Admin
In the past I've leapt in too quickly to correct some egregious mistake. This time I checked the other comments, and someone has done it for me.
I'll never understand, though, why people confuse 'than' and 'then' - two ENTIRELY different words. What's next? Mix-ups over 'in' and 'on'?
Admin
... and then, when you see it, you FIX IT.
We ARE looking at what we type, aren't we? We can't always rely on spell-checkers.
Admin
Just make sure you stock up on Grammar Nazi Adult Diapers in case the retention ... doesn't. And don't bother buying them at Walmart; all they sell is that flimsy Chinese crap. Not built for Western butts.
Admin
This has to be a WTF at Scott's expense, just look at the title.
Admin
Everybody so far missed it...
Step 0: put the thing in revision control.
From here, it seems many people were somewhat sane:
Step 1a: eliminate the current hash generation - this happens after the real work has been done, and does not present any opportunity to save anything other than storage space.
Step 1b: create a new hash generation, based on the input to the image generation; use this hash to avoid generating new images.
This may be sufficient right there. TIFF files tend to be rather large, and I've seen hashing functions perform sluggishly on such large files.
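Steps 1a and 1b might look roughly like this. It is a sketch only: it assumes the sign is fully determined by a hypothetical SKU and price, and `fake_render` stands in for the real TIFF generator. The key is computed from a few bytes of input rather than from the finished (large) TIFF, so a repeat request never reaches the expensive renderer.

```python
import hashlib
from typing import Callable, Dict

class SignCache:
    """Skip rendering when a sign with identical inputs was already produced."""

    def __init__(self, render: Callable[[str, int], bytes]) -> None:
        self._render = render             # the expensive generator (hypothetical here)
        self._store: Dict[str, bytes] = {}
        self.renders = 0                  # how often the expensive path actually ran

    @staticmethod
    def key(sku: str, price_cents: int) -> str:
        # Hash the small inputs, not the large rendered TIFF.
        return hashlib.sha256(f"{sku}|{price_cents}".encode()).hexdigest()

    def get(self, sku: str, price_cents: int) -> bytes:
        k = self.key(sku, price_cents)
        if k not in self._store:          # cache miss: render once
            self._store[k] = self._render(sku, price_cents)
            self.renders += 1
        return self._store[k]

def fake_render(sku: str, price_cents: int) -> bytes:
    # Placeholder "image"; a real system would return TIFF bytes here.
    return f"SIGN {sku} ${price_cents // 100}.{price_cents % 100:02d}".encode()

cache = SignCache(fake_render)
for _ in range(400):                      # e.g. one sale sign requested by 400 stores
    cache.get("012345", 199)
print(cache.renders)  # 1
```

The `|` separator is an assumption; if SKUs can contain that character, a real key should use an unambiguous encoding such as JSON.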
If this isn't enough to solve the performance issue, then we need to step back and look at the problem in more detail:
If the input to the TIFF includes the store number, and the store number is always placed in the same region on the TIFF, pull that portion out, and splice it on either right before sending it to the store, or right after, depending on the capabilities of the remote systems. Of course, if it turns out the store number doesn't actually need to be on each sign, then it should be eliminated for simplicity.
If the placement of each of the parts of the signs is fixed relative to the others, consider making a set of sign parts for the product name, the product barcode, the price, and so forth; then you only need to splice the parts together, which should process much more quickly. If the computers at each store are sufficiently capable (they probably are), it should be pretty easy to do this splicing at each store. Splicing at each store will greatly reduce the data transfer time, as each price needs to be transmitted only once. Using rsync to do that transfer will save even more time, because it really will have to send the data only once - if you've put a product on sale before, the image of its name won't need to be sent again.
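A toy illustration of the splicing idea, assuming the layout really is fixed and every part has already been rendered to a bitmap. Bitmaps here are plain lists of pixel rows, a stand-in for whatever imaging library the real system would use; `hconcat`, `vconcat`, and the part names are hypothetical. The shared body is built once, and only the small per-store strip differs between stores.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels (0 = white, 1 = black); stand-in for real image data

def hconcat(*parts: Bitmap) -> Bitmap:
    """Place pre-rendered parts side by side; every part must have the same height."""
    height = len(parts[0])
    if any(len(p) != height for p in parts):
        raise ValueError("parts must share a height")
    return [[px for p in parts for px in p[row]] for row in range(height)]

def vconcat(top: Bitmap, bottom: Bitmap) -> Bitmap:
    """Stack a strip (e.g. a per-store footer) under a shared sign body."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("parts must share a width")
    return top + bottom

# Hypothetical pre-rendered parts, cached once and reused for every sign.
name_part = [[1, 0], [0, 1]]
price_part = [[1], [1]]
body = hconcat(name_part, price_part)   # rendered once, shared by all stores
store_strip = [[0, 0, 1]]               # the only per-store piece
sign_for_store = vconcat(body, store_strip)
print(body)            # [[1, 0, 1], [0, 1, 1]]
print(sign_for_store)  # [[1, 0, 1], [0, 1, 1], [0, 0, 1]]
```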
Consider putting the cached images in a hierarchical directory structure, where the first directory is named after the first parameter used to generate the image, and so forth, with the image file itself named after the last parameter. This could reduce the cost of cache lookups enough that one could keep old sign images around long enough for them to be used again.
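A sketch of that directory layout, assuming each parameter is a short string and that `render` (hypothetical here) returns the image bytes. A lookup is then a single `exists()` check on a predictable path, with no hash to compute and no database query.

```python
import tempfile
from pathlib import Path
from typing import Callable, Tuple

def cache_path(root: Path, *params: str, ext: str = ".tif") -> Path:
    """Map generation parameters to root/p1/p2/.../pN.ext (hypothetical layout)."""
    if not params:
        raise ValueError("need at least one parameter")
    # Replace path separators so each parameter stays a single path component.
    safe = [p.replace("/", "_").replace("\\", "_") for p in params]
    return root.joinpath(*safe[:-1], safe[-1] + ext)

def lookup_or_render(root: Path, params: Tuple[str, ...],
                     render: Callable[..., bytes]) -> bytes:
    path = cache_path(root, *params)
    if not path.exists():  # cache miss: render once and keep the file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(render(*params))
    return path.read_bytes()

calls = []  # records each time the (hypothetical) renderer actually runs

def render(*params: str) -> bytes:
    calls.append(params)
    return "|".join(params).encode()

with tempfile.TemporaryDirectory() as tmp:
    first = lookup_or_render(Path(tmp), ("shelf-small", "012345", "199"), render)
    second = lookup_or_render(Path(tmp), ("shelf-small", "012345", "199"), render)
print(len(calls))  # 1
```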
If the current process sends uncompressed TIFFs to the various remote stores, consider turning on TIFF's LZW compression, or possibly switching to another format known for its lossless compression, such as PNG.
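Python's standard library has no LZW encoder, so this sketch swaps in DEFLATE (`zlib`), the lossless algorithm PNG uses (TIFF also supports it). The raster is synthetic, an assumed 800x600 mostly-white sign with one black band, but it shows why signage compresses so well: long runs of identical pixels. Ratios on real signs will differ.

```python
import zlib

# Synthetic 1-byte-per-pixel 800x600 "sign": mostly white (255) with a black band.
WIDTH, HEIGHT = 800, 600
row_white = bytes([255]) * WIDTH
row_band = bytes([255]) * 100 + bytes([0]) * 600 + bytes([255]) * 100
raw = b"".join(row_band if 250 <= y < 350 else row_white for y in range(HEIGHT))

packed = zlib.compress(raw, level=9)
assert zlib.decompress(packed) == raw  # lossless: exact round trip
print(len(raw), len(packed))
```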
Someone mentioned modifying the current process to do all the image generation at each of the stores, but it's possible (likely, even) that the cheap hardware at each store will be underpowered for the task. However, it's possible that they will be able to do it - especially if the task is changed to rendering the components of each sign separately (using a cache to prevent duplicate image rendering), and then splicing them together (still using a cache, to avoid re-rendering a sign for yesterday's sale.)
In any event, it should be possible to get this beast to run on a single PC with some sane refactoring. Using revision control and a non-essential PC to do the development, it should be possible to do that with very little risk.
Finally, remember that you're not really just talking about the price for more hardware today. A proper rework of the system should be a one time cost, but purchasing more hardware for the task happens over and over again. Since the machines are leased, that's not just buying a few more machines each time; instead, it's leasing the same machines plus some more each time.
Admin
They're leasing 15 computers for a job that should take 1. I've never seen a leasing deal which lets one have 14 high performance PCs for a mere $4k per year. ($4k each, yes, but that's $56k, not $4k.)
Also, they're using 15 computers now. Who knows how many they'll need to bring on the 400 new stores, if they need that?
Admin
Of course, TRWTF is the incredible lengths people go to in order to post workable solutions to the WTF. Are they looking for jobs, or just compensating for something?
Admin
Some of us want to avoid encountering the WTF - or a similar WTF - ourselves, so we try to solve it so that it gets fixed before we happen across it.
Anyone looking for a job here is probably fairly delusional - not only do the accounts in this forum link to our contact information, but one is more likely to be considered guilty by association for hanging out here. Also, most of the 'workable solutions' offered are as bad as, or worse than, the problems themselves.
(I realize the accounts in the actual forum, linked from here by the sidebar, the banner at the top right, and possibly other places, do potentially link to our contact information. However, the guilt-by-association factor seems even greater over there...)
Admin
So the real WTF here is Scott?
Admin
No need to hash anything, no need to store TIFFs.
Generate TIFF from data
Get list of stores to send this TIFF to
Send TIFF to stores
Next.
I suspect that the existing hardware could have generated a few hundred TIFFs in one night pretty easily.
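The loop above, sketched with hypothetical `render` and `send` callbacks and an assumed input shape (store id mapped to that store's price changes). Grouping by the sign's inputs first means each distinct sign is rendered exactly once, however many stores receive it.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

PriceChange = Tuple[str, int]  # (sku, price_cents): hypothetical record shape

def run_nightly(
    changes: Dict[str, List[PriceChange]],   # store id -> that store's price changes
    render: Callable[[str, int], bytes],     # generate TIFF from data (hypothetical)
    send: Callable[[str, bytes], None],      # deliver one image to one store (hypothetical)
) -> int:
    """Render each distinct sign once, send it to every store that needs it."""
    stores_for: Dict[PriceChange, List[str]] = defaultdict(list)
    for store, items in changes.items():
        for item in items:
            stores_for[item].append(store)
    for (sku, price), stores in stores_for.items():
        tiff = render(sku, price)            # generate TIFF from data
        for store in stores:                 # get list of stores, send to each
            send(store, tiff)
    return len(stores_for)                   # number of renders performed

sent = []
n = run_nightly(
    {"store-1": [("A", 199), ("B", 500)], "store-2": [("A", 199)]},
    render=lambda sku, price: f"{sku}:{price}".encode(),
    send=lambda store, img: sent.append((store, img)),
)
print(n, len(sent))  # 2 3
```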
Admin
Genius...your line of thinking is wicked.
Carry on.
captcha: praesent - the gift for which you prayed.
Admin
Of course, I was talking about this guy:
captcha: Augue - How Bostonians verbally disagree.
Admin
Why not generate the digits 0 through 9 in TIFF format, save them to files, and then just concatenate them into a final image? Even if they had big and small numbers per sign type (dollars in a big font, cents in a small font), it would be 20 digits plus the decimal point per sign type... 10 sign types? Maybe 20? That'd be 400 files... at a couple K each, that's a megabyte or two... and processing and sending them out can't possibly need 15 servers.
I need to be in charge of everything...
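A rough sketch of the tile approach, assuming one big size for dollars and one small size for cents, top-aligned. `fake_glyph` is a hypothetical placeholder that fills each tile with the glyph's index so the output is easy to check; a real system would load pre-rendered glyph images instead.

```python
from typing import Dict, List

Bitmap = List[List[int]]  # rows of pixels; -1 marks blank padding

GLYPHS = "0123456789.$"

def fake_glyph(ch: str, height: int) -> Bitmap:
    """Hypothetical renderer: a height x 3 tile whose pixels hold the glyph's index."""
    code = GLYPHS.index(ch)
    return [[code] * 3 for _ in range(height)]

# Pre-render every glyph once per size: 12 glyphs x 2 sizes = 24 small tiles.
BIG, SMALL = 8, 4
TILES: Dict[int, Dict[str, Bitmap]] = {
    h: {ch: fake_glyph(ch, h) for ch in GLYPHS} for h in (BIG, SMALL)
}

def compose_price(price_cents: int) -> Bitmap:
    """Concatenate cached tiles: big '$dollars.' followed by small cents."""
    dollars = f"${price_cents // 100}."
    cents = f"{price_cents % 100:02d}"
    tiles = [TILES[BIG][c] for c in dollars] + [TILES[SMALL][c] for c in cents]
    # Top-align: pad shorter tiles with blank rows so all are BIG rows tall.
    padded = [t + [[-1] * len(t[0]) for _ in range(BIG - len(t))] for t in tiles]
    return [[px for t in padded for px in t[row]] for row in range(BIG)]

img = compose_price(199)
print(len(img), len(img[0]))  # 8 15
```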
Admin
Thanks for my 1st laugh of the week.
Admin
"... hulking Oracle server ..."
Surely this is a tautology? When do Oracle servers ever do anything but hulk?
Admin
Yes