• Inhibeo (unregistered)

    This story is like Alanis Morissette's Ironic. The story is only a WTF in the sense that there are no WTFs in the story.

    It's like 10,000 servers, when all you need is a competent developer...

  • IByte (unregistered)

    A sign of the Apocalypse, that's what it is...

  • JG (unregistered)

    damn complicators...

  • (cs) in reply to john
    john:
    Uh.

    You could hash THE DATA used to generate the TIFF, and use THAT as an index, avoiding the whole TIFF-generation process.

    This is a stopgap. I would then rewrite the system from scratch in C.

    I guess C is your hammer? Can't we think of about two or three languages that would be more robust, modern, and maintainable? You want to replace crufty old VB 5 with crufty ANCIENT C?

  • Max (unregistered)

    [quote]Chicken or the Egg?

    Scott noticed that, after a TIFF was created, it passed through a fairly complex hashing function to generate a 64-byte long "ImageHash", which was then used as the TIFF's unique identifier.[/quote]

    Piece of cake to optimize. All you need to do is solve the halting problem.

  • Franz Kafka (unregistered) in reply to Wolfgang

    <quote>Just hash the input data using two different hash functions and store both hashes. Two hashes, to reduce the likelihood of a collision to a minimum.</quote>

    Just denorm the price info into a string and use that - no collisions and if you get fewer prices than skus - bonus!

  • Steve (unregistered)

    It seems to me TIFF generation is kind of a dumb solution for the problem. If the whole problem is indeed getting up-to-date signage printed at each store, I think it would make more sense to send the data for items that have changed across the wire and print the signs at the store using a report template. Faster, and they can send the now-unused servers to me.

  • nut n' bitch (unregistered) in reply to sup yo
    sup yo:
    work in tesco. price change every day. long time, yo.
    Speak any non-pidgin English?

    captcha: praesent - no time like it

  • enterprisey (unregistered) in reply to Steve
    Steve:
    It seems to me TIFF generation is kind of a dumb solution for the problem.
    Yeah. They should've created an HTML page, launched IE on it in full-screen mode and taken a screencap. And a proprietary plugin to allow closing IE from JavaScript when they're done.
  • (cs) in reply to OldPeter

    Leaving aside the question of whether it was better to throw hardware at the problem or switch to using PostScript instead of TIFF, does it strike anyone else that he could have made a respectable speed improvement by either eliminating the hash entirely or, if the ramifications were too great, at least replacing it with a cheaper method which returns a GUID?

  • Marcus (unregistered)

    This could have easily been done as a performance optimization project.

    1st step:

    • Delete all output files each time before the program runs. This eliminates additional overhead of full directories.

    2nd step:

    • Remove the hash. It's not useful, but adds additional slowdown.

    3rd step:

    • Delete all *.tiff images older than 10 days from the output directory each time the system runs.
    • Use the input data as filename, i.e. store1234-foobar-product--name-1$99.tiff
    • Check whether a file already exists before generating it and reuse it.

    4th step:

    • Instead of deleting old files using a fixed date (that will cause many of them to be recreated), delete them based on last access time.

    Result: measurable speed increase in each step, and at least 4 days of "work" ;-)
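A rough sketch of steps 3 and 4 above, assuming the sign can be named from its store, product and price. The helper names (`sign_filename`, `get_or_render`, `evict_stale`) and the `render` callback are made up for illustration; nothing here comes from the actual system.

```python
import re
import time
from pathlib import Path


def sign_filename(store: str, product: str, price: str) -> str:
    """Build a filesystem-safe cache name from the sign's input data."""
    raw = f"store{store}-{product}-{price}"
    # Collapse anything outside a conservative character set into a dash.
    return re.sub(r"[^A-Za-z0-9.$-]+", "-", raw) + ".tiff"


def get_or_render(cache_dir: Path, store, product, price, render) -> Path:
    """Reuse an existing image if present; otherwise render and store it."""
    path = cache_dir / sign_filename(store, product, price)
    if not path.exists():
        # `render` is a hypothetical callback returning the image bytes.
        path.write_bytes(render(store, product, price))
    return path


def evict_stale(cache_dir: Path, max_idle_days: float = 10, now=None) -> int:
    """Delete cached images whose last access time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    removed = 0
    for path in cache_dir.glob("*.tiff"):
        if path.stat().st_atime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

One caveat: eviction by last access time only works if the filesystem actually records access times. Volumes mounted with access-time updates disabled would need an explicit "last used" stamp instead.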

    captcha: letatio

  • praesent for you (unregistered) in reply to Litote.
    Litote.:
    blah:
    This site is a curious perversion in linguistic shamobatics.

    Your prize is half-an-internet for the word 'shamobatics'.

    ( o ) ( o )

    And here's the other half: ( y )

  • Sanity (unregistered) in reply to IMil
    IMil:
    crap crap crap:
    Hey, has anyone noticed that he could've just hashed the parameters that went into the creation of the tiff, instead of the tiff file itself? you know, things like the SKU, the price, maybe the date and the store or region.

    Seems like that would be a pretty simple fix just to get back to aptent.

    Oh, you are so clever. Yet it seems that you never had to deal with a poorly-designed System from Hell. [snip] And all of this - naturally! - would be stuffed in a single N*10000-line procedure, pixel output and data fetching all mixed up and dependent on each other.

    Not so simple now, is it, Mr Smart Guy?

    That is exactly what I thought, at first.

    Of course, the obvious question is, is this actually what was going on? Normally, even in a story like this, they'd elaborate a bit, and at least describe that codethulu, rather than assuming we all know what it means, and why the solution was what it was.

    As it is, we're left wondering if there wasn't some simple fix that would've reduced the needed servers by a factor of ten, and "just throw hardware at it" the rest of the way (but at significantly less cost).

    No, that really can't be assumed from context. Remember what site you're reading -- there are more than enough Paula Beans who would declare it "too hard" and throw dozens of times more hardware at the problem than is needed.

  • John (unregistered) in reply to J

    In the past I've leapt in too quickly to correct some egregious mistake. This time I checked the other comments, and someone has done it for me.

    I'll never understand, though, why people confuse 'than' and 'then' - two ENTIRELY different words. What's next? mixups over 'in' and 'on' ?

  • ass-laminate (unregistered) in reply to John
    John:
    I'll never understand, though, why people confuse 'than' and 'then' - two ENTIRELY different words. What's next? mixups over 'in' and 'on' ?
    The a/e thing qualifies as a typo - just like i/o.
  • John (unregistered) in reply to ass-laminate
    ass-laminate:
    John:
    I'll never understand, though, why people confuse 'than' and 'then' - two ENTIRELY different words. What's next? mixups over 'in' and 'on' ?
    The a/e thing qualifies as a typo - just like i/o.

    ... and then, when you see it, you FIX IT.

    We ARE looking at what we type aren't we? Can't always rely on spell-checkers.

  • (cs) in reply to John
    John:
    ass-laminate:
    John:
    I'll never understand, though, why people confuse 'than' and 'then' - two ENTIRELY different words. What's next? mixups over 'in' and 'on' ?
    The a/e thing qualifies as a typo - just like i/o.

    ... and then, when you see it, you FIX IT.

    We ARE looking at what we type aren't we? Can't always rely on spell-checkers.

    If, indeed, ever. A foolish consistency is the hobgoblin of little minds... and there's nowt wrong with anal-retentive behaviour, nosiree.

    Just make sure you stock up on Grammar Nazi Adult Diapers in case the retention ... doesn't. And don't bother buying them at Walmart; all they sell is that flimsy Chinese crap. Not built for Western butts.

  • netdroid9 (unregistered)

    This has to be a WTF at Scott's expense, just look at the title.

  • (cs)

    Everybody so far missed it...

    Step 0: put the thing in revision control.

    From here, it seems many people were somewhat sane:

    Step 1a: eliminate the current hash generation - this happens after the real work has been done, and does not present any opportunity to save anything other than storage space.

    Step 1b: create a new hash generation, based on the input to the image generation; use this hash to avoid generating new images.

    This may be sufficient right there. TIFF files tend to be rather large, and I've seen hashing functions perform sluggishly on such large files.
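For what it's worth, Step 1b might look something like the sketch below. It assumes the sign's inputs can be collected into a dict of simple values; `input_key`, `SignCache` and the `render` callback are invented names, and SHA-256 is just a convenient stand-in, not the hash the original system used.

```python
import hashlib
import json


def input_key(params: dict) -> str:
    """Hash the inputs to the image, not the image itself."""
    # Sorting keys makes the key independent of dict ordering.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SignCache:
    """Render each distinct set of inputs at most once."""

    def __init__(self, render):
        self._render = render   # hypothetical callback: params -> image bytes
        self._images = {}       # key -> image bytes

    def get(self, params: dict):
        key = input_key(params)
        if key not in self._images:
            # Only unseen inputs pay the cost of image generation.
            self._images[key] = self._render(params)
        return key, self._images[key]
```

The point of the design is that the key is available before any rendering happens, so a cache hit skips the expensive part entirely.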

    If this isn't enough to solve the performance issue, then we need to step back and look at the problem in more detail:

    • If the input to the TIFF includes the store number, and the store number is always placed in the same region on the TIFF, pull that portion out, and splice it on either right before sending it to the store, or right after, depending on the capabilities of the remote systems. Of course, if it turns out the store number doesn't actually need to be on each sign, then it should be eliminated for simplicity.

    • If the placement of each of the parts of the signs is fixed, relative to each other, consider making a set of sign parts for the product name, the product barcode, the price, and so forth; then you only need to splice the parts together, which should process much more quickly. If the computers at each store are sufficiently capable (they probably are), it should be pretty easy to do this splicing at each store. Splicing at each store will greatly reduce the data transfer time, as one needs to transmit each price only once. Using rsync to do that transfer will save even more time, because it really will have to send the data only once - if you've put a product on sale before, the image of its name won't need to be sent again.

    • Consider putting the cached images in a hierarchical directory structure, where the first directory is named based on the first parameter in generating the image, and so forth, with the image itself being named based on the last parameter. This could reduce the cost of accessing the cache to a low enough level that one could manage to keep old sign images around long enough that they may be used again.

    • If the current process sends uncompressed TIFFs to the various remote stores, consider turning on LZW compression, or possibly switching to another format which is known for its lossless compression, such as PNG.
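The hierarchical cache layout from the list above could be sketched roughly as follows. The parameter order (region, SKU, price) and the names `cache_path` and `_safe` are assumptions for the example; the real inputs to the image generator are not known from the story.

```python
import re
from pathlib import PurePosixPath


def _safe(part) -> str:
    """Reduce a parameter to characters that are safe in a path component."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", str(part)).strip(".")
    return cleaned or "_"


def cache_path(root: str, params: list) -> PurePosixPath:
    """First parameters become nested directories; the last names the file."""
    if not params:
        raise ValueError("need at least one parameter")
    *dirs, leaf = [_safe(p) for p in params]
    return PurePosixPath(root).joinpath(*dirs) / f"{leaf}.tiff"


# Example: region, then SKU, then price as the file name.
print(cache_path("/cache", ["region7", "sku 00123", "1.99"]))
# prints /cache/region7/sku_00123/1.99.tiff
```

Keeping each directory small is the whole idea: a lookup touches a few short directories rather than one enormous one.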

    Someone mentioned modifying the current process to do all the image generation at each of the stores, but it's possible (likely, even) that the cheap hardware at each store will be underpowered for the task. However, it's possible that they will be able to do it - especially if the task is changed to rendering the components of each sign separately (using a cache to prevent duplicate image rendering), and then splicing them together (still using a cache, to avoid re-rendering a sign for yesterday's sale.)

    In any event, it should be possible to get this beast to run on a single PC with some sane refactoring. Using revision control and a non-essential PC to do the development, it should be possible to do that with very little risk.

    Finally, remember that you're not really just talking about the price for more hardware today. A proper rework of the system should be a one-time cost, but purchasing more hardware for the task happens over and over again. Since the machines are leased, that's not just buying a few more machines each time; instead, it's leasing the same machines plus some more each time.

  • (cs) in reply to smbarbour
    smbarbour:
    If the hardware cost is $4000, you owe approximately $1000.

    They're leasing 15 computers for a job that should take 1. I've never seen a leasing deal which lets one have 14 high performance PCs for a mere $4k per year. ($4k each, yes, but that's $56k, not $4k.)

    Also, they're using 15 computers now. Who knows how many they'll need to bring on the 400 new stores, if they need that?

  • Brady Kelly (unregistered)

    Of course, TRWTF is the incredible lengths people go to post workable solutions to the WTF. Are they looking for jobs, or just compensating for something?

  • (cs) in reply to Brady Kelly
    Brady Kelly:
    Of course, TRWTF is the incredible lengths people go to post workable solutions to the WTF. Are they looking for jobs, or just compensating for something?

    Some of us want to avoid encountering the WTF - or a similar WTF - ourselves, so we try to solve it so that it gets fixed before we happen across it.

    Anyone looking for a job here is probably fairly delusional - not only do the accounts in this forum not link to our contact information, but one is more likely to be considered guilty by association for hanging out here. Also, most of the 'workable solutions' offered are as bad as or worse than the problems themselves.

    (I realize the accounts in the actual forum, linked from here by the sidebar, as well as the banner at the top right, and possibly other places, do potentially link to our contact information. However, the guilt by association factor seems even greater over there...)

  • (cs)

    So the real WTF here is Scott?

  • JayDee (unregistered)

    No need to Hash anything, no need to store tiffs.

    1. Generate tiff from data

    2. Get list of stores to send this tiff to

    3. Send tiff to stores

    4. Next.

    I suspect that the existing hardware could have generated a few hundred tiffs in one night pretty easily.

  • none (unregistered) in reply to Gary Williams

    Genius...your line of thinking is wicked.

    Carry on.

    captcha: praesent - the gift for which you prayed.

  • none (unregistered) in reply to none
    none:
    Genius...your line of thinking is wicked.

    Carry on.

    captcha: praesent - the gift for which you prayed.

    Of course, I was talking about this guy:

    Gary Williams:
    Buy a network card, get the MAC address from it, feed to the application as a static value, smash the network card.

    There you go. All the servers can use the SAME hash generated from a MAC address that can't be used anywhere else.

    Cheaper than buying more hardware.

    captcha: Augue - How Bostonians verbally disagree.

  • some other guy (unregistered)

    Why not generate numbers 0 through 9 in tiff format, save them to a file, then just concatenate them into a final image? Even if they had a big/small number sign type (dollar=big font, cent=small font), it would be 20 numbers plus the decimal per sign type...10 sign types? Maybe 20? that'd be 400 files...at a couple K each, that's a megabyte or two...then to process and send 'em out, that can't turn out to need 15 servers.

    I need to be in charge of everything...
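The digit-tile idea above, as a toy sketch. Text bitmaps stand in for the pre-rendered TIFF tiles here, and the glyph shapes and the names `hconcat` and `render_price` are invented for the example; a real version would paste actual image tiles with an imaging library.

```python
def hconcat(tiles):
    """Join equal-height bitmaps side by side, row by row."""
    height = len(tiles[0])
    if any(len(t) != height for t in tiles):
        raise ValueError("all tiles must share the same height")
    return ["".join(t[r] for t in tiles) for r in range(height)]


# Pre-rendered glyphs: each is a list of rows, 5 rows tall.
GLYPHS = {
    "1": [" # ", "## ", " # ", " # ", "###"],
    "9": ["###", "# #", "###", "  #", "###"],
    ".": [" ", " ", " ", " ", "#"],
}


def render_price(price: str, glyphs=GLYPHS):
    """Build a price image purely by concatenating cached glyph tiles."""
    return hconcat([glyphs[ch] for ch in price])


for row in render_price("1.99"):
    print(row)
```

No per-sign rendering happens at all in this scheme; each glyph is drawn once and every price is assembled from the cached pieces.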

  • Overstressed Admin (unregistered) in reply to Anon
    Anon:
    1. Retrieve price information from Oracle database 2. Generate TIFF of sign 3. Print sign, place on table 4. Place handwritten note with MAC address of system below sign 5. Take photo 6. Scan into TIFF format 7. Generate hash from TIFF to get your unique identifier 8. Profit!

    Thanks for my 1st laugh of the week.

  • JM (unregistered)

    "... hulking Oracle server ..."

    Surely this is a tautology? When do Oracle servers ever do anything but hulk?

  • Frodo Baggins (unregistered) in reply to Gary Williams
    Gary Williams:
    Buy a network card, get the MAC address from it, feed to the application as a static value, smash the network card.

    There you go. All the servers can use the SAME hash generated from a MAC address that can't be used anywhere else. ...

    I'm not so sure. Network cards are programmable. MAC addresses can be changed on the fly.
  • Frodo Baggins (unregistered) in reply to Quicksilver
    Quicksilver:
    So the real WTF here is Scott?

    Yes

Leave a comment on “Throw Some Hardware at it!”
