• (nodebb)

    As we said in the military:

    Once is bad luck. Twice is a coincidence. Three times is enemy action.

    I'd sure be looking under the nearby rocks for some sign of that enemy.

  • (nodebb)

    Why is this categorized as CodeSOD?

  • Joel (unregistered)

    A back-of-the-napkin calculation tells me that the odds of this happening (the drive failing on the same date in three different years just by chance) are about 1 in 13 million. Something must be causing this failure each year.
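
    For anyone who wants to redo the napkin math, here is a minimal sketch of one way to estimate it. It assumes exactly one failure per year, with failure dates that are independent and uniformly spread over a 365-day year. Those assumptions, the function name `same_date_odds`, and the `fixed_date` parameter are illustrative and are not taken from the comment above, so the figures it prints need not match the 1 in 13 million quoted there.

```python
from fractions import Fraction

DAYS_PER_YEAR = 365  # assumption: leap years are ignored


def same_date_odds(years: int, fixed_date: bool = False) -> Fraction:
    """Chance that one failure per year lands on the same calendar date
    in every one of `years` years.

    Assumes failure dates are independent and uniform over the year.
    If fixed_date is True, the date is chosen in advance (e.g. August 12),
    so every year has to match it. Otherwise the first year sets the date
    and only the remaining years have to match.
    """
    years_that_must_match = years if fixed_date else years - 1
    return Fraction(1, DAYS_PER_YEAR ** years_that_must_match)


any_date = same_date_odds(3)
specific_date = same_date_odds(3, fixed_date=True)

print(f"same date three years running: 1 in {any_date.denominator:,}")
print(f"a date picked in advance:      1 in {specific_date.denominator:,}")
```

    Whichever variant is used, the result is small enough to support the same conclusion: a shared cause is far more plausible than chance.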

  • (nodebb) in reply to WTFGuy

    My dad's version of that was "once is happenstance, twice is coincidence, three times is a conspiracy".

  • Sheriff Fatman (unregistered) in reply to Steve_The_Cynic

    Ian Fleming's version, which he puts in the mouth of Auric Goldfinger, combines both: "Once is happenstance. Twice is coincidence. Three times is enemy action."

  • Anonymous') OR 1=1; DROP TABLE wtf; -- (unregistered)

    Maybe August 12 was always the day that the cleaning staff decided to come into the server closet and whack some drives with a vacuum cleaner or something.

  • Jason Stringify (unregistered)

    "Farenheight"?

  • Die Kuhe (kein roboter) (unregistered) in reply to Jason Stringify

    High temperatures, I suppose...

  • Roby McAndrew (unregistered) in reply to Jason Stringify

    Apparently Qi Faren is a Chinese aerospace engineer. I don't know what his height is.

  • (nodebb)

    private git hosting

    A better "a long time ago" storytelling technique is to replace that by "a private CVS server".

    Ah, the memories... of struggle, mostly. Who didn't like spending nights fixing a corrupted CVS install by hand-editing RCS files because it turned out the startup not only didn't have any failover hardware, but also did not have backups (*) of the entire source code of the small company?

    (*) The thing with backups that I learned: if you create backups but never test them, then you do not have backups.

  • Rick (unregistered) in reply to Jason Stringify

    Fair in height, with a tall complexion

  • Komodo Dragon (unregistered)

    One of the things we noticed as a RAID company: if you fill your array with drives all from the same batch, the odds of two failing at nearly the same time are actually substantial. RAID is not always the panacea one might like it to be. ... And of course, there were always the customers who never bothered to replace the failing drive in a RAID-5 config until the second one died anyway.

  • my name (unregistered) in reply to Anonymous') OR 1=1; DROP TABLE wtf; --

    I was thinking along the same lines, but more that one of their neighbors switched off an electric appliance, which could have caused a power surge.

  • Stuart (unregistered) in reply to Joel

    Without doing any maths, my "something is off" sensor was pinging wildly at the statement that this happened three times. If I were the sysadmin there, I'd be thinking very carefully about what could be causing the issue, and mitigating as much as possible - as well as setting up a video recorder to see if anything untoward happens physically at that time. Surge protector and UPS, at a minimum.

  • Officer Johnny Holzkopf (unregistered) in reply to Komodo Dragon

    IBM DTLA... "We don't need no stinkin' backups - we have enterprise RAID now!"

  • (nodebb)

    Having used RAID for personal use in NAS boxes, I think RAID 5 is probably the worst, because nothing is worse than watching your box rebuild the array while worrying about another drive failure that takes it all down. After all, if a drive is going to die, it's going to be when it's busiest, like during a rebuild. RAID 5 rebuilds are just too damn stressful.

    At least with RAID 6, if a drive dies, you can have another drive die while rebuilding and still be OK.

  • (nodebb) in reply to Worf

    And also it's highly likely that all the original drives in the array came from one manufacturing batch, and therefore there's a higher-than-normal chance that a second will fail shortly after the first fails even without the stress of rebuild.

    Correlation risk is a bitch.

  • Scragar (unregistered) in reply to Joel

    Had a similar thing at a previous place I worked back when everything was spinning disks.

    The issue occurred between Christmas and New Year's every year, but we weren't allowed back in the building until the 2nd of January, by which point it was hard to figure out what had happened.

    First couple of years we didn't know until after it'd happened, and had no hope of discovering the problem.

    Then one year a few of us decided to check it out and decided to stay slightly later on xmas eve in the server room setting up additional monitoring and backups. Then the air cooling system for the server room turned off.

    It turns out the air cooling for the server room was inadvertently being turned off when the thermostat for the offices was turned off before the winter break. Xmas was the only time the office was ever shut for multiple days, so the janitor was turning it off on xmas eve, then on again on the 1st of January so the office was fine again on the 2nd of Jan when everyone returned from the winter break.

    The server room would get hot and massively increase the failure rate for the drives, but by the time we came back days later the room would be back to a normal temperature.

