• Matt (unregistered)

    I wonder.. did Ove ever follow up their explanation of "this is just how it's done" with the phrase "by idiots" or "by rip-off artists"?

    Because he should have.

    captcha: howdy (Well, howdy to you too)

  • Kraln (unregistered)

    So, their backup methodology ... worked? I fail to see the WTF in that aspect of the story.

  • Transcendor (unregistered) in reply to Kraln

    It didn't, as you can see. If this had gone wrong, the master hard disk would have failed during the weekend, and as soon as the new "Monday" drive got installed, the RAID would have failed completely. But that's not the point: The Bad Thing[tm] is that your backup depends on the functioning of the same type of part you're backing up. It's simply "hey, I've got my hard drive A in my server that gets copied to my master disk, which, as part of a RAID-1 configuration, gets mirrored to another hard disk." If you take a step back, you'll notice that the working drive and the master drive are equally likely to fail, giving you a 50% chance that, when one of the two disks fails, there is no working backup mechanism (as happened here). Furthermore, a library of hard drives that grows on a weekly basis is FAR more expensive than the same library of tapes, most probably even if the tapes get stored on a daily basis, which is, of course, even better.

  • jrrs (unregistered) in reply to Transcendor
    Transcendor:
    Furthermore, a library of hard drives that grows on a weekly basis is FAR more expensive than the same library of tapes, most probably even if the tapes get stored on a daily basis, which is, of course, even better.

    The article seemed to imply that only 7 disks were used in the rotation.

  • Alan (unregistered)

    I once worked for a company where the management insisted I use RAID 0, because if they paid for 200GB of disk space, they should get 200GB of disk space.

    CAPTCHA: craaazy

  • Derek (unregistered)

    For some reason, I was going in expecting the story to involve Ove really screwing things up. Without thinking, I saw "[Ove's] hourly support rate was $25 less than the current support guy" and figured, "Ove's a bad bet".

    Too early and cold (-23F/-31C) to think this morning.

  • Stan (unregistered)

    Yeah, I have to agree, the only WTF here (aside from the disorganized server room) is abusing the RAID controller to do your backups. That is a "creative" use of technology.

    External hard disks are now $0.40/GB, which is pretty competitive with tape, if not better in some cases. For short rotation backups, disks aren't so bad. Longevity for tape will be better if you need to save stuff for years and years (assuming you can still find a working drive to read them in 5-10 years). It depends on your needs, really.

  • Doc (unregistered) in reply to Transcendor
    Transcendor:
    Furthermore, a library of harddrives that grows on a weekly basis is FAR more expensive than the same library of tapes...
    Actually, try adding up a tape drive and the requisite number of tapes for any given backup routine. Then add up the cost of drives/carriers for the same backup routine. They're getting pretty close in cost!
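
    The tally described above can be sketched roughly as follows. This is only an illustration: the $0.40/GB disk price comes from Stan's comment, while the tape drive price, tape price, media counts and function names are made-up placeholders, not real quotes.

```python
def tape_total(drive_cost, tape_cost, tapes_needed):
    """Cost of one tape drive plus every tape in the rotation."""
    return drive_cost + tape_cost * tapes_needed


def disk_total(cost_per_gb, disk_size_gb, disks_needed, carrier_cost=0.0):
    """Cost of the hard drives (and optional carriers) for the same rotation."""
    return disks_needed * (cost_per_gb * disk_size_gb + carrier_cost)


if __name__ == "__main__":
    # $0.40/GB is quoted in the thread; every other number is a placeholder.
    disks = disk_total(cost_per_gb=0.40, disk_size_gb=200, disks_needed=7)
    tapes = tape_total(drive_cost=1000.0, tape_cost=30.0, tapes_needed=7)
    print(f"disks: ${disks:.2f}  tape: ${tapes:.2f}")
```

    Plug in current prices for your own rotation; which side wins depends heavily on how many media the schedule needs.
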
  • Earl Colby Pottinger (unregistered)

    The big WTF to me was that the new network support guy took the contract without inspecting what they were using. How did he know it could be done?

    The very first "That is how it is done" from the customer should have set off warning bells.

    This "Raid" backup system at least works to recover a working system. But it could have been a lot worse.

    I have seen a system where the so-called 'sysop' never did a backup for a year and a half, and then the hard drive died.

    I have seen incremental backups done to the same three tapes for years - UNRECOVERABLE to say the least.

    Worse, what if part of the network was Token-Ring, ARC-Net, or the bottom of the barrel Punter-Net (Yes, I have seen PCs with Punter-Net hardware, how I don't know), I lay odds that he was not ready to support such hardware.

    Not to mention the time the hardware was in a basement when one of the pipes burst, the tech (not me) found the server floating on one of the styrofoam packing pads, and the power was still on!

  • Earl Colby Pottinger (unregistered) in reply to Derek
    Derek:
    For some reason, I was going in expecting the story to involve Ove really screwing things up. Without thinking, I saw "[Ove's] hourly support rate was $25 less than the current support guy" and figured, "Ove's a bad bet".

    It sounds like the original tech was making a killing. If he was charging $200 an hour then Ove's $175 an hour would still be a sweet deal.

  • (cs)

    Stories like this make me weep for what colleges and certifications spit out.

    Stories like this also mean that I'll have a much better time in the job market than previously thought.

  • Gareth Martin (unregistered)

    What's the difference between copying the entire master disk to a tape every day and copying it to another hard-disk every day? Both ways you're doing a full read of the master disk.

    Although it might have been saner if it was a 3-disk raid 1, with two masters and a rotating backup, that way if a disk failed it wouldn't destroy that day's data.

    And if they were in a case.

  • Neph (unregistered)

    wtf is Punter-Net?

  • eight days a week (unregistered) in reply to Gareth Martin
    Gareth Martin:
    What's the difference between copying the entire master disk to a tape every day and copying it to another hard-disk every day? Both ways you're doing a full read of the master disk.

    Although it might have been saner if it was a 3-disk raid 1, with two masters and a rotating backup, that way if a disk failed it wouldn't destroy that day's data.

    And if they were in a case.

    The difference is that they were abusing RAID. It's not meant to be used for backup, just redundancy. Otherwise you're right - there isn't much difference between backing up on tape and backing up on a hard drive.

  • Jerim (unregistered) in reply to Stan

    I have to agree. I can see a "I have never seen that before" but not a WTF. The goal of any backup system is to be able to recover most of the data if something breaks. It seems to me that that requirement was met. You may not like the means, but the results are all that matters.

  • Slepnir (unregistered) in reply to Neph
    Neph:
    wtf is Punter-Net?

    I googled it, and it turned up a bunch of sleazy British massage parlors. I don't know what was meant by "Punter-net hardware", and I don't want to know.

  • Unomi (unregistered) in reply to Jerim

    No, the results should be what you could have predicted to be the results. This is done by testing it. Test what the result of a restore would be like. Will it fail?

    The guy in place did not check this before the crash happened. That is a WTF in itself. Saying you do a job, but not securing your methods by checking what is in place (or not being allowed to), is bad.

    This time he was quick enough to find out what was going on. But it could have been worse. Management only wants to know how the damage could happen. The higher the damage, the more likely you are to get a kick in the butt. Taking inventory of the risks is the first thing you should do.

    - Unomi -
  • dan s (unregistered)

    Looks like the company previously hired Montell Jordan for the network gig.

  • Clod (unregistered)

    I seriously hope the company took the previous sysadmin to court, particularly since he was effectively screwing them into paying him twice.

  • Dark (unregistered)

    The main problem with using RAID like this is that you can never be sure you have a consistent & complete backup. When you pull out yesterday's disk, you might be interrupting a write operation. If you shut the system down before making the switch then I see no problem with it, but it doesn't sound like that was done.

    The main article's assertion that the master disk failed BECAUSE of the RAID is pretty WTFy, though. Poor poor disk, having to deal with the abuse of being read from and written to.

  • Dumble (unregistered)

    Now that's what I call a RAID MANAGEMENT! :D

  • (cs) in reply to Clod
    Clod:
    I seriously hope the company took the previous sysadmin to court, particularly since he was effectively screwing them into paying him twice.

    What are they going to sue him for? He set up a system that worked for at least a few years. It may be an awful system, but it's not something he could be sued for (unless it was in a regulated industry and caused injury, but then it would have been discovered earlier). And as far as the pay, they agreed to it. You can't give somebody a contract for X amount of money and then complain later when you find out they should make less than that.

  • Eirik (unregistered)

    You guys don't see the real WTF here. That is if someone in the company accidentally deletes some important files and doesn't realize what they've done until a month later. Then they will have no backup because the drives have already cycled.

  • drex (unregistered) in reply to Dark

    Alright, I'm trying to tell from the description, is he running an actual backup, or is he just rebuilding and using the raid (5?) to make a copy of the current data on a live server? I don't think my dc would appreciate having to rebuild a couple of drives once the work day started.

  • (cs) in reply to Gareth Martin
    Gareth Martin:
    What's the difference between copying the entire master disk to a tape every day and copying it to another hard-disk every day? Both ways you're doing a full read of the master disk.

    Although it might have been saner if it was a 3-disk raid 1, with two masters and a rotating backup, that way if a disk failed it wouldn't destroy that day's data.

    And if they were in a case.

    Actually, no. Tape backup schemas don't (usually) do daily full backups. Usually it is incremental daily backups, and full backups are done end-of-week or end-of-month, depending on how critical the data is.
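
    A minimal sketch of such a schedule, assuming a weekly full backup on Friday and incrementals on every other day. The function names, the choice of Friday and the example date are illustrative assumptions, not taken from any particular backup product.

```python
import datetime


def backup_type(day, full_weekday=4):
    """Return 'full' on the chosen weekday (0=Monday ... 6=Sunday), else 'incremental'."""
    return "full" if day.weekday() == full_weekday else "incremental"


def restore_chain(target, full_weekday=4):
    """Dates whose backups are needed to restore the state as of `target`:
    the most recent full backup plus every incremental made after it."""
    day = target
    chain = [day]
    # Walk backwards until we reach the last full backup.
    while backup_type(day, full_weekday) != "full":
        day -= datetime.timedelta(days=1)
        chain.append(day)
    return list(reversed(chain))


if __name__ == "__main__":
    # Arbitrary example date (a Wednesday).
    for d in restore_chain(datetime.date(2007, 2, 7)):
        print(d, backup_type(d))
```

    The trade-off shows up in the chain length: incrementals keep the nightly job short, but a restore needs the last full backup and every incremental since.
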

    And of course, RAID is not for backup. I cringe at the disk usage needed to resync a full RAID1... I used that back in one of my old jobs. It would ALWAYS resync when the server rebooted or crashed, giving me 5 hours of real slow response. eek.

  • verisimilidude (unregistered) in reply to Matt

    Well a monthly retainer plus hourly rate seems to be standard in the world of lawyers. Usually you get a few hours as part of the base but when you get sued my experience is that you had better have business insurance to cover the cost because those puppies don't work cheap. Mostly you pay the monthly retainer so that you can get some quick attention when it is needed. It isn't an insurance plan. I wasn't aware that the standard among contract sys-admins was different so I refuse to agree with the WTFness of that part of the story. Of course if I was hiring contract sys-admins and didn't figure out what the standard terms were that would be a WTF.

  • Onomatopoeia (unregistered) in reply to Earl Colby Pottinger
    Earl Colby Pottinger:
    The big WTF to me was that the new network support guy took the contract without inspecting what they were using. How did he know it could be done?

    Assuming he was confident that he could eventually solve whatever problems they had, the worst case scenario is that he'd have to work a lot of hours, which is not an entirely bad thing if you're getting paid by the hour. You'd certainly never want to take on a support contract without inspecting the current setup if you wanted to limit the hours you work, but that could easily not be the case.

  • waefafw (unregistered)

    The retainer might be there because the customer wasn't paying the contractor on time... I've had a few contractors tell me that they had to go on retainer due to customers that just wouldn't pay, even in 90 days.

  • (cs) in reply to Slepnir
    Slepnir:
    Neph:
    wtf is Punter-Net?

    I googled it, and it turned up a bunch of sleazy British massage parlors. I don't know what was meant by "Punter-net hardware", and I don't want to know.

    At a guess, something based on the Punter protocol.

  • Patrick (unregistered) in reply to Transcendor
    Transcendor:
    It didn't, as you can see. If this had gone wrong, the master hard disk would have failed during the weekend, and as soon as the new "Monday" drive got installed, the RAID would have failed completely. But that's not the point: The Bad Thing[tm] is that your backup depends on the functioning of the same type of part you're backing up. It's simply "hey, I've got my hard drive A in my server that gets copied to my master disk, which, as part of a RAID-1 configuration, gets mirrored to another hard disk." If you take a step back, you'll notice that the working drive and the master drive are equally likely to fail, giving you a 50% chance that, when one of the two disks fails, there is no working backup mechanism (as happened here). Furthermore, a library of hard drives that grows on a weekly basis is FAR more expensive than the same library of tapes, most probably even if the tapes get stored on a daily basis, which is, of course, even better.

    I don't think there was a library... I think they just kept rotating the same drives.

  • iwan (unregistered)
    I googled it, and it turned up a bunch of sleazy British massage parlors. I don't know what was meant by "Punter-net hardware", and I don't want to know.

    He might have meant this rather than the massage parlor. That still doesn't really explain the 'punter-net hardware' though.

  • (cs)

    That's essentially how my company does backups. Except we use an external hardware RAID controller which sounds a very annoying alarm if either drive fails. We sell a lot of RAID enclosures to customers wanting to do the same thing.

  • (cs) in reply to Eirik
    Eirik:
    You guys don't see the real WTF here. That is if someone in the company accidentally deletes some important files and doesn't realize what they've done until a month later. Then they will have no backup because the drives have already cycled.

    What if it is 6 months before they realize they deleted something important? Or a year? You can't store data forever. Everything I have ever read on the subject uses a 7-day cycle. In practice, though, a monthly backup is the norm (multiple weekly incremental backups and at least one monthly full backup). At some point, the client has to be responsible for their own mistakes.
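
    To put numbers on the retention window being argued about, here is a rough sketch. It assumes the seven-disk daily rotation mentioned earlier in the thread, with one disk in the server and the rest on the shelf; the function names and the strict "snapshot must be older than the deletion" rule are assumptions made for illustration.

```python
def shelf_snapshot_ages(disks_in_rotation, swap_interval_days=1):
    """Ages (in days) of the copies on the shelf, assuming one disk is
    currently in the server and each disk is overwritten on its next turn."""
    return [k * swap_interval_days for k in range(1, disks_in_rotation)]


def is_recoverable(days_since_deletion, disks_in_rotation, swap_interval_days=1):
    """True if at least one shelved copy was taken before the file was deleted."""
    ages = shelf_snapshot_ages(disks_in_rotation, swap_interval_days)
    return any(age > days_since_deletion for age in ages)


if __name__ == "__main__":
    print(shelf_snapshot_ages(7))   # copies from 1 to 6 days ago
    print(is_recoverable(3, 7))     # deleted 3 days ago
    print(is_recoverable(30, 7))    # deleted a month ago
```

    Stretching the window means either more disks or a second, slower rotation; the simple daily cycle only covers mistakes noticed within about a week.
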

  • Benjamin Smith (unregistered) in reply to eight days a week
    eight days a week:
    Gareth Martin:
    What's the difference between copying the entire master disk to a tape every day and copying it to another hard-disk every day? Both ways you're doing a full read of the master disk.

    Although it might have been saner if it was a 3-disk raid 1, with two masters and a rotating backup, that way if a disk failed it wouldn't destroy that day's data.

    And if they were in a case.

    The difference is that they were abusing RAID. It's not meant to be used for backup, just redundancy. Otherwise you're right - there isn't much difference between backing up on tape and backing up on a hard drive.

    How do you "abuse RAID"? It's like abusing a hammer - RAID is just a tool. Hard drives are DESIGNED to read and write data. Using RAID this way reads and writes data.

    And in case you hadn't noticed, IT WORKED! They lost 1/2 days' data, but the business continued on its merry way with a minimum of fuss because the "poor, abused RAID" system did what it was supposed to - provided a relatively recent backup in case of primary HDD failure.

    About the only major improvement would be to use RAID1 with THREE disks instead of two...

    Personally, I back up offsite with rsync using a tool I wrote that I call "Backup Buddy": http://www.effortlessis.com/backupbuddy

  • Dave (unregistered)

    The retainer thing isn't that unusual, at least among the folks I know. It's usually pretty small, maybe a couple hours worth of regular time, and it's taken as a payment for availability - if I'm going to make an effort to be able to take your calls at 3am, that's going to cost something, even if you never call me outside normal hours. If you actually do make me get out of bed at 3am because your half-assed backup won't work, that's going to cost more.

    I'm dying to know what the original guys hourly rate was, though.

  • waefafw (unregistered)

    I find it funny that people actually recommend tape backups. The failure rate for tape is unsettlingly high. A major oil company recently had an email server failure, and their tape backups didn't work.

    People don't use cassettes even for their music; why would you use them for your critical data?

  • (cs)

    It's hard to read enough of these stories and not take a jaded viewpoint. How many people besides me wondered if there wasn't an edict at one time to use harddrives from someone because "tapes are evil" or some such nonsense? It's easier for me to believe that the client had a hand in their system being in its current state.

  • DaveyW (unregistered) in reply to waefafw
    waefafw:
    People don't use cassettes even for their music; why would you use them for your critical data?

    Because that's just "How it's done"!!!

  • Russ (unregistered)
    The bad disk was labeled "master." The other disk was labeled "Wednesday."

    Tremendous. I think I need that on a witty t-shirt...

  • TimmyT (unregistered)

    Anyone who thinks that putting a RAID into critical state by replacing a drive and rebuilding is NOT a WTF is a friggin moron. The whole "but it works" mentality is completely retarded. I understand most of you are developers and not network engineers but to an actual sysadmin this sets off high pitched alarms and red flashing lights in my head. This is not the same thing as backing up to a hard drive - Backup Exec will let you back up your system to an external hard drive if you so choose but breaking a mirror for backup purposes is totally insane.

    The best analogy I can think of would be SQL replication - imagine you have 8 servers, 2 online and configured as two-way Merge Replication. Each night someone shuts down the Subscriber and replaces it with another one and it re-syncs the database, replicating to the week-old version that's on the machine that was just brought online. Does it work? Probably. Is it the right way? No. Is it a WTF? Absolutely.

    As for the question over whether Ove should have taken the job without a complete analysis, that's not a WTF, we do it all the time. We just make sure the new customer signs something saying we're not responsible for their data, and we will respond to their issues and fix whatever needs fixing, and if it can't be fixed we suggest replacements or alternatives. If a client pays on time and doesn't bitch us out over their issues, they're considered a good client no matter how screwed up their network is. In fact we end up making some good money over time by gradually upgrading and replacing their problem systems and depending on how fast they like to spend money, eventually they will have a nice stable network that we can all be proud of.

    I personally hate maintenance contracts and don't do them but from what I understand they usually cover a certain amount of hours at a discounted rate, if you work more hours than that, they are charged at an hourly rate so it's not "one or the other", but the old guy definitely should not be charging for every single hour he worked, only what was above and beyond the hours covered in the monthly fee.

    captcha: ninjas - yes that's what we are, we login remotely in the dead of night, assassinate your viruses and spyware, and your system magically works fine when you arrive in the morning!

  • (cs) in reply to verisimilidude
    verisimilidude:
    Well a monthly retainer plus hourly rate seems to be standard in the world of lawyers. Usually you get a few hours as part of the base but when you get sued my experience is that you had better have business insurance to cover the cost because those puppies don't work cheap. Mostly you pay the monthly retainer so that you can get some quick attention when it is needed. It isn't an insurance plan. I wasn't aware that the standard among contract sys-admins was different so I refuse to agree with the WTFness of that part of the story. Of course if I was hiring contract sys-admins and didn't figure out what the standard terms were that would be a WTF.

    Lawyer retainers are generally paid up front to support the lawyer taking on the case. I'm married to a lawyer, and you'd be amazed at the number of "I signed this contract and it's not fair" types of people who will give up on their stupid reasons for wanting to sue the moment that a retainer fee is mentioned. They'll even get upset and start cursing at the lawyer over the phone over it! OF COURSE lawyers should take on every case that comes in their door!

    Sadly, most cases do cost a lot in "lawyerin'" time and expertise (which is what you're paying for), so that retainer covers the initial part of the lawyer's work to take on your case. My wife just charges the client hourly and deducts straight from the retainer until her work goes past the retainer fee, at which point she begins billing for hours worked.

    If this sys-admin was charging a retainer and THEN charging hourly on top of the retainer, who cares?! He's just a sys-admin who is clueless about good business practices. It's nothing unethical per the non-existent "sys-admin rules of conduct", and part of the WTF then is that the client was likewise clueless to not be alarmed at the room full o' computers without proper organization, ventilation, and automation, not to mention the fact that they were willing to get double-billed for his semi-inept way of doing things.

  • (cs) in reply to Earl Colby Pottinger
    Earl Colby Pottinger:
    The big WTF to me was that the new network support guy took the contract without inspecting what they were using. How did he know it could be done?
    Indeed
    Not to mention the time the hardware was in a basement when one of the pipes burst, the tech (not me) found the server floating on one of the styrofoam packing pads, and the power was still on!
    Bigtime LOL!

    Did the server die while floating, or did it float enough? In the latter case, it was a good precaution, wasn't it? Reminds me of the computer store where they had to put everything on wooden pallets in the basement, because sometimes some water would slip in :X

  • (cs) in reply to eight days a week
    eight days a week:
    Gareth Martin:
    What's the difference between copying the entire master disk to a tape every day and copying it to another hard-disk every day? Both ways you're doing a full read of the master disk.

    Although it might have been saner if it was a 3-disk raid 1, with two masters and a rotating backup, that way if a disk failed it wouldn't destroy that day's data.

    And if they were in a case.

    The difference is that they were abusing RAID. It's not meant to be used for backup, just redundancy. Otherwise you're right - there isn't much difference between backing up on tape and backing up on a hard drive.

    This is a common technique known as split-RAID or split-mirror backup. There are some minor deficiencies in their setup: normally you want one set of disks on a weekly cycle, and another set on a much longer cycle swapped every week, and perhaps taking a disk out of the longer cycle periodically for archiving, but there's no WTF here.

    Yes, constantly rebuilding the backup drives will wear out the master drive faster, but you can get around this by putting a second master on the array, or you can balance the load by alternating which disk you pull for the daily backup.

    Price-wise, for a small business with only a few servers, split-mirror backup is cheaper than a tape drive and a set of tapes. The price of tape has dropped in the past few years, but three years ago, the break-even point between tape and split-mirror was around a terabyte of data.

  • Darin (unregistered) in reply to Tukaro
    Tukaro:
    Stories like this make me weep for what colleges and certifications spit out.

    What makes you think the guy went to college?

    And "certifications" are generally shortcuts for "I didn't go to college, but here's a piece of paper I paid for that says it doesn't matter." Seriously, any company that thinks a certification is reliable proof of competence is deluded, and a company that requires certifications is hurting itself.

    Too many vendors use certification classes as marketing and propaganda tools instead of being about solid training.

  • Adam (unregistered)

    Regarding retainers:

    I am an independent software consultant, and recently started taking retainers after a few clients decided to mess around with my time. I had clients telling me, e.g., that they needed me for 30 hours a week for three months, and then only using me for 5 hours a week, or sometimes using me for 30 hours the first week and then not giving me any additional work for several weeks. As a one-man shop, this is an impossible situation for me to manage; if I agree to do 30 hours a week for one client, I have to refuse other work that comes in that might cut into that time.

    Solution: My new contracts specify minimum retainers paid upfront, generally 2/3 of the estimated amount of time that the client thinks they'll need me. So if they ask for 30 hours a week, they have to pay me for 20 upfront. This has been great so far! If they balk at it, I know they don't really have enough work for me and I can walk. And those that do pay almost always manage to use up at least the number of hours they've prepaid. It's funny how clients suddenly get so much more responsive when you have them on the hook, instead of the other way around!

  • ChengWah (unregistered) in reply to Doc

    I must agree with the cost thingy. If you're backing up 30-40GB daily, it's FAR cheaper to use external hard drives in USB cases, and rotate them. Laptop drives are even better than 3.5 inch drives for this purpose. Approximately 1/4 the cost of tape, and they run a damn sight faster.

  • Lars (unregistered)

    We regularly do diskswapping like this when upgrading RAID systems. Swap disk 1 with a bigger one, wait for rebuild. Swap disk 0 and then expand partition.

    But we always, always! do a full backup to tape and then take the server offline for the evening. If we for some reason can't take it offline, the client must acknowledge that the process may be harmful to their data, and they must be prepared to do a full restore if something goes wrong. And it did once. On a RAID 5 system. But we had full backups, so the only consequence was a 2 hour delay.

    Yes, I used to do these kinds of jobs for a living. And yes, the only really decent backup process is full replication to another server (backup domain/file server), with full backups every night. Of course, not all data is that critical, but ours is.

  • Jargin (unregistered) in reply to waefafw

    Actually tape backups work just fine.

    As long as you remember one rule: Don't use same tape too often.

    A good rule of thumb is to take the tape manufacturer's recommended maximum number of writes and divide it by two, or three.
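
    As a rough sketch of that rule of thumb (the rating of 100 writes in the example is a made-up figure, and the function names are illustrative; check the real rating for your media):

```python
def max_tape_reuses(manufacturer_max_writes, safety_divisor=2):
    """Derated write limit: the manufacturer's figure divided by two (or three)."""
    return manufacturer_max_writes // safety_divisor


def tape_should_be_retired(write_count, manufacturer_max_writes, safety_divisor=2):
    """True once a tape has reached its derated write limit."""
    return write_count >= max_tape_reuses(manufacturer_max_writes, safety_divisor)


if __name__ == "__main__":
    rating = 100  # hypothetical manufacturer rating, not a real spec
    print(max_tape_reuses(rating))                  # divide by two
    print(max_tape_reuses(rating, 3))               # divide by three
    print(tape_should_be_retired(60, rating))       # past the derated limit
```

    Tracking a write count per tape is the part that usually gets skipped, so a label or a simple log next to the drive goes a long way.
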

  • Loren Pechtel (unregistered) in reply to Stan

    The last time I looked at the price of tape vs the price of HD's it was about the same. It's also a lot easier to recover stuff off HD's that have mirror image backups on them.

    It's also very easy to check a HD-based backup, you're not dependent on some backup software. These days disks are my first choice as a backup method.

Leave a comment on “RAIDing Disks”
