• BP (unregistered) in reply to its me
    its me:

    ...
    Some people are blaming IT for the server wipe, but that's like blaming the hard drive for failing.
    ...


    Indeed I am (not to speak for anyone else), because that's what Systems should be doing. If the drive fails, you've got to ask why Systems didn't back it up.

    Like I said in another post, I've never had systems wipe a box used by another department without asking first (and providing move/backup time/planning as needed).

    There is no excuse.
  • BP (unregistered) in reply to cronthenoob
    cronthenoob:


    We moved the server into the classroom, hooked it up, turned it on, and nothing happened.  All the data was corrupted somehow; everybody lost everything they had created over the past 4 years.  I don't remember, but I think we were able to recover some of our work.

    In the end, the school paid me back my tuition.   But I still have to live with the fact that I wasted 4 years of my life because of some idiot who doesn't know how to secure a PHP mailer.


    A whole classroom of "students" didn't know how to use revision control?
    Yeee Gads -- What do they teach in the schools anyway?

    Actually I shouldn't be surprised... I've just moved to a new company and have had to explain the concept of the Concurrent Versions System a dozen times... now if only I could get people to check in their code more than once a week!



  • (cs) in reply to Alex Papadimoulis
    Alex Papadimoulis:
    What remains are convoluted messes of code that require deciphering lots of lines of often language-specific code just to understand what's wrong with it.

    Oh, I see that you received my submission.

  • enska (unregistered) in reply to Enric Naval

        In all my years in the industry I'm yet to see a company where you have "unused temp" servers up and running. Such things do not exist; there's always someone who needs that last unused box for something.
    And a "temp server that can get wiped clean at any moment" is utterly useless to development people. Such a box would only be useful to the admins so they can try, for example, new software versions or updates or whatever. That also implies that nobody but the admins should have access to such a computer.

    So, all in all, a stupid act from the admin group. It's common sense, really. If there's a computer/server on the network, someone WILL be using it no matter what, so you can't just blow it away and excuse such an act with "oh, but I was just following the policy".

    But one thing that strikes me as really odd is that they weren't able to recover most if not all of the documents from the participants' personal computers. When I write a document I always have temporary versions saved on my own computer until I publish it, at which point I get rid of the oldest temps and only keep the latest copy.

    But maybe this system was an all-encompassing "CMS", and they wrote all their documents, spreadsheets, project plans and such entirely in the system. Sounds strange though, given the portability problems etc.

    And then a personal story. For the past 4-5 years I have been writing my workout reports into a single .txt file that's kept on my server. The file is precious and there's no way to recreate the data. Then one day my SMB client messed up and mangled the file, leaving a 0-byte file on the server. All my data gone! Thankfully, though, I was able to recover about 98% of the data by mounting the device read-only, grepping the partition and then dumping about 100k bytes from a certain file offset to a file.
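    Roughly, the recovery boiled down to something like this sketch. The device path, the marker string and the exact sizes are made-up placeholders, not what I actually used:

    ```python
    # Rough sketch: scan the partition (read-only) for a string known to appear
    # in the lost file, then dump ~100k bytes starting at that offset.
    # /dev/sdb1 and MARKER are hypothetical placeholders.
    CHUNK = 1024 * 1024                     # scan the partition 1 MB at a time
    MARKER = b"workout 2006-01-02"          # text known to be near the start of the file

    found = None
    with open("/dev/sdb1", "rb") as dev:    # open read-only; never write to the device
        pos = 0                             # absolute offset of the current chunk
        carry = b""                         # tail of previous chunk, so a match can span chunks
        while True:
            chunk = dev.read(CHUNK)
            if not chunk:
                break
            buf = carry + chunk
            hit = buf.find(MARKER)
            if hit != -1:
                found = pos - len(carry) + hit
                break
            carry = buf[-(len(MARKER) - 1):]
            pos += len(chunk)

        if found is not None:
            dev.seek(found)
            with open("recovered.txt", "wb") as out:
                out.write(dev.read(100 * 1024))   # grab ~100k bytes from the match onward
    ```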

    Needless to say, since then I have written a backup script that runs every night, tars and gzips a specific backup/ folder plus my cvsroot/, and uploads them to my university server.
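    The script itself is nothing fancy; something along these lines (the paths and the remote host are placeholders, and it assumes key-based scp is already set up):

    ```python
    #!/usr/bin/env python3
    # Nightly backup sketch: tar+gzip a backup/ folder and cvsroot/, then copy
    # the archive to a remote (university) server. Paths and host are hypothetical.
    import datetime
    import subprocess
    import tarfile

    SOURCES = ["/home/me/backup", "/home/me/cvsroot"]    # folders to archive
    REMOTE = "me@shell.example.edu:nightly-backups/"     # scp destination

    stamp = datetime.date.today().isoformat()
    archive = f"/tmp/nightly-{stamp}.tar.gz"

    # tar and gzip everything in one pass
    with tarfile.open(archive, "w:gz") as tar:
        for path in SOURCES:
            tar.add(path)

    # push the archive off the machine (key-based auth assumed, so no prompt)
    subprocess.run(["scp", archive, REMOTE], check=True)
    ```

    Dropped into cron (e.g. `0 3 * * * python3 /home/me/nightly_backup.py`) it runs unattended every night.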

  • Espen (unregistered)

    Isn't it stupid to delay the project for a year when companies like ibas could probably fix this in a week?

  • Mike (unregistered)

    As soon as you read the words "the folks in Network Operations recommissioned an unused development server" you can basically guess the rest...

  • (cs) in reply to Charles Perreault
    Anonymous:
    Anonymous:
    Anonymous:
    First to wonder: who is the WTFer here????  The executives who hired the Expert?  The expert who turned a demonstration into the backbone of the company's biggest project?  The executives who didn't realize this?? The policy of no backup on dev temp servers???
    The Network Operations people are also decidedly guilty. Even an unbacked-up dev server might have a few days' work from one or two people on it, and shouldn't be junked without checking. The briefest check would have shown a lot of recently modified files.


    I totally agree.  I work in a not-WTF university.  There isn't any server or workstation that is wiped without first being ghosted. 


    I agree, but it's a good way to get an expert project manager fired... Maybe someone had a hidden agenda?
    BTW: I worked for a company that backed up the 'Oracle servers' on a regular basis. But the tapes (yeah, I know) weren't written, and the combination of the OS (Solaris) and the tape machine couldn't verify whether the data was written correctly. Four months later someone needed a backup, and there was nothing useful on the tape...

    Bummmmmmerrrrrr!

  • (cs) in reply to Oliver Klozoff
    Oliver Klozoff:
    Anonymous:
    We don't have the tape, nor the bandwidth, to back up dev servers. That's why we run CMS and back that up. If anything dies in dev, just reload from Source Safe. [emphasis mine]

    You're fired.

    Ensign, get this repository transferred to Perforce...



    Under which language does "Perforce" mean "Subversion"? ;-)

    Seriously, SVN has got to be my favorite source control program, especially for my private projects.  It works extremely well under Windoze.

    Doesn't matter; SVN is a very good source control system, Perforce is a very good one as well. Just use whichever you prefer or have available.

  • po the violent hamster (unregistered)

    WTFs this bad should come with a warning sign. Now I'll have to spend the rest of my day in the fetal position, wriggling with angst. Horror.

  • Maddogdelta (unregistered) in reply to Demaestro

    To be fair to the PM, though, he did ask for a dedicated server originally and they weren't willing to shell out for it. I would say that for a project of this size you give the PM the tools he requests to get the job done.

    But the issue is that he was not authorized to get the software or purchase a server to house the app, which means 1) no network support, as has been seen, and 2) the software was probably not properly licensed, leaving them one phone call away from being sued by the BSA. An 'Expert' should be aware of these things.

    This is like hiring a pro electrician to rewire a huge building but not giving him a tool to test whether the wires are live, then going... WTF, how did he electrocute himself? Give your workers the tools they need.

    No, this is more like hiring a programmer who refuses to develop with the tools that are available at the company that hires him/her.  If the 'expert' had such an issue that he couldn't work without 'his tools', then he needed to be a little clearer about the terms of hire.  For example, maybe his services include a license to use 'the tool'.
    And listen to your staff when they tell you something is wrong. Getting the software and hardware for the PM would have solved this. Listening to the developers' concerns also would have helped. Save a penny, lose a dollar.

    That hits the nail right on the head. Every organization that I have dealt with that refused to listen to the folks 'on the deck plates' crashed and burned.  The most successful organizations always solicited input from the worker bees.

  • (cs) in reply to obediah

    Everyone who's saying 'what use is a computer that can be wiped in 30 seconds' – that's not what happens to temp servers if you tell net ops that you're using them. Temp servers are there for 'oh, I wonder what happens if ...' projects that take a week or so. A temp server that has had no projects assigned to it for over a year (and was therefore presumably the longest-free server when the request for a terminal server came through) is an entirely reasonable thing to wipe.

  • (cs) in reply to Bob Janova
    Bob Janova:
    Everyone who's saying 'what use is a computer that can be wiped in 30 seconds' – that's not what happens to temp servers if you tell net ops that you're using them. Temp servers are there for 'oh, I wonder what happens if ...' projects that take a week or so. A temp server that has had no projects assigned to it for over a year (and was therefore presumably the longest-free server when the request for a terminal server came through) is an entirely reasonable thing to wipe.


    But from your own arguments they should have checked, because the temp projects, the ones that last only for a week or so, could be important enough to be saved!

    Not without checking anything. If that is your company's policy, then please give me the name of the company, for I do not want to work there, ever!

  • (cs) in reply to zamies
    zamies:
    But from your own arguments they should have checked, because the temp projects, the ones that last only for a week or so, could be important enough to be saved!


    No, because no-one had told them, in the last week or so (or in fact waaaay longer), that they were using that server. Okay, in an ideal world they'd check anyway, but you can bet any money the 'expert' PM would have bitched about the delay in getting that terminal server.
  • (cs) in reply to SL
    SL:
    GoatCheez:
    Technically, it wasn't random at all. The last time that machine was ever used for anything was probably when the PM asked for a demo server for the software they were using. When they wiped that machine, that request was over a year and a half old.


    If that's the case, I'd be tempted to say that The Real WTF (TM) is that this machine was plugged in and running for a year and a half without being used for anything.


    It was used for the "Expert Team Collaboration Software"

    SL:

     If it's a machine that could be taken away and wiped clean at any time without notice, nobody (in their right mind) would ever use it.


    Yes, EXACTLY... One of the more major WTFs.

  • (cs) in reply to GoatCheez

    I'm surprised that no one has noted that a major part of this WTF is the fact that the PM asked for the software and was told no, but was then allowed to go ahead and install a demo version of the software. If that hadn't been allowed, none of the resulting problems would have existed, because the dev temp server would never have existed.

    If the PM had already been told they couldn't have the software, WTF would they be allowed to install it? Why was the dev temp server even allowed?

  • (cs) in reply to Bob Janova
    Bob Janova:
    zamies:
    But from your own arguments they should have checked, because the temp projects, the ones that last only for a week or so, could be important enough to be saved!


    No, because no-one had told them, in the last week or so (or in fact waaaay longer), that they were using that server. Okay, in an ideal world they'd check anyway, but you can bet any money the 'expert' PM would have bitched about the delay in getting that terminal server.


    Well, then I have lived in an ideal world so far :)

  • (cs) in reply to GoatCheez
    GoatCheez:

    SL:

     If it's a machine that could be taken away and wiped clean at any time without notice, nobody (in their right mind) would ever use it.


    Yes, EXACTLY... One of the more major WTFs.



    Well, I'm not arguing the fact that using a temp server as a production server is a major WTF.
    But IMHO a temp server is a server to which people have access without having to inform the administrators. If the temp server is to be given another purpose, the system administrator(s) should inform the employees.

    Failing to inform is IMHO an administrator WTF.


  • (cs)

    Real Management doesn't "backup" -- they only "move forward".

  • (cs)

    That IT department must be foolish and damn idiotic not to back up any HDD before wiping it. Even my wife, who lives in the world of servers, thought the NetOps kid who wiped the drive should have their head smacked around. That is just really bad.

  • (cs) in reply to zamies
    zamies:
    GoatCheez:

    SL:

     If it's a machine that could be taken away and wiped clean at any time without notice, nobody (in their right mind) would ever use it.


    Yes, EXACTLY... One of the more major WTFs.



    Well, I'm not arguing the fact that using a temp server as a production server is a major WTF.
    But IMHO a temp server is a server to which people have access without having to inform the administrators. If the temp server is to be given another purpose, the system administrator(s) should inform the employees.

    Failing to inform is IMHO an administrator WTF.


    There were many, MANY, MANY WTFs in this case... I'll try to list them and prioritize how big they are, IMHO.

    1. The project manager failed to inform IT that the demo/temp server was being used for non-demo/temp purposes.
    2. A temp/dev server was used for production project data (the Expert Team Collaboration Software)
    3. The collaboration software was deemed too expensive for the 2.3M euro project.
    4. The whole project team continued to use the demo software.
    5. The IT dept failed to ask the rest of the company whether the server that was going to be wiped was in use.
    6. The temp/dev server that was wiped was not checked for use, nor was it backed up prior to the wipe.

    I'm sure there are more WTFs, but those I think are the main ones. If any one of those weren't true, then this WTF would never have happened. The order is simply my personal order. It is hard to quantify how WTFy something is, so if you disagree on the order, good for you.



  • Drum D. (unregistered)

    Names, ladies and gents, names.
    How can we ever be saved from such utter stupidity when we don't know the names of those who might bring it into our companies?

    Still laughing my ass off about how an "expert" risks his job by acting like that.

  • Anthony (unregistered) in reply to cronthenoob

    OMFG!

  • (cs)

    My last employer mentioned a couple years after I left that they disposed of the system where I did all my work, accidentally deleted the file share I kept all my backups on, and wiped the server that held an additional copy of a website I had worked on. They were a bit saddened over the loss.

    Oh well. At least I still have the experience...

  • Nobody (unregistered) in reply to GoatCheez
    GoatCheez:
    Anonymous:
    Anonymous:
    Lots of blame to go around on this one. I agree that even the most cursory of checks before wiping this machine would have prevented this (as a last defense, of course). Just shoot an email out that says "Hey, anyone using DEVT09? We're gonna wipe it at 5PM on Friday." Simple. Plus you've covered your ass.

    Hell, just unplugging the network cable or admin-downing its switch port a week before re-imaging the machine would have been something.  Dev-temp box or not, you don't just pick a random box and start re-imaging it.

    Of course if they had a web checkout page for dev-temp servers and this one had long ago expired, then the net-ops people are a lot less at fault.


    Technically, it wasn't random at all. The last time that machine was ever used for anything was probably when the PM asked for a demo server for the software they were using. When they wiped that machine, that request was over a year and a half old. The PM should have sent a memo to IT telling them that even though the machine was a dev machine, it had valuable information on it and should be treated like a server as much as possible. That is the biggest WTF imo.

    There were things that IT could have done to prevent what happened, but none of them were WTFs. IT didn't disobey company policy.

    The PM decided to use a dev box for something that should be on production. A big WTF, yes, but not as big a WTF as his not telling IT that he was doing so.


    You must be in the ITS department at our university:
    1.  Don't give the users access to the tools that they ask for or even the tools that would help them perform their jobs in a more efficient manner.
    2.  Ignore anything the users do to circumvent the restrictions already placed on them by #1 above (users are going to do what it takes to get the job done, regardless of ITS).
    3.  Regularly wipe out large portions of the users' work, forcing them to manually rebuild everything (since ITS hasn't given them access to save their work in projects, there is no chance that they can just restore what they have already done).
    4.  Claim that it's just ITS policy...
  • (cs) in reply to Nobody
    Anonymous:
    GoatCheez:

    Technically, it wasn't random at all. The last time that machine was ever used for anything was probably when the PM asked for a demo server for the software they were using. When they wiped that machine, that request was over a year and a half old. The PM should have sent a memo to IT telling them that even though the machine was a dev machine, it had valuable information on it and should be treated like a server as much as possible. That is the biggest WTF imo.

    There were things that IT could have done to prevent what happened, but none of them were WTFs. IT didn't disobey company policy.

    The PM decided to use a dev box for something that should be on production. A big WTF, yes, but not as big a WTF as his not telling IT that he was doing so.


    You must be in the ITS department at our university:
    1.  Don't give the users access to the tools that they ask for or even the tools that would help them perform their jobs in a more efficient manner.
    2.  Ignore anything the users do to circumvent the restrictions already placed on them by #1 above (users are going to do what it takes to get the job done, regardless of ITS).
    3.  Regularly wipe out large portions of the users' work, forcing them to manually rebuild everything (since ITS hasn't given them access to save their work in projects, there is no chance that they can just restore what they have already done).
    4.  Claim that it's just ITS policy...


    Do your users store their important information in directories labeled TEMP? Or do they put them on boxes labeled "Testing purposes only"? Perhaps they are putting their files on the machine labeled "Demo Machine - Use At Own Risk". See where I'm going!?!?!?!?

    Thanks to your post I just noticed that I contradicted myself earlier. I had stated "There were things that IT could have done to prevent what happened, but none of them were WTFs.", but then I later posted:


    5. The IT dept failed to ask the rest of the company whether the server that was going to be wiped was in use.
    6. The temp/dev server that was wiped was not checked for use, nor was it backed up prior to the wipe.

    I'm sure there are more WTFs, but those I think are the main ones. If any one of those weren't true,
     then this WTF would never have happened.


    I'll expand on #1. I think the WTF labeled #1 is the biggest because not only would it have required the least effort of anyone involved, it is also the most obvious thing that should have been done as soon as the whole team was using that machine.
  • (cs) in reply to Nobody

    A question to those putting the larger part of the blame on the network ops:
    What if the machine had a head crash the week before?
    Would it be considered a WTF not to include a test machine in the regular backup?

  • (cs)

    I'm surprised that no-one has yet mentioned FILE_NOT_FOUND.  It seems rather appropriate here.

  • (cs)

    This whole thread made me go and back up all my source files ;)

  • LRB (unregistered) in reply to ammoQ

    ammoQ:
    A question to those putting the larger part of the blame on the network ops:
    What if the machine had a head crash the week before?
    Would it be considered a WTF not to include a test machine in the regular backup?

    I'll say that I argued that the network ops committed a serious WTF, but they don't appear to be in the top 2, IMO, from what we've seen.  The executives and the PM are far more culpable for this fiasco than the network ops.  However, just because someone else screwed up worse than you did doesn't give you a free pass, IMO.

    I don't think that in most cases it would be a WTF for the network ops if the machine had a head crash and there was no backup, or if test machines weren't in the regular backup, as long as there was a clear policy stating that these machines would have no regular backups and the policy was well communicated to all users who had access to the machines in question.

    However, I think there is a huge difference between not having a backup in case something outside the direct control of network ops causes data to go missing on the test computers, and deliberately destroying the data without first making a backup.

  • ajk (unregistered) in reply to cronthenoob
    cronthenoob:
    This reminds me of something that happened to me in college!

    It was my final semester and my final project was coming along very nicely.  The guy next to me was creating a sort of email type thing,  I don't exactly remember. 

    Well he didn't secure it.  Some spammers got ahold of the script and sent out mass amounts of SPAM email.  The ISP shut the web server down.  After a few weeks, the server was finally online again.  The guy never secured his code, so of course it went down again.  Eventually my teacher decided to take the server and move it into the classroom where we could at least work on our projects locally. 

    We moved the server into the classroom, hooked it up, turned it on, and nothing happened.  All the data was corrupted somehow; everybody lost everything they had created over the past 4 years.  I don't remember, but I think we were able to recover some of our work.

    In the end, the school paid me back my tuition.   But I still have to live with the fact that I wasted 4 years of my life because of some idiot who doesn't know how to secure a PHP mailer.


    seems pretty clear who the idiot is...
  • (cs) in reply to ammoQ
    ammoQ:
    A question to those putting the larger part of the blame on the network ops:
    What if the machine had a head crash the week before?
    Would it be considered a WTF not to include a test machine in the regular backup?


    There isn't such a tiny amount of incompetence in this story that we have to fight over it like misers. There is plenty of room to point and laugh at all the WTFs; the wunder manager and ops both crapped all over best practices (as well as average and even bad practices).
  • J Random Hacker (unregistered) in reply to Maddogdelta
    Anonymous:
    To be fair to the PM, though, he did ask for a dedicated server originally and they weren't willing to shell out for it. I would say that for a project of this size you give the PM the tools he requests to get the job done.


    Well, top-of-the-range project/code management software can get ridiculously expensive; the hardware cost of the server is nothing. We could easily be talking $100K or more, which will make a dent in even a medium-sized project like his. Think Perforce is expensive? You ain't seen nothing.

    And you have to train the whole development team to use this software, probably just for this one job. There are a lot of 'prima donna' programmers around who think they are 'leet and get to use whatever tools and languages they feel like; everyone else just gets in the way. Hiring one as a PM is surely the biggest WTF of all, though.
  • YodaYid (unregistered) in reply to snoofle
    snoofle:

    I used to manage a single server that was used exclusively for experimental software (during the IE/Netscape release-of-the-hour browser wars). I would be forever reinstalling software on it (Orbix/Visibroker/NES/...  - all alpha releases). About every 7 or 8 days, I'd need to reinstall Windows because there was just that much crap and no way to undo it. Every couple of weeks I'd need to wipe the whole disk and start clean.

    The rule was that you could access the server to try the new software, but never put anything on it.

    Every once in a while, someone would put something on there without telling me. One such person in particular used it as a backup drive -  without telling anyone. They never noticed that their backups were being obliterated because they never looked at them. I found out about it 6 months later when they went to retrieve something from their backup files, couldn't find it and started to panic. They went to my boss, and her boss, and complained - loudly - that I trashed their backups. I got called in to explain. My response:

    "What do you expect when you use a dev server named: DoNotUse for backup?"



    I know what that someone was thinking: "Hey - DoNotUse!  That means I have it all to myself!" :-)
  • YodaYid (unregistered) in reply to ammoQ
    ammoQ:
    A question to those putting the larger part of the blame on the network ops:
    What if the machine had a head crash the week before?
    Would it be considered a WTF not to include a test machine in the regular backup?


    I'm not following the question - a test machine to do what?

    I don't blame the network ppl for not backing up the server - they had no reason to think it should be, from what I understand.  But here's what Alex wrote:

    [The PM] was... authorized to set up a demonstration of it so that others could "experiment on it and see how it works." And that he did: a development server was recommissioned and the Expert Team Collaboration Software was installed.


    The key phrase for me is "a development server was recommissioned" - it's written in passive corporate-speak, but someone had to request it from somebody, and somebody else had to respond.  That means that the network people should have known that it was being used for something.  It seems like they just assumed that it wasn't important anymore.

    The more I think about it, the more sympathy I have for the PM - management turned him down for some software he really wanted, and he played corporate-fu: he got a demo version with the intention to make it mission critical and force management to buy the full version.  Sometimes you have to play the game that way.  He just made the huge WTF of assuming it was backed up (which isn't so outrageous - he put everything on a corporate server, dev or not).

    And as you can see, the two assumptions collided into a spectacular mess.
  • (cs) in reply to cronthenoob
    cronthenoob:
    In the end, the school paid me back my tuition.   But I still have to live with the fact that I wasted 4 years of my life because of some idiot who doesn't know how to secure a PHP mailer.


    The real WTF is that you think PHP code can be made secure.
  • tjr (unregistered) in reply to Dazed

    Pin a WTF badge on everybody in the place. But put two on the Network Ops people who wiped a development server someone was using without checking to see what was on it. Or did they choose that server because they didn't like being shoved aside?

    BTW - why wasn't the PM allowed to get the software and hardware to work on such a high-priority project? Was that Network Ops' decision?
     
