• whicker (unregistered) in reply to UH OH

    Big red unlabeled buttons are typically 'Stop!' Only Evil Geniuses will rig their plants with seemingly innocuous stop buttons.

  • (cs) in reply to UH OH

    So let me understand. You have NEVER made a WTF mistake and gotten away with it. You NEVER deleted a directory or file, unplugged an essential machine, connected to the wrong server and mistakenly started updating the system, stopped a service, or anything similar.

    Wow, I bow to your perfection. I have done at least one of those, maybe even two.

    Regards

  • James (unregistered) in reply to OneMHz

    My favorite was, the batteries on a UPS started expanding/smoking (eek!) in a server cabinet I didn't "own", but I was one of the last people in the office at the end of the day when the problem started. I went to reconnect everything (the backup UPS had sufficient capacity to power all the systems in the cabinet), and noticed that all the servers, with their redundant power supplies, were connected with IEC Y-cables -- each server had two power supplies, connected to one UPS! After re-jiggering some of them (knocking wood and praying that both power supplies on each server were in good enough shape to bear the whole load) to run off both the UPSes, I killed the bad UPS and everything, thank goodness, stayed up.

    Except for the one "mystery" cable I had traced under the floor, but couldn't for the life of me guess the purpose of. Turned out to be running the (very high draw) freestanding air conditioner that was cooling the server closet. Yes, the air conditioner was running off the server cabinet UPS. I guess that's a good thing?

    Everything survived till the next morning, but we had a talk with the sysadmin about failover reliability.

  • (cs) in reply to nerdydeeds

    They should've labeled it Orange Server.

  • (cs) in reply to DKleinsc
    DKleinsc:
    And we all learn a valuable lesson: Put clear labels on all servers, and possibly even a map to each server in a cabinet on the door of the cabinet.

    And don't give the keys to the castle to newbie engineers?

  • Sin Tax (unregistered) in reply to user

    onomatopoeia: A sound-imitating word resembling the utterances of a person who just broke his toe accidentally kicking it against some hard, non-moving object:

    "Oh no! my toe! P...! O-Eiaa!"

    -Sin Tax

  • (cs) in reply to Sin Tax

    This one doesn't make a whole lot of sense. Systems that deal with financial trades are tightly regulated, and need to be quite robust. I work for a financial company and the production datacenters only have production machines in them. Dev/QA/pre-production machines are never in the same datacenter.

    Also, what system that deals with financial trades can handle the load on a single server?

    Finally, for financial systems like this you need at least two physically separate datacenters, and you need backups at both sites; so to roll out a system that needs a single server for load, you need four servers minimum to start. If it's a non-critical system you might be able to get away with two servers, one for each datacenter. But usually that's not cutting it for mission critical stuff like trades....

    -Me

  • Enrique (unregistered) in reply to nerdydeeds
    nerdydeeds:
    Should there not be a label somewhere on the box that says "MISSION CRITICAL SERVER" or some thing? Shouldn't the test server be labeled as such? The real WTF is that these people don't know what labels are.

    Of course the servers were labeled! They had big labels that said "BLACK server".

  • (cs) in reply to BiggBru
    BiggBru:
    When all logical steps fail, reboot.

    If that fails, wipe & reload.

    And if THAT fails...

    ... plug it back in! :P

  • B.D. (unregistered)

    The rare WTF with a punchline. Excellent.

  • UH OH (unregistered) in reply to Another Oracle DBA
    Another Oracle DBA:
    So let me understand. You have NEVER made a WTF mistake and gotten away with it. You NEVER deleted a directory or file, unplugged an essential machine, connected to the wrong server and mistakenly started updating the system, stopped a service, or anything similar.

    Wow, I bow to your perfection. I have done at least one of those, maybe even two.

    No, I have never, and I do not plan to make a mistake in the future. I am not trying to give myself a pat on the back. Not making a mistake is not something someone should boast about.

    Unplug the wrong machine? Delete the wrong directory? How in the world does that happen?

    Do we all remember seeing Standard Operating Procedure publications, or documents with similar purposes, lying around the office? Why would they take the time to design, revise, maintain and print these documents? READ THEM and LEARN THEM and make them a part of your routine. It is, in fact, your job to do so.

    You make a mistake, you should pay for it. Don't let others pay for it. Before you do anything important like deleting an entire directory or unplugging a machine, imagine yourself 2 seconds into the future. That's all it takes.

  • Sandy (unregistered)

    He should be careful about being competent. I shared hot-seat duties with someone I'll call E. I got a 2am call once; I answered with "but I'm not on call this week! E. is!" and the reply came, "yes, but we like how you fix things better."

    And yes, I told them to call E. and hung up.

  • An ISP (unregistered) in reply to its me
    its me:
    This one doesn't make a whole lot of sense. Systems that deal with financial trades are tightly regulated, and need to be quite robust. I work for a financial company and the production datacenters only have production machines in them. Dev/QA/pre-production machines are never in the same datacenter.

    Also, what system that deals with financial trades can handle the load on a single server?

    Sadly you're thinking of the wrong kind of organisation. In this story, it's likely an organisation that sells access to trade data rather than being involved in actual trades (and there are a lot of these).

    Most of these companies appear to be built on mile-high stacks of crap code that simply won't adapt to reliable automated failover. The crap code inevitably comes about from the fact that none of the stock exchanges have the same data format, so every time management add a new exchange, some new contractor writes a bit of code to plug into the other pieces of crap.

    The comms server was likely just a gateway to funnel data from the leased circuit to the exchange onto the processing farm.

    CAPTCHA: Ninjas. Like the kind that would have had to have stolen that server from any decent facility, if the hardware engineer hadn't promptly come clean :P

  • (cs) in reply to Someone You Know
    Someone You Know:
    mrs_helm:
    The REAL WTF is that the hardware engineer had removed the machine before he left for the day (as evidenced by the fact that Michael had to call him back IN), but nobody NOTICED until 2AM. On a "mission critical" system. That means if the hardware eng left at 5pm, it was 9 hrs later. Heck, even if he was working until 10pm that night, it was 4 hrs later...which is pretty bad...

    On the other hand, this is about the stock market; possibly no one uses the server when the exchanges aren't open.

    I was just about to say the same thing. They probably closed their communications at 5pm, then ran internal stuff overnight (data collection, reporting, maintenance, etc.), then at some point after midnight (semi-arbitrary, considering that he got the call at 2am, which would mean that the HE got called at midnight, pissed around for an hour, like all good HEs do, then ran diags for an hour before calling in help) tried to re-establish communication with the external systems.

  • (cs) in reply to UH OH
    UH OH:
    Another Oracle DBA:
    So let me understand. You have NEVER made a WTF mistake and gotten away with it. You NEVER deleted a directory or file, unplugged an essential machine, connected to the wrong server and mistakenly started updating the system, stopped a service, or anything similar.

    Wow, I bow to your perfection. I have done at least one of those, maybe even two.

    No, I have never, and I do not plan to make a mistake in the future. I am not trying to give myself a pat on the back. Not making a mistake is not something someone should boast about.

    Unplug the wrong machine? Delete the wrong directory? How in the world does that happen?

    If you've never made a mistake then you have no experience. And I'm not talking about years.

    Everyone makes mistakes. We're human; it happens. Without mistakes, without risks, without failures, we cannot have success.

    If you're being truthful and you've never made a mistake, then I certainly hope you're in your early twenties. The thought of being older and never having made a mistake is just sad.

    -Me

  • (cs) in reply to James
    James:
    My favorite was, the batteries on a UPS started expanding/smoking (eek!) in a server cabinet I didn't "own", but I was one of the last people in the office at the end of the day when the problem started. I went to reconnect everything (the backup UPS had sufficient capacity to power all the systems in the cabinet), and noticed that all the servers, with their redundant power supplies, were connected with *IEC Y-cables* -- each server had two power supplies, connected to one UPS! After re-jiggering some of them (knocking wood and praying that both power supplies on each server were in good enough shape to bear the whole load) to run off both the UPSes, I killed the bad UPS and everything, thank goodness, stayed up.

    Except for the one "mystery" cable I had traced under the floor, but couldn't for the life of me guess the purpose of. Turned out to be running the (very high draw) freestanding air conditioner that was cooling the server closet. Yes, the air conditioner was running off the server cabinet UPS. I guess that's a good thing?

    Everything survived till the next morning, but we had a talk with the sysadmin about failover reliability.

    Now THAT is a WTF. You don't even put your AC on the same circuit as your servers, simply because of its tendency to create surges when it kicks in or out (or, in this case, to overload a UPS). Hell, I've worked in places where they had the electric company come in and install custom wiring for the AC array in the server room that was completely isolated from the electrical for the servers themselves (which was isolated from the electrical for the rest of the building).

  • UH OH (unregistered) in reply to its me
    its me:
    UH OH:
    Another Oracle DBA:
    So let me understand. You have NEVER made a WTF mistake and gotten away with it. You NEVER deleted a directory or file, unplugged an essential machine, connected to the wrong server and mistakenly started updating the system, stopped a service, or anything similar.

    Wow, I bow to your perfection. I have done at least one of those, maybe even two.

    No, I have never, and I do not plan to make a mistake in the future. I am not trying to give myself a pat on the back. Not making a mistake is not something someone should boast about.

    Unplug the wrong machine? Delete the wrong directory? How in the world does that happen?

    If you've never made a mistake then you have no experience. And I'm not talking about years.

    Everyone makes mistakes. We're human; it happens. Without mistakes, without risks, without failures, we cannot have success.

    If you're being truthful and you've never made a mistake, then I certainly hope you're in your early twenties. The thought of being older and never having made a mistake is just sad.

    -Me

    OK. Making mistakes and taking risks. They are two different things.

    You take risks when putting money into stocks or taking out a loan to start up a new business with a fresh idea. If you make a mistake and put the money into wrong stocks then it is certainly your fault.

    You do not make mistakes doing a job where plenty of documentation is available to you. If you are unsure of which cable to unplug then do not unplug it. If the documentation is missing for that particular task, take the initiative to RESEARCH before doing anything. If you are going to go ahead and RISK unplugging the wrong cable, by all means, go ahead but don't expect someone to cover your ass. If all cables are orange and are labeled 'orange', then obviously someone didn't do his/her job appropriately. By attempting to unplug such cables the responsibility now is bestowed upon you. Be smart about it.

  • Michael (unregistered) in reply to GrandmasterB

    Tools that get paid a lot of money to be on call.

  • (cs) in reply to UH OH
    UH OH:
    OK. Making mistakes and taking risks. They are two different things.

    You take risks when putting money into stocks or taking out a loan to start up a new business with a fresh idea. If you make a mistake and put the money into wrong stocks then it is certainly your fault.

    You do not make mistakes doing a job where plenty of documentation is available to you. If you are unsure of which cable to unplug then do not unplug it. If the documentation is missing for that particular task, take the initiative to RESEARCH before doing anything. If you are going to go ahead and RISK unplugging the wrong cable, by all means, go ahead but don't expect someone to cover your ass. If all cables are orange and are labeled 'orange', then obviously someone didn't do his/her job appropriately. By attempting to unplug such cables the responsibility now is bestowed upon you. Be smart about it.

    The risks I'm talking about are work and project related. As far as unplugging the wrong cable, I suppose if your entire job revolves around plugging and unplugging cables then you have a fairly low risk job and you probably make few mistakes. But like I said before, no risks means no successes either. If you don't follow that then perhaps when/if you get more experience you'll understand what I mean.... -Me

  • UH OH (unregistered) in reply to its me
    its me:
    The risks I'm talking about are work and project related. As far as unplugging the wrong cable, I suppose if your entire job revolves around plugging and unplugging cables then you have a fairly low risk job and you probably make few mistakes. But like I said before, no risks means no successes either. If you don't follow that then perhaps when/if you get more experience you'll understand what I mean.... -Me

    You are confused between taking risks and making mistakes. How do you put together "a fairly low risk job" and "probably make few mistakes?" That's like saying apples are sweet because they are red.

    If your entire job revolves around un/plugging cables and you make a mistake you should be fired for it. If your job does not revolve around un/plugging cables then you should not even do it in the first place. Have others that are proficient at doing the job do it for you. Resources are available. Don't make a mistake by thinking that you are taking a risk by trying to unplug cables. What are you actually trying to accomplish? Cross your fingers and hope that you unplug the right cables? Is that the kind of risks you take at work?

    If you keep on wanting to equate mistakes with successes I would definitely love to become your financial manager and make "mistakes" investing your money. I am sure you will forego my mistakes.

    In my comments I stuck strictly to making mistakes, not taking risks. You are the one who is confused about it and decided to bring it forward and somehow fudge them together as "mistakes = risks = successes."

    Have a good weekend.

  • Paul (unregistered)

    Ummm...no one has asked the $1,000,000 question. Why do you have to take a server out of the rack to configure it? WTF? :-P

  • (cs) in reply to UH OH
    UH OH:
    Do we all remember seeing Standard Operating Procedure publications, or documents with similar purposes, lying around the office? Why would they take the time to design, revise, maintain and print these documents? READ THEM and LEARN THEM and make them a part of your routine. It is, in fact, your job to do so.

    Wow. Until now I have never fully understood what you Americans call "a douche".

    I am now enlightened.

  • - (unregistered) in reply to Paul

    Try upgrading the hardware with the server still inside the rack...

  • Simmo (unregistered) in reply to UH OH
    UH OH:
    its me:
    The risks I'm talking about are work and project related. As far as unplugging the wrong cable, I suppose if your entire job revolves around plugging and unplugging cables then you have a fairly low risk job and you probably make few mistakes. But like I said before, no risks means no successes either. If you don't follow that then perhaps when/if you get more experience you'll understand what I mean.... -Me

    You are confused between taking risks and making mistakes. How do you put together "a fairly low risk job" and "probably make few mistakes?" That's like saying apples are sweet because they are red.

    If your entire job revolves around un/plugging cables and you make a mistake you should be fired for it. If your job does not revolve around un/plugging cables then you should not even do it in the first place. Have others that are proficient at doing the job do it for you. Resources are available. Don't make a mistake by thinking that you are taking a risk by trying to unplug cables. What are you actually trying to accomplish? Cross your fingers and hope that you unplug the right cables? Is that the kind of risks you take at work?

    If you keep on wanting to equate mistakes with successes I would definitely love to become your financial manager and make "mistakes" investing your money. I am sure you will forego my mistakes.

    In my comments I stuck strictly to making mistakes, not taking risks. You are the one who is confused about it and decided to bring it forward and somehow fudge them together as "mistakes = risks = successes."

    Have a good weekend.

    No, I'm sorry, I can't accept this. You sound like the kind of person who is going to suffer their whole life because you will never understand about us humans not being machines. You have completely missed the point here. You're like the philosophers in H2G2: 'We demand rigidly defined areas of doubt and uncertainty'.

    Sometimes, like it or not, you have to wing it; the pressures of time and expectation, work and stress, life and booze and so on take their toll, and you just have to guess. No software is so meticulously documented that everything about it is known in advance; that would cost almost as much as writing it twice.

    If you refuse to take any chances at all, you will eventually find yourself in a situation where you can make no further progress without doing so. What will you do then? Back off and fail? Or close your eyes, press [Enter] and pray?

  • Funky (unregistered) in reply to Anon
    Anon:
    The real fun part is that Michael covered for the hardware engineer. I hope the hardware engineer purchased him a beer or two for that one.

    I'd have done that too. It seemed like an honest mistake, and who knows what I'll screw up the next day, when I need cover from a workmate so as not to lose my job. I know what you think: "Just don't screw up". But it still happens. To everyone. OK, maybe not that severe a screwup, but who knows.

    Of course the hardware guy would have owed one big bad ass dinner for me and my gf, for her lost sleep. And a few beers for me. And then we'd have laughed about it.

  • Ken (unregistered) in reply to UH OH
    UH OH:
    OK. Making mistakes and taking risks. They are two different things.

    You take risks when putting money into stocks or taking out a loan to start up a new business with a fresh idea. If you make a mistake and put the money into wrong stocks then it is certainly your fault.

    You do not make mistakes doing a job where plenty of documentation is available to you. If you are unsure of which cable to unplug then do not unplug it. If the documentation is missing for that particular task, take the initiative to RESEARCH before doing anything. If you are going to go ahead and RISK unplugging the wrong cable, by all means, go ahead but don't expect someone to cover your ass. If all cables are orange and are labeled 'orange', then obviously someone didn't do his/her job appropriately. By attempting to unplug such cables the responsibility now is bestowed upon you. Be smart about it.

    And what happens if that cable is in a large, tangled pile (say you inherited a messy system from the previous datacenter team), you have upper management screaming at you to fix the problem FIFTEEN MINUTES AGO and need to replace one of the cables, and so... you calm yourself, trace the cables, and accidentally get the wrong one anyways?

    You're either trolling, stupid, not actually employed and talking out your ass, or you haven't ever had to do anything like that. Or you're a god and we should all worship you. In any case, cram it and go back to your perfect world while the rest of us deal with reality.

  • Snow_Cat (unregistered)

    ^ - -^ groan

    ^- - ^ His point is that making a mistake (error of judgement) is unnecessary, and that by exercising minimal diligence in research and documentation, one is not exposed to blame for avoidable mistakes; but failure still occurs.

    Failure is not a mistake, it is an outcome.

    ^ - -^ And while he is a bit of a prick about it, I can respect that level of professionalism and ethics.

  • Frats (unregistered)

    He sounds like a corporate drone who cannot perform any duty outside of the type expressly stated in his contract...

    Then again he probably works for a company that actually has people on staff with the sole task of plugging in cables :)

    When you work for a small company and actually have to do the jobs that in a large company would be done by 5 different people, you sometimes have to do things you aren't really proficient at, but neither is anyone else in the company...

    Jobs need to be done, even if nobody is really qualified to do them... you can't just let them be.

    Such a 'read everything, think carefully, and don't do anything you lack a diploma for' attitude only works in bogged-down, slow-moving, huge companies, where everything can take ten times the time needed... I simply don't have the time to read all the documentation, if there even is any.

Leave a comment on “Broken Communication”