Admin
Big red unlabeled buttons are typically 'Stop!' Only Evil Geniuses will rig their plants with seemingly innocuous stop buttons.
Admin
So let me understand. You have NEVER made a WTF mistake and gotten away with it? You have NEVER deleted a directory or file, unplugged an essential machine, connected to the wrong server and mistakenly started updating the system, stopped a service, or anything similar?
Wow, I bow to your perfection. I have done at least one of those, maybe even two.
Regards
Admin
My favorite was, the batteries on a UPS started expanding/smoking (eek!) in a server cabinet I didn't "own", but I was one of the last people in the office at the end of the day when the problem started. I went to reconnect everything (the backup UPS had sufficient capacity to power all the systems in the cabinet), and noticed that all the servers, with their redundant power supplies, were connected with IEC Y-cables -- each server had two power supplies, connected to one UPS! After re-jiggering some of them (knocking wood and praying that both power supplies on each server were in good enough shape to bear the whole load) to run off both the UPSes, I killed the bad UPS and everything, thank goodness, stayed up.
Except for the one "mystery" cable I had traced under the floor, but couldn't for the life of me guess the purpose of. Turned out to be running the (very high draw) freestanding air conditioner that was cooling the server closet. Yes, the air conditioner was running off the server cabinet UPS. I guess that's a good thing?
Everything survived till the next morning, but we had a talk with the sysadmin about failover reliability.
Admin
They should've labeled it Orange Server.
Admin
And don't give the keys to the castle to newbie engineers?
Admin
onomatopoeia: A sound imitating word resembling the utterances from a person who just broke his toe accidentally kicking it against some hard, non-moving object:
"Oh no! my toe! P...! O-Eiaa!"
-Sin Tax
Admin
This one doesn't make a whole lot of sense. Systems that deal with financial trades are tightly regulated, and need to be quite robust. I work for a financial company and the production datacenters only have production machines in them. Dev/QA/pre-production machines are never in the same datacenter.
Also, what system that deals with financial trades can handle the load on a single server?
Finally, for financial systems like this you need at least two physically separate datacenters, and you need backups at both sites; so to roll out a system that needs a single server for load, you need four servers minimum to start. If it's a non-critical system you might be able to get away with two servers, one for each datacenter. But usually that's not cutting it for mission critical stuff like trades....
-Me
Admin
Of course the servers were labeled! They had big labels that said "BLACK server"
Admin
... plug it back in! :P
Admin
The rare WTF with a punchline. Excellent.
Admin
No, I have never, and I do not plan to make a mistake in the future. I am not trying to give myself a pat on the back. Not making a mistake is not something someone should boast about.
Unplug the wrong machine? Delete the wrong directory? How in the world does that happen?
Do we all remember seeing Standard Operating Procedure publications, or documents with similar purposes, lying around the office? Why would they take the time to design, revise, maintain and print these documents? READ THEM and LEARN THEM and make them a part of yours. It is, in fact, your job to do so.
You make a mistake, you should pay for it. Don't let others pay for it. Before you do anything important like deleting an entire directory or unplugging a machine, imagine yourself 2 seconds into the future. That's all it takes.
Admin
He should be careful about being competent. I shared hot-seat duties with someone I'll call E. I got a 2am call once, I answered with "but I'm not on call this week! E. is!" and the reply came, "yes, but we like how you fix things better."
And yes, I told them to call E. and hung up.
Admin
Sadly you're thinking of the wrong kind of organisation. In this story, it's likely an organisation that sells access to trade data rather than being involved in actual trades (and there are a lot of these).
Most of these companies appear to be built on mile-high stacks of crap code that simply won't adapt to reliable automated failover. The crap code inevitably comes about from the fact that none of the stock exchanges have the same data format, so every time management add a new exchange, some new contractor writes a bit of code to plug into the other pieces of crap.
The comms server was likely just a gateway to funnel data from the leased circuit to the exchange onto the processing farm.
CAPTCHA: Ninjas. Like the kind that would have had to have stolen that server from any decent facility, if the hardware engineer hadn't promptly come clean :P
Admin
Just about to say the same thing. They probably closed their communications at 5pm, then ran internal stuff overnight (data collection, reporting, maintenance, etc.), then at some point after midnight tried to re-establish communication with the external systems. (The timing is semi-arbitrary: he got the call at 2am, which would mean that the HE got called at midnight, pissed around for an hour, like all good HEs do, then ran diags for an hour before calling in help.)
Admin
If you've never made a mistake then you have no experience. And I'm not talking about years.
Everyone makes mistakes. We're human; it happens. Without mistakes, without risks, without failures, we cannot have success.
If you're being truthful and you've never made a mistake, then I certainly hope you're in your early twenties. To think about never having made a mistake and being older is just sad.
-Me
Admin
Now THAT is a WTF... you don't even put your AC on the same circuit as your servers, simply because of their tendency to create surges when they kick in or out (or in this case, overload a UPS). Hell, I've worked in places where they've had the electric company come in and install custom wiring for the AC array in the server room that was completely isolated from the electrical for the servers themselves (which was isolated from the electrical for the rest of the building).
Admin
OK. Making mistakes and taking risks. They are two different things.
You take risks when putting money into stocks or taking out a loan to start up a new business with a fresh idea. If you make a mistake and put the money into wrong stocks then it is certainly your fault.
You do not make mistakes doing a job where there is plenty of documentation available to you. If you are unsure of which cable to unplug then do not unplug it. If the documentation is missing for that particular task, take the initiative to RESEARCH before doing anything. If you are going to go ahead and RISK unplugging the wrong cable, by all means, go ahead but don't expect someone to cover your ass. If all cables are orange and are labeled 'orange', then obviously someone didn't do his/her job appropriately. By attempting to unplug such cables the responsibility now is bestowed upon you. Be smart about it.
Admin
Tools that get paid a lot of money to be on call.
Admin
The risks I'm talking about are work and project related. As far as unplugging the wrong cable, I suppose if your entire job revolves around plugging and unplugging cables then you have a fairly low risk job and you probably make few mistakes. But like I said before, no risks means no successes either. If you don't follow that then perhaps when/if you get more experience you'll understand what I mean....
-Me
Admin
You are confused between taking risks and making mistakes. How do you put together "a fairly low risk job" and "probably make few mistakes?" That's like saying apples are sweet because they are red.
If your entire job revolves around un/plugging cables and you make a mistake you should be fired for it. If your job does not revolve around un/plugging cables then you should not even do it in the first place. Have others that are proficient at doing the job do it for you. Resources are available. Don't make a mistake by thinking that you are taking a risk by trying to unplug cables. What are you actually trying to accomplish? Cross your fingers and hope that you unplug the right cables? Is that the kind of risk you take at work?
If you keep on wanting to equate mistakes with successes I would definitely love to become your financial manager and make "mistakes" investing your money. I am sure you will forgive my mistakes.
I strictly stuck to making mistakes, not taking risks, in my comments. You are the one who is confused about it and decided to bring it up and somehow fudge them together as "mistakes = risks = successes."
Have a good weekend.
Admin
Ummm...no one has asked the $1,000,000 question. Why do you have to take a server out of the rack to configure it? WTF? :-P
Admin
Wow. Until now I have never fully understood what you Americans call "a douche".
I am now enlightened.
Admin
Try upgrading the hardware with the server still inside the rack...
Admin
No, I'm sorry, I can't accept this. You sound like the kind of person who is going to suffer their whole life because you will never understand that we humans are not machines. You have completely missed the point here. You're like the philosophers in H2G2: 'We demand rigidly defined areas of doubt and uncertainty'.
Sometimes like it or not you have to wing it, the pressures of time and expectation, work and stress, life and booze and so on take their toll, and you just have to guess. No software is so meticulously documented that everything about it is known in advance; that would cost almost as much as writing it twice.
If you refuse to take any chances at all, you will eventually find yourself in a situation where you can make no further progress without doing so. What will you do then? Back off and fail? Or close your eyes, press [Enter] and pray?
Admin
I'd have done that too. It seemed like an honest mistake, and who knows what I screw up the next day when I need cover from a workmate to not lose my job. I know what you think: "Just don't screw up". But it still happens. To everyone. Ok, maybe not that severe a screwup, but who knows.
Of course the hardware guy would have owed one big bad ass dinner for me and my gf, for her lost sleep. And a few beers for me. And then we'd have laughed about it.
Admin
And what happens if that cable is in a large, tangled pile (say you inherited a messy system from the previous datacenter team), you have upper management screaming at you to fix the problem FIFTEEN MINUTES AGO and need to replace one of the cables, and so... you calm yourself, trace the cables, and accidentally get the wrong one anyways?
You're either trolling, stupid, not actually employed and talking out your ass, or you haven't ever had to do anything like that. Or you're a god and we should all worship you. In any case, cram it and go back to your perfect world while the rest of us deal with reality.
Admin
^ - -^ groan
^- - ^ His point is that making a mistake (error of judgement) is unnecessary, and that by exercising minimal diligence in research and documentation, one is not exposed to blame for avoidable mistakes; but failure still occurs.
Failure is not a mistake, it is an outcome.
^ - -^ And while he is a bit of a prick about it, I can respect that level of professionalism and ethics.
Admin
He sounds like a corporate drone who cannot perform any duty outside of the type expressly stated in his contract...
Then again he probably works for a company that actually has people on staff with the sole task of plugging in cables :)
When you work for a small company and actually have to do the jobs that in a large company would be done by 5 different people, you sometimes have to do things you aren't really proficient at, but neither is anyone else in the company...
Jobs need to be done, even if nobody is really qualified to do them... you can't just let them be.
Such a 'read everything, think carefully, and don't do anything you lack a diploma for' attitude only works in bogged-down, slow moving, huge companies, where everything can take ten times the time needed... I simply don't have the time to read all the documentation, if there even is any.