The Daily WTF: Curious Perversions in Information Technology

KattMan · 2007-07-19

And that my friends is how you cage the beast!

2007-07-19

Could be worse, they could have been using it as a saw horse.

n9ds · 2007-07-19

That's one good reason to have plain old copper-based analog voice lines...let the phone company worry about keeping the phone lines powered. And frankly, they're pretty damned good at it.

tchize · 2007-07-19

This reminds me of a storm a few years ago that brought down major power lines across a whole region of France. A network company was running its clients' data lines on generators. They were buying diesel every week to fill the tank at every local network hub. In that devastated part of France, about the only electrical equipment working for nearly a month was the network's switches and routers. Of course, nobody was using them because the clients didn't have electricity, but the service was available :D

Someone You Know · 2007-07-19

I hate it when I can't get bring my server back online!

2007-07-19

I have heard the previous WTF calls of lame and so forth. But crap on a cracker, this one sucked.

Boiling it down:

Power failure.
Power was out a long time and UPS lost reserves. (Duh!)
Phone notifications did not work.

Wow!!! Stop the presses!!! Call the media!!!

The rest is BS fluff.

So, this boils down to a WTF of somebody at the COMPANY did not test the system LowestBidder Inc put in for the phone system.

Geee... I need to go change my pants I was so surprised about that conclusion.

N.

(yes, laden with LOTS of sarcasm)

2007-07-19

Yes, never, never, never put your outage paging system on a PBX! And don't use email paging, especially when your email system is down.

You have no idea how many times I've had to explain those 2 key points to dim-witted managers.

2007-07-19

The end of this story is very ambiguously written.

One reading of it indicates that the PBX's power was coming directly from the grid and not from the UPS. This is somewhat boneheaded but entirely understandable. It's also very hard to test for, so it's unsurprising that it wasn't caught.

A different reading indicates that the PBX's phone line was not plugged in to the UPS. This is truly stupid and even a trivial test would reveal the problem. This one would be a true WTF.

So which one is it?

2007-07-19

I give... "cosmic microwave background"?

jimlangrunner · 2007-07-19

Okay, gotta ask. If there was that much depending on it, why weren't the generators wired to start automatically when the power went out? UPS only lasts so long.

I mean, what if the pole outside were hit & took the lines down. Even if the PBX were plugged in, no lines = no phones.

But I'm a pessimist.

KattMan · 2007-07-19

jimlangrunner:
Okay, gotta ask. If there was that much depending on it, why weren't the generators wired to start automatically when the power went out? UPS only lasts so long.
I mean, what if the pole outside were hit & took the lines down. Even if the PBX were plugged in, no lines = no phones.

But I'm a pessimist.

A pessimist is exactly the type of person that should be responsible for these kinds of systems.

The security head for the world trade centers was laughed at for his tenacity in putting all the safety measures in place and actually requiring evacuation practices. When the time came, many people were saved because they knew what to do.

Things like this you put in place to handle the worst thing you can think of, then hope you never ever need it.

2007-07-19

Solaris does not take kindly to having the lights go out unexpectedly

Uh-oh, don't let Jörg Schilling hear that!

2007-07-19

Let's add another WTF to this dog-pile: Why didn't the UPS notify the Sun system to shut down as it neared exhaustion? I refuse to believe that an expensive 8 hour UPS doesn't include a serial or USB port and software to gracefully shut down connected equipment.

2007-07-19

the magic option "logging" in one's /etc/vfstab can save gobs of time running fsck...

2007-07-19

"continuous membrane bioreactor"?

FredSaw · 2007-07-19

Rugger fan:
Wow!!! Stop the presses!!! Call the media!!!

Um... if you can stop the presses, you are the media.

2007-07-19

"community mailbox"?

2007-07-19

I admit I've only seen a few PBX systems, but they all had internal batteries, and they do work during power outages (although the runtime depends on how many extensions are active, and you don't get voicemail while the line voltage is down). Maybe larger PBX systems don't work that way...?

I once built a NOC where the notification server is a laptop configured to send SMS with a cell phone connected by bluetooth (as well as two or three more conventional notification methods using more conventional hardware). As long as the two devices are within 30 feet of each other and the chargers are plugged in, they work well (as confirmed by one test message arriving every week, and all the reports of actual power outages and server problems received over the last several years). The laptop runs for 5 hours and the phone for 53 on top of any runtime provided by an external UPS.

2007-07-19

"combat medical badge"?

2007-07-19

Was this disaster recovery system put into place by the same bureaucrats from the earlier post? Seems like someone missed a meeting about the phone notifications, and it got built wrong...

Just wondering....

2007-07-19

LMAO, you made my day!

2007-07-19

Ridiculous. If you have a generator and a UPS (with 8 HOURS of runtime, which is also ridiculous), there's no excuse for not having an automatic transfer switch. Our UPSes have about an hour of runtime, and seldom use more than 10 seconds of it, since that's how long it takes for the ATS to start the generator, test that it's receiving good power from the generator, and switch over to the generator. When the power comes back, the ATS waits 15 minutes to make sure the power is stable, then switches back and shuts down the generator (after a cooling-off period).

That's just how it's done. Anyone who installs a system that does otherwise should be fired/sued into non-existence. The ATS is the cheapest part of our setup.
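
The transfer logic described above could be sketched roughly like this. It is only an illustration: the function name, the source labels, and the idea of modelling the switch as a single decision function are assumptions, and only the 15-minute wait before switching back comes from the description.

```python
# Hypothetical sketch of an automatic transfer switch (ATS) decision.
# Only the 15-minute retransfer delay is taken from the comment above.
RETRANSFER_DELAY_S = 15 * 60  # utility must be stable this long before switching back


def ats_source(utility_ok: bool, generator_ok: bool,
               utility_stable_s: float, on_generator: bool) -> str:
    """Return which source should feed the load right now."""
    # Use utility power if it is good and either we never left it,
    # or it has been back long enough to be trusted again.
    if utility_ok and (not on_generator or utility_stable_s >= RETRANSFER_DELAY_S):
        return "utility"
    # Otherwise prefer a generator that is producing good power.
    if generator_ok:
        return "generator"
    # Generator unusable: fall back to whatever is left.
    return "utility" if utility_ok else "ups_battery"
```

In this sketch the UPS only has to carry the load while the result is `"ups_battery"`, which is the gap between the outage and the generator coming up to good power.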

2007-07-19

Anonymous Coward:
Solaris does not take kindly to having the lights go out unexpectedly

Uh-oh, don't let Jörg Schilling hear that!

don't try to do no thinkin' just go on with your drinkin', just have your fun you old son of a gun and drive home in your lincoln

i'm gonna tell you the way it is, and i'm not going to be kind or easy, your whole attitude stinks i say and the life you lead is completely empty,

you paint your head, your mind is dead, you don't even know what I just said,

That's you american womanhood.

You're phony on top , you're phony underneath, you lay in bed and grit your teeth

Madge i want your body, harry get back... madge it's not merely physical, oh harry you're a beast <female crying>

madge I couldn't help it.... awww dogg gonnit

what's the ugliest part of your body, what's the ugliest part of your body?

some say it's your nose, some say it's your toes, i say it's your mind

all your children are poor unfortunate victims of systems beyond their control

2007-07-19

I was helping out a small ISP that was growing faster than their one sysadmin could handle. I stopped by one morning when the building power was supposed to be out due to power company work. We knew about the outage for weeks ahead of time. Since I was only really there to work on my colo that morning, and the ISP's admin said that he would be there, I didn't plan for much.

6:00am: power goes out. A few seconds later, half of their datacenter goes out. Crap, the UPS was running over capacity and tripped its internal mains. The junior admin was around, and I said, damn, that sucks, and we cranked the thing into bypass mode... but nothing happened. "Why isn't the backup generator running?"

This is of course Januaryish in Minnesota. I run outside to the generator, of course the cabinet to the control side of the engine isn't locked. The little toggle switch was sitting in the "Off" mode instead of the "Auto" mode. click Vroom! we have power again..

The real WTF of this story was that neither the ISP's owner nor the sysadmin showed up, and the sysadmin slept through the whole thing while we called him every 5 minutes.

2007-07-19

Last time we had a power outage at the local headquarters, we were surprised to see our UPS go down after less than 10 minutes when it was supposed to provide power for more than 1 hour, enough time to gracefully shut down all the servers. (The PBX system lasted on its internal batteries for 14 hours.)

Since we are only an office, not a plant like other locations, we didn't have a generator, so all work was suspended until the electrical company restored the lines.

After checking, we (the IT people responsible for the UPS system) found that the local administrative manager had somehow succeeded in connecting a brand new building directly to the special power lines dedicated to the server room and the UPS...

We managed to get some funding for a new UPS and a generator too.

2007-07-19

It's ironic that this was on Solaris. If they used ZFS, they wouldn't have had to worry about data loss or corruption at all.

2007-07-19

sewiv:
Ridiculous. If you have a generator and a UPS (with 8 HOURS of runtime, which is also ridiculous). Our UPSes have about an hour of runtime, and seldom use more than 10 seconds of it, since that's how long it takes for the ATS to start the generator, test that it's receiving good power from the generator, and switch over to the generator. When the power comes back, the ATS waits 15 minutes to make sure the power is stable, then switches back and shuts down the generator (after a cooling-off period).

"Your faith in your [generator] will be your[ downfall]." (loosely quoting RotJ) It's just super that you have an ATS that allows for a 10-second cutover, quite fantastic, really. Now what happens when you have a generator failure due to, well, pretty much pick any mechanical reason a generator can fail? Do you think an hour will give you enough time to: a) find a replacement generator; b) hire an electrician to re-wire it into your building EP grid; c) fire-up & test the replacement before cutting it over?

sewiv:
That's just how it's done. Anyone who installs a system that does otherwise should be fired/sued into non-existence. The ATS is the cheapest part of our setup.

About the only part of this statement I agree with is the ATS part and related liability. I can tell you that "how it's done" is entirely dependent on your company's continuity objectives.

As the IT Manager for a scheduled airline, my objective was to keep things running as long as possible. So yes, we had our key systems connected to the building EP grid, with a 5 second transfer to gen-power at loss of building mains. On top of that we had a UPS monstrosity that would run all of our servers, network gear, phones, and key workstations for 8 hours. At about the 4 hour mark on the UPS, our managed power bars would start shutting off power to all non-essential peripheral equipment. At the 2 hour mark, non-essential servers would begin the shutdown process. At the 1 hour mark, only two servers, two switches, two routers, one workstation and the phone system would remain powered. At the half hour mark, the phone system would have power removed and work off of its internal batteries (good for 2 hours) and the UPS would run down to zero.

Overkill? Well, depends on whether you'd like to see the company monitoring your aircraft have some idea of where those aircraft are and be able to communicate with them. The process outlined above could theoretically extend the time of power delivery from 8 hours to just under 24. As for us, we never experienced an outage beyond 6 hours, but the 6 hour outage we did experience was due to, of all things, a generator failure while the building was blacked out :)
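
For anyone wanting to picture the staged shutdown, here is a minimal sketch of that kind of schedule. The thresholds are the ones listed above; the table layout, the function name, and the assumption that the UPS reports an estimated runtime in hours are made up for illustration.

```python
# Hypothetical load-shedding table keyed by estimated UPS runtime remaining (hours).
# Thresholds follow the description above; everything else is illustrative.
SHED_STAGES = [
    (4.0, "power off non-essential peripheral equipment"),
    (2.0, "begin shutting down non-essential servers"),
    (1.0, "keep only two servers, two switches, two routers, one workstation and phones"),
    (0.5, "remove UPS power from the phone system (internal batteries take over)"),
]


def actions_due(hours_remaining: float) -> list[str]:
    """Return every shedding action whose threshold has been reached."""
    return [action for threshold, action in SHED_STAGES
            if hours_remaining <= threshold]


# Example: with 1.5 hours left, the first two stages should already have run.
for action in actions_due(1.5):
    print(action)
```

Each stage reduces the load, which is why the remaining runtime can stretch well past the nominal 8 hours.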

Is it ironic or just entertaining that the captcha for this post is 'alarm'?

2007-07-19

"That's not irony, that's just mean"

2007-07-19

kbiel:
Let's add another WTF to this dog-pile: Why didn't the UPS notify the Sun system to shut down as it neared exhaustion? I refuse to believe that an expensive 8 hour UPS doesn't include a serial or USB port and software to gracefully shut down connected equipment.

It doesn't matter whether the UPS has it if nobody is allowed to use it. I once worked at a company that refused to allow the sysadmin (me) to load the UPS monitoring software onto our servers, because "there were other things to do". Along comes a power failure, and down (crash) go 72 servers after the UPS failed (the generator was inoperative). The boss started to "assign responsibility" for the damaged hardware and corrupted data to me. The result? I went and had a chat with the company legal counsel, the chief financial officer, and the company president (they were standing around in a panic), and then walked away (quit) and left my ex-boss to sort out the mess. Happily for me ('cuz I'm a nasty guy), the company lost several of their major accounts, and two quarters later closed their doors for good. Those "failure to perform" contract clauses cost them a ton of money. From what other employees later told me, they never did get all their systems back up; the ex-boss tried to use desktops to replace the damaged servers...

2007-07-19

Not sure why my on-topic post got deleted while the irrelevant "i'm gay/you're gay" posts remain. I guess they want to filter out any dissenting comments.

The real WTF is this site doesn't even know what a real WTF is anymore.

Yes, we all get it: companies have strict policies, people make mistakes. So what? Why don't you once again show us something that makes us go WTF, rather than just a bitch session about various companies' red tape, or silly error messages with typos in them?

2007-07-19

Sasha -- thanks for the ZFS comment. Exactly right. Solaris is sexy, but ZFS, aka IBM's z/OS File System, is superstable.

2007-07-19

So is UFS with the logging option enabled. If you have a semi-recent Solaris install (oh, say Solaris 8 and above, which is only about 8 years old by now, mind you), this isn't an issue.

That is one wtf right there. Patching is another one. While not mentioned I bet the os hasn't been patched since it was installed. Which only means you can hit fun little data corruption bugs.

I am assuming this was running ufs with a large (>100G) filesystem. I don't know if solstice/svm was the volume manager. But given how cheap they seem to be vxfs/vxvm is out of the question. That is even better than ufs if you can afford it.

And ZFS means the zettabyte file system, not any of IBM's stuff. ZFS for Solaris has only been stable; read the source code if you don't believe me. Wait, you can't do that for the z/OS stuff, doh!

How much does a z/OS server run these days? Would a million even get me an intro machine?

db2 · 2007-07-19

We had something slightly similar happen to us once.

We've got an Exchange server on site. Our "plan B" involves a large mail spooling/spam filtering service that you've likely heard of. Our mail server goes down, no problem. Their machines spool it until ours comes back up, at which point it delivers the payload to us. We may lose internal correspondence abilities for a little while, but customers can send mail to us without it bouncing, and the world keeps turning.

Likewise, we've got DNS on site. We also have redundant DNS hosted off site at our ISP. It zone transfers, and all is well.

And then came the week(s) of server room reconfiguration. During the consolidation, some unracked machines were migrated onto racks. Our DNS server just so happened to be one such machine. It also happened to be nothing more than an average desktop machine (a WTF in and of itself). Can you guess what happens when a tech connects a 120V desktop power supply to a 240V PDU fed from a very large UPS?

Suffice it to say, DNS was no more. But hey! We've got off site DNS! We don't need to worry about that until Monday! Well, it turns out the TTL on our zone transfers was a bit lower than expected. Let's just say that all the mail spooling contingency plans in the world don't mean dick if you can't even get an MX record from DNS in the first place.

The moral of the story: a chain is only as strong as its weakest link. That was a fun morning of faking my way through rebuilding a DNS machine. I became experienced with it real quick. :P

2007-07-19

db2:
We had something slightly similar happen to us once.
We've got an Exchange server on site. Our "plan B" involves a large mail spooling/spam filtering service that you've likely heard of. Our mail server goes down, no problem. Their machines spool it until ours comes back up, at which point it delivers the payload to us. We may lose internal correspondence abilities for a little while, but customers can send mail to us without it bouncing, and the world keeps turning.

Likewise, we've got DNS on site. We also have redundant DNS hosted off site at our ISP. It zone transfers, and all is well.

And then came the week(s) of server room reconfiguration. During the consolidation, some unracked machines were migrated onto racks. Our DNS server just so happened to be one such machine. It also happened to be nothing more than an average desktop machine (a WTF in and of itself). Can you guess what happens when a tech connects a 120V desktop power supply to a 240V PDU fed from a very large UPS?

Suffice it to say, DNS was no more. But hey! We've got off site DNS! We don't need to worry about that until Monday! Well, it turns out the TTL on our zone transfers was a bit lower than expected. Let's just say that all the mail spooling contingency plans in the world don't mean dick if you can't even get an MX record from DNS in the first place.

The moral of the story: a chain is only as strong as its weakest link. That was a fun morning of faking my way through rebuilding a DNS machine. I became experienced with it real quick. :P

I'm confused about what you mean by the TTL on the zone transfers being too low. I'm not too sure how zone transfers work, but doesn't it just replicate the DNS info to the off-site DNS server? If the server is down, it can't replicate anything, now can it?

2007-07-19

Today's WTF provides valuable information.

A ton of other WTF stories have involved consultants who bill $300/hour, big famous expensive consulting companies, big famous expensive vendors, etc.

Today's WTF proves that Lowest Bidder Inc. gets the same results as the big guys. You don't have to pay enormous fees.

2007-07-19

To Russ:

The DNS transfer worked. The TTL for the DNS transfer told the ISP how long to use their copies of the DNS records. When the TTL expired, the ISP obediently stopped using their copies (including the MX record).

operagost · 2007-07-19

Notta Noob:
"Your faith in your [generator] will be your[ downfall]." (loosely quoting RotJ) It's just super that you have an ATS that allows for a 10-second cutover, quite fantastic, really. Now what happens when you have a generator failure due to, well, pretty much pick any mechanical reason a generator can fail? Do you think an hour will give you enough time to: a) find a replacement generator; b) hire an electrician to re-wire it into your building EP grid; c) fire-up & test the replacement before cutting it over?

You can get a diesel generator installed in eight hours? Quite fantastic, really.

2007-07-20

FredSaw:
Rugger fan:
Wow!!! Stop the presses!!! Call the media!!!
Um... if you can stop the presses, you are the media.

A medium, surely?

2007-07-20

"WorseThanFailure"? Nah, this story sounds much like just an ordinary failure.

Raggles · 2007-07-20

Rugger fan:
I have heard the previous WTF calls of lame and so forth. But crap on a cracker, this one sucked.
Boiling it down:

Power failure.

Power was out a long time and UPS lost reserves. (Duh!)

Phone notifications did not work.

Don't forget: 4) In the end, it turned out OK 'cos the server wasn't really that busted. 5) The end.

2007-07-20

operagost:

Notta Noob:
"Your faith in your [generator] will be your[ downfall]." (loosely quoting RotJ) It's just super that you have an ATS that allows for a 10-second cutover, quite fantastic, really. Now what happens when you have a generator failure due to, well, pretty much pick any mechanical reason a generator can fail? Do you think an hour will give you enough time to: a) find a replacement generator; b) hire an electrician to re-wire it into your building EP grid; c) fire-up & test the replacement before cutting it over?

You can get a diesel generator installed in eight hours? Quite fantastic, really.

At least you have 800% more opportunities to get it.

2007-07-20

sasha:
It's ironic that this was on Solaris. If they used ZFS, they wouldn't have had to worry about data loss or corruption at all.

Assuming server-grade hardware (and not toy ATA disks with write caches enabled), it should handle things just fine with other file systems, as well.

However, if it was an oldish server with lots of disks, I can see this kind of situation causing problems. Failing to spin up after being powered down is a common failure mode in disks that were otherwise fine, and it might even happen to several simultaneously after the system has been running continuously for years.

2007-07-20

We can actually get one of those container sized 400 kVA generators up and running in 2 hrs. But then, we A. have those on site, B. have experience doing so and C. we ARE the electricity company.

poochner · 2007-07-20

Aurora:
... and C. we ARE the electricity company.

That's cheating!

2007-07-20

Yep - I was going to say - that's what 1 hard wired POTS line is for. Standard disaster recovery stuff

valerion · 2007-07-20

We had a power outage yesterday and the backup generator wouldn't start, resulting in some major stuff going down.

I guess I can make the front page, too!

2007-07-20

SuperQ:
The real WTF of this story was that neither the ISP's owner nor the sysadmin showed up, and the sysadmin slept through the whole thing while we called him every 5 minutes.

Feel lucky. Gateway goes out Friday afternoon. "Sorry, but the service people went home already. They will fix it on Monday".

2007-07-20

Russ:
db2:
Suffice it to say, DNS was no more. But hey! We've got off site DNS! We don't need to worry about that until Monday! Well, it turns out the TTL on our zone transfers was a bit lower than expected. Let's just say that all the mail spooling contingency plans in the world don't mean dick if you can't even get an MX record from DNS in the first place.

I'm confused about what you mean by the TTL on the zone transfers being too low. I'm not too sure how zone transfers work, but doesn't it just replicate the DNS info to the off-site DNS server? If the server is down, it can't replicate anything, now can it?

If a secondary DNS server transfers a zone from a primary server, it is supposed to periodically query the primary to make sure the data is still current. If the primary goes down, the secondary may continue to serve the zone data, but only for an amount of time indicated in the zone's SOA record which is called the zone expiration time. After zone expiration, the secondary is supposed to throw away the data. I think this is because the secondary cannot tell if the problem is that the primary is not allowing it to do zone transfers, in which case the secondary's data might be stale, or if it's just that the primary is down, in which case the secondary remains the best source of the data.

"TTL" in the DNS world means something else.
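
A rough sketch of the expiry rule described above, not how any particular name server implements it. The field names follow the SOA record's refresh and expire timers; the class, the function, and the numbers in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SoaTimers:
    refresh: int  # seconds between the secondary's checks against the primary
    expire: int   # seconds the secondary may keep serving without a successful check


def secondary_may_answer(seconds_since_last_contact: int, soa: SoaTimers) -> bool:
    """True while the secondary's copy of the zone has not yet expired."""
    return seconds_since_last_contact < soa.expire


# Hypothetical outage: primary dies Friday evening, noticed Monday morning (~60 h).
outage_s = 60 * 3600
short_expire = SoaTimers(refresh=3600, expire=86_400)    # one day (illustrative)
long_expire = SoaTimers(refresh=3600, expire=604_800)    # one week (illustrative)
print(secondary_may_answer(outage_s, short_expire))
print(secondary_may_answer(outage_s, long_expire))
```

Under these assumed values, the one-day expire would leave the off-site secondary silent well before Monday, while the one-week expire would ride out the weekend.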

2007-07-20

Top Cod3r:
Not sure why my on-topic post got deleted while the irrelevant "i'm gay/you're gay" posts remain. I guess they want to filter out any dissenting comments.
The real WTF is this site doesn't even know what a real WTF is anymore.

Yes, we all get it: companies have strict policies, people make mistakes. So what? Why don't you once again show us something that makes us go WTF, rather than just a bitch session about various companies' red tape, or silly error messages with typos in them?

SAND IN VAGINA

2007-07-20

10 minutes for a decision from the boss, 5 minutes to find an employee with a pickup truck, a half-hour drive to the store, 5 minutes to find it and get it to the car, 10 minutes to get the invoice, another half hour to get it back to the site, with a 5-minute stop to buy fuel. Setting up a portable generator (gasoline, not diesel) takes maybe 15 minutes. Installing it is as easy as plugging the UPS into it.

This is all providing you get a blank cheque from the boss. Otherwise, purchase approval can take up to a month.

It's not a hi-tech solution, but if you want it fast and improvise, you don't have time for hi-tech.
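
Adding up the estimates above, as a quick sanity check. The step labels are paraphrased; the minute values are the ones given in the comment.

```python
# Time estimates from the comment above, in minutes.
steps = {
    "decision from the boss": 10,
    "find an employee with a pickup truck": 5,
    "drive to the store": 30,
    "find the generator and load it": 5,
    "get the invoice": 10,
    "drive back to the site": 30,
    "stop to buy fuel": 5,
    "set up the portable generator": 15,
}

total = sum(steps.values())
print(f"{total} minutes ({total // 60} h {total % 60} min)")  # 110 minutes (1 h 50 min)
```

So even the improvised route takes longer than a one-hour UPS allows, but fits comfortably inside eight hours.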

Paging Dr. UPS
