Admin
And that my friends is how you cage the beast!
Admin
Could be worse, they could have been using it as a saw horse.
Admin
That's one good reason to have plain old copper-based analog voice lines...let the phone company worry about keeping the phone lines powered. And frankly, they're pretty damned good at it.
Admin
This reminds me of a storm a few years ago that took down the major electrical lines across a whole region of France. A network company kept its clients' data lines running on generators, buying diesel every week to refill the tank at every local network hub. In that devastated part of France, about the only electrical equipment still working for nearly a month were the network switches and routers. Of course, nobody was using them because the clients didn't have electricity, but the service was available :D
Admin
I hate it when I can't bring my server back online!
Admin
I have heard previous WTFs called lame and so forth. But crap on a cracker, this one sucked.
Boiling it down:
Wow!!! Stop the presses!!! Call the media!!!
The rest is BS fluff.
So this boils down to a WTF of: somebody at the COMPANY did not test the phone system that LowestBidder Inc put in.
Geee... I need to go change my pants I was so surprised about that conclusion.
N.
(yes, laden with LOTS of sarcasm)
Admin
Yes, never, never, never put your outage paging system on a PBX! And don't use email paging, especially when your email system is down.
You have no idea how many times I've had to explain those 2 key points to dim-witted managers.
Admin
The end of this story is very ambiguously written.
One reading of it indicates that the PBX's power was coming directly from the grid and not from the UPS. This is somewhat boneheaded but entirely understandable. It's also very hard to test for, so it's unsurprising that it wasn't caught.
A different reading indicates that the PBX's phone line was not plugged in to the UPS. This is truly stupid and even a trivial test would reveal the problem. This one would be a true WTF.
So which one is it?
Admin
I give... "cosmic microwave background"?
Admin
Okay, gotta ask. If there was that much depending on it, why weren't the generators wired to start automatically when the power went out? UPS only lasts so long.
I mean, what if the pole outside were hit and the lines taken down? Even if the PBX were plugged in, no lines = no phones.
But I'm a pessimist.
Admin
A pessimist is exactly the type of person that should be responsible for these kinds of systems.
The security head for the World Trade Center was laughed at for his tenacity in putting all the safety measures in place and actually requiring practice evacuations. When the time came, many people were saved because they knew what to do.
Things like this you put in place to handle the worst thing you can think of, then hope you never ever need it.
Admin
Uh-oh, don't let Jörg Schilling hear that!
Admin
Let's add another WTF to this dog-pile: why didn't the UPS notify the Sun system to shut down as it neared exhaustion? I refuse to believe that an expensive 8-hour UPS doesn't include a serial or USB port and software to gracefully shut down connected equipment.
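For what it's worth, this is exactly the job tools like Network UPS Tools (NUT) do: a driver watches the UPS over the serial/USB link, and upsmon runs a shutdown command when the battery runs low. A minimal sketch, assuming a serial APC-style unit on a Solaris box - the UPS name, driver, port, paths, and credentials below are made up:

    # ups.conf -- hypothetical UPS definition
    [bigups]
        driver = apcsmart      # serial smart-protocol driver (assumed model)
        port   = /dev/ttya     # first serial port on the Sun box
        desc   = "8-hour monster UPS"

    # upsmon.conf -- shut this host down when the battery goes critical
    MONITOR bigups@localhost 1 monuser monpass master
    SHUTDOWNCMD "/usr/sbin/shutdown -y -g0 -i5"   # Solaris: power off immediately

With something like that in place, the box halts cleanly long before the UPS runs dry, and fsck never enters the picture.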
Admin
the magic option "logging" in one's /etc/vfstab can save gobs of time running fsck...
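For anyone who hasn't seen it, it's just a mount option in the last field of the vfstab entry. A minimal sketch - the device and mount point below are made up:

    # /etc/vfstab
    #device to mount    device to fsck      mount point  FS   fsck  mount    mount
    #                                                    type pass  at boot  options
    /dev/dsk/c0t0d0s6   /dev/rdsk/c0t0d0s6  /export      ufs  2     yes      logging

With UFS logging enabled, a post-crash boot replays the log instead of walking the whole filesystem with fsck.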
Admin
"continuous membrane bioreactor"?
Admin
Admin
"community mailbox"?
Admin
I admit I've only seen a few PBX systems, but they all had internal batteries, and they do work during power outages (although the runtime depends on how many extensions are active, and you don't get voicemail while the line voltage is down). Maybe larger PBX systems don't work that way...?
I once built a NOC where the notification server was a laptop configured to send SMS through a cell phone connected over Bluetooth (alongside two or three more conventional notification methods on more conventional hardware). As long as the two devices are within 30 feet of each other and their chargers are plugged in, they work well (as confirmed by one test message arriving every week, and by all the reports of actual power outages and server problems received over the last several years). The laptop runs for 5 hours and the phone for 53, on top of any runtime provided by an external UPS.
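In case anyone wants to copy the idea: the alert path can be as dumb as a shell script the monitoring software calls with the message text. A rough sketch, assuming gammu is installed and already configured to talk to the phone over its Bluetooth serial link - the script name and on-call number are made up:

    #!/bin/sh
    # notify-sms.sh -- called by the monitoring system as: notify-sms.sh "message"
    ONCALL="+15555550100"                     # hypothetical on-call number
    echo "ALERT: $1" | gammu sendsms TEXT "$ONCALL"

The weekly test message mentioned above is just a cron job calling the same script, which is how you find out the phone's battery died before you actually need it.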
Admin
"combat medical badge"?
Admin
Was this disaster recovery system put into place by the same bureaucrats from the earlier post? Seems like someone missed a meeting about the phone notifications, and it got built wrong...
Just wondering....
Admin
LMAO, you made my day!
Admin
Ridiculous. If you have a generator and a UPS (with 8 HOURS of runtime, which is also ridiculous), there's no excuse for not having an automatic transfer switch. Our UPSes have about an hour of runtime, and seldom use more than 10 seconds of it, since that's how long it takes for the ATS to start the generator, test that it's receiving good power from the generator, and switch over to the generator. When the power comes back, the ATS waits 15 minutes to make sure the power is stable, then switches back and shuts down the generator (after a cooling-off period).
That's just how it's done. Anyone who installs a system that does otherwise should be fired/sued into non-existence. The ATS is the cheapest part of our setup.
Admin
don't try to do no thinkin' just go on with your drinkin', just have your fun you old son of a gun and drive home in your lincoln
i'm gonna tell you the way it is, and i'm not going to be kind or easy, your whole attitude stinks i say and the life you lead is completely empty,
you paint your head, your mind is dead, you don't even know what I just said,
That's you american womanhood.
You're phony on top , you're phony underneath, you lay in bed and grit your teeth
Madge i want your body, harry get back... madge it's not merely physical, oh harry you're a beast <female crying>
madge I couldn't help it.... awww dogg gonnit
what's the ugliest part of your body, what's the ugliest part of your body?
some say it's your nose, some say it's your toes, i say it's your mind
all your children are poor unfortunate victims of systems beyond their control
Admin
I was helping out a small ISP that was growing faster than their one sysadmin could handle. I stopped by one morning when the building power was supposed to be out due to power company work. We knew about the outage for weeks ahead of time. Since I was only really there to work on my colo that morning, and the ISP's admin said that he would be there, I didn't plan for much.
6:00am: power goes out. A few seconds later, half of their datacenter goes dark. Crap, the UPS was running over capacity and tripped its internal mains. The junior admin was around, and I said, damn, that sucks, and we cranked the thing into bypass mode... but nothing happened. "Why isn't the backup generator running?"
This is of course January-ish in Minnesota. I run outside to the generator; of course the cabinet on the control side of the engine isn't locked. The little toggle switch was sitting in "Off" instead of "Auto". Click. Vroom! We have power again.
The real WTF of this story: neither the ISP's owner nor the sysadmin showed up, and the sysadmin slept through the whole thing while we called him every 5 minutes.
Admin
Last time we had a power outage at the local headquarters, we were surprised to see our UPS go down after less than 10 minutes, when it was supposed to provide power for more than an hour - enough time to gracefully shut down all the servers. (The PBX system lasted on its internal batteries for 14 hours.)
Since we are only an office, not a plant like other locations, we didn't have any generator, so all work was suspended until the electrical company restored the lines.
After checking, we (the IT people, responsible for the UPS system) found that the local administrative manager had somehow succeeded in connecting a brand new building directly to the special power lines dedicated to the server room and the UPS...
We managed to get some funding for a new UPS and a generator too.
Admin
It's ironic that this was on Solaris. If they used ZFS, they wouldn't have had to worry about data loss or corruption at all.
Admin
As the IT Manager for a scheduled airline, my objective was to keep things running as long as possible. So yes, we had our key systems connected to the building EP Grid, with a 5-second transfer to gen-power at loss of building mains; on top of that we had a UPS monstrosity that would run all of our servers, network gear, phones, and key workstations for 8 hours. At about the 4-hour mark on the UPS, our managed power bars would start shutting off power to all non-essential peripheral equipment. At the 2-hour mark, non-essential servers would begin the shutdown process. At the 1-hour mark, only two servers, two switches, two routers, one workstation and the phone system would remain powered. At the half-hour mark, the phone system would have power removed and work off of its internal batteries (good for 2 hours), and the UPS would run down to zero.
Overkill? Well, depends on whether you'd like to see the company monitoring your aircraft have some idea of where those aircraft are and be able to communicate with them. The process outlined above could theoretically extend the time of power delivery from 8 hours to just under 24. As for us, we never experienced an outage beyond 6 hours, but the 6 hour outage we did experience was due to, of all things, a generator failure while the building was blacked out :)
Is it ironic or just entertaining that the captcha for this post is 'alarm'?
Admin
"That's not irony, that's just mean"
Admin
Doesn't matter if they have it, if you're not allowed to use it: I once worked at a company that refused to allow the sysadmin (me) to load the UPS monitoring software onto our servers, because "there were other things to do". Along comes a power failure, and down (crash) go 72 servers after the UPS failed (the generator was inoperative). The boss started to "assign responsibility" for the damaged hardware and corrupted data to me. The result? I went and had a chat with the company legal counsel, the chief financial officer, and the company president (they were standing around in a panic), then walked away (quit) and left my ex-boss to sort out the mess. Happily for me ('cuz I'm a nasty guy), the company lost several of their major accounts and, two quarters later, closed their doors for good. Those "failure to perform" contract clauses cost them a ton of money. From what other employees later told me, they never did get all their systems back up - the ex-boss tried to use desktops to replace the damaged servers...
Admin
Not sure why my on-topic post got deleted while the irrelevant "i'm gay/you're gay" posts remain. I guess they want to filter out any dissenting comments.
The real WTF is this site doesn't even know what a real WTF is anymore.
Yes, we all get it: companies have strict policies, people make mistakes. So what? Why don't you once again show us something that makes us go WTF, rather than just a bitch session about various companies' red tape, or silly error messages with typos in them.
Admin
Sasha -- thanks for the ZFS comment. Exactly right. Solaris is sexy, but ZFS, aka IBM's z/OS File System, is superstable.
Admin
So is UFS with the logging option enabled, if you have a semi-recent Solaris install - oh, say, Solaris 8 and above, which is only about 8 years old by now, mind you. This isn't an issue.
That is one WTF right there. Patching is another one. While not mentioned, I bet the OS hasn't been patched since it was installed, which only means you can hit fun little data corruption bugs.
I am assuming this was running UFS with a large (>100G) filesystem. I don't know if Solstice/SVM was the volume manager. But given how cheap they seem to be, VxFS/VxVM is out of the question. That is even better than UFS if you can afford it.
And ZFS means the Zettabyte File System, not any of IBM's stuff. ZFS for Solaris has only been stable - read the source code if you don't believe me. Wait, you can't do that for the z/OS stuff, doh!
How much does a z/OS server run these days? Would a million even get me an intro machine?
Admin
We had something slightly similar happen to us once.
We've got an Exchange server on site. Our "plan B" involves a large mail spooling/spam filtering service that you've likely heard of. Our mail server goes down, no problem. Their machines spool it until ours comes back up, at which point it delivers the payload to us. We may lose internal correspondence abilities for a little while, but customers can send mail to us without it bouncing, and the world keeps turning.
Likewise, we've got DNS on site. We also have redundant DNS hosted off site at our ISP. It zone transfers, and all is well.
And then came the week(s) of server room reconfiguration. During the consolidation, some unracked machines were migrated onto racks. Our DNS server just so happened to be one such machine. It also happened to be nothing more than an average desktop machine (a WTF in and of itself). Can you guess what happens when a tech connects a 120V desktop power supply to a 240V PDU fed from a very large UPS?
Suffice it to say, DNS was no more. But hey! We've got off site DNS! We don't need to worry about that until Monday! Well, it turns out the TTL on our zone transfers was a bit lower than expected. Let's just say that all the mail spooling contingency plans in the world don't mean dick if you can't even get an MX record from DNS in the first place.
The moral of the story: a chain is only as strong as its weakest link. That was a fun morning of faking my way through rebuilding a DNS machine. I became experienced with it real quick. :P
Admin
I'm confused about what you mean by the TTL on the zone transfers being too low. I'm not too sure how zone transfers work, but does it mean that it just replicates the DNS info to the off-site DNS server? If the server is down, it can't replicate anything, now can it?
Admin
Today's WTF provides valuable information.
A ton of other WTF stories have involved consultants who bill $300/hour, big famous expensive consulting companies, big famous expensive vendors, etc.
Today's WTF proves that Lowest Bidder Inc. gets the same results as the big guys. You don't have to pay enormous fees.
Admin
To Russ:
The DNS transfer worked. The TTL for the DNS transfer told the ISP how long to use their copies of the DNS records. When the TTL expired, the ISP obediently stopped using their copies (including the MX record).
Admin
Admin
A medium, surely?
Admin
"WorseThanFailure"? Nah, this story sounds much like just an ordinary failure.
Admin
Don't forget: 4) In the end, it turned out OK 'cos the server wasn't really that busted. 5) The end.
Admin
At least you have 800% more opportunities to get it.
Admin
Assuming server-grade hardware (and not toy ATA disks with write caches enabled), it should handle things just fine with other file systems, as well.
However, if it was an oldish server with lots of disks, I can see this kind of situation causing problems. Failing to spin up after being powered down is a common failure mode in disks that were otherwise fine, and it might even happen to several simultaneously after the system has been running continuously for years.
Admin
We can actually get one of those container-sized 400 kVA generators up and running in 2 hours. But then, we A) have those on site, B) have experience doing so, and C) ARE the electricity company.
Admin
That's cheating!
Admin
Yep - I was going to say - that's what 1 hard wired POTS line is for. Standard disaster recovery stuff
Admin
We had a power outage yesterday and the backup generator wouldn't start, resulting in some major stuff going down.
I guess I can make the front page, too!
Admin
Feel lucky. Gateway goes out Friday afternoon. "Sorry, but the service people went home already. They will fix it on Monday."
Admin
If a secondary DNS server transfers a zone from a primary server, it is supposed to periodically query the primary to make sure the data is still current. If the primary goes down, the secondary may continue to serve the zone data, but only for an amount of time indicated in the zone's SOA record which is called the zone expiration time. After zone expiration, the secondary is supposed to throw away the data. I think this is because the secondary cannot tell if the problem is that the primary is not allowing it to do zone transfers, in which case the secondary's data might be stale, or if it's just that the primary is down, in which case the secondary remains the best source of the data.
"TTL" in the DNS world means something else.
Admin
SAND IN VAGINA
Admin
10 minutes for a decision from the boss, 5 minutes to find an employee with a pickup truck, a half-hour drive to the store, 5 minutes to find the generator and get it to the truck, 10 minutes getting the invoice, another half hour to get it back to the site, with a 5-minute stop to buy fuel. Setting up a portable generator (gasoline, not diesel) takes maybe 15 minutes. Installing it is as easy as plugging the UPS into it.
This is all providing you get a blank cheque from the boss. Otherwise, purchase approval can take up to a month.
It's not a hi-tech solution, but if you want it fast and improvise, you don't have time for hi-tech.