Admin
frist frist, just to be redundant
Admin
'Frazzled looking contractors', or quite literally frazzled?
Admin
A friend of mine worked on the traffic boards for the M25. The spec said two separate sets of power and comms: one down the hard shoulder, one down the central reservation. The contractor thought "it would be much cheaper to run both sets of cables down one trench". From here the stories converge.
Admin
I worked at the local electrical utility for a few years, looking after their control room. As you can imagine, redundancy was the word there too! In the age of coaxial Ethernet, we had redundant networking, redundant MMI (man-machine interface) workstations, redundant ADM servers, redundant COMM servers, etc., etc. All fed from redundant power (blue and red); each of the blue and red power circuits had its own 6kVA UPS. The two 6kVA UPSes were backed up by a 16kVA UPS, which could be fed from either of two independent electrical infeeds (two different islands!), as well as by a generator. Should be good, right? Um ...
So, every Friday, the generator got tested. At 9am, switch off the two infeeds, power on the genny, make sure it was working fine, then turn the genny off and switch back to the primary infeed. Midnight on Friday, I get a call: everything has gone dark!
Rush out to find that the three UPSes are all drained, and so everything is off. Huh? Turns out, they forgot to turn the primary infeed back on after testing the genny. The UPSes were beeping like crazy all day and into the evening, and the folks in the control room didn't say anything about it because (well, their explanation was somewhat incoherent). Eventually the batteries were drained, and everything went silent. At which point I got the call...
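The lesson there is that an alarm nobody acts on isn't really an alarm. Below is a minimal sketch of the kind of automated check that could have paged someone hours earlier, assuming a Network UPS Tools (NUT) setup queried via upsc; the UPS name, webhook URL, and threshold are hypothetical, not from the story.

```
#!/usr/bin/env python3
"""Poll a UPS via NUT and page someone if it stays on battery too long.

Assumptions (not from the original story): the site runs Network UPS Tools,
the UPS is named "blue-ups" on localhost, and ALERT_URL is a hypothetical
paging webhook that accepts a POSTed message body.
"""
import subprocess
import time
import urllib.request

UPS_NAME = "blue-ups@localhost"                 # hypothetical NUT UPS name
ALERT_URL = "https://alerts.example.com/page"   # hypothetical webhook
MAX_ON_BATTERY_SECS = 300                       # page after 5 minutes on battery

def ups_status() -> str:
    """Return the NUT status string, e.g. 'OL' (online) or 'OB DISCHRG' (on battery)."""
    out = subprocess.run(["upsc", UPS_NAME, "ups.status"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def page(message: str) -> None:
    """POST the alert text to the (hypothetical) paging endpoint."""
    req = urllib.request.Request(ALERT_URL, data=message.encode("utf-8"))
    urllib.request.urlopen(req, timeout=10)

def main() -> None:
    on_battery_since = None
    while True:
        status = ups_status()
        if "OB" in status.split():              # UPS reports it is on battery
            on_battery_since = on_battery_since or time.time()
            if time.time() - on_battery_since > MAX_ON_BATTERY_SECS:
                page(f"{UPS_NAME} on battery for over {MAX_ON_BATTERY_SECS}s: {status}")
        else:
            on_battery_since = None
        time.sleep(60)

if __name__ == "__main__":
    main()
```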
Admin
Never underestimate the power of a backhoe to ruin the best laid plans of mice and men.
Admin
TRWTF is that management didn't dig in their heels and say testing and training is a waste of time.
Admin
"The UPS's were beeping like crazy all day and into the evening, and the folks in the control room didn't say anything about it because (well, their explanation was somewhat incoherent). "
So these guys got fired and they all got work at Three Mile Island.
Admin
Ah yes, one place I worked used to record these as "JCB errors". I remember there were several in the time I was there.
Admin
Did the emergency lighting still work? It's “fun” when it turns out that that also depends on mains power…
Admin
In my case they had parked a big flat screen television in front of the UPS warning panel and couldn't hear it or see it.
Admin
I worked somewhere we had a different but somewhat similar situation with our data connections. This was one of those locations that had to be up 24/7/365. Data connections were also important, though not quite as important as power.
Plans were made. The new data center had all of the power redundancy protections in place, the specialized cooling systems, even a false roof to protect it from the otherwise non-removable sprinkler system that was part of the building (so that it would divert the water if the building's sprinklers ever went off). The new fiber lines came into the room near the same point, but exited the room and the building on different sides, going down different streets to different central offices, so that no digging in the wrong place could mess us up.
Unfortunately, what we didn't know was that both of those COs routed all their traffic through the same carrier data center nearly 40 miles away, and apparently the carrier hadn't done great testing on their own backup infrastructure. A huge power failure hit, and none of their generators came on line. We ended up with two fully functional data connections and no place to route their traffic. Big telecom companies were TRWTF that day.
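A hidden convergence like that is hard to see on a network diagram but sometimes easy to see in the actual hops. Here is a rough sketch of a path-diversity check, assuming Linux traceroute and two circuits reachable via separate interfaces; the destination host and interface names are hypothetical, and matching hop IPs only hint at shared infrastructure (identical layer-3 hops can hide shared conduit, and diverse hops can still share a trench).

```
#!/usr/bin/env python3
"""Compare the hops two 'redundant' circuits actually take to the same peer.

Assumes Linux traceroute and that each circuit is reached via its own
interface; DEST and the interface names below are hypothetical.
"""
import re
import subprocess

DEST = "peer.example.net"                              # hypothetical far end
CIRCUITS = {"circuit-A": "eth1", "circuit-B": "eth2"}  # hypothetical interfaces

def hops(interface: str) -> list[str]:
    """Return the hop IPs traceroute reports when forced out one interface."""
    out = subprocess.run(["traceroute", "-n", "-i", interface, DEST],
                         capture_output=True, text=True, check=True).stdout
    return re.findall(r"^\s*\d+\s+(\d+\.\d+\.\d+\.\d+)", out, flags=re.M)

paths = {name: hops(iface) for name, iface in CIRCUITS.items()}
# Ignore the first hop (our own router) and the last (the destination itself).
shared = set(paths["circuit-A"][1:-1]) & set(paths["circuit-B"][1:-1])

if shared:
    print("WARNING: supposedly diverse circuits share hops:", sorted(shared))
else:
    print("No shared intermediate hops seen (which still proves nothing about layer 1).")
```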
Admin
TRWTF was that this WTF was discovered before the site went live, and not two years later.
Admin
Similar story: a trading firm buys dual fiber circuits into New York so that they won't get cut off from the markets. A backhoe in New Jersey takes out both since the dual circuits were both running through the same trench.
Admin
They owed that backhoe a large load of thanks-- it helped them figure out the flaws before they became critical.
Admin
I once worked for a place that was really proud of their two data centers, for backup and failover. In the same flood plain. With redundant power vendors with substations in the same flood plain.
Guess what happened one spring?
Admin
I remember reading about the power supply for a hospital. There was battery backup for long enough for dual generators to start up, and the generators could provide enough power to run all essential lighting and equipment while the fuel lasted. The system was commissioned, tested and worked without fault over a long period. Then one day, after the hospital was fully operational (no pun intended), the mains power failed. The battery backup kicked in, the generators started and all was well... for a while. Then the generators stopped, and the batteries were already exhausted. Fortunately, the mains power was restored quite quickly. It took quite a while to work out the cause of the problem: the pumps that fed fuel from the main tanks to the smaller reservoirs alongside the generators were wired into the mains, when they should, of course, have been supplied from the generators.
Admin
Wasn't sure what a "backhoe" was and misread it as "blackhoe" and Google it on image search at work. My advice to you, don't do it.
P.S.: I'm glad "he were able" to figure it out. Might want to proof-read the grammar in your articles, just sayin'
Admin
*Googled (and yes, I might want to proof-read the grammar in my comments before posting them, I know...)
Admin
Guys, what's with the 24/7/365? I understand 24 hours a day. I understand 7 days a week. I'm having trouble with the 365 weeks a year. Either write 24/7/52 or 24/365. </pet peeve>
Admin
Simpler: uptime needs to be 1/0. Downtime naturally 0/1.
Admin
Shouldn't that be 365 weeks/month?
Admin
The trouble is that there are more than 52 weeks a year. So either one or two days a year of a 24/7/52 regime can be downtime ...
Admin
So the lesson between the article and all of the comments: There is always a single point of failure. Even if that point is a nuclear warhead going off in orbit that EMPs all of your sites, there is always a single point of failure.
Admin
http://deredactie.be/cm/vrtnieuws.english/News/1.2355065
TL;DR: when the mains power went out, a UPS and a diesel generator set kicked in. A miswiring in the generator set not only caused the critical machines to fall silent, but silenced them for good: huge power spikes left the machines broken....
Admin
Are we supposed to assume then that in leap years there can be a day of downtime?
Admin
Now now now... it's not called a backhoe. Everyone knows it's a "hydraulic cable finder".
Admin
24/7/365.2425, really...
Admin
So your manager got to talk with your ISP about the penalty clause of their up-time guarantee.
Admin
This is what is called an unintended single point of failure, when all redundant systems can be taken out in a single action. Thankfully it was identified prior to going live. BTW, grounding 2 huge power feeds through your excavation equipment is not recommended. Not good for the backhoe and not good for the power supply equipment.
Admin
If you want to get picky, it is closer to 365.2422 days/year. We will need to correct our algorithm for leap year placement in about 3000 years.
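For anyone following the calendar pedantry in this subthread, here is the arithmetic behind those two numbers as a quick sketch:

```
# Average Gregorian calendar year: 97 leap days every 400 years.
gregorian_year = 365 + 97 / 400        # = 365.2425 days
tropical_year = 365.2422               # mean tropical year, approximate

drift_per_year = gregorian_year - tropical_year      # ~0.0003 days per year
years_per_day_of_drift = 1 / drift_per_year          # roughly 3300 years

print(f"Gregorian average year: {gregorian_year} days")
print(f"Calendar gains ~1 day on the sun every {years_per_day_of_drift:.0f} years")

# And for the 24/7/52 crowd: 52 weeks is only 364 days,
# so a '24/7/52' promise leaves a day or two a year unaccounted for.
```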
Admin
I once worked at a healthcare organization that was (obviously) very concerned about downtime. Like basically every EMR of the era, ours used a client/server model. Now, the actual EMR servers were on redundant and backup power supplies, etc., as you would expect. However, it was obvious that nobody had ever actually tested the system. At the first power outage, we had the sickening realization that, while the EMR servers were all up and humming along great, the other set of servers, the ones hosting the app virtualization layer that delivered the EMR client, were not on any sort of redundant power. So while technically the EMR server was "up", nobody had a client to actually access it with.
That whole comedy of errors continued as they redid the data center to put both the EMR and XenApp servers on the same redundant circuit. They got points for remembering to include the climate controls of both server rooms, so we didn't have to worry about the servers roasting themselves alive. However, someone really should have done the math for the total current draw of all devices now on this circuit compared to the current rating of the circuit itself.
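That last bit of math is worth spelling out, because it is exactly the kind that gets skipped. A back-of-the-envelope sketch with entirely made-up equipment numbers, using the common rule of thumb that continuous load should stay under roughly 80% of the breaker rating:

```
# Hypothetical inventory for the newly consolidated circuit:
# (device, count, worst-case draw in amps).
devices = [
    ("EMR database server", 4, 3.5),
    ("XenApp host",         8, 2.8),
    ("SAN shelf",           2, 4.0),
    ("Core switch",         2, 1.5),
    ("Server-room CRAC",    2, 12.0),
]

CIRCUIT_RATING_AMPS = 60                        # hypothetical breaker rating
CONTINUOUS_LIMIT = 0.8 * CIRCUIT_RATING_AMPS    # common 80% rule of thumb

total = sum(count * amps for _, count, amps in devices)
print(f"Total draw: {total:.1f} A; continuous limit: {CONTINUOUS_LIMIT:.1f} A")
if total > CONTINUOUS_LIMIT:
    print("Over budget: this circuit trips (or cooks) exactly when everything is on it.")
```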
Admin
Back in 2007, when Internet2 was new (and still newsworthy), it was knocked offline when a homeless man lit a mattress on fire underneath the Longfellow Bridge between Boston and Cambridge, MA, and melted the fiber cables.
Admin
I dig this story.
Admin
At work we have a data center with a big UPS, a big generator, a big (and full, and often refreshed) fuel tank, the works...
It always passed all tests, both the two-yearly end-to-end test where they cut the mains (they usually found a few places where the startup current in some offices was high enough to trip the breaker, but never anything major or unexpected) and the monthly generator test (which always passed flawlessly).
That is, until there was an actual outage; the generator wouldn't start... As it turns out, the starter battery for the generator had gone bad since the last end-to-end test, but combined with its charger there was just enough juice to get the generator to start during the monthly tests... Of course, without mains there was no charger to boost the plastic cube containing lead and acid into something resembling a functional starter battery, and therefore no running generator...
Admin
I think the generator-started-from-its-battery-charger story might have been done here before.
Admin
When we were getting ready to install our servers in a new data center, they proudly showed us their redundant systems. They had room sized Caterpillar diesel generators and a battery backup room that looked like it belonged in a diesel/electric sub.
When we set up our servers, they asked why our racks had internal UPSes, given the data center backups. We always wanted at least some of our redundancy to be based on something we controlled.
Sure enough, a few years later, the data center lost power, the generators didn't start and the battery backups caught fire. Our racks were the only ones unaffected, due entirely to our on-board UPS systems.
Admin
UPS in the loft in case of flooding. So if there's a flood, bugger the expensive mainframe and other equipment, just so long as our UPS is safe.
These fictional WTFs are terrible.
Admin
The real WTF is the post image. That has to be the worst-designed sign ever:
This just screams "the person who created this has never heard of the Gestalt principles."
Even on the website the image is taken from, most of the Buried Cable signs are better, e.g. https://cdn.compliancesigns.com/media/osha-safety/300/OSHA-No-Digging-Sign-OCEP-14042_300.gif , https://cdn.compliancesigns.com/media/osha-safety/300/OSHA-No-Digging-Sign-ODEP-14046_300.gif , https://cdn.compliancesigns.com/media/osha-safety/300/OSHA-No-Digging-Sign-OCEP-14050_300.gif