Admin
I was running the final tests on brand-new survey software for a large hotel franchise here in Mexico. I was 21; it was my first job and my first assignment. The system had to save some information, date and time included, into a SQL database.
Since all my tests passed, we deployed nationwide at the beginning of February or so. The system seemed to work fine until February 13th. For "no reason", the application crashed whenever it was supposed to store data. But when I tried to reproduce the error on my development machine, everything worked just like the day before. When I analyzed the data, I realized that all the hotels were saving data not daily, but monthly. The reason? Date format. My development server had SQL Server installed in English, but the production server had a Spanish installation. The difference, apart from the language, was that the English default date format is mm-dd-yy, but the Spanish one is dd-mm-yy. So I was saving January 2nd, February 2nd, etc. instead of February 1st, February 2nd... and of course, there is no 13th month.
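To make the ambiguity concrete, here is a minimal Python sketch (not the original survey code, which isn't shown). The sample strings and the two-digit year are made up for illustration, and `reads_as_day_first` is a hypothetical helper. The same text parses to two different dates depending on which field order the reader assumes, and nothing fails until the day number goes past 12.

```python
from datetime import datetime

# Hypothetical value: the writer meant February 1st, formatted as mm-dd-yy.
stored = "02-01-07"

as_month_first = datetime.strptime(stored, "%m-%d-%y")  # reader assumes mm-dd-yy
as_day_first = datetime.strptime(stored, "%d-%m-%y")    # reader assumes dd-mm-yy

print(as_month_first.date())  # 2007-02-01
print(as_day_first.date())    # 2007-01-02


def reads_as_day_first(text):
    """Return True if text is a valid date when read as dd-mm-yy."""
    try:
        datetime.strptime(text, "%d-%m-%y")
        return True
    except ValueError:
        return False


# February 13th written month-first has no day-first reading: month 13 does not exist.
print(reads_as_day_first("02-13-07"))  # False
```

Passing real date values (or an unambiguous format) to the database, rather than locale-dependent strings, avoids depending on the server's language setting.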
The real WTF was that I had to phone hotel after hotel and explain how to solve the problem step by step. Many hotels had no IT department, so I had to explain to people with very little training (some of them didn't even know where the Start button was) how to unregister a DLL, download a new one and re-register it.
The good thing is that I learned a lot about date formats, testing and deployment. But I still get chills when the phone rings.
As you probably noticed, I'm from Mexico and not a native English speaker. Sorry for my mistakes.
Admin
No, your boss obviously found a way to optimize storage. Obcheesezation.
Admin
Shouldn't the watchdog be sending "ok, ok, ok, ok... Oy! Not ok! Alert! Alert! Panic! Panic!", so that when "ok, ok..." is not received, the receiving end goes into "alert! panic!" mode by default? At least that's how I would do it... but what do I know? I am NOT an EXTREMELY well paid consultant ;-)
Admin
So, you're the Captain Queeg of software development, eh?
Admin
I had a similar "problem" back in the day when I was a junior programmer:
I was working on a module that processed updates for all the customers of a big health organization (millions of them). Most of the time the updates were minimal, but sometimes an update could tell the system whether or not a patient was entitled to medical attention.
The code in this module looked like this:
package.processTasks(); package.processUpdates(); package.finishTasks();
I was working on the Tasks module, and the processUpdates method was the biggest and most time-consuming one. So I decided to comment out the processUpdates call so I could debug and test the Tasks module faster.
I forgot to uncomment the line and committed the changes.
The next day I received a call:
"There is something wrong with the software. Someone almost died yesterday because his profile wasn't updated, and the system told us that the person in question couldn't receive medical attention. Thank God he had the receipts that showed us he was entitled to it."
That's right, I forgot to uncomment one single line... and someone almost died.
Now I'm against shipping software without extensive QA.
Admin
My worst production failure:
I once wrote a custom installer/uninstaller for Win95. We installed our app to Program Files\Company Name\Product Name. On uninstall, we attempted to delete both the Company Name and Product Name folders.
Well, one enterprising user decided to customize the install directory to C:\Product Name. When our uninstall code tried to delete the parent folder of Product Name we ... well ... wiped his entire hard drive.
Lesson learned: remove directory IFF empty
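A minimal Python sketch of that lesson (not the original Win95 uninstaller, whose code isn't shown). `remove_if_empty` is a hypothetical helper, and the demo folders live in a throwaway temp directory. It leans on the fact that `os.rmdir` refuses to remove a directory that still has anything in it.

```python
import os
import tempfile


def remove_if_empty(path):
    """Remove path only if it is an empty directory; return True if it was removed."""
    try:
        os.rmdir(path)  # raises OSError when the directory is not empty (or is missing)
        return True
    except OSError:
        return False


# Demo layout in a temp directory: Company Name holds our product plus another one.
root = tempfile.mkdtemp()
company = os.path.join(root, "Company Name")
product = os.path.join(company, "Product Name")
other = os.path.join(company, "Other Product")
os.makedirs(product)
os.makedirs(other)

removed_product = remove_if_empty(product)  # True: our own folder was empty
removed_company = remove_if_empty(company)  # False: Other Product is still inside

# Clean up the demo folders (each one is empty by the time it is removed).
os.rmdir(other)
os.rmdir(company)
os.rmdir(root)
```

With this rule, an install path of C:\Product Name would at worst leave the parent folder behind, since the parent (the drive root) is never empty.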
And on a final note -- the user got a brand new computer from us out of the deal.
CAPTCHA: stinky
Admin
That's what commit is for. If you expect to update one row and it says 6000, roll back.
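A rough Python sketch of that habit, using the standard `sqlite3` module as a stand-in for whatever database is actually involved. The `customers` table and the `delete_expecting` helper are made up for illustration. The idea is to compare the reported row count with what you expected before committing.

```python
import sqlite3


def delete_expecting(conn, sql, params, expected_rows):
    """Run a DELETE; commit only if it touched exactly expected_rows rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()  # surprise row count: undo and let a human look at it
        return False
    conn.commit()
    return True


# Hypothetical table with one "bad" row and two "ok" rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO customers (status) VALUES (?)",
                 [("bad",), ("ok",), ("ok",)])
conn.commit()

# Typo in the filter: we meant 'bad' (1 row) but wrote 'ok' (2 rows).
committed = delete_expecting(
    conn, "DELETE FROM customers WHERE status = ?", ("ok",), expected_rows=1)
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(committed, remaining)  # False 3
```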
Admin
Your English is fine. But, the real WTF is the USA date format.
Admin
What about making the watchdog part of prod-only builds, or allowing it to be disabled on debug builds?
Admin
Funny, I always do my deletes like this:
delete from table where stuff = bad;
1 row updated
commit;
Admin
Well, I forgot to mention: I did write some functions to manage date formats. The functions' output format was mm-dd-yy for some reason (I can't remember why), and that's how I was entering data into SQL. It worked fine on the English installation, but not on the Spanish one. The real WTF should be: why did they give such a project to me ALONE when I was so inexperienced?
Admin
I actually installed gas turbines, and can tell you that just ONE turbine can take out half of a steel mill with a hiccup like this. Not just flaming metal, but:
(300psi of Natural Gas) + (Flame) = Hiroshima
Admin
Managing dates doesn't require experience... it will always be a pain in the ass.
Captcha: kungfu, the skill that everyone needs to work with dates!!!!!
Admin
I have a reverse-steel mill disaster story. A continuous casting machine in a local mill was severely damaged because of a setup error by the supervisor (or so claimed all the union workers). To save face he had to blame the control software, so I was hired on a short contract to rewrite some of the graphics routines used by the control system. The status display had nothing to do with machine control, it was even on a different computer. But he got to cover his ass and I got to be the Big Bucks Consultant for a few weeks.
It was a fun job: every time I ran a test, several football fields full of huge equipment woke up and got busy. No molten steel was involved, but when you're in the control room by yourself with all that under your control, you can't help but let out a little Mad Scientist cackle now and then.
Admin
So this is that "undefined behavior" I keep hearing about in C++.
Admin
It sounds like this application ran on a PLC instead of a computer. This means that the application was probably written in one of the IEC 61131-3 programming languages, such as Structured Text or (gulp) Ladder Logic.
On the one hand, a very large percentage of the types of applications which can wreck millions of dollars of equipment, or kill someone, run on PLCs.
On the other hand, you would be appalled at how primitive some of the software design and development environments are for PLC programming.
I could easily see how something like this could happen.
Admin
Occasionally, during testing at remote sites, we would just unplug the Ethernet cable on one of the HIMs to screw around with the QA team. We were all going to be there anyway, so we didn't care about the delays. We'd all just scratch our heads and end up blaming it on one of the QA guys, who always seemed to be in a rush. We'd all have a few laughs. Including the QA guys. Good times.
Admin
Word. In college they actually said this out loud: parts must be made with the lowest acceptable quality.
Admin
(grinning, ducking, and running)
Admin
Lesson: Always use &&
Admin
This was before my time, but a company I worked for supported warehouse control software for a large perfume distributor.
Ninety bottles of perfume made up a box and nine boxes made up a crate. Each bottle retailed for about $50.
The warehouse was completely automated, and loading and unloading stock was performed by loaders without any manual intervention.
In fact, everything relied on the stock control system knowing which slots in the warehouse were free and which were occupied.
I don't remember the reason now, but the stock control database needed to be restored from backup. But instead of the end-of-day backup, a backup from the previous week was restored...
Admin
And to put it another way, would you trust your life and other people's lives with control software running on a Windows based PC?
But I sometimes do see first-hand what you are hinting at (vintage wtf worthy ladder-logic rungs).
Admin
No, the real WTF is that the production system will turn on without the watchdog enabled. On the systems I’ve worked on that had a watchdog timer, disabling it requires both turning it off in software and setting a hardware jumper. Otherwise either the startup tests will fail because the watchdog doesn't generate a reset signal, or the hardware won't let the mechanical parts of the system start up because the watchdog hasn't started checking in.
There isn’t much point in having a failsafe if you don’t bother to test it.
Admin
I've only been a professional programmer for 5 months. No big snafus for me yet. Yay.
Admin
Ah, you have never worked in the manufacturing arena - for a high-cycle machine like injection molders every second does matter. If you can shave off even a few seconds that can add several thousand more products made each day.
You can really say the same things about cars - why make any car go faster than 20mph - what is wrong with taking some extra time to get across town. The reason is that there is an advantage to getting somewhere faster.
Admin
Yea, that's what I do...I like to scroll through the select and make sure that what I see is what I want...I'm paranoid though, and I'm always afraid I've made some glaring SQL error if I don't double check it.
Admin
When the hardware really has to run, and the software really cannot be permitted to fail, then your first choice should probably be a PLC. Everything else is just a PC.
However, a lot of PLC software is written by engineers, not programmers. PLC software design is chock full of traps for the unwary. The applications are massively concurrent. They are basically Dijkstra Guarded Command Language programs on steroids. I have actually had a great deal of success modeling them using UML diagrams, and checking designs with the SPIN model checker, instead of fixing the race conditions and deadlocks with timers.....
Admin
Well, we had less than 1M of space for the actual logic code. You can't go around checking every variable, because then you would have no room to actually do anything. The variables we did check were the ones considered far more dangerous. The flying ejector plate was fixed by reinforcing the end stops.
Admin
On one of the larger projects I worked on, one of the smaller teams was in another city. They had a guy on their team that everyone down there loved and everyone up here hated.
About midway through the product cycle, he checked some code in, using an API a buddy of mine wrote. This guy then asked my buddy about a problem he was having. My buddy wasted an entire day trying to figure out what the problem was. Based on the code that was checked in, an assert should have been firing. But it wasn't. My buddy finally figured out, that this dude had TURNED OFF the assert system for the entire product. Because his code was asserting, and he couldn't figure out why. And the rule was you couldn't check code in that was asserting.
A few months later, another friend of mine was fuming. He just spent several weeks fixing up code. Turns out someone turned off the memory leak detector for the entire system. My friend spent two weeks fixing up all the memory leaks that had been introduced after the detector was turned off. Yeah you guessed it, the same dude turned off the leak detection, when he created a memory leak and couldn't figure out how to fix it.
Thankfully these problems were caught before we shipped. But unfortunately both were caught by accident.
Admin
Disabling in debug can be a bad idea if your testing is 98% on debug and 2% on release (admittedly, such a testing profile is a bigger WTF). And I guess I just wouldn't like side-effects like that as part of the difference between debug & non-debug, but I don't feel too strongly about it.
Having a method to disable in debug is the way we actually do it.
Admin
(anybody else here in the under-256k-club?)
=) Everything's relative. But needless to say I'll reinforce your point: our run-time error checking is probably 1/4th yours.
Admin
That you know of. ;)
Some say the first one is the worst. That first time you look in the code and you see exactly what you did wrong. That there's nobody else responsible. It's all you. Ah, that email to production, "Priority: Stop all work on model 123," just before running down there to make sure they got the message. You'll never forget your first.
And sometimes, late at night, even years later, you'll wake up wondering about how you could have done such a damn stupid thing.
I'd say the second time is the worst. Anyone can make one mistake. Sure, two mistakes are normal too. Over the course of your career, you're sure to make a couple of doozies. At least that's what you'll tell yourself.
But you start to wonder...
"Is it...me? Am I the fuckup? Do I know what I'm doing, or am I just wearing the juice?"
Then you realize that almost everybody is a fuckup, just in different ways.
Everybody.
Admin
Hah, I'm in that club. Believe it or not, you're still getting new members. PIC 10F202
750 BYTES of Flash 24 BYTES of RAM
That's not k.
Admin
Hmm... considering your DELETE without a WHERE... Sometimes I just love Oracle:
...and the table is just the way it was just one minute ago :)
Admin
bramster owned up and took responsibility for his mistake. This is utterly unlike the standard practice in the industry..... and must be aggressively stamped out before it becomes a trend!
State of the art is to blame the victim, and it's worked well for decades. Why change now?
Admin
It's the law in my country... you have to pay a percentage of your salary to receive medical attention. If you are unemployed, you still get medical attention... but it is not the kind of medical attention you want.
Sorry for my horrible grammar.
Admin
This is exactly the reason why we have professional Software Engineers in Alberta. You get your iron ring exactly the same way a civil engineer or a mechanical engineer or whatever would, and you are held to the same code of moral responsibilities; violate it, and you lose your license to practice engineering.
This guy would be in serious trouble if he was a Software Engineer, and all the people who let this software get into a production situation where lives are at risk without sufficient testing would also be in some big crap.
Admin
Not to mention a TWO-WAY communication? If nobodyListening then shutdown; -- alternatively an alarm with countdown to shutdown.
Admin
And the European one (DD-MM-YY). The only logical one is ISO: YYYY-MM-DD.
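One practical argument for the ISO order, sketched in Python with made-up dates: because the fields run from most to least significant, plain string sorting is already chronological, which is not true of mm-dd-yy.

```python
from datetime import date

# Hypothetical sample dates, deliberately out of order.
days = [date(2007, 2, 13), date(2006, 12, 31), date(2007, 1, 2)]

iso_text = [d.isoformat() for d in days]          # YYYY-MM-DD
us_text = [d.strftime("%m-%d-%y") for d in days]  # mm-dd-yy

print(sorted(iso_text))  # ['2006-12-31', '2007-01-02', '2007-02-13']
print(sorted(us_text))   # ['01-02-07', '02-13-07', '12-31-06']
```

The second list puts December 2006 after February 2007, so anything that sorts or compares those strings gets the order wrong.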
Admin
How does an error in the tenths digit of longitude correspond to about 800m? 170.5 degrees differs from 170.0 degrees by up to about 55km (at the equator). Did you mean the tenths digit of the minutes, or something?
Admin
WTF?
Admin
The watchdog is the thing that has to listen for the "ok ok ok", and then kick something when it stops coming.
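A minimal Python sketch of that arrangement, not taken from any real controller. The `Watchdog` class, its method names and the timeout are all assumptions, and time is passed in explicitly so the behaviour is easy to follow. Silence is treated as failure: if no "ok" has ever arrived, or the last one is too old, the watchdog kicks.

```python
class Watchdog:
    """Listens for periodic "ok" heartbeats and fires on_timeout once they stop."""

    def __init__(self, timeout_s, on_timeout):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout  # the "kick": e.g. shut the machine down
        self.last_ok = None           # no heartbeat seen yet
        self.tripped = False

    def heartbeat(self, now):
        """Called by the monitored program each time it reports "ok"."""
        self.last_ok = now

    def check(self, now):
        """Return True while healthy; trip (once, and stay tripped) on silence."""
        silent = self.last_ok is None or (now - self.last_ok) > self.timeout_s
        if silent and not self.tripped:
            self.tripped = True
            self.on_timeout()
        return not self.tripped


alarms = []
dog = Watchdog(timeout_s=2.0, on_timeout=lambda: alarms.append("shutdown"))

dog.heartbeat(now=0.0)
healthy_early = dog.check(now=1.0)  # True: last "ok" was 1 second ago
healthy_late = dog.check(now=3.5)   # False: 3.5 seconds of silence, kick fires
dog.heartbeat(now=4.0)
healthy_after = dog.check(now=4.1)  # False: stays tripped until someone resets it
```

Keeping the trip latched is a design choice in this sketch: a late "ok" does not quietly bring things back to life.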
Admin
This pales in comparison to some of your stories.... but:
I accidentally deleted most of my department's website (a badly typed rm, which I ctrl-C'ed after I realized it was taking entirely too long), shortly after starting a web programming job at my university. No big deal, just restore from a backup, right? (The important and frequently changing data was safely unharmed in a database.) I went to the network admin and asked where the tape drive was. "What tape drive?" The backups? "What backups?"
The sweat started to pour... but when I calmed down I had an epiphany. The poor man's backup: GOOGLE CACHE to the rescue :)
Needless to say, we now have a backup system and policy, a dev system mirroring the production server, and source control.
Admin
My worst failure was renaming my popular site to some lame gibberish, and covering with an even lamer story about not wanting to offend my grandma. Seriously. I couldn't just make something up when I talked to my grandma, I had to rename the website.