Admin
Deletes and updates with no where clause are ok, but still only act on one table at a time. For real disaster, you need something bigger and more subtle.
The mysql command line client includes a history buffer which is saved whenever you exit it so you have a history of all commands over time. It also uses the standard readline commands, so the up arrow key retrieves the previous commands, but did you know that using the Home key (in some key mappings) retrieves the very first command from the history buffer?
So what happens when you set up your new database machine, and the first thing you do is copy and paste some DB setup commands which happen to include on the first line:
drop database xyz;
Then a few months later once the machine is all in production, you happen to be typing a command, and on your laptop keyboard Home is right near Enter, and you accidentally happen to clip the Home key as you're reaching for the Enter key? That's right, your current command is replaced by the first command in the history buffer and executed in an instant. Ouch.
Of course discovering that your nightly backups had been broken for 6 weeks doesn't help either.
The good news was that it was an InnoDB database, and with a bit of help from Heikki himself, we were able to retrieve everything. All the "drop database" did was delete the meta data files and the links to the "root pages" in the database, but all the data was still there.
He created a program for us that scanned the existing database files looking for "root pages" and dumped the raw binary data from them. By inspecting the dumps we could work out which table each one belonged to. He then created a special create table () syntax with a "rootpage=xxx" option that would recreate the metadata but hook it up directly to the key page in the InnoDB data file, and voila, the data was all back. Phew.
It seems the changes never went into the production version, just a test version he sent us. Pity, it might save some other people's skins sometime in the future as well...
Admin
Shit happens, and it doesn't come in teaspoons, it comes in truckloads, and a truck is always 'round the corner
:)
Admin
You still believe American factory workers make better quality clothes than Chinese ones and American programmers write better quality software than Indian engineers?
Admin
No kidding. I wish this were Digg so I could digg the parent up. At least no bug of mine has ever forced anyone to retranquilize a large carnivore.
Admin
This reminds me of a case where a co-worker of mine had commented out about 30 lines of code while waiting for the database to be deployed.
He then told another co-worker to remove the comments... and he did remove them... by deleting them. 2 weeks later the first guy asked where the code was. "Well, you wanted it removed, so I removed it..."
Admin
Yes I agree. Tigers > humans. There's frigging 6 billion of us and not that many tigers, so we certainly can afford to loose some humans.
Admin
Hehe yeah. The bottom line is that man is egoistic, hedonistic and as lazy as possible. This is also why a communist system doesn't work and a capitalist system sort of semi-works, by forcing us to get up off our asses and do something. What I mean by "semi-working" is that the great majority of people don't give two shits about the work they are doing. This applies not only to software development but to all areas of human activity. "As long as it pays the bills". I recently had a kitchen renovation, new floors installed and walls plastered. I had a shit job done on each of those tasks.
If you want anything done properly you first have to teach yourself to be a master in that particular field and then do it yourself. Or get a hobbyist friend to do it for you.
On that note one would think that open sores software would be far better than commercial software. After all, most open source software is created out of the joy and excitement of creating something. Just too bad that only takes you through the "fun" bits of a particular program. Then the "boring" bits get less attention, thus reducing the overall quality.
Meh
Admin
Yipes!
Worst thing I've done recently is this, in a big contract:
Block.unparent()
Block.ParenttoRoot()
Block.setparent(Block) //THIS should cause an infinite loop and a crash on the stack- but it didn't. Somehow.
which, for some reason, compiled and ran without issues. It had been in there 6 months before a major core upgrade caused a "crash-without-error" which took me weeks to sort out. I've never felt so embarrassed since.... well, I'm not going to say....
Admin
Been there, done EXACTLY that! On a MyISAM database. That took us down for a day because our hosting company were incompetent morons who fscked up the restore from backup. Fortunately we had our own "unofficial" backup.
Admin
Many years ago, I was writing the monitor/DA program for an Allen-Bradley PLC that operated the control rods on a nuclear reactor (a small one).
My team (of one, me) had to poll the PLC, read a 'big chunk o' data' (tm), check it, update the heartbeat on another machine, etc. One of the requirements (in writing) was that the new program had to be "at least as good" as the in-house mash-up they were using at the moment.
This in-house code sent separate requests for each sequential word of data to the PLC, hence was massively slow. My version read the whole block at once and was done in under a second.
It failed their QA testing because their in-house test team (who had incidentally developed the previous version) didn't believe I was reading the PLC. My program ran too quickly.
There being money at stake, we asked them to point out exactly what value was incorrect (i.e. to do their jobs). In the end they rather grudgingly admitted that the code was working, but decided that it must be 'hammering the PLC' and that wasn't acceptable.
It's worth pointing out here that the test team had full access to the structured, pure-block, 1-statement-per-line, 1-plus-comments-per-3-lines source code, and unlimited call on the developer and still didn't quite grasp that I was making one call to get all the data at once and not 1024 calls to get the same data word at a time.
To get paid, we slowed the monitor back down again. It still did the single, 1024 word read; it just waited 30 seconds before telling you.
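For anyone who wants to see the difference in round trips, here is a minimal Python sketch. Nothing in it talks to a real Allen-Bradley PLC: `FakePLC`, `read_word_at_a_time` and `read_block` are hypothetical names made up for illustration, and the fake device only counts requests instead of doing any I/O.

```python
class FakePLC:
    """Hypothetical stand-in for a PLC: counts requests, does no real I/O."""

    def __init__(self, words):
        self.words = list(words)
        self.requests = 0

    def read(self, start, count):
        """Return `count` words beginning at `start`; each call is one round trip."""
        self.requests += 1
        return self.words[start:start + count]


def read_word_at_a_time(plc, n):
    """The in-house approach: one request per sequential word."""
    return [plc.read(i, 1)[0] for i in range(n)]


def read_block(plc, n):
    """The replacement approach: one request for the whole block."""
    return plc.read(0, n)


slow_plc = FakePLC(range(1024))
fast_plc = FakePLC(range(1024))

slow_data = read_word_at_a_time(slow_plc, 1024)
fast_data = read_block(fast_plc, 1024)

# Same data either way; only the number of round trips differs.
print(slow_data == fast_data, slow_plc.requests, fast_plc.requests)
```

If each round trip costs a fixed amount of time on the wire, the request count is what dominates, which would explain why the block read looked suspiciously fast to the testers.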
Admin
Sounds like one of the previous WTF's on the site..... What was the title... "It must be wrong"
Admin
I would agree with you but for the fact that here in the USA, we "software engineers" are not the ones allowed to make decisions. Why would I place my license on the line because of the way every company wants to do business? Over here, if you wanted to keep the license you would be unemployed real fast. They would stop hiring engineers and simply hire developers, all while dropping the pay about 20%.
Admin
The one I programmed (a long, long time ago) had 1K bits of RAM, and I used 9K bytes of EPROM for code.
I was very happy when the Big Red Button worked first time, mostly because when we hit it that time, we actually needed it to work. (Idiot worker had just clambered onto a conveyor, right at the point when my software was about to start that conveyor and hurl its load into a lead smelter.)
Admin
So I landed at the Toronto airport at about 8am, ready to meet a client, and walked over to the taxi stand to get a ride to their facility. The cabbie didn't seem to understand, even though we were speaking the same language. He asked me to point out where to go on a map, but I'd never been there before. Finally I called back to my project manager and she read back the same address I had, but .... Ottawa?
I did manage to get to the client by noon and they were laughing so hard they didn't even mind me being so late.
Admin
Had a JMS system handling 200 million+ messages a day. The dequeuers would insert into a db. The disk on the db box ran out of space, which, for some unknown reason, went unnoticed for two weeks.
Having no access to production machines, we (the developers) only noticed when we got complaints that the queues were backing up on gateways, preventing users from sending messages into the system.
When the SLA says never lose a message, that is a huge disaster. Unfortunately, we had to drop the database and start over.
On the plus side, those queues lasted quite a long time with JBoss database persistence before causing problems...
Admin
Yes.
Admin
This was the exact cause of Brown Trouser Wednesday. I worked in a team of scientists generating information for a technical database. One Wednesday I came in, and nothing in the database was younger than 15 days old. Hmmmm. Call to support. "Oh, yes. We had to restore from backup last night". "Why has a fortnight's work disappeared?" "Errrr... dunno".
Investigation showed that 15 days previously one of the support team had been making modifications to the script which did the overnight backup. Yes, Virginia, he was editing the script in the live environment. So as to avoid having a tape mounted while testing, he had commented out the bit where the data was actually written to tape, and had forgotten to uncomment it again.
Admin
Actually, most PLC code is written by neither programmers nor engineers. Most of it's done by instrument technicians, operators, or the guy who lives in a van down by the river, though I doubt it was in this case. Unfortunately there's hardly any place to get good formal training on PLC programming, in the US at least, at the university level (you learn the hard way what not to do, if you learn at all). The testing tools that are available for most PC programming languages aren't available for PLC programming. Combine that with the "lowest bidder" mentality of manufacturing today, and you end up with stories like these.
I'm tempted to call shenanigans on this one though. I thought turbine control systems were like boiler control systems: they have to be FM approved before release. Also, if there was a PLC providing the process control (usually there are two, one dedicated to the turbine as a safety measure along with hardwired interlocks, and another that is responsible for the system control), they very rarely crash, and if they do it's usually a PLC problem, not a programming one. In 10+ years I've known of maybe 5 or 6 PLC crashes, none related to programming (which is the point of PLCs).
As an aside, never fix race conditions or deadlocks with timers. Use flags and/or latches (or I think some people call them signals). Less overhead, more versatile, less headache.
ex.
code block 1
{latch error}
turbine_trouble := turbine_trouble or turbine_malfunction;
code block 2
shutdown_turbine := (shutdown_turbine or turbine_trouble) and not turbine_shutdown_complete;
{reset latch from previous block}
turbine_trouble := false;
or in the first block you can say
turbine_trouble := (turbine_trouble or turbine_malfunction) and not turbine_shutdown_complete;
depending on how comfortable you are with boolean logic.
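For readers who don't speak PLC, the two blocks can be sketched roughly in Python as one scan cycle. This is only an illustration of the latch idea under my own assumptions: the `scan` function and the state dictionary are hypothetical, and a real PLC would evaluate this logic continuously against live inputs.

```python
def scan(state, turbine_malfunction, turbine_shutdown_complete):
    """Run one scan cycle of the two code blocks and return the new state."""
    # Code block 1: latch the error so a brief malfunction pulse is not missed.
    turbine_trouble = state["turbine_trouble"] or turbine_malfunction

    # Code block 2: keep requesting shutdown until it is reported complete.
    shutdown_turbine = (
        state["shutdown_turbine"] or turbine_trouble
    ) and not turbine_shutdown_complete

    # Reset the latch from the previous block now that it has been consumed.
    turbine_trouble = False

    return {"turbine_trouble": turbine_trouble, "shutdown_turbine": shutdown_turbine}


state = {"turbine_trouble": False, "shutdown_turbine": False}
state = scan(state, turbine_malfunction=True, turbine_shutdown_complete=False)
after_fault = state["shutdown_turbine"]      # shutdown requested

state = scan(state, turbine_malfunction=False, turbine_shutdown_complete=False)
after_pulse_gone = state["shutdown_turbine"]  # still requested, fault pulse is gone

state = scan(state, turbine_malfunction=False, turbine_shutdown_complete=True)
after_complete = state["shutdown_turbine"]    # released once shutdown completes
```

The point is that the shutdown request holds itself on through its own previous value, so nothing depends on a timer guessing how long the shutdown takes.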
If this story is true I can think of at least 2 people that should have been (and probably were) fired.
FWIW, I'm a chemical engineer.
Admin
-Harrow.
Admin
Pretty common to get rid of the WDT - they're a pain in the arse for debugging. Better to conditionally compile it, but who knows when you're gonna throw some debug code onto a live system?
Admin
I have a similar almost-disaster story. I was called in to consult at a large sheet-metal bending shop. It was a rogue operation, run by one guy and his wife. Apparently OSHA had never been there. Both of them were missing fingers. None of the equipment had interlocks, stop buttons, or safety shields.
The machine I was supposed to program was a HUGE numerically-controlled drill, with something like 32 drill bits on a rotary head, and a large movable platform to position the metal being drilled.
This platform was made of steel, about three inches thick, and about the size of two ping-pong tables. It weighed TONS.
The platform was on a pair of 30 horsepower hydraulic positioners that could slew the platform in X and Y at about 60 feet per second, then stop on a dime within 3 ten-thousandths of an inch. Cool stuff.
But there were no safety barriers around the platform, and no contact sensors, so if you happened to be standing in the way of the platform, that is, within its range of motion, you could be perfectly fine one second, then cut in half or decapitated by the moving platform a smidgen later.
I was there to consult on how to fix the platform driver code, as sometimes the feedback loop would become unstable and the platform would start shaking several inches back and forth, about twenty times a second. Impressive, seeing tons of steel shaking like that.
My first inclination was to run out of there screaming, but that wouldn't be "macho", plus then I wouldn't get my 1 hr initial consultation fee.
So instead I spent an hour making intelligent "Hmmm" noises while reading the manual, then I respectfully declined to have anything to do with this deathtrap!
Admin
My most embarrassing production mistake is as follows...
I was working as a senior developer on a medical office management application. In mid December one year the bosses decided they wanted to 'surprise' our clients with a special Christmas release. Over my objections to the time frame, I was given some ideas, with the only real constraint being that it had to be ready to ship by the 20th.
I made a number of superficial changes and decided to update the UI a bit. In my rush to get it deployed I failed to add a new dependency to the install.
For the first and no doubt last time, nearly all of our client base decided to promptly install the new version. By the end of Christmas day my voicemail was completely full.
Admin
We tried to loose some humans in Iraq, but that just resulted in civil war...
captcha: sanitarium - a home for politicians
Admin
That might be a substantial argument if it cited code, rather than output text.
-- Michael Wojcik
Admin
Man, these Neko Case fans are everywhere.
-- Michael Wojcik
Admin
"Why the bloody hell are my wolves out in the ocean??"
Clearly, those were badwolf.
Admin
Oh come on! Don't leave us hanging. How many days did it take for the smell to go away? How many crates lived?
Admin
Not a computer WTF, but related. I was working for a small telephone company and they started doing the billing for the 15000 subscribers (some bills two or three sheets). Quite a production, which turned the whole company upside down for three days. Everyone was told to go there and envelope bills. "But my job is to clean the toilet" "Envelope bills!" "But my job is System Administrator of the ISP" "Envelope bills". When half had been enveloped, one guy noticed the date was wrong. They had been issued for last month. Some people were penalized and the whole production restarted. This went into a couple of mail trucks. When it was being taken away, one truck was stolen with all the production in it.
Admin
I spent years running machines like that (decades ago). My boss used to get crabby when I took time to test the man-traps, e-stops and big red buttons. Sometimes they didn't work.
Admin
One of my CmpSci profs put this on the board (this was 27 years ago): FOR J = 12 ...
This was Fortran, and he said that the intent was for the loop to be done for J equal to 1 through 2. However, it is valid syntax to code it without the comma, and the result is that the loop is done once, just for the given value (twelve). That resulted in a satellite zooming out of orbit. Gone.
Admin
I cannot remember something REALLY embarrassing, but I've been told by a runaway rolling mill shooting 950°C steel plates at 40 m/s.
As far as I was told, the mill was run under complete manual control.
I can imagine how the dev must have felt. I am just glad I am not writing Level-1 software.
Admin
I meant: I have been TOLD about.
Captcha: sanitarium (wanna go there!!!!)
Admin
Step 1) Disconnect the machine power (maybe even remove the fuses? That guy doesn't seem stable.)
Step 2) Ask if the platform had always been that way, or if it suddenly started doing that one day. Mechanical stuff wears out, things get loose. There is no such thing as graceful decline... either an electronic control program works or it doesn't.
Step 3) Check for any issues a wrench could solve.
Step 4) No guarantees if it always acted that way. You're not the manufacturer after all.
Step 5) Only if nothing mechanical can be blamed, power back up and try lowering the proportional gain. If it still oscillates, put the original proportional back and then try increasing the integration time.
And so on.
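To show why lowering the proportional gain is worth trying first, here is a toy Python sketch of a proportional-only loop. It is nothing like the real hydraulic platform: the plant model (position moves by gain times error each step), the `simulate` function and the gain values are all assumptions picked purely for illustration.

```python
def simulate(gain, setpoint=1.0, steps=20):
    """Return the position error seen at each step of a proportional-only loop."""
    position = 0.0
    errors = []
    for _ in range(steps):
        error = setpoint - position
        errors.append(error)
        # Toy plant (assumption): the platform moves by gain * error per step.
        position += gain * error
    return errors


hot = simulate(gain=2.5)   # too much gain: overshoots further on every step
tame = simulate(gain=0.5)  # lower gain: error shrinks steadily

print("high gain, final error magnitude:", abs(hot[-1]))
print("low gain, final error magnitude:", abs(tame[-1]))
```

In this toy model each step multiplies the error by (1 - gain), so a gain above 2 makes the error flip sign and grow, which is the shaking, while a gain below 1 lets it die away without overshoot.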
Admin
The PIC architecture is an awesome WTF in itself, isn't it? Function call stacks a maximum of 8 calls deep; five kinds of memory, only two of which you can directly access, and one you can't access at all; RAM, I/O ports, and all but one register divided ad hoc into four switched memory banks... The memory map looks more like pitfall than any sensible arrangement.
Admin
In the late '80s, I was working for the phone company, on their IBM mainframes. I inadvertently brought down the entire database system (that handled all db work on this mainframe, which was the prod box for the company) not once, but twice, by demanding in a job before it was time to run it. D'oh!
Admin
OUT OF CHEESE ERROR!
Admin
But disabling in debug is no problem at all if you have a professional development environment with a testing and an acceptance stage between dev and production. No lives or multimillion dollar equipment depend on my code (I just build websites), but for our important customers, we do have such a testing/acceptance stage.
Even apart from that, when I check my code in, I always compare it to the current version in SVN, and make sure that all differences are intentional.
I'm amazed that it's really that easy to threaten lives in hospitals and steel mills with such simple mistakes.
Admin
yeah #warning TODO: is what I do...
or even better:
#if RELEASE
//code to skip if debugging
#endif
Admin
Consumer grade drives are like that, but then again the standard consumer grade computer only needs to last 3 years or so.
Server drives OTOH are made to spin up once and work for 10 years without failure. They cost more, but they do it.
Admin
That's great. Wasn't sure anyone actually used SPIN in the real world. Easiest course I had in grad school was called, "Software Engineering," but was really just a semester of using SPIN for various exercises.
The only shortcoming of SPIN is that there's no automated way of translating SPIN code into programming code. :)
Admin
If all the stickers have been mailed, how is it that there are still some available?