• Prime Mover (unregistered)

    A good article, the story behind which should be a mandatory part of a software engineering degree.

    And at the other end of the scale of avoidable cock-ups, they recently gave Chernobyl a rescreening on UK TV.

    Frightening, moving and salutary.

  • i be guest (unregistered)

    Hey, your Twitter button doesn't work... API rot or something?

  • (nodebb)

    Big yikes

  • Mary (surname redacted) (unregistered)

    I've worked on something like that. The hardware was carefully designed and thoroughly tested, then a contract programmer - me - wrote safety-critical code that was released untested. (Not medical.)

  • John Melville (unregistered)

    I am a physician who did a computer science degree before medical school. I frequently use the Therac-25 incident as an example of why we need dual experts who are trained in both fields. I must add two small points to this fantastic summary.

    1. The shadow of the Therac-25 is much longer than the number of people who remember it. In my opinion, this incident set medical informatics back 20 years. Throughout the 80s and 90s there was just a feeling in medicine that computers were dangerous, even if individual physicians didn't know why. This is why, when I was a resident in 2002-2006, we were still writing all of our orders and notes on paper. It wasn't until the US federal government slammed down the hammer in the mid-2000s and said "no payment unless you adopt electronic health records" that computers made real inroads into clinical medicine.

    2. The medical profession, and the government agencies that regulate it, are accustomed to risk and have systems to manage it. The problem is that classical medicine is tuned to "continuous risks." If the risk of 100 mg of aspirin is "1 risk unit" and the risk of 200 mg of aspirin is "2 risk units", then the risk of 150 mg of aspirin is very likely to be between 1 and 2, and it definitely won't be 1,000,000. The mechanisms we use to regulate medicine -- dosing trials, pharmacokinetic studies, and so forth -- are based on this assumption that both benefit and harm are continuous functions of prescribed dose, and that the physician's job is to find the sweet spot between them.

    When you let a computer handle a treatment you are exposed to a completely different kind of risk. Computers are inherently binary machines that we sometimes make simulate continuous functions. Because computers are binary, there is the potential for corner cases that expose erratic and, as this case shows, potentially fatal behavior. This is not new to computer science, but it is very foreign to medicine. Because of this, medicine has a built-in blind spot in evaluating computer technology.
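
    As it happens, one of the Therac-25's documented defects illustrates this discontinuity perfectly: a one-byte counter was incremented on every pass through a setup routine, and on every 256th pass it rolled over to zero, which the code took to mean "no safety check needed." A minimal Python sketch of the shape of that failure (the names and structure are illustrative, not the actual PDP-11 code):

        class3 = 0  # one byte on the real machine: values wrap at 256

        def set_up_test():
            # Intended: any non-zero value means "verify collimator position".
            global class3
            class3 = (class3 + 1) % 256  # single-byte increment wraps to 0

        def collimator_check():
            return "verify position" if class3 != 0 else "skip check"

        for _ in range(255):
            set_up_test()
        print(collimator_check())  # pass 255: "verify position" -- safe
        set_up_test()
        print(collimator_check())  # pass 256: counter wrapped, "skip check"

    Pass 255 and pass 256 are as close as two inputs can be, yet the behavior jumps from safe to unsafe; there is no continuous dose-response curve to reason about.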

  • pudin9 (unregistered) in reply to Prime Mover

    "the story behind which should be included in a mandatory in a software engineering degree"

    For us, it actually was included, along with Ariane 5, Mars Climate Orbiter and a couple more, presented at the start of the Software Development course, to make us aware of our responsibility as software engineers.

    I remember that I was curious and started reading more about Therac-25, and I was absolutely TERRIFIED afterwards. I learned that there is no excuse for skimping on proper testing.

  • Kildetoft (unregistered)

    Rather than just asking what obstacles to quality your process provides, you should ask the similar but even more important question: "What obstacles to poor quality does the process provide?" High quality should not just be a priority. The process should make it hard, or even impossible, to produce low quality.

  • MiserableOldGit (unregistered)

    I certainly commented on that original post with words to the effect that I hadn't heard of this incident, and was grateful for the read.

    But then I didn't do a software engineering degree; I did a more traditional branch of engineering and threw in some software modules to spice it up.

    Certainly in (civil) engineering we did look at failure quite a lot, and were encouraged to always be examining the causes of it. From the very start it was drummed into us that a lot more can be learned from examining a failure than a success. That's why I like this site.

    I might have commented that in twenty years in this sordid industry it has surprised me how unwilling anyone is to dissect failures and understand what actually went wrong, beyond the scruffy political blamestorming, 'natch.

    The Therac example is brilliant: hidden underneath the technical and systemic errors is a whole mess of human factors and assumptions that still plague us now.

  • Robin (unregistered)

    Thanks for this. It's genuinely the first time I've heard this story. (Although I changed career to become a developer in my 30s, and never studied software engineering at university.)

    Although a totally different field, it reminds me a lot of the fatal consequences of some horrific spaghetti code in Toyota cars that I read about a few years ago. There's a good overview here, for those who are unaware: https://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-%E2%80%9Cspaghetti%E2%80%9D-code

  • Dave (unregistered) in reply to Robin

    Er, that wasn't actually a software problem. It was 99% people hitting the wrong pedal, and 1% pedal entrapment. Toyota were fined for not admitting (and trying to conceal) that they had inadequate oversight of software, but that wasn't a causative factor.

    It's very hard to spot a rare problem which has identical symptoms to another problem that is a couple of orders of magnitude more common. But Toyota should have been able to prove it definitely wasn't a software issue once it was questioned.

  • (nodebb)

    Peter Neumann's comp.risks Usenet group has been around for longer than the WWW. It's a worthy companion to WTF.

    https://groups.google.com/g/comp.risks

  • (nodebb) in reply to Dave

    Yes, most people pushed the wrong pedal or the pedal got stuck. No, some people did experience unintended acceleration caused by software. People have to understand that in all modern cars, the accelerator pedal is a joystick which provides an input to the ECU (aka PCM), which actually controls the throttle. This underscores the dangers of systems like electronic steering and braking. Yes, computer control of the throttle provides benefits, but it also introduces risks. Keep that in mind when you're shopping for a self-driving car. I, for one, will never get into a vehicle which has no steering wheel and brake pedal mechanically connected to the steering rack and brake pads, respectively, so that if all else fails, I have a chance to steer and brake forcefully.

  • MiserableOldGit (unregistered) in reply to Mr. TA

    Unfortunately even good mechanical brakes may struggle to stop a vehicle stuck in gear with the engine revving. I've experienced that... the right thing to do (I suppose) would have been to take it out of drive, but I'm not entirely sure it would have let me. And that was a 2000 Lexus stuck in "high idle"; I wasn't battling a massive amount of software there, and probably only a few dozen horses.

  • Prime Mover (unregistered) in reply to MiserableOldGit

    I've just come out of another depressing meeting where the same person has bemoaned the same bug introduced into the configuration datafile by the same team who made the same mistake that they made the previous week that caused it to incur the same critical outage to the same plant in the same way at the same time as it did the previous time.

    My input was: "If we can't guarantee that the configuration team can be trusted to enter the data in the correct format, and that's a question for another time, can we add some software in place that will detect whether the data is in the incorrect format before we actually feed it into the application?"

    "Stupid idea," snapped back one of the senior engineers, "that just gives us more software to maintain."

    "Okay, so what's the procedure by which the configuration team enter the data into the configuration data file?"

    "Using Notepad, of course," he replied, with a du-uh in his voice. "You're not going to suggest that someone writes a program just so these oiks can enter a simple set of configuration data into a stupid text file properly, are you?"

    So now it's the job of the support team to inspect the configuration file by eye to make sure it's in the correct format.
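
    For what it's worth, the "more software to maintain" being rejected here is often only a few dozen lines. A minimal pre-flight validator sketch in Python, assuming a simple key=value file; the key names and rules are invented for illustration:

        import re
        import sys

        # Hypothetical format: one "key=value" pair per line, known keys only.
        KNOWN_KEYS = {
            "plant_id": re.compile(r"^\d+$"),
            "mode": re.compile(r"^(auto|manual)$"),
            "threshold": re.compile(r"^\d+$"),
        }

        def validate(path):
            errors = []
            with open(path) as f:
                for lineno, line in enumerate(f, 1):
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue  # allow blank lines and comments
                    key, sep, value = line.partition("=")
                    key, value = key.strip(), value.strip()
                    if not sep:
                        errors.append(f"line {lineno}: missing '='")
                    elif key not in KNOWN_KEYS:
                        errors.append(f"line {lineno}: unknown key {key!r}")
                    elif not KNOWN_KEYS[key].match(value):
                        errors.append(f"line {lineno}: bad value for {key}: {value!r}")
            return errors

        if __name__ == "__main__":
            problems = validate(sys.argv[1])
            for p in problems:
                print(p)
            sys.exit(1 if problems else 0)  # non-zero exit blocks the rollout

    Wired into the rollout step, the non-zero exit code stops a bad file before it ever reaches the plant, and the support team can stop eyeballing Notepad output.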

  • J.G.Harston (unregistered)

    I can't remember the exact details, but one problem implementation I read up on had a dial controller that you spun to get the correct power, but you could spin it both ways, and if you spun it downwards it would go 2 1 0 9 8 7, and there was no interlock so you could change the power while it was active, so shooting out 9 Zoobles when twiddling from 2 to 6 by going backwards.
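
    If so, the bug is plain modular arithmetic with no clamping and no interlock. A guess at the shape of it in Python, not the actual firmware:

        beam_active = True
        power = 2

        def turn_dial(clicks):
            # Wraps instead of clamping: turning down past 0 lands on 9, 8, 7...
            global power
            power = (power + clicks) % 10
            # No interlock: nothing refuses the change while beam_active is True.

        turn_dial(-5)  # operator "turns down" from 2
        print(power)   # 7 -- higher than before, with the beam still live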

  • J.G.Harston (unregistered) in reply to Prime Mover

    gawd! Number One Rule of data processing: validate EVERYTHING you are fed from outside; assume EVERYTHING out of your control is garbage.

  • Foo AKA Fooo (unregistered) in reply to John Melville

    ad 1.) Count yourself lucky! Part of the reason COVID control in Germany works badly is that infection reports often have to be exchanged by fucking FAX machines!

  • Ran And (unregistered)

    What were they thinking?

    This is exactly why college students exist and lab rats exist. Zap them, and bribe them off with a food ration. Find some "natural cause", like too much sun.

    If that's not good enough, do it out of the country, some place that deserves it. Or better yet, pull a Pierre Curie. Take all the new risk, and sell it as bubble gum, cigarettes, toothpaste, even parlor games at bars where people can put in a nickel and zap each other. Odds are, they'll die in a drunken car wreck on the ride home.

    That internal market article, I was thinking, it really doesn't take that much effort to go full on activist liberal. Smile, don't be a jerk, fill out a form, and take $482.30 instead of a flat $500, and look, you're one of them now.

  • Ian (unregistered)

    A couple of things: first, this incident is behind a lot of the arguments in favor of making software development a branch of engineering, subject to the same rigors and responsibilities as other fields of engineering. Civil engineers are legally responsible for the structures they sign off on, for example. Secondly: my father worked for AECL for quite a while, as a mechanical engineer in the reactor section. You need to understand that AECL is a crown corporation of the Canadian Government, making it one step removed from being a Government Ministry. Dad finally took early retirement when he received a poor performance review from a person he didn't even know. When he queried this, he was informed that this person was, in fact, his direct supervisor, and had been for the past year. They'd just neglected to inform my father of the change in management. That was, and probably still is, the way AECL did things. And it always has been a company with a huge number of executives, mostly political appointees.

  • MiserableOldGit (unregistered) in reply to Prime Mover
    So now it's the job of the support team to inspect the configuration file by eye to make sure it is in the correct format.

    If it's anything like places I've been, I bet there was a manual check in the process back in the dim and distant history, until someone threw it out as a waste of manpower.

    Couldn't you write them a nice regex .... >titter ye not!<

    I remember being sent years ago to a meeting about "the dots"... I was intrigued. This was call centre staff chucking in a full stop when a new client rang up and didn't know their own postal code (zip). The automatic lookup had been turned off since the bizniz had decided the post office was charging too much to use the address verification service. Everyone was to look up an unknown address in one of the big directories that got issued quarterly; these were littered around the office, so some were using them. Of course a small percentage of addresses aren't in there (very new ones, generally).

    So they'd set the field to require a length greater than 0... the userbase had responded by just shoving a full stop in the field. They responded by saying the length had to be greater than 1... the userbase responded by just shoving TWO full stops in the field.

    I am dragged into a meeting with about 9 people: analysts, supervisors, project managers. I start with "Can we just put a mask in so it has to be a UK postcode, and then somehow punish those entering made-up ones?" No... sometimes there is no postcode, we have to allow for that. "Fine, well we can add a checkbox, or accept the dots, and then just have somebody go through the missing ones at the beginning of each quarter and fix them. So that's a filter or a report or something." No, there's far too many; it's not just new addresses, you see, the data entry guys are taking advantage because they are in a rush. "I got that, there's no software solution for that though, how do I help?" ... this meeting is about whether to make the field a minimum of three characters.
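
    (For the record, the mask really is the easy bit. A rough sketch in Python, using a simplified postcode pattern rather than the full official grammar, plus the explicit "no postcode" escape hatch proposed above:)

        import re

        # Simplified UK postcode shape: outward code, space, inward code.
        # The real grammar has more exceptions; this at least kills the dots.
        POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$", re.IGNORECASE)

        def postcode_ok(value, no_postcode_checked=False):
            if no_postcode_checked:        # genuine "address has no postcode yet"
                return value.strip() == ""
            return bool(POSTCODE.match(value.strip()))

        print(postcode_ok("SW1A 1AA"))     # True
        print(postcode_ok("."))            # False -- the full-stop dodge
        print(postcode_ok("", True))       # True -- explicit checkbox path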

  • Best Of 2021 (unregistered)

    It's an easy mistake to make - I'm sure we've all written race condition errors, and that's with modern languages, not the tools available (particularly for hardware interfacing) in the 80s. The developer is not at fault here; what's at fault is the process, particularly around testing.

    Although, to be fair, this kind of 'only shows up when operated quickly by a trained operative' bug is quite hard to test for. It is the sort of mistake that could legitimately make it out of the lab and into the real world, even in critical systems. The trouble with trying to rigorously test a control system is that it interacts with a human operator, and humans are very difficult to predict.
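
    The Therac-25's data-entry race was a classic check-then-act bug: the setup task sampled the prescription once, and edits made during the several seconds of magnet setup went unseen. A minimal Python sketch of the shape of the bug, with threads standing in for the machine's concurrent tasks (the details are invented for illustration):

        import threading
        import time

        prescription = {"mode": "xray"}  # shared, unsynchronized state

        def setup_beam():
            mode = prescription["mode"]  # CHECK: sample the entry once
            time.sleep(0.5)              # magnet setup takes a while...
            # ACT: fires using the stale sample; a mid-setup edit is never re-read
            print(f"firing in {mode} mode; entry now says {prescription['mode']}")

        t = threading.Thread(target=setup_beam)
        t.start()
        prescription["mode"] = "electron"  # a fast operator edits during setup
        t.join()                           # prints: firing in xray mode; ...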

  • (nodebb) in reply to MiserableOldGit

    My electronic braking and steering point was not even specifically related to the unintended acceleration. Yes, a typical 200hp engine stuck at WOT cannot be stopped with typical brake pads. However, just in regular driving, imagine your electronic steering fails, or it gets hacked, or there's a hostile situation where you need physical control of your vehicle, be it accelerator, steering or braking, to preserve your security. Basically, mechanical brakes won't help against UA, but electronic brakes create yet another source of problems.

  • MiserableOldGit (unregistered) in reply to Mr. TA

    Oh I saw your point, I was just relating a surprising experience which it seems you're more than aware of!

    My engine had got stuck in high idle or limp mode or something. I could rev it to drive, but it dropped back and stuck on about 3.5k rpm... which is a bit of a problem in an automatic. I figured I'd just keep driving until I could get it sorted. And then a kid ran out in front of me, so I dropped the anchors. The ABS seemed to kick in, but the back wheels locked up and the front wheels were not fully stopping because of torque still in the drive shafts, and it wouldn't be that much from a 2.0L inline 6. Fortunately, the kid's dad managed to grab him out of my path as I was struggling to make the car swerve. He was not very happy with me, and I don't blame him. He questioned my decision to continue driving even though this fault had happened. In my (weak) defence I explained I had no idea of this consequence; you could say I hadn't thought things through properly (he did)... I'd say the designers of the vehicle are also guilty of that.

    In a way this is nothing to do with the article, but then it also is, as one of the problems in this Therac example is that no-one thought a few steps beyond "an error occurred, so we do this" to "what then actually happens next, in the real world?"

    And a year or two ago I'd have scoffed at your Ludditism over drive by wire controls for brakes and steering, "They've had them in aeroplanes for years" ... ahem!

    Actually, it doesn't even need a direct hack of the drive-by-wire systems: anything that can be indirectly compromised to mess up the power those PAS and PAB systems use is going to cause you a world of pain if you are actually driving at the time. A heavy electrical power drain could do it.

  • Vilx- (unregistered)

    This reminds me of The Phantom Duo: https://thedailywtf.com/articles/The-Phantom-Duo

  • SwineOne (unregistered) in reply to Ran And

    Whiskey Tango Foxtrot?

  • Denon (unregistered)

    For me the part of this story that's the most terrifying is that the manufacturers initially said it was impossible for the process to malfunction. Even today, how often are we told that some technological advance is perfectly safe and anyone who has concerns is being stupid?

  • BPFH (unregistered) in reply to Prime Mover

    "A good article, the story behind which should be included in a mandatory in a software engineering degree."

    In my case, it WAS included - roughly a decade after it happened (mid-to-late '90s).

    Today, I work on pharmacovigilance software, which has far, far less of a direct chance to cause death. We still have to have our software validated, though, which is a major pain in the a**. The Therac-25 is basically why.

  • You had me.. (unregistered)

    You had me, until that part about Racist Machines and Computers out for Our Jobs.

    Damn. Such a good, heart-touching, informative narrative -- and then SJW comes to attack.

  • Nick (unregistered) in reply to Best Of 2021

    Most UI-automation test systems would probably have triggered the race condition that happened when the UI was operated too quickly. After all, it’s easy to write a UI test that hits buttons as fast as possible.

    Unfortunately, the test engineer probably would have reacted by causing the test to perform data entry at a more “human” speed, rather than asking the developers to fix the problem.

  • (nodebb) in reply to Robin

    Of note is that both NASA and the NHTSA report that there was no software failure resulting in unintended acceleration.

    https://www.nhtsa.gov/sites/nhtsa.dot.gov/files/nhtsa-ua_report.pdf

    https://www.nhtsa.gov/staticfiles/nvs/pdf/NASA-UA_report.pdf

    One of the conclusions that the NHTSA reaches is that "we should be able to do this sort of investigation ourselves without having to involve NASA".

  • (nodebb)

    With such a (relatively) simple apparatus capable of presenting such a high risk of mortality to the patient, electro-mechanical (or even straight-up mechanical) interlocks should have been included from day one, and the cheapskates who didn't include even a single back-up system should be held accountable for their negligence. The aviation industry (whilst not completely innocent in this day and age) has had back-up mechanical components for critical flight controls, and more recently redundant backup systems, since the early years of flight.

  • Sigako (unregistered)

    Once I was asked by an allied research team to make our stimulation sequence able to trigger an infrared emitter - their hypothesis was that timed IR jolts at cannot-remember-where would improve learning. The API was pretty dumb; it could accept two bytes at best, so I racked my brain for a while about encoding all the modes and increments... Then I was informed the firmware was even dumber, capable of only a single mode and intensity no matter the input, with the rest of the API being an unresponsive leftover of a gutted off-the-shelf product. That was a disappointment, but it got rid of the problem, so I finished the code the next day and sent it to them.

    The next thing I heard about them is that they stupidly hard-coded too high an intensity into the emitter, and it was giving every participant a painful (but, happily, harmless) burn. To their credit, it was caught during testing, and the ethics committee immediately swooped in. Not to their credit is the fact that they couldn't fix it, or modify it in any way at all, despite allegedly making the damn thing themselves, and the whole project was scrapped.

  • Dave (unregistered) in reply to Mr. TA

    No, there was no software cause. That was thoroughly debunked, eventually.

    You're also completely wrong about how and where drive by wire is used. Most modern cars still have a throttle cable.

  • Robin (unregistered) in reply to Dave

    Sounds like you didn't read the link I provided. I have no knowledge of the matter beyond what I've read there and elsewhere, but it's clear from the expert witness in that particular case that the software was a mess and neglected important safety standards/procedures. Sure, the software bugs won't have caused unintended acceleration all that often (if they had, it would have been caught by Toyota's testing), but it was proven that, due to the lack of failsafes, the software defects would, in certain circumstances, cause UA. And that these circumstances, while rare and impossible to reliably reproduce, would be essentially guaranteed to happen given the number of cars sold and the number of miles driven by them.

    In addition, I understand that one thing that made that particular case special was that there was a massive skidmark on the road, proving that the victim was indeed braking and thereby disproving the speculative defence of "they must have pressed the accelerator when they thought they were braking".

    If you have additional information that came to light later, which somehow proves the software wasn't responsible, I'd be interested to hear it - but even then it's clear enough that the software was badly designed, to inadequate safety standards, with at least the potential to cause serious accidents. And that's why I was put in mind of it by this DailyWTF article.

  • MiserableOldGit (unregistered) in reply to Dave
    You're also completely wrong about how and where drive by wire is used. Most modern cars still have a throttle cable.

    I don't believe this is true, but it might depend on your definition of modern. In any case, the car being discussed does use potentiometers in place of a cable, and even some of the cable throttles have actuators that will override what your foot does.

    I agree with you that the software cause wasn't proven; it was not thoroughly debunked at all, though - the NASA report said it was possibly the cause of only a very small number of accidents. Doesn't excuse the shocking state of the software, which was proven.

    In the end, there's no evidence the Camry was experiencing uncommanded acceleration events more frequently than other vehicles, so it's a moot point.

  • (nodebb) in reply to Dave

    Read up on the cases. There was the lawsuit where the driver frantically braked using both the pedal and the parking (hand) brake. There was news of police assisting a driver whose car was accelerating by itself but not at WOT, which allowed him to stop by basically jamming the brakes. In a few cases, it was the ECU - software or hardware, impossible to know for sure, but irrelevant - not the driver's fault.

    I agree, most of the reported cases were floor mats getting rolled up and pushing the wrong pedal.

    Having said that, Toyota is definitely not the only culprit, despite the media storm. Jeeps have terrible ECUs, too: it was demonstrated they can be hacked trivially. In fact, despite these issues, Toyota makes some of the most reliable cars in every way. I love Toyota. Just like Boeing: despite the terrible management in developing the 737 MAX, overall it makes great planes. We're not bashing any particular company here.

    I would even go as far as to entertain the possibility of the big three (GM, Ford and Chrysler) colluding somehow with journalists at a few news outlets to drive anti-Toyota hype. This is pure speculation, but it makes sense. Here you have a company which (together with other Japanese car makers) offers American buyers reliable cars without exploding engines and transmissions. The big three and the auto unions hate, and I mean HATE, Japanese imports.

    Regarding throttle control: 99.9% of new cars use throttle by wire, and have done since about 2010, and some makes even before that.

  • (nodebb)

    https://www.picoauto.com/library/training/electronic-throttle-control-drive-by-wire-or-fly-by-wire

    https://en.m.wikipedia.org/wiki/Electronic_throttle_control

    Look at the references of the wiki article. These systems were designed decades ago. BMW was first, in 1988. That's 33 years ago. One can very confidently state that "virtually all modern cars use throttle by wire".

  • MiserableOldGit (unregistered) in reply to Mr. TA
    There was news of police assisting a driver whose car was accelerating by itself but not at WOT, which allowed him to stop by basically jamming the brakes.

    That one turns out to be dodgy under analysis: https://www.npr.org/sections/thetwo-way/2010/03/toyota_cast_doubts_on_james_si.html

    Doesn't mean there is not a problem. I love Toyota/Lexus too, but I recognise the story told about the shitty software; it shocks and surprises me, as does the Boeing story. And then it doesn't, because I work in this shitty industry, I've seen shitty code, and I know how everything just gets reused because it hasn't "yet" crashed and burned. And just like the example in this Therac story, there's an incredible amount of institutional inertia to overcome even when it does crash and burn.

    I'd take some convincing that the problem is not universal, even if the serious outcome is rare, but unless we see some sort of converging standard on black boxes for road vehicles we'll never really know.

  • Alon Altman (google) in reply to John Melville

    Computers did not create medical errors. Medical errors have existed since the dawn of medicine. Almost all medical errors have nothing to do with computers, but they are also non-continuous risks: e.g. a nurse misreading a unit and giving 10x or 100x the dose of a medicine; or bad handwriting or similar packaging causing the wrong drug to be dispensed; or incorrectly identifying a patient or mislabeling a limb to be treated.

    In fact, electronic records could in many cases help reduce some of these errors. A computer could detect that a drug is unlikely to be relevant to a given patient or condition and show a warning. It could also detect an unlikely dose. Printed and on-screen medical records reduce the chance of misreading a drug or patient name or a quantity. Barcode scanners and stickers can help verify that the correct samples, drugs, and patients are processed.
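
    The "unlikely dose" check in particular is a handful of lines that never gets tired. A sketch in Python with made-up limits, purely for illustration and certainly not clinical guidance:

        # Illustrative limits only -- not clinical guidance.
        MAX_SINGLE_DOSE_MG = {"aspirin": 1000, "warfarin": 10}

        def check_order(drug, dose_mg):
            limit = MAX_SINGLE_DOSE_MG.get(drug)
            if limit is None:
                return "warn: no reference range on file, review manually"
            if dose_mg > limit:
                # Catches the classic 10x/100x slip described above.
                return f"block: {dose_mg} mg exceeds {limit} mg single-dose limit"
            return "ok"

        print(check_order("warfarin", 100))  # block: 100 mg exceeds 10 mg ...
        print(check_order("aspirin", 500))   # ok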

  • Alon Altman (google) in reply to pudin9

    To be fair, testing is only one of the lessons of Therac-25. A more important lesson, echoed in the 737 MAX case, is to always have physical/mechanical interlocks for safety-critical systems when at all possible, and to have properly redundant systems otherwise.

    In the case of Therac-25, interlocks that existed in previous versions were removed.

    In the case of the 737 MAX, a new software system did not have a physical interlock and relied on a single input rather than the redundant inputs that were available.

  • David Mårtensson (unregistered) in reply to Prime Mover

    I have had something similar. It was not in any way critical or dangerous, but it involved a form with a date field, and "one" user who still failed to enter dates in a consistent manner.

    I learned then that you should strive to prevent any input errors you can; whatever extra work that takes is dwarfed by the time it takes to find and fix the resulting errors.

    This was quite early in my career. Later I found this story and others like it, and every time I find this story I read it again, to remind myself to never ever be lazy on input and UX, even if it's not critical.

    Because if I ever end up writing software for something critical, I want to be in the habit of always trying to build it as well and as easy to use as I can, with as many protections against user error as I can get.
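
    In that spirit, the cheapest fix for a free-text date field is to refuse anything that doesn't parse against one declared format. A minimal sketch in Python (the accepted format is an assumption; use whatever the form actually documents):

        from datetime import datetime

        ACCEPTED = "%Y-%m-%d"  # one documented format, e.g. 2021-03-14

        def parse_date_or_complain(text):
            try:
                return datetime.strptime(text.strip(), ACCEPTED).date()
            except ValueError:
                # Reject at entry time instead of discovering it downstream.
                raise ValueError(f"date must look like 2021-03-14, got {text!r}")

        print(parse_date_or_complain("2021-03-14"))  # 2021-03-14
        parse_date_or_complain("14/3/21")            # raises with a clear message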

  • (nodebb) in reply to Prime Mover

    "You're not going to suggest that someone writes a program just so these oiks can enter a simple set of configuration data into a stupid text file properly, are you?"

    If they can't do it right, then yes! That's what programs are for!

  • ZZartin (unregistered)

    Who wants to bet the decision to remove the mechanical safeties involved something about it being cheaper....

  • Mitch (unregistered)

    Thank you very much for this summary of the Therac-25 case! Fortunately, nowadays medical device software is subject to strong regulatory oversight by the FDA. Your comment on PROCESSES is so true. This is why a standard titled IEC 62304, on software life cycle processes, is recognized by the FDA. One key requirement of this standard is to define a software safety class: Class A, no injury possible from software failure; Class B, non-serious injury; Class C, serious injury or death. The second key requirement is to put in place a risk management process, which shall assume that the probability of software failure is equal to 100%. 100% means that it shall be assumed that latent bugs are present. That standard is surrounded by other technical reports and guidance documents recognized by the FDA, on software risk management, safety cases, and software validation. And I can tell you that the FDA is very picky when they review your software design and testing documentation - for the first version and for every design change. That's good news for all of us. An adverse event like the Therac-25 is very unlikely today.

  • I dunno LOL ¯\(°_o)/¯ (unregistered) in reply to MiserableOldGit

    Even a direct physical linkage is no guarantee of absolute control.

    I had a "high idle" problem in a former vehicle. It turned out that the rubber coating on the throttle cable had degraded over 10+ years and fouled the cable path such that it didn't fully spring back. A few times it also got stuck momentarily while in gear at low speed, and I had to hit the brakes and shift into neutral. The solution was to scrape off the gunk.

    Yes, it was much easier to repair than a software problem, but still had to be diagnosed, and I recall that I had taken it in to be looked at, and it still kept failing. Eventually I noticed that the throttle was at the top front of the engine where I could inspect it with no tools, and saw the gunk on the cable.

  • Aaron (unregistered) in reply to Mr. TA

    How about the case where the software is right, and you are wrong? Let's say you hit a patch of black ice, and the software is trying its best to navigate through it. The human panics and slams the brakes. Should the human be allowed to crash?

  • Codes McCode (unregistered) in reply to Prime Mover

    In my case, it was part of my computer science degree: A professor used it as a case study in my UX/UI class on how a poorly designed user interface could lead to user error which could lead to death/maiming -- I did not get on well with said professor, as his opinion was that if it could be triggered by a user, it must be user error. And that somehow (magically), a "proper" UI could solve the issue.

  • jay (unregistered)

    Especially when the tester is not a professional tester but someone who has been recruited ad hoc to do testing -- like when the people who will be using the software in production are asked to test a new release, or when we had no formal testing department so we just had to grab people -- they can have a totally wrong attitude. I've worked with testers who expect the software to be flawless, and if they find a bug, they say "I must have done something wrong" and repeat the test until it works -- thus, of course, ignoring the problem. Or they would just freak out when they found a problem. Literally, I've had people who were assigned to test software call me in a panic because they found a problem. I've taken to telling new or ad hoc testers, "Your job is not to prove that the software has no errors. Your job is to find the errors that are surely there."

    The best tester I ever worked with set himself a goal for every new release: he would find and report 100 bugs. Sometimes he had to stretch the definition of "bug", like calling a heading that was not properly centered a bug. But he always, always met his goal. I once visited a company that had an excellent testing department. One thing they did was give prizes to the tester who found the most bugs. They weren't huge prizes -- things like coupons for free pizza -- but it helped create the right mindset. Your goal is to find as many bugs as you can. It is not a problem if you find an error. It is a problem if you don't.

  • jay (unregistered)

    Companies often get the idea, "If our programmers would just do their jobs right, there wouldn't be any bugs, so testing should be superfluous." The problem with this, of course, is that no one can be perfect.

    As a programmer, I've had quite a few times that I thought I had thoroughly tested new code. And then it failed because a user did something that just never occurred to me to do.

    For example, a web site I designed failed because a user typed his entire address -- street, city, state, and zip code -- into the zip code field. And my validation function failed because that exceeded the maximum size for the field in a database query. Easy to fix, of course: just check the length. But it just never occurred to me when testing to try and type an entire address into the zip code field. Why did someone do that? What did they think the fields labeled "Address", "City", and "State" were for? But someone did.

    Another time I coded it so that you could get help by clicking on the "Help" button or by pressing the F10 key. Seemed straightforward enough. Worked great in testing. Literally our first customer pressed the F10 key and held it down so that it generated multiple key press events. So the first F10 took them to the help page, and then on the help page we got another F10. I had never considered the possibility that someone would press the Help key when already on the help page, and the program failed.
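
    The moral generalizes: a handler has to be safe to re-invoke on the very page it leads to, because key auto-repeat will do exactly that. A tiny sketch of the guard in Python (the handler names are hypothetical):

        current_page = "main"

        def show_help_page():
            print("showing help")

        def on_f10_pressed():
            global current_page
            # Guard: Help while already on the help page is a no-op, so
            # auto-repeated key events can't push us somewhere undefined.
            if current_page == "help":
                return
            current_page = "help"
            show_help_page()

        for _ in range(5):     # a held-down F10 generates repeated events
            on_f10_pressed()   # prints "showing help" exactly once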

    I don't think a mistake on my part has ever killed or injured anyone -- thankfully. I used to work on systems for aircraft maintenance where I was concerned that a software bug could result in a plane crashing and killing people. These days I build web sites for hotels, so if I screw up the worst that is likely to happen is that someone's vacation is ruined.
