Admin
How had this not been picked up before? Also Frist
Admin
nobody uses repeat-until in the age of for_each, for_each_n and PERFORM VARYING
Admin
A bug fix that never got merged. Wow. /s
Admin
I don't believe that this qualifies as a Heisenbug. It has a well-defined prerequisite (Debug level less than 4) that explicitly affects the code path, so it is just a matter of eventually tracking it down. Especially when the defect was reported as "I found where it was testing the wrong boolean."
On the other hand, I would definitely call this a Heisenbug https://stackoverflow.com/a/29298742/31326 as inserting the print statement affects the resolution of the variable used to compute the loop limits.
Admin
I don't believe that this qualifies as a Heisenbug. It has a well-defined prerequisite (inserting a print statement) that explicitly affects the code path, so it is just a matter of eventually tracking it down.
Admin
A heisenbug is any bug that changes in some way when you try to find it. This definitely qualifies.
Admin
By that token, every issue I introduce by badly setting a configuration parameter could be described as a heisenbug.
Admin
A heisenbug is a bug that moves when you try to locate it, or changes place depending on who is trying to locate it, or has been previously located and "fixed" repeatedly, but continues to crop up in new places because some root cause has not been fixed. Neither of these is a Heisenbug.
An example of a heisenbug would be the "rogue pointer" issue I had a couple of years ago. It was a copied pointer whose target object got deleted, which led to random parts of program memory being read and written (this should have thrown access violations, but the way it was doing it got around those), causing horrible memory corruption. Every time it popped up, we would just see the results of the corruption; that is, some block would go haywire, throw exceptions like nuts, and die. The stack trace would only show that a ton of bad data had shown up, so you would go and build a debug build, and the problem would not show up again for a few months, at which point it would be in some other module.
The great irony is that the rogue pointer was hugely beneficial to the company. It forced developers to go into random modules and look for obscure bugs, which they would invariably find and fix, then declare the problem fixed and move on. In its own way, it forced a periodic and random refactoring and maintenance of our code base. Until we found it, the ONE time it managed to show up in its OWN module.
Admin
@Kashim: It was a Heisenbonus!
Admin
Nah, heisenbug comes from the Heisenberg uncertainty principle, especially the famous double slit experiment where if you try to determine which slit the particles are going through, the interference pattern collapses.
Though I think there should be a term for what you're describing. Maybe a shaggy bug story, in that it's a story that goes on and on.
Even by your definition, it is: "Unfortunately, it'd only been fixed in one specific branch within source control—a branch that had never been merged to the trunk."
Admin
All heisenbugs have a deterministic fix, once you know the true root cause.
Admin
A Heisenbug is properly any bug where looking for it (e.g., by inserting any sort of debugging information or altering the wall-clock timing because of the breakpoints getting hit) makes the bug vanish. They're particularly common in threaded and timing-sensitive code, which is why it's an extremely good idea to program such code very defensively; you'll never debug it and remain sane, so the effort to get it provably right first time is well worth it.
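For what it's worth, here is a minimal sketch of what "program it defensively" can look like for the simplest case, a counter shared between threads. It is in Python purely for illustration; `SafeCounter` and `run_workers` are made-up names, not anything from the article. The idea is that the read-modify-write is always done under a lock, so the result does not depend on timing, and adding a print or a breakpoint cannot change it.

```python
import threading


class SafeCounter:
    """Hypothetical shared counter; every access goes through one lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write happens atomically with respect to
        # other threads that also use this lock.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(counter, n_threads=8, n_increments=10_000):
    """Hammer the counter from several threads and return the final value."""

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run_workers(SafeCounter())` should always come back as `n_threads * n_increments`, no matter how the threads happen to be scheduled.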
Admin
I'm startled by the uncertainty here as to what a heisenbug is.
Admin
Perhaps a higgs-bugson (known to exist yet near-impossible to replicate or find).
Admin
Hah. Perfect.
Schrodinger's bug: Each time the program is run the bug may or may not take effect and you can't predict if it will or not before actually running the program.
Admin
Actually, a Schrodinbug is already defined: http://www.catb.org/jargon/html/S/schroedinbug.html
Admin
In an interesting coincidence, The Old New Thing has recently been talking about poorly thought-out merges destroying good changes or reintroducing old bugs. See https://blogs.msdn.microsoft.com/oldnewthing/20180323-01/?p=98325 for Raymond's exposition of one problem set and https://blogs.msdn.microsoft.com/oldnewthing/20180709-00/?p=99195 for an equal-time rebuttal from another team at MSFT.
IMO the [Debug vs. Release] class of bugs is a key reason to actually have your release builds thoroughly instrumented in some way. You can't afford a choice between total visibility and nil visibility; you need total or partial.
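One way to get "partial" visibility, sketched in Python under my own assumptions (`RingBufferHandler` and `make_logger` are hypothetical names, not part of any library): keep the debug logging calls in every build, but send them to a small in-memory ring buffer instead of a file. The same code path runs regardless of the debug setting; only how much history you keep changes.

```python
import collections
import logging


class RingBufferHandler(logging.Handler):
    """Hypothetical handler that keeps only the most recent log lines in memory."""

    def __init__(self, capacity=256):
        super().__init__(level=logging.DEBUG)
        # A deque with maxlen silently drops the oldest entry when full.
        self.records = collections.deque(maxlen=capacity)

    def emit(self, record):
        self.records.append(self.format(record))

    def dump(self):
        """Return the retained lines, oldest first (e.g. for a crash report)."""
        return list(self.records)


def make_logger(name, capacity=256):
    """Build a logger that always accepts DEBUG records but stores only the last few."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep these records out of the root logger
    handler = RingBufferHandler(capacity)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, handler
```

When something goes wrong in the field, `handler.dump()` gives you the last few steps leading up to it, without the release build behaving differently from the debug one.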
Admin
Reminds me of very long ago, doing COBOL exercises for school. I got the code working just fine, and then I added some more comments to the source code. Oops. Program no-workee no more! It turns out in COBOL, comments are STATEMENTS, so I ended up with a lot of conditionals controlling just a COMMENT. What a dumb language. It WAS theoretically machine-independent, but IBM and others very quickly added their own "features" which made it tempting to write in "IBM COBOL" and such.
Admin
Originally an asterisk in column 7 meant the whole line was a comment in COBOL. If school didn't teach you that then the rest of the class wasn't to be trusted.
Admin
I was working on a Cortex M7 application that started randomly hanging, anywhere from four times in an hour to not once in 20 hours. After tearing my hair out for a cumulative week or two trying to nail it down, it turned out the microcontroller had an undocumented hardware defect that would COMPLETELY lock up the system--no code execution, no DMA, even no debugging(!)--until a hardware reset. The defect was triggered when two semi-independent events occurred on the exact same clock cycle, which on a 200+MHz core means within a few nanoseconds. So the problem was not only effectively random, depending on the precise timing of external events, but even when external events were ignored it was sensitive to program structure down to a single instruction as well as to the meddling of the debug probe.
Admin
Many decades ago I was a programmer on a military system. We got a call from a site which was doing their initial hardware and software acceptance testing that one of my programs was failing. My boss, his boss, and I flew out (3 planes, 6 stops, and about 100 miles of driving). When we walked into the computer room, a number of cabinets were open, and test leads were hanging from various circuit boards. The acceptance team explained that the hardware diagnostics were failing, but whenever they tried to examine the signals with an oscilloscope, the diagnostic passed.
So to save time, they decided to run the software acceptance and then get back to debugging the hardware. I objected, explaining that the code which seemed to be failing was doing some precision calculations and would see this behavior if the calculations were giving inexact results, and said they needed to get the hardware to work without all those loose test probes hanging out.
They eventually discovered that there was noise on some lines, and it was introducing random bit errors in certain arithmetic operations; the test leads were damping some of the noise to somehow not affect the calculations used in the hardware diagnostic, but when the application was being run we'd eventually get an inconsistent result which would make it fail.