Martin's company had written a set of command-line tools which their internal analysts could then string together via shell scripts to do their work. It was finicky and fragile, but frankly it didn't work too badly for most cases.
There was one tool, however, which seemed to be the source of an unfair number of problems. Eventually, Martin sat down with an analyst to see what was going wrong. The program would exit successfully, but wouldn't actually do any of the work it was supposed to. Instead of doing the normal thing and writing errors to STDERR, the tool wrote them to a file. Which file was determined by reading some shell variables, and the variables each tool read were slightly different, because why would you build a consistent interface for your suite of analytical tools?
Eventually, Martin was able to figure out where the errors were going, and saw that the tool was failing to connect to the backend database. That was a simple matter: just fix the connection string. But why was it exiting successfully when it couldn't connect?
/* Connect to Oracle */
iStatus = BDconnect();
if(iStatus != SUCCESS_STATUS)
{
    write_log(ERROR_LOG_FILE,"BDconnect() FAILED sql:%d",iStatus);
    exit(EXIT_SUCCESS);
}
It exited with a successful return code, and thus the shell scripts the analysts were using assumed, wrongly, that the application had succeeded. It didn't take much to fix this specific case, but as it turned out, this "exit with success even when you fail" approach was an endemic pattern across many of these tools.
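For contrast, here is a minimal sketch of what the corrected check might look like. It is not Martin's actual fix, which the story doesn't show. The value of `SUCCESS_STATUS`, the stand-in connect functions, and the `run_tool` wrapper are all assumptions made up for illustration, and the sketch reports to STDERR rather than calling the original `write_log`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed value; the original definition of SUCCESS_STATUS is not shown. */
#define SUCCESS_STATUS 0

/* Hypothetical stand-ins for the real BDconnect(), which is not shown.
 * One always fails with a made-up status code, the other always succeeds. */
static int failing_connect(void) { return -1; }
static int working_connect(void) { return SUCCESS_STATUS; }

/* Hypothetical wrapper: returns the status the process should exit with. */
static int run_tool(int (*connect_fn)(void))
{
    int iStatus = connect_fn();

    if (iStatus != SUCCESS_STATUS)
    {
        /* Report the failure on STDERR so the caller can see it... */
        fprintf(stderr, "BDconnect() FAILED sql:%d\n", iStatus);
        /* ...and signal failure to the calling shell script. */
        return EXIT_FAILURE;
    }

    /* The tool's real work would go here. */
    return EXIT_SUCCESS;
}
```

A `main` that simply does `return run_tool(failing_connect);` would then exit with a non-zero status, which a calling shell script can detect by checking `$?` or by running under `set -e`.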