In 800 BC, if you had a difficult, thorny question, you might climb the slopes of Mount Parnassus with a baby goat, find the Castalian Spring, and approach the Pythia- the priestess of Apollo who served as the Oracle at Delphi. On the seventh day of the month, the Pythia would sanctify her body by bathing in the waters of the spring, drinking water from the Cassotis- a portion of the spring where a naiad was said to dwell- while the high priest would sprinkle holy water about the temple. Thus purified, they’d take your baby goat, lay it before the fires of Hestia, and cut it open to read your answer from its entrails. Unless it trembled the wrong way- that was a bad omen; they’d throw an exception and tell you to try again next month.
All that ritual and pomp and circumstance creates something I call “The Oracle Effect”. We supply some inputs, and then a fancy, complicated, and difficult-to-understand ritual is applied to them. The end result is an answer or prediction, and our brains have an interesting blind spot: we tend to weigh the answer more by the complexity and difficulty of the process than by the actual value of the answer itself. Ritual creates realism.
Which brings us to this study from ProPublica, which discusses the use of “risk assessment” algorithms in the criminal justice system. Our “goat”, in this case, is a collection of demographic and philosophical questions applied to a defendant. The ritual is a complicated statistical analysis (the mechanics of which are known only to the software vendor), and the end result is a simple number: on a scale of 1–10, how likely is this person to commit another crime in the future?
Spoiler Alert: the software isn’t very good at this, and also may be racist. Very few people are looking at the actual accuracy of the results, though; instead, they’re focused on the process. The ritual looks pretty good, and thus judges tend to use this risk assessment as a tool to guide sentencing- high-risk defendants get tougher sentences.
Or take the US Transportation Security Administration’s use of “behavioral threat analysis”, which uses a complicated analysis of “microexpressions” to identify terrorists at a glance. It’s about as effective as using phrenology to detect potential criminals. It doesn’t work, but there’s a process! And speaking of phrenology, a startup hopes to bring that back, and just like these other methods, their process has to remain a secret. “Gilboa said he… will never make his classifiers that predict negative traits available to the general public.”
It’s not just “secret” processes, though. This applies to any sufficiently complicated ritual. In a more prosaic example, I’ve implemented a number of “dashboards” in my career. The goal is usually to take some complicated set of statistics about a business process or project and boil them down to a small collection of “stoplight” indicators: red, yellow, or green lights. No matter how the dashboard starts out, before long, the process for manipulating the data gets complicated, arcane, and downright broken until everything always shows green, because that’s what management wants to see. The fact that the output has no real meaning doesn’t matter- there’s a complicated, difficult-to-understand process, which puts a layer of perceived truth on that shiny little green light.
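To make that concrete, here’s a minimal sketch- the metric, the thresholds, and the way they drift are all invented for illustration- of how a KPI gets collapsed into a stoplight, and how the collapse gets “adjusted” over time:

```python
# Hypothetical example: collapsing an on-time-delivery metric into a stoplight.
def stoplight(on_time_ratio: float) -> str:
    """Version 1: thresholds that actually mean something."""
    if on_time_ratio >= 0.90:
        return "green"
    if on_time_ratio >= 0.75:
        return "yellow"
    return "red"


def stoplight_v2(on_time_ratio: float) -> str:
    """Version 7, a few bad quarters later: the light that cannot turn red."""
    if on_time_ratio >= 0.40:  # threshold quietly lowered to keep management happy
        return "green"
    return "yellow"            # "red" is no longer reachable
```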
Even simple rituals can feed into this Oracle Effect. For example, PayPal doesn’t want to handle transactions for ISIS, which isn’t unreasonable, but how do you detect which transactions are made by honest citizens, and which by militants? What about just blocking transactions containing the letters “isis”? This seems like a pretty simple algorithm, but think about the amount of data flowing through it, and suddenly it picks up the air of ritual- we have a magic incantation that keeps us from processing transactions for militants.
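A minimal sketch of that sort of substring filter (the memo strings here are made up) shows why such a simple incantation sweeps up far more than militants:

```python
# Hypothetical example of a naive substring filter on transaction memos.
BLOCKED_TERMS = ["isis"]

def is_blocked(memo: str) -> bool:
    """Block any transaction whose memo contains a blocked term."""
    memo = memo.lower()
    return any(term in memo for term in BLOCKED_TERMS)

print(is_blocked("Donation to ISIS"))            # True, as intended
print(is_blocked("Order from Isis Books"))       # True: a shop named for the goddess
print(is_blocked("Midlife crisis counseling"))   # True: "crisis" contains "isis"
```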
Using algorithms and decision-support systems isn’t bad. It’s not even bad if they’re complicated! They’re solving a complicated problem, and we’d expect the resulting system to reflect at least some of that complexity. A recent conference hosted at NYU Law spent time discussing how we could actually avoid biases in policing by using well-designed algorithms, while also pointing out the risks and dangers to human rights. These sorts of decision-making tools can make things better- or worse. They’re just a tool.
A few years ago, I stumbled across an ancient book on expert systems, because I find reading technology books from the 1980s enlightening. One of the points the book stressed was that an expert system/decision-support system had two jobs: the first was to make a decision within its area of expertise, and the second was to be able to explain its reasoning. The output isn’t useful if it can’t be justified.
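That contract is easy to sketch. Here’s a hypothetical example- the domain, rules, and thresholds are all invented- of a decision that carries its justification with it, rather than arriving as a bare verdict:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    """A conclusion paired with the reasons that produced it."""
    approved: bool
    reasons: list[str] = field(default_factory=list)

def assess_loan(income: float, debt: float) -> Decision:
    """Return not just a yes/no, but the 'why' behind it."""
    reasons = []
    ratio = debt / income if income else float("inf")
    if ratio > 0.5:
        reasons.append(f"debt-to-income ratio {ratio:.2f} exceeds 0.50")
    if income < 20_000:
        reasons.append(f"income {income:,.0f} is below the 20,000 floor")
    if reasons:
        return Decision(approved=False, reasons=reasons)
    return Decision(approved=True, reasons=["all checks passed"])
```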
Imagine, if you will, a toddler tagging along at the hem of the Pythia’s ceremonial robes. As the priestess goes through the ritual, this toddler interrupts at each step to ask, “Why?” The Oracle Effect is, at its core, an appeal-to-authority fallacy in a funny hat. The antidote, then, is to refuse to accept the statements of authorities, no matter how fancy their ritual is. It’s not enough to say, “oh, well, the software should be open source, then!”, because that’s frankly neither necessary nor sufficient. Decision-support systems can feel free to keep their inner workings secret, so long as they can also provide a convincing argument to justify their conclusions. And we know all too well that merely reviewing the code of a complicated system is not enough to understand how the system actually operates on real-world data.
Many of our readers write un-sexy, line-of-business applications, below the waterline of the “Software Development Iceberg” (for all the Googles and Microsofts and Facebooks, most software development happens internally, at companies that don’t make or sell software, to solve their own business problems). These applications generate data and reports that will be used to make decisions. And this brings us to the call-to-action portion of this soapbox. Work against the Oracle Effect by building software systems that provide not conclusions, but arguments. Resist throwing some KPIs on a dashboard without working with your users to understand how they feed into decision-making. Keep in mind how your software is going to be used, and make sure its reasoning, and therefore its value, is transparent, even if your code is proprietary. And make sure the software products you use can also answer the important question about every step of their process: Why?