Do you trust software to make decisions for you? Would you trust it if you’d written the software yourself? Consider the case of Cade Massey, a professor at the University of Pennsylvania’s Wharton School, and Rufus Peabody, a former student and professional sports gambler. Together, the two men have built the Massey-Peabody rankings, a well-known system for predicting the outcomes of college and professional football games. Their picks rely purely on statistics – except on the one day they chose to ignore the system. They shouldn’t have.
In October 2013, the University of Texas was due to play its rival, the University of Oklahoma. Oklahoma had beaten Texas by an average of 40 points in the two preceding years, and the line in Las Vegas pointed to a 14-point win for Oklahoma. The Massey-Peabody model, on the other hand, foresaw a less dramatic win of nine or 10 points, which made Texas the pick against the spread. Massey, a Texan, was familiar with his team’s capacity for heartbreak and insisted, for the first time ever, on overriding the system. Then the Longhorns won outright, 36 to 20.
Massey’s mistake illustrates a tendency to distrust algorithms that is becoming more relevant in the digital age, when algorithms recommend the books people buy, find the fastest routes through traffic, and tailor the stories users see on their social media feeds. Algorithms, in the broad sense of the word, have several advantages over humans when it comes to decision-making. Humans routinely consider factors that aren’t relevant and ignore those that are. They change how much weight they give different variables over time and let emotion and hunches influence those changes. Algorithms aren’t perfect, either. But here’s the funny thing: Massey’s research shows that people tend to be much less tolerant of software errors than of their own flubs.
In a 2014 study, participants were asked to estimate how successful a group of MBA applicants would be later in life based on their test scores, interview and essay ratings, and work experience. All participants except those in a control group were allowed a so-called learning period. One group made their own judgments about 15 students’ future performance and got feedback about how accurate they were, a second group watched the performance of a statistical model, and the third group saw how both their own judgments and those of the statistical model stacked up against real life.
In the next set of 10 trials, researchers told the participants that they would be paid $1 each time their prediction was within 5 percent of the real outcome and asked whether they wanted to use the statistical model or rely on their own judgment. In both the control group and the group that saw feedback on their own estimates, two-thirds of participants used the model. In the two groups that saw the model at work, however, only about one-fourth of participants used it.
Is that because the model was bad? Quite the contrary. The algorithm performed 15 percent better than humans, and those who eschewed it would have made 29 percent more money had they not done so. Survey questions later showed that participants didn’t lose faith in their own reasoning when they made bad predictions, but they stopped trusting the algorithm when it inevitably made some estimates that were wide of the mark. People who saw an algorithm make mistakes stopped believing in it.
Massey’s team found that people perceive themselves to be better than algorithms at picking out exceptions to a rule, learning from their mistakes, and improving over time. In a second experiment, they decided to investigate whether people would warm up to algorithms if they had some control over them. They asked more than 800 participants to estimate high school students’ performance on a standardized math test given certain information about the students. This time, participants learned up front that the model’s predictions were off by 17.5 percentage points on average.
The study then divided participants into four groups and asked whether they wanted the algorithm’s help. A control group would have to accept the algorithm’s predictions as they were if they chose to use it, while the other three groups could adjust the model’s conclusions up or down by two, five, or 10 percentage points. Only 47 percent of people in the group that couldn’t change the algorithm’s recommendations agreed to use it, compared to 71 percent of those who could adjust by two or five percentage points and 68 percent of those who could adjust by 10 points.
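The bounded adjustment in this experiment can be pictured as a simple clamp: the participant names a forecast, and it is pulled back to the nearest value within the allowed distance of the model's forecast. The sketch below is only an illustration under that assumption. The function name and the example numbers are hypothetical and do not come from the study.

```python
def adjusted_forecast(model_forecast: float,
                      human_forecast: float,
                      max_adjustment: float) -> float:
    """Clamp a participant's forecast to within max_adjustment
    percentage points of the model's forecast.

    Hypothetical helper for illustration only; the study's actual
    procedure may have differed.
    """
    if max_adjustment < 0:
        raise ValueError("max_adjustment must be non-negative")
    lower = model_forecast - max_adjustment
    upper = model_forecast + max_adjustment
    # Keep the human forecast if it is inside the band,
    # otherwise snap it to the nearest edge of the band.
    return min(max(human_forecast, lower), upper)


# Made-up example: the model forecasts 60, the participant prefers 75,
# and the group is allowed a five-point adjustment.
print(adjusted_forecast(60, 75, 5))  # 65
```

With a limit of zero the function simply returns the model's forecast, which corresponds to the control group's all-or-nothing choice.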
What does it all mean? Plenty of research shows that statistical models are more accurate than gut-level judgments, but that doesn’t do much good if people are dismissive of models that aren’t 100 percent accurate. Allowing human beings to feel a small level of control may help them feel more comfortable with relying on statistical models – even those with an acknowledged capacity for error – over their gut instincts. Massey’s research matters because impartial algorithms have the potential to help solve serious social problems such as bias in hiring and college admissions. Other academic research shows that hiring officers tend to rate applicants who are more like them – those who went to similar schools, have similar hobbies, and yes, those who are the same race or gender as they are – more positively than applicants who are different in major ways. That’s one mistake a machine wouldn’t make.