-
Forecasts are noisy. Professional forecasters offer highly variable predictions about likely sales of a new product, likely growth in the unemployment rate, the likelihood of bankruptcy for troubled companies, and just about everything else. Not only do they disagree with each other, but they also disagree with themselves. For example, when the same software developers were asked on two separate days to estimate the completion time for the same task, the hours they projected differed by 71%, on average.
First, assume that your first estimate is off the mark. Second, think about a few reasons why that could be. Which assumptions and considerations could have been wrong? Third, what do these new considerations imply? Was the first estimate rather too high or too low? Fourth, based on this new perspective, make a second, alternative estimate.
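This second-estimate procedure works because the average of two somewhat independent estimates is less noisy than either one alone. A minimal simulation can show the ceiling of that gain; the true value (100) and the size of the estimate noise (20) are arbitrary numbers chosen for illustration, and real second estimates are correlated with the first, so the actual benefit is smaller:

```python
import random

def second_estimate_gain(truth=100.0, sd=20.0, trials=10_000, seed=0):
    """Compare the mean squared error of a single noisy estimate with that
    of the average of two such estimates (a crowd of two inside one head).
    All parameters are hypothetical."""
    rng = random.Random(seed)
    err_one = err_avg = 0.0
    for _ in range(trials):
        first = rng.gauss(truth, sd)
        second = rng.gauss(truth, sd)   # the fresh, second look
        err_one += (first - truth) ** 2
        err_avg += ((first + second) / 2 - truth) ** 2
    return err_one / trials, err_avg / trials

# With fully independent estimates, averaging two halves the squared error
# (roughly 400 vs. roughly 200 in this setup).
```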
-
Judges have been found more likely to grant parole at the beginning of the day or after a food break than immediately before such a break. If judges are hungry, they are tougher.
A study of thousands of juvenile court decisions found that when the local football team loses a game on the weekend, the judges make harsher decisions on Monday (and, to a lesser extent, for the rest of the week). Black defendants disproportionately bear the brunt of that increased harshness. A different study looked at 1.5 million judicial decisions over three decades and similarly found that judges are more severe on days that follow a loss by the local city's football team than they are on days that follow a win.
A study of six million decisions made by judges in France over twelve years found that defendants are given more leniency on their birthday. (The defendant's birthday, that is; we suspect that judges might be more lenient on their own birthdays as well, but as far as we know, that hypothesis has not been tested.) Even something as irrelevant as outside temperature can influence judges. A review of 207,000 immigration court decisions over four years found a significant effect of daily temperature variations: when it is hot outside, people are less likely to get asylum. If you are suffering political persecution in your home country and want asylum elsewhere, you should hope and maybe even pray that your hearing falls on a cool day.
-
Suppose that a small group consisting of, say, ten people is deciding whether to adopt some bold new initiative. If one or two advocates speak first, they might well shift the entire room in their preferred direction. The same is true if skeptics speak first. At least this is so if people are influenced by one another - and they usually are. For this reason, otherwise similar groups might end up making very different judgments simply because of who spoke first and initiated the equivalent of early downloads. The popularity of "Best Mistakes" and "I Am Error" has close analogues in professional judgments of all kinds. And if groups do not hear the analogue to the popularity rankings of such songs - loud enthusiasm, say, for that bold initiative - the initiative might not go anywhere, simply because those who supported it did not voice their opinion.
The model-of-the-judge studies reinforce Meehl's conclusion that the subtlety is largely wasted. Complexity and richness do not generally lead to more accurate prediction.
-
In short, replacing you with a model of you does two things: it eliminates your subtlety, and it eliminates your pattern noise. The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment - when they exist - are generally not sufficient to compensate for the detrimental effects of noise. You may believe that you are subtler, more insightful, and more nuanced than the linear caricature of your thinking. But in fact, you are mostly noisier.
People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted - usually they do not add to the accuracy of simple models.
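The model-of-the-judge finding can be reproduced in a toy simulation. Here the judge's rule is a single cue weighted 0.8, plus random noise; all numbers are hypothetical. Instead of fitting a regression to the judge's ratings, the sketch uses the noise-free linear rule directly, which is what the regression would recover in expectation:

```python
import random
import statistics

def corr(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.stdev(xs) * statistics.stdev(ys) * (len(xs) - 1))

def model_of_judge_demo(cases=5000, seed=3):
    """A judge applies a linear rule plus noise; the denoised linear model
    of the judge predicts the outcome better than the judge does."""
    rng = random.Random(seed)
    cue = [rng.gauss(0, 1) for _ in range(cases)]
    outcome = [c + rng.gauss(0, 1) for c in cue]      # truth tracks the cue
    judge = [0.8 * c + rng.gauss(0, 1) for c in cue]  # judge = rule + noise
    model = [0.8 * c for c in cue]                    # the judge's rule, denoised
    return corr(judge, outcome), corr(model, outcome)
```

In this setup the model's correlation with the outcome is higher than the judge's own, even though the model contains nothing the judge did not put there; stripping out the noise is the entire gain.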
-
The easiest way to aggregate several forecasts is to average them. Averaging is mathematically guaranteed to reduce noise: specifically, it divides it by the square root of the number of judgments averaged. This means that if you average one hundred judgments, you will reduce noise by 90%, and if you average four hundred judgments, you will reduce it by 95% - essentially eliminating it. This statistical law is the engine of the wisdom-of-crowds approach.
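The square-root law can be checked with a short simulation. The true value (100) and the individual noise (a standard deviation of 10) are arbitrary choices for illustration:

```python
import random
import statistics

def noise_after_averaging(n, truth=100.0, sd=10.0, groups=4000, seed=1):
    """Empirical noise (standard deviation) of the mean of n independent
    judgments, each carrying individual noise `sd`.
    Theory predicts sd / sqrt(n)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(truth, sd) for _ in range(n))
        for _ in range(groups)
    ]
    return statistics.stdev(means)

# A single judgment carries noise of about 10; averaging one hundred
# judgments cuts it to about 1 - the 90% reduction described above.
```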
Because averaging does nothing to reduce bias, its effect on total error (MSE) depends on the proportions of bias and noise in it. This is why the wisdom of crowds works best when judgments are independent, and therefore less likely to contain shared biases. Empirically, ample evidence suggests that averaging multiple forecasts greatly increases accuracy, for instance in the "consensus" forecasts of economic forecasters or stock analysts. With respect to sales forecasting, weather forecasting, and economic forecasting, the unweighted average of a group of forecasters outperforms most and sometimes all individual forecasts. Averaging forecasts obtained by different methods has the same effect: in an analysis of thirty empirical comparisons in diverse domains, combined forecasts reduced errors by an average of 12.5%.
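The interaction of bias and noise can be made concrete with a hypothetical simulation in which every judge shares the same bias (+5) on top of independent noise (standard deviation 10); both numbers are assumptions for illustration:

```python
import random
import statistics

def mse_of_averaged_judges(n_judges, bias=5.0, sd=10.0, truth=100.0,
                           cases=4000, seed=2):
    """Mean squared error of the average of n judges who all share the same
    bias but have independent noise. Expected MSE = bias**2 + sd**2 / n:
    averaging shrinks the noise term but leaves the shared bias untouched."""
    rng = random.Random(seed)
    sq_errors = []
    for _ in range(cases):
        avg = statistics.fmean(
            truth + bias + rng.gauss(0, sd) for _ in range(n_judges)
        )
        sq_errors.append((avg - truth) ** 2)
    return statistics.fmean(sq_errors)

# One judge: MSE is about 25 + 100 = 125. A crowd of one hundred:
# about 25 + 1 = 26 - the residual error is almost entirely the shared bias.
```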
-
Another formal process for aggregating diverse views is known as the Delphi method. In its classic form, this method involves multiple rounds during which the participants submit estimates (or votes) to a moderator and remain anonymous to one another. At each new round, the participants provide reasons for their estimates and respond to the reasons given by others, still anonymously. The process encourages estimates to converge (and sometimes forces them to do so by requiring new judgments to fall within a specific range of the distribution of previous-round judgments). The method benefits both from aggregation and social learning.
The Delphi method has worked well in many situations, but it can be challenging to implement. A simpler version, mini-Delphi, can be deployed within a single meeting. Also called estimate-talk-estimate, it requires participants first to produce separate (and silent) estimates, then to explain and justify them, and finally to make a new estimate in response to the estimates and explanations of others. The consensus judgment is the average of the individual estimates obtained in the second round.
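Computationally, a mini-Delphi round reduces to a silent first estimate, a discussion, and an average of the revised estimates. The figures below are hypothetical revenue estimates, purely for illustration:

```python
import statistics

def mini_delphi_consensus(second_round_estimates):
    """Mini-Delphi (estimate-talk-estimate): after silent first estimates
    and a round of discussion, the consensus judgment is the mean of the
    revised estimates."""
    return statistics.fmean(second_round_estimates)

# Four participants revise their estimates after hearing others' reasons.
first_round = [40, 90, 65, 120]   # silent initial estimates
second_round = [55, 80, 70, 95]   # revised after discussion
print(mini_delphi_consensus(second_round))  # → 75.0
```

Note that discussion typically pulls the revised estimates closer together, as it does here, but the consensus is still an average, so it retains the noise-cancelling benefit of aggregation.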
-
In short, doctors are significantly more likely to order cancer screenings early in the morning than late in the afternoon. In a large sample, the order rates of breast and colon screening tests were highest at 8 a.m., at 63.7%. They decreased throughout the morning to 48.7% at 11 a.m. They increased to 56.2% at noon - and then decreased to 47.8% at 5 p.m. It follows that patients with appointment times later in the day were less likely to receive guideline-recommended cancer screening.
How can we explain such findings? A possible answer is that physicians almost inevitably run behind in clinic after seeing patients with complex medical problems that require more than the usual twenty-minute slot.