Superforecasting - The Art & Science of Prediction
by Philip Tetlock and Dan Gardner, 2015
Table of Contents
- Chapter 1: An Optimistic Skeptic
- Chapter 2: The Illusion of Knowledge
- Chapter 3: Keeping Score
- Chapter 4: Superforecasters
- Chapter 5: Supersmart?
- Chapter 6: Superquant?
- Putting it all together
This book describes the practices and skills of Superforecasters: people who absorb knowledge from many sources, crunch it again and again, and synthesize it all into one prediction; most of the time they’ll be right.
Chapter 1: An Optimistic Skeptic
Forecasting is common: all humans use this skill, both for orderly, clock-like things like “at what time will the sun rise tomorrow?” and for chaotic, cloud-like things like “what will happen to the world economy?”
So is the future predictable or unpredictable? This is a false dichotomy. We live in a world of clocks and clouds, and a vast jumble of other metaphors.
Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.
Chapter 2: The Illusion of Knowledge
Randomized medical trials are common today and seem stunningly obvious.
Yet, before the 20th century they were the exception, not the rule: doctors’ inclination to believe their treatments were working meant strong evidence and rigorous experimentation seemed unnecessary. A clear example of System 1 confirmation bias.
Medicine had long been unscientific because of the lack of doubt in doctors’ minds; in a way, doubt is what propels science forward.
“Doubt is not a fearful thing,” the physicist Richard Feynman observed, “but a thing of very great value. (…) it is of paramount importance, in order to make progress, that we recognise ignorance and doubt. Because we have the doubt, we then propose looking in new directions for new ideas.”
Thinking, Fast and Slow
Daniel Kahneman, author of “Thinking, Fast and Slow”, proposed a “Two Systems” model for human brains; whenever a decision has to be made, the following systems act:
- System 1 acts first and quickly; it’s what keeps us alive in dangerous situations, when acting heuristically / instinctively is necessary;
- System 2 comes into play later; it evaluates the decision taken by System 1 and, if the decision doesn’t hold up to scrutiny, overrides it.
System 1 is a kind of tip-of-your-nose-perspective; its decision making process shows the following issues:
- Confirmation Bias: When reconciling System 1 and System 2 seems impossible, humans can grab on to the first plausible explanation without fact checking it. This “suspension of disbelief” is called confirmation bias; a poor way to make sense of a complex world, but a superb way to satisfy our brain’s desire for order.
- Bait and Switch: When faced with a very hard question that can’t be answered without more data, humans tend to replace it with a simpler one. This can cause us to jump to conclusions.
System 1 also has benefits:
- Snap Judgement: Learning from past experiences, System 1 can provide creative solutions and intuitions with just a few cues, a way of reasoning that would be beyond the reach of System 2 calculations.
Together, the pros and cons above mean that tip-of-your-nose intuitions can fail as spectacularly as they succeed. The failures can be mitigated by:
- Recognising whether we are operating in an environment with enough valid cues (e.g. a house on fire) or not (e.g. the stock market);
- Double-checking our intuitions again and again, whenever we have time to make a decision.
Similarly, there is today little science, let alone experimentation, in forecasting; the outcomes of bad forecasts can be harder to pinpoint, but they lead our worldviews astray, causing missed opportunities, monetary losses, even death and war.
Luckily, we know there is a healthy solution: a spoonful of doubt.
Chapter 3: Keeping Score
To find the best methods for forecasting, we need to objectively squeeze out as much ambiguity as humanly possible in our evaluations, both of the forecasts and the forecasters.
Forecasts need to provide clarity on the expected outcomes and their timelines.
In most cases, forecasts are presented without either; such forecasts are impossible to evaluate, leaving forecasters unaccountable whenever their predictions turn out wrong.
Forecasters can thus stretch timelines and probability estimates one way or the other, making the prediction work for them: the “wrong-side-of-maybe” fallacy.
Sherman Kent recommends the following language when dealing with estimates:
- 100%: Certain
- 93% (± 6%): Almost certain
- 75% (± 12%): Probable
- 50% (± 10%): Chances about even
- 30% (± 10%): Probably not
- 7% (± 5%): Almost certainly not
- 0%: Impossible
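A scale like Kent’s translates naturally into a tiny lookup. The sketch below is my own illustrative helper, not something from the book: it returns the phrase whose centre probability is nearest to a numeric estimate.

```python
def kent_phrase(p):
    """Return the Sherman Kent phrase whose centre probability is nearest to p."""
    centres = {
        1.00: "Certain",
        0.93: "Almost certain",
        0.75: "Probable",
        0.50: "Chances about even",
        0.30: "Probably not",
        0.07: "Almost certainly not",
        0.00: "Impossible",
    }
    # pick the scale centre with the smallest distance to the estimate
    return centres[min(centres, key=lambda c: abs(c - p))]

print(kent_phrase(0.8))   # Probable
print(kent_phrase(0.45))  # Chances about even
```

Because Kent’s bands have gaps between them, nearest-centre matching is one reasonable convention; a stricter implementation could refuse estimates that fall outside every band.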
This enables estimators to “think about thinking”, a process known as metacognition, making forecasters better at distinguishing finer degrees of uncertainty; the practice can, however, meet many cultural difficulties.
Since we can’t repeat history, it’s impossible to judge forecasters on a single probabilistic forecast; we need to evaluate them across many forecasts, measuring their “calibration”.
The Brier score is a scoring function measuring the accuracy of probabilistic predictions; more accurate, better-calibrated forecasters get lower scores, with 0.0 representing omniscient knowledge and 0.5 monkey-like random guessing.
Yet, how good a given Brier score is depends on the difficulty of the forecasts: a Brier score of 0.2 can be very good for a forecaster taking on very hard questions, or very bad for one getting obvious outcomes wrong. A level playing field is required.
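A minimal sketch of the two-outcome Brier score described above, the variant on which a constant “50%” forecast scores 0.5 and a confidently wrong one scores 2.0 (the function name is mine, not the book’s):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error over both outcomes of each binary event.

    forecasts: probabilities assigned to the event happening (0.0 to 1.0)
    outcomes:  1 if the event happened, 0 if it didn't
    """
    total = 0.0
    for p, o in zip(forecasts, outcomes):
        # squared error on "it happens" plus squared error on "it doesn't"
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)

print(brier_score([1.0, 1.0], [1, 1]))  # 0.0  (omniscient)
print(brier_score([0.5, 0.5], [1, 0]))  # 0.5  (always hedging at 50/50)
print(brier_score([0.0, 0.0], [1, 1]))  # 2.0  (confidently wrong)
```

Note that saying “50%” about everything locks in a score of exactly 0.5 regardless of what happens, which is why 0.5 is the benchmark for knowing nothing.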
Hedgehogs and Foxes
In 2005, Philip Tetlock published the Expert Political Judgment (EPJ) study, showing that forecasts, even those made by experts, are often no better than “a chimp throwing darts at a board”. It was the result of 20 years of collaboration with 284 professionals on roughly twenty-eight thousand forecasts, ranging from one to five to ten years out.
In the results, two spectrums of forecasters could be recognised:
- Hedgehogs: these were forecasters who organised their thinking around one Big Idea; they would advance their analysis with piles of reasons why they were right and others were wrong and, committed to their conclusions, they’d be reluctant to change their minds, even when clearly wrong.
- Foxes: these were forecasters who drew on many analytical tools, gathering information from as many sources as possible, aggregating it and synthesizing it in one conclusion.
Some reverently call the miracle of aggregation “the wisdom of crowds”, but it is easy to demystify. The key is recognizing that useful information is often dispersed widely, with one person possessing a scrap, another holding a more important piece, a third having a few bits, and so on. (…) they translate whatever information they had into a number. (…) All the valid information pointed in one direction but the errors had different sources and pointed in different directions. (…) How well aggregation works depends on what you are aggregating. Aggregating the judgments of many people who know nothing produces a lot of nothing (…) but aggregating the judgments of an equal number of people who know lots about lots of different things is most effective because the collective pool of information becomes much bigger. (…) Now look at how foxes approach forecasting. They deploy not one analytical idea but many and seek out information not from one source but many. Then they synthesize it all into a single conclusion. In a word, they aggregate (…) within one skull.
As mentioned above, this is a spectrum: in most cases nobody acted purely as a hedgehog or a fox; how people actually behaved depended a lot on the situation and forecasting question.
Yet, the EPJ study provides us with real insights into what an effective forecaster does.
Chapter 4: Superforecasters
In 2011, after the disastrous forecasts about Iraq’s weapons of mass destruction, the American Intelligence Advanced Research Projects Activity (IARPA) sponsored a forecasting tournament, aiming to find best practices for forecasting that could be reused by the Intelligence Community (IC) at large.
Between September 2011 and 2015, the participating teams competed against each other and a control group, answering thousands of geopolitical questions on timeframes from a month to a year away. The questions were neither so easy most people could answer correctly nor so hard nobody on the planet could.
Over the next 4 years, the Good Judgment Project (GJP) team, led by Philip Tetlock, outperformed all other teams and ultimately won the competition. The approach was simple:
- They ranked their forecasters based on their Brier Score, and identified the top 2%
- Gathered forecasts from all of the Forecasters
- Averaged the forecasts, giving the top forecasters’ predictions extra weight
- Extremized the forecast by nudging it closer to 100% or 0%, to simulate a forecast made with all available information
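The steps above can be sketched as a weighted average followed by an extremizing transform. The transform shown, p^a / (p^a + (1-p)^a) with a > 1, is one common choice for pushing a pooled probability toward 0 or 1; the notes don’t give GJP’s exact formula, so treat this as illustrative:

```python
def aggregate(forecasts, weights=None, a=2.0):
    """Weighted-average a set of probability forecasts, then extremize.

    a > 1 pushes the pooled probability toward 0 or 1, simulating a single
    forecast made with all the dispersed information combined.
    """
    if weights is None:
        weights = [1.0] * len(forecasts)  # equal weight by default
    p = sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)
    return p ** a / (p ** a + (1 - p) ** a)

# Three forecasters independently say "likely"; the pooled, extremized
# forecast is more confident than any individual one.
print(aggregate([0.70, 0.65, 0.75]))  # ≈ 0.84, more extreme than the 0.70 average
```

Passing larger weights for the top-ranked forecasters reproduces the “extra weight” step; a forecast of exactly 50% is left unchanged, since there is nothing to extremize.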
But were the Superforecasters super-skilled or super-lucky? This is a false dichotomy: as with most things in life, predicting always involves both luck and skill.
Yet, from one year to the next, 70% of the Superforecasters stayed in the top 2%, with the Superforecaster group actually improving its lead over all other forecasters. The chance of such consistency arising among people competing at coin-flipping is 1 in 100,000,000; for the Superforecasters it was 1 in 3.
We can hence draw two conclusions:
- Superforecasters aren’t infallible
- Superforecasters aren’t just lucky
Which raises the big question: why are they so good?
Chapter 5: Supersmart?
Regular forecasters scored higher on intelligence and knowledge tests than 70% of the US population; Superforecasters scored higher than 80%. Yet, having the requisite intelligence and knowledge is not enough: what matters is how you use them.
Outside then Inside
When making estimates or answering complex questions, humans tend to start from the tip-of-your-nose perspective, risking being swayed by an initial anchor: a number that may have little or no meaning.
Kahneman and Tversky showed you could influence people’s judgement merely by exposing them to a number, any number, even one that is obviously meaningless.
To pick a good anchor, start with the Outside View and then the Inside View:
- Outside View: if available, look up the base rate for an event (how common something is within a broader class), e.g. what % of people in America have pets
- Inside View: what specifics of this particular forecast increase or decrease its probability? What facts support this prediction being true or false?
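One way to formalize “outside view first, then inside view” is a Bayes update: the base rate serves as the prior, and each inside-view fact adjusts it according to how likely that fact is when the prediction is true versus false. The numbers and scenario below are hypothetical, purely for illustration:

```python
def update(prior, p_fact_if_true, p_fact_if_false):
    """One Bayes update: start from the outside-view base rate (prior),
    then fold in an inside-view fact via how likely that fact would be
    if the prediction were true vs. if it were false."""
    numerator = prior * p_fact_if_true
    return numerator / (numerator + (1 - prior) * p_fact_if_false)

# Hypothetical example: the base rate says 30% of startups survive 5 years;
# an inside-view fact (say, experienced founders) is twice as common
# among survivors (0.8) as among failures (0.4).
print(update(0.30, 0.8, 0.4))  # ≈ 0.46: above the base rate, but anchored to it
```

The point mirrors the text: the inside view moves the estimate, but the outside-view base rate keeps it from floating free.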
Thesis, Antithesis, Synthesis
Once you have an outside view and an inside view, they can be merged into a single prediction, but that’s not the end. Superforecasters treat their beliefs as hypotheses to be tested, not treasures to be guarded.
Superforecasters are actively open-minded: in the interest of improving their predictions, they submit them to others who have different ideological orientations, geographic locations and subject-matter expertise.
Chapter 6: Superquant?
Most Superforecasters have a background in numeracy, but seldom use arcane math and complex calculations in their forecasts.
What seems to make a difference is their probabilistic thinking: the ability to understand fine-grained differences in probabilities, like distinguishing a 60% from an 80% prediction.
Probability for the Stone Age
Back in the day, when humans didn’t need (or didn’t have time for) careful consideration, there were only three dial settings for dealing with the unknown: “gonna happen”, “not gonna happen” and “maybe”.
The confusion caused by this three-setting dial is pervasive, as most people revert to it as often as possible, blurring the line between a 100% and an 80% forecast; only when probabilities closer to even come along, like 60% and 40%, do they kind of get the idea.
Probability for the Information Age
Scientists relish, or at least accept, uncertainty, because they know certainty in models of reality can be illusory.
Scientific facts that look as solid as rock for one generation are crushed by the advances of the next. This means the Yes and No dials are flawed, as they express certainty, and must go. The only one that’s left is Maybe.
Yet, for it to be usable, we have to be more specific: “maybe” might mean 1%, 2%, 3%… or 10%, 20% or 30%… or 80%, 85% or 95%. The finer-grained we are, the better.
The data from the tournament backs up this practice: Superforecasters who provided more fine-grained estimates (20% vs 25% vs 30%) fared better than the ones with lower resolution (20% vs 30% vs 40%).
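This effect can be illustrated with a small simulation (my own toy model, not the tournament data): events get random true probabilities, one forecaster reports them exactly, another rounds to the nearest of the three “dial” settings, and we compare their two-outcome Brier scores (lower is better):

```python
import random

def brier(p, o):
    # two-outcome Brier score for a single binary event
    return (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2

random.seed(42)
n = 10_000
fine_total, coarse_total = 0.0, 0.0
for _ in range(n):
    p = random.random()                  # the event's true probability
    o = 1 if random.random() < p else 0  # what actually happens
    fine_total += brier(p, o)            # fine-grained forecaster reports p exactly
    q = min((0.0, 0.5, 1.0), key=lambda dial: abs(dial - p))
    coarse_total += brier(q, o)          # three-dial forecaster rounds p

print(fine_total / n, coarse_total / n)  # the fine-grained score is lower (better)
```

Rounding throws away calibration: every forecast moves up to 0.25 away from the truth, and that squared error shows up directly in the average score.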
Putting it all together
- Other notes from https://www.amazon.com/gp/aw/review/B00Y78X7HY/R3Q2RDFGVUFXZ8
- Other notes from https://capitalideasonline.com/wordpress/superforecasting/
- Book in pdf https://www.edge.org/documents/Tetlock_Superforecasting/Tetlock_Gardner_Superforecasting_6.22.pdf