That’s not being perverse. It’s about pausing when “things seem too good to be true” and taking time out to ask if “there has probably been a mistake”. (Gunn, @ 2 mins)
[U]ntil the Romans ultimately removed the right of the Sanhedrin to confer death sentences, a defendant unanimously condemned by the judges would be acquitted [14, Sanhedrin 17a], the Talmud stating ‘If the Sanhedrin unanimously find guilty, he is acquitted. Why? — Because we have learned by tradition that sentence must be postponed till the morrow in hope of finding new points in favour of the defence’.
That practice could be interpreted as the Jewish judges being intuitively aware that suspicions about the process should be raised if the final result appears too perfect . . .
[I]f too many judges agree, the system has failed and should not be considered reliable. (Gunn et al. 2016)
Or even more simply,
They intuitively reasoned that when something seems too good to be true, most likely a mistake was made. (Zyga, 2016)
The opening quotation above is from a footnote to a chapter by Gregory Doudna in a newly published volume in honour of Thomas L. Thompson, Biblical Narratives, Archaeology & Historicity: Essays in Honour of Thomas L. Thompson. Doudna’s footnote continues:
I thought of what I have come to call Thompson’s Rule when I encountered this scientific study showing that, as counterintuitive as it sounds, unanimous agreement actually does reduce confidence of correctness in conclusions in a wide variety of disciplines (Gunn et al. 2016).
The paper by Gunn and others is Too good to be true: when overwhelming evidence fails to convince. The argument of the paper (with my bolding in all quotations):
Is it possible for a large sequence of measurements or observations, which support a hypothesis, to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith; however, rarely is consideration given to whether a systemic failure has occurred. Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. We perform a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence and (iii) cryptographic primality testing. In this paper, we investigate the effects of small error rates in a set of measurements or observations. We find that even with very low systemic failure rates, high confidence is surprisingly difficult to achieve . . . .
Sometimes, as we find more and more agreement, we can begin to lose confidence in those results. Gunn begins with a simple example in a presentation he gave in 2016 (the link is to a YouTube video). Here is the key slide:
With a noisy voltmeter attempting to measure a very small voltage (on the order of nanovolts), one would expect some variation across repeated measurements. If that variation is absent, we should conclude something is wrong with the instrument rather than that we have obtained a remarkably precise measurement.
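The voltmeter point can be sketched in a few lines of code. This is only an illustration of the idea, not anything from Gunn's presentation: the readings, the noise level, and the tolerance threshold are all made-up values.

```python
import random

def readings_look_faulty(readings, tolerance=1e-12):
    """Flag a run of measurements whose spread is implausibly small.

    A noisy instrument should produce *some* variation between repeated
    readings; if every value agrees to within `tolerance`, the likelier
    explanation is a fault in the instrument, not a perfect signal.
    """
    return max(readings) - min(readings) < tolerance

# Illustrative nanovolt-scale measurements (values are invented):
random.seed(0)
noisy = [5e-9 + random.gauss(0, 1e-9) for _ in range(10)]  # healthy, noisy meter
stuck = [5e-9] * 10                                        # suspiciously perfect

print(readings_look_faulty(noisy))  # expected variation: nothing flagged
print(readings_look_faulty(stuck))  # identical readings: "too good to be true"
```

The check inverts the naive intuition: it is the *perfectly consistent* run, not the scattered one, that triggers the alarm.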
The recent Volkswagen scandal is a good example. The company fraudulently programmed a computer chip to run the engine in a mode that minimized diesel fuel emissions during emission tests. But in reality, the emissions did not meet standards when the cars were running on the road. The low emissions were too consistent and ‘too good to be true.’ The emissions team that outed Volkswagen initially got suspicious when they found that emissions were almost at the same level whether a car was new or five years old! The consistency betrayed the systemic bias introduced by the nefarious computer chip. (Zyga 2016)
Then there was the Phantom of Heilbronn, the serial killer known as the “Woman Without a Face”. Police spent between eight and fifteen years searching for a woman whose DNA connected her to 40 crime scenes (from murders to burglaries) in France, Germany and Austria. Her DNA was identified at six murder scenes. A three million euro reward was offered. It turned out that the swabs used to collect the DNA from the crime scenes had been inadvertently contaminated at their production point by the same woman.
Consider, also, election results. What do we normally suspect when we hear of a dictator receiving over 90% of the vote?
We have all encountered someone who argues that “all the evidence” supports their new pet hypothesis to explain, say, Christianity’s origins. I have never been able to persuade them, as far as I know, that they are reading “all the evidence” with a bias they either cannot see or believe is entirely valid.
Ironically, scholars like Bart Ehrman who attempt to deny that any historically or even slightly significant “Jesus myth” view exists among scholars are doing their case a disservice. By insisting that no valid or reasonable contrary view has ever been raised, such scholars undermine confidence in the case for the historicity of Jesus. If they could accept the challenges from serious thinkers over the past nearly two centuries, and acknowledge the ideological pressure inherent in “biblical studies” for academics to conform within certain parameters of orthodox faith, then they would not look quite so much like those politicians who claim 90% of the vote, or those police chasing a phantom serial killer for eight years across Europe, or the dishonest VW executives . . . .
Here’s another interesting application: dangerous and utterly counterintuitive….
The researchers demonstrated the paradox in the case of a modern-day police line-up, in which witnesses try to identify the suspect out of a line-up of several people. The researchers showed that, as the group of unanimously agreeing witnesses increases, the chance of them being correct decreases until it is no better than a random guess.
In police line-ups, the systemic error may be any kind of bias, such as how the line-up is presented to the witnesses or a personal bias held by the witnesses themselves. Importantly, the researchers showed that even a tiny bit of bias can have a very large impact on the results overall. Specifically, they show that when only 1% of the line-ups exhibit a bias toward a particular suspect, the probability that the witnesses are correct begins to decrease after only three unanimous identifications. Counterintuitively, if one of the many witnesses were to identify a different suspect, then the probability that the other witnesses were correct would substantially increase.
The mathematical reason for why this happens is found using Bayesian analysis, which can be understood in a simplistic way by looking at a biased coin. If a biased coin is designed to land on heads 55% of the time, then you would be able to tell after recording enough coin tosses that heads comes up more often than tails. The results would not indicate that the laws of probability for a binary system have changed, but that this particular system has failed. In a similar way, getting a large group of unanimous witnesses is so unlikely, according to the laws of probability, that it’s more likely that the system is unreliable.
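The biased-coin reasoning in the passage above can be made concrete with a small Bayes update. This is a sketch of the general idea, not the paper's own calculation: the two hypotheses (fair coin vs. a 55% coin) and the 1% prior on bias are illustrative assumptions.

```python
from math import comb

def posterior_biased(heads, tosses, p_biased=0.55, prior_biased=0.01):
    """Posterior probability the coin is biased, given the observed tosses.

    Compares two hypotheses with a simple Bayes update:
      H0: fair coin (p = 0.5)
      H1: biased coin (p = p_biased)
    The 1% prior on H1 is an illustrative assumption.
    """
    like_fair = comb(tosses, heads) * 0.5**heads * 0.5**(tosses - heads)
    like_bias = comb(tosses, heads) * p_biased**heads * (1 - p_biased)**(tosses - heads)
    num = prior_biased * like_bias
    return num / (num + (1 - prior_biased) * like_fair)

# 550 heads in 1000 tosses is exactly what a 55% coin would average:
# the posterior climbs well above the 1% prior, so "the system has failed"
# (a biased coin) becomes the better explanation.
print(posterior_biased(550, 1000))

# 500 heads in 1000 tosses looks fair, and the posterior stays tiny.
print(posterior_biased(500, 1000))
```

The binomial coefficient cancels in the ratio, but keeping it makes each likelihood readable on its own.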
The researchers say that this paradox crops up more often than we might think. Large, unanimous agreement does remain a good thing in certain cases, but only when there is zero or near-zero bias. Abbott gives an example in which witnesses must identify an apple in a line-up of bananas—a task that is so easy, it is nearly impossible to get wrong, and therefore large, unanimous agreement becomes much more likely.
Removing the bias:
On the other hand, a criminal line-up is much more complicated than one with an apple among bananas. Experiments with simulated crimes have shown misidentification rates as high as 48% in cases where the witnesses see the perpetrator only briefly as he runs away from a crime scene. In these situations, it would be highly unlikely to find large, unanimous agreement. But in a situation where the witnesses had each been independently held hostage by the perpetrator at gunpoint for a month, the misidentification rate would be much lower than 48%, and so the magnitude of the effect would likely be closer to that of the banana line-up than the one with briefly seen criminals. (Zyga)
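The line-up paradox described above can be reproduced with a toy version of the model. To be clear, this is my own simplified reconstruction, not code from Gunn et al., and the parameter values (witness accuracy 0.8, a 10-person line-up, a 1% chance of a biased process) are illustrative assumptions. With probability 1 − ε the line-up is unbiased and witnesses choose independently; with probability ε the process is biased and everyone names the same person, who is the culprit only by chance.

```python
def p_correct_given_unanimous(n, p=0.8, k=10, eps=0.01):
    """P(the identified person is the culprit | n witnesses unanimously agree).

    Toy model (illustrative parameters, not the paper's):
      - with prob 1-eps the line-up is unbiased: each witness independently
        picks the culprit with prob p, else one of the k-1 others uniformly;
      - with prob eps the process is biased: all witnesses name the same
        person, who is the culprit only by chance (1/k).
    """
    fair_unanimous_right = (1 - eps) * p**n
    fair_unanimous_wrong = (1 - eps) * (k - 1) * ((1 - p) / (k - 1))**n
    biased_unanimous = eps  # bias forces unanimity regardless of guilt

    num = fair_unanimous_right + biased_unanimous / k
    den = fair_unanimous_right + fair_unanimous_wrong + biased_unanimous
    return num / den

# Confidence peaks after a few unanimous witnesses, then falls back
# toward the 1/k random-guess rate as unanimity grows:
for n in (1, 3, 5, 10, 30, 50):
    print(n, round(p_correct_given_unanimous(n), 3))
```

As n grows, the honest route to unanimity (p**n) shrinks exponentially while the biased route (ε) stays fixed, so eventually bias dominates and the answer is no better than picking a line-up member at random.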
Doudna, Gregory L. “Is Josephus’s John the Baptist Passage a Chronologically Dislocated Passage of the Death of Hyrcanus?” In Biblical Narratives, Archaeology and Historicity: Essays In Honour of Thomas L. Thompson, edited by Lukasz Niesiolowski-Spanò and Emanuel Pfoh, 119–37. Library of Hebrew Bible / Old Testament Studies. New York: T&T Clark, 2020.
Ellis, David. “Overwhelming Evidence? It’s Probably a Bad Thing,” Phys.org, January 12, 2016. https://phys.org/news/2016-01-overwhelming-evidence-bad.html.
Gunn, Lachlan J., François Chapeau-Blondeau, Mark D. McDonnell, Bruce R. Davis, Andrew Allison, and Derek Abbott. “Too Good to Be True: When Overwhelming Evidence Fails to Convince.” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 472, no. 2187 (March 31, 2016): 20150748. https://doi.org/10.1098/rspa.2015.0748.
— Too Good To Be True: When Bayes Transforms Abundant Success to Abject Failure. University of Adelaide, 2016. https://www.youtube.com/watch?v=Uz6xUjJHTII.
Zyga, Lisa. “Why Too Much Evidence Can Be a Bad Thing.” Phys.org, January 4, 2016. https://phys.org/news/2016-01-evidence-bad.html.
If you enjoyed this post, please consider donating to Vridar. Thanks!