How Statistics Became a Model-blind Data-reduction Enterprise: Karl Pearson

In the last blog post, we covered Francis Galton and his Galton board. In this post, we will talk about Karl Pearson.

Pearson was strongly influenced by Galton’s idea of correlation. He believed correlation was the more fundamental concept, and he reduced causation to nothing more than a special case of correlation. He wrote, “That a certain sequence has occurred and recurred in the past is a matter of experience to which we give expression in the concept causation … Science in no case can demonstrate any inherent necessity in a sequence, nor prove with absolute certainty that it must be repeated.” In his eyes, causation was just the repetition of some sequence of events (something happens, then something else happens).

Even though Pearson thought correlation alone was enough to do science, some correlations are obviously nonsense. For example, there is a strong correlation between a nation’s per capita chocolate consumption and its number of Nobel Prize winners. Taken at face value, this looks silly. The more reasonable explanation is that a nation’s wealth affects both its chocolate consumption and its chances of producing Nobel Prize winners. But this is a causal explanation, not a correlational one, which put Pearson in an awkward position.
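To make the wealth explanation concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the variable names, the noise levels, and the linear relationships are made up, not real data on chocolate or Nobel Prizes. Wealth drives both quantities, and neither one influences the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000                # hypothetical number of simulated "nations"

# Assumed causal story: wealth -> chocolate, wealth -> nobel, nothing else.
wealth = [rng.gauss(0, 1) for _ in range(n)]
chocolate = [w + rng.gauss(0, 0.5) for w in wealth]
nobel = [w + rng.gauss(0, 0.5) for w in wealth]

# Correlation as Pearson would have measured it.
raw = pearson(chocolate, nobel)

# Correlation after the linear effect of wealth is removed from both.
adjusted = pearson(residuals(chocolate, wealth), residuals(nobel, wealth))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted for wealth:  {adjusted:.2f}")
```

With these made-up settings the raw correlation should come out strong, while the correlation after adjusting for wealth should sit near zero. The point is that knowing which variable to adjust for already requires the causal story, which is exactly what a correlation-only view leaves out.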

The correlation in this example is now explained by confounding: a third variable (wealth) drives both quantities. There are other kinds of “nonsense correlation” too. For example, in time-series data, England’s mortality rate in a given year was found to be highly correlated with the percentage of marriages conducted that year in the Church of England. Was God punishing marriage-happy people? Instead of asking why such “nonsense correlations” occur, Pearson and his followers stopped at correlation. The author called it a missed opportunity.
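The time-series case can be sketched the same way. The snippet below does not use the historical English data. It builds two synthetic series (the names `mortality` and `marriages` are only labels) that share nothing except a downward trend over time, an assumption chosen purely to illustrate how a common trend can produce a high correlation.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def first_differences(series):
    """Year-over-year changes, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
years = range(500)      # hypothetical number of time steps

# Two independent series that both drift downward over time.
mortality = [-0.1 * t + rng.gauss(0, 1) for t in years]
marriages = [-0.1 * t + rng.gauss(0, 1) for t in years]

# Correlation of the raw levels: dominated by the shared trend.
levels = pearson(mortality, marriages)

# Correlation of the yearly changes: the shared trend is gone.
changes = pearson(first_differences(mortality), first_differences(marriages))

print(f"correlation of levels:  {levels:.2f}")
print(f"correlation of changes: {changes:.2f}")
```

The levels should correlate almost perfectly even though the two series were generated independently, and the correlation should largely vanish once the trend is differenced away. Here, time itself plays the role that wealth played in the chocolate example.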

In the next blog post, we will talk about Sewall Wright, a man who challenged the causality-avoiding culture.