Cornfield’s Inequality: Causation vs. Correlation

In the last post, I wrote about a debate on whether smoking causes lung cancer.

Today, I would like to walk through the proof of Cornfield’s Inequality, the inequality that settled the debate. I read the original proof in the paper. Although the author says it is obvious, I don’t think so (I always find reading maths discouraging, LOL).

I spent some time understanding the first paragraph of the proof. It becomes clearer with a causal diagram: B has arrows into both A and D, and there is no arrow from A to D.

Think of A as smoking, D as lung cancer, and B as the confounder. Much better now.
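To see how this diagram makes A and D look related even though A does nothing to D, here is a minimal Python sketch that builds the full joint distribution from the diagram and computes P(D|A) and P(D|A’) exactly. All the numbers (P_B, P_A_given_B, P_D_given_B) are hypothetical and chosen only for illustration, and the helpers joint, prob and has_d are my own, not from the paper.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers for illustration only (not from the paper).
P_B = F(3, 10)                                    # P(B): prevalence of the confounder
P_A_given_B = {True: F(8, 10), False: F(2, 10)}   # P(A | B), P(A | B')
P_D_given_B = {True: F(1, 10), False: F(1, 100)}  # P(D | B), P(D | B')


def joint():
    """Joint table {(a, b, d): probability} for the diagram A <- B -> D.

    D is generated from B alone, so A has no direct effect on D."""
    table = {}
    for a, b, d in product([True, False], repeat=3):
        pb = P_B if b else 1 - P_B
        pa = P_A_given_B[b] if a else 1 - P_A_given_B[b]
        pd = P_D_given_B[b] if d else 1 - P_D_given_B[b]
        table[(a, b, d)] = pb * pa * pd
    return table


def prob(table, event, given=lambda a, b, d: True):
    """P(event | given), found by summing entries of the joint table."""
    num = sum(p for k, p in table.items() if given(*k) and event(*k))
    den = sum(p for k, p in table.items() if given(*k))
    return num / den


def has_d(a, b, d):
    """Event: the person has disease D."""
    return d


t = joint()
p_d_a = prob(t, has_d, lambda a, b, d: a)          # P(D | A)
p_d_not_a = prob(t, has_d, lambda a, b, d: not a)  # P(D | A')
p_d_a_b = prob(t, has_d, lambda a, b, d: a and b)  # P(D | A, B)

print(p_d_a, p_d_not_a)  # 127/1900 29/1550
print(p_d_a_b)           # 1/10, same as P(D | B)
```

Even though D’s table only looks at B, P(D|A) comes out roughly 3.6 times P(D|A’), simply because people with A are much more likely to have B. Once B is fixed, knowing A changes nothing: P(D|A,B) is just P(D|B).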

The next difficulty is understanding R1 and R2, the probabilities of D given A and given A’. Why are they expressed like that? At first, I reproduced the formulas with some complicated steps, using the chain rule of probability and conditional independence (like the steps in this post). But after looking at the diagram above, I started to see the “obvious” way.
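For comparison, the longer route looks roughly like this. It is a sketch that assumes, as the diagram says, that D depends on A only through B, i.e. P(D|A,B) = P(D|B) and P(D|A,B’) = P(D|B’), where B’ means “not B”.

```latex
\begin{aligned}
P(D \mid A) &= P(D, B \mid A) + P(D, B' \mid A)
  && \text{(law of total probability)} \\
&= P(B \mid A)\, P(D \mid A, B) + P(B' \mid A)\, P(D \mid A, B')
  && \text{(chain rule)} \\
&= P(B \mid A)\, P(D \mid B) + P(B' \mid A)\, P(D \mid B')
  && \text{(} D \text{ independent of } A \text{ given } B \text{)}
\end{aligned}
```

Replacing A with A’ everywhere gives the same expression for P(D|A’).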

You want the probability of D given A, but there is no direct link between the two, so you go through B. P(B|A)*P(D|B) is one path from A to D; the other is P(B’|A)*P(D|B’). Since B either happens or it doesn’t, and D depends on A only through B, the sum of the two paths gives P(D|A). The same goes for P(D|A’). Once you see that, the rest really is “obvious”: a few algebraic steps complete the proof.
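Here is one way to do that algebra, as a sketch. It assumes the inequality being proved is the usual statement of Cornfield’s result: if A has no effect on D and B raises the risk of D, then the relative prevalence P(B|A)/P(B|A’) must be at least the apparent relative risk r = P(D|A)/P(D|A’). The shorthand p1, p2, r1, r2 is mine and may not match the paper’s notation; I also assume P(B|A’) > 0 and r > 1.

```latex
% Hypothetical shorthand for this sketch:
% p_1 = P(B|A), p_2 = P(B|A'), r_1 = P(D|B), r_2 = P(D|B'), with r_1 > r_2
\begin{aligned}
R_1 &= P(D \mid A)  = r_1 p_1 + r_2 (1 - p_1) = r_2 + (r_1 - r_2)\, p_1 \\
R_2 &= P(D \mid A') = r_1 p_2 + r_2 (1 - p_2) = r_2 + (r_1 - r_2)\, p_2 \\
p_1 R_2 - p_2 R_1 &= p_1 r_2 + (r_1 - r_2)\, p_1 p_2 - p_2 r_2 - (r_1 - r_2)\, p_1 p_2
  = r_2 (p_1 - p_2) \\
R_1 > R_2 \;&\Rightarrow\; (r_1 - r_2)(p_1 - p_2) > 0 \;\Rightarrow\; p_1 > p_2 \\
&\Rightarrow\; p_1 R_2 \ge p_2 R_1 \;\Rightarrow\; \frac{p_1}{p_2} \ge \frac{R_1}{R_2} = r
\end{aligned}
```

The inequality is strict whenever r2 > 0, i.e. whenever people without B can still get D.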