Bayes’ Rule Demystified: Five Intuitive Perspectives

Different Ways to Look at Bayes' Rule

Example

Consider the following scenario:

  • The probability that a person has a disease is 0.01.
  • The probability that a person with the disease tests positive is 0.90.
  • The probability that a person without the disease tests positive is 0.10.

Question: What is the probability that a person who tests positive actually has the disease?

First Approach: Counting

Assuming 1,000 people are tested:

  • \(1000 \times 0.01 = 10\) people have the disease.
  • \(1000 \times 0.99 = 990\) people do not have the disease.
  • Of the 10 people with the disease, \(10 \times 0.90 = 9\) people test positive.
  • Of the 990 people without the disease, \(990 \times 0.10 = 99\) people test positive.
  • Therefore, out of \(9 + 99 = 108\) people who tested positive, only 9 people actually have the disease.

Thus, the probability that a person who tests positive actually has the disease is:

\[\frac{9}{108} \approx 0.0833.\]
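
This tally is easy to script. Here is a minimal Python sketch of the counting argument (the variable names are my own):

    # Counting approach: trace 1,000 hypothetical test-takers.
    population = 1000
    p_disease = 0.01                 # prevalence
    p_pos_given_disease = 0.90       # true-positive rate
    p_pos_given_healthy = 0.10       # false-positive rate

    diseased = population * p_disease                   # 10 people
    healthy = population - diseased                     # 990 people
    true_positives = diseased * p_pos_given_disease     # 9 people
    false_positives = healthy * p_pos_given_healthy     # 99 people

    # Among all positives, what fraction actually has the disease?
    posterior = true_positives / (true_positives + false_positives)
    print(posterior)  # ≈ 0.0833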

Second Approach: Using Bayes' Rule

Bayes' Rule states:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}.\]

Where:

  • \(A\) is the event that a person has the disease.
  • \(B\) is the event that a person tests positive.
  • \(P(A) = 0.01\) is the prior probability of having the disease.
  • \(P(B|A) = 0.90\) is the probability of testing positive given that the person has the disease.
  • \(P(B)\) is the total probability of testing positive.

We calculate \(P(B)\) using the law of total probability:

\[P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) = (0.90)(0.01) + (0.10)(0.99) = 0.009 + 0.099 = 0.108.\]

Therefore:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{(0.90)(0.01)}{0.108} \approx 0.0833.\]

This result matches the counting method.
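
The same calculation as a short Python sketch, using the quantities defined above:

    # Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B),
    # where P(B) comes from the law of total probability.
    p_a = 0.01               # prior P(A): person has the disease
    p_b_given_a = 0.90       # P(B|A): positive test given disease
    p_b_given_not_a = 0.10   # P(B|~A): positive test given no disease

    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # 0.108
    p_a_given_b = p_b_given_a * p_a / p_b
    print(p_a_given_b)  # ≈ 0.0833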

Proof of Bayes' Rule

Starting from the definition of conditional probability:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}.\]

Similarly:

\[P(B|A) = \frac{P(A \cap B)}{P(A)} \implies P(A \cap B) = P(B|A) \cdot P(A).\]

Substituting back:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}.\]
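
As a quick numeric check of the two factorizations of the joint probability, \(P(A \cap B) = P(B|A) \cdot P(A) = P(A|B) \cdot P(B)\), here is a short Python snippet using the example's numbers:

    # Both factorizations should give the same joint probability P(A and B).
    p_a, p_b = 0.01, 0.108      # P(disease), P(positive)
    p_b_given_a = 0.90          # P(positive | disease)
    p_a_given_b = 9 / 108       # P(disease | positive), from the counting approach

    print(p_b_given_a * p_a)    # 0.009
    print(p_a_given_b * p_b)    # 0.009 (agrees, as the proof requires)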

Third Approach: Odds Form of Bayes' Rule

Bayes' Rule can also be expressed in terms of odds:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{P(A)}{P(\neg A)} \cdot \frac{P(B|A)}{P(B|\neg A)}.\]

Here, the posterior odds equal the prior odds multiplied by the likelihood ratio.

Using our example:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{0.01}{0.99} \cdot \frac{0.90}{0.10} = \frac{0.009}{0.099} = \frac{9}{99} = \frac{1}{11}.\]

Thus, the odds of having the disease given a positive test are \(1:11\). To find the probability:

\[P(A|B) = \frac{\text{odds}}{1 + \text{odds}} = \frac{1/11}{1 + 1/11} = \frac{1}{12} \approx 0.0833.\]

Note: Since \(P(A|B) + P(\neg A|B) = 1\), we have:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{P(A|B)}{1 - P(A|B)}.\]

Solving for \(P(A|B)\):

\[P(A|B) = \frac{\text{odds}}{1 + \text{odds}}.\]

The odds form is convenient when comparing relative probabilities and can simplify calculations by focusing on ratios.
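
A short Python sketch of the odds-form update, using the odds-to-probability conversion derived in the note above:

    # Posterior odds = prior odds * likelihood ratio.
    prior_odds = 0.01 / 0.99          # 1 : 99
    likelihood_ratio = 0.90 / 0.10    # 9
    posterior_odds = prior_odds * likelihood_ratio  # 1/11

    # Convert odds back to a probability: p = odds / (1 + odds).
    posterior = posterior_odds / (1 + posterior_odds)
    print(posterior)  # ≈ 0.0833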

Fourth Approach: Eliminating Probability Mass

Using the same example with 1,000 people tested, we construct the following table of counts:

                  Disease   No Disease   Total
  Test Positive         9           99     108
  Test Negative         1          891     892
  Total                10          990   1,000

Dividing each entry by 1,000 converts the counts to probabilities:

                  Disease   No Disease   Total
  Test Positive     0.009        0.099   0.108
  Test Negative     0.001        0.891   0.892
  Total             0.010        0.990   1.000

By focusing on the "Test Positive" row, we eliminate the probability mass associated with "Test Negative":

                  Disease   No Disease   Total
  Test Positive     0.009        0.099   0.108

The probability that a person who tests positive actually has the disease is:

\[\frac{0.009}{0.108} \approx 0.0833.\]

This method mirrors the counting approach and offers another perspective on Bayes' Rule.
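
One way to sketch the elimination step in Python is to store the joint table as a dictionary, keep only the "Test Positive" row, and renormalize (the dictionary layout is my own choice for illustration):

    # Joint probability table: (test result, disease status) -> probability mass.
    joint = {
        ("positive", "disease"): 0.009,
        ("positive", "no disease"): 0.099,
        ("negative", "disease"): 0.001,
        ("negative", "no disease"): 0.891,
    }

    # Condition on a positive test: discard the "negative" row, then renormalize.
    positive_row = {k: v for k, v in joint.items() if k[0] == "positive"}
    total = sum(positive_row.values())                         # 0.108
    posterior = positive_row[("positive", "disease")] / total
    print(posterior)  # ≈ 0.0833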

Fifth Approach: Log Odds

The log odds form of Bayes' Rule is:

\[\log \frac{P(A|B)}{P(\neg A|B)} = \log \frac{P(A)}{P(\neg A)} + \log \frac{P(B|A)}{P(B|\neg A)}.\]

This transforms multiplication into addition, which simplifies calculations when combining several independent pieces of evidence. With base-2 logarithms, evidence is measured in bits: each bit of evidence (a likelihood ratio of 2) doubles the odds in favor of the event.
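
Below is a minimal Python sketch of evidence accumulating additively in log-odds space. The three-test scenario is hypothetical and assumes the tests are conditionally independent given disease status:

    import math

    # Log-odds form: each piece of evidence adds its log likelihood ratio.
    # Hypothetical: three independent positive tests, each with
    # likelihood ratio 0.90 / 0.10 = 9.
    log_odds = math.log2(0.01 / 0.99)        # prior log odds, in bits
    for _ in range(3):
        log_odds += math.log2(0.90 / 0.10)   # each test adds ~3.17 bits

    odds = 2 ** log_odds
    posterior = odds / (1 + odds)
    print(posterior)  # ≈ 0.88 after three positive tests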

Notes:

  • In information theory, the Shannon information content of an outcome \(x\) is defined as:

    \[h(x) = \log_2 \left( \frac{1}{P(x)} \right).\]

    For example, if \(P(x) = 0.5\), then \(h(x) = 1\) bit.
  • The difference in self-information between two outcomes, illustrated in the sketch after this list, is:

    \[I(x_1, x_2) = h(x_1) - h(x_2) = \log_2 \left( \frac{P(x_2)}{P(x_1)} \right).\]
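
A small Python sketch of these two quantities, assuming base-2 logarithms (the helper names h and info_difference are my own):

    import math

    # Shannon information content: h(x) = log2(1 / P(x)), measured in bits.
    def h(p):
        return math.log2(1 / p)

    print(h(0.5))   # 1.0 bit, matching the example above
    print(h(0.25))  # 2.0 bits

    # Difference in self-information: I(x1, x2) = h(x1) - h(x2)
    #                                           = log2(P(x2) / P(x1)).
    def info_difference(p1, p2):
        return h(p1) - h(p2)

    print(info_difference(0.25, 0.5))  # 1.0 bit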

In future blog posts, I will explore the fascinating implications of Bayes' Rule, including its applications in machine learning, scientific reasoning, and everyday decision making. Stay tuned as we delve deeper into how this fundamental principle shapes our understanding of probability and inference.