Bayes’ Rule Demystified: Five Intuitive Perspectives

Different Ways to Look at Bayes' Rule

Example

Consider the following scenario:

  • The probability that a person has a disease is 0.01.
  • The probability that a person with the disease tests positive is 0.90.
  • The probability that a person without the disease tests positive is 0.10.

Question: What is the probability that a person who tests positive actually has the disease?

First Approach: Counting

Assuming 1,000 people are tested:

  • \(1000 \times 0.01 = 10\) people have the disease.
  • \(1000 \times 0.99 = 990\) people do not have the disease.
  • Of the 10 people with the disease, \(10 \times 0.90 = 9\) people test positive.
  • Of the 990 people without the disease, \(990 \times 0.10 = 99\) people test positive.
  • Therefore, out of \(9 + 99 = 108\) people who tested positive, only 9 people actually have the disease.

Thus, the probability that a person who tests positive actually has the disease is:

\[\frac{9}{108} \approx 0.0833.\]
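
This tally is easy to script. Here is a minimal Python sketch of the counting argument (the variable names are my own):

    # Counting approach: trace 1,000 hypothetical test-takers.
    population = 1000
    p_disease = 0.01                 # prevalence
    p_pos_given_disease = 0.90       # true-positive rate
    p_pos_given_healthy = 0.10       # false-positive rate

    diseased = population * p_disease                   # 10 people
    healthy = population - diseased                     # 990 people
    true_positives = diseased * p_pos_given_disease     # 9 people
    false_positives = healthy * p_pos_given_healthy     # 99 people

    # Among all positives, what fraction actually has the disease?
    posterior = true_positives / (true_positives + false_positives)
    print(posterior)  # ≈ 0.0833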

Second Approach: Using Bayes' Rule

Bayes' Rule states:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}.\]

Where:

  • \(A\) is the event that a person has the disease.
  • \(B\) is the event that a person tests positive.
  • \(P(A) = 0.01\) is the prior probability of having the disease.
  • \(P(B|A) = 0.90\) is the probability of testing positive given that the person has the disease.
  • \(P(B)\) is the total probability of testing positive.

We calculate \(P(B)\) using the law of total probability:

\[P(B) = P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A) = (0.90)(0.01) + (0.10)(0.99) = 0.009 + 0.099 = 0.108.\]

Therefore:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} = \frac{(0.90)(0.01)}{0.108} \approx 0.0833.\]

This result matches the counting method.
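
The same calculation as a short Python sketch, using the quantities defined above:

    # Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B),
    # where P(B) comes from the law of total probability.
    p_a = 0.01               # prior P(A): person has the disease
    p_b_given_a = 0.90       # P(B|A): positive test given disease
    p_b_given_not_a = 0.10   # P(B|~A): positive test given no disease

    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # 0.108
    p_a_given_b = p_b_given_a * p_a / p_b
    print(p_a_given_b)  # ≈ 0.0833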

Proof of Bayes' Rule

Starting from the definition of conditional probability:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}.\]

Similarly:

\[P(B|A) = \frac{P(A \cap B)}{P(A)} \implies P(A \cap B) = P(B|A) \cdot P(A).\]

Substituting back:

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}.\]
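
As a quick numeric check of the two factorizations of the joint probability, \(P(A \cap B) = P(B|A) \cdot P(A) = P(A|B) \cdot P(B)\), here is a short Python snippet using the example's numbers:

    # Both factorizations should give the same joint probability P(A and B).
    p_a, p_b = 0.01, 0.108      # P(disease), P(positive)
    p_b_given_a = 0.90          # P(positive | disease)
    p_a_given_b = 9 / 108       # P(disease | positive), from the counting approach

    print(p_b_given_a * p_a)    # 0.009
    print(p_a_given_b * p_b)    # 0.009 (agrees, as the proof requires)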

Third Approach: Odds Form of Bayes' Rule

Bayes' Rule can also be expressed in terms of odds:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{P(A)}{P(\neg A)} \cdot \frac{P(B|A)}{P(B|\neg A)}.\]

Here, the posterior odds equal the prior odds multiplied by the likelihood ratio.

Using our example:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{0.01}{0.99} \cdot \frac{0.90}{0.10} = \frac{0.009}{0.099} = \frac{9}{99} = \frac{1}{11}.\]

Thus, the odds of having the disease given a positive test are \(1:11\). To find the probability:

\[P(A|B) = \frac{\text{odds}}{1 + \text{odds}} = \frac{1/11}{1 + 1/11} = \frac{1}{12} \approx 0.0833.\]

Note: Since \(P(A|B) + P(\neg A|B) = 1\), we have:

\[\frac{P(A|B)}{P(\neg A|B)} = \frac{P(A|B)}{1 - P(A|B)}.\]

Solving for \(P(A|B)\):

\[P(A|B) = \frac{\text{odds}}{1 + \text{odds}}.\]

The odds form is convenient when comparing relative probabilities and can simplify calculations by focusing on ratios.
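
A short Python sketch of the odds-form update, using the odds-to-probability conversion derived in the note above:

    # Posterior odds = prior odds * likelihood ratio.
    prior_odds = 0.01 / 0.99          # 1 : 99
    likelihood_ratio = 0.90 / 0.10    # 9
    posterior_odds = prior_odds * likelihood_ratio  # 1/11

    # Convert odds back to a probability: p = odds / (1 + odds).
    posterior = posterior_odds / (1 + posterior_odds)
    print(posterior)  # ≈ 0.0833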

Fourth Approach: Eliminating Probability Mass

Using the same example with 1,000 people tested, we construct the following table of counts:

                  Disease   No Disease   Total
  Test Positive         9           99     108
  Test Negative         1          891     892
  Total                10          990   1,000

Dividing each entry by 1,000 converts the counts to probabilities:

                  Disease   No Disease   Total
  Test Positive     0.009        0.099   0.108
  Test Negative     0.001        0.891   0.892
  Total             0.010        0.990   1.000

By focusing on the "Test Positive" row, we eliminate the probability mass associated with "Test Negative":

                  Disease   No Disease   Total
  Test Positive     0.009        0.099   0.108

The probability that a person who tests positive actually has the disease is:

\[\frac{0.009}{0.108} \approx 0.0833.\]

This method mirrors the counting approach and offers another perspective on Bayes' Rule.
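
One way to sketch the elimination step in Python is to store the joint table as a dictionary, keep only the "Test Positive" row, and renormalize (the dictionary layout is my own choice for illustration):

    # Joint probability table: (test result, disease status) -> probability mass.
    joint = {
        ("positive", "disease"): 0.009,
        ("positive", "no disease"): 0.099,
        ("negative", "disease"): 0.001,
        ("negative", "no disease"): 0.891,
    }

    # Condition on a positive test: discard the "negative" row, then renormalize.
    positive_row = {k: v for k, v in joint.items() if k[0] == "positive"}
    total = sum(positive_row.values())                         # 0.108
    posterior = positive_row[("positive", "disease")] / total
    print(posterior)  # ≈ 0.0833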

Fifth Approach: Log Odds

The log odds form of Bayes' Rule is:

\[\log \frac{P(A|B)}{P(\neg A|B)} = \log \frac{P(A)}{P(\neg A)} + \log \frac{P(B|A)}{P(B|\neg A)}.\]

This transforms multiplication into addition, which simplifies calculations when combining several independent pieces of evidence. With base-2 logarithms, evidence is measured in bits: each bit of evidence (a likelihood ratio of 2) doubles the odds in favor of the event.
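
Below is a minimal Python sketch of evidence accumulating additively in log-odds space. The three-test scenario is hypothetical and assumes the tests are conditionally independent given disease status:

    import math

    # Log-odds form: each piece of evidence adds its log likelihood ratio.
    # Hypothetical: three independent positive tests, each with
    # likelihood ratio 0.90 / 0.10 = 9.
    log_odds = math.log2(0.01 / 0.99)        # prior log odds, in bits
    for _ in range(3):
        log_odds += math.log2(0.90 / 0.10)   # each test adds ~3.17 bits

    odds = 2 ** log_odds
    posterior = odds / (1 + odds)
    print(posterior)  # ≈ 0.88 after three positive tests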

Notes:

  • In information theory, the Shannon information content of an outcome \(x\) is defined as:

    \[h(x) = \log_2 \left( \frac{1}{P(x)} \right).\]

    For example, if \(P(x) = 0.5\), then \(h(x) = 1\) bit.
  • The difference in self-information between two outcomes, illustrated in the sketch after this list, is:

    \[I(x_1, x_2) = h(x_1) - h(x_2) = \log_2 \left( \frac{P(x_2)}{P(x_1)} \right).\]
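
A small Python sketch of these two quantities, assuming base-2 logarithms (the helper names h and info_difference are my own):

    import math

    # Shannon information content: h(x) = log2(1 / P(x)), measured in bits.
    def h(p):
        return math.log2(1 / p)

    print(h(0.5))   # 1.0 bit, matching the example above
    print(h(0.25))  # 2.0 bits

    # Difference in self-information: I(x1, x2) = h(x1) - h(x2)
    #                                           = log2(P(x2) / P(x1)).
    def info_difference(p1, p2):
        return h(p1) - h(p2)

    print(info_difference(0.25, 0.5))  # 1.0 bit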

In future blog posts, I will explore the fascinating implications of Bayes' Rule, including its applications in machine learning, scientific reasoning, and everyday decision making. Stay tuned as we delve deeper into how this fundamental principle shapes our understanding of probability and inference.