Bayes' Theorem (Key)

Bayes' Theorem gives us a way to invert conditional probabilities. The formula comes from the definition of conditional probability:

$$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$$

Multiplying both sides by $P(B)$, and applying the same definition with $A$ and $B$ swapped, gives

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$

Solving for $P(A|B)$, we get

$$P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}$$

Though this is the final form, in practice you will usually need to compute $P(B)$ with the law of total probability:

$$P(B) = P(B|A)P(A) + P(B|\overline{A})P(\overline{A})$$

which says that the probability of $B$ is the sum of the probabilities of $B \cap A$ and $B \cap \overline{A}$. (Either $A$ occurs or it does not, so these are the only two cases.)
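As a quick sanity check, the identities above can be verified by counting on a small finite sample space. The sketch below assumes a uniform distribution over the integers 1 to 20 and two arbitrarily chosen events `A` and `B`; the sets and the helpers `prob` and `cond_prob` are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Assumed sample space: integers 1..20, each equally likely (illustrative only)
omega = set(range(1, 21))
A = {1, 2, 3, 4, 5, 10, 20}
B = {3, 4, 5, 6, 7}

def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(event, given):
    """P(event | given), straight from the definition of conditional probability."""
    return prob(event & given) / prob(given)

not_A = omega - A

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
total_B = cond_prob(B, A) * prob(A) + cond_prob(B, not_A) * prob(not_A)

# Bayes' Theorem: P(A|B) = P(B|A)P(A) / P(B)
bayes_A_given_B = cond_prob(B, A) * prob(A) / total_B

print(total_B == prob(B))                  # True
print(bayes_A_given_B == cond_prob(A, B))  # True
print(bayes_A_given_B)                     # 3/5
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding.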

Exercise 1: Plot a Venn Diagram

Using matplotlib, draw a simple Venn diagram representing two sets $A$ and $B$ with a non-empty intersection.

In [33]:
!pip install matplotlib_venn
# run this and then delete the cell so it doesn't get saved
In [34]:
# Solution from google search embedded generative-AI

import matplotlib.pyplot as plt
from matplotlib_venn import venn2

# For a 2-set Venn diagram
set1 = {1, 2, 3, 4, 5, 10, 20}
set2 = {3, 4, 5, 6, 7}

venn2([set1, set2], ('Set 1', 'Set 2'))
plt.show()

Exercise 2: Compute Bayes' Probabilities

We want to replicate the computation carried out in class. Suppose a doctor performs a test with a given accuracy for a disease with a given incidence rate. Determine the probability that a randomly selected person with a positive test result has the disease. You are given the accuracy and the incidence as input, both in the range $(0,1]$. In the solution below, the accuracy is used as both the true-positive rate and the true-negative rate of the test.

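In [35]:
def get_bayes_probability(acc, inc):
    """Return P(sick | positive test) for test accuracy acc and incidence inc."""
    # The accuracy serves as both the true-positive and true-negative rate
    prob_pos_given_sick = acc
    prob_pos_given_not_sick = 1 - acc
    prob_sick = inc
    prob_not_sick = 1 - inc
    # Law of total probability gives P(positive)
    prob_pos = (prob_pos_given_sick * prob_sick
                + prob_pos_given_not_sick * prob_not_sick)
    # Bayes' Theorem
    return prob_pos_given_sick * prob_sick / prob_pos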

Check some results below. The first one comes from class.

In [36]:
get_bayes_probability(0.97, 0.001)
Out[36]:
0.031351001939237205
In [37]:
get_bayes_probability(0.97, 0.01)
Out[37]:
0.24619289340101508
In [38]:
get_bayes_probability(0.97, 0.1)
Out[38]:
0.7822580645161289
In [39]:
get_bayes_probability(0.99, 0.001)
Out[39]:
0.09016393442622944
In [40]:
get_bayes_probability(0.50, 0.001)
Out[40]:
0.001
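
One way to make the first result less surprising is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical population of 100,000 and, like `get_bayes_probability`, uses the single accuracy value for both sick and healthy people. The function name `count_outcomes` and the population size are made up for illustration.

```python
def count_outcomes(acc, inc, population=100_000):
    """Split a hypothetical population into true and false positives."""
    sick = population * inc
    healthy = population - sick
    true_pos = sick * acc            # sick people who test positive
    false_pos = healthy * (1 - acc)  # healthy people who test positive
    return true_pos, false_pos

true_pos, false_pos = count_outcomes(0.97, 0.001)
print(round(true_pos), round(false_pos))  # 97 2997
print(true_pos / (true_pos + false_pos))  # approximately 0.0314
```

With so few sick people in the population, the false positives among the healthy majority far outnumber the true positives, which is why the probability stays near 3% even though the test is 97% accurate.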

Exercise 3: Plot

You will create two plots in this section. For a fixed incidence rate, plot the Bayes probability as the accuracy of the test ranges from 0 to 100%.

Then, for a fixed accuracy, plot the bayes probability as the incidence rate increases.

Note: to avoid division-by-zero errors, you'll probably want to stop short of 0 and 1.
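
To see where the division by zero comes from: the denominator $P(B)$ is a sum of two non-negative terms, so it vanishes only at the corners `acc = 1, inc = 0` and `acc = 0, inc = 1`. A minimal sketch, repeating the Exercise 2 formula in compact form so the snippet runs on its own (the helper name `bayes` is illustrative):

```python
def bayes(acc, inc):
    """Same formula as Exercise 2, written compactly."""
    prob_pos = acc * inc + (1 - acc) * (1 - inc)
    return acc * inc / prob_pos

# The first two pairs make P(B) zero; the last two are safe edge values
for acc, inc in [(1.0, 0.0), (0.0, 1.0), (0.97, 0.0), (0.0, 0.001)]:
    try:
        print(acc, inc, bayes(acc, inc))
    except ZeroDivisionError:
        print(acc, inc, "undefined: P(B) is zero")
```

Sweeping one parameter while the other is held strictly between 0 and 1 never hits these corners, so stopping short of the endpoints is a precaution rather than a requirement.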

State a conclusion about the results. What's the correlation? What do you observe? What do you think about accuracy measures for tests now?

Hint: create two Python lists X and Y of the same length, one holding the X values and the other the Y values. List comprehensions are the best way to do this in Python, though a for loop is fine too (append to an initially empty list).

Then use plt.plot(X,Y).

In [41]:
from matplotlib import pyplot as plt
In [42]:
acc = 0.97  # fixed accuracy; the incidence varies
X = [i / 100 for i in range(1, 100)]
Y = [get_bayes_probability(acc, x) for x in X]
In [43]:
plt.plot(X,Y);
In [44]:
inc = 0.97  # fixed incidence rate; the accuracy varies
X = [i / 100 for i in range(1, 100)]
Y = [get_bayes_probability(a, inc) for a in X]
In [45]:
plt.plot(X,Y);

Now go back and beautify your plots: add a title, a legend, and axis labels. Maybe read about matplotlib styles and change up the colors, or try a different type of plot. Just experiment. Results below.
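
A minimal sketch of one possible beautified plot, with a title, axis labels, and a legend. It repeats the Exercise 2 formula under the illustrative name `bayes` so that it runs on its own, and the choice of accuracies (0.90, 0.97, 0.99) is an assumption made for the example.

```python
import matplotlib.pyplot as plt

def bayes(acc, inc):
    """Same formula as get_bayes_probability in Exercise 2."""
    prob_pos = acc * inc + (1 - acc) * (1 - inc)
    return acc * inc / prob_pos

incidences = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99

fig, ax = plt.subplots()
for acc in (0.90, 0.97, 0.99):  # example accuracies (assumed)
    probs = [bayes(acc, inc) for inc in incidences]
    # The label is what ax.legend() displays for this curve
    ax.plot(incidences, probs, label=f"accuracy = {acc:.2f}")

ax.set_title("Bayes probability as a function of incidence")
ax.set_xlabel("Incidence rate")
ax.set_ylabel("P(sick | positive test)")
ax.legend()
```

In a notebook the figure is displayed automatically; in a plain script, call `plt.show()` at the end.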

In [46]:
# X and Y still hold the accuracy sweep from the cells above
plt.title("Bayes Probability as a Function of Test Accuracy")
plt.xlabel("Test accuracy")
plt.ylabel("Prob of having disease")
plt.plot(X,Y);
In [47]:
# from https://www.dunderdata.com/blog/view-all-available-matplotlib-styles

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-2, 8, .1)
y = .1 * x ** 3 - x ** 2 + 3 * x + 2

fig = plt.figure(dpi=100, figsize=(10, 20), tight_layout=True)
available = ['default'] + plt.style.available
for i, style in enumerate(available):
    with plt.style.context(style):
        ax = fig.add_subplot(10, 3, i + 1)
        ax.plot(x, y)
    ax.set_title(style)
In [48]:
with plt.style.context("fivethirtyeight"):
    # X and Y still hold the accuracy sweep from the cells above
    plt.title("Bayes Probability as a Function of Test Accuracy")
    plt.xlabel("Test accuracy")
    plt.ylabel("Prob of having disease")
    plt.plot(X,Y)
In [49]:
with plt.style.context("ggplot"):
    # X and Y still hold the accuracy sweep from the cells above
    plt.title("Bayes Probability as a Function of Test Accuracy")
    plt.xlabel("Test accuracy")
    plt.ylabel("Prob of having disease")
    plt.plot(X,Y)