Bayes' Theorem (Key)
Bayes' Theorem gives us a way to invert conditional probabilities, that is, to obtain $P(A|B)$ from $P(B|A)$. The formula follows from the definition of conditional probability:
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$$
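Multiplying both sides by $P(B)$, and applying the same definition with $A$ and $B$ swapped ($P(B|A) = P(A \cap B)/P(A)$), gives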
$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$$
Solving for $P(A|B)$ (assuming $P(B) > 0$), we get
$$P(A|B) = \dfrac{P(B|A)P(A)}{P(B)}$$
This is the final form, but in practice $P(B)$ is usually not given directly, so you compute it with the law of total probability:
$$P(B) = P(B|A)P(A) + P(B|\overline{A})P(\overline{A})$$
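This says that $B$ splits into the two disjoint pieces $B \cap A$ and $B \cap \overline{A}$, so $P(B) = P(B \cap A) + P(B \cap \overline{A})$, and each piece is rewritten with the multiplication rule above. Since $A$ either occurs or does not, these are the only two options.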
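As a quick sanity check of the derivation, the sketch below applies both formulas to a small finite sample space: one roll of a fair six-sided die, with the hypothetical events $A$ = "the roll is even" and $B$ = "the roll is at least 4". Exact fractions are used so both sides can be compared without rounding. The helpers prob and cond are illustrative names, not part of any library.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die (uniform distribution).
omega = set(range(1, 7))
A = {x for x in omega if x % 2 == 0}  # event A: the roll is even
B = {x for x in omega if x >= 4}      # event B: the roll is at least 4
not_A = omega - A

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """P(event | given) from the definition P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

# Law of total probability for P(B)
p_B_total = cond(B, A) * prob(A) + cond(B, not_A) * prob(not_A)
# Bayes' theorem for P(A | B)
p_A_given_B = cond(B, A) * prob(A) / p_B_total

print(p_B_total, prob(B))       # 1/2 1/2
print(p_A_given_B, cond(A, B))  # 2/3 2/3
```

Both routes agree: the total-probability sum reproduces $P(B)$ computed directly, and Bayes' formula reproduces $P(A|B)$ computed from the definition.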
Exercise 1: Plot a Venn Diagram
Using matplotlib, draw a simple Venn diagram representing two sets $A$ and $B$ with a non-empty intersection.
!pip install matplotlib_venn
# run this and then delete the cell so it doesn't get saved
# Solution adapted from a Google Search generative-AI answer
import matplotlib.pyplot as plt
from matplotlib_venn import venn2

# Two-set Venn diagram; the sets share 3, 4 and 5, so the intersection is non-empty
set_a = {1, 2, 3, 4, 5, 10, 20}
set_b = {3, 4, 5, 6, 7}
venn2([set_a, set_b], ('A', 'B'))
plt.show()
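If installing matplotlib_venn is not an option, a comparable picture can be drawn with plain matplotlib by overlapping two semi-transparent circles. This is only a sketch: the centers, radius, colors and region labels are arbitrary choices, not something the exercise prescribes.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Two overlapping, semi-transparent circles stand in for the sets A and B.
fig, ax = plt.subplots(figsize=(5, 4))
circle_a = Circle((-0.5, 0.0), 1.0, color="tab:blue", alpha=0.4)
circle_b = Circle((0.5, 0.0), 1.0, color="tab:orange", alpha=0.4)
ax.add_patch(circle_a)
ax.add_patch(circle_b)

# Label the three regions: A only, the intersection, B only
ax.text(-1.0, 0.0, "A only", ha="center", va="center")
ax.text(0.0, 0.0, r"$A \cap B$", ha="center", va="center")
ax.text(1.0, 0.0, "B only", ha="center", va="center")

# Equal aspect ratio keeps the circles round; hide the axes for a cleaner look
ax.set_xlim(-2, 2)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect("equal")
ax.axis("off")
ax.set_title("Two sets A and B with a non-empty intersection")
```

Unlike venn2, this version does not size the regions by set cardinality; it just illustrates the overlap.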
Exercise 2: Compute Bayes' Probabilities
We want to replicate the computation carried out in class. A doctor performs a test with a given accuracy for a disease with a given incidence rate; determine the probability that a randomly selected person with a positive test result actually has the disease. Accuracy and incidence are given as inputs, both in the range $(0,1]$. The solution below uses the accuracy both as the probability that a sick person tests positive and as the probability that a healthy person tests negative.
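def get_bayes_probability(acc, inc):
    """Return P(sick | positive test) for test accuracy `acc` and incidence `inc`."""
    prob_pos_given_sick = acc          # true-positive rate
    prob_pos_given_not_sick = 1 - acc  # false-positive rate
    prob_sick = inc
    prob_not_sick = 1 - inc
    # Law of total probability: P(+) = P(+|sick)P(sick) + P(+|not sick)P(not sick)
    prob_pos = (prob_pos_given_sick * prob_sick
                + prob_pos_given_not_sick * prob_not_sick)
    # Bayes' theorem: P(sick|+) = P(+|sick)P(sick) / P(+)
    bayes = prob_pos_given_sick * prob_sick / prob_pos
    return bayes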
Check some results below; the first one reproduces the example from class.
get_bayes_probability(0.97,0.001)
0.031351001939237205
get_bayes_probability(0.97,0.01)
0.24619289340101508
get_bayes_probability(0.97,0.1)
0.7822580645161289
get_bayes_probability(0.99,0.001)
0.09016393442622944
get_bayes_probability(0.50,0.001)
0.001
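As an independent check on get_bayes_probability, the sketch below simulates a large population and measures what fraction of people with a positive test are actually sick. It assumes the same model as the function (the test is correct with probability equal to the accuracy, for sick and healthy people alike); the function name simulate_bayes_probability, the population size and the seed are illustrative choices. The closed-form value is computed inline so the block runs on its own.

```python
import numpy as np

def simulate_bayes_probability(acc, inc, n=1_000_000, seed=0):
    """Monte Carlo estimate of P(sick | positive), assuming the test is
    correct with probability `acc` for sick and healthy people alike."""
    rng = np.random.default_rng(seed)
    sick = rng.random(n) < inc          # each person is sick with probability inc
    test_correct = rng.random(n) < acc  # each result is right with probability acc
    # Sick people test positive when the test is correct; healthy people when it is wrong
    positive = np.where(sick, test_correct, ~test_correct)
    return sick[positive].mean()        # fraction of positives who are sick

accuracy, incidence = 0.97, 0.01
closed_form = accuracy * incidence / (
    accuracy * incidence + (1 - accuracy) * (1 - incidence))
estimate = simulate_bayes_probability(accuracy, incidence)
print(f"formula: {closed_form:.4f}  simulation: {estimate:.4f}")
```

With a million simulated people the estimate should agree with the formula to roughly two decimal places; a larger n tightens the agreement.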
Exercise 3: Plot
You will create two plots in this section. For a fixed incidence rate, plot the Bayes probability as the accuracy of the test ranges from 0 to 100%.
Then, for a fixed accuracy, plot the Bayes probability as the incidence rate increases.
Note: to avoid division-by-zero (1/0) errors, you will probably want to stop short of 0 and 1.
State a conclusion about the results. What's the correlation? What do you observe? What do you think about accuracy measures for tests now?
Hint: create two arrays X and Y (Python lists) of the same length, one holding the x values and the other the y values. List comprehensions are the most idiomatic way to do this in Python, though a for loop that appends to an initially empty list is fine too. Then use plt.plot(X, Y).
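To see exactly where a 1/0 error can occur, write $a$ for the accuracy and $i$ for the incidence (shorthand introduced here). The denominator of the Bayes probability is the law of total probability:
$$P(+) = a\,i + (1-a)(1-i)$$
Both terms are nonnegative, so $P(+) = 0$ only when $a\,i = 0$ and $(1-a)(1-i) = 0$ hold at the same time:
$$(a = 0 \text{ or } i = 0) \text{ and } (a = 1 \text{ or } i = 1) \iff (a, i) \in \{(0, 1),\ (1, 0)\}$$
So sweeping one parameter across $[0, 1]$ while holding the other strictly between 0 and 1 never divides by zero; the problem appears only when both parameters sit at opposite extremes.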
from matplotlib import pyplot as plt
acc = 0.97
X = [i/100.0 for i in range(1,100)]
Y = [get_bayes_probability(acc, inc/100) for inc in range(1,100)]
plt.plot(X,Y);
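inc = 0.97  # fixed incidence rate
# Separate names so X and Y keep the incidence sweep that the plots below are labeled for
X_acc = [i / 100 for i in range(1, 100)]  # accuracy from 1% to 99%
Y_acc = [get_bayes_probability(a, inc) for a in X_acc]
plt.plot(X_acc, Y_acc);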
Now go back and beautify your plots: add a title, a legend and axis labels. Read about matplotlib styles and change up the colors, try a different type of plot, and generally experiment. Results below.
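acc = 0.97  # recompute the incidence sweep so the labels match the data
X = [i / 100 for i in range(1, 100)]
Y = [get_bayes_probability(acc, inc) for inc in X]
plt.title("Bayes Probability as a function of Incidence")
plt.xlabel("Incidence Rate of Infection")
plt.ylabel("P(disease | positive test)")
plt.plot(X, Y, label=f"accuracy = {acc}")
plt.legend();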
# from https://www.dunderdata.com/blog/view-all-available-matplotlib-styles
import math
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(-2, 8, .1)
y = .1 * x ** 3 - x ** 2 + 3 * x + 2
available = ['default'] + plt.style.available
n_rows = math.ceil(len(available) / 3)  # 3 columns, enough rows for every style
fig = plt.figure(dpi=100, figsize=(10, 2 * n_rows), tight_layout=True)
for i, style in enumerate(available):
    with plt.style.context(style):
        ax = fig.add_subplot(n_rows, 3, i + 1)
        ax.plot(x, y)
        ax.set_title(style)
with plt.style.context("fivethirtyeight"):
    plt.figure()  # start a fresh figure so the style applies to it
    plt.title("Bayes Probability as a function of Incidence")
    plt.xlabel("Incidence Rate of Infection")
    plt.ylabel("P(disease | positive test)")
    plt.plot(X, Y)

with plt.style.context("ggplot"):
    plt.figure()
    plt.title("Bayes Probability as a function of Incidence")
    plt.xlabel("Incidence Rate of Infection")
    plt.ylabel("P(disease | positive test)")
    plt.plot(X, Y)
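One possible "different type of plot" overlays several accuracies on one set of axes and puts incidence on a logarithmic scale, since realistic incidence rates are small. This sketch assumes the same accuracy model as get_bayes_probability (accuracy applies to sick and healthy people alike); the helper name posterior, the incidence range and the accuracy values are illustrative choices. It is self-contained so it runs without the earlier cells.

```python
import numpy as np
import matplotlib.pyplot as plt

def posterior(acc, inc):
    """P(sick | positive), assuming P(+|sick) = P(-|healthy) = acc.
    Works elementwise when inc is a NumPy array."""
    return acc * inc / (acc * inc + (1 - acc) * (1 - inc))

# Incidence from 0.01% to 50%, evenly spaced on a log scale
incidence = np.logspace(-4, np.log10(0.5), 200)

fig, ax = plt.subplots(figsize=(7, 4))
for accuracy in (0.90, 0.97, 0.99, 0.999):
    ax.semilogx(incidence, posterior(accuracy, incidence),
                label=f"accuracy = {accuracy}")
ax.set_title("Bayes probability vs. incidence for several test accuracies")
ax.set_xlabel("Incidence rate (log scale)")
ax.set_ylabel("P(disease | positive test)")
ax.set_ylim(0, 1)
ax.legend()
```

The log axis spreads out the low-incidence region, where the curves for different accuracies separate the most.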