
STAT 238 - Bayesian Statistics Lecture Two

Spring 2026, UC Berkeley

In the last lecture, we started with the following simple example of Bayesian inference.

Example 1: Testing and Covid

Let $\Theta$ denote the binary parameter which represents whether I truly have Covid or not ($\Theta = 1$ when I have Covid and $\Theta = 0$ when I don't). Let $X$ denote the binary outcome of the Covid test, so that $X = 1$ represents a positive test. We need to calculate the probability:

$$\P\left\{\Theta = 1 \mid \text{test data and background information}\right\}$$

where the test data is simply $X = 1$, and the background information refers to things like "I have been strictly quarantining for the past 3 weeks", "I do not have symptoms such as fever", etc.

We used the probability model (below, $B$ stands for background information):

$$\begin{split} & \text{prior: } \P(\Theta = 1 \mid B) = 0.02 \\ & \text{likelihood: } \P(X = 1 \mid \Theta = 1, B) = 0.99, \quad \P(X = 1 \mid \Theta = 0, B) = 0.04. \end{split}$$

With these probability assignments, we use Bayes' rule to compute the probability above as

$$\begin{align*} \P(\Theta = 1 \mid X = 1, B) &= \frac{\P(X = 1 \mid \Theta = 1, B)\, \P(\Theta = 1 \mid B)}{\P(X = 1 \mid \Theta = 1, B)\, \P(\Theta = 1 \mid B) + \P(X = 1 \mid \Theta = 0, B)\, \P(\Theta = 0 \mid B)} \\ &= \frac{0.99 \times 0.02}{0.99 \times 0.02 + 0.04 \times 0.98} \approx 0.3356. \end{align*}$$

This probability is not very high even though the test has very good false positive and false negative rates. This is because the prior probability $\P(\Theta = 1 \mid B)$ is very low (0.02). So, even with the positive test result, it is more likely than not that I am Covid-free.
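This calculation is easy to check numerically. Here is a short Python sketch of the Bayes rule computation, using only the prior and likelihood values assumed above:

```python
# Bayes' rule for P(Theta = 1 | X = 1, B), with the values from the text.
prior = 0.02   # P(Theta = 1 | B)
sens = 0.99    # P(X = 1 | Theta = 1, B)
fpr = 0.04     # P(X = 1 | Theta = 0, B)

evidence = sens * prior + fpr * (1 - prior)   # P(X = 1 | B)
posterior = sens * prior / evidence           # P(Theta = 1 | X = 1, B)

print(round(posterior, 4))  # 0.3356
```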

Here is an alternative method of reasoning about this problem. We can formulate it as a hypothesis testing problem with

$$H_0 : \Theta = 0 \quad \text{versus} \quad H_1 : \Theta = 1.$$

The null hypothesis represents not having Covid. The $p$-value in the above testing problem equals:

$$\P\{X = 1 \mid H_0\} = \P(X = 1 \mid \Theta = 0) = 0.04.$$

Using the naive standard cutoff of 0.05 on the $p$-value would now lead to rejecting the null hypothesis and declaring that I have Covid. On the other hand, the previous argument (based on probability theory) gave a much higher probability to me not having Covid.

This $p$-value based method does not even make use of the information given on $\P(\Theta = 1 \mid B)$ and $\P(X = 1 \mid \Theta = 1, B)$. It only makes use of $\P(X = 1 \mid \Theta = 0)$. Note that what we are after is $\P(\Theta = 0 \mid X = 1)$ while the $p$-value is $\P\{X = 1 \mid \Theta = 0\}$. In general, $\P(A \mid B)$ and $\P(B \mid A)$ can be quite different. The correct way of relating them is via Bayes' rule. Without using Bayes' rule, one cannot argue that $\P(A \mid B)$ is large or small based on the largeness or smallness of $\P(B \mid A)$.

Consider, for example, the case where $A$ represents the event that a person is dead and $B$ represents the event that they were hanged. Here $\P(A \mid B)$ is quite close to one while $\P(B \mid A)$ is quite close to zero.

It is therefore quite problematic to say something about $\Theta = 0 \mid X = 1$ from $\P(X = 1 \mid \Theta = 0)$ alone.
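To see the gap concretely with the numbers from Example 1: the $p$-value $\P(X = 1 \mid \Theta = 0) = 0.04$ is small, yet the probability we actually care about, $\P(\Theta = 0 \mid X = 1, B)$, is large. A minimal numeric sketch:

```python
# Contrast P(X = 1 | Theta = 0) (the p-value) with P(Theta = 0 | X = 1, B),
# computed via Bayes' rule from the numbers in Example 1.
prior = 0.02   # P(Theta = 1 | B)
sens = 0.99    # P(X = 1 | Theta = 1, B)
fpr = 0.04     # P(X = 1 | Theta = 0, B): the p-value

evidence = sens * prior + fpr * (1 - prior)     # P(X = 1 | B)
p_no_covid_given_pos = fpr * (1 - prior) / evidence

print(fpr)                             # 0.04: below the 0.05 cutoff
print(round(p_no_covid_given_pos, 4))  # 0.6644: still likely Covid-free
```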

Methods such as testing based on $p$-values (and putting arbitrary cutoffs on them) are not based on probability theory. They are examples of frequentist reasoning.

Example 2: Spots on a patient

Here the unknown parameter is $\Theta$, which represents the disease status. $\Theta$ can take three values: smallpox, chickenpox, or neither of them.

The data is that the patient has spots.

We need to calculate the probability:

$$\P\{\Theta = \mathrm{smallpox} \mid \mathrm{spots}, B\}$$

where $B$ again represents background information describing other symptoms (e.g., fever) that the patient has. Here is one probability assignment which allows us to calculate this probability:

$$\P\{\mathrm{spots} \mid \mathrm{smallpox}, B\} = 0.9, \quad \P\{\mathrm{spots} \mid \mathrm{chickenpox}, B\} = 0.8, \quad \P\{\mathrm{spots} \mid \mathrm{neither}, B\} = 0$$

and

$$\P\{\mathrm{smallpox} \mid B\} = 0.001, \quad \P\{\mathrm{chickenpox} \mid B\} = 0.1, \quad \P\{\mathrm{neither} \mid B\} = 0.899.$$

Here “neither” refers to an underlying cause for the patient’s condition that is neither smallpox nor chickenpox.

Using this assignment, the required probability can be calculated via Bayes' rule, and this leads to

$$\P\{\mathrm{smallpox} \mid \mathrm{spots}, B\} \approx 0.011, \quad \P\{\mathrm{chickenpox} \mid \mathrm{spots}, B\} \approx 0.988, \quad \P\{\mathrm{neither} \mid \mathrm{spots}, B\} = 0.$$

So probability theory with the above prior and likelihood assignments says that it is highly likely that the patient has chickenpox (smallpox is essentially ruled out because it is extremely rare).
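The same posterior computation can be sketched in a few lines of Python, using the assignments above; the final two lines also preview how the posterior mode can disagree with the raw likelihood maximum:

```python
# Posterior over disease status given spots, via Bayes' rule,
# using the prior and likelihood assignments from the text.
prior = {"smallpox": 0.001, "chickenpox": 0.1, "neither": 0.899}
lik = {"smallpox": 0.9, "chickenpox": 0.8, "neither": 0.0}  # P(spots | status, B)

evidence = sum(lik[s] * prior[s] for s in prior)  # P(spots | B)
posterior = {s: lik[s] * prior[s] / evidence for s in prior}

for s in posterior:
    print(s, round(posterior[s], 3))

# The status with the largest P(spots | status, B) need not be the
# status with the largest posterior probability.
print(max(lik, key=lik.get))              # smallpox
print(max(posterior, key=posterior.get))  # chickenpox
```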

Alternative Solution: Maximum Likelihood

Here is an alternative way of solving this problem using maximum likelihood estimation. The maximum likelihood estimate in this case is $\mathrm{smallpox}$ because smallpox gives a higher probability (0.9) to the observed data (spots) than chickenpox does (0.8). Maximum likelihood (widely used in statistics) is not based on probability theory, and it also seems to rely on the wrong conditional probabilities $\P\{\mathrm{spots} \mid \mathrm{smallpox}\}$ and $\P\{\mathrm{spots} \mid \mathrm{chickenpox}\}$, while what we really should be calculating is $\P\{\mathrm{smallpox} \mid \mathrm{spots}\}$ and $\P\{\mathrm{chickenpox} \mid \mathrm{spots}\}$.

For the modeling part in Bayesian applications, we shall for the most part use standard models based on normal distributions and, more generally, exponential families. Here is a simple example to illustrate the use of the normal distribution.

Example 3: Inference from measurements

Suppose a scientist makes 6 numerical measurements 26.6, 38.5, 34.4, 34, 31, 23.6 of an unknown real-valued physical quantity $\theta$. On the basis of these measurements, what can be inferred about $\theta$?

Here is the Bayesian solution to this problem. The first step is modeling, where we have to write the likelihood and prior. The likelihood represents the probability of the observed data conditional on parameter values. Here the main parameter is $\theta$. In order to write the probability of the observed data, it is helpful to introduce another parameter $\sigma$ which represents the scale of the noise inherent in the measurement process.

So our parameter vector is $(\theta, \sigma)$. We work with the normal likelihood:

$$\text{Likelihood} = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \theta)^2}{2\sigma^2}\right),$$

where $n = 6$ and $x_1 = 26.6, x_2 = 38.5, x_3 = 34.4, x_4 = 34, x_5 = 31, x_6 = 23.6$ denote the observed data points. More formally, you can arrive at this likelihood in the following way. Denote the potential measurements by $X_1, \dots, X_n$. Each actual measurement will have some rounding error, so the data point 26.6 should be understood as belonging to the interval $[26.6 - \delta, 26.6 + \delta]$ for some small rounding error $\delta$. So the likelihood is:

$$\begin{align*} \text{likelihood} &= \P\{\text{observed data} \mid \theta, \sigma\} \\ &= \P\left\{X_1 \in [x_1 - \delta, x_1 + \delta], \dots, X_n \in [x_n - \delta, x_n + \delta] \mid \theta, \sigma\right\}. \end{align*}$$

Assuming $\delta$ is small, we can use the probability-density approximation (each interval has length $2\delta$) to write

$$\text{likelihood} \approx (2\delta)^n\, f_{X_1, \dots, X_n \mid \theta, \sigma}(x_1, \dots, x_n).$$

We are now assuming that:

$$f_{X_1, \dots, X_n \mid \theta, \sigma}(x_1, \dots, x_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \theta)^2}{2\sigma^2}\right).$$

This leads to the likelihood above (note that the constant factor involving $\delta$ is dropped, as it is a constant of proportionality which does not affect any further calculations).
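As a concrete check, here is a small Python sketch that evaluates the log of this normal likelihood at trial parameter values; the trial values of $\sigma$ are illustrative choices, not part of the example.

```python
import math

# The six measurements from the example.
data = [26.6, 38.5, 34.4, 34.0, 31.0, 23.6]

def log_likelihood(theta, sigma, xs=data):
    """Log of the normal likelihood: sum over i of log N(x_i; theta, sigma^2)."""
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - theta) ** 2 for x in xs) / (2 * sigma ** 2))

# For any fixed sigma, the likelihood is maximized over theta at the sample mean.
theta_bar = sum(data) / len(data)
print(round(theta_bar, 2))  # 31.35
print(log_likelihood(theta_bar, 5.0) > log_likelihood(20.0, 5.0))  # True
```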

We will complete this example in the next lecture.