
STAT 238 - Bayesian Statistics Lecture One

What is Bayesian Statistics?

Bayesian Statistics = Probability Theory

This is the central message of the book “Probability Theory: The Logic of Science” by E. T. Jaynes, which is arguably the most important book on Bayesian Statistics.

How does Bayesian Statistics work?

Probability theory works in the following way. We are interested in knowing whether a certain proposition is true, but we do not have access to the full information that would allow us to conclusively determine whether the proposition is true or not. Probability theory allows us to determine a number between 0 and 1 representing how likely it is that the proposition is true based on the available information. This is achieved by the following two steps:

  1. Step One: The available information that we either possess or that we assume for the sake of argument is converted into numerical assignments for the probabilities of certain basic or elementary propositions. This step is often referred to as the modeling step.

  2. Step Two: Based on the probability model, we calculate probabilities of the propositions of interest using the rules of probability.

In the context of Bayesian statistics, the unknown proposition is usually written in terms of a variable $\Theta$, e.g., $\Theta \in A$ for some set $A$. The goal is to calculate the probability:

$$\P \{\Theta \in A \mid \text{observed data and other background information}\}.$$

The modeling step involves specification of:

$$\P\{\text{observed data} \mid \Theta = \theta \text{ and background information} \}$$

as well as

$$\P\{\Theta = \theta \mid \text{background information}\} \text{ or } f_{\Theta \mid \text{background information}}(\theta)$$

It is okay to simply think of all of these as just probabilities. However, it is common to use the following terminology: the distribution of $\Theta$ given only the background information is called the prior model (or prior distribution), the probability of the observed data given $\Theta = \theta$ is called the likelihood, and the probability of $\Theta \in A$ given the observed data is called the posterior probability.

Rules of Probability

Probabilities are assigned to propositions (also known as events). Every probability is conditional on some information (this could be available information or some information that we assume for the sake of argument). We shall denote the probability of a proposition $A$ conditioned on some information $I$ by $\P(A \mid I)$. When the information $I$ is clear from context, we sometimes omit it and write the probability $\P(A \mid I)$ as simply $\P(A)$. Even when we do this, it should always be kept in mind that probabilities are always conditioned on some information.

  1. The probability of a proposition always lies between 0 and 1. The probability of an impossible proposition is 0 and the probability of a certain proposition is 1.

  2. Product Rule: $\P(A \cap B \mid I) = \P(A \mid I)\,\P(B \mid A, I) = \P(B \mid I)\,\P(A \mid B, I)$. Here $A \cap B$ is the proposition: “both $A$ and $B$ are true”. Also $\P(A \mid B, I)$ is the probability of $A$ conditioned on the truth of the proposition $B$ as well as the information $I$. A direct consequence of the product rule is:

    $$\P(B \mid A, I) = \frac{\P(A \mid B, I)\,\P(B \mid I)}{\P(A \mid I)}.$$

    The above formula is known as the Bayes rule.

  3. Sum Rule: If a proposition $A$ is broken down into disjoint propositions $A_1, A_2, \dots$, then

    $$\P(A \mid I) = \sum_{i}\P(A_i \mid I)$$

    Disjoint here means that no two of the $A_i$'s can happen simultaneously. For the sum rule, we need

    $$A \text{ happens} \Leftrightarrow \text{exactly one of } A_i \text{ happens}.$$

We shall see some justification for these rules later. All other rules of probability follow as a consequence of these rules. For example, combining the product rule and the sum rule gives the following form of the Bayes rule:

$$\P\left(B \mid A, I \right) = \frac{\P\left(A \mid B, I \right) \P \left(B \mid I \right)}{\P\left(A \mid B, I \right) \P \left(B \mid I \right) + \P\left(A \mid B^c, I \right) \P \left(B^c \mid I \right)}$$

Example 1: Testing and Covid

Let $\Theta$ denote the binary parameter which represents whether I truly have Covid or not ($\Theta = 1$ when I have Covid and $\Theta = 0$ when I don't). Let $X$ denote the binary outcome of the Covid test so that $X = 1$ represents a positive test. We need to calculate the probability:

$$\P \left\{\Theta = 1 \mid \text{test data} \text{ and } \text{background information} \right\}$$

where the test data is simply $X = 1$, and the background information refers to things like “I have been strictly quarantining for the past 3 weeks”, “I do not have symptoms such as fever”, etc.

In order to calculate this posterior probability, we need to introduce probability assumptions. Consider the following model (below $B$ stands for the background information):

$$\P(\Theta = 1 \mid B) = 0.02, \qquad \P(X = 1\mid \Theta = 1, B) = 0.99, \qquad \P(X = 1 \mid \Theta = 0, B) = 0.04.$$

$\P(\Theta = 1 \mid B)$ represents the probability of Covid based on the background information alone (this is the prior). The fact that it is low (0.02) is meaningful when I know that I have been largely isolating myself for the past few weeks.

$\P(X = 1 \mid \Theta = 1, B) = 0.99$ (the true positive rate) and $\P(X = 1 \mid \Theta = 0, B) = 0.04$ (the false positive rate) represent the likelihood.

With these probability assignments, we use the Bayes rule to compute the posterior probability as

$$\begin{align*} \P(\Theta = 1 \mid X = 1, B) &= \frac{\P(X = 1 \mid \Theta = 1, B) \P(\Theta = 1 \mid B)}{\P(X = 1 \mid \Theta = 1, B) \P(\Theta = 1 \mid B) + \P(X = 1 \mid \Theta = 0, B) \P(\Theta = 0 \mid B)} \\ &= \frac{0.99 \times 0.02}{0.99 \times 0.02 + 0.04 \times 0.98} \approx 0.3356. \end{align*}$$

Thus, under the assumed probability model, there is a 33.56% chance that I am truly Covid positive given the positive test. Note that 0.3356 ($33.56\%$) is not very high even though the test has very good false positive and false negative rates. This is because $\P(\Theta = 1 \mid B)$ (which can be interpreted as the probability of having Covid without taking into account the test result) is very low (0.02).
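The arithmetic in this example can be checked with a short computation (a sketch in Python; the variable names are chosen here for readability and are not part of the lecture):

```python
prior = 0.02   # P(Theta = 1 | B): probability of Covid from background info alone
tpr = 0.99     # P(X = 1 | Theta = 1, B): true positive rate
fpr = 0.04     # P(X = 1 | Theta = 0, B): false positive rate

# Bayes rule, with the denominator expanded over Theta = 1 and Theta = 0
numerator = tpr * prior
posterior = numerator / (numerator + fpr * (1 - prior))

print(round(posterior, 4))  # 0.3356
```

Varying `prior` in this snippet shows how strongly the posterior depends on the background information: with `prior = 0.5` (no isolation, say) the same positive test would give a posterior of about 0.96.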