In Lectures 1 and 2, we studied the following problem.
Let Θ denote the binary parameter which represents whether I truly have Covid or not (Θ=1 when I have Covid and Θ=0 when I don’t). Let X denote the binary outcome of the Covid test, so that X=1 represents a positive test. We need to calculate the probability

P(Θ=1 ∣ test data, background information),

where the test data is simply X=1, and the background information refers to things like “I have been strictly quarantining for the past 3 weeks”, “I do not have symptoms such as fever”, etc.
We used the probability model (below B stands for background information):

P(Θ=1∣B) = 0.02,  P(X=1∣Θ=1,B) = 0.99,  P(X=1∣Θ=0,B) = 0.04.

Bayes rule then gives

P(Θ=1∣X=1,B) = (0.99 × 0.02) / (0.99 × 0.02 + 0.04 × 0.98) = 0.0198/0.059 ≈ 0.3356.
This probability is not very high even though the test has very good false positive and false negative rates. This is because the prior probability P(Θ=1∣B) is very low (0.02). So, even with the positive test result, it is more likely than not that I am Covid-free.
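As a quick check, the Bayes calculation can be sketched in Python. The prior 0.02 and the false positive rate 0.04 are from the notes; the sensitivity P(X=1∣Θ=1,B) = 0.99 is an assumed value, chosen to be consistent with the posterior 0.3356 quoted later in the notes.

```python
# Posterior probability of Covid given a positive test, via Bayes rule.
prior = 0.02  # P(Theta=1 | B), from the notes
sens = 0.99   # P(X=1 | Theta=1, B): assumed sensitivity
fpr = 0.04    # P(X=1 | Theta=0, B): false positive rate, from the notes

# Law of Total Probability for the marginal P(X=1 | B)
marginal = sens * prior + fpr * (1 - prior)

# Bayes rule for the posterior P(Theta=1 | X=1, B)
posterior = sens * prior / marginal
print(round(posterior, 4))  # 0.3356
```

The low prior dominates: the marginal P(X=1∣B) = 0.059 is driven mostly by false positives coming from the 98% of Covid-free cases.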
On the other hand, if we pose the problem as hypothesis testing:
H0: Θ=0 versus H1: Θ=1
and calculate the p-value as
P{X=1∣H0}=P(X=1∣Θ=0)=0.04,
we get a different result (if we use the standard cutoff 0.05 on the p-value). This leads to rejection of the null hypothesis and declaring that I have Covid.
The use of p-values has been linked to serious issues such as lack of reproducibility (see, for example, the paper "The reproducibility of research and the misinterpretation of p-values" by David Colquhoun). In this context, we can calculate the probability that a second test reproduces the positive result as follows. Let X2 denote the outcome of the second test (and X1=X will now denote the outcome of the first test). We assign

P(X1=x1, X2=x2 ∣ Θ=θ, B) = P(X1=x1 ∣ Θ=θ, B) · P(X2=x2 ∣ Θ=θ, B).
This assumption means that, conditional on my Covid status Θ, the two test outcomes X1 and X2 are independent. Using this assignment, it is straightforward to calculate the reproducibility probability by the Law of Total Probability (note that we already calculated P(Θ=1∣X1=1,B) = 1 − P(Θ=0∣X1=1,B) = 0.3356):

P(X2=1 ∣ X1=1, B) = P(X2=1∣Θ=1,B) · P(Θ=1∣X1=1,B) + P(X2=1∣Θ=0,B) · P(Θ=0∣X1=1,B).
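A minimal sketch of this Law of Total Probability step, again assuming sensitivity 0.99 for each test (the posterior value 0.3356 is from the notes):

```python
# Reproducibility probability P(X2=1 | X1=1, B) via the Law of Total
# Probability over Theta, using conditional independence of X1 and X2
# given Theta.
sens = 0.99     # assumed P(X2=1 | Theta=1, B)
fpr = 0.04      # P(X2=1 | Theta=0, B), from the notes
post1 = 0.3356  # P(Theta=1 | X1=1, B), computed earlier in the notes

repro = sens * post1 + fpr * (1 - post1)
print(round(repro, 4))  # about 0.36
```

So under this model, even after a positive first test, a second test reproduces the positive result only about 36% of the time.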
In Bayesian statistics, the rules of probability are used mostly for the following:

1. to compute the marginal distribution of X based on knowledge of the conditional distribution of X given Θ=θ (i.e., the likelihood) as well as the marginal distribution of Θ (i.e., the prior);

2. to compute the conditional distribution of Θ given X=x (i.e., the posterior) based on the same knowledge of the conditional distribution of X given Θ=θ (i.e., the likelihood) and the marginal distribution of Θ (i.e., the prior).
The formula for the first item above is sometimes called the Law of Total Probability (LTP), while the formula for the second item is called the Bayes Rule. The precise formulae differ according to whether X and Θ are discrete or continuous. It is natural to consider the following four separate cases.
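The two operations can be written generically for the case where both Θ and X are discrete; the function names and the dictionary representation of the prior and likelihood below are illustrative choices, not notation from the notes.

```python
# Law of Total Probability and Bayes rule for discrete Theta and X.
# `prior` maps theta -> P(Theta = theta);
# `lik(x, theta)` returns P(X = x | Theta = theta).
def marginal(x, prior, lik):
    """LTP: P(X=x) = sum over theta of P(X=x | theta) * P(theta)."""
    return sum(lik(x, th) * p for th, p in prior.items())

def posterior(x, prior, lik):
    """Bayes rule: the distribution P(Theta = theta | X = x)."""
    m = marginal(x, prior, lik)
    return {th: lik(x, th) * p / m for th, p in prior.items()}

# Usage with the Covid test example (sensitivity 0.99 assumed):
prior = {0: 0.98, 1: 0.02}
table = {(1, 1): 0.99, (0, 1): 0.01, (1, 0): 0.04, (0, 0): 0.96}
lik = lambda x, th: table[(x, th)]
print(posterior(1, prior, lik))  # posterior after a positive test
```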
f0 is the standard normal density and f1 is a Laplace (double-exponential) density, scaled so that both densities have the same maximal value of 1/√(2π). Based on the information given, calculate the conditional distribution of Θ given X1=x1, X2=x2, …, X6=x6 (i.e., n=6), where x1,…,x6 are the data values given in (7).
Here is the statistical context for this question. We observe data x1,…,xn with n=6. We want to use one of the models f0 or f1 for this data. The random variable Θ is used to describe the choice of the model. We want to treat both the models on an equal footing so we assumed that Θ has the uniform prior distribution on {0,1}.
To calculate the conditional distribution of Θ given the data, we use formula (6) because Θ is discrete and the data X1,…,Xn are continuous. With the uniform prior P{Θ=0} = P{Θ=1} = 1/2, this gives

P{Θ=0 ∣ X1=x1,…,Xn=xn} = f0(x1)⋯f0(xn) / (f0(x1)⋯f0(xn) + f1(x1)⋯f1(xn)),

and similarly for P{Θ=1 ∣ X1=x1,…,Xn=xn} with f1(x1)⋯f1(xn) in the numerator.
Plugging the data values given in (7) for x1,…,x6 into the above formula, we obtain
P{Θ=0∣X1=x1,…,X6=x6}=0.72 and P{Θ=1∣X1=x1,…,X6=x6}=0.28
Thus, conditional on the data, the normal model has probability 72% compared to 28% for the Laplace model. So we would prefer to use the normal distribution here.
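The model-comparison calculation can be sketched as follows. The data list below is a made-up placeholder (the actual values from (7) are not used), so the resulting probabilities will not match 0.72 and 0.28; the Laplace scale √(π/2) is chosen so that both densities peak at 1/√(2π).

```python
import math

# Posterior model probabilities for Theta in {0: normal, 1: Laplace}
# under a uniform prior on {0, 1}.
b = math.sqrt(math.pi / 2)  # Laplace scale matching the normal peak

def f0(x):  # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f1(x):  # Laplace density with scale b
    return math.exp(-abs(x) / b) / (2 * b)

def model_posterior(data):
    l0 = math.prod(f0(x) for x in data)  # likelihood under the normal
    l1 = math.prod(f1(x) for x in data)  # likelihood under the Laplace
    total = l0 + l1                      # uniform prior cancels
    return l0 / total, l1 / total        # P(Theta=0|data), P(Theta=1|data)

data = [0.3, -0.8, 1.1, 0.2, -0.5, 0.9]  # ILLUSTRATIVE placeholder data
p_normal, p_laplace = model_posterior(data)
print(round(p_normal, 2), round(p_laplace, 2))
```

Because the uniform prior cancels from numerator and denominator, only the two likelihoods matter; for longer datasets it is numerically safer to work with sums of log densities rather than raw products.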
Now suppose that we add in an additional observation x7=5. It can be checked that
P{Θ=0∣X1=x1,…,X7=x7}=0.001 and P{Θ=1∣X1=x1,…,X7=x7}=0.999
Now there is an overwhelming preference for the Laplace model. This is because x7=5 is an outlying observation to which the Laplace density assigns much higher probability than the normal density, owing to the heavier tails of the Laplace distribution.
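To see the effect of the tails numerically, compare the two densities at the outlier (a sketch; the Laplace scale √(π/2) is the parametrization assumed here, matching the equal-peak-height condition):

```python
import math

# Likelihood ratio f1(5) / f0(5) at the outlying observation x7 = 5.
b = math.sqrt(math.pi / 2)  # Laplace scale so both densities peak at 1/sqrt(2*pi)
f0 = math.exp(-5.0 ** 2 / 2) / math.sqrt(2 * math.pi)  # normal density at 5
f1 = math.exp(-5.0 / b) / (2 * b)                      # Laplace density at 5
ratio = f1 / f0
print(round(ratio))
```

The ratio is on the order of several thousand, which is easily enough to overturn the 72%/28% preference built up by the first six observations.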