We looked at frequentist solutions in the last lecture. These are generally based on p-values. The calculation of these p-values requires consideration of alternative datasets that could have appeared (but did not actually appear). As we discussed last time, such calculations violate the likelihood principle.
Here is a Bayesian solution to this problem. The goal is to calculate:
P{fairness∣data}
where data refers to TTTTHTHTTTTH (n = 12 tosses with 3 heads and 9 tails). By the Bayes rule, we can write

P{fair ∣ data} = P{data ∣ fair} P{fair} / (P{data ∣ fair} P{fair} + P{data ∣ not fair} P{not fair}).     (1)

We assign the prior probabilities P{fair} = P{not fair} = 1/2. Under fairness the tosses are independent with heads probability 1/2, so P{data ∣ fair} = (1/2)^{12} = 1/4096.
This is actually a very strong assumption in favor of fairness, because a coin can fail to be fair in many different ways. To assume that the probability of fairness equals the combined probability of all the ways in which the coin can be non-fair seems quite strong.
Let us now come to P{data ∣ not fair}. If the coin is not fair, we can assume that it has some heads probability p and that the tosses are still independent. Assigning a uniform prior distribution to p over the unit interval, we can then write

P{data ∣ not fair} = ∫_0^1 p^3 (1 − p)^9 dp = (3! × 9!)/13! = 1/2860.     (2)

Plugging (2) together with P{data ∣ fair} = 1/4096 into (1) gives P{fair ∣ data} = 2860/(2860 + 4096) ≈ 0.41.
Note that this Bayesian probability calculation does not depend at all on whether the number of tosses (n=12) was decided a priori or whether it was decided to toss until getting 3 heads. It is the same for both those cases.
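These numbers are easy to verify numerically. Here is a small check in exact rational arithmetic, assuming (as in the analysis above) a prior probability of 1/2 on fairness and a uniform prior on the heads probability p under the non-fair model:

```python
from fractions import Fraction
from math import factorial

# P{data | fair}: 12 independent tosses, each heads with probability 1/2
p_data_fair = Fraction(1, 2**12)  # 1/4096

# P{data | not fair}: integrate p^3 (1 - p)^9 over a uniform prior on p,
# which is the Beta(4, 10) normalizing constant 3! * 9! / 13!
p_data_unfair = Fraction(factorial(3) * factorial(9), factorial(13))  # 1/2860

# Posterior with prior P{fair} = P{not fair} = 1/2
prior = Fraction(1, 2)
p_fair_given_data = (prior * p_data_fair) / (prior * p_data_fair + prior * p_data_unfair)

print(p_data_unfair)             # 1/2860
print(float(p_fair_given_data))  # about 0.41
```

The posterior probability of fairness comes out to roughly 0.41, which is the source of the "60% to 40%" comparison below.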
Also note that the Bayesian approach (based on (1) and (2)) only slightly favors the alternative hypothesis (roughly 60% for the alternative versus 40% for the null), while the frequentist p-values are fairly small, indicating much stronger evidence for the alternative. This discrepancy between the Bayesian and frequentist answers persists even when the sample size is large.
Here is another example of hypothesis testing, or model selection, in the Bayesian framework. This example comes from MacKay (2003, Chapter 28). We are given the sequence of numbers −1, 3, 7, 11 and asked to predict the next number.
Most people would look at the sequence and guess that the next number is 15. In other words, they recognize that the data form an arithmetic progression. If they are willing to consider alternative models, then we can pose the question as one of model selection and use Bayesian methods (probability) to answer it formally.
Here are two possible models:
Model 1: Arithmetic Progression, i.e., a_1 = α and a_{n+1} = a_n + β.
Model 2: Random (will be made precise later).
A Bayesian solution to this problem will attempt to calculate
P{Model_i ∣ data}    for i = 1, 2.
What probability assignments would we need to calculate the above? We can use the Bayes rule to write

P{Model_i ∣ data} = P{data ∣ Model_i} P{Model_i} / (P{data ∣ Model_1} P{Model_1} + P{data ∣ Model_2} P{Model_2}).     (3)

We assign equal prior probabilities P{Model_1} = P{Model_2} = 1/2.
We now need to calculate P{data∣Modeli} for i=1,2. For i=1, we have (below α and β are the parameters in Model 1):
P{data ∣ Model_1} = P{α = −1, β = 4}
To calculate the above, we need to make a probability assignment for the values taken by α and β. MacKay (2003, Chapter 28) assumes that α and β are integer-valued and that they are independently and uniformly distributed over the set {−50, −49, …, 49, 50}, which has cardinality 101. Then

P{data ∣ Model_1} = (1/101)^2 = 101^{-2}.     (4)
For the second model, we need to specify what we mean by “random”. We shall take this to mean that a_1, a_2, a_3, a_4 are independently distributed according to the uniform distribution on {−50, −49, …, 49, 50}. Then

P{data ∣ Model_2} = (1/101)^4 = 101^{-4}.     (5)
Plugging the above values (4) and (5), together with the equal prior probabilities, into (3), we get
P{Model_1 ∣ data} ≈ (101^{-2} × 0.5) / (101^{-2} × 0.5 + 101^{-4} × 0.5) = 0.999902    and    P{Model_2 ∣ data} ≈ 0.000098.
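The arithmetic behind these posterior values can be verified in a few lines; a small sketch in exact rational arithmetic:

```python
from fractions import Fraction

p_data_m1 = Fraction(1, 101**2)  # Model 1: two parameters, each uniform over 101 values
p_data_m2 = Fraction(1, 101**4)  # Model 2: four data points, each uniform over 101 values

prior = Fraction(1, 2)           # equal prior probability on the two models
post_m1 = prior * p_data_m1 / (prior * p_data_m1 + prior * p_data_m2)

print(float(post_m1))      # about 0.999902
print(float(1 - post_m1))  # about 0.000098
```

The equal priors cancel, so the posterior ratio is exactly the ratio of the two marginal likelihoods, 101^2 to 1.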
This analysis clearly favors Model 1 over Model 2. The most interesting feature of this analysis is that
P{Model_1 ∣ data} ≫ P{Model_2 ∣ data}
even though
P{Model_1} = P{Model_2}.
In other words, we did not dogmatically assert that the data were generated by an arithmetic progression; rather, we gave both models a fair chance to explain the observed sequence.
In this example, some people argue in favor of Model 1 on the grounds that Model 1 is “simpler” than Model 2. The Bayesian analysis above (based on probability theory) does not invoke any vague notion of simplicity; it performs formal calculations, which in this case prefer Model 1 to Model 2. In another situation, Model 2 may well be the preferred model.
One can consider other alternative models in this problem. For example, MacKay (2003, Chapter 28) considered the following cubic model:
Model 3 (Cubic): the numbers are generated by the recursion a_1 = a and a_{n+1} = b a_n^3 + c a_n^2 + d for an integer a and rational numbers b, c, d.
This cubic model explains the given data perfectly if and only if its four parameters a, b, c, d are chosen as a = −1, b = −1/11, c = 9/11, d = 23/11. As a result,
P{data ∣ Model_3} = P{a = −1, b = −1/11, c = 9/11, d = 23/11}.
In order to explicitly calculate the above, we need to make probability assignments for a, b, c, d. MacKay (2003) makes the following probability assignment: the four parameters are independent, with a uniform on {−50, −49, …, 49, 50} and b, c, d each having the distribution of x/y, where x ∼ Unif{−50, −49, …, 49, 50} and y ∼ Unif{1, …, 50} are independent. Under this assignment:

P{data ∣ Model_3} = (1/101) × P{b = −1/11} × P{c = 9/11} × P{d = 23/11} = (1/101) × (4/5050) × (4/5050) × (2/5050) ≈ 2.5 × 10^{-12}.
In the above, we used P{b = −1/11} = 4 · (1/101) · (1/50) because −1/11 = −2/22 = −3/33 = −4/44, and each of these four representations has probability (1/101) · (1/50). Similar reasoning gives P{c = 9/11} = 4 · (1/101) · (1/50) (from 9/11, 18/22, 27/33, 36/44) and P{d = 23/11} = 2 · (1/101) · (1/50) (from 23/11 and 46/22; the representation 69/33 is excluded since 69 > 50).
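Both the parameter values and the representation counts can be checked by direct computation. The helper `prob_ratio` below is mine, not from the text; it simply enumerates the prior support described above:

```python
from fractions import Fraction

# Check that a = -1, b = -1/11, c = 9/11, d = 23/11 reproduce -1, 3, 7, 11
b, c, d = Fraction(-1, 11), Fraction(9, 11), Fraction(23, 11)
seq = [Fraction(-1)]
for _ in range(3):
    x = seq[-1]
    seq.append(b * x**3 + c * x**2 + d)
print(seq == [-1, 3, 7, 11])  # True

# P{x/y = target} for x ~ Unif{-50,...,50} (101 values), y ~ Unif{1,...,50}
def prob_ratio(target):
    count = sum(1 for x in range(-50, 51) for y in range(1, 51)
                if Fraction(x, y) == target)
    return Fraction(count, 101 * 50)

print(prob_ratio(b) * 101 * 50)  # 4 representations of -1/11
print(prob_ratio(c) * 101 * 50)  # 4 representations of 9/11
print(prob_ratio(d) * 101 * 50)  # 2 representations of 23/11

p_data_m3 = Fraction(1, 101) * prob_ratio(b) * prob_ratio(c) * prob_ratio(d)
print(float(p_data_m3))  # roughly 2.5e-12
```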
If Model 1, Model 2, and Model 3 are the only three models considered (each with prior probability 1/3, which cancels), the Bayes rule (3) becomes

P{Model_1 ∣ data} = 101^{-2} / (101^{-2} + 101^{-4} + P{data ∣ Model_3}) ≈ 0.999902.
Our preference for Model 1 is still as strong as before (when we only considered the two models Model 1 and Model 2).
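This three-model posterior can be checked numerically; a small sketch, with the Model 3 marginal likelihood built from the representation counts 4, 4, and 2 obtained by direct enumeration:

```python
from fractions import Fraction

# Marginal likelihoods of the observed sequence under the three models
p_m1 = Fraction(1, 101**2)
p_m2 = Fraction(1, 101**4)
# counts of representations x/y: 4 for -1/11, 4 for 9/11, 2 for 23/11
p_m3 = Fraction(1, 101) * Fraction(4, 101 * 50) * Fraction(4, 101 * 50) * Fraction(2, 101 * 50)

prior = Fraction(1, 3)  # equal priors over the three models (they cancel)
total = prior * (p_m1 + p_m2 + p_m3)
posteriors = [float(prior * p / total) for p in (p_m1, p_m2, p_m3)]
print(posteriors)  # Model 1 keeps essentially all the posterior mass
```

Model 3's posterior probability is several orders of magnitude below even Model 2's, so adding it leaves the conclusion unchanged.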
The analysis given here depends on the specific choices of priors used for the three models. One can of course use alternative priors but the qualitative preference for Model 1 is unlikely to change for most reasonable prior choices.
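As a rough illustration of this robustness, one can redo the two-model comparison with the uniform priors supported on sets of different sizes (the analysis above used 101 support points). This is an illustrative sketch, not part of the original analysis:

```python
# Redo the Model 1 vs Model 2 comparison when the uniform priors are
# supported on `support` integers instead of 101.
for support in (21, 101, 1001):
    p_m1 = support ** -2.0  # Model 1: two parameters
    p_m2 = support ** -4.0  # Model 2: four independent draws
    post_m1 = p_m1 / (p_m1 + p_m2)
    print(support, post_m1)  # Model 1's posterior stays above 0.997 throughout
```

The posterior for Model 1 is 1/(1 + N^{-2}) for a support of size N, so the preference for Model 1 only strengthens as the prior support grows.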