with $w$ known and fixed (e.g., $w = 0.3$ or $w = 0.7$, as in the simulations) and $\mu_0, \mu_1$ unknown. We first describe the Gibbs sampler in this setting, and then generalize it to the case where $w$ is unknown and the two components have their own variance parameters $\sigma_0^2$ and $\sigma_1^2$.
This posterior cannot be evaluated in closed form and numerical methods need to be used. A standard approach is to use the Gibbs sampler with augmentation. First observe that the model (1) can be rewritten in the following way:
$$z_i \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(w), \qquad y_i \mid z_i = 1 \sim N(\mu_1, 1), \qquad y_i \mid z_i = 0 \sim N(\mu_0, 1).$$
Under the above model, the marginal distribution of $y_i$ (obtained by summing over the two values of $z_i$) coincides with (1). The variables $z_1, \dots, z_n$ can be thought of as unobserved latent variables that indicate which of the two populations (corresponding to the distributions $N(\mu_0, 1)$ and $N(\mu_1, 1)$ respectively) the observation $y_i$ comes from.
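The latent-variable form can be checked numerically. The following is a minimal sketch, assuming NumPy and illustrative parameter values ($w = 0.3$, $\mu_0 = -2$, $\mu_1 = 2$, which are not taken from the lecture's simulations); the function name `simulate_mixture` is hypothetical. It draws $z_i$ first and then $y_i$ given $z_i$, and compares the sample mean of $y$ with the mixture mean $(1 - w)\mu_0 + w\mu_1$.

```python
import numpy as np

def simulate_mixture(n, w, mu0, mu1, rng):
    """Draw (z, y) from the latent-variable form of the two-component mixture."""
    z = rng.binomial(1, w, size=n)          # z_i ~ Bernoulli(w)
    means = np.where(z == 1, mu1, mu0)      # component mean selected by z_i
    y = rng.normal(loc=means, scale=1.0)    # y_i | z_i ~ N(mu_{z_i}, 1)
    return z, y

# Illustrative values only (assumptions, not the lecture's simulation settings).
w, mu0, mu1 = 0.3, -2.0, 2.0
rng = np.random.default_rng(0)
z, y = simulate_mixture(n=100_000, w=w, mu0=mu0, mu1=mu1, rng=rng)

mixture_mean = (1 - w) * mu0 + w * mu1      # mean of the marginal distribution of y_i
print(y.mean(), mixture_mean)
```

The two printed numbers should be close for large $n$, which is consistent with the marginal of $y_i$ being the mixture.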
The Gibbs sampler is used to sample jointly from the posterior of $\mu_0, \mu_1, z_1, \dots, z_n$ given the data. This requires being able to sample from the full conditionals
$$z \mid \mu_0, \mu_1, y \qquad \text{and} \qquad (\mu_0, \mu_1) \mid z, y,$$
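where $y = (y_1, \dots, y_n)$ is the data and $z = (z_1, \dots, z_n)$. These full conditionals can be written in closed form. Given $\mu_0, \mu_1, y_1, \dots, y_n$, the variables $z_1, \dots, z_n$ are independent with
$$P(z_i = 1 \mid \mu_0, \mu_1, y) = \frac{w \, \phi(y_i - \mu_1)}{w \, \phi(y_i - \mu_1) + (1 - w) \, \phi(y_i - \mu_0)},$$
where $\phi$ denotes the standard normal density.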
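The alternation between the two full conditionals can be sketched in code. The $z$-update below follows from Bayes' rule applied to the latent-variable model: $P(z_i = 1 \mid \mu_0, \mu_1, y)$ is proportional to $w\,\phi(y_i - \mu_1)$, with $\phi$ the standard normal density. The $\mu$-update is an assumption: it takes independent $N(m, s^2)$ priors on $\mu_0$ and $\mu_1$, in which case each mean has a normal full conditional. The function names, the default values of `m` and `s`, and the simulated data are all hypothetical and only for illustration.

```python
import numpy as np

def sample_z(y, mu0, mu1, w, rng):
    """Draw each z_i independently from its Bernoulli full conditional."""
    # Log of w * phi(y - mu1) and (1 - w) * phi(y - mu0); shared constants cancel.
    log_p1 = np.log(w) - 0.5 * (y - mu1) ** 2
    log_p0 = np.log1p(-w) - 0.5 * (y - mu0) ** 2
    # p1 = 1 / (1 + exp(log_p0 - log_p1)), computed stably via logaddexp.
    p1 = np.exp(-np.logaddexp(0.0, log_p0 - log_p1))
    return (rng.random(y.shape) < p1).astype(int)

def sample_mu(y, z, k, m, s, rng):
    """Draw mu_k given z, assuming a N(m, s^2) prior and unit noise variance."""
    yk = y[z == k]
    prec = 1.0 / s**2 + yk.size               # posterior precision
    mean = (m / s**2 + yk.sum()) / prec       # posterior mean
    return rng.normal(mean, np.sqrt(1.0 / prec))

def gibbs(y, w, n_iter, mu_init, m=0.0, s=100.0, seed=0):
    """Alternate the z-update and the mu-updates; return draws of (mu0, mu1)."""
    rng = np.random.default_rng(seed)
    mu0, mu1 = mu_init
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        z = sample_z(y, mu0, mu1, w, rng)
        mu0 = sample_mu(y, z, 0, m, s, rng)
        mu1 = sample_mu(y, z, 1, m, s, rng)
        draws[t] = mu0, mu1
    return draws

# Illustrative data (assumed values): w = 0.3, mu0 = -2, mu1 = 2.
rng = np.random.default_rng(1)
z_true = rng.binomial(1, 0.3, size=500)
y = rng.normal(np.where(z_true == 1, 2.0, -2.0), 1.0)

draws = gibbs(y, w=0.3, n_iter=500, mu_init=(-1.0, 1.0))
print(draws[100:].mean(axis=0))   # posterior means after discarding 100 draws
```

Working with log-weights in `sample_z` avoids numerical underflow when an observation is far from one of the means.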
As we saw in the simulations in the last couple of lectures, initialization is very important. The log-likelihood can have multiple modes, only one of which is the correct mode (in the sense of having large likelihood). If the Gibbs sampler is not properly initialized, it may end up sampling around a spurious mode.
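One way to compare candidate modes is to evaluate the mixture log-likelihood at each of them. The sketch below is an illustration under assumed values ($w = 0.3$, $\mu_0 = -2$, $\mu_1 = 2$), not a reproduction of the lecture's simulations; the function name `mixture_loglik` is hypothetical. It compares the data-generating parameters with a label-swapped configuration, used here only as a plausible example of a lower-likelihood region when $w \neq 0.5$.

```python
import numpy as np

def mixture_loglik(y, w, mu0, mu1):
    """Log-likelihood of (1 - w) N(mu0, 1) + w N(mu1, 1) at the data y."""
    log_c0 = np.log1p(-w) - 0.5 * (y - mu0) ** 2
    log_c1 = np.log(w) - 0.5 * (y - mu1) ** 2
    # logaddexp sums the two component densities on the log scale.
    return np.sum(np.logaddexp(log_c0, log_c1)) - 0.5 * y.size * np.log(2 * np.pi)

# Illustrative data (assumed values).
rng = np.random.default_rng(0)
z_true = rng.binomial(1, 0.3, size=500)
y = rng.normal(np.where(z_true == 1, 2.0, -2.0), 1.0)

ll_true = mixture_loglik(y, w=0.3, mu0=-2.0, mu1=2.0)
ll_swapped = mixture_loglik(y, w=0.3, mu0=2.0, mu1=-2.0)   # means exchanged
print(ll_true, ll_swapped)
```

Running the sampler from several starting values and comparing the log-likelihood at the resulting draws is one simple diagnostic for whether a run has settled in a low-likelihood region.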
The proportion parameter $w$ is constrained to lie in $[0, 1]$, so it is natural to take a $\text{Beta}(a, b)$ prior for it. A standard choice here is $a = b = 1$ (which corresponds to the uniform prior). For $\mu_0, \mu_1$, we use the $N(m, s^2)$ prior; a standard choice is $m = 0$ and $s$ very large. The Inverse Gamma prior $IG(\alpha, \beta)$ corresponds to the density
$$f(x) \propto x^{-\alpha - 1} \exp\left(-\frac{\beta}{x}\right) I\{x > 0\}.$$
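The standard uninformative prior for $\sigma$ has density $\propto \sigma^{-1} I\{\sigma > 0\}$. The corresponding prior density for $\sigma^2$ is also $\propto x^{-1} I\{x > 0\}$. This (improper) density is formally the special case of $IG(\alpha, \beta)$ corresponding to $\alpha = \beta = 0$.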
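The claim about $\sigma^2$ follows from a change of variables. As a short worked derivation, write $x = \sigma^2$, so that $\sigma = x^{1/2}$, and recall that the inverse gamma density is proportional to $x^{-\alpha - 1} e^{-\beta / x}$ on $x > 0$:

```latex
\begin{aligned}
p_{\sigma}(\sigma) &\propto \sigma^{-1} I\{\sigma > 0\}, \qquad
\frac{d\sigma}{dx} = \tfrac{1}{2} x^{-1/2}, \\
p_{\sigma^2}(x) &= p_{\sigma}\!\left(x^{1/2}\right) \left| \frac{d\sigma}{dx} \right|
\propto x^{-1/2} \cdot \tfrac{1}{2} x^{-1/2} \, I\{x > 0\}
\propto x^{-1} I\{x > 0\}, \\
x^{-\alpha - 1} e^{-\beta / x} \Big|_{\alpha = \beta = 0} &= x^{-1}.
\end{aligned}
```

Neither density integrates to a finite value, so both are improper priors and the match with $IG(0, 0)$ holds only up to a proportionality constant.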
The prior (12) therefore uses conjugate families while including the standard uninformative choices as special cases. The reason for choosing conjugate families is that the full conditional distributions of the posterior can then be written in closed form. We shall look at the formulae in the next lecture.
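As a small illustration of what conjugacy buys, consider the update for $w$ alone. Given the labels $z$, the likelihood in $w$ is $w^{n_1} (1 - w)^{n_0}$, where $n_1$ and $n_0$ count the labels equal to 1 and 0, so a $\text{Beta}(a, b)$ prior gives a $\text{Beta}(a + n_1, b + n_0)$ full conditional. The sketch below assumes NumPy; the function name `sample_w` and the toy labels are hypothetical, and the remaining full conditionals are left to the formulae of the next lecture.

```python
import numpy as np

def sample_w(z, a, b, rng):
    """Draw w from its full conditional Beta(a + n1, b + n0) given labels z."""
    n1 = int(np.sum(z))        # number of observations assigned to component 1
    n0 = len(z) - n1           # number of observations assigned to component 0
    return rng.beta(a + n1, b + n0)

rng = np.random.default_rng(0)
z = np.array([1] * 300 + [0] * 700)     # toy labels, for illustration only
w_draws = np.array([sample_w(z, a=1.0, b=1.0, rng=rng) for _ in range(5000)])
print(w_draws.mean())                   # compare with (a + n1) / (a + b + n)
```

With the uniform prior $a = b = 1$, the mean of this full conditional is $(1 + n_1) / (2 + n)$, which is close to the sample proportion of ones when $n$ is large.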