
STAT 238 - Bayesian Statistics, Lecture Twenty-Eight

Spring 2026, UC Berkeley

Gibbs Sampler for Mixture Models

We observe real-valued data $y_1, \dots, y_n$ and consider the model:

\begin{align}
y_i \overset{\text{i.i.d.}}{\sim} p N(\mu_1, 1) + (1 - p) N(\mu_2, 1)
\end{align}

with $p$ known and fixed (e.g., $p = 0.3$, as in the simulations) and $\mu_1, \mu_2$ unknown. The Gibbs sampler formulae generalize easily to more realistic cases (e.g., when $p$ is unknown or the variances differ from 1), as we will see later. The log-likelihood corresponding to (1) is:

\begin{align*}
\sum_{i=1}^n \log \left(p \phi(y_i, \mu_1, 1) + (1 - p) \phi(y_i, \mu_2, 1) \right),
\end{align*}

where $\phi(x, \mu, \sigma^2)$ is the normal density with mean $\mu$ and variance $\sigma^2$, evaluated at $x$.
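
As a minimal sketch in standard-library Python, the log-likelihood can be evaluated stably by staying on the log scale and combining the two mixture terms with the log-sum-exp identity. This avoids underflow when both densities are tiny. The function names `log_normal_pdf` and `log_likelihood` are our own illustrative choices and do not come from the lecture.

```python
import math


def log_normal_pdf(x, mu, var):
    """Log of phi(x, mu, var), the N(mu, var) density evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def log_likelihood(y, mu1, mu2, p):
    """Mixture log-likelihood sum_i log(p*phi(y_i, mu1, 1) + (1-p)*phi(y_i, mu2, 1)).

    Assumes 0 < p < 1 so that both log-weights are finite.
    """
    total = 0.0
    for yi in y:
        a = math.log(p) + log_normal_pdf(yi, mu1, 1.0)
        b = math.log(1 - p) + log_normal_pdf(yi, mu2, 1.0)
        # log(e^a + e^b) = max(a, b) + log(1 + e^{-|a - b|}), computed without underflow
        total += max(a, b) + math.log1p(math.exp(-abs(a - b)))
    return total


# Example: three observations with p = 0.3
print(log_likelihood([0.1, 2.5, -0.3], mu1=0.0, mu2=2.0, p=0.3))
```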

We will take a standard prior for $\mu_1, \mu_2$, e.g., $\mu_1, \mu_2 \overset{\text{i.i.d.}}{\sim} N(0, C)$ for a large $C$. The posterior of $\mu_1, \mu_2$ is then:

\begin{align*}
\pi(\mu_1, \mu_2 \mid \text{data}) \propto \phi(\mu_1, 0, C) \phi(\mu_2, 0, C) \prod_{i=1}^n \left(p \phi(y_i, \mu_1, 1) + (1 - p) \phi(y_i, \mu_2, 1) \right).
\end{align*}
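
Expanding the likelihood product shows the structure of this posterior. Each of the $n$ factors is a sum of two terms, so the product is a sum of $2^n$ terms, one for each way of assigning the observations to the two components:

\begin{align*}
\prod_{i=1}^n \left(p \phi(y_i, \mu_1, 1) + (1 - p) \phi(y_i, \mu_2, 1)\right) = \sum_{z \in \{0, 1\}^n} \prod_{i=1}^n \left(p \phi(y_i, \mu_1, 1)\right)^{z_i} \left((1 - p) \phi(y_i, \mu_2, 1)\right)^{1 - z_i}.
\end{align*}

After multiplication by the $N(0, C)$ prior, each term is proportional to a product of normal densities in $\mu_1$ and $\mu_2$. The posterior is therefore a mixture of $2^n$ such products, far too many to enumerate even for moderate $n$. The binary vectors $z$ that index this sum reappear below as latent variables.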

This posterior is not of a standard form, and its summaries (normalizing constant, means, marginals) have no tractable closed form, so numerical methods are needed. The Gibbs sampler is one applicable computational tool. The standard way to use it in this problem is via data augmentation, explained below.
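
Before turning to augmentation, note one simple alternative. With only two parameters, the unnormalized posterior can be evaluated on a grid, which gives a transparent, if crude, check on any sampler. The standard-library sketch below does this. The function names, the grid limits $[-5, 5]$, the grid size, and the toy data are all illustrative assumptions. This is a brute-force baseline rather than the method developed in the lecture, and its cost grows exponentially with the number of parameters.

```python
import math


def log_normal_pdf(x, mu, var):
    """Log of the N(mu, var) density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def log_posterior_unnorm(mu1, mu2, y, p, C):
    """log phi(mu1,0,C) + log phi(mu2,0,C) + mixture log-likelihood (no normalizing constant)."""
    lp = log_normal_pdf(mu1, 0.0, C) + log_normal_pdf(mu2, 0.0, C)
    for yi in y:
        a = math.log(p) + log_normal_pdf(yi, mu1, 1.0)
        b = math.log(1 - p) + log_normal_pdf(yi, mu2, 1.0)
        lp += max(a, b) + math.log1p(math.exp(-abs(a - b)))  # stable log(e^a + e^b)
    return lp


def grid_mode(y, p, C, lo=-5.0, hi=5.0, steps=101):
    """Return the grid point (mu1, mu2) with the largest unnormalized log posterior."""
    h = (hi - lo) / (steps - 1)
    grid = [lo + k * h for k in range(steps)]
    return max(
        ((m1, m2) for m1 in grid for m2 in grid),
        key=lambda t: log_posterior_unnorm(t[0], t[1], y, p, C),
    )


# Toy data: three points at -2 and seven at 2
y = [-2.0] * 3 + [2.0] * 7
print(grid_mode(y, p=0.3, C=100.0))
```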

First observe that model (1) can be rewritten as follows:

\begin{align*}
z_i \overset{\text{i.i.d.}}{\sim} \text{Bernoulli}(p) ~~ \text{ and } ~~ y_i \mid z_i = 1 \sim N(\mu_1, 1) ~~ \text{ and } ~~ y_i \mid z_i = 0 \sim N(\mu_2, 1).
\end{align*}

Under this model, the marginal distribution of $y_i$ coincides with (1). Summing over $z_i \in \{0, 1\}$ gives the density $p \phi(y_i, \mu_1, 1) + (1 - p) \phi(y_i, \mu_2, 1)$. The variables $z_1, \dots, z_n$ can be thought of as unobserved latent variables. They indicate which of the two populations (with distributions $N(\mu_1, 1)$ and $N(\mu_2, 1)$ respectively) each observation $y_i$ comes from.
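
The augmented representation also gives a direct way to simulate from (1). We draw the latent label first and then draw $y_i$ from the corresponding component. The following is a minimal standard-library sketch. The function name `simulate_mixture` and the parameter values in the example are illustrative and are not taken from the lecture's simulations.

```python
import random


def simulate_mixture(n, mu1, mu2, p, seed=None):
    """Draw (z, y) from the augmented model: z_i ~ Bernoulli(p), then y_i | z_i."""
    rng = random.Random(seed)
    z = [1 if rng.random() < p else 0 for _ in range(n)]
    # z_i = 1 -> first component N(mu1, 1); z_i = 0 -> second component N(mu2, 1)
    y = [rng.gauss(mu1 if zi == 1 else mu2, 1.0) for zi in z]
    return z, y


z, y = simulate_mixture(1000, mu1=-2.0, mu2=2.0, p=0.3, seed=1)
print(sum(z) / len(z))  # fraction of draws from the first component, close to p
```

If we discard `z` and keep only `y`, we obtain draws from (1). This is the simulation counterpart of the statement that the marginal of $y_i$ is the mixture.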

The Gibbs sampler is implemented to sample jointly from the posterior of $\mu_1, \mu_2, z_1, \dots, z_n$ given the data. This requires being able to sample from the full conditionals

\begin{align*}
z \mid \mu_1, \mu_2, y ~~ \text{ and } ~~ (\mu_1, \mu_2) \mid z, y,
\end{align*}

where $y = (y_1, \dots, y_n)$ is the data and $z = (z_1, \dots, z_n)$. Both full conditionals can be written in closed form. Given $\mu_1, \mu_2, y_1, \dots, y_n$, the joint density factorizes over $i$. The variables $z_1, \dots, z_n$ are therefore independent, and Bayes' rule applied to each factor gives

\begin{align*}
z_i \mid \mu_1, \mu_2, y \sim \text{Bernoulli} \left(\frac{p \phi(y_i, \mu_1, 1)}{p \phi(y_i, \mu_1, 1) + (1 - p) \phi(y_i, \mu_2, 1)}\right).
\end{align*}
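
The following is a minimal standard-library sketch of this update. The name `sample_z` and the convention of passing a `random.Random` instance are our own choices. The constant $-\tfrac{1}{2}\log(2\pi)$ in the normal log-density is common to both components and cancels in the ratio, so the sketch drops it. It computes the Bernoulli probability as a logistic function of the log-odds, in a form that cannot overflow.

```python
import math
import random


def sample_z(y, mu1, mu2, p, rng):
    """Draw each z_i | mu1, mu2, y independently (1 = first component)."""
    z = []
    for yi in y:
        # log(p * phi(yi, mu1, 1)) and log((1 - p) * phi(yi, mu2, 1)), up to a shared constant
        a = math.log(p) - 0.5 * (yi - mu1) ** 2
        b = math.log(1 - p) - 0.5 * (yi - mu2) ** 2
        # P(z_i = 1) = e^a / (e^a + e^b), evaluated without overflow
        if a >= b:
            prob1 = 1.0 / (1.0 + math.exp(b - a))
        else:
            e = math.exp(a - b)
            prob1 = e / (1.0 + e)
        z.append(1 if rng.random() < prob1 else 0)
    return z


rng = random.Random(0)
print(sample_z([-2.1, 0.4, 1.9], mu1=-2.0, mu2=2.0, p=0.3, rng=rng))
```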

In the limit $C \rightarrow \infty$, given $z_1, \dots, z_n, y_1, \dots, y_n$, the variables $\mu_1, \mu_2$ are independent with

\begin{align*}
\mu_1 \mid z, y \sim N \left(\frac{\sum_{i=1}^n y_i z_i}{\sum_{i=1}^n z_i}, \frac{1}{\sum_{i=1}^n z_i} \right) ~~ \text{ and } ~~ \mu_2 \mid z, y \sim N \left(\frac{\sum_{i=1}^n y_i (1 - z_i)}{\sum_{i=1}^n (1 - z_i)}, \frac{1}{\sum_{i=1}^n (1 - z_i)} \right).
\end{align*}
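
For finite $C$, the same conjugate normal calculation gives $\mu_1 \mid z, y \sim N\left(\frac{\sum_i y_i z_i}{\sum_i z_i + 1/C}, \frac{1}{\sum_i z_i + 1/C}\right)$. The analogous result holds for $\mu_2$, with $1 - z_i$ in place of $z_i$. Letting $C \to \infty$ recovers the limiting formulas. In that limit, a component with no assigned observations ($\sum_i z_i = 0$ or $\sum_i (1 - z_i) = 0$) has an improper conditional, which is one practical reason to keep $C$ finite. The name `sample_mu` and its signature in the standard-library sketch below are illustrative assumptions.

```python
import math
import random


def sample_mu(y, z, rng, C=math.inf):
    """Draw (mu1, mu2) | z, y under independent N(0, C) priors.

    C = math.inf gives the limiting conditionals; a finite C gives the proper-prior version.
    """
    n1 = sum(z)
    n2 = len(z) - n1
    s1 = sum(yi for yi, zi in zip(y, z) if zi == 1)
    s2 = sum(yi for yi, zi in zip(y, z) if zi == 0)

    def draw(s, m):
        prec = m + 1.0 / C  # posterior precision; 1.0 / math.inf == 0.0
        if prec == 0.0:
            raise ValueError("empty component with C = inf: conditional is improper")
        return rng.gauss(s / prec, math.sqrt(1.0 / prec))

    return draw(s1, n1), draw(s2, n2)


rng = random.Random(0)
print(sample_mu([-2.1, -1.8, 2.2, 1.9, 2.4], [1, 1, 0, 0, 0], rng))
```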

Based on these full conditional distributions, the Gibbs sampler is straightforward to implement. We shall give the exact form of the algorithm in the next lecture. It also turns out that the EM algorithm for computing the MLE is closely related to the Gibbs sampler; we shall look at this in the next lecture as well.
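
Ahead of the exact algorithm in the next lecture, the following standard-library sketch shows one way to alternate the two full conditionals. Several choices here are illustrative assumptions: the finite prior variance `C = 100.0`, the starting values, the number of iterations, the burn-in length, the simulated data, and the function name `gibbs_mixture`. A finite `C` keeps every conditional proper even when a component is temporarily empty.

```python
import math
import random


def gibbs_mixture(y, p, n_iter=2000, C=100.0, init=(-1.0, 1.0), seed=0):
    """Alternate z | mu1, mu2, y and (mu1, mu2) | z, y; return the list of (mu1, mu2) draws."""
    rng = random.Random(seed)
    mu1, mu2 = init
    draws = []
    for _ in range(n_iter):
        # Step 1: z | mu1, mu2, y  (independent Bernoulli draws)
        z = []
        for yi in y:
            a = math.log(p) - 0.5 * (yi - mu1) ** 2
            b = math.log(1 - p) - 0.5 * (yi - mu2) ** 2
            if a >= b:  # numerically stable e^a / (e^a + e^b)
                prob1 = 1.0 / (1.0 + math.exp(b - a))
            else:
                e = math.exp(a - b)
                prob1 = e / (1.0 + e)
            z.append(1 if rng.random() < prob1 else 0)
        # Step 2: (mu1, mu2) | z, y  (independent normals under N(0, C) priors)
        n1 = sum(z)
        n2 = len(y) - n1
        s1 = sum(yi for yi, zi in zip(y, z) if zi)
        s2 = sum(y) - s1
        prec1 = n1 + 1.0 / C
        prec2 = n2 + 1.0 / C
        mu1 = rng.gauss(s1 / prec1, math.sqrt(1.0 / prec1))
        mu2 = rng.gauss(s2 / prec2, math.sqrt(1.0 / prec2))
        draws.append((mu1, mu2))
    return draws


# Hypothetical data: 60 draws from N(-2, 1) and 140 from N(2, 1), matching p = 0.3
data_rng = random.Random(42)
y = [data_rng.gauss(-2.0, 1.0) for _ in range(60)] + [data_rng.gauss(2.0, 1.0) for _ in range(140)]
draws = gibbs_mixture(y, p=0.3, n_iter=1000)
kept = draws[200:]  # discard burn-in
print(sum(d[0] for d in kept) / len(kept), sum(d[1] for d in kept) / len(kept))
```

The retained draws approximate the posterior of $(\mu_1, \mu_2)$. Averages over them estimate posterior means, and the $z$ draws (not stored here) would carry the posterior over component memberships.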