Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

STAT 238 - Bayesian Statistics Lecture Thirty

Spring 2026, UC Berkeley

The Gibbs Sampler for Gaussian Mixture Models

We observe real-valued data y1,,yny_1, \dots, y_n and we consider the model:

yii.i.dj=1kwjN(μj,σj2)\begin{align*} y_i \overset{\text{i.i.d}}{\sim} \sum_{j=1}^k w_j N(\mu_j, \sigma_j^2) \end{align*}

with unknown parameters (w1,,wk)(w_1, \dots, w_k) and (μj,σj2)(\mu_j, \sigma_j^2) for j=1,,kj = 1, \dots, k. (w1,,wk)(w_1, \dots, w_k) is a probablity vector (i.e., wj0w_j \geq 0 and jwj=1\sum_j w_j = 1).

We use the following priors:

(w1,,wk)Dirichlet(a1,,ak)μ1,,μki.i.dN(m,s2)σ12,,σk2i.i.dIG(α,β).\begin{align*} & (w_1, \dots, w_k) \sim \text{Dirichlet}(a_1, \dots, a_k) \\ & \mu_1, \dots, \mu_k \overset{\text{i.i.d}}{\sim} N(m, s^2) \\ & \sigma_1^2, \dots, \sigma_k^2 \overset{\text{i.i.d}}{\sim} IG(\alpha, \beta). \end{align*}

The default choices are aj=1a_j = 1 for all jj, m=0m = 0 and s2s^2 to be large, and α,β\alpha, \beta to be near zero. Note that the density of the Inverse Gamma distribution IG(α,β)IG(\alpha, \beta) is given by xα1exp(β/x)I{x>0}x^{-\alpha - 1} \exp(-\beta/x) I\{x > 0\}. It is easy to check that this is simply the density of X1X^{-1} where XGamma(α,β)X \sim \text{Gamma}(\alpha, \beta).

To use the Gibbs sampler, we introduce latent variables z1,,znz_1, \dots, z_n which represent group memberships. Specifically, ziz_i represents which of the kk Gaussians the observation yiy_i is coming from. More precisely, ziz_i takes the values 1,,k1, \dots, k with probabilities w1,,wkw_1, \dots, w_k (i.e., ziCategorical(w)z_i \sim \text{Categorical}(w)), and

yizi=jN(μj,σj2).\begin{align*} y_i \mid z_i = j \sim N(\mu_j, \sigma_j^2). \end{align*}

The Gibbs sampler will then jointly sample from all the unknown variables:

w1,,wk,μ1,σ12,,μk,σk2,z1,,zny1,,yn\begin{align*} w_1, \dots, w_k, \mu_1, \sigma_1^2, \dots, \mu_k, \sigma_k^2, z_1, \dots, z_n \mid y_1, \dots, y_n \end{align*}

The full conditionals corresponding to all the variables can be written in closed form as described below. We will use the notation w=(w1,,wk)w = (w_1, \dots, w_k), μ=(μ1,,μk)\mu = (\mu_1, \dots, \mu_k) and σ=(σ1,,σk)\sigma = (\sigma_1, \dots, \sigma_k), also y=(y1,,yn)y = (y_1, \dots, y_n) and z=(z1,,zn)z = (z_1, \dots, z_n).

For the full conditional corresponding to z1,,znz_1, \dots, z_n (i.e., the conditional distribution of z1,,znz_1, \dots, z_n given y1,,yny_1, \dots, y_n as well as w1,,wk,μ1,σ12,,μk,σk2w_1, \dots, w_k, \mu_1, \sigma_1^2, \dots, \mu_k, \sigma_k^2) is given by:

P{zi=jyi,w,μ,σ}=rij:=wjϕ(yi,μj,σj2)j=1kwjϕ(yi,μj,σj2).\begin{align*} \P \left\{z_i = j \mid y_i, w, \mu, \sigma \right\} = r_{ij} := \frac{w_j \phi(y_i, \mu_j, \sigma_j^2)}{\sum_{j=1}^k w_j \phi(y_i, \mu_j, \sigma_j^2)}. \end{align*}

In other words,

ziyi,w,μ,σCategorical(ri)  where ri=(ri1,,rik).\begin{align*} z_i \mid y_i, w, \mu, \sigma \sim \text{Categorical}(r_i) ~~ \text{where $r_i = (r_{i1}, \dots, r_{ik})$}. \end{align*}

The full conditional of w=(w1,,wk)w = (w_1, \dots, w_k) (i.e., the conditional distribution of ww given z,y,μ,σz, y, \mu, \sigma) is:

wz,y,μ,σDirichlet(a1+n1,,ak+nk)  where nj:=i=1nI{zi=j}.\begin{align*} w \mid z, y, \mu, \sigma \sim \text{Dirichlet}(a_1 + n_1, \dots, a_k + n_k) ~~ \text{where $n_j := \sum_{i=1}^n I\{z_i = j\}$}. \end{align*}

The full conditional of μj\mu_j (i.e., the conditional distribution of μj\mu_j given z,y,σz, y, \sigma) is:

μjz,y,σN(m/s2+i:zi=jyi/σj21/s2+nj/σj2,11/s2+nj/σj2).\begin{align*} \mu_j \mid z, y, \sigma \sim N \left(\frac{m/s^2 + \sum_{i : z_i = j} y_i/\sigma_j^2}{1/s^2 + n_j/\sigma_j^2}, \frac{1}{1/s^2 + n_j/\sigma_j^2} \right). \end{align*}

Note that we are also conditioning on σj\sigma_j above. If we do not condition on σj\sigma_j, then the conditional distribution of μj\mu_j will be tt-distributed and not normal distributed.

The full conditional of σj2\sigma_j^2 (i.e., the conditional distribution of σj2\sigma_j^2 given z,y,μz, y, \mu) is:

σj2z,y,σIG(α+nj2,β+12i:zi=j(yiμj)2),\begin{align*} \sigma_j^2 \mid z, y, \sigma \sim IG \left(\alpha + \frac{n_j}{2}, \beta + \frac{1}{2} \sum_{i : z_i = j} (y_i - \mu_j)^2 \right), \end{align*}

where again nj=i=1nI{zi=j}n_j = \sum_{i=1}^n I\{z_i = j\}.

So the Gibbs sampler algorithm is given by:

  1. Initialize the parameters w(0),μ(0),σ(0)w^{(0)}, \mu^{(0)}, \sigma^{(0)}.

  2. Repeat the following for t=0,1,2,t = 0, 1, 2, \dots:

    1. Generate z1(t),,zn(t)z_1^{(t)}, \dots, z_n^{(t)} via:

      zi(t)Categorical(wj(t)ϕ(yi,μj(t),(σj(t))2)j=1kwj(t)ϕ(yi,μj(t),(σj(t))2),j=1,,k)\begin{align*} z_i^{(t)} \sim \text{Categorical} \left(\frac{w_j^{(t)} \phi \left(y_i, \mu^{(t)}_j, (\sigma_j^{(t)})^2 \right)}{\sum_{j=1}^k w_j^{(t)} \phi \left(y_i, \mu^{(t)}_j, (\sigma_j^{(t)})^2 \right)}, j = 1, \dots, k \right) \end{align*}
    2. Calculate

      nj(t):=i=1nI{zi(t)=j}   for j=1,,k.\begin{align*} n_j^{(t)} := \sum_{i=1}^n I\{z_i^{(t)} = j\} ~~ \text{ for $j = 1, \dots, k$}. \end{align*}
    3. Generate w(t+1)=(w1(t+1),,wk(t+1))w^{(t+1)} = (w_1^{(t+1)}, \dots, w_k^{(t+1)}) via

      w(t+1)Dirichlet(a1+n1(t),,ak+nk(t)).\begin{align*} w^{(t+1)} \sim \text{Dirichlet}(a_1 + n_1^{(t)}, \dots, a_k + n_k^{(t)}). \end{align*}
    4. Generate μ1(t+1),,μk(t+1)\mu_1^{(t+1)}, \dots, \mu_k^{(t+1)} via

      μj(t+1)N(m/s2+i:zi(t)=jyi/(σj(t))21/s2+nj(t)/(σj(t))2,11/s2+nj(t)/(σj(t))2).\begin{align*} \mu_j^{(t+1)} \sim N \left(\frac{m/s^2 + \sum_{i : z^{(t)}_i = j} y_i/(\sigma_j^{(t)})^2}{1/s^2 + n_j^{(t)}/(\sigma_j^{(t)})^2}, \frac{1}{1/s^2 + n^{(t)}_j/(\sigma_j^{(t)})^2} \right). \end{align*}
    5. Generate σ1(t+1),,σk(t+1)\sigma_1^{(t+1)}, \dots, \sigma_k^{(t+1)} via

      (σj2)(t+1)IG(α+nj(t)2,β+12i:zi(t)=j(yiμj(t+1))2),\begin{align*} (\sigma_j^2)^{(t+1)} \sim IG \left(\alpha + \frac{n^{(t)}_j}{2}, \beta + \frac{1}{2} \sum_{i : z^{(t)}_i = j} (y_i - \mu^{(t+1)}_j)^2 \right), \end{align*}

The EM Algorithm

The EM algorithm can be seen as a deterministic variant of the Gibbs sampler. We shall look at the EM in more generality later. In the case of fitting finite normal mixtures, the EM algorithm is given by the following:

  1. Initialize the parameters w(0),μ(0),σ(0)w^{(0)}, \mu^{(0)}, \sigma^{(0)}.

  2. Repeat the following for t=0,1,2,t = 0, 1, 2, \dots until convergence:

    1. E-step. Compute the responsibility of component jj for observation ii:

      rij(t):=wj(t)ϕ(yi,μj(t),(σj(t))2)l=1kwl(t)ϕ(yi,μl(t),(σl(t))2),j=1,,k.\begin{align*} r_{ij}^{(t)} := \frac{w_j^{(t)} \phi\left(y_i, \mu_j^{(t)}, (\sigma_j^{(t)})^2\right)} {\sum_{l=1}^k w_l^{(t)} \phi\left(y_i, \mu_l^{(t)}, (\sigma_l^{(t)})^2\right)}, \quad j = 1, \dots, k. \end{align*}
    2. Compute the effective counts

      nj(t):=i=1nrij(t),j=1,,k.\begin{align*} n_j^{(t)} := \sum_{i=1}^n r_{ij}^{(t)}, \quad j = 1, \dots, k. \end{align*}
    3. M-step. Update the weights:

      wj(t+1):=nj(t)n,j=1,,k.\begin{align*} w_j^{(t+1)} := \frac{n_j^{(t)}}{n}, \quad j = 1, \dots, k. \end{align*}
    4. Update the means:

      μj(t+1):=i=1nrij(t)yinj(t),j=1,,k.\begin{align*} \mu_j^{(t+1)} := \frac{\sum_{i=1}^n r_{ij}^{(t)}\, y_i}{n_j^{(t)}}, \quad j = 1, \dots, k. \end{align*}
    5. Update the variances:

      (σj2)(t+1):=i=1nrij(t)(yiμj(t+1))2nj(t),j=1,,k.\begin{align*} (\sigma_j^2)^{(t+1)} := \frac{\sum_{i=1}^n r_{ij}^{(t)} \left(y_i - \mu_j^{(t+1)}\right)^2}{n_j^{(t)}}, \quad j = 1, \dots, k. \end{align*}

There are the following close connections between the EM algorithm and the Gibbs sampler: