
STAT 238 - Bayesian Statistics Lecture Twenty Four

Spring 2026, UC Berkeley

Our next topic is Bayesian computation. The two main classes of techniques are Markov Chain Monte Carlo (MCMC) and Variational Inference (VI). We will start discussing MCMC today.

Markov Chain Monte Carlo

The goal is to obtain samples from a target distribution with density $\pi$ (in our applications, $\pi$ will be the posterior density). We assume that $\pi$ is known only up to a normalizing constant. The normalizing constant can, in principle, be determined by integrating the unnormalized $\pi$ over its domain, but this integration is assumed to be difficult.

We want to generate $\theta^{(0)}, \theta^{(1)}, \dots, \theta^{(N)}$ such that

$$
\frac{1}{N} \sum_{t=1}^N g(\theta^{(t)}) \approx \int g(\theta) \pi(\theta) \, d\theta \tag{1}
$$

for all (suitably integrable) functions $g$.

$\theta^{(t)}$ is generated sequentially or iteratively for $t = 0, 1, 2, \dots$, with $\theta^{(t)}$ (and some external randomness) used to generate $\theta^{(t+1)}$.
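The averaging in (1) can be sketched in a few lines of Python. This is an illustration with i.i.d. samples for simplicity (MCMC produces dependent samples, but the average is used the same way); the choice $g(\theta) = \theta^2$ with $\theta \sim N(0, 1)$, so that the integral equals 1, is our own hypothetical example.

```python
import random

# Estimate E[g(theta)] for g(theta) = theta^2 and theta ~ N(0, 1);
# the true value of the integral in (1) is then Var(theta) = 1.
random.seed(0)
N = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(N)]
estimate = sum(th * th for th in samples) / N  # (1/N) * sum_t g(theta^(t))
```

With $N = 100{,}000$ draws, the estimate lands within a few hundredths of the true value 1.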

The simplest MCMC algorithm is the Random Walk Metropolis Algorithm.

The Random Walk Metropolis Algorithm

The random-walk Metropolis algorithm works by generating proposals via a random-walk step from the current state and then accepting or rejecting them with a certain probability. Suppose that the current state is $\theta^{(t)}$. The algorithm proposes to move to the state $\theta^{(t)} + Z$, where $Z$ is a random vector having a density that is symmetric about 0. Note that the density of the proposal $\theta^{(t)} + Z$ is given by $q(\theta^{(t)}, \cdot)$, where

$$
q(x, y) = f_Z(y - x),
$$

with $f_Z$ denoting the density of $Z$. Because $f_Z$ is symmetric about 0, we have

$$
q(x, y) = q(y, x). \tag{2}
$$
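The symmetry (2) can be checked numerically. The sketch below assumes a Gaussian step $Z \sim N(0, 1)$ (our illustrative choice); since $f_Z$ is symmetric about 0, $q(x, y) = f_Z(y - x)$ equals $q(y, x) = f_Z(x - y)$.

```python
import math

def f_Z(z):
    # Density of Z ~ N(0, 1), symmetric about 0: f_Z(-z) = f_Z(z).
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def q(x, y):
    # Proposal density of a random-walk step from x: q(x, y) = f_Z(y - x).
    return f_Z(y - x)

diff = abs(q(0.3, 1.7) - q(1.7, 0.3))  # should be (numerically) zero
```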

The function $q(\cdot, \cdot)$ is commonly referred to as the candidate-generating density or proposal-generating density. We say that $q$ is symmetric if (2) holds. Clearly, random-walk Metropolis with a symmetric $f_Z$ uses a symmetric proposal-generating density. With this notation, the random-walk Metropolis algorithm can be phrased as follows. For $x$ and $y$, let

$$
\alpha(x, y) := \min\left(\frac{\pi(y)}{\pi(x)}, 1\right).
$$

$\alpha(x, y)$ is referred to as the acceptance probability for a move from $x$ to $y$. Note that it depends on $\pi$ only through the ratio $\pi(y)/\pi(x)$, so the unknown normalizing constant cancels.
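In practice, $\alpha(x, y)$ is usually computed on the log scale to avoid overflow and underflow for sharply peaked densities. The function name `acceptance_prob` and the example target below are our own illustration; only $\log \pi$ up to an additive constant is needed, since the constant cancels in the ratio.

```python
import math

def acceptance_prob(log_pi, x, y):
    # alpha(x, y) = min(pi(y)/pi(x), 1), computed via log pi(y) - log pi(x).
    return math.exp(min(0.0, log_pi(y) - log_pi(x)))

# Example: unnormalized standard normal target, log pi(t) = -t^2 / 2.
log_pi = lambda t: -0.5 * t * t
a_up = acceptance_prob(log_pi, 1.0, 0.0)    # move toward higher density
a_down = acceptance_prob(log_pi, 0.0, 1.0)  # move toward lower density
```

Moves toward higher density are always accepted ($\alpha = 1$), while downhill moves are accepted with probability $\pi(y)/\pi(x) < 1$.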

  1. Initialize with an arbitrary value $\theta^{(0)}$.

  2. Repeat the following for $t = 0, 1, \dots, N - 1$:

    1. Let $x = \theta^{(t)}$.

    2. Generate a candidate or proposal $y$ from the symmetric candidate-generating density $q(x, \cdot)$ and a uniform random number $u \sim \text{Unif}(0, 1)$.

    3. If $u \geq \alpha(x, y)$, then set $\theta^{(t+1)} = x$. We say in this case that the proposed move from $x$ to $y$ is rejected and we stay at $x$.

    4. If $u < \alpha(x, y)$, then set $\theta^{(t+1)} = y$. We say in this case that the proposed move from $x$ to $y$ is accepted.

  3. Return the values $\theta^{(1)}, \dots, \theta^{(N)}$.
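The steps above can be sketched as follows. This is a minimal illustration, assuming a Gaussian proposal $Z \sim N(0, \text{step}^2)$; the function and parameter names are our own. `log_pi` is the log target density, known only up to an additive constant.

```python
import math
import random

def random_walk_metropolis(log_pi, theta0, N, step=1.0, seed=0):
    rng = random.Random(seed)
    x, lp_x = theta0, log_pi(theta0)            # initialize at theta^(0)
    chain = []
    for _ in range(N):
        y = x + rng.gauss(0.0, step)            # propose y = x + Z
        lp_y = log_pi(y)
        alpha = math.exp(min(0.0, lp_y - lp_x)) # alpha(x, y)
        if rng.random() < alpha:                # accept with probability alpha
            x, lp_x = y, lp_y                   # move to y
        chain.append(x)                         # on rejection, stay at x
    return chain                                # theta^(1), ..., theta^(N)

# Target known only up to a constant: unnormalized N(0, 1) density.
chain = random_walk_metropolis(lambda t: -0.5 * t * t, theta0=0.0, N=50_000)
mean = sum(chain) / len(chain)
var = sum((t - mean) ** 2 for t in chain) / len(chain)
```

The sample mean and variance come out close to the target's 0 and 1; because successive draws are dependent, the averages converge more slowly than they would for i.i.d. samples.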

The above algorithm is known as the Metropolis algorithm. It works for any symmetric (i.e., satisfying (2)) proposal-generating density $q(\cdot, \cdot)$. The random-walk Metropolis algorithm is the special case where $q(x, y)$ is of the form $f_Z(y - x)$ for a density $f_Z$ that is symmetric about 0.

The Metropolis algorithm can be further generalized to deal with non-symmetric proposal-generating densities. This generalization is referred to as the Metropolis-Hastings algorithm, which we shall look at in the next lecture. We will try to answer the following questions in the coming lectures:

  1. What is the intuition behind the form of these algorithms?

  2. Why do they actually work and give samples satisfying (1)?

To answer these questions, we need to know a little bit about Markov chains. The algorithms above output samples $\theta^{(0)}, \dots, \theta^{(N)}$. Clearly, $\theta^{(t+1)}$ is generated using only the value $\theta^{(t)}$; the values $\theta^{(0)}, \dots, \theta^{(t-1)}$ are not used at all in order to generate $\theta^{(t+1)}$. Thus $\theta^{(0)}, \dots, \theta^{(N)}$ can be seen as a realization of a sequence of random vectors $\Theta^{(0)}, \dots, \Theta^{(N)}$ which form a Markov chain. Properties of Markov chains are used to justify (1), as we shall see in the coming lectures.