
STAT 238 - Bayesian Statistics Lecture Twenty Five

Spring 2026, UC Berkeley

Markov Chains and Stationary Distributions

Let $S$ be a state space. A sequence of random variables $\Theta^{(0)}, \Theta^{(1)}, \dots$ taking values in $S$ is a Markov chain on $S$ provided the conditional distribution of $\Theta^{(t+1)}$ given $\Theta^{(t)} = \theta^{(t)}, \Theta^{(t-1)} = \theta^{(t-1)}, \dots, \Theta^{(0)} = \theta^{(0)}$ depends only on $\theta^{(t)}$. Further, if this conditional distribution is the same for all $t$, then the Markov chain is said to be time-homogeneous. We will only deal with time-homogeneous Markov chains in this course.

A (time-homogeneous) Markov chain is described via the conditional distribution of $\Theta^{(t+1)}$ given $\Theta^{(t)} = x$. If $S$ is finite or countable, then these conditional distributions can be described by the probabilities:

\begin{align*}
P(x, y) := \mathbb{P}\{\Theta^{(t+1)} = y \mid \Theta^{(t)} = x\}.
\end{align*}

Clearly $P(x, y) \geq 0$ and $\sum_{y \in S} P(x, y) = 1$ for every $x \in S$.

If $S$ is an open subset of $\mathbb{R}^d$, then one usually describes the conditional distribution of $\Theta^{(t+1)}$ given $\Theta^{(t)} = x$ via the conditional density function $P(x, y)$:

\begin{align*}
\mathbb{P}\{\Theta^{(t+1)} \in A \mid \Theta^{(t)} = x\} = \int_A P(x, y) \, dy.
\end{align*}

Here $P(x, y) \geq 0$ and $\int_S P(x, y) \, dy = 1$ for all $x \in S$.

Stationary Distribution

A probability distribution $\pi$ on $S$ is said to be a stationary or invariant distribution of the Markov chain $\Theta^{(0)}, \Theta^{(1)}, \dots$ if the following holds:

\begin{align*}
\Theta^{(t)} \sim \pi \implies \Theta^{(t+1)} \sim \pi.
\end{align*}

If SS is discrete, this can be written as:

\begin{align}
\pi(y) = \sum_{x \in S} \pi(x) P(x, y) \quad \text{for every $y \in S$}. \tag{4}
\end{align}

In the continuous case, this becomes:

\begin{align}
\pi(y) = \int_S \pi(x) P(x, y) \, dx \quad \text{for every $y \in S$}. \tag{5}
\end{align}
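For a finite state space, the stationarity condition can be checked numerically. Below is a small sketch (with a hypothetical three-state transition matrix of our own choosing, not from the lecture) that finds the stationary distribution by power iteration and verifies $\pi = \pi P$:

```python
# Hypothetical 3-state transition matrix P (each row sums to 1),
# chosen only to illustrate the stationarity condition
# pi(y) = sum_x pi(x) P(x, y).
P = [
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]

def step(pi, P):
    """One application of the map pi -> pi P."""
    n = len(P)
    return [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]

# Power iteration: start from an arbitrary distribution and apply P repeatedly.
pi = [1.0, 0.0, 0.0]
for _ in range(500):
    pi = step(pi, P)

# pi now satisfies pi = pi P (up to floating-point error).
pi_next = step(pi, P)
assert all(abs(a - b) < 1e-10 for a, b in zip(pi, pi_next))
print([round(p, 4) for p in pi])
```

The same iteration diverges from no starting distribution here because this chain is irreducible, which (as discussed below) guarantees a unique stationary distribution.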

Ergodic Theorem

Suppose $\{\Theta^{(t)}\}$ is a time-homogeneous Markov chain that is irreducible. Suppose $\pi$ is the stationary distribution of the Markov chain (irreducibility guarantees uniqueness of the stationary distribution). Then

\begin{align*}
\lim_{N \rightarrow \infty} \frac{1}{N} \sum_{t=1}^N g(\Theta^{(t)}) = \int g(\theta) \, d\pi(\theta)
\end{align*}

almost surely. For a reference to this result, see Roberts and Rosenthal (2004). For a comprehensive book-length reference, see Meyn and Tweedie (2009); Chapter 17 of that book contains this result.

When the state space is finite, irreducibility means that for every pair of points $x$ and $y$ in the state space, there is a positive probability that the chain goes from $x$ to $y$ in some finite number of steps. For continuous state spaces, irreducibility is somewhat technical to define. If you are interested, see again the aforementioned references: Roberts and Rosenthal (2004) and Meyn and Tweedie (2009).
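The ergodic theorem can be seen in action with a two-state chain whose stationary distribution is known in closed form. The example below is our own illustration (not from the lecture): with transition probabilities $a$ from state 0 to 1 and $b$ from 1 to 0, the stationary distribution is $\pi = (b/(a+b),\ a/(a+b))$, and taking $g$ to be the indicator of state 1, the time average should converge to $\pi(1)$:

```python
import random

# Illustrative two-state chain: from state 0 jump to 1 with probability a;
# from state 1 jump to 0 with probability b.  Its stationary distribution
# is pi = (b/(a+b), a/(a+b)), so pi(1) = 0.3 / 0.4 = 0.75 here.
a, b = 0.3, 0.1
random.seed(0)

theta = 0
visits_to_1 = 0
N = 200_000
for _ in range(N):
    if theta == 0:
        theta = 1 if random.random() < a else 0
    else:
        theta = 0 if random.random() < b else 1
    visits_to_1 += (theta == 1)

# Ergodic theorem with g = indicator of state 1: the time average of g
# along the chain approximates pi(1) = 0.75.
print(visits_to_1 / N)
```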

Detailed Balance Condition

Note that (4) is equivalent to:

\begin{align*}
\sum_{x \in S} \pi(y) P(y, x) = \sum_{x \in S} \pi(x) P(x, y).
\end{align*}

This is because $\sum_{x \in S} P(y, x) = 1$. Thus a sufficient condition for $\pi$ to be the stationary distribution of the Markov chain given by $P$ is:

\begin{align}
\pi(y) P(y, x) = \pi(x) P(x, y) \quad \text{for all $x, y \in S$}. \tag{8}
\end{align}

Note that one gets the same condition by also arguing from (5); in that case, $P(x, y)$ and $P(y, x)$ appearing in (8) should be interpreted as conditional pdfs as opposed to pmfs.

Here is a simple example of a Markov Chain satisfying detailed balance.
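One such example (a stand-in sketch of our own, since the original example is not reproduced here) is the reflecting random walk on the path graph $0 - 1 - 2 - 3$, whose stationary distribution is proportional to the number of neighbors of each state. The detailed balance condition (8) can be checked exactly with rational arithmetic:

```python
from fractions import Fraction

# Stand-in example (not from the original notes): reflecting random walk on
# the path graph 0 - 1 - 2 - 3.  From an interior state move left or right
# with probability 1/2; each endpoint reflects with probability 1.
half = Fraction(1, 2)
P = {
    (0, 1): Fraction(1),
    (1, 0): half, (1, 2): half,
    (2, 1): half, (2, 3): half,
    (3, 2): Fraction(1),
}

# Stationary distribution: proportional to the degree of each state.
deg = {0: 1, 1: 2, 2: 2, 3: 1}
total = sum(deg.values())
pi = {x: Fraction(d, total) for x, d in deg.items()}

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for every pair of states.
for x in range(4):
    for y in range(4):
        assert pi[x] * P.get((x, y), 0) == pi[y] * P.get((y, x), 0)
print("detailed balance holds")
```

By the argument above, detailed balance immediately gives that $\pi$ is stationary for this walk.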

Our next example of a Markov Chain with detailed balance is the Metropolis-Hastings chain.

Metropolis-Hastings

Suppose $\pi$ is a probability distribution on a state space $S$. If $S$ is discrete, $\pi$ should be interpreted as a pmf, and if $S$ is continuous, $\pi$ should be interpreted as a pdf. Let $Q(x, y)$ be some transition probability. Again, if $S$ is discrete, interpret $Q(x, y)$ as a pmf over $y$ for each fixed $x$; if $S$ is continuous, interpret $Q(x, y)$ as a pdf over $y$ for each fixed $x$.

We do not assume that $\pi(\cdot)$ and $Q(\cdot, \cdot)$ are related in any way. For example, there is no reason for $Q$ to satisfy detailed balance with respect to $\pi$. In other words, for $x$ and $y$, it may very well happen that

\begin{align*}
\pi(x) Q(x, y) \neq \pi(y) Q(y, x).
\end{align*}

The Metropolis-Hastings algorithm uses $Q(\cdot, \cdot)$ and $\pi(\cdot)$ to construct a new Markov chain $\{\Theta^{(t)}\}$ which moves as follows. Given $\Theta^{(t)} = x$, first use $Q(x, \cdot)$ to generate $y$, then take $\Theta^{(t+1)}$ to be:

\begin{align*}
\Theta^{(t+1)} =
\begin{cases}
y, & \text{with probability } \min\!\left(1, \dfrac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right), \\[1em]
x, & \text{with probability } 1 - \min\!\left(1, \dfrac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right).
\end{cases}
\end{align*}

The key is to realize that this new Markov chain satisfies detailed balance with respect to $\pi$. To see this, we need to verify:

\begin{align}
\pi(x) P(x, y) = \pi(y) P(y, x) \quad \text{for all $x, y \in S$}. \tag{13}
\end{align}

If $y = x$, there is nothing to prove. So let us assume that $y \neq x$. In this case, it is easy to see that

\begin{align*}
P(x, y) = Q(x, y) \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right).
\end{align*}

Consequently

\begin{align*}
\pi(x) P(x, y) = \pi(x) Q(x, y) \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right) = \min\!\left(\pi(x)\,Q(x,y),\; \pi(y)\,Q(y,x)\right).
\end{align*}

Similarly $\pi(y) P(y, x) = \min\!\left(\pi(x)\,Q(x,y),\; \pi(y)\,Q(y,x)\right)$, which proves (13).

The Metropolis-Hastings algorithm can be interpreted as a device for converting any Markov chain into one which satisfies detailed balance with respect to $\pi$. It simply adds an acceptance step to the Markov chain given by $Q(\cdot, \cdot)$, with acceptance probability:

\begin{align*}
\min\left(1, \frac{\pi(y)\, Q(y, x)}{\pi(x)\, Q(x, y)}\right).
\end{align*}

A simple special case of Metropolis-Hastings arises when $Q(\cdot, \cdot)$ is symmetric in the sense that $Q(x, y) = Q(y, x)$. In this case, the acceptance probability is simply $\min(1, \pi(y)/\pi(x))$. This happens, for example, when $Q$ is given by a random walk: $y = x + z$ where $z$ has a distribution symmetric around 0, such as $z \sim N(0, \Sigma)$ or $z \sim \text{uniform}\left(\prod_{j=1}^d [-a_j, a_j]\right)$. In this special case, the Metropolis-Hastings chain is simply referred to as the Metropolis chain.
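The random-walk Metropolis chain can be sketched in a few lines. The example below is our own illustration: the target $\pi$ is the standard normal density (used only up to its normalizing constant, which the acceptance ratio cancels), the proposal is $y = x + z$ with $z \sim \text{uniform}(-1, 1)$, and the acceptance ratio is computed on the log scale for numerical stability:

```python
import math
import random

# Illustrative target: pi = N(0, 1), known only up to a constant.
def log_target(x):
    return -0.5 * x * x  # log of the N(0, 1) density, up to an additive constant

random.seed(0)
x = 0.0
samples = []
for _ in range(50_000):
    y = x + random.uniform(-1.0, 1.0)   # symmetric random-walk proposal
    # Symmetric Q, so the acceptance probability is min(1, pi(y)/pi(x)).
    log_accept = min(0.0, log_target(y) - log_target(x))
    if random.random() < math.exp(log_accept):
        x = y                            # accept the proposal
    samples.append(x)                    # otherwise the chain stays at x

# The samples should (approximately) have the moments of N(0, 1).
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(var, 3))
```

In practice one discards an initial burn-in portion of the samples; it is omitted here to keep the sketch short.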

The Gibbs Sampler

Our third example of a Markov chain satisfying detailed balance with respect to $\pi$ is the Gibbs sampler. Suppose the state space $S$ consists of bivariate pairs $(x_1, x_2)$. Suppose $\pi(\cdot)$ is such that its conditionals $\pi_{1 \mid 2}(\cdot \mid x_2)$ and $\pi_{2 \mid 1}(\cdot \mid x_1)$ are easy to sample from. Here $\pi_{1 \mid 2}(\cdot \mid x_2)$ is the conditional distribution of $x_1$ given $x_2$ when $(x_1, x_2) \sim \pi$. Similarly, $\pi_{2 \mid 1}(\cdot \mid x_1)$ is the conditional distribution of $x_2$ given $x_1$ when $(x_1, x_2) \sim \pi$.

The Gibbs sampler employs the following Markov chain. Given $\Theta^{(t)} = (x_1, x_2)$, first sample one of the indices 1 and 2 with equal probability 0.5. If we get 1, then use the update:

\begin{align*}
(x_1, x_2) \mapsto (x_1', x_2) \quad \text{where $x_1' \sim \pi_{1 \mid 2}(\cdot \mid x_2)$}.
\end{align*}

If we get 2, then use the update:

\begin{align*}
(x_1, x_2) \mapsto (x_1, x_2') \quad \text{where $x_2' \sim \pi_{2 \mid 1}(\cdot \mid x_1)$}.
\end{align*}

It turns out that this chain satisfies detailed balance with respect to $\pi$. For this, we need to prove that

\begin{align*}
\pi(x_1, x_2) P((x_1, x_2), (x_1', x_2')) = \pi(x_1', x_2') P((x_1', x_2'), (x_1, x_2)).
\end{align*}

The transition $(x_1, x_2) \rightarrow (x_1', x_2')$ is only possible if at least one of $x_1 = x_1'$ or $x_2 = x_2'$ holds. Suppose $x_1 = x_1'$, so we need to verify:

\begin{align}
\pi(x_1, x_2) P((x_1, x_2), (x_1, x_2')) = \pi(x_1, x_2') P((x_1, x_2'), (x_1, x_2)). \tag{20}
\end{align}

From the description of the Gibbs chain, it is clear that

\begin{align*}
P((x_1, x_2), (x_1, x_2')) = \frac{1}{2} \pi_{2 \mid 1}(x_2' \mid x_1) = \frac{1}{2} \frac{\pi(x_1, x_2')}{\pi_1(x_1)}
\end{align*}

where $\pi_1(\cdot)$ is the marginal distribution of $X_1$ when $(X_1, X_2) \sim \pi$. As a result,

\begin{align*}
\pi(x_1, x_2) P((x_1, x_2), (x_1, x_2')) = \frac{1}{2} \frac{\pi(x_1, x_2)\,\pi(x_1, x_2')}{\pi_1(x_1)}.
\end{align*}

Clearly the right-hand side above is symmetric in $x_2$ and $x_2'$, which proves (20). One can similarly verify detailed balance when $x_2 = x_2'$.
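The random-scan Gibbs chain described above can be sketched for a hypothetical target of our own choosing: a bivariate normal with standard margins and correlation $\rho$, whose conditionals are $\pi_{1 \mid 2}(\cdot \mid x_2) = N(\rho x_2,\ 1 - \rho^2)$ and symmetrically for $\pi_{2 \mid 1}$:

```python
import math
import random

# Hypothetical target: bivariate normal with standard N(0, 1) margins and
# correlation rho.  Its conditionals are x1 | x2 ~ N(rho * x2, 1 - rho^2)
# and x2 | x1 ~ N(rho * x1, 1 - rho^2).
rho = 0.8
sd = math.sqrt(1 - rho * rho)

random.seed(0)
x1, x2 = 0.0, 0.0
draws = []
for _ in range(100_000):
    if random.random() < 0.5:            # picked index 1: resample x1 | x2
        x1 = random.gauss(rho * x2, sd)
    else:                                # picked index 2: resample x2 | x1
        x2 = random.gauss(rho * x1, sd)
    draws.append((x1, x2))

# The sample correlation should be close to rho.
n = len(draws)
m1 = sum(d[0] for d in draws) / n
m2 = sum(d[1] for d in draws) / n
cov = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
v1 = sum((d[0] - m1) ** 2 for d in draws) / n
v2 = sum((d[1] - m2) ** 2 for d in draws) / n
corr = cov / math.sqrt(v1 * v2)
print(round(corr, 3))
```

Note that each step only requires sampling from a univariate conditional, which is the whole appeal of the Gibbs sampler when the joint $\pi$ is hard to sample from directly.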