STAT 238 - Bayesian Statistics Lecture Thirty Three

Spring 2026, UC Berkeley

Hamiltonian Monte Carlo

In HMC, the proposal $y$ is generated by solving the following ODE:

\begin{align} x(0) = x ~~~~ \dot{x}(0) = z ~~~~ \ddot{x}(t) = \nabla \log \pi(x(t)) ~~~~ y = x(\sigma). \end{align}

Note that if $\nabla \log \pi(x(t))$ is replaced by $\nabla \log \pi(x)$, then it is easy to check that $y$ equals the MALA proposal $y = x + (\sigma^2/2)\nabla \log \pi(x) + \sigma z$.
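To spell out this check, write $g = \nabla \log \pi(x)$ for the gradient frozen at the starting point (the symbol $g$ is introduced only for this calculation). With constant acceleration, the ODE can be integrated twice in closed form using the initial conditions $x(0) = x$ and $\dot{x}(0) = z$:

\begin{align*} \ddot{x}(t) = g ~~\implies~~ \dot{x}(t) = z + t g ~~\implies~~ x(t) = x + t z + \frac{t^2}{2} g. \end{align*}

Evaluating at $t = \sigma$ gives $y = x(\sigma) = x + \sigma z + (\sigma^2/2) \nabla \log \pi(x)$, which is the MALA proposal.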

The ODE (1) can be written in the following equivalent first-order form:

\begin{align} \begin{pmatrix} \dot{x}(t) \\ \dot{v}(t) \end{pmatrix} = \begin{pmatrix} v(t) \\ \nabla \log \pi(x(t)) \end{pmatrix} ~\text{with initialization}~ \begin{pmatrix} x(0) \\ v(0) \end{pmatrix} = \begin{pmatrix} x \\ z \end{pmatrix} \end{align}

It is easy to see that (1) and (2) are equivalent: simply set $v(t) = \dot{x}(t)$. Here is another common way of writing (2). Let

\begin{align} H(x, v) = H(x_1, \dots, x_d, v_1, \dots, v_d) := -\log \pi(x) + \frac{1}{2}\|v\|^2 \end{align}

This quantity $H(x, v)$ is called the Hamiltonian associated with the ODE (2). It is then easy to see that

\begin{align*} \frac{\partial H}{\partial x} = \left(\frac{\partial H}{\partial x_1}, \dots, \frac{\partial H}{\partial x_d} \right)^T = -\nabla \log \pi(x) \end{align*}

and

\begin{align*} \frac{\partial H}{\partial v} = \left(\frac{\partial H}{\partial v_1}, \dots, \frac{\partial H}{\partial v_d} \right)^T = v. \end{align*}

As a result, (2) is equivalent to:

\begin{align} \begin{pmatrix} \dot{x}(t) \\ \dot{v}(t) \end{pmatrix} = \begin{pmatrix} \frac{\partial H}{\partial v} \\ -\frac{\partial H}{\partial x} \end{pmatrix} ~\text{with initialization}~ \begin{pmatrix} x(0) \\ v(0) \end{pmatrix} = \begin{pmatrix} x \\ z \end{pmatrix} \end{align}

One important property of this ODE is that the Hamiltonian $H(x(t), v(t))$ remains constant in $t$:

\begin{align} H(x(t), v(t)) = \text{constant} = H(x(0), v(0)) = H(x, z) = -\log \pi(x) + \frac{1}{2} \|z\|^2. \end{align}

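To see (7), apply the chain rule and then substitute the ODE in its Hamiltonian form, $\dot{x}_i = \partial H / \partial v_i$ and $\dot{v}_i = -\partial H / \partial x_i$: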

\begin{align*}
\frac{d}{dt} H(x(t), v(t)) &= \sum_{i=1}^d \left( \frac{\partial H}{\partial x_i} \dot{x}_i(t) + \frac{\partial H}{\partial v_i} \dot{v}_i(t) \right) \\
&= \sum_{i=1}^d \left(\frac{\partial H}{\partial x_i} \frac{\partial H}{\partial v_i} + \frac{\partial H}{\partial v_i} \left(- \frac{\partial H}{\partial x_i}\right) \right) \\
&= \sum_{i=1}^d \left(\frac{\partial H}{\partial x_i} \frac{\partial H}{\partial v_i} - \frac{\partial H}{\partial v_i} \frac{\partial H}{\partial x_i} \right) = 0.
\end{align*}
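The conservation of $H$ can also be checked numerically. The following is a minimal sketch, not part of the lecture: it assumes a one-dimensional target with $\log \pi(x) = -x^4/4$ (up to a constant), chosen purely for illustration, and integrates the first-order system with a classical Runge-Kutta (RK4) scheme. The function names and the starting values are hypothetical. Since RK4 only approximates the exact flow, the Hamiltonian is conserved up to a small discretization error rather than exactly.

```python
def log_pi(x):
    # Assumed illustrative target (not from the lecture): log pi(x) = -x^4 / 4 + const.
    return -x**4 / 4.0


def grad_log_pi(x):
    # Derivative of the assumed log density above.
    return -x**3


def hamiltonian(x, v):
    # H(x, v) = -log pi(x) + v^2 / 2 in one dimension.
    return -log_pi(x) + 0.5 * v * v


def vector_field(x, v):
    # Right-hand side of the first-order system: (x', v') = (v, grad log pi(x)).
    return v, grad_log_pi(x)


def rk4_step(x, v, h):
    # One classical Runge-Kutta step of size h for the pair (x, v).
    k1x, k1v = vector_field(x, v)
    k2x, k2v = vector_field(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
    k3x, k3v = vector_field(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
    k4x, k4v = vector_field(x + h * k3x, v + h * k3v)
    x_new = x + (h / 6.0) * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + (h / 6.0) * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, v_new


def flow(x, z, sigma, n_steps=1000):
    # Approximate (x(sigma), v(sigma)) starting from (x(0), v(0)) = (x, z).
    h = sigma / n_steps
    v = z
    for _ in range(n_steps):
        x, v = rk4_step(x, v, h)
    return x, v


x0, z0 = 0.7, -1.3  # arbitrary starting position and velocity
h_start = hamiltonian(x0, z0)
x1, v1 = flow(x0, z0, sigma=2.0)
h_end = hamiltonian(x1, v1)
drift = abs(h_end - h_start)
print("H at time 0:", h_start)
print("H at time sigma:", h_end)
print("absolute drift:", drift)
```

With a small step size the printed drift should be tiny compared with the value of $H$ itself, while the position and velocity individually change substantially over the trajectory.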

Stationarity of $\pi$ for HMC

The transition kernel given by (1) satisfies detailed balance with respect to $\pi$ for every fixed $\sigma > 0$. A consequence of this is that $\pi$ is invariant (stationary) for this Markov chain (for every $\sigma$). We will sketch a proof of this invariance (and skip the proof of detailed balance).
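For a concrete sanity check of this invariance, consider the special case where $\pi$ is the standard normal density in one dimension and $z \sim N(0, 1)$ is drawn independently of $x$ (both are assumptions made for this illustration). Then $\nabla \log \pi(x) = -x$, the ODE becomes $\ddot{x}(t) = -x(t)$, and its solution is $x(t) = x \cos t + z \sin t$. So $y = x \cos \sigma + z \sin \sigma$, which is again $N(0, 1)$ when $x \sim N(0, 1)$ because $\cos^2 \sigma + \sin^2 \sigma = 1$. The sketch below checks the first two moments by simulation; the function names, sample size, seed, and value of $\sigma$ are hypothetical choices, and matching two moments is only a sanity check, not a proof.

```python
import math
import random


def hmc_flow_std_normal(x, z, sigma):
    # Exact solution of x'' = -x with x(0) = x and x'(0) = z, evaluated at time sigma.
    # Valid only for the assumed standard normal target.
    return x * math.cos(sigma) + z * math.sin(sigma)


def sample_moments(sigma, n=100_000, seed=0):
    # Draw x ~ N(0, 1) and z ~ N(0, 1) independently, push each pair through the flow,
    # and return the sample mean and variance of the resulting proposals y.
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        ys.append(hmc_flow_std_normal(x, z, sigma))
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    return mean, var


mean, var = sample_moments(sigma=0.9)
print("sample mean of y:", mean)
print("sample variance of y:", var)
```

The sample mean should be close to 0 and the sample variance close to 1, up to Monte Carlo error, for any choice of $\sigma$.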

Stationarity of $\pi$ means that if $x \sim \pi$, then $y$ given by (1) is also distributed as $\pi$. In order to verify this, we shall use the following result on density evolutions of ODEs. This result has been popular in the recent literature on generative modeling (see, e.g., lai2025principles).

In the next lecture, we shall see how this result implies that $\pi$ is invariant for the HMC chain. We will basically apply Theorem 1 with $V$ given by:

\begin{align*} V(t, x, v) = \begin{pmatrix} v \\ \nabla \log \pi(x) \end{pmatrix} = \begin{pmatrix} \frac{\partial H}{\partial v} \\ -\frac{\partial H}{\partial x} \end{pmatrix}. \end{align*}

We shall also discuss the leapfrog discretization of the Hamiltonian ODE (2), and its Metropolization.