STAT 238 - Bayesian Statistics Lecture Thirty-Three
Spring 2026, UC Berkeley
Hamiltonian Monte Carlo

In HMC, the proposal $y$ is generated by solving the following ODE:
\begin{align}
x(0) = x, \qquad \dot{x}(0) = z, \qquad \ddot{x}(t) = \nabla \log \pi(x(t)), \qquad y = x(\sigma).
\end{align}
Note that if $\nabla \log \pi(x(t))$ is replaced by $\nabla \log \pi(x)$, then it is easy to check that $y$ equals the MALA proposal $y = x + (\sigma^2/2)\nabla \log \pi(x) + \sigma z$.
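To check this concretely: with the gradient frozen at the initial point, the ODE integrates in closed form to $x(t) = x + tz + \frac{t^2}{2}\nabla \log \pi(x)$, so $x(\sigma)$ is exactly the MALA proposal. Here is a minimal numerical sketch; the standard Gaussian target, for which $\nabla \log \pi(x) = -x$, is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 3, 0.5
x = rng.standard_normal(d)   # current state
z = rng.standard_normal(d)   # fresh noise / initial velocity

grad = lambda x: -x          # grad log pi for a standard Gaussian (assumption)

# Integrate x''(t) = g with the gradient g frozen at the starting point,
# using small Euler steps up to time sigma.
n = 20_000
dt = sigma / n
pos, vel = x.copy(), z.copy()
g = grad(x)                  # frozen gradient
for _ in range(n):
    pos += dt * vel
    vel += dt * g

# The result matches the MALA proposal up to discretization error.
mala = x + (sigma**2 / 2) * grad(x) + sigma * z
assert np.allclose(pos, mala, atol=1e-4)
```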
The ODE (30) can be written in the following alternative first-order form:
\begin{align}
\begin{pmatrix}
\dot{x}(t) \\ \dot{v}(t)
\end{pmatrix} =
\begin{pmatrix}
v(t) \\ \nabla \log \pi(x(t))
\end{pmatrix} ~\text{with initialization}~
\begin{pmatrix}
x(0) \\ v(0)
\end{pmatrix} =
\begin{pmatrix}
x \\ z
\end{pmatrix}
\end{align}
It is easy to see that (30) and (2) are equivalent: differentiating $\dot{x}(t) = v(t)$ gives $\ddot{x}(t) = \dot{v}(t) = \nabla \log \pi(x(t))$. Here is another common way of writing (2). Let
\begin{align}
H(x, v) = H(x_1, \dots, x_d, v_1, \dots, v_d) := -\log \pi(x) +
\frac{1}{2}\|v\|^2.
\end{align}
This quantity $H(x, v)$ is called the Hamiltonian associated with the ODE (2). It is then easy to see that
\begin{align*}
\frac{\partial H}{\partial x} = \left(\frac{\partial H}{\partial
x_1}, \dots, \frac{\partial H}{\partial x_d} \right)^T = -\nabla
\log \pi(x)
\end{align*}
and
\begin{align*}
\frac{\partial H}{\partial v} = \left(\frac{\partial H}{\partial
v_1}, \dots, \frac{\partial H}{\partial v_d} \right)^T = v.
\end{align*}
As a result, (2) is equivalent to the Hamiltonian system:
\begin{align}
\dot{x}(t) = \frac{\partial H}{\partial v}(x(t), v(t)) \qquad \dot{v}(t) = -\frac{\partial H}{\partial x}(x(t), v(t)).
\end{align}
One important property of this ODE is that the Hamiltonian $H(x(t), v(t))$ remains constant in $t$:
\begin{align}
H(x(t), v(t)) = \text{constant} = H(x(0), v(0)) = H(x, z) = -\log
\pi(x) + \frac{1}{2} \|z\|^2.
\end{align}
To see (7), just note that
\begin{align*}
\frac{d}{dt} H(x(t), v(t)) &= \sum_{i=1}^d \left( \frac{\partial H}{\partial
x_i} \dot{x}_i(t) + \frac{\partial
H}{\partial v_i} \dot{v}_i(t) \right) \\
&= \sum_{i=1}^d \left(\frac{\partial H}{\partial x_i} \frac{\partial
H}{\partial v_i} + \frac{\partial H}{\partial v_i} \left(-
\frac{\partial H}{\partial x_i}\right) \right) = 0,
\end{align*}
where the second equality substitutes $\dot{x}_i = \partial H/\partial v_i$ and $\dot{v}_i = -\partial H/\partial x_i$.

Stationarity of $\pi$ for HMC

The transition kernel given by (30) satisfies detailed balance with respect to $\pi$ for every fixed $\sigma > 0$. A consequence of this is that $\pi$ is invariant (stationary) for this Markov chain (for every $\sigma$). We will sketch a proof of this invariance (and skip the proof of detailed balance).
Stationarity of $\pi$ means that if $x \sim \pi$, then $y$ given by (30) is also distributed according to $\pi$. To verify this, we shall use the following result on density evolution for ODEs. This result has been popular in the recent literature on generative modeling (see, e.g., lai2025principles).
Theorem 1. Consider the ODE
\begin{align}
\dot{S}(x, t) = V(t, S(x, t)) ~~\text{with initial condition $S(x,
0) = x$}.
\end{align}
Here $\dot{S}(x, t) = \frac{d}{dt}S(x, t)$, $x \in \R^d$, and $S(\cdot, t), V(t, \cdot)$ are functions from $\R^d$ to $\R^d$. Suppose $\rho_b$ is a density on $\R^d$ and let $\rho(t, \cdot)$ be the density of $S(X, t)$ with $X \sim \rho_b$. Then $\rho(t, x)$ satisfies the following PDE:
\begin{align}
\partial_t \rho(t, x) = -\nabla \cdot \left(V(t, x) \rho(t, x)
\right) ~~ \text{ with $\rho(0, \cdot) = \rho_b$}.
\end{align}
Here $\nabla \cdot$, also known as the divergence, is defined by $\nabla \cdot G(x) = \sum_{i=1}^d \frac{\partial}{\partial x_i} G_i(x)$ for a function $G(x) = (G_1(x), \dots, G_d(x))$. So $\nabla \cdot (V \rho) = \mathrm{div}(V \rho) = \sum_{i=1}^d \frac{\partial}{\partial x_i} (V_i(t, x) \rho(t, x))$.
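As a quick sanity check of this PDE: in $d = 1$ with the illustrative vector field $V(t, x) = x$ (not from the lecture), the flow is $S(x, t) = x e^t$, which pushes a standard Gaussian $\rho_b$ forward to $N(0, e^{2t})$. The continuity equation can then be verified symbolically:

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)

# Pushforward of N(0, 1) under x -> x * exp(t) is N(0, exp(2t)).
rho = sp.exp(-x**2 / (2 * sp.exp(2 * t))) / sp.sqrt(2 * sp.pi * sp.exp(2 * t))
V = x

# Continuity equation: d/dt rho + d/dx (V * rho) should vanish identically.
residual = sp.simplify(sp.diff(rho, t) + sp.diff(V * rho, x))
assert residual == 0
```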
In the next lecture, we shall see how this result implies that $\pi$ is invariant for the HMC chain. We will basically apply Theorem 1 with $V$ given by:
\begin{align*}
V(t, x, v) =
\begin{pmatrix}
v \\ \nabla \log \pi(x)
\end{pmatrix} =
\begin{pmatrix}
\frac{\partial H}{\partial v} \\ -\frac{\partial H}{\partial x}
\end{pmatrix}.
\end{align*}
We shall also discuss the leapfrog discretization of the Hamiltonian ODE (2), and its Metropolization.
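As a preview of that discussion, here is a minimal sketch of leapfrog HMC with a Metropolis correction; the standard Gaussian target, step size, and trajectory length are illustrative assumptions, not prescriptions from the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative target: standard Gaussian, log pi(x) = -||x||^2/2 + const.
log_pi = lambda x: -0.5 * x @ x
grad_log_pi = lambda x: -x

def hmc_step(x, eps=0.1, n_leapfrog=20):
    """One HMC transition: leapfrog-integrate the Hamiltonian ODE, then
    Metropolis-accept using the change in H(x, v) = -log pi(x) + ||v||^2/2.
    Leapfrog nearly conserves H, so the acceptance probability is high."""
    z = rng.standard_normal(x.shape)       # fresh momentum z ~ N(0, I)
    H0 = -log_pi(x) + 0.5 * z @ z
    q, p = x.copy(), z.copy()
    p += 0.5 * eps * grad_log_pi(q)        # initial half step in momentum
    for _ in range(n_leapfrog - 1):
        q += eps * p
        p += eps * grad_log_pi(q)
    q += eps * p
    p += 0.5 * eps * grad_log_pi(q)        # final half step in momentum
    H1 = -log_pi(q) + 0.5 * p @ p
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return q                           # accept the proposal
    return x                               # reject: stay at x

# Run the chain; coordinate-wise sample variances should be near 1.
x = np.zeros(2)
samples = np.array([x := hmc_step(x) for _ in range(5000)])
```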