We are given a target probability density π, and our goal is to construct a Markov chain which satisfies detailed balance with respect to π. Given the current value x, HMC constructs the next value y by solving the following second-order ODE:
x¨(t)=∇logπ(x(t)) with initialization x(0)=x and x˙(0)=z∼N(0,Id).
Most authors (see e.g., neal2011mcmc) use the formulation (6) to describe HMC. The advantage of the formulation (30) is that it makes the connection to MALA very clear (essentially, MALA proposals are obtained by replacing ∇logπ(x(t)) by ∇logπ(x(0))).
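As a concrete illustration (a sketch of ours, not from the notes; the function names are hypothetical): for the standard Gaussian target the ODE can be integrated numerically and compared against its closed-form solution, and freezing the gradient at x(0) gives the MALA-style proposal mentioned above.

```python
import math

# Sketch (ours): for the standard Gaussian target pi = N(0, 1) we have
# grad log pi(x) = -x, so x''(t) = grad log pi(x(t)) is the harmonic
# oscillator with exact solution x(t) = x(0) cos(t) + z sin(t).
def hmc_flow_1d(x0, z, t, grad_log_pi, n_steps=10_000):
    """Integrate x'' = grad log pi(x), x(0) = x0, x'(0) = z, by explicit Euler."""
    dt = t / n_steps
    x, v = x0, z
    for _ in range(n_steps):
        x, v = x + dt * v, v + dt * grad_log_pi(x)
    return x

def mala_style_proposal(x0, z, t, grad_log_pi):
    """Freeze the gradient at x(0): y = x0 + t*z + (t^2 / 2) * grad log pi(x0)."""
    return x0 + t * z + 0.5 * t**2 * grad_log_pi(x0)

grad = lambda x: -x                            # standard Gaussian target
x0, z, t = 1.0, 0.5, 1.3
x_hmc = hmc_flow_1d(x0, z, t, grad)
x_exact = x0 * math.cos(t) + z * math.sin(t)   # closed form for this target
x_mala = mala_style_proposal(x0, z, t, grad)
```

Here `x_hmc` matches `x_exact` up to discretization error, while `x_mala` is what results from replacing ∇logπ(x(t)) by ∇logπ(x(0)).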
On the other hand, the main advantage of the Hamiltonian formulation is that it also works for choices of the Hamiltonian that are different from (3). For example, consider the alternative choices of the Hamiltonian:
H(x,v)=−logπ(x)+vᵀMv/2 for a positive definite matrix M. Here the dynamics (6) become

ẋ(t)=Mv(t), v̇(t)=∇logπ(x(t)),

because ∂H/∂x=−∇logπ(x) and ∂H/∂v=Mv. This choice of Hamiltonian is meaningful when the variables x1,…,xd have different scales. One can also make M depend on x. This is related to Riemannian Manifold HMC (see girolami2011riemann).
H(x,v)=−logπ(x)+∥v∥₁. Here the dynamics (6) become

ẋ(t)=sign(v(t)), v̇(t)=∇logπ(x(t)),

because ∂H/∂v=sign(v). Also note that v(t)=(v1(t),…,vd(t)) and sign(v(t)) is interpreted coordinatewise as (sign(v1(t)),…,sign(vd(t))). These are interesting dynamics that cannot be written in second-order form as in (30).
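To make these dynamics concrete, here is a small numerical sketch (ours, not from the notes; the function names are hypothetical) in one dimension for the standard Gaussian target, on a time interval where v(t) stays positive so the exact solution is available in closed form:

```python
# Sketch (ours): explicit Euler integration of x'(t) = sign(v(t)),
# v'(t) = grad log pi(x(t)) for pi = N(0, 1), i.e. grad log pi(x) = -x.
# While v(t) > 0 we have sign(v) = 1, so the dynamics are linear with
# exact solution x(t) = x0 + t and v(t) = v0 - x0*t - t**2 / 2.
def sgn(u):
    return (u > 0) - (u < 0)

def integrate_l1_dynamics(x0, v0, t, n_steps=20_000):
    dt = t / n_steps
    x, v = x0, v0
    for _ in range(n_steps):
        x, v = x + dt * sgn(v), v - dt * x   # grad log pi(x) = -x
    return x, v

x0, v0, t = 0.5, 1.0, 0.9          # v(t) remains positive on [0, 0.9] here
x_num, v_num = integrate_l1_dynamics(x0, v0, t)
x_exact = x0 + t                   # = 1.4
v_exact = v0 - x0 * t - t**2 / 2   # = 0.145
```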
The above is the same as (6) except that we denote the initial velocity by v (instead of z as in (2)). In this section, we will not use the specific form of the Hamiltonian given by (3), but we will take the more general form:
where K(v) is a symmetric function of v, i.e., K(−v)=K(v).
We will view the Hamiltonian dynamics (run for a fixed time σ) as a mapping from (x,v) (the initial state) to a final state (x(σ),v(σ)). We will denote this mapping by Tσ(x,v):
A function g for which g⁻¹=g (which is equivalent to g(g(x))=x) is called an involution. The above statement is therefore equivalent to saying that TσS is an involution. Involutions have recently been used to unify many MCMC algorithms (see e.g., glatt2024sacred).
where JTσ denotes the Jacobian of Tσ. Thus volume preservation can be proved by showing that the Jacobian JTσ has determinant equal to 1 for all (x,v):
If σ is not small, then break [0,σ] into N intervals each of length σ/N, and then decompose Tσ as TN∘TN−1∘⋯∘T1, where Tj is the mapping corresponding to Hamiltonian dynamics from t=(j−1)σ/N to t=jσ/N. For each j, we use the previous argument to claim detJTj=1+O(σ²/N²). Taking the product over j, we get detJTσ=∏j detJTj=(1+O(σ²/N²))^N=1+O(σ²/N), and letting N→∞ (with σ fixed) gives detJTσ=1.
The ODE (2) cannot be solved exactly for most π(⋅), so we have to solve it approximately using a discretization technique. We fix a step size ϵ and approximate the ODE (2) at t=0,ϵ,2ϵ,…. More specifically, we will construct:
Below we will describe how to obtain (x(t+ϵ),v(t+ϵ)) from (x(t),v(t)) (this process will then be iteratively applied starting at t=0).
The standard approach to discretization of the HMC equation (2) is to use the leapfrog discretization. The leapfrog discretization uses the following formulae to construct x(t+ϵ) and v(t+ϵ) from x(t) and v(t).
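A minimal sketch of one leapfrog step, under the usual convention that the velocity takes two half-steps around a full position step (function names are ours):

```python
import math

# Sketch (ours, not from the notes) of a single leapfrog step for the
# dynamics x'' = grad log pi(x).
def leapfrog_step(x, v, eps, grad_log_pi):
    v_half = v + 0.5 * eps * grad_log_pi(x)           # half step in velocity
    x_new = x + eps * v_half                          # full step in position
    v_new = v_half + 0.5 * eps * grad_log_pi(x_new)   # half step in velocity
    return x_new, v_new

# Demo on the standard Gaussian target pi = N(0, 1), grad log pi(x) = -x.
# The exact flow is a rotation: from (1, 0), x(t) = cos(t) and v(t) = -sin(t).
grad = lambda x: -x
x, v = 1.0, 0.0
eps, n_steps = 0.01, 1000                 # total integration time t = 10
for _ in range(n_steps):
    x, v = leapfrog_step(x, v, eps, grad)
x_exact, v_exact = math.cos(10.0), -math.sin(10.0)
```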
Given the explicit formula for Tϵdisc in (25), this verification is straightforward.
Volume Preserving: Tϵdisc,N is volume-preserving. This can be verified by showing that the determinant of the Jacobian of Tϵdisc,N equals one. This, in turn, can be verified by proving that the determinant of the Jacobian of Tϵdisc equals 1, which can be checked directly using the formula (25) or by noting (from (24)) that Tϵdisc is the composition of three mappings:
These mappings are very simple and one can directly verify that they are volume preserving (i.e., the determinant of their Jacobians equals 1).
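One can also check volume preservation numerically. The following sketch (ours, not from the notes) estimates the Jacobian of one leapfrog step by finite differences, for a one-dimensional standard Gaussian target, and confirms that its determinant equals 1:

```python
import numpy as np

# Hypothetical numerical check (ours): the Jacobian determinant of one
# leapfrog step on the standard Gaussian target equals 1.
def leapfrog(state, eps=0.3):
    x, v = state
    v = v + 0.5 * eps * (-x)     # grad log pi(x) = -x for pi = N(0, 1)
    x = x + eps * v
    v = v + 0.5 * eps * (-x)
    return np.array([x, v])

def jacobian_det(f, state, h=1e-6):
    """Central finite-difference Jacobian of f at state, and its determinant."""
    n = len(state)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        J[:, i] = (f(state + e) - f(state - e)) / (2 * h)
    return np.linalg.det(J)

det = jacobian_det(leapfrog, np.array([0.7, -0.2]))   # ≈ 1.0
```

Because the gradient is linear for the Gaussian target, the finite-difference Jacobian is exact here up to rounding; for general targets the same check holds up to O(h²) error.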
While the discretized Hamiltonian dynamics satisfy invertibility (or time-reversibility) and volume preservation, they do not satisfy Hamiltonian conservation (unlike continuous Hamiltonian dynamics). Because of this, a Metropolis acceptance correction has to be applied when implementing HMC with leapfrog discretization. This will be described in the next lecture.
On stationarity of π for the continuous Hamiltonian Dynamics
For the continuous Hamiltonian dynamics given by (6), it turns out that if the initial conditions x and v are distributed independently according to x∼π and v∼N(0,Id), then x(σ) and v(σ) are also independently distributed as x(σ)∼π and v(σ)∼N(0,Id) for every σ. This can be proved using the continuity equation that we discussed in the last lecture. This argument is given below.
The PDE in Theorem 1 is known as the continuity equation or the transport equation (it is closely related to the Fokker–Planck and Kolmogorov forward equations).
The continuity equation has been popular in the recent literature on generative modeling (see e.g., lai2025principles).
This implies that ρ(t,x,v)=ρb(x,v) satisfies Theorem 1 (observe that ∂ρ/∂t(t,x,v)=0 because ρ(t,x,v) is constant in t).
This shows that if x∼π, then the solution x(σ) to (30) at any time σ is distributed according to π. This means that π is stationary for the Hamiltonian Markov Chain.
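The divergence computation underlying this argument can be spelled out as follows (a sketch in our notation: ρ(x,v)=π(x)N(v∣0,Id) and the Hamiltonian vector field is b(x,v)=(v, ∇logπ(x))):

```latex
% Check that rho(x, v) = pi(x) N(v | 0, I_d) has vanishing divergence term
% in the continuity equation for b(x, v) = (v, \nabla \log \pi(x)):
\begin{aligned}
\nabla \cdot (\rho\, b)
  &= \nabla_x \cdot \big(\rho\, v\big)
   + \nabla_v \cdot \big(\rho\, \nabla \log \pi(x)\big) \\
  &= v \cdot \nabla_x \rho
   + \nabla \log \pi(x) \cdot \nabla_v \rho \\
  &= \rho\, v \cdot \nabla \log \pi(x)
   - \rho\, \nabla \log \pi(x) \cdot v = 0,
\end{aligned}
% using \nabla_x \rho = \rho \nabla \log \pi(x) and \nabla_v \rho = -\rho v,
% and the fact that v has zero divergence in x and \nabla \log \pi(x) has
% zero divergence in v.
```

Since ∂ρ/∂t=0 as well, the continuity equation is satisfied, which is exactly the stationarity claim above.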