Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

STAT 238 - Bayesian Statistics Lecture Thirty Five

Spring 2026, UC Berkeley

HMC and Leapfrog Discretization

There is a given target probability density π\pi that we need to sample from. In HMC, given a current state xx, one attempts to solve the dynamics:

(x˙(t)v˙(t))=(Hv(x(t),v(t))Hx(x(t),v(t))) with initialization (x(0)v(0))=(xv)\begin{align} \begin{pmatrix} \dot{x}(t) \\ \dot{v}(t) \end{pmatrix} = \begin{pmatrix} \frac{\partial H}{\partial v}(x(t), v(t)) \\ -\frac{\partial H}{\partial x}(x(t), v(t)) \end{pmatrix} ~\text{with initialization}~ \begin{pmatrix} x(0) \\ v(0) \end{pmatrix} = \begin{pmatrix} x \\ v \end{pmatrix} \end{align}

with vN(0,Id)v \sim N(0, I_d). Here H(,)H(\cdot, \cdot) denotes the Hamiltonian:

H(x,v)=H(x1,,xd,v1,,vd):=logπ(x)+12v2.\begin{align} H(x, v) = H(x_1, \dots, x_d, v_1, \dots, v_d) := -\log \pi(x) + \frac{1}{2}\|v\|^2. \end{align}

Let TσT_{\sigma} denote the function which maps (x,v)(x, v) to (x(σ),v(σ))(x(\sigma), v(\sigma)) obtained by solving (6) for t[0,σ]t \in [0, \sigma]. We saw in the last lecture that TσT_{\sigma} is invertible (specifically, TσS=(TσS)1T_{\sigma} S = (T_{\sigma} S)^{-1} where SS is the velocity sign-flip operator), Hamiltonian preserving (i.e., H(Tσ(x,v))=H(x,v)H(T_{\sigma}(x, v)) = H(x, v)) and Volume preserving (i.e., the determinant of the Jacobian of TσT_{\sigma} always equals 1).

The HMC ODE (6) cannot be solved exactly in practice, so one attempts to solve it approximately via discretization by fixing a step size ϵ\epsilon and solving (6) approximately for t=0,ϵ,2ϵ,t = 0, \epsilon, 2 \epsilon, \dots. To go from (x(t),v(t))(x(t), v(t)) to (x(t+ϵ),v(t+ϵ))(x(t+\epsilon), v(t + \epsilon)), one uses the leapfrog discretization:

v(t+ϵ2)=v(t)+ϵ2logπ(x(t))x(t+ϵ)=x(t)+ϵv(t+ϵ2)v(t+ϵ)=v(t+ϵ2)+ϵ2logπ(x(t+ϵ))\begin{split} & v\left(t + \frac{\epsilon}{2}\right) = v(t) + \frac{\epsilon}{2} \nabla \log \pi(x(t)) \\ & x(t + \epsilon) = x(t) + \epsilon v\left(t + \frac{\epsilon}{2} \right) \\ & v(t + \epsilon) = v \left(t + \frac{\epsilon}{2} \right) + \frac{\epsilon}{2} \nabla \log \pi(x(t + \epsilon)) \end{split}

If we denote this map from (x(t),y(t))(x(t), y(t)) to (x(t+ϵ),y(t+ϵ))(x(t + \epsilon), y(t + \epsilon)) by Tϵdisc(x,v)T^{\text{disc}}_{\epsilon}(x, v) and then apply this mapping NN times in succession starting from (x,v)(x, v), then the overall mapping is given by

Tϵdisc,N=TϵdiscTϵdisc.\begin{align*} T^{\text{disc}, N}_{\epsilon} = T_{\epsilon}^{\text{disc}} \circ \dots \circ T_{\epsilon}^{\text{disc}}. \end{align*}

The above mapping is an approximation to the continuous ODE flow TσT_{\sigma}:

Tϵdisc,N(x,v)Tσ(x,v)   provided ϵ=σN.\begin{align*} T_{\epsilon}^{\text{disc}, N}(x, v) \approx T_{\sigma}(x, v) ~~ \text{ provided $\epsilon = \frac{\sigma}{N}$}. \end{align*}

In the last lecture, we observed that Tϵdisc,NT_{\epsilon}^{\text{disc}, N} is also invertible (with STϵdisc,N=(STϵdisc,N)1S T_{\epsilon}^{\text{disc}, N} = (S T_{\epsilon}^{\text{disc}, N})^{-1}) and volume preserving. But it does not preserve the Hamiltonian i.e., H(Tϵdisc,N(x,v))H(T_{\epsilon}^{\text{disc}, N}(x, v)) can be different from H(x,v)H(x, v).

While implementing leapfrog discretization to compute x(Nϵ),v(Nϵ)x(N\epsilon), v(N \epsilon), it can be checked that the following iterations can be used which combine the two velocity updates into one:

x(jϵ)=x((j1)ϵ)+ϵv((j1/2)ϵ)v((j+1/2)ϵ)=v((j1/2)ϵ)+ϵx(jϵ)\begin{align*} & x(j \epsilon) = x((j-1)\epsilon) + \epsilon v \left((j - 1/2) \epsilon \right) \\ & v((j+ 1/2) \epsilon) = v((j-1/2) \epsilon) + \epsilon x(j \epsilon) \end{align*}

for j=1,,N1j = 1, \dots, N-1. This iteration needs to be initialized using x(0)=xx(0) = x and v(ϵ/2)=v(0)+(ϵ/2)logπ(x(0))v(\epsilon/2) = v(0) + (\epsilon/2) \nabla \log \pi(x(0)). At iteration N1N-1, we obtain x((N1)ϵ)x((N-1)\epsilon) and v((N1/2)ϵ)v((N-1/2) \epsilon). From here x(Nϵ)x(N \epsilon) and v(Nϵ)v(N \epsilon) can be computed by:

x(Nϵ)=x((N1)ϵ)+ϵv((N1/2)ϵ)v(Nϵ)=v((N1/2)ϵ)+(ϵ/2)logπ(x(Nϵ)).\begin{align*} & x(N \epsilon) = x((N-1) \epsilon) + \epsilon v((N-1/2) \epsilon) \\ & v(N \epsilon) = v((N-1/2)\epsilon) + (\epsilon/2) \nabla \log \pi(x(N \epsilon)). \end{align*}

The HMC Algorithm

We are now ready to formally describe the HMC algorithm. The algorithm starts with an arbitrary initialization x(0)x^{(0)} and then repeats the following for t=0,1,2,t = 0, 1, 2, \dots:

  1. Generate v(t)N(0,Id)v^{(t)} \sim N(0, I_d).

  2. Run discretized Leapfrog Hamiltonian dynamics starting at (x(t),v(t))(x^{(t)}, v^{(t)}) for some fixed step size ϵ\epsilon and trajectory length σ\sigma. This gives (y,w)=Tϵdisc,N(x,v)(y, w) = T_{\epsilon}^{\text{disc}, N}(x, v) (here N=σ/ϵN = \sigma/\epsilon is the number of leapfrog steps).

  3. Compute the acceptance probability:

    min(1,exp(H(x(t),v(t))H(y,w)))\begin{align*} \min \left(1, \exp(H(x^{(t)}, v^{(t)}) - H(y, w)) \right) \end{align*}

    and update x(t+1)x^{(t+1)} as:

    Undefined control sequence: \1 at position 124: …, w)) \right), \̲1̲em] 
    x^{(t)} & …
    
    x^{(t+1)} =
    \begin{cases}
    y & \text{with probability }     \min \left(1, \exp(H(x^{(t)},
         v^{(t)}) - H(y, w)) \right), \1em] 
    x^{(t)} & \text{with probability } 1 - \min \left(1, \exp(H(x^{(t)},
         v^{(t)}) - H(y, w)) \right). 
    \end{cases}

We need to verify that the Markov Chain given by the above algorithm maintains detailed balance with respect to π\pi. This is difficult to verify directly for the chain with transitions to x(t+1)x^{(t+1)} from x(t)x^{(t)}. Instead, we shall verify detailed balance for the chain which transitions to (x(t+1),v(t+1))(x^{(t+1)}, v^{(t+1)}) from (x(t),v(t))(x^{(t)}, v^{(t)}). Note that v(t+1)v^{(t+1)} does not appear in the above description of the HMC algorithm. We rewrite the algorithm below using v(t+1)v^{(t+1)} explicitly.

In addition to using v(t+1)v^{(t+1)} explicitly, another change in the following algorithm is that the equation (y,w)=Tϵdisc,N(x,v)(y, w) = T_{\epsilon}^{\text{disc}, N}(x, v) is now changed to (y,w)=STϵdisc,N(x,v)(y, w) = S T_{\epsilon}^{\text{disc}, N}(x, v). In other words, we do a velocity flip after running the discretized Hamiltonian dynamics. This velocity flip makes the deterministic transition from (x,v)(x, v) to (y,w)(y, w) an involution (i.e., its inverse is itself: (STϵdisc,N)1=STϵdisc,N(S T_{\epsilon}^{\text{disc}, N})^{-1} = S T_{\epsilon}^{\text{disc}, N}) which makes the argument for detailed balance very clean. Note also that the velocity flip does not change the Hamiltonian (because H(y,w)=H(y,w)H(y, w) = H(y, -w) as the dependence on ww is through w2\|w\|^2) so the acceptance probability does not change. Algorithmically, both the versions of the algorithm lead to identical output.

  1. Generate v(t)N(0,Id)v^{(t)} \sim N(0, I_d).

  2. Run discretized Leapfrog Hamiltonian dynamics (followed by a velocity flip) starting at (x(t),v(t))(x^{(t)}, v^{(t)}) for some fixed step size ϵ\epsilon and trajectory length σ\sigma. This gives (y,w)=STϵdisc,N(x,v)(y, w) = S T_{\epsilon}^{\text{disc}, N}(x, v) (here N=σ/ϵN = \sigma/\epsilon is the number of leapfrog steps).

  3. Compute the acceptance probability:

    min(1,exp(H(x(t),v(t))H(y,w)))\begin{align*} \min \left(1, \exp(H(x^{(t)}, v^{(t)}) - H(y, w)) \right) \end{align*}

    and update (x(t+1),v(t+1))(x^{(t+1)}, v^{(t+1)}) as:

    Undefined control sequence: \1 at position 142: …, w)) \right), \̲1̲em] 
    (x^{(t)} v…
    
    (x^{(t+1)}, v^{(t+1)}) =
    \begin{cases}
    (y, w) & \text{with probability }     \min \left(1, \exp(H(x^{(t)},
         v^{(t)}) - H(y, w)) \right), \1em] 
    (x^{(t)} v^{(t)}), & \text{with probability } 1 - \min \left(1, \exp(H(x^{(t)},
         v^{(t)}) - H(y, w)) \right). 
    \end{cases}

Next we will verify detailed balance of the above Markov Chain with respect to the joint density:

π~(x,v)exp(H(x,v)).\begin{align*} \tilde{\pi}(x, v) \propto \exp \left(-H(x, v) \right). \end{align*}

Note that, under π~\tilde{\pi}, the variables xπx \sim \pi and vN(0,Id)v \sim N(0, I_d) are independent (because H(x,v)=logπ(x)+v2/2H(x, v) = -\log \pi(x) + \|v\|^2/2.

The transitions of the HMC Markov Chain from (x,v)(x, v) to (y,w)(y, w) are deterministic (given by the involutive map STϵdisc,NS T_{\epsilon}^{\text{disc}, N}) so we need to be careful about the form of detailed balance.

Recap: detailed balance

So far in this course (e.g., see Lectures 25 and 31), we have used the following definition of detailed balance. The Markov Chain with transition densities given by Q(x,y)Q(x, y) satisfies detailed balance with respect to the density π(x)\pi(x) provided:

π(x)Q(x,y)=π(y)Q(y,x).\begin{align} \pi(x) Q(x, y) = \pi(y) Q(y, x). \end{align}

This formulation is only true if the transitions have a density given by Q(x,)Q(x, \cdot). For HMC (viewed as a chain over (x(t),v(t))(x^{(t)}, v^{(t)})), the transitions are deterministic so we cannot use the detailed balance characterization in (13). The more general way of writing the detailed balance condition is:

P{Θ(t)A,Θ(t+1)B}=P{Θ(t)B,Θ(t+1)A}\begin{align} \P\{\Theta^{(t)} \in A, \Theta^{(t+1)} \in B\} = \P\{\Theta^{(t)} \in B, \Theta^{(t+1)} \in A\} \end{align}

assuming that Θ(t)A\Theta^{(t)} \in A and the conditional distribution of Θ(t+1)\Theta^{(t+1)} given Θ(t)\Theta^{(t)} is given by the Markov chain. Also (14) is required to hold for all subsets AA and BB of the state space.

In the case when the Markov chain transitions are given by densities Q(x,)Q(x, \cdot), then it is easy to show that (14) is equivalent to (13). For example, if we restrict AA to be a small region around a point xx, and BB to be a small region around a point yy, then the left hand side of (14) is approximately π(x)Q(x,y)vol(A)vol(B)\pi(x) Q(x, y) \text{vol}(A) \text{vol}(B) and the right hand side is π(y)Q(y,x)vol(A)vol(B)\pi(y) Q(y, x) \text{vol}(A) \text{vol}(B). One can then cancel the product of the volume terms to obtain (13) from (14).

On the other hand, when the Markov chain transitions do not have a density, we need to work with the general condition (14).

Detailed Balance for HMC

We shall now derive the detailed balance condition (14) for HMC. We need to prove that:

P{(x(t),v(t))A,(x(t+1),v(t+1))B}=P{(x(t),v(t))B,(x(t+1),v(t+1))A}\begin{align} \P \left\{(x^{(t)}, v^{(t)}) \in A, (x^{(t+1)}, v^{(t+1)}) \in B \right\} = \P \left\{(x^{(t)}, v^{(t)}) \in B, (x^{(t+1)}, v^{(t+1)}) \in A \right\} \end{align}

where (x(t),v(t))(x^{(t)}, v^{(t)}) has density proportional to exp(H(x,v))\exp(-H(x, v)). We compute the left hand side as

P{(x(t),v(t))A,(x(t+1),v(t+1))B}=Aexp(H(x,v))P{(x(t+1),v(t+1))Bx(t)=x,v(t)=v}dxdv\begin{align*} & \P \left\{(x^{(t)}, v^{(t)}) \in A, (x^{(t+1)}, v^{(t+1)}) \in B \right\} \\ &= \int_A \exp(-H(x, v)) \P \left\{(x^{(t+1)}, v^{(t+1)}) \in B \mid x^{(t)} = x, v^{(t)} = v \right\} dx dv \end{align*}

To write the conditional probability P{(x(t+1),v(t+1))Bx(t)=x,v(t)=v}\P \left\{(x^{(t+1)}, v^{(t+1)}) \in B \mid x^{(t)} = x, v^{(t)} = v \right\}, note that (x(t+1),v(t+1))(x^{(t+1)}, v^{(t+1)}) either equals STϵdisc,N(x,v)S T_{\epsilon}^{\text{disc}, N}(x, v) with probability min(1,exp(H(x(t),v(t))H(STϵdisc,N(x,v)))\min(1, \exp(H(x^{(t)}, v^{(t)}) - H(S T_{\epsilon}^{\text{disc}, N}(x, v))) or it equals (x,v)(x, v) with the opposite probability. For simplicity of notation, let us denote the function STϵdisc,NS T_{\epsilon}^{\text{disc}, N} by Υ\Upsilon. Thus

P{(x(t+1),v(t+1))Bx(t)=x,v(t)=v}=min(1,exp(H(Υ(x,v)))exp(H(x,v)))I{Υ(x,v)B}+[1min(1,exp(H(Υ(x,v)))exp(H(x,v)))]I{(x,v)B}.\begin{align*} & \P \left\{(x^{(t+1)}, v^{(t+1)}) \in B \mid x^{(t)} = x, v^{(t)} = v \right\} \\ &= \min \left(1,\frac{\exp(-H(\Upsilon(x, v)))}{\exp(-H(x, v))} \right) I\{\Upsilon(x, v) \in B\} + \left[1 - \min \left(1,\frac{\exp(-H(\Upsilon(x, v)))}{\exp(-H(x, v))} \right) \right] I\{(x, v) \in B\}. \end{align*}

We then get

P{(x(t),v(t))A,(x(t+1),v(t+1))B}=Aexp(H(x,v))P{(x(t+1),v(t+1))Bx(t)=x,v(t)=v}dxdv=Aexp(H(x,v))min(1,exp(H(Υ(x,v)))exp(H(x,v)))I{Υ(x,v)B}dxdv+Aexp(H(x,v))[1min(1,exp(H(Υ(x,v)))exp(H(x,v)))]I{(x,v)B}dxdv=Amin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)B}dxdv+AB[exp(H(x,v))min(exp(H(x,v)),exp(H(Υ(x,v))))]dxdv.\begin{align*} & \P \left\{(x^{(t)}, v^{(t)}) \in A, (x^{(t+1)}, v^{(t+1)}) \in B \right\} \\ &= \int_A \exp(-H(x, v)) \P \left\{(x^{(t+1)}, v^{(t+1)}) \in B \mid x^{(t)} = x, v^{(t)} = v \right\} dx dv \\ &= \int_A \exp(-H(x, v)) \min \left(1,\frac{\exp(-H(\Upsilon(x, v)))}{\exp(-H(x, v))} \right) I\{\Upsilon(x, v) \in B\} dx dv \\ &+ \int_A \exp(-H(x, v)) \left[1 - \min \left(1,\frac{\exp(-H(\Upsilon(x, v)))}{\exp(-H(x, v))} \right) \right] I\{(x, v) \in B\} dx dv \\ &= \int_A \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in B\} dx dv \\ &+ \int_{A \cap B} \left[\exp(-H(x, v)) - \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) \right] dx dv. \end{align*}

By the same argument (but with AA and BB switched), the right hand side of (15) satisfies:

P{(x(t),v(t))B,(x(t+1),v(t+1))A}=Bmin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)A}dxdv+BA[exp(H(x,v))min(exp(H(x,v)),exp(H(Υ(x,v))))]dxdv.\begin{align*} &\P \left\{(x^{(t)}, v^{(t)}) \in B, (x^{(t+1)}, v^{(t+1)}) \in A \right\} \\ &= \int_B \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in A\} dx dv \\ &+ \int_{B \cap A} \left[\exp(-H(x, v)) - \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) \right] dx dv. \end{align*}

It is easy to see that the second terms in the expressions for P{(x(t),v(t))A,(x(t+1),v(t+1))B}\P \left\{(x^{(t)}, v^{(t)}) \in A, (x^{(t+1)}, v^{(t+1)}) \in B \right\} and P{(x(t),v(t))B,(x(t+1),v(t+1))A}\P \left\{(x^{(t)}, v^{(t)}) \in B, (x^{(t+1)}, v^{(t+1)}) \in A \right\} above are identical. So we only need to prove that the first terms also coincide i.e., we need to show

Amin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)B}dxdv=Bmin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)A}dxdv\begin{split} &\int_A \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in B\} dx dv \\ &= \int_B \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in A\} dx dv \end{split}

for all AA and BB. For this, let us take the right hand side integral and use the change of variable (y,w)=Υ(x,v)(y, w) = \Upsilon(x, v), or equivalently (x,v)=Υ1(y,w)(x, v) = \Upsilon^{-1}(y, w). Because Υ=STϵdisc,N\Upsilon = ST_{\epsilon}^{\text{disc}, N} is an involution, we have Υ1=Υ\Upsilon^{-1} = \Upsilon. Thus

Bmin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)A}dxdv=min(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)A,(x,v)B}dxdv=min(exp(H(y,w)),exp(H(Υ(y,w))))I{Υ(y,w)A,(y,w)B}dydwdetJΥ(y,w).\begin{align*} & \int_B \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in A\} dx dv \\ &= \int \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in A, (x, v) \in B\} dx dv \\ &= \int \min \left(\exp(-H(y, w)), \exp(-H(\Upsilon(y, w))) \right) I\{\Upsilon(y, w) \in A, (y, w) \in B\} dy dw |\det J \Upsilon(y, w)|. \end{align*}

Because Υ\Upsilon is volume-preserving (we proved, in the last lecture, that leapfrog discretization preserves volumes), the determinant term above equals one. We thus have

Bmin(exp(H(x,v)),exp(H(Υ(x,v))))I{Υ(x,v)A}dxdv=min(exp(H(y,w)),exp(H(Υ(y,w))))I{Υ(y,w)A,(y,w)B}dydw\begin{align*} & \int_B \min \left(\exp(-H(x, v)), \exp(-H(\Upsilon(x, v))) \right) I\{\Upsilon(x, v) \in A\} dx dv \\ &= \int \min \left(\exp(-H(y, w)), \exp(-H(\Upsilon(y, w))) \right) I\{\Upsilon(y, w) \in A, (y, w) \in B\} dy dw \end{align*}

Similarly, one can also prove that the left hand side of (20) also equals the above. This completes the proof of (20) establishing detailed balance of the HMC markov chain.

For more details behind HMC, the paper neal2011mcmc is highly recommended.