Let Tσ denote the function which maps (x,v) to (x(σ),v(σ)) obtained by solving (6) for t∈[0,σ]. We saw in the last lecture that Tσ is invertible (specifically, TσS=(TσS)−1 where S is the velocity sign-flip operator), Hamiltonian preserving (i.e., H(Tσ(x,v))=H(x,v)) and Volume preserving (i.e., the determinant of the Jacobian of Tσ always equals 1).
The HMC ODE (6) cannot be solved exactly in practice, so one attempts to solve it approximately via discretization by fixing a step size ϵ and solving (6) approximately for t=0,ϵ,2ϵ,…. To go from (x(t),v(t)) to (x(t+ϵ),v(t+ϵ)), one uses the leapfrog discretization:
If we denote this map from (x(t),y(t)) to (x(t+ϵ),y(t+ϵ)) by Tϵdisc(x,v) and then apply this mapping N times in succession starting from (x,v), then the overall mapping is given by
In the last lecture, we observed that Tϵdisc,N is also invertible (with STϵdisc,N=(STϵdisc,N)−1) and volume preserving. But it does not preserve the Hamiltonian i.e., H(Tϵdisc,N(x,v)) can be different from H(x,v).
While implementing leapfrog discretization to compute x(Nϵ),v(Nϵ), it can be checked that the following iterations can be used which combine the two velocity updates into one:
for j=1,…,N−1. This iteration needs to be initialized using x(0)=x and v(ϵ/2)=v(0)+(ϵ/2)∇logπ(x(0)). At iteration N−1, we obtain x((N−1)ϵ) and v((N−1/2)ϵ). From here x(Nϵ) and v(Nϵ) can be computed by:
We are now ready to formally describe the HMC algorithm. The algorithm starts with an arbitrary initialization x(0) and then repeats the following for t=0,1,2,…:
Generate v(t)∼N(0,Id).
Run discretized Leapfrog Hamiltonian dynamics starting at (x(t),v(t)) for some fixed step size ϵ and trajectory length σ. This gives (y,w)=Tϵdisc,N(x,v) (here N=σ/ϵ is the number of leapfrog steps).
Undefined control sequence: \1 at position 124: …, w)) \right), \̲1̲em]
x^{(t)} & …
x^{(t+1)} =
\begin{cases}
y & \text{with probability } \min \left(1, \exp(H(x^{(t)},
v^{(t)}) - H(y, w)) \right), \1em]
x^{(t)} & \text{with probability } 1 - \min \left(1, \exp(H(x^{(t)},
v^{(t)}) - H(y, w)) \right).
\end{cases}
We need to verify that the Markov Chain given by the above algorithm maintains detailed balance with respect to π. This is difficult to verify directly for the chain with transitions to x(t+1) from x(t). Instead, we shall verify detailed balance for the chain which transitions to (x(t+1),v(t+1)) from (x(t),v(t)). Note that v(t+1) does not appear in the above description of the HMC algorithm. We rewrite the algorithm below using v(t+1) explicitly.
In addition to using v(t+1) explicitly, another change in the following algorithm is that the equation (y,w)=Tϵdisc,N(x,v) is now changed to (y,w)=STϵdisc,N(x,v). In other words, we do a velocity flip after running the discretized Hamiltonian dynamics. This velocity flip makes the deterministic transition from (x,v) to (y,w) an involution (i.e., its inverse is itself: (STϵdisc,N)−1=STϵdisc,N) which makes the argument for detailed balance very clean. Note also that the velocity flip does not change the Hamiltonian (because H(y,w)=H(y,−w) as the dependence on w is through ∥w∥2) so the acceptance probability does not change. Algorithmically, both the versions of the algorithm lead to identical output.
Generate v(t)∼N(0,Id).
Run discretized Leapfrog Hamiltonian dynamics (followed by a velocity flip) starting at (x(t),v(t)) for some fixed step size ϵ and trajectory length σ. This gives (y,w)=STϵdisc,N(x,v) (here N=σ/ϵ is the number of leapfrog steps).
Note that, under π~, the variables x∼π and v∼N(0,Id) are independent (because H(x,v)=−logπ(x)+∥v∥2/2.
The transitions of the HMC Markov Chain from (x,v) to (y,w) are deterministic (given by the involutive map STϵdisc,N) so we need to be careful about the form of detailed balance.
So far in this course (e.g., see Lectures 25 and 31), we have used the following definition of detailed balance. The Markov Chain with transition densities given by Q(x,y) satisfies detailed balance with respect to the density π(x) provided:
This formulation is only true if the transitions have a density given by Q(x,⋅). For HMC (viewed as a chain over (x(t),v(t))), the transitions are deterministic so we cannot use the detailed balance characterization in (13). The more general way of writing the detailed balance condition is:
assuming that Θ(t)∈A and the conditional distribution of Θ(t+1) given Θ(t) is given by the Markov chain. Also (14) is required to hold for all subsets A and B of the state space.
In the case when the Markov chain transitions are given by densities Q(x,⋅), then it is easy to show that (14) is equivalent to (13). For example, if we restrict A to be a small region around a point x, and B to be a small region around a point y, then the left hand side of (14) is approximately π(x)Q(x,y)vol(A)vol(B) and the right hand side is π(y)Q(y,x)vol(A)vol(B). One can then cancel the product of the volume terms to obtain (13) from (14).
On the other hand, when the Markov chain transitions do not have a density, we need to work with the general condition (14).
To write the conditional probability P{(x(t+1),v(t+1))∈B∣x(t)=x,v(t)=v}, note that (x(t+1),v(t+1)) either equals STϵdisc,N(x,v) with probability min(1,exp(H(x(t),v(t))−H(STϵdisc,N(x,v))) or it equals (x,v) with the opposite probability. For simplicity of notation, let us denote the function STϵdisc,N by Υ. Thus
It is easy to see that the second terms in the expressions for P{(x(t),v(t))∈A,(x(t+1),v(t+1))∈B} and P{(x(t),v(t))∈B,(x(t+1),v(t+1))∈A} above are identical. So we only need to prove that the first terms also coincide i.e., we need to show
for all A and B. For this, let us take the right hand side integral and use the change of variable (y,w)=Υ(x,v), or equivalently (x,v)=Υ−1(y,w). Because Υ=STϵdisc,N is an involution, we have Υ−1=Υ. Thus
Because Υ is volume-preserving (we proved, in the last lecture, that leapfrog discretization preserves volumes), the determinant term above equals one. We thus have
Similarly, one can also prove that the left hand side of (20) also equals the above. This completes the proof of (20) establishing detailed balance of the HMC markov chain.
For more details behind HMC, the paper neal2011mcmc is highly recommended.