Spring 2026, UC Berkeley
We will study the MALA and HMC algorithms. MALA stands for Metropolis Adjusted Langevin Algorithm. HMC stands for Hamiltonian Monte Carlo. Before we discuss MALA, let us first recall the basics of markov chains.
Recap: Markov Chains, detailed balance and the Metropolis-Hastings Algorithm¶
Let be a state space. A sequence of random variables taking values in is a Markov Chain on provided the conditional distribution of given only depends on . Further, if this conditional distribution is the same for all , then the Markov chain is said to be time-homogeneous. We will only deal with time-homogeneous Markov Chains in this course.
A (time-homogeneous) Markov Chain is described via the conditional distribution of given , which is described by a probability transition kernel . If is discrete, is the conditional probability of given . If is an open subset of , represents the conditional density of given .
The Markov chain given by satisfies detailed balance with respect to a probability on if the following condition is satisfied:
This detailed balance condition implies that is stationary for the chain in the sense that implies .
Stationarity along with the additional condition of irreducibility (which means that for every pair of points and in the state space, there is a positive probability that the chain goes from to in finite time) imply ergodicity which means:
almost surely.
Metropolis-Hastings is the most widely used method for constructing markov chains satisfying detailed balance with respect to . This method starts with an arbitrary transition kernel that may not even have anything to do with (in particular, it is not at all necessary that satisfies detailed balance with respect to ). Metropolis-Hastings uses and to construct a new Markov chain which moves as follows. Given , first use to generate , then take to be:
Undefined control sequence: \1 at position 123: …(x,y)}\right), \̲1̲em]
x, & \text{…
\Theta^{(t+1)} =
\begin{cases}
y, & \text{with probability } \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right), \1em]
x, & \text{with probability } 1 - \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right).
\end{cases}The resulting Metropolis-Hastings chain satisfies detailed balance with respect to , as we saw previously in Lecture 25.
Random Walk Metropolis (RWM)¶
The Random Walk Metropolis (RWM) algorithm is arguably the simplest example of a markov chain which satisfies detailed balance with respect to . It uses the transition kernel which generates proposals via:
In other words, the proposal transition kernel is given by
Clearly so the resulting acceptance probability (in the Metropolis-Hastings algorithm) is simply
The choice of (in (5)) is crucial to the practical performance of the algorithm. If is not small, then may result in proposals that are far from . If is highly concentrated (as is typically the case with posteriors), then might be much smaller than leading to the chain rejecting often and getting stuck. On the other hand if is too small, then the chain might take a long time to explore the full distribution .
Metropolis Adjusted Langevin Algorithm (MALA)¶
In MALA, the rwm proposal (4) is modified with an expression that depends on . To find this formula, let us first consider the following proposal:
for some function depending on . Clearly (4) can be seen as a special case of (7) corresponding to . The proposal transition kernel corresponding to (7) is given by:
Note that is no longer equal to . The key question is how to choose . It is natural to choose so that the detailed balance condition holds with respect to :
at least approximately when is close to . Check that
As a result:
For close to , we use the approximations:
This leads to
when is close to . This clearly suggests taking
This leads to the MALA proposal:
When is small, is automatically close to , so the above argument can be used to show that the Metropolis-Hastings acceptance ratio for this proposal is nearly equal to 1.
Mechanics Interpretation¶
We can interpret the RWM proposal (4) in physical terms as follows:
Similarly, we can interpret the MALA proposal (15) in physical terms as follows:
The physical analogy makes the difference between RWM and MALA transparent. In both cases the particle starts at with a random velocity , and the proposal is the position after time . The distinction lies in the dynamics:
RWM is a free particle: no force acts, so the particle travels in a straight line at constant velocity. The proposal is purely isotropic and blind to the target — it has no tendency to move toward regions of high probability.
MALA subjects the particle to a constant acceleration , i.e. a force pointing in the direction of steepest ascent of . By Newton’s second law,
which is precisely the Langevin proposal (15). The drift term biases every proposal uphill in , so the chain is steered toward high-probability regions before the Metropolis correction is even applied.
As a consequence, MALA proposals are systematically better aligned with than RWM proposals. In practice this allows MALA to take much larger steps while maintaining a reasonable acceptance rate, leading to faster exploration of the target distribution.
In the coming lectures, we shall look at HMC, which is a further modification of (17). Specifically, the HMC proposal is obtained via:
The step from MALA to HMC is precisely the step from constant acceleration to position-dependent acceleration. In MALA the acceleration is evaluated once at the current position and held fixed for the entire flight time . In HMC the acceleration is updated continuously along the trajectory: at every time the particle feels . One downside to this time-dependent acceleration is that usually cannot be written in closed form as a function of and (unlike RWM and MALA), and some numerical integration scheme has to be used to figure out .