Let S be a state space. A sequence of random variables Θ(0),Θ(1),… taking values in S is a Markov Chain on S provided the conditional distribution of Θ(t+1) given Θ(t)=θ(t),Θ(t−1)=θ(t−1),…,Θ(0)=θ(0) only depends on θ(t). Further, if this conditional distribution is the same for all t, then the Markov chain is said to be time-homogeneous. We will only deal with time-homogeneous Markov Chains in this course.
A (time-homogeneous) Markov Chain is described via the conditional distribution of Θ(t+1) given Θ(t)=x. If S is finite or countable, then these conditional distributions can be described by the probabilities P(x,y) := P{Θ(t+1)=y ∣ Θ(t)=x} for x, y ∈ S.
If S is an open subset of Rd, then one usually describes the conditional distribution of Θ(t+1) given Θ(t)=x via a conditional density function P(x,y) (a density in y for each fixed x).
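On a finite state space, the probabilities P(x,y) form a matrix whose rows are the conditional pmfs, and the chain can be simulated by repeatedly sampling from the row of the current state. A minimal sketch in Python (the 3-state matrix below is made up purely for illustration):

```python
import numpy as np

# Hypothetical 3-state chain; row x of P is the conditional pmf P(x, .)
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

def simulate(P, theta0, n_steps, rng):
    """Simulate a time-homogeneous Markov chain from transition matrix P."""
    path = [theta0]
    for _ in range(n_steps):
        # The next state depends only on the current state (Markov property)
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

rng = np.random.default_rng(0)
path = simulate(P, theta0=0, n_steps=10, rng=rng)
```

Time-homogeneity is reflected in the fact that the same matrix P is used at every step t.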
Suppose {Θ(t)} is a time-homogeneous Markov chain that is irreducible. Suppose π is the stationary distribution of the Markov Chain (irreducibility guarantees uniqueness of the stationary distribution). Then, for every function h with ∫|h|dπ < ∞,
(1/N) ∑t=1N h(Θ(t)) → ∫ h dπ as N → ∞
almost surely. For a reference to this result, see RobertsRosenthal2004. For a comprehensive book-length reference, see MeynTweedie2009 (Chapter 17 of this book contains this result).
When the state space is finite, irreducibility means that for every pair of points x and y in the state space, there is a positive probability that the chain goes from x to y. For continuous state spaces, irreducibility is somewhat technical to define. If you are interested, see again the aforementioned references RobertsRosenthal2004 and MeynTweedie2009.
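The long-run averaging result above can be seen in action by simulating an irreducible finite-state chain and comparing the empirical occupation frequencies with the stationary distribution. A small sketch, using a made-up two-state chain whose stationary distribution is easy to compute by hand:

```python
import numpy as np

# Irreducible two-state chain: all transitions have positive probability
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# The stationary distribution solves pi P = pi; for this chain pi = (0.8, 0.2)
pi = np.array([0.8, 0.2])

rng = np.random.default_rng(1)
state = 0
counts = np.zeros(2)
N = 200_000
for _ in range(N):
    state = int(rng.choice(2, p=P[state]))
    counts[state] += 1

# Empirical fraction of time spent in each state; should be close to pi
freq = counts / N
```

Taking h to be the indicator of a state, the almost-sure convergence above says exactly that these occupation frequencies converge to π.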
Note that one gets the same condition by also arguing from (5) (in this case, P(x,y) and P(y,x) appearing in (8) should be interpreted as conditional pdfs as opposed to pmfs).
Here is a simple example of a Markov Chain satisfying detailed balance.
Our next example of a Markov Chain with detailed balance is the Metropolis-Hastings chain.
Suppose π is a probability on a state space S. If S is discrete, π should be interpreted as a pmf and if S is continuous, π should be interpreted as a pdf. Let Q(x,y) be some transition probability. Again if S is discrete, interpret Q(x,y) as a pmf over y for each fixed x. If S is continuous, interpret Q(x,y) as a pdf over y for each fixed x.
We do not assume that π(⋅) and Q(⋅,⋅) are related in any way. For example, there is no reason for Q to satisfy detailed balance with respect to π. In other words, for x and y, it may very well happen that π(x)Q(x,y) ≠ π(y)Q(y,x).
The Metropolis-Hastings algorithm uses Q(⋅,⋅) and π(⋅) to construct a new Markov chain {Θ(t)} which moves as follows. Given Θ(t)=x, first use Q(x,⋅) to generate y, then take Θ(t+1) to be:
\Theta^{(t+1)} =
\begin{cases}
y, & \text{with probability } \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right), \\[1em]
x, & \text{with probability } 1 - \min\!\left(1, \frac{\pi(y)\,Q(y,x)}{\pi(x)\,Q(x,y)}\right).
\end{cases}
The key is to realize that this new Markov chain satisfies detailed balance with respect to π. To see this, we need to verify that π(x)P(x,y) = π(y)P(y,x) for all x ≠ y, where P(⋅,⋅) denotes the transition probability of the Metropolis-Hastings chain. For x ≠ y we have P(x,y) = Q(x,y) min(1, π(y)Q(y,x)/(π(x)Q(x,y))), so that π(x)P(x,y) = min(π(x)Q(x,y), π(y)Q(y,x)). Similarly π(y)P(y,x) = min(π(y)Q(y,x), π(x)Q(x,y)), which proves (13).
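On a finite state space this detailed balance property can also be checked numerically: build the full Metropolis-Hastings transition matrix from Q and the acceptance probabilities, and verify that π(x)P(x,y) = π(y)P(y,x) entrywise. A sketch with made-up values of π and Q:

```python
import numpy as np

# Hypothetical discrete example: 3 states, arbitrary target pi and proposal Q
pi = np.array([0.5, 0.3, 0.2])
Q = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4]])

n = len(pi)
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if x != y:
            # Metropolis-Hastings acceptance probability
            accept = min(1.0, pi[y] * Q[y, x] / (pi[x] * Q[x, y]))
            P[x, y] = Q[x, y] * accept
    P[x, x] = 1.0 - P[x].sum()  # rejected proposals leave the chain at x

# flux[x, y] = pi(x) P(x, y); detailed balance says this matrix is symmetric
flux = pi[:, None] * P
```

Note that the diagonal term P(x,x) absorbs all the rejected moves, and that detailed balance immediately gives stationarity of π (summing π(x)P(x,y) over x yields π(y)).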
The Metropolis-Hastings Algorithm can be interpreted as a device for converting any Markov Chain into one which satisfies detailed balance with respect to π. It simply adds an acceptance step to the Markov Chain given by Q(⋅,⋅) with acceptance probability min(1, π(y)Q(y,x)/(π(x)Q(x,y))).
A simple special case of the Metropolis-Hastings algorithm arises when Q(⋅,⋅) is symmetric in the sense that Q(x,y)=Q(y,x). In this case, the acceptance probability is simply min(1, π(y)/π(x)). For example, Q may be given by a random walk: y = x + z where z has a distribution symmetric around 0, such as z ∼ N(0,Σ) or z ∼ uniform(∏j=1d[−aj,aj]). In this special case, the Metropolis-Hastings chain is simply referred to as the Metropolis Chain.
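A minimal sketch of such a random-walk Metropolis chain. The target here is the standard normal density (chosen so the output can be sanity-checked against known moments), and the step size 2.5 is an arbitrary illustrative choice:

```python
import numpy as np

def rw_metropolis(log_pi, x0, step, n_samples, rng):
    """Random-walk Metropolis: propose y = x + z with z ~ N(0, step^2).
    Since the proposal is symmetric, the acceptance probability
    reduces to min(1, pi(y)/pi(x))."""
    x = x0
    samples = np.empty(n_samples)
    for t in range(n_samples):
        y = x + step * rng.standard_normal()
        # Accept with probability min(1, pi(y)/pi(x)), computed on the log scale
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y
        samples[t] = x
    return samples

rng = np.random.default_rng(2)
# Target pi = standard normal, so log pi(x) = -x^2/2 up to an additive constant
samples = rw_metropolis(lambda x: -0.5 * x**2, x0=0.0, step=2.5,
                        n_samples=50_000, rng=rng)
```

Working with log π avoids numerical underflow and also shows that the chain only needs π up to a normalizing constant, since the constant cancels in the ratio π(y)/π(x).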
Our third example of a Markov Chain satisfying detailed balance with respect to π is the Gibbs sampler. Suppose the state space S consists of bivariate pairs (x1,x2). Suppose π(⋅) is such that its conditionals π1∣2(⋅∣x2) and π2∣1(⋅∣x1) are easy to sample from. Here π1∣2(⋅∣x2) is the conditional distribution of x1 given x2 when (x1,x2)∼π. Similarly π2∣1(⋅∣x1) is the conditional distribution of x2 given x1 when (x1,x2)∼π.
The Gibbs sampler employs the following Markov Chain. Given Θ(t)=(x1,x2), first sample one of the indices 1 and 2 with equal probability 0.5. If we get 1, then use the update: