
We will start our discussion of Markov Chain Monte Carlo (MCMC). The simplest MCMC method is the Metropolis algorithm.

import numpy as np
import matplotlib.pyplot as plt
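Every example below uses the same accept/reject core. Here is a minimal generic sketch of one Metropolis step (the names `metropolis_step`, `target`, and `propose` are illustrative, not part of the examples that follow), demonstrated on an unnormalized standard normal target:

```python
import numpy as np

def metropolis_step(state, target, propose, rng):
    """One Metropolis step with a symmetric proposal:
    accept the candidate with probability min(1, target(cand)/target(state))."""
    cand = propose(state, rng)
    if rng.uniform() < target(cand) / target(state):
        return cand
    return state

# illustrative target and proposal:
rng = np.random.default_rng(0)
target = lambda th: np.exp(-0.5 * th ** 2)        # unnormalized N(0, 1)
propose = lambda s, r: s + r.normal(scale=1.0)    # symmetric random walk
state = 0.0
samples = np.empty(20000)
for i in range(len(samples)):
    state = metropolis_step(state, target, propose, rng)
    samples[i] = state
# the sample mean and sd should be near 0 and 1 respectively
```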

Example One

The goal is to use the Metropolis Algorithm to approximate the following density.

def f(th): 
    return (np.sin(8.5 * th) ** 2) * (th >= 0) * (th <= 1) * 2*np.exp(th)

th = np.arange(0, 1.001, 0.001)
plt.plot(th, f(th))
plt.xlabel("theta")
plt.ylabel("Unnormalized Density")
plt.show()

The above function is not quite a density because it does not integrate to one. Its integral is given by:

from scipy.integrate import quad
val, err = quad(f, 0, 1)
print(val)
1.8775056565536072

Below is the Random-Walk Metropolis algorithm for obtaining samples approximating this density. One gets a different version of the algorithm for different methods of selecting the candidate.

nit = 400000
path = np.empty(nit)
state = 0.4 #initialization
path[0] = state
rej = 0

for i in range(1, nit):
    #candidate = np.random.uniform(state - 0.4, state + 0.4)
    #candidate = np.random.normal(loc = state, scale = 0.08)
    candidate = np.random.normal(loc = state, scale = 2)
    #candidate = np.random.uniform(0, 1) #note here candidate does not depend on the current state
    ratio = f(candidate) / f(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio: 
        rej += 1
    else:
        state = candidate
    path[i] = state

print("Rejection rate:", rej / nit)

plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
plt.plot(th, f(th)/val, color = "red")
plt.show()
Rejection rate: 0.8909575

Below is the ‘trace plot’ (i.e., the plot of $\theta^{(t)}$ against $t$) for the last 2000 iterations.

#Here are the last 2000 iterations of the path: 
plt.plot(path[nit - 2000:], color = "blue")
plt.xlabel("Iteration")
plt.ylabel("state")
plt.show()
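The rejection rate depends heavily on the proposal scale. A quick way to see this is to rerun a shorter chain for several scales (a sketch reusing the density `f` from above; the specific scales and the chain length are illustrative):

```python
import numpy as np

def f(th):
    return (np.sin(8.5 * th) ** 2) * (th >= 0) * (th <= 1) * 2 * np.exp(th)

rng = np.random.default_rng(1)
rates = {}
for scale in [0.02, 0.08, 0.4, 2.0]:
    state, rej, nit = 0.4, 0, 20000
    for _ in range(nit):
        cand = rng.normal(loc=state, scale=scale)
        if rng.uniform() >= f(cand) / f(state):
            rej += 1          # reject: stay at the current state
        else:
            state = cand      # accept: move to the candidate
    rates[scale] = rej / nit
    print(f"scale = {scale}: rejection rate = {rates[scale]:.2f}")
```

Very large scales propose mostly outside the support and get rejected, while very small scales accept often but move slowly.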

Example Two: Metropolis for Bayesian Beta-Binomial inference

Consider the simple Beta-Binomial example. The prior is $Beta(a, b)$ and the likelihood is $X \sim Bin(n, \theta)$. We have seen previously that the posterior is given by $Beta(X + a, n - X + b)$.
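This conjugacy can be checked numerically by comparing the mean of the $Beta(X + a, n - X + b)$ density against direct integration of the unnormalized posterior (a sketch with small illustrative values of `a`, `b`, `x`, `n`, not the ones used below):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

a, b, x, n = 2.0, 3.0, 4, 10   # illustrative values

# unnormalized posterior: likelihood kernel times prior kernel
unnorm = lambda th: th**x * (1 - th)**(n - x) * th**(a - 1) * (1 - th)**(b - 1)
Z, _ = quad(unnorm, 0, 1)
num_mean = quad(lambda th: th * unnorm(th) / Z, 0, 1)[0]

# conjugate answer: Beta(x + a, n - x + b), whose mean is (x + a)/(n + a + b)
print(num_mean, beta.mean(x + a, n - x + b))
```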

Here are the functions for the likelihood, prior and (unnormalized) posterior.

from scipy.special import gammaln #this is to calculate the log of the Binomial coefficient

a = 12
b = 120000
x = 100
n = 500000

def lik(th):
    if th < 0 or th > 1:
        return 0.0
    return np.exp(gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1) + x * np.log(th) + (n - x) * np.log(1 - th))

def prior(th):
    if th < 0 or th > 1:
        return 0.0
    return np.exp((a - 1) * np.log(th) + (b - 1) * np.log(1 - th))

def posterior(th):
    return lik(th) * prior(th)

from scipy.integrate import quad
val, err = quad(posterior, 0, 20*(x/n))
print(val, err)
5.907639805069782e-58 1.0821820585301677e-57
nit = 400000
path = np.empty(nit)
state = x/n #initialization
path[0] = state
rej = 0

for i in range(1, nit):
    candidate = np.random.normal(loc = state, scale = 1e-5) #scale 1e-5 works
    #candidate = np.random.normal(loc = state, scale = 1e-2) #does not work (poor mixing)
    ratio = posterior(candidate) / posterior(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio: 
        rej += 1
    else:
        state = candidate
    path[i] = state

print("Rejection rate:", rej / nit)

plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
#plotting the Beta(a + x, b + n - x) density for comparison
from scipy.stats import beta
th = np.sort(path)
plt.plot(th, beta.pdf(th, a + x, b + n - x), color = "red") 
plt.show()
Rejection rate: 0.1828175
#Here are the last 2000 iterations of the path: 
plt.plot(path[nit - 2000:], color = "blue")
plt.xlabel("Iteration")
plt.ylabel("state")
plt.show()

When the scale of the proposal distribution is changed from 1e-5 to 1e-2, the algorithm has very poor mixing.
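Poor mixing shows up as high, slowly decaying autocorrelation in the trace. A simple numerical diagnostic is the lag-1 sample autocorrelation (a sketch; `lag1_autocorr` is a helper defined here, illustrated on a synthetic AR(1) chain rather than the sampler output above):

```python
import numpy as np

def lag1_autocorr(chain):
    """Sample lag-1 autocorrelation of a 1-D chain."""
    c = chain - chain.mean()
    return np.dot(c[:-1], c[1:]) / np.dot(c, c)

# an AR(1) chain with coefficient 0.9 mimics a slowly mixing sampler
rng = np.random.default_rng(0)
z = np.empty(50000)
z[0] = 0.0
for t in range(1, len(z)):
    z[t] = 0.9 * z[t - 1] + rng.normal()
print(lag1_autocorr(z))  # close to the AR coefficient 0.9
```

Values close to 1 indicate that consecutive states are nearly identical, so the chain needs many more iterations per effective sample.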

Example Three: Binomial inference with non-conjugate prior

Now consider the problem of estimating $\theta$ from $X \sim Bin(n, \theta)$ with the prior $\log(\theta/(1 - \theta)) \sim N(\mu, \sigma^2)$. The prior density for $\theta$ is given by:

\begin{align*} f_{\theta}(\theta) = \frac{1}{\sqrt{2 \pi}\, \sigma} \frac{1}{\theta(1 - \theta)} \exp \left(-\frac{1}{2 \sigma^2} \left(\log \frac{\theta}{1 - \theta} - \mu \right)^2 \right) \end{align*}
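As a sanity check, this density should integrate to one over $(0, 1)$. A quick numerical check (with illustrative values $\mu = 0$, $\sigma = 1$; the values used below concentrate the density very close to zero, which makes naive quadrature over the whole interval unreliable):

```python
import numpy as np
from scipy.integrate import quad

mu, sig = 0.0, 1.0  # illustrative values

def logit_normal_pdf(th):
    # density of theta when log(theta/(1-theta)) ~ N(mu, sig^2)
    z = np.log(th / (1 - th))
    return np.exp(-0.5 * ((z - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi) * th * (1 - th))

val, err = quad(logit_normal_pdf, 0, 1)
print(val)  # approximately 1
```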

Here is the code for the prior and posterior.

#Now we will use the prior log(\theta/(1-\theta)) \sim N(mu, sig^2) instead of the Beta(a, b) prior.
mu = -9
sig = 0.3
print(mu, sig) #these values are chosen to make the prior density look similar to the Beta(12, 120000) prior density

def prior(th):
    if th < 0 or th > 1:
        return 0.0
    z = np.log(th / (1 - th))
    return np.exp(-0.5 * ((z - mu) / sig)**2) / (sig * np.sqrt(2 * np.pi) * th * (1 - th))

def posterior(th):
    return prior(th) * lik(th)
-9 0.3

Below is the Metropolis code.

nit = 400000
path = np.empty(nit)
state = x/n #initialization
path[0] = state
rej = 0

for i in range(1, nit):
    candidate = np.random.normal(loc = state, scale = 1e-5) #scale 1e-5 works
    #candidate = np.random.normal(loc = state, scale = 1e-2) #does not work (poor mixing)
    ratio = posterior(candidate) / posterior(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio: 
        rej += 1
    else:
        state = candidate
    path[i] = state

print("Rejection rate:", rej / nit)

plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
#plotting the numerically normalized posterior density for comparison
th = np.sort(path)
post_unnorm = np.array([posterior(t) for t in th])
plt.plot(th, post_unnorm / np.trapezoid(post_unnorm, th), color="red")
plt.show()
Rejection rate: 0.1685225

Again, the mixing is good when the scale of the Gaussian proposal distribution is 1e-5, and poor when it is 1e-2.

Example Four: Normal Likelihood and prior

The following is a simple example with prior $\theta \sim N(\mu_0, \sigma_0^2)$ and likelihood $X \mid \theta \sim N(\theta, \sigma^2)$. We consider posterior inference with the observation $X = 130$, $\sigma = 5$, $\mu_0 = 120$ and $\sigma_0 = 10$.
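This is a conjugate Normal-Normal pair, so the posterior is available in closed form and can be used to check the Metropolis output:

\begin{align*} \theta \mid X \sim N(\mu_{\text{post}}, \sigma_{\text{post}}^2), \qquad \mu_{\text{post}} = \frac{X/\sigma^2 + \mu_0/\sigma_0^2}{1/\sigma^2 + 1/\sigma_0^2}, \qquad \sigma_{\text{post}}^2 = \frac{1}{1/\sigma^2 + 1/\sigma_0^2} \end{align*}

With the values above, $\mu_{\text{post}} = (130/25 + 120/100)/(1/25 + 1/100) = 128$ and $\sigma_{\text{post}} = \sqrt{20} \approx 4.47$.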

X, sig = 130, 5
mu_prior, sig_prior = 120, 10

def dnorm(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s)**2) / (s * np.sqrt(2 * np.pi))

def lik(th):
    return dnorm(X, th, sig)

def prior(th):
    return dnorm(th, mu_prior, sig_prior)

def post(th):
    return prior(th) * lik(th)

The Metropolis algorithm is coded below.

nit = 400000
path = np.empty(nit)
state = 130.0
path[0] = state
rej = 0

for it in range(1, nit):
    cand = np.random.normal(loc = state, scale = 3)
    ratio = post(cand) / post(state)
    u = np.random.uniform()
    if u < ratio:
        state = cand
    else:
        rej += 1
    path[it] = state

print(rej / nit)

plt.hist(path, bins=1000, density=True)
#True posterior:
mu_post = ((X / sig**2) + (mu_prior / sig_prior**2)) / ((1 / sig**2) + (1 / sig_prior**2))
sig_post = np.sqrt(1 / ((1 / sig**2) + (1 / sig_prior**2)))
th = np.arange(path.min(), path.max(), 0.01)
plt.plot(th, dnorm(th, mu_post, sig_post), color="red")
plt.show()

Example Five: Linear Regression

This is our first multidimensional example. The problem is simple linear regression, so there are three parameters: $\beta_0, \beta_1, \sigma$.

import pandas as pd
height = pd.read_csv("PearsonHeightData.txt", sep = r"\s+")
print(height)
      Father   Son
0       65.0  59.8
1       63.3  63.2
2       65.0  63.3
3       65.8  62.8
4       61.1  64.3
...      ...   ...
1073    67.0  70.8
1074    71.3  68.3
1075    71.8  69.3
1076    70.7  69.3
1077    70.3  67.0

[1078 rows x 2 columns]
plt.figure(figsize = (10, 4))
plt.scatter(height["Father"], height["Son"])
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.title("Scatter plot of heights data")
plt.show()
n = height.shape[0]
X = np.column_stack([np.ones(n), height["Father"]]) #this is the X matrix (consisting of covariates and the column of ones)
y = height["Son"] #this is the y vector consisting of response values

For comparison, here is the least squares fit.

import statsmodels.api as sm
linmod = sm.OLS(y, X).fit()
print(linmod.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    Son   R-squared:                       0.251
Model:                            OLS   Adj. R-squared:                  0.250
Method:                 Least Squares   F-statistic:                     360.9
Date:                Mon, 30 Mar 2026   Prob (F-statistic):           1.27e-69
Time:                        13:56:31   Log-Likelihood:                -2489.4
No. Observations:                1078   AIC:                             4983.
Df Residuals:                    1076   BIC:                             4993.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         33.8928      1.833     18.491      0.000      30.296      37.489
x1             0.5140      0.027     18.997      0.000       0.461       0.567
==============================================================================
Omnibus:                       17.527   Durbin-Watson:                   0.765
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               30.642
Skew:                          -0.052   Prob(JB):                     2.22e-07
Kurtosis:                       3.819   Cond. No.                     1.67e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.67e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The estimate of $\sigma$ is obtained as follows.

bhat = linmod.params.values
print(bhat)
Sbhat = np.sum(linmod.resid ** 2)
m = X.shape[1] - 1 #in this case, m = 1
sighat = np.sqrt(Sbhat/(n-m-1)) #this is also called the residual standard error
print(sighat)
[33.89280054  0.51400591]
2.438134379352427
plt.figure(figsize = (10, 7))
plt.scatter(height["Father"], height["Son"])
x = height["Father"].values
plt.plot(x, linmod.fittedvalues, 'r')
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.title("Scatter plot of heights data")
plt.show()

We now define the likelihood, prior and posterior for Bayesian inference with the Metropolis algorithm. We shall parametrize the problem in terms of $\beta_0, \beta_1$ and $\log \sigma$ (instead of $\sigma$).
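Note that a flat prior on $\log \sigma$ (as used in the code below) corresponds, by the change-of-variables formula, to a density proportional to $1/\sigma$ on $\sigma$ itself:

\begin{align*} p(\sigma) = p(\log \sigma) \left| \frac{d \log \sigma}{d \sigma} \right| \propto \frac{1}{\sigma} \end{align*}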

n = len(y)
x = height["Father"].to_numpy()
y = height["Son"].to_numpy()

def loglik(th):
    beta0, beta1, logsigma = th
    sigma = np.exp(logsigma)
    return -n * np.log(sigma) - np.sum((y - beta0 - beta1 * x) ** 2) / (2 * sigma**2)
def logprior(th):
    beta0, beta1, logsigma = th
    if -200 <= beta0 <= 200 and -200 <= beta1 <= 200 and -200 <= logsigma <= 200:
        return np.log((1/400) * (1/400) * (1/400))
    return -np.inf
#Unnormalized Posterior
def logpost(th):
    return loglik(th) + logprior(th)

We are now working with the log-prior, log-likelihood and log-posterior instead of the posterior itself. This makes the computations numerically stable.
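Working on the log scale avoids floating-point underflow: with over a thousand observations, the likelihood itself underflows to zero, while log-space differences remain usable (a minimal illustration; the value -2500 is loosely based on the OLS log-likelihood reported above):

```python
import numpy as np

logpost_state = -2500.0   # a typical log-posterior value at this data scale
logpost_cand = -2499.0

# exponentiating the raw value underflows to exactly 0.0 ...
print(np.exp(logpost_state))
# ... but the log of the acceptance ratio is perfectly well-behaved:
log_ratio = logpost_cand - logpost_state
print(np.exp(log_ratio))  # e^1, about 2.718
```

This is why the Metropolis accept/reject test below compares `np.log(np.random.uniform())` against the log-ratio rather than a uniform against the raw ratio.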

The challenge now is the proposal distribution. We take it to be a product of independent normals, whose scales need to be chosen carefully: the variables $\beta_0, \beta_1, \log \sigma$ are on different scales (as can be seen from the estimates and standard errors in the linear regression summary).

nit = 400000
path = np.zeros((nit, 3))
state = np.array([0.0, 0.0, 0.0])
path[0] = state
rej = 0

for it in range(1, nit):
    cand = state + np.array([np.random.normal(scale=0.1),np.random.normal(scale=0.001),np.random.normal(scale=0.01)]) #this works well
    #cand = state + np.array([np.random.normal(scale=0.1),np.random.normal(scale=0.1),np.random.normal(scale=0.1)]) #this does not work
    logratio = logpost(cand) - logpost(state)
    if np.log(np.random.uniform()) < logratio:
        state = cand
    else:
        rej += 1
    path[it] = state

print(rej / nit)
0.47582
beta0, beta1, logsigma = path[100000:, 0], path[100000:, 1], path[100000:, 2] #we are discarding the first 100000 iterations as burn-in

print(np.mean(beta0), np.std(beta0))
print(np.mean(beta1), np.std(beta1))
print(np.mean(logsigma), np.std(logsigma))
print(np.exp(np.mean(logsigma)), np.exp(np.std(logsigma)))

print(linmod.summary())
print(sighat)
33.604022436453214 1.9236727765562034
0.5182672403018774 0.028400803552315227
0.8916462091055167 0.021533275635037
2.439141685312359 1.0217667897104907
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    Son   R-squared:                       0.251
Model:                            OLS   Adj. R-squared:                  0.250
Method:                 Least Squares   F-statistic:                     360.9
Date:                Mon, 30 Mar 2026   Prob (F-statistic):           1.27e-69
Time:                        15:04:03   Log-Likelihood:                -2489.4
No. Observations:                1078   AIC:                             4983.
Df Residuals:                    1076   BIC:                             4993.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         33.8928      1.833     18.491      0.000      30.296      37.489
x1             0.5140      0.027     18.997      0.000       0.461       0.567
==============================================================================
Omnibus:                       17.527   Durbin-Watson:                   0.765
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               30.642
Skew:                          -0.052   Prob(JB):                     2.22e-07
Kurtosis:                       3.819   Cond. No.                     1.67e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.67e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
2.438134379352427
plt.figure(figsize=(10, 8))
plt.subplot(3, 2, 1); plt.plot(beta0)
plt.subplot(3, 2, 2); plt.hist(beta0, bins=30)
plt.subplot(3, 2, 3); plt.plot(beta1)
plt.subplot(3, 2, 4); plt.hist(beta1, bins=30)
plt.subplot(3, 2, 5); plt.plot(logsigma); plt.xlabel("MC iteration")
plt.subplot(3, 2, 6); plt.hist(logsigma, bins=30)
plt.tight_layout()
plt.show()

print(np.mean(beta0), np.mean(beta1), np.mean(logsigma))
33.604022436453214 0.5182672403018774 0.8916462091055167