We will start our discussion of Markov Chain Monte Carlo (MCMC). The simplest MCMC method is the Metropolis algorithm.
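As a preview, the core of the algorithm is a single update rule: propose a candidate near the current state and accept it with probability min(1, f(candidate)/f(state)). Here is a minimal sketch of one such update on a toy standard-normal target (the function names are illustrative, not part of the examples below):

```python
import numpy as np

def metropolis_step(f, state, scale, rng):
    """One random-walk Metropolis update targeting the (unnormalized) density f."""
    candidate = rng.normal(loc=state, scale=scale)
    if rng.uniform() < f(candidate) / f(state):
        return candidate   # accept the move
    return state           # reject: the chain stays where it is

# Toy target: unnormalized standard normal density
g = lambda th: np.exp(-0.5 * th ** 2)

rng = np.random.default_rng(0)
samples = np.empty(5000)
state = 0.0
for i in range(5000):
    state = metropolis_step(g, state, scale=1.0, rng=rng)
    samples[i] = state
print(samples.mean(), samples.std())
```

The histogram of `samples` should look approximately standard normal; the examples below use exactly this update, written out inline.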
import numpy as np
import matplotlib.pyplot as plt
Example One¶
The goal is to use the Metropolis Algorithm to approximate the following density.
def f(th):
    return (np.sin(8.5 * th) ** 2) * (th >= 0) * (th <= 1) * 2 * np.exp(th)
th = np.arange(0, 1.001, 0.001)
plt.plot(th, f(th))
plt.xlabel("theta")
plt.ylabel("Unnormalized Density")
plt.show()
The above function is not quite a density because it does not integrate to one. Its integral is given by:
from scipy.integrate import quad
val, err = quad(f, 0, 1)
print(val)
1.8775056565536072
Below is the Random-Walk Metropolis algorithm for obtaining samples approximating this density. One gets a different version of the algorithm for different methods of selecting the candidate.
nit = 400000
path = np.empty(nit)
state = 0.4 #initialization
path[0] = state
rej = 0
for i in range(1, nit):
    #candidate = np.random.uniform(state - 0.4, state + 0.4)
    #candidate = np.random.normal(loc = state, scale = 0.08)
    candidate = np.random.normal(loc = state, scale = 2)
    #candidate = np.random.uniform(0, 1) #note here the candidate does not depend on the current state
    ratio = f(candidate) / f(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio:
        rej += 1
    else:
        state = candidate
    path[i] = state
print("Rejection rate:", rej / nit)
plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
plt.plot(th, f(th)/val, color = "red")
plt.show()
Rejection rate: 0.8909575

Below is the ‘trace plot’ (i.e., the plot of the state $\theta_i$ against the iteration number $i$) for the last 2000 iterations.
#Here are the last 2000 iterations of the path:
plt.plot(path[nit - 2000:], color = "blue")
plt.xlabel("Iteration")
plt.ylabel("state")
plt.show()
Example Two: Metropolis for Bayesian Beta-Binomial inference¶
Consider the simple Beta-Binomial example. The prior is Beta(a, b) and the likelihood is $X \sim \text{Bin}(n, \theta)$. We have seen previously that the posterior is Beta(a + x, b + n - x).
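Since the model is conjugate, the exact posterior is available for comparison with the MCMC output. A quick sketch using `scipy.stats.beta` with the same a, b, x, n values as the code below:

```python
from scipy.stats import beta

a, b = 12, 120000     # prior Beta(a, b)
x, n = 100, 500000    # x successes out of n trials

# Conjugate update: posterior is Beta(a + x, b + n - x)
post = beta(a + x, b + n - x)
print(post.mean())         # posterior mean (a + x) / (a + b + n)
print(post.interval(0.95)) # central 95% posterior interval
```

These exact summaries are a useful sanity check on the histogram produced by the Metropolis sampler below.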
Here are the functions for the likelihood, prior and (unnormalized) posterior.
from scipy.special import gammaln #used to compute the log Binomial coefficient
a = 12
b = 120000
x = 100
n = 500000
def lik(th):
    if th < 0 or th > 1:
        return 0.0
    return np.exp(gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1) + x * np.log(th) + (n - x) * np.log(1 - th))
def prior(th):
    if th < 0 or th > 1:
        return 0.0
    return np.exp((a - 1) * np.log(th) + (b - 1) * np.log(1 - th))
def posterior(th):
    return lik(th) * prior(th)
from scipy.integrate import quad
val, err = quad(posterior, 0, 20*(x/n))
print(val, err)
5.907639805069782e-58 1.0821820585301677e-57
nit = 400000
path = np.empty(nit)
state = x/n #initialization
path[0] = state
rej = 0
for i in range(1, nit):
    candidate = np.random.normal(loc = state, scale = 1e-5) #scale 1e-5 works
    #candidate = np.random.normal(loc = state, scale = 1e-2) #does not work (poor mixing)
    ratio = posterior(candidate) / posterior(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio:
        rej += 1
    else:
        state = candidate
    path[i] = state
print("Rejection rate:", rej / nit)
plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
#plotting the Beta(a + x, b + n - x) density for comparison
from scipy.stats import beta
th = np.sort(path)
plt.plot(th, beta.pdf(th, a + x, b + n - x), color = "red")
plt.show()
Rejection rate: 0.1828175

#Here are the last 2000 iterations of the path:
plt.plot(path[nit - 2000:], color = "blue")
plt.xlabel("Iteration")
plt.ylabel("state")
plt.show()
When the scale of the proposal distribution is changed from 1e-5 to 1e-2, the algorithm has very poor mixing.
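One way to quantify mixing is the lag-1 autocorrelation of the chain: a tiny proposal scale gives near-certain acceptance but highly correlated successive states, while a badly oversized scale gives mostly rejections (so the chain repeats itself). A self-contained sketch on a toy standard-normal target (not the posterior above):

```python
import numpy as np

def run_chain(scale, nit=20000, seed=0):
    # Random-walk Metropolis targeting an unnormalized standard normal
    rng = np.random.default_rng(seed)
    f = lambda th: np.exp(-0.5 * th ** 2)
    path = np.empty(nit)
    state = 0.0
    for i in range(nit):
        cand = rng.normal(state, scale)
        if rng.uniform() < f(cand) / f(state):
            state = cand
        path[i] = state
    return path

def lag1_autocorr(path):
    x = path - path.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

a_ok = lag1_autocorr(run_chain(scale=2.0))     # well-scaled proposal
a_slow = lag1_autocorr(run_chain(scale=0.01))  # tiny steps: poor mixing
print(a_ok, a_slow)
```

The tiny-scale chain has lag-1 autocorrelation essentially equal to 1, which is exactly the "poor mixing" visible in the trace plots.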
Example Three: Binomial inference with non-conjugate prior¶
Now consider the problem of estimating $\theta$ from $X \sim \text{Bin}(n, \theta)$ with the prior $\log\big(\theta/(1-\theta)\big) \sim N(\mu, \sigma^2)$. The prior density for $\theta$ is given by:
$$p(\theta) = \frac{1}{\sigma \sqrt{2\pi}\,\theta(1-\theta)} \exp\left(-\frac{\big(\log(\theta/(1-\theta)) - \mu\big)^2}{2\sigma^2}\right), \qquad 0 < \theta < 1.$$
Here is the code for the prior and posterior.
#Now we will use the prior log(theta/(1 - theta)) ~ N(mu, sig^2) instead of the Beta(a, b) prior.
mu = -9
sig = 0.3
print(mu, sig) #these values are chosen to make the prior density look similar to the Beta(12, 120000) prior density
def prior(th):
    if th <= 0 or th >= 1:
        return 0.0
    z = np.log(th / (1 - th))
    return np.exp(-0.5 * ((z - mu) / sig)**2) / (sig * np.sqrt(2 * np.pi) * th * (1 - th))
def posterior(th):
    return prior(th) * lik(th)
-9 0.3
Below is the Metropolis code.
nit = 400000
path = np.empty(nit)
state = x/n #initialization
path[0] = state
rej = 0
for i in range(1, nit):
    candidate = np.random.normal(loc = state, scale = 1e-5) #scale 1e-5 works
    #candidate = np.random.normal(loc = state, scale = 1e-2) #does not work (poor mixing)
    ratio = posterior(candidate) / posterior(state)
    rnum = np.random.uniform(0, 1)
    if rnum >= ratio:
        rej += 1
    else:
        state = candidate
    path[i] = state
print("Rejection rate:", rej / nit)
plt.hist(path, bins = 1000, density = True)
plt.xlabel("theta")
plt.ylabel("Density")
#plotting the (numerically normalized) posterior density for comparison
th = np.sort(path)
post_unnorm = np.array([posterior(t) for t in th])
plt.plot(th, post_unnorm / np.trapezoid(post_unnorm, th), color="red")
plt.show()
Rejection rate: 0.1685225

Again the mixing is good when the scale of the Gaussian proposal distribution is 1e-5, and poor when it is 1e-2.
Example Four: Normal Likelihood and prior¶
The following is a simple example with prior given by $\theta \sim N(\mu_0, \sigma_0^2)$ and likelihood given by $X \mid \theta \sim N(\theta, \sigma^2)$. We are considering posterior inference with $X = 130$ (observation), $\sigma = 5$, $\mu_0 = 120$, and $\sigma_0 = 10$.
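For reference, this Normal-Normal model is conjugate, so the posterior is available in closed form as a precision-weighted average of the data and the prior. Writing $\mu_0 = 120$ and $\sigma_0 = 10$ for the prior mean and standard deviation,

$$\theta \mid X \;\sim\; N\!\left(\frac{X/\sigma^2 + \mu_0/\sigma_0^2}{1/\sigma^2 + 1/\sigma_0^2},\; \Big(\frac{1}{\sigma^2} + \frac{1}{\sigma_0^2}\Big)^{-1}\right).$$

With $X = 130$ and $\sigma = 5$, this gives posterior mean $128$ and posterior variance $20$, which is what the Metropolis histogram should recover.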
X, sig = 130, 5
mu_prior, sig_prior = 120, 10
def dnorm(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s)**2) / (s * np.sqrt(2 * np.pi))
def lik(th):
    return dnorm(X, th, sig)
def prior(th):
    return dnorm(th, mu_prior, sig_prior)
def post(th):
    return prior(th) * lik(th)
The Metropolis algorithm is coded below.
nit = 400000
path = np.empty(nit)
state = 130.0
path[0] = state
rej = 0
for it in range(1, nit):
    cand = np.random.normal(loc = state, scale = 3)
    ratio = post(cand) / post(state)
    u = np.random.uniform()
    if u < ratio:
        state = cand
    else:
        rej += 1
    path[it] = state
print(rej / nit)
plt.hist(path, bins=1000, density=True)
#True posterior:
mu_post = ((X / sig**2) + (mu_prior / sig_prior**2)) / ((1 / sig**2) + (1 / sig_prior**2))
sig_post = np.sqrt(1 / ((1 / sig**2) + (1 / sig_prior**2)))
th = np.arange(path.min(), path.max(), 0.01)
plt.plot(th, dnorm(th, mu_post, sig_post), color="red")
plt.show()
Example Five: Linear Regression¶
This is our first multidimensional example. The problem is simple linear regression, so there are three parameters: $\beta_0$, $\beta_1$, and $\sigma$.
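Concretely, writing $x_i$ for the $i$-th father's height and $y_i$ for the corresponding son's height, the model is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, \dots, n.$$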
import pandas as pd
height = pd.read_csv("PearsonHeightData.txt", sep = r"\s+")
print(height)
      Father   Son
0 65.0 59.8
1 63.3 63.2
2 65.0 63.3
3 65.8 62.8
4 61.1 64.3
... ... ...
1073 67.0 70.8
1074 71.3 68.3
1075 71.8 69.3
1076 70.7 69.3
1077 70.3 67.0
[1078 rows x 2 columns]
plt.figure(figsize = (10, 4))
plt.scatter(height["Father"], height["Son"])
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.title("Scatter plot of heights data")
plt.show()
n = height.shape[0]
X = np.column_stack([np.ones(n), height["Father"]]) #this is the X matrix (consisting of covariates and the column of ones)
y = height["Son"] #this is the y vector consisting of response values
For comparison, here is the least squares fit.
import statsmodels.api as sm
linmod = sm.OLS(y, X).fit()
print(linmod.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Son R-squared: 0.251
Model: OLS Adj. R-squared: 0.250
Method: Least Squares F-statistic: 360.9
Date: Mon, 30 Mar 2026 Prob (F-statistic): 1.27e-69
Time: 13:56:31 Log-Likelihood: -2489.4
No. Observations: 1078 AIC: 4983.
Df Residuals: 1076 BIC: 4993.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 33.8928 1.833 18.491 0.000 30.296 37.489
x1 0.5140 0.027 18.997 0.000 0.461 0.567
==============================================================================
Omnibus: 17.527 Durbin-Watson: 0.765
Prob(Omnibus): 0.000 Jarque-Bera (JB): 30.642
Skew: -0.052 Prob(JB): 2.22e-07
Kurtosis: 3.819 Cond. No. 1.67e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.67e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The estimate of $\sigma$ is obtained as follows.
bhat = linmod.params.values
print(bhat)
Sbhat = np.sum(linmod.resid ** 2)
m = X.shape[1] - 1 #in this case, m = 1
sighat = np.sqrt(Sbhat/(n-m-1)) #this is also called the residual standard error
print(sighat)
[33.89280054 0.51400591]
2.438134379352427
plt.figure(figsize = (10, 7))
plt.scatter(height["Father"], height["Son"])
x = height["Father"].values
plt.plot(x, linmod.fittedvalues, 'r')
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.title("Scatter plot of heights data")
plt.show()
We now define the likelihood, prior and posterior for Bayesian inference with the Metropolis algorithm. We shall parametrize the problem in terms of $\beta_0$, $\beta_1$ and $\log\sigma$ (instead of $\sigma$).
n = len(y)
x = height["Father"].to_numpy()
y = height["Son"].to_numpy()
def loglik(th):
    beta0, beta1, logsigma = th
    sigma = np.exp(logsigma)
    return -n * np.log(sigma) - np.sum((y - beta0 - beta1 * x) ** 2) / (2 * sigma**2)
def logprior(th):
    beta0, beta1, logsigma = th
    if -200 <= beta0 <= 200 and -200 <= beta1 <= 200 and -200 <= logsigma <= 200:
        return np.log((1/400) * (1/400) * (1/400))
    return -np.inf
#Unnormalized Posterior
def logpost(th):
    return loglik(th) + logprior(th)
We are now working with the log-prior, log-likelihood and log-posterior instead of the posterior itself. This makes the computations numerically stable.
The challenge now lies in the proposal distribution. We take it to be a product of independent normals, but the scales of those normals need to be chosen carefully: the three variables are on quite different scales (as can be seen from the estimates and standard errors in the linear regression summary).
nit = 400000
path = np.zeros((nit, 3))
state = np.array([0.0, 0.0, 0.0])
path[0] = state
rej = 0
for it in range(1, nit):
    cand = state + np.array([np.random.normal(scale=0.1), np.random.normal(scale=0.001), np.random.normal(scale=0.01)]) #this works well
    #cand = state + np.array([np.random.normal(scale=0.1), np.random.normal(scale=0.1), np.random.normal(scale=0.1)]) #this does not work
    logratio = logpost(cand) - logpost(state)
    if np.log(np.random.uniform()) < logratio:
        state = cand
    else:
        rej += 1
    path[it] = state
print(rej / nit)
0.47582
beta0, beta1, logsigma = path[100000:, 0], path[100000:, 1], path[100000:, 2] #we are discarding the first 100000 iterations as burn-in
print(np.mean(beta0), np.std(beta0))
print(np.mean(beta1), np.std(beta1))
print(np.mean(logsigma), np.std(logsigma))
print(np.exp(np.mean(logsigma)), np.exp(np.std(logsigma)))
print(linmod.summary())
print(sighat)
33.604022436453214 1.9236727765562034
0.5182672403018774 0.028400803552315227
0.8916462091055167 0.021533275635037
2.439141685312359 1.0217667897104907
OLS Regression Results
==============================================================================
Dep. Variable: Son R-squared: 0.251
Model: OLS Adj. R-squared: 0.250
Method: Least Squares F-statistic: 360.9
Date: Mon, 30 Mar 2026 Prob (F-statistic): 1.27e-69
Time: 15:04:03 Log-Likelihood: -2489.4
No. Observations: 1078 AIC: 4983.
Df Residuals: 1076 BIC: 4993.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 33.8928 1.833 18.491 0.000 30.296 37.489
x1 0.5140 0.027 18.997 0.000 0.461 0.567
==============================================================================
Omnibus: 17.527 Durbin-Watson: 0.765
Prob(Omnibus): 0.000 Jarque-Bera (JB): 30.642
Skew: -0.052 Prob(JB): 2.22e-07
Kurtosis: 3.819 Cond. No. 1.67e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.67e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
2.438134379352427
plt.figure(figsize=(10, 8))
plt.subplot(3, 2, 1); plt.plot(beta0)
plt.subplot(3, 2, 2); plt.hist(beta0, bins=30)
plt.subplot(3, 2, 3); plt.plot(beta1)
plt.subplot(3, 2, 4); plt.hist(beta1, bins=30)
plt.subplot(3, 2, 5); plt.plot(logsigma); plt.xlabel("MC iteration")
plt.subplot(3, 2, 6); plt.hist(logsigma, bins=30)
plt.tight_layout()
plt.show()
print(np.mean(beta0), np.mean(beta1), np.mean(logsigma))
33.604022436453214 0.5182672403018774 0.8916462091055167
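Beyond posterior means and standard deviations, credible intervals can be read directly off the chain with np.percentile. A sketch on a synthetic stand-in chain (normal draws with roughly the mean and spread of the beta1 samples above, since the actual path array is not reproducible here):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for post-burn-in beta1 draws (assumed shape; not the real chain)
chain = rng.normal(loc=0.518, scale=0.028, size=300000)

# Equal-tailed 95% credible interval: the 2.5th and 97.5th percentiles
lo, hi = np.percentile(chain, [2.5, 97.5])
print(lo, hi)
```

Applied to the real `beta1` draws, this interval can be compared directly with the frequentist 95% confidence interval [0.461, 0.567] in the OLS summary.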