We shall look at how the Gibbs sampler works in some simple examples.
import numpy as np
import matplotlib.pyplot as plt
Example One: Bivariate Normal¶
Below we use the Gibbs sampler to generate samples from the bivariate normal distribution with mean $(y_1, y_2)$, variances equal to 1 and correlation equal to $\rho$. The main message here is that the Gibbs sampler works well unless $\rho$ is very close to one.
y1, y2 = 0, 0 #mean vector
rho = 0.8
#rho = 0.9999 #this is the correlation between the two variables
state = np.array([0.0, 0.0], dtype=float) #this is the initial state
nit = 100000 #number of iterations for the Gibbs sampler
path = np.zeros((nit, 2))
path[0] = state
sd = np.sqrt(1 - rho**2) #this is the standard deviation of the conditional distribution
for t in range(1, nit):
    j = np.random.randint(2) #pick one of the two coordinates at random
    if j == 0:
        path[t, 0] = y1 + rho * (state[1] - y2) + sd * np.random.randn()
        path[t, 1] = state[1]
    else:
        path[t, 0] = state[0]
        path[t, 1] = y2 + rho * (state[0] - y1) + sd * np.random.randn()
    state = path[t]
print(path[:8]) #first 8 samples from the Gibbs sampler
print(path[-8:]) #last 8 samples from the Gibbs sampler
[[ 0.          0.        ]
[ 0. 0.32019321]
[ 0. 0.1621067 ]
[-0.53972888 0.1621067 ]
[ 0.133507 0.1621067 ]
[-0.85725147 0.1621067 ]
[-0.85725147 -0.64325011]
[-0.85725147 -0.5493405 ]]
[[ 0.20994665 0.63532072]
[-0.72248697 0.63532072]
[-0.72248697 0.24007978]
[-1.07082005 0.24007978]
[-0.28484312 0.24007978]
[-0.44242363 0.24007978]
[-0.44242363 0.54204849]
[ 0.22792695 0.54204849]]
plt.figure(figsize=(6, 6))
plt.scatter(path[:, 0], path[:, 1], alpha=0.5)
plt.title("Scatter plot of Gibbs samples")
plt.xlabel("X1")
plt.ylabel("X2")
plt.axis("equal")
plt.show()
plt.figure(figsize=(6, 6))
nplot = 60
plt.plot(path[:nplot, 0], path[:nplot, 1], "-o", lw=0.8) # path of the chain
plt.plot(path[0, 0], path[0, 1], "o", label="start") # starting point
plt.title("Path of the Gibbs sampler")
plt.xlabel("X1")
plt.ylabel("X2")
plt.axis("equal")
plt.legend()
plt.show()
Below we plot the trace plots for each of the two variables $X_1$ and $X_2$. The trace plot for $X_1$ is the plot of $X_1^{(t)}$ versus $t$, and the trace plot for $X_2$ is the plot of $X_2^{(t)}$ versus $t$. One uses these to check whether the Markov chain has mixed properly.
#Trace plots (xx and dnorm define the N(0,1) density, used later for the histograms):
xx = np.linspace(-4, 4, 400)
dnorm = (1 / np.sqrt(2 * np.pi)) * np.exp(-xx**2 / 2)
plt.figure(figsize=(12, 8))
plt.subplot(2, 1, 1)
plt.plot(path[:2000, 0], lw=0.7)
plt.title("Trace plot for X1")
plt.xlabel("Iteration")
plt.ylabel("X1")
plt.subplot(2, 1, 2)
plt.plot(path[:2000, 1], lw=0.7)
plt.title("Trace plot for X2")
plt.xlabel("Iteration")
plt.ylabel("X2")
plt.tight_layout()
plt.show()
Below we plot the histograms of the samples (separately for $X_1$ and $X_2$) and check how close they are to the true densities.
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.hist(path[:, 0], bins=90, density=True, alpha=0.6)
plt.plot(xx, dnorm, lw=2)
plt.title("Histogram of X1 vs N(0,1)")
plt.xlabel("X1")
plt.ylabel("Density")
plt.subplot(1, 2, 2)
plt.hist(path[:, 1], bins=90, density=True, alpha=0.6)
plt.plot(xx, dnorm, lw=2)
plt.title("Histogram of X2 vs N(0,1)")
plt.xlabel("X2")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
Now go back and change $\rho$ to a value close to 1 (say $\rho = 0.9999$); then the marginal histograms will be quite far from the standard normal density. This shows that the Gibbs sampler has trouble when the variables are highly correlated.
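The mixing problem can also be quantified with the lag-one autocorrelation of the chain: the closer it is to one, the more slowly the sampler explores the target. Below is a self-contained sketch re-running the sampler above at two values of $\rho$ (the function names here are mine, not from the code above):

```python
import numpy as np

def gibbs_bvn(rho, nit, seed=0):
    """Random-scan Gibbs sampler for the standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho**2)
    x = np.zeros((nit, 2))
    for t in range(1, nit):
        x[t] = x[t - 1]
        j = rng.integers(2)  # coordinate to update
        # full conditional: X_j | X_other ~ N(rho * X_other, 1 - rho^2)
        x[t, j] = rho * x[t, 1 - j] + sd * rng.standard_normal()
    return x

def lag1_autocorr(v):
    """Sample lag-one autocorrelation of a sequence."""
    v = v - v.mean()
    return np.dot(v[:-1], v[1:]) / np.dot(v, v)

acs = {}
for rho in (0.8, 0.999):
    chain = gibbs_bvn(rho, 20000)
    acs[rho] = lag1_autocorr(chain[:, 0])
    print(f"rho = {rho}: lag-1 autocorrelation of X1 = {acs[rho]:.3f}")
```

At $\rho = 0.8$ the autocorrelation is moderate, while at $\rho = 0.999$ it is nearly one, which is exactly the slow mixing visible in the trace plots.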
Example Two: Linear Regression¶
The second example is that of linear regression. The model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ with $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, with the prior on $(\beta_0, \beta_1, \log \sigma)$ being uniform on $\mathbb{R}^3$.
For illustration, we use the heights dataset.
import pandas as pd
height = pd.read_csv("PearsonHeightData.txt", sep = r"\s+")
print(height)
      Father   Son
0 65.0 59.8
1 63.3 63.2
2 65.0 63.3
3 65.8 62.8
4 61.1 64.3
... ... ...
1073 67.0 70.8
1074 71.3 68.3
1075 71.8 69.3
1076 70.7 69.3
1077 70.3 67.0
[1078 rows x 2 columns]
plt.figure(figsize = (10, 4))
plt.scatter(height["Father"], height["Son"])
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.title("Scatter plot of heights data")
plt.show()
n = height.shape[0]
X = np.column_stack([np.ones(n), height["Father"]]) #this is the X matrix (consisting of covariates and the column of ones)
y = height["Son"] #this is the y vector consisting of response values
The usual method of linear regression simply uses the OLS function in statsmodels.
import statsmodels.api as sm
linmod = sm.OLS(y, X).fit()
print(linmod.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Son R-squared: 0.251
Model: OLS Adj. R-squared: 0.250
Method: Least Squares F-statistic: 360.9
Date: Sat, 04 Apr 2026 Prob (F-statistic): 1.27e-69
Time: 00:45:11 Log-Likelihood: -2489.4
No. Observations: 1078 AIC: 4983.
Df Residuals: 1076 BIC: 4993.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 33.8928 1.833 18.491 0.000 30.296 37.489
x1 0.5140 0.027 18.997 0.000 0.461 0.567
==============================================================================
Omnibus: 17.527 Durbin-Watson: 0.765
Prob(Omnibus): 0.000 Jarque-Bera (JB): 30.642
Skew: -0.052 Prob(JB): 2.22e-07
Kurtosis: 3.819 Cond. No. 1.67e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.67e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The estimates of $\beta$ and $\sigma$ are obtained as follows.
bhat = linmod.params.values
print(bhat)
Sbhat = np.sum(linmod.resid ** 2)
m = X.shape[1] - 1 #in this case, m = 1
sighat = np.sqrt(Sbhat/(n-m-1)) #this is also called the residual standard error
print(sighat)
[33.89280054  0.51400591]
2.438134379352427
The Gibbs sampler for linear regression is implemented below. It is based on the following full conditionals:
$$(\beta_0, \beta_1) \mid \sigma, \text{data} \sim N\big(\hat\beta, \; \sigma^2 (X^\top X)^{-1}\big),$$
where $\hat\beta$ is the least squares estimate, and
$$\frac{S(\beta)}{\sigma^2} \;\Big|\; \beta, \text{data} \sim \chi^2_n, \qquad S(\beta) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.$$
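These full conditionals follow directly from the joint posterior; a brief derivation sketch, under the flat prior on $(\beta_0, \beta_1, \log\sigma)$ in the model statement:

```latex
% Flat prior on (beta, log sigma) means p(beta, sigma) is proportional to 1/sigma,
% so the joint posterior is
p(\beta, \sigma \mid y) \;\propto\; \sigma^{-n-1}
  \exp\!\Big(-\frac{S(\beta)}{2\sigma^2}\Big),
\qquad S(\beta) = \lVert y - X\beta \rVert^2 .
% Fix sigma and expand S(beta) around the least squares estimate:
S(\beta) = S(\hat\beta) + (\beta - \hat\beta)^\top X^\top X \,(\beta - \hat\beta)
\;\;\Longrightarrow\;\;
\beta \mid \sigma, y \;\sim\; N\big(\hat\beta,\; \sigma^2 (X^\top X)^{-1}\big).
% Fix beta and substitute u = S(beta)/sigma^2 (so d\sigma \propto u^{-3/2}\,du):
p(u \mid \beta, y) \;\propto\; u^{n/2 - 1} e^{-u/2}
\;\;\Longrightarrow\;\;
\frac{S(\beta)}{\sigma^2} \,\Big|\, \beta, y \;\sim\; \chi^2_n .
```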
So the Gibbs sampler algorithm is:
Initialize at arbitrary $(\beta_0^{(0)}, \beta_1^{(0)})$ and $\sigma^{(0)}$.
Repeat the following for $t = 1, 2, \dots$. Pick $j$ to be either 1 or 2 with equal probability. If $j = 1$, take $(\beta_0^{(t)}, \beta_1^{(t)}) \sim N\big(\hat\beta, (\sigma^{(t-1)})^2 (X^\top X)^{-1}\big)$ and $\sigma^{(t)} = \sigma^{(t-1)}$. If $j = 2$, take $(\beta_0^{(t)}, \beta_1^{(t)}) = (\beta_0^{(t-1)}, \beta_1^{(t-1)})$, generate an observation $g$ from the Gamma distribution with shape parameter $n/2$ and scale parameter $2/S(\beta^{(t)})$ (the residual sum of squares at the current $\beta$), and then take $\sigma^{(t)} = g^{-1/2}$.
This algorithm is implemented below.
MM = np.linalg.inv(X.T @ X)
state = np.array([-30.0, 15.0, 250.0]) # bad initial value; still works
nit = 100000
path = np.zeros((nit, 3))
path[0] = state
for t in range(1, nit):
    j = np.random.randint(2)
    if j == 0:
        Cov_mat = (state[2] ** 2) * MM
        path[t, :2] = np.random.multivariate_normal(mean=bhat, cov=Cov_mat)
        path[t, 2] = state[2]
    else:
        path[t, :2] = state[:2]
        rss = np.sum((y - state[0] - state[1] * height["Father"]) ** 2)
        gg = np.random.gamma(shape=n / 2, scale=2 / rss)
        path[t, 2] = gg ** (-0.5)
    state = path[t]
    if (t + 1) % 2000 == 0:
        print(t + 1)
2000
4000
⋮
100000
Below we plot the histograms of the posterior samples (after removing a burn-in of 500) along with the actual marginal posterior densities, which are t-densities with $n-2$ degrees of freedom for $\beta_0$ and $\beta_1$, and $\chi^2_{n-2}$ for $S(\hat\beta)/\sigma^2$, where $S(\hat\beta)$ denotes the residual sum of squares.
from scipy.stats import t, chi2, norm
df = n - 2
path2 = path[500:, :]
z_samp = Sbhat / (path2[:, 2] ** 2)
scale_beta0 = sighat * np.sqrt(MM[0, 0])
scale_beta1 = sighat * np.sqrt(MM[1, 1])
x0 = np.linspace(path2[:, 0].min(), path2[:, 0].max(), 400)
x1 = np.linspace(path2[:, 1].min(), path2[:, 1].max(), 400)
xz = np.linspace(max(1e-6, z_samp.min()), z_samp.max(), 400)
plt.figure(figsize=(15, 4))
plt.subplot(1, 3, 1)
plt.hist(path2[:, 0], bins=60, density=True, alpha=0.6)
plt.plot(x0, t.pdf(x0, df=df, loc=bhat[0], scale=scale_beta0), lw=2, label="t")
plt.plot(x0, norm.pdf(x0, loc=bhat[0], scale=scale_beta0), lw=2, ls="--", label="normal")
plt.title(r"Posterior of $\beta_0$")
plt.xlabel(r"$\beta_0$")
plt.ylabel("Density")
plt.legend()
plt.subplot(1, 3, 2)
plt.hist(path2[:, 1], bins=60, density=True, alpha=0.6)
plt.plot(x1, t.pdf(x1, df=df, loc=bhat[1], scale=scale_beta1), lw=2, label="t")
plt.plot(x1, norm.pdf(x1, loc=bhat[1], scale=scale_beta1), lw=2, ls="--", label="normal")
plt.title(r"Posterior of $\beta_1$")
plt.xlabel(r"$\beta_1$")
plt.ylabel("Density")
plt.legend()
plt.subplot(1, 3, 3)
plt.hist(z_samp, bins=60, density=True, alpha=0.6)
plt.plot(xz, chi2.pdf(xz, df=df), lw=2)
plt.title(r"Posterior of RSS/$\sigma^2$")
plt.xlabel(r"RSS/$\sigma^2$")
plt.ylabel("Density")
plt.tight_layout()
plt.show()
From the above plots, it is clear that the Gibbs sampler is working well.
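This can also be checked numerically. Below is a self-contained sketch on synthetic heights-like data, using a systematic-scan variant of the sampler above (the simulated dataset and all names here are mine), confirming that the posterior mean of the coefficients essentially recovers the least squares estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(68, 2, size=n)                # synthetic "father" heights
y = 30 + 0.5 * x + rng.normal(0, 2, size=n)  # synthetic "son" heights, sigma = 2
X = np.column_stack([np.ones(n), x])
MM = np.linalg.inv(X.T @ X)
bhat = MM @ X.T @ y                          # least squares estimate

nit = 20000
beta, sigma = np.zeros(2), 10.0              # deliberately bad starting value for sigma
draws = np.zeros((nit, 3))
for t in range(nit):
    # systematic scan: draw beta | sigma, then sigma | beta
    beta = rng.multivariate_normal(bhat, sigma**2 * MM)
    rss = np.sum((y - X @ beta) ** 2)
    sigma = rng.gamma(shape=n / 2, scale=2 / rss) ** -0.5
    draws[t] = [beta[0], beta[1], sigma]

post_mean = draws[500:].mean(axis=0)         # discard burn-in
print("least squares:", bhat)
print("Gibbs posterior mean:", post_mean[:2], " sigma:", post_mean[2])
```

With a flat prior the posterior is centred at the least squares fit, so a large discrepancy here would signal a bug or inadequate mixing.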
Example Three: Probit Regression¶
Consider the frogs dataset that we previously used for logistic regression. Below we fit the probit model to this data using a standard software function. We fit both the probit model and the logistic regression model. Note that the two fits lead to similar conclusions, with the logistic coefficients roughly 1.6 to 1.8 times the corresponding probit coefficients, reflecting the different scales of the logistic and standard normal distributions.
frogs = pd.read_csv("frogs.csv")
# Design matrix
X = frogs[['altitude','distance','NoOfPools','NoOfSites','avrain','meanmin','meanmax']].copy()
X['distance'] = np.log(X['distance'])
X['NoOfPools'] = np.log(X['NoOfPools'])
X = sm.add_constant(X)
y = frogs['pres.abs'].values
Xm = X.values
n, p = Xm.shape
linmod_cov = np.linalg.inv(Xm.T @ Xm)
# GLM (probit + logit)
print(sm.GLM(y, Xm, family=sm.families.Binomial(link=sm.families.links.Probit())).fit().summary())
print(sm.GLM(y, Xm, family=sm.families.Binomial()).fit().summary())
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 212
Model: GLM Df Residuals: 204
Model Family: Binomial Df Model: 7
Link Function: Probit Scale: 1.0000
Method: IRLS Log-Likelihood: -99.536
Date: Sat, 04 Apr 2026 Deviance: 199.07
Time: 00:52:50 Pearson chi2: 229.
No. Iterations: 6 Pseudo R-squ. (CS): 0.3173
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 40.9617 73.193 0.560 0.576 -102.493 184.417
x1 -0.0090 0.021 -0.426 0.670 -0.050 0.032
x2 -0.4170 0.144 -2.896 0.004 -0.699 -0.135
x3 0.3396 0.122 2.794 0.005 0.101 0.578
x4 0.0044 0.062 0.071 0.944 -0.117 0.126
x5 -0.0102 0.034 -0.299 0.765 -0.077 0.057
x6 2.9654 0.863 3.437 0.001 1.274 4.656
x7 -2.4367 2.700 -0.902 0.367 -7.729 2.856
==============================================================================
Generalized Linear Model Regression Results
==============================================================================
Dep. Variable: y No. Observations: 212
Model: GLM Df Residuals: 204
Model Family: Binomial Df Model: 7
Link Function: Logit Scale: 1.0000
Method: IRLS Log-Likelihood: -98.812
Date: Sat, 04 Apr 2026 Deviance: 197.62
Time: 00:52:50 Pearson chi2: 234.
No. Iterations: 6 Pseudo R-squ. (CS): 0.3219
Covariance Type: nonrobust
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 40.8988 132.663 0.308 0.758 -219.116 300.914
x1 -0.0066 0.039 -0.172 0.863 -0.082 0.069
x2 -0.7593 0.255 -2.973 0.003 -1.260 -0.259
x3 0.5727 0.216 2.649 0.008 0.149 0.997
x4 -0.0009 0.107 -0.008 0.993 -0.211 0.210
x5 -0.0068 0.060 -0.113 0.910 -0.124 0.111
x6 5.3048 1.543 3.439 0.001 2.281 8.328
x7 -3.1730 4.840 -0.656 0.512 -12.659 6.313
==============================================================================
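The systematic gap between the two sets of coefficients can be illustrated on synthetic data. The following is a sketch fitting both links by direct maximum likelihood rather than with statsmodels (the simulated data and all names here are mine): the logistic slope comes out roughly 1.6 to 1.8 times the probit slope, matching the ratios seen in the two summaries above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = rng.binomial(1, norm.cdf(-0.3 + 0.8 * x))  # data from a true probit model
X = np.column_stack([np.ones(n), x])

def negloglik(beta, cdf):
    """Negative log-likelihood of a binary regression with inverse link cdf."""
    q = np.clip(cdf(X @ beta), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    return -np.sum(y * np.log(q) + (1 - y) * np.log(1 - q))

b_probit = minimize(negloglik, np.zeros(2), args=(norm.cdf,)).x
b_logit = minimize(negloglik, np.zeros(2), args=(expit,)).x
ratio = b_logit[1] / b_probit[1]
print("probit slope:", b_probit[1], " logit slope:", b_logit[1], " ratio:", ratio)
```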
In the next lecture, we shall see how to perform Bayesian inference in this model (with the uniform prior on $\beta$) using the Gibbs sampler.