
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import gammaln, digamma

Lecture 13: Bayesian Bootstrap

Consider the following problem, which we studied previously (see, e.g., Lecture 4):

Suppose a scientist makes 6 numerical measurements $26.6, 38.5, 34.4, 34, 31, 23.6$ of an unknown real-valued physical quantity $\theta$. On the basis of these measurements, what can be inferred about $\theta$?

Previously we solved this problem using the Bayesian model:

\begin{align*} X_1, \dots, X_n \mid \theta, \sigma \overset{\text{i.i.d.}}{\sim} N(\theta, \sigma^2) \quad \text{and} \quad \theta, \log \sigma \overset{\text{i.i.d.}}{\sim} \text{uniform}(-\infty, \infty). \end{align*}

This gave the following posterior for $\theta$:

\begin{align*} \frac{\sqrt{n}(\theta - \hat{\theta})}{\sqrt{S(\hat{\theta})/(n-1)}} \mid \text{data} \sim t_{n-1} \end{align*}

where $S(\theta) := \sum_{i=1}^n (x_i - \theta)^2$ and $\hat{\theta} = \bar{x} = (x_1 + \dots + x_n)/n$. This led to the 95% interval $[25.598, 37.102]$.
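As a quick sanity check, this $t_{n-1}$ interval can be recomputed directly from the data; a minimal sketch (variable names are mine, the quantile comes from `scipy.stats.t`):

```python
import numpy as np
from scipy.stats import t

x = np.array([26.6, 38.5, 34.4, 34, 31, 23.6])
n = len(x)
theta_hat = x.mean()
se = np.sqrt(np.sum((x - theta_hat) ** 2) / (n - 1) / n)  #sqrt(S(theta_hat)/(n-1))/sqrt(n)
half = t.ppf(0.975, df=n - 1) * se                        #97.5% quantile of t_{n-1}
lo, hi = theta_hat - half, theta_hat + half
print(f"[{lo:.3f}, {hi:.3f}]")  #[25.598, 37.102]
```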

We now use the model:

\begin{align*} X_1, \dots, X_n \mid P \overset{\text{i.i.d.}}{\sim} P \quad \text{and} \quad \theta = \text{mean corresponding to } P. \end{align*}

We use discretization and assume that $P$ is supported on a large finite set $G = \{g_1, \dots, g_k\}$, e.g., $G = \{0.1, 0.2, \dots, 99.9, 100.0\}$. Let the probabilities assigned by $P$ to $g_1, \dots, g_k$ be $p_1, \dots, p_k$. We use the noninformative prior:

\begin{align*} (p_1, \dots, p_k) \sim \text{Dirichlet}(0, \dots, 0). \end{align*}

This leads to the posterior:

\begin{align*} P \mid X_1 = x_1, \dots, X_n = x_n \sim \sum_{i=1}^n w_i \delta_{\{x_i\}} \quad \text{where} \quad (w_1, \dots, w_n) \sim \text{Dirichlet}(1, \dots, 1). \end{align*}

The posterior of $\theta$ is therefore:

\begin{align*} \theta \mid \text{data} \sim \sum_{i=1}^n w_i x_i \quad \text{where} \quad (w_1, \dots, w_n) \sim \text{Dirichlet}(1, \dots, 1). \end{align*}
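As an aside, $\text{Dirichlet}(1, \dots, 1)$ weights have a simple constructive description: normalize $n$ i.i.d. standard exponentials (equivalently, take the gaps between sorted uniform order statistics). A minimal sketch of this fact, with variable names of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 6, 100000
E = rng.exponential(size=(M, n))        #i.i.d. Exp(1) draws
W = E / E.sum(axis=1, keepdims=True)    #each row ~ Dirichlet(1, ..., 1)
print(W.mean(axis=0))                   #each weight has mean 1/n
```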

Below we simulate from this posterior and obtain the posterior mean and a 95% credible interval.

x_obs = np.array([26.6, 38.5, 34.4, 34, 31, 23.6])
#x_obs = np.array([26.6, 38.5, 34.4, 34, 31, 23.6, 100]) #this data will reveal differences between estimation of mean and median
n = len(x_obs)
M = 100000 #number of posterior samples
np.random.seed(42) #for reproducibility
W = np.random.dirichlet(alpha = np.ones(n), size=M) #M samples from the Dirichlet distribution
theta_samples = W @ x_obs #compute the posterior samples of theta

theta_mean = np.mean(theta_samples)
theta_ci = np.quantile(theta_samples, [0.025, 0.975])

print(f"Posterior mean (Bayesian Bootstrap) of theta: {theta_mean:.2f}")
print(f"95% credible (Bayesian Bootstrap) interval: [{theta_ci[0]:.2f}, {theta_ci[1]:.2f}]")
Posterior mean (Bayesian Bootstrap) of theta: 31.35
95% credible (Bayesian Bootstrap) interval: [27.53, 34.90]

Below is a histogram of all the posterior samples.

plt.hist(theta_samples, bins=40, density=True)
plt.xlabel(r"$\theta$")
plt.ylabel("Posterior density")
plt.title("Posterior Samples of $\\theta$")
plt.show()

This method is called the Bayesian bootstrap because, operationally, it is very close to the classical bootstrap, as demonstrated below.

The usual bootstrap generates samples of the form:

\begin{align*} \sum_{i=1}^n v_i \delta_{\{x_i\}} \quad \text{where} \quad (v_1, \dots, v_n) \sim \frac{1}{n} \text{Multinomial}(n; 1/n, \dots, 1/n). \end{align*}

This leads to samples of $\theta$ (which is the mean of $P$) generated according to:

\begin{align*} \sum_{i=1}^n v_i x_i \quad \text{where} \quad (v_1, \dots, v_n) \sim \frac{1}{n} \text{Multinomial}(n; 1/n, \dots, 1/n). \end{align*}
#Bootstrap samples
V = np.random.multinomial(n=n, pvals=np.ones(n)/n, size=M)/n
theta_bootstrap = V @ x_obs
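The multinomial-weight formulation is just a vectorized way of resampling with replacement: drawing $n$ indices uniformly at random and averaging the resampled values has the same distribution as $\sum_i v_i x_i$. A small sketch of this equivalence (variable names are mine):

```python
import numpy as np

x = np.array([26.6, 38.5, 34.4, 34, 31, 23.6])
n = len(x)
rng = np.random.default_rng(0)
idx = rng.integers(0, n, size=(100000, n))   #resample n points with replacement, 100000 times
theta_resample = x[idx].mean(axis=1)         #same distribution as V @ x_obs
print(theta_resample.mean(), theta_resample.std())
```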

Below we plot the histograms of the bootstrap and Bayesian bootstrap samples.

plt.hist(theta_samples, bins=40, density=True, alpha=0.5, label="Bayesian bootstrap")
plt.hist(theta_bootstrap, bins=40, density=True, alpha=0.5, label="Classical bootstrap")
plt.legend()
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.show()

The two histograms overlap considerably, but the classical bootstrap histogram is much rougher than the Bayesian bootstrap histogram: the multinomial weights are discrete, so the bootstrap samples of $\theta$ take only finitely many values, while the Dirichlet weights vary continuously.

Below we compute the bootstrap mean and confidence intervals, and compare them to the Bayesian bootstrap mean and confidence intervals.

theta_mean_bootstrap = np.mean(theta_bootstrap)
theta_ci_bootstrap = np.quantile(theta_bootstrap, [0.025, 0.975])

print(f"Bootstrap mean of theta: {theta_mean_bootstrap:.2f}")
print(f"95% bootstrap confidence interval: [{theta_ci_bootstrap[0]:.2f}, {theta_ci_bootstrap[1]:.2f}]")

print(f"Posterior mean (Bayesian Bootstrap) of theta: {theta_mean:.2f}")
print(f"95% credible (Bayesian Bootstrap) interval: [{theta_ci[0]:.2f}, {theta_ci[1]:.2f}]")
Bootstrap mean of theta: 31.35
95% bootstrap confidence interval: [27.13, 35.15]
Posterior mean (Bayesian Bootstrap) of theta: 31.35
95% credible (Bayesian Bootstrap) interval: [27.53, 34.90]

Note that these intervals are very close to each other, but the Bayesian bootstrap interval is slightly narrower. It can be checked that:

\begin{align*} \mathbb{E}\left(\theta \mid \text{data}\right) = \mathbb{E}\left(\sum_{i=1}^n w_i x_i \mid \text{data}\right) = \bar{x} \quad \text{and} \quad \text{var}\left(\theta \mid \text{data}\right) = \text{var}\left(\sum_{i=1}^n w_i x_i \mid \text{data}\right) = \frac{1}{n(n+1)} \sum_{i=1}^n (x_i - \bar{x})^2 \end{align*}

and

\begin{align*} \mathbb{E}\left(\sum_{i=1}^n v_i x_i \mid \text{data}\right) = \bar{x} \quad \text{and} \quad \text{var}\left(\sum_{i=1}^n v_i x_i \mid \text{data}\right) = \frac{1}{n} \cdot \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2. \end{align*}

So the Bayesian bootstrap variance is very slightly smaller than the bootstrap variance, by a factor of $n/(n+1)$.
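These closed forms are easy to verify numerically; a sketch reusing the observed data (variable names are mine):

```python
import numpy as np

x = np.array([26.6, 38.5, 34.4, 34, 31, 23.6])
n = len(x)
S = np.sum((x - x.mean()) ** 2)
var_bayes = S / (n * (n + 1))    #Bayesian bootstrap posterior variance
var_boot = S / n ** 2            #classical bootstrap variance
rng = np.random.default_rng(0)
W = rng.dirichlet(np.ones(n), size=200000)
print(var_bayes, (W @ x).var())  #theoretical vs. simulated: close
print(var_bayes / var_boot)      #exactly n / (n + 1)
```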

Median Estimation

Instead of assuming that $\theta$ is the mean of $P$, we can assume that $\theta$ is the median of $P$. We would then compute the medians of the generated samples (either the Bayesian bootstrap samples or the usual bootstrap samples).

For an arbitrary probability measure $Q$, we define its median as:

\begin{align*} \inf \left\{ x : Q(-\infty, x] \geq 0.5 \right\}. \end{align*}

This definition makes sense even when $Q$ is discrete. As an example, if $Q$ is the discrete distribution taking the two values 0 and 1 with equal probabilities, then the median of $Q$ according to the above definition is 0, since $Q(-\infty, 0] = 0.5$.
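This definition is straightforward to evaluate for a discrete distribution: sort the support points, accumulate the probabilities, and take the first point where the cumulative probability reaches 0.5. A small helper illustrating this (the function name is mine):

```python
import numpy as np

def discrete_median(values, probs):
    """inf{x : Q(-inf, x] >= 0.5} for Q supported on `values` with masses `probs`."""
    order = np.argsort(values)
    v = np.asarray(values)[order]
    cdf = np.cumsum(np.asarray(probs)[order])
    return v[np.argmax(cdf >= 0.5)]  #first index where the CDF reaches 0.5

print(discrete_median([0, 1], [0.5, 0.5]))  #0, matching the example above
```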

Below we compute the median corresponding to each Bayesian bootstrap sample.

order_obs = np.argsort(x_obs)
x_sorted = x_obs[order_obs]
W_sorted = W[:, order_obs] #reorder the weights to match the sorted data
cdf = np.cumsum(W_sorted, axis=1) #CDF of each sampled P at the sorted data points
median_idx = np.argmax(cdf >= 0.5, axis=1) #first index where the CDF reaches 0.5
theta_median_samples = x_sorted[median_idx]
theta_median_mean = np.mean(theta_median_samples)
theta_median_ci = np.quantile(theta_median_samples, [0.025, 0.975])

print(f"Posterior mean (Bayesian Bootstrap) of theta (median): {theta_median_mean:.2f}")
print(f"95% credible (Bayesian Bootstrap) interval for theta (median): [{theta_median_ci[0]:.2f}, {theta_median_ci[1]:.2f}]")

#Compare to mean: 
print(f"Posterior mean (Bayesian Bootstrap) of theta (mean): {theta_mean:.2f}")
print(f"95% credible (Bayesian Bootstrap) interval for theta (mean): [{theta_ci[0]:.2f}, {theta_ci[1]:.2f}]")
Posterior mean (Bayesian Bootstrap) of theta (median): 31.78
95% credible (Bayesian Bootstrap) interval for theta (median): [23.60, 38.50]
Posterior mean (Bayesian Bootstrap) of theta (mean): 31.35
95% credible (Bayesian Bootstrap) interval for theta (mean): [27.53, 34.90]

Next we do the same thing for the usual bootstrap.

order_obs = np.argsort(x_obs)
x_sorted = x_obs[order_obs]
V_sorted = V[:, order_obs]
cdf = np.cumsum(V_sorted, axis=1)
median_idx = np.argmax(cdf >= 0.5, axis=1)
theta_median_bootstrap = x_sorted[median_idx]
theta_median_bootstrap_mean = np.mean(theta_median_bootstrap)
theta_median_bootstrap_ci = np.quantile(theta_median_bootstrap, [0.025, 0.975])

print(f"Posterior mean (Bootstrap) of theta (median): {theta_median_bootstrap_mean:.2f}")
print(f"95% credible (Bootstrap) interval for theta (median): [{theta_median_bootstrap_ci[0]:.2f}, {theta_median_bootstrap_ci[1]:.2f}]")

#Compare to mean: 
print(f"Posterior mean (Bootstrap) of theta (mean): {theta_mean_bootstrap:.2f}")
print(f"95% credible (Bootstrap) interval for theta (mean): [{theta_ci_bootstrap[0]:.2f}, {theta_ci_bootstrap[1]:.2f}]")
Posterior mean (Bootstrap) of theta (median): 30.51
95% credible (Bootstrap) interval for theta (median): [23.60, 34.40]
Posterior mean (Bootstrap) of theta (mean): 31.35
95% credible (Bootstrap) interval for theta (mean): [27.13, 35.15]

The histograms of the two median samples (Bootstrap and Bayesian Bootstrap) are plotted below.

plt.hist(theta_median_samples, bins=40, density=True, alpha=0.5, label="Bayesian bootstrap")
plt.hist(theta_median_bootstrap, bins=40, density=True, alpha=0.5, label="Classical bootstrap")
plt.legend()
plt.xlabel(r"$\theta$")
plt.ylabel("Density")
plt.show()

For more on the Bayesian bootstrap and its connection to the classical bootstrap, I highly recommend the blog post https://www.sumsar.net/blog/2015/04/the-non-parametric-bootstrap-as-a-bayesian-model/ and the original paper: Rubin (1981), "The Bayesian Bootstrap", The Annals of Statistics 9(1), 130–134.