
Consider again the wages dataset that we used in lectures (e.g., Lecture 23). This simple dataset, which comes from the introductory statistics textbook "The Statistical Sleuth" by Ramsey and Schafer, contains weekly wages in 1987 for a sample of 25437 males between the ages of 18 and 70 who worked full-time, along with their years of education, years of experience, an indicator for whether they worked near a city, and a code for the US region where they worked.

We will fit a Gaussian Process regression to this data using the squared exponential kernel (as opposed to the kernel based on Integrated Brownian Motion that we used previously).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
#Load the dataset wagedata.csv
dt = pd.read_csv("wagedata.csv")
print(dt.head(10))
print(dt.shape)
      Region   MetropolitanStatus  Exper  Educ  WeeklyEarnings
0      South  NotMetropolitanArea      8    12          859.71
1    Midwest     MetropolitanArea     30    12          786.73
2       West     MetropolitanArea     31    14         1424.50
3       West     MetropolitanArea     17    16          959.16
4       West     MetropolitanArea      6    12          154.32
5    Midwest     MetropolitanArea     24    17          282.19
6  Northeast     MetropolitanArea     14    12          444.44
7  Northeast     MetropolitanArea     28    12          451.09
8      South     MetropolitanArea     40    12          490.27
9      South     MetropolitanArea      7    16          493.83
(25437, 5)

As before, we want to fit a regression model with logarithm of weekly earnings as the response variable:

\begin{align*} y = \log \left( \text{weekly earnings} \right) \end{align*}

and years of experience as the covariate:

\begin{align*} x = \text{Exper} = \text{Years of Experience}. \end{align*}

The following is the scatter plot of $y$ against $x$.

import numpy as np
y = np.log(dt['WeeklyEarnings'])
x = dt['Exper'].to_numpy()
import matplotlib.pyplot as plt
plt.figure(figsize = (10, 6))
plt.scatter(x, y, s = 1)
plt.xlabel('Years of Experience')
plt.ylabel('Log of Weekly Earnings')
plt.title('Scatter Plot of Years of Experience vs Log(Weekly Earnings)')
plt.show()

Running the GP on the full data might be computationally expensive, so, to illustrate the main ideas without getting into computational efficiency considerations, let us work with a smaller subsample.

#Repeating the analysis on a smaller dataset: 
n = 500
random_seed = 42
dt_small = dt.sample(n = 500, random_state = random_seed)
#dt_small = dt #this is to switch back to the full dataset
y = np.log(dt_small['WeeklyEarnings'])
x = dt_small['Exper'].to_numpy()
n = len(y)

Our regression model is:

\begin{align*} y_i = f(x_i) + \epsilon_i \end{align*}

where $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. For $f$, we use a Gaussian process with mean zero and covariance kernel $K(\cdot, \cdot)$. For the covariance kernel, we use the squared exponential kernel, also known as the Radial Basis Function (RBF) kernel, which is given by:

\begin{align*} K(u, v) = \tau^2 \exp \left(-\frac{(u - v)^2}{2 \ell^2} \right). \end{align*}

This kernel has two hyperparameters, $\tau$ and $\ell$. Here $\tau$ controls the overall size (also known as the amplitude) of the fluctuations of $f$ around its mean of zero: larger values of $\tau$ allow the regression function to vary more widely. The parameter $\ell$ is called the length-scale and controls the smoothness of the sample paths: when $\ell$ is large, the function $f(x)$ changes slowly with $x$, while smaller values of $\ell$ allow more rapid local variation. Thus $\tau$ governs the overall magnitude of variation, while $\ell$ determines how quickly the function can change with the input.

The following function implements the RBF kernel.

def rbf_kernel(x1, x2, tau2, ell):
    x1 = x1[:,None]
    x2 = x2[None,:]
    return tau2 * np.exp(-(x1-x2)**2/(2*ell**2))
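As a quick sanity check (toy values, not from the dataset), the kernel matrix should be symmetric, have $\tau^2$ on the diagonal, and decay more slowly with distance as $\ell$ grows:

```python
import numpy as np

def rbf_kernel(x1, x2, tau2, ell):
    x1 = x1[:, None]
    x2 = x2[None, :]
    return tau2 * np.exp(-(x1 - x2) ** 2 / (2 * ell ** 2))

u = np.array([0.0, 1.0, 10.0])
K_short = rbf_kernel(u, u, 2.0, 1.0)    # tau^2 = 2, ell = 1
K_long = rbf_kernel(u, u, 2.0, 100.0)   # same tau^2, much larger ell

assert np.allclose(K_short, K_short.T)     # kernel matrix is symmetric
assert np.allclose(np.diag(K_short), 2.0)  # K(u, u) = tau^2
assert K_long[0, 2] > K_short[0, 2]        # larger ell keeps distant points correlated
```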

Given fixed values of $\tau, \ell, \sigma$, the posterior mean of $f(x)$ is:

\begin{align*} \widehat{f(x)} = \mathbf{k}^T (K + \sigma^2 I_n)^{-1} y \end{align*}

where $K = (K(x_i, x_j))_{n \times n}$, $\mathbf{k} = (K(x_i, x))_{n \times 1}$, and $y = (y_1, \dots, y_n)^T$. The following code computes the posterior mean of $f(x)$ on a grid of $x$ values.
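For reference (a standard GP regression fact, not derived in this notebook), the full posterior of $f(x)$ given the data is Gaussian, with a closed-form variance alongside the mean:

```latex
\begin{align*}
f(x) \mid y \sim N\left( \mathbf{k}^T (K + \sigma^2 I_n)^{-1} y,\;
K(x, x) - \mathbf{k}^T (K + \sigma^2 I_n)^{-1} \mathbf{k} \right).
\end{align*}
```

The variance term could be used to add pointwise uncertainty bands to the fitted curves, though we only compute the mean here.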

#Here are some arbitrary choices for tau^2, \ell, \sigma^2: 
tau2 = 14
ell = 70
sigma2 = 0.3 

K = rbf_kernel(x,x,tau2,ell) + sigma2*np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(min(x),max(x),200)

K_star = rbf_kernel(x_grid,x,tau2,ell)

post_mean = K_star @ Kinv @ y

plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2)
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()

The posterior mean gives a smooth fitted function. Let us now change the parameter values and plot the posterior mean of $f(x)$.

#Here are some arbitrary choices for tau^2, \ell, \sigma^2: 
tau2 = 14
ell = 7000 #this change to 7000 will make the fit much smoother, since the kernel will be much less sensitive to differences in x values.
sigma2 = 0.3 

K = rbf_kernel(x,x,tau2,ell) + sigma2*np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(min(x),max(x),200)

K_star = rbf_kernel(x_grid,x,tau2,ell)

post_mean = K_star @ Kinv @ y

plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2)
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()

Note that we are not using a parameter for the mean. In other words, our model is NOT $f(x) = \beta_0 + g(x)$ where $g$ is a GP with the RBF kernel. Because of this, if we set $\tau$ too small, the fitted function will be essentially zero, giving a poor fit to the data.

#Here are some arbitrary choices for tau^2, \ell, \sigma^2: 
tau2 = 0.00001 #for such a small value of tau^2, the fit will basically equal zero. 
ell = 70 
sigma2 = 0.3 

K = rbf_kernel(x,x,tau2,ell) + sigma2*np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(min(x),max(x),200)

K_star = rbf_kernel(x_grid,x,tau2,ell)

post_mean = K_star @ Kinv @ y

plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2)
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()

In practice, the hyperparameters need to be tuned based on the data. The simplest method is to write down the marginal likelihood of $y_1, \dots, y_n$ (marginal here means that the function $f$ is integrated out) and then maximize it over the hyperparameters. Since $(f(x_1), \dots, f(x_n))^T \sim N(0, K)$ and the noise vector is an independent $N(0, \sigma^2 I_n)$, marginally $y \sim N(0, K + \sigma^2 I_n)$, so the marginal likelihood is:

\begin{align*} f_{y \mid \sigma, \tau, \ell}(y) \propto \frac{1}{\sqrt{\det(K + \sigma^2 I_n)}} \exp \left(-\frac{1}{2} y^T (K + \sigma^2 I_n)^{-1} y \right). \end{align*}

where $y = (y_1, \dots, y_n)^T$. Note that $K$ depends on $\tau$ and $\ell$.

Below we maximize $\log f_{y \mid \sigma, \tau, \ell}(y)$ over $\sigma, \tau, \ell$. We first take a grid-based approach, which is time-consuming (and would be infeasible if we ran the method on the whole dataset of around 25K observations).

#we are using Cholesky decomposition to compute the log marginal likelihood in a more numerically stable way.
def log_marginal_likelihood(x,y,tau2,ell,sigma2):

    K = rbf_kernel(x,x,tau2,ell)
    K = K + sigma2*np.eye(len(x))

    L = np.linalg.cholesky(K)

    alpha = np.linalg.solve(L.T,
            np.linalg.solve(L,y))

    logdet = 2*np.sum(np.log(np.diag(L)))

    return -0.5*y@alpha -0.5*logdet -0.5*len(x)*np.log(2*np.pi)
tau_grid = np.logspace(-2,2,20)
ell_grid = np.logspace(-1,2,20)
sig_grid = np.logspace(-3,0,20)

best_val = -np.inf
best_params = None

for tau2 in tau_grid:
    for ell in ell_grid:
        for sigma2 in sig_grid:

            val = log_marginal_likelihood(x,y,tau2,ell,sigma2)

            if val > best_val:
                best_val = val
                best_params = (tau2,ell,sigma2)

print(best_params)
(np.float64(14.38449888287663), np.float64(69.51927961775606), np.float64(0.3359818286283781))
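As a check on the Cholesky computation (a toy example with made-up data, independent of the wages analysis), the function should agree with a direct multivariate normal log-density from `scipy.stats`:

```python
import numpy as np
from scipy.stats import multivariate_normal

def rbf_kernel(x1, x2, tau2, ell):
    x1 = x1[:, None]
    x2 = x2[None, :]
    return tau2 * np.exp(-(x1 - x2) ** 2 / (2 * ell ** 2))

def log_marginal_likelihood(x, y, tau2, ell, sigma2):
    K = rbf_kernel(x, x, tau2, ell) + sigma2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2 * np.sum(np.log(np.diag(L)))
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)

# Toy data: 20 random inputs and responses
rng = np.random.default_rng(0)
x_toy = rng.uniform(0, 10, size=20)
y_toy = rng.normal(size=20)

# Direct evaluation of the N(0, K + sigma^2 I) log-density
cov = rbf_kernel(x_toy, x_toy, 1.5, 2.0) + 0.4 * np.eye(20)
direct = multivariate_normal(mean=np.zeros(20), cov=cov).logpdf(y_toy)

# The Cholesky-based implementation matches the direct density evaluation.
assert np.isclose(log_marginal_likelihood(x_toy, y_toy, 1.5, 2.0, 0.4), direct)
```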
tau2, ell, sigma2 = best_params

K = rbf_kernel(x,x,tau2,ell) + sigma2*np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(min(x),max(x),200)

K_star = rbf_kernel(x_grid,x,tau2,ell)

post_mean = K_star @ Kinv @ y
plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2)
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()

Instead of gridding, we can use the built-in optimization routines in SciPy. This makes the code much faster.

from scipy.optimize import minimize

def rbf_kernel(x1, x2, tau2, ell):
    x1 = np.asarray(x1).reshape(-1, 1)
    x2 = np.asarray(x2).reshape(-1, 1)
    sqdist = (x1 - x2.T) ** 2
    return tau2 * np.exp(-sqdist / (2 * ell**2))

def log_marginal_likelihood(x, y, tau2, ell, sigma2):
    n = len(x)
    K = rbf_kernel(x, x, tau2, ell)
    K = K + sigma2 * np.eye(n)

    # small jitter for numerical stability
    K = K + 1e-8 * np.eye(n)

    L = np.linalg.cholesky(K)

    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2 * np.sum(np.log(np.diag(L)))

    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

def objective(log_params, x, y):
    log_tau2, log_ell, log_sigma2 = log_params

    tau2 = np.exp(log_tau2)
    ell = np.exp(log_ell)
    sigma2 = np.exp(log_sigma2)

    # minimize negative log marginal likelihood
    return -log_marginal_likelihood(x, y, tau2, ell, sigma2)

# initial guess
init = np.log([1.0, 1.0, 0.1])

# optional bounds in log-space
bounds = [
    (np.log(1e-4), np.log(1e4)),   # tau2
    (np.log(1e-3), np.log(1e3)),   # ell
    (np.log(1e-6), np.log(1e1))    # sigma2
]

result = minimize(
    objective,
    init,
    args=(x, y),
    method="L-BFGS-B",
    bounds=bounds
)

best_log_tau2, best_log_ell, best_log_sigma2 = result.x
best_params_lbfgs = (
    np.exp(best_log_tau2),
    np.exp(best_log_ell),
    np.exp(best_log_sigma2)
)

print("Best parameters:", best_params_lbfgs)
Best parameters: (np.float64(19.49622536763746), np.float64(72.1825467002777), np.float64(0.29899253407560966))
tau2, ell, sigma2 = best_params_lbfgs

K = rbf_kernel(x,x,tau2,ell) + sigma2*np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(min(x),max(x),200)

K_star = rbf_kernel(x_grid,x,tau2,ell)

post_mean_lbfgs = K_star @ Kinv @ y
plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2, label='Grid Search Fit')
plt.plot(x_grid,post_mean_lbfgs,color='blue',lw=2, label='L-BFGS-B Fit')
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.legend()
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()
print(np.column_stack([post_mean_lbfgs, post_mean]))
[[5.63709283 5.64158245]
 [5.65373462 5.65801332]
 [5.67024647 5.67431908]
 ...
 [5.98376013 5.98575582]
 [5.9711337  5.97313829]]
(output truncated: 200 rows of the two posterior means, agreeing to about two decimal places)

Clearly the two fits are very nearly the same. Let us now change the prior on $f(x)$ to:

\begin{align*} f(x) = \beta_0 + g(x) \end{align*}

where $g$ now has the aforementioned squared exponential kernel. We can treat $\beta_0$ also as a hyperparameter and estimate it by maximizing the marginal likelihood.

def rbf_kernel(x1, x2, tau2, ell):
    x1 = np.asarray(x1).reshape(-1, 1)
    x2 = np.asarray(x2).reshape(-1, 1)
    sqdist = (x1 - x2.T) ** 2
    return tau2 * np.exp(-sqdist / (2 * ell**2))

def log_marginal_likelihood(x, y, beta0, tau2, ell, sigma2):
    n = len(x)

    K = rbf_kernel(x, x, tau2, ell) + sigma2 * np.eye(n)
    K = K + 1e-8 * np.eye(n)   # jitter for numerical stability

    yc = y - beta0             # center by the mean parameter

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yc))
    logdet = 2 * np.sum(np.log(np.diag(L)))

    return -0.5 * yc @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

def objective(params, x, y):
    beta0, log_tau2, log_ell, log_sigma2 = params

    tau2 = np.exp(log_tau2)
    ell = np.exp(log_ell)
    sigma2 = np.exp(log_sigma2)

    return -log_marginal_likelihood(x, y, beta0, tau2, ell, sigma2)

# initial guess
init = np.array([
    np.mean(y),        # beta0
    np.log(1.0),       # tau2
    np.log(1.0),       # ell
    np.log(0.1)        # sigma2
])

bounds = [
    (None, None),                  # beta0 can be any real number
    (np.log(1e-4), np.log(1e4)),   # tau2
    (np.log(1e-3), np.log(1e3)),   # ell
    (np.log(1e-6), np.log(1e1))    # sigma2
]

result = minimize(
    objective,
    init,
    args=(x, y),
    method="L-BFGS-B",
    bounds=bounds
)

beta0_hat = result.x[0]
tau2_hat = np.exp(result.x[1])
ell_hat = np.exp(result.x[2])
sigma2_hat = np.exp(result.x[3])

best_params_lbfgs_beta0 = (beta0_hat, tau2_hat, ell_hat, sigma2_hat)

print("Best parameters (beta0, tau2, ell, sigma2):", best_params_lbfgs_beta0)
Best parameters (beta0, tau2, ell, sigma2): (np.float64(5.879643451687931), np.float64(0.3424475854385845), np.float64(14.032516835084214), np.float64(0.29735067195655634))

Note that the estimates of $\tau$ and $\ell$ now change quite significantly compared to before. Below we compute the posterior mean of $f(x)$ in this model: we simply subtract $\beta_0$ from $y$ to get $y - \beta_0$, compute the posterior mean as before, and then add $\beta_0$ back.

n = len(x)

beta0, tau2, ell, sigma2 = best_params_lbfgs_beta0

K = rbf_kernel(x, x, tau2, ell) + sigma2 * np.eye(n)
Kinv = np.linalg.inv(K)

x_grid = np.linspace(np.min(x), np.max(x), 200)

K_star = rbf_kernel(x_grid, x, tau2, ell)

post_mean_lbfgs_beta0 = beta0 + K_star @ Kinv @ (y - beta0)
plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
plt.plot(x_grid,post_mean,color='red',lw=2, label='Grid Search Fit')
plt.plot(x_grid,post_mean_lbfgs,color='blue',lw=2, label='L-BFGS-B Fit')
plt.plot(x_grid,post_mean_lbfgs_beta0,color='green',lw=2, label='L-BFGS-B Fit with beta0')
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.legend()
plt.title('Gaussian Process Regression with the squared exponential kernel')
plt.show()

Let us now compare this estimate with that obtained from the Integrated Brownian Motion prior that we used in Lecture 23. The value of $\gamma$ below was obtained by posterior inference on $\gamma$ in Lecture 23.

gamma = 0.0278 #this value was suggested by posterior inference over gamma (see code for Lecture 23)
X = np.column_stack((np.ones(n), x)) #usual X matrix for linear regression

def K_I(u, v):
    minuv = min(u, v)
    maxuv = max(u, v)
    return (minuv**2 / 6.0) * (3*maxuv - minuv)

K_ibm = np.array([[K_I(xi, xj) for xj in x] for xi in x])

# prediction grid (same x_grid as above)
grid_f = x_grid
kvec  = np.array([[K_I(u, xi) for xi in x] for u in grid_f])

# A_gamma
A_gamma = np.eye(n) + gamma**2 * K_ibm
A_gamma_inv = np.linalg.inv(A_gamma)

#the X^T A_gamma^-1 X and its inverse
XtAinvX = X.T @ A_gamma_inv @ X
XtAinvX_inv = np.linalg.inv(XtAinvX)

# the matrix P = A_gamma^{-1} - A_gamma^{-1} X (X^T A_gamma^{-1} X)^{-1} X^T A_gamma^{-1} (from Lecture 23)
P = A_gamma_inv - A_gamma_inv @ X @ XtAinvX_inv @ X.T @ A_gamma_inv

# GLS estimate of the linear coefficients (intercept and slope)
alpha = XtAinvX_inv  @ X.T @ A_gamma_inv @ y

# all the vectors (1, u) stacked together for all u in grid_f
H = np.column_stack((np.ones(len(grid_f)), grid_f))

# posterior mean
post_mean_ibm = H @ alpha + gamma**2 * kvec @ (P @ y)
plt.figure(figsize=(10,6))
plt.scatter(x,y, s = 15, color = 'black')
#plt.plot(x_grid,post_mean,color='red',lw=2, label='Grid Search Fit')
plt.plot(x_grid,post_mean_lbfgs,color='blue',lw=2, label='L-BFGS-B Fit (without beta0)')
plt.plot(x_grid,post_mean_lbfgs_beta0,color='green',lw=2, label='L-BFGS-B Fit with beta0')
plt.plot(x_grid,post_mean_ibm,color='orange',lw=2, label='IBM prior')
plt.xlabel('Experience')
plt.ylabel('Log Earnings')
plt.legend()
plt.title('Gaussian Process Regression')
plt.show()
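The nested list comprehensions above build the IBM kernel matrices with Python-level loops, which becomes slow as $n$ grows. A vectorized equivalent using NumPy broadcasting (a sketch, producing the same matrix as `K_I`) is:

```python
import numpy as np

def K_I(u, v):
    # scalar IBM kernel, as in the lecture code
    minuv = min(u, v)
    maxuv = max(u, v)
    return (minuv**2 / 6.0) * (3 * maxuv - minuv)

def K_I_matrix(u, v):
    # Broadcast to get pairwise min/max over all (u_i, v_j) at once.
    U = np.asarray(u, dtype=float)[:, None]
    V = np.asarray(v, dtype=float)[None, :]
    m = np.minimum(U, V)
    M = np.maximum(U, V)
    return (m**2 / 6.0) * (3 * M - m)

x_demo = np.array([0.5, 1.0, 2.5, 4.0])
K_loop = np.array([[K_I(xi, xj) for xj in x_demo] for xi in x_demo])

# The vectorized construction agrees with the double loop.
assert np.allclose(K_I_matrix(x_demo, x_demo), K_loop)
```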

These estimates all look quite similar.