
STAT 238 - Bayesian Statistics Lecture Twenty Two

Spring 2026, UC Berkeley

Gaussian Processes

The goal is to infer an unknown function $f : \Omega \rightarrow \R$ that is defined on a known domain $\Omega \subseteq \R^p$. We assume that $\{f(x), x \in \Omega\}$ forms a Gaussian process with mean zero and covariance function (or kernel) $K(x, x')$, i.e.,

$$\text{Cov}(f(x), f(x')) = K(x, x') \quad \text{for all } x, x' \in \Omega.$$

The kernel needs to be positive semi-definite, i.e., for every $N \geq 1$ and distinct points $u_1, \dots, u_N \in \Omega$, the $N \times N$ matrix with $(i, j)^{th}$ entry $K(u_i, u_j)$ is positive semi-definite. Sometimes, this matrix will be positive definite, and hence invertible.
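Positive semi-definiteness of a candidate kernel can be checked numerically at a handful of points by inspecting the eigenvalues of the resulting matrix. A minimal sketch (the kernel $K(s, t) = \min(s, t)$ and the evaluation points are arbitrary choices for illustration):

```python
import numpy as np

def kernel_matrix(kernel, points):
    """N x N matrix with (i, j) entry kernel(points[i], points[j])."""
    return np.array([[kernel(u, v) for v in points] for u in points])

# A quick numerical check with the kernel K(s, t) = min(s, t) at a few
# distinct points: a PSD matrix has no negative eigenvalues, and here
# they are all strictly positive, so this matrix is invertible as well.
K = kernel_matrix(min, [0.5, 1.0, 2.0, 3.5])
print(np.linalg.eigvalsh(K))
```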

Here are some examples of Gaussian processes and kernels.

  1. Brownian Motion: Here $\Omega = [0, \infty)$ and $K(s, t) = \min(s, t)$.

  2. (scaled) Brownian Motion plus constant: Here again $\Omega = [0, \infty)$. In the Brownian motion model, $f(0) = 0$, which is generally an unrealistic assumption. To fix this, we assume that $f(t) = \beta_0 + \tau W_t$ where $W_t \sim BM$, $\beta_0 \sim N(0, C)$, and $\tau > 0$ (with $\beta_0$ independent of $\{W_t\}$). $C$ will be taken to be large (potentially $C \rightarrow \infty$). Now

     $$K(s, t) = C + \tau^2 \min(s, t).$$
  3. Integrated Brownian Motion: $\Omega = [0, \infty)$ and

     $$f(t) = \int_0^t W_s \, ds \quad \text{where } W_s \sim BM.$$

    The kernel is given by

     \begin{align*}
     K(s, t) &= \text{Cov}\left(\int_0^s W_u \, du, \int_0^t W_v \, dv\right) \\
     &= \int_0^s \int_0^t \text{Cov}(W_u, W_v) \, dv \, du = \int_0^s \int_0^t \min(u, v) \, dv \, du.
     \end{align*}

     To simplify further, assume that $s \leq t$, so that

     \begin{align*}
     K(s, t) &= \int_0^s \left(\int_0^u v \, dv + \int_u^t u \, dv\right) du \\
     &= \int_0^s \left(\frac{u^2}{2} + u(t - u)\right) du = \int_0^s \left(ut - \frac{u^2}{2}\right) du = t \frac{s^2}{2} - \frac{s^3}{6}.
     \end{align*}

     For general $s$ and $t$, we have

     $$K(s, t) = \frac{1}{2} \max(s, t) \left(\min(s, t)\right)^2 - \frac{1}{6} \left(\min(s, t)\right)^3.$$
  4. (scaled) Integrated Brownian Motion plus a Linear Term: In the IBM model, we have $f(0) = 0$ and $f'(0) = 0$. This might be an unrealistic assumption to make when $f$ is completely unknown. In this case, a better model is:

     $$f(t) = \beta_0 + \beta_1 t + \tau \int_0^t W_s \, ds$$

     where $\beta_0, \beta_1, \{W_s\}$ are independent, with $W_s$ being Brownian motion and $\beta_0, \beta_1 \overset{\text{i.i.d.}}{\sim} N(0, C)$. The kernel now becomes

     \begin{align*}
     K(s, t) &= \text{Cov}\left(\beta_0 + \beta_1 s + \tau \int_0^s W_u \, du, \; \beta_0 + \beta_1 t + \tau \int_0^t W_v \, dv\right) \\
     &= C(1 + st) + \frac{\tau^2}{2} \max(s, t) \left(\min(s, t)\right)^2 - \frac{\tau^2}{6} \left(\min(s, t)\right)^3.
     \end{align*}
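The kernels above can be compared by simulation: a zero-mean GP with kernel $K$ is sampled on a grid by drawing a multivariate normal vector with the corresponding covariance matrix. A sketch (the grid and the eigendecomposition-based sampler are my own choices, not part of the lecture):

```python
import numpy as np

def bm_kernel(s, t):
    # Brownian motion: K(s, t) = min(s, t)
    return np.minimum(s, t)

def ibm_kernel(s, t):
    # Integrated Brownian motion: K(s, t) = max(s,t) min(s,t)^2 / 2 - min(s,t)^3 / 6
    m, M = np.minimum(s, t), np.maximum(s, t)
    return M * m**2 / 2 - m**3 / 6

rng = np.random.default_rng(0)
t = np.linspace(0.01, 1, 200)
paths = {}
for name, kernel in [("BM", bm_kernel), ("IBM", ibm_kernel)]:
    K = kernel(t[:, None], t[None, :])
    # Sample via the eigendecomposition, which tolerates the near-singular
    # covariance matrices that smooth kernels produce (Cholesky can fail here).
    w, V = np.linalg.eigh(K)
    paths[name] = V @ (np.sqrt(np.clip(w, 0, None)) * rng.standard_normal(len(t)))
```

The IBM paths come out visibly smoother than the BM paths, reflecting the extra integration.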

We will see more examples of kernels later. Next, we look at some basic applications of Gaussian processes.

Interpolation

Suppose we are given the values of $f$ at points $0 \leq x_1 < \dots < x_n \leq 1$: $f(x_1), \dots, f(x_n)$. Let $x \in [0, 1]$ be a new point (i.e., distinct from $x_1, \dots, x_n$). What can we say about $f(x)$?

We will solve this problem assuming a Gaussian process prior for $f$ with mean zero and covariance kernel $K(\cdot, \cdot)$. The answer is then given by the conditional distribution of $f(x)$ given $f(x_1), \dots, f(x_n)$. To compute this, we note that

$$\left(f(x_1), \dots, f(x_n), f(x)\right)^T \sim N\left(0, \begin{pmatrix} (K(x_i, x_j))_{n \times n} & (K(x_i, x))_{n \times 1} \\ (K(x, x_i))_{1 \times n} & K(x, x) \end{pmatrix}\right).$$

Using the notation $K = (K(x_i, x_j))_{n \times n}$ and $\mathbf{k} = (K(x_i, x))_{n \times 1}$, we can write the conditional distribution of $f(x)$ given $f(x_1), \dots, f(x_n)$ as

$$f(x) \mid f(x_1), \dots, f(x_n) \sim N\left(\mathbf{k}^T K^{-1} (f(x_1), \dots, f(x_n))^T, \; K(x, x) - \mathbf{k}^T K^{-1} \mathbf{k}\right).$$

Thus the posterior mean (or mode) estimate of $f(x)$ is given by

\begin{align}
\widehat{f(x)} = \mathbf{k}^T K^{-1} (f(x_1), \dots, f(x_n))^T.
\end{align}

For different kernels $K$, the above expression depends differently on $f(x_1), \dots, f(x_n)$. The simplest example is that of Brownian Motion.
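As a sanity check of the posterior mean formula (an illustration with hypothetical data, not from the lecture), the sketch below evaluates it under the Brownian-motion kernel. By the Markov property of Brownian motion, the posterior mean at a point between two observation sites is just the linear interpolation of the two neighboring values:

```python
import numpy as np

def gp_posterior_mean(kernel, x_obs, f_obs, x_new):
    """Posterior mean k^T K^{-1} f of a zero-mean GP observed without noise."""
    K = kernel(x_obs[:, None], x_obs[None, :])
    k = kernel(x_obs[:, None], x_new[None, :])   # n x m cross-covariances
    return k.T @ np.linalg.solve(K, f_obs)

bm = np.minimum  # Brownian-motion kernel K(s, t) = min(s, t)

# Hypothetical observed values at a few points in (0, 1].
x_obs = np.array([0.2, 0.4, 0.7])
f_obs = np.array([1.0, 2.0, 1.2])

# At x = 0.3, halfway between 0.2 and 0.4, the posterior mean is the
# average of the two neighboring values.
print(gp_posterior_mean(bm, x_obs, f_obs, np.array([0.3])))  # → [1.5]
```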

The next example is that of the Integrated Brownian Motion prior.

Integration

Suppose we are given the values of $f$ at points $0 \leq x_1 < \dots < x_n \leq 1$: $f(x_1), \dots, f(x_n)$. What can we say about $\int_0^1 f(x) \, dx$?

Again we place a Gaussian process prior on $f$. By linearity, we can write

\begin{align*}
\E\left[\int_0^1 f(x) \, dx \mid f(x_1), \dots, f(x_n)\right] &= \int_0^1 \E\left[f(x) \mid f(x_1), \dots, f(x_n)\right] dx \\
&= \int_0^1 \mathbf{k}^T K^{-1} (f(x_1), \dots, f(x_n))^T \, dx \\
&= \left(\int_0^1 K(x_1, x) \, dx, \dots, \int_0^1 K(x_n, x) \, dx\right) K^{-1} (f(x_1), \dots, f(x_n))^T.
\end{align*}
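For the Brownian-motion kernel this recipe can be carried out in closed form, since $\int_0^1 \min(x_i, x) \, dx = x_i - x_i^2/2$. A sketch with hypothetical observation sites and values:

```python
import numpy as np

def bq_integral_estimate(x_obs, f_obs):
    # Posterior mean of the integral of f over [0, 1] under a zero-mean
    # Brownian-motion prior: z^T K^{-1} f, where z_i = x_i - x_i^2 / 2 is
    # the closed-form integral of K(x_i, x) = min(x_i, x) over [0, 1].
    K = np.minimum(x_obs[:, None], x_obs[None, :])
    z = x_obs - x_obs**2 / 2
    return z @ np.linalg.solve(K, f_obs)

# Hypothetical observation sites and values.
x_obs = np.array([0.25, 0.5, 0.75])
f_obs = np.array([1.0, 2.0, 1.5])
estimate = bq_integral_estimate(x_obs, f_obs)
```

As a consistency check, numerically integrating the posterior mean curve $\mathbf{k}^T K^{-1} (f(x_1), \dots, f(x_n))^T$ over a fine grid gives the same number.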

For more on the relation between classical integration (quadrature) rules and Gaussian processes, see the book *Probabilistic Numerics* by Hennig, Osborne, and Kersting (2022) or the paper "Bayesian Numerical Analysis" by Diaconis (1988).

In the next lecture, we shall see how to use Gaussian processes for the regression problem.