We studied the following model in the past few lectures.
We have a response variable y and a single covariate x. In our example, y denotes weekly earnings and x denotes years of experience. The covariate x takes the values 0,1,…,m for some fixed integer m.
This model is closely related to the prior f(x)=β0+β1x+τI(x), where I(x) is integrated Brownian motion (and β0,β1 ∼ i.i.d. N(0,C)). We shall give some intuition today as to why integrated Brownian motion appears here.
Before defining integrated Brownian motion, let us recall the definition of Brownian motion. Brownian motion on [0,M] (for some fixed M>0) is a stochastic process {B(t),0≤t≤M} characterized by the following two properties:
Every realization is a continuous function on [0,M], and B(0)=0.
For every fixed t1,…,tk∈[0,M], the random vector (B(t1),…,B(tk)) has a multivariate normal distribution with mean zero and covariance matrix whose (i,j)th entry is min(ti,tj), i.e., Cov(B(ti),B(tj))=min(ti,tj).
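To simulate Brownian motion from this definition, fix a large integer N and take the grid t1,…,tN on [0,M] with ti=iM/N. One can then simulate (B(t1),…,B(tN)) from the multivariate normal distribution with mean zero and covariance matrix whose (i,j)th entry is min(ti,tj).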
However, this approach is computationally inefficient for large N. Sampling directly from an N×N covariance matrix typically requires a matrix factorization (such as a Cholesky decomposition), which has computational cost on the order of O(N^3).
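As a rough illustration of this direct approach, the following sketch draws (B(t1),…,B(tN)) by factorizing the min(ti,tj) covariance matrix. The function name `simulate_bm_cholesky`, the default values of M and N, and the seed are illustrative choices, not part of the lecture.

```python
import numpy as np

def simulate_bm_cholesky(M=1.0, N=200, seed=0):
    """Draw (B(t_1), ..., B(t_N)) on the grid t_i = i*M/N via a Cholesky factor."""
    rng = np.random.default_rng(seed)
    t = M * np.arange(1, N + 1) / N      # grid t_1, ..., t_N (t = 0 is excluded)
    cov = np.minimum.outer(t, t)         # (i, j) entry is min(t_i, t_j)
    L = np.linalg.cholesky(cov)          # cov = L @ L.T; this is the O(N^3) step
    z = rng.standard_normal(N)           # i.i.d. N(0, 1) draws
    return t, L @ z                      # L @ z has mean zero and covariance cov

t, B = simulate_bm_cholesky()
```

The factorization is the expensive part: doubling N multiplies its cost by roughly eight.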
For simulations in particular, the following alternative definition is more appealing:
Every realization is a continuous function on [0,M] with B(0)=0.
The process has independent increments which means that B(t1),B(t2)−B(t1),…,B(tk)−B(tk−1) are independent whenever 0≤t1<⋯<tk≤M.
B(t)−B(s)∼N(0,t−s) whenever 0≤s≤t.
With this definition, we can simulate B(t1),…,B(tN) for ti=iM/N in the following way:
First generate α1,…,αN ∼ i.i.d. N(0,M/N).
Take B(ti)=α1+⋯+αi for i=1,…,N.
Note that in this way αi=B(ti)−B(ti−1), with the convention t0=0 and B(t0)=0. The equation B(ti)=α1+⋯+αi can also be written as
$$B(t_i) = \sum_{j=1}^{N} \alpha_j \, I\{t_j \le t_i\}.$$
This means that Brownian motion can be well-approximated (when N is large) by a piecewise constant process which jumps at each grid point ti=iM/N by an independent N(0,M/N) amount.
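The increment-based recipe above can be sketched in a few lines. The function name `simulate_bm_increments`, the default values and the seed are assumptions made for illustration only.

```python
import numpy as np

def simulate_bm_increments(M=1.0, N=1000, seed=0):
    """Simulate B(t_1), ..., B(t_N) on t_i = i*M/N from independent increments."""
    rng = np.random.default_rng(seed)
    t = M * np.arange(1, N + 1) / N
    # alpha_i ~ N(0, M/N); NumPy's `scale` is the standard deviation
    alpha = rng.normal(loc=0.0, scale=np.sqrt(M / N), size=N)
    B = np.cumsum(alpha)                 # B(t_i) = alpha_1 + ... + alpha_i
    return t, alpha, B

t, alpha, B = simulate_bm_increments()
```

This costs O(N) time and memory, since no covariance matrix is formed or factorized.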
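We are now ready to define integrated Brownian motion. It is given by
$$I(t) = \int_0^t B(s)\,ds, \qquad 0 \le t \le M.$$
Replacing B by its piecewise constant approximation and integrating term by term gives
$$I(t) \approx \sum_{i=1}^{N} \alpha_i \,(t - t_i)_+, \qquad \text{where } (u)_+ = \max(u, 0).$$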
Here N is a large number. If we make the further coarse approximation N=M (so that ti=i), we get back the model (1) under the identification βi=τα_{i−1} for i=2,…,N and M=m. Therefore, the model (1) can be understood as a coarse discretization of the model which stipulates that f is an integrated Brownian motion (scaled by τ) added to a linear function β0+β1x, with the uninformative N(0,C) prior on the coefficients β0,β1.
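To make the discretization concrete: integrating a path that jumps by αi at each grid point ti gives a sum of hinge functions, I(t) ≈ Σ αi·max(t−ti, 0). The sketch below evaluates that sum on the grid; the helper name `ibm_from_increments` and the values of M, N and the seed are illustrative assumptions.

```python
import numpy as np

def ibm_from_increments(alpha, t_grid, t_eval):
    """Evaluate sum_i alpha_i * max(t - t_i, 0) at each t in t_eval."""
    hinge = np.maximum(t_eval[:, None] - t_grid[None, :], 0.0)  # (t - t_i)_+
    return hinge @ alpha

rng = np.random.default_rng(0)
M, N = 1.0, 500
t_grid = M * np.arange(1, N + 1) / N
alpha = rng.normal(0.0, np.sqrt(M / N), size=N)   # Brownian increments
B = np.cumsum(alpha)                              # piecewise-constant path values
I_vals = ibm_from_increments(alpha, t_grid, t_grid)
```

At the grid points this agrees with a left Riemann sum of the simulated Brownian path, which is one way to sanity-check the hinge representation.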
The prior (1) is an example of a Gaussian process prior.
Consider the usual nonparametric regression problem where the goal is to estimate an unknown function f:Ω→R from observations (x1,y1),…,(xn,yn) under the model:
$$y_i = f(x_i) + \epsilon_i, \qquad \text{where } \epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2).$$
Here Ω denotes the domain which is a subset of Rd for some d≥1.
We assume that {f(x),x∈Ω} forms a Gaussian process with mean function μ(x),x∈Ω and covariance function or kernel given by K(x,x′) i.e.,
$$\mathrm{Cov}(f(x), f(x')) = K(x, x') \qquad \text{for all } x, x' \in \Omega.$$
The kernel is positive semi-definite, i.e., for every N≥1 and all distinct points u1,…,uN∈Ω, the N×N matrix with (i,j)th entry K(ui,uj) is positive semi-definite. Often, this matrix will be positive definite, and hence invertible.
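As a small numerical illustration, the sketch below builds the kernel matrix at a few points, checks its smallest eigenvalue, and draws one realization of (f(u1),…,f(uN)) under a zero mean function. It uses the Brownian motion kernel min(s,t) as the example; the helper names `gram_matrix` and `bm_kernel` and the choice of points are assumptions for illustration.

```python
import numpy as np

def gram_matrix(kernel, points):
    """N x N matrix with (i, j) entry kernel(u_i, u_j)."""
    pts = np.asarray(points, dtype=float)
    return kernel(pts[:, None], pts[None, :])

def bm_kernel(s, t):
    """Brownian motion kernel K(s, t) = min(s, t)."""
    return np.minimum(s, t)

u = np.linspace(0.1, 2.0, 20)                 # distinct points in the domain
K = gram_matrix(bm_kernel, u)
smallest_eig = np.linalg.eigvalsh(K).min()    # nonnegative for a PSD matrix

rng = np.random.default_rng(0)
# One draw of (f(u_1), ..., f(u_N)) from the zero-mean Gaussian process prior
f_draw = rng.multivariate_normal(mean=np.zeros(len(u)), cov=K)
```

For these points the smallest eigenvalue is strictly positive, so this particular matrix is invertible.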
The mean function is usually taken to be zero. Here are some standard examples of Gaussian processes and their kernels:
Brownian Motion: Here Ω=[0,∞) and K(s,t)=min(s,t).
(scaled) Brownian Motion plus constant: Here Ω=[0,∞) and we assume that f(t)=β0+τWt, where {Wt} is Brownian motion, β0∼N(0,C), τ>0, and β0 is independent of {Wt}. C will be taken to be large. Now
$$K(s,t) = \mathrm{Cov}(\beta_0 + \tau W_s,\ \beta_0 + \tau W_t) = C + \tau^2 \min(s,t).$$
(scaled) Integrated Brownian Motion Plus a Linear Term: In the IBM model, we have f(0)=0 and f′(0)=0. This might be an unrealistic assumption to make when f is completely unknown. In this case, a better model might be
$$f(t) = \beta_0 + \beta_1 t + \tau \int_0^t W_s \, ds$$
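where β0, β1, {Ws} are independent, with {Ws} being Brownian motion and β0,β1 ∼ i.i.d. N(0,C). The kernel now becomes
$$K(s,t) = C + C\,s\,t + \tau^2 \, \frac{\min(s,t)^2 \,\bigl(3\max(s,t) - \min(s,t)\bigr)}{6}.$$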
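A minimal sketch of this kernel in code, assuming the closed form obtained by integrating min(u,v) over [0,s]×[0,t] for the integrated Brownian motion part and using independence for the linear part. The function names and the values C=100 and τ=1 are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def ibm_kernel(s, t):
    """Covariance of the integrals of W over [0, s] and [0, t]:
    min(s,t)^2 * (3*max(s,t) - min(s,t)) / 6."""
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * (3.0 * hi - lo) / 6.0

def ibm_plus_linear_kernel(s, t, C=100.0, tau=1.0):
    """Kernel of beta0 + beta1*t + tau * (integrated Brownian motion):
    C + C*s*t from the linear part, tau^2 * ibm_kernel from the rest."""
    return C + C * s * t + tau**2 * ibm_kernel(s, t)

x = np.arange(0, 6, dtype=float)               # covariate values 0, 1, ..., 5
K = ibm_plus_linear_kernel(x[:, None], x[None, :])
eigvals = np.linalg.eigvalsh(K)                # all should be nonnegative
```

Taking C large makes the linear part dominate the prior variance, which is the sense in which the prior on β0 and β1 is uninformative.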