Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

STAT 238 - Bayesian Statistics Lecture Twenty Seven

Spring 2026, UC Berkeley

Gibbs Sampler for Probit Regression

Our next application of the Gibbs sampler is to Probit Regression which was introduced in Homework Four. The model is given by the following. We observe data (y1,x1),,(yn,xn)(y_1, x_1), \dots, (y_n, x_n) with y1,,yny_1, \dots, y_n binary and x1,,xnRpx_1, \dots, x_n \in \R^p. x1,,xnx_1, \dots, x_n are treated non-random and y1,,yny_1, \dots, y_n random. The parameter is βRp\beta \in \R^p. The probit regression model assumes that y1,yny_1 \dots, y_n are independent given β\beta with

yiβBernoulli(Φ(xiβ))y_i | \beta \sim \text{Bernoulli} \left(\Phi(x_i' \beta) \right)

where Φ()\Phi(\cdot) is the standard normal cdf. Let π()\pi(\cdot) be our prior for β\beta. The posterior is then given by:

π(βdata)π(β)i=1n(Φ(xiβ))yi(1Φ(xiβ))1yi.\pi(\beta | \text{data}) \propto \pi(\beta) \prod_{i=1}^n \left(\Phi(x_i' \beta) \right)^{y_i} \left(1 - \Phi(x_i' \beta) \right)^{1-y_i}.

Suppose we take the (improper) uniform prior for β\beta so that the posterior becomes

π(βdata)i=1n(Φ(xiβ))yi(1Φ(xiβ))1yi.\pi(\beta | \text{data}) \propto \prod_{i=1}^n \left(\Phi(x_i' \beta) \right)^{y_i} \left(1 - \Phi(x_i' \beta) \right)^{1-y_i}.

It is not actually clear how to use the Gibbs sampler on the above posterior. We can try decomposing β\beta as (β1,,βp)(\beta_1, \dots, \beta_p) and then attempt to obtain samples from the conditional:

π(βjβj,data)i=1n(Φ(xiβ))yi(1Φ(xiβ))1yi.\pi(\beta_j | \beta_{-j}, \text{data}) \propto \prod_{i=1}^n \left(\Phi(x_i' \beta) \right)^{y_i} \left(1 - \Phi(x_i' \beta) \right)^{1-y_i}.

I am not sure how to obtain samples from the above conditional.

There is a trick that people use to run the Gibbs sampler for probit regression. Let us introduce random variables w1,,wnw_1, \dots, w_n which, conditional on β\beta, are independent with

wiβN(xiβ,1).w_i | \beta \sim N(x_i' \beta, 1).

We then let

yi=I{wi>0}.y_i = I\{w_i > 0\}.

It is then clear that y1,,yny_1, \dots, y_n defined thus have the same likelihood as in the probit regression model. Let w=(w1,,wn)w = (w_1, \dots, w_n). The posterior of (w,β)(w, \beta) given the data (data=y=(y1,,yn)\text{data} = y = (y_1, \dots, y_n)) is

π(w,βdata)π(β)π(wβ)π(yw,β)i=1nexp(12(wixiβ)2)i=1n(I{wi>0,yi=1}+I{wi0,yi=0})exp(12wXβ2)i=1n(I{wi>0,yi=1}+I{wi0,yi=0}).\begin{align*} \pi(w, \beta | \text{data}) &\propto \pi(\beta) \pi(w | \beta) \pi(y | w, \beta) \\ &\propto \prod_{i=1}^n \exp \left(\frac{-1}{2} (w_i - x_i' \beta)^2\right) \prod_{i=1}^n \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right) \\ &\propto \exp \left(\frac{-1}{2} \|w - X \beta\|^2 \right) \prod_{i=1}^n \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right). \end{align*}

where XX is, as usual, the design matrix with rows x1,,xnx_1', \dots, x_n'.

We shall now run the Gibbs sampler with the decomposition θ=(w,β)\theta = (w, \beta). For this, we need to sample from the conditionals βw,data\beta|w, \text{data} and wβ,dataw | \beta, \text{data}.

π(βw,data)π(w,βdata)exp(12wXβ2)i=1n(I{wi>0,yi=1}+I{wi0,yi=0})exp(12wXβ2)\begin{align*} \pi(\beta | w, \text{data}) &\propto \pi(w, \beta | \text{data}) \\ &\propto \exp \left(\frac{-1}{2} \|w - X \beta\|^2 \right) \prod_{i=1}^n \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right) \\ &\propto \exp \left(\frac{-1}{2} \|w - X \beta\|^2 \right) \end{align*}

For a fixed wRnw \in \R^n, let

β^w:=argminβRpwXβ2=(XX)1Xw\hat{\beta}_w := \underset{\beta \in \R^p}{\text{argmin}} \|w - X \beta\|^2 = (X' X)^{-1} X' w

so that

wXβ2=wXβ^w2+Xβ^wXβ2=wXβ^w2+(ββ^w)(XX)(ββ^w).\|w - X \beta\|^2 = \|w - X \hat{\beta}_w\|^2 + \|X \hat{\beta}_w - X \beta\|^2 = \|w - X \hat{\beta}_w\|^2 + \left(\beta - \hat{\beta}_w \right)' (X' X) \left(\beta - \hat{\beta}_w \right).

Thus

π(βw,data)exp(12(ββ^w)(XX)(ββ^w)).\pi(\beta | w, \text{data}) \propto \exp \left(\frac{-1}{2} \left(\beta - \hat{\beta}_w \right)' (X' X) \left(\beta - \hat{\beta}_w \right) \right).

In other words,

βw,dataN(β^w,(XX)1).\beta|w, \text{data} \sim N \left(\hat{\beta}_w, (X' X)^{-1} \right).

The next task is to figure out the conditional wβ,dataw | \beta, \data:

π(wβ,data)π(w,βdata)exp(12wXβ2)i=1n(I{wi>0,yi=1}+I{wi0,yi=0})i=1nexp(12(wixiβ)2)(I{wi>0,yi=1}+I{wi0,yi=0})\begin{align*} \pi(w | \beta, \text{data}) &\propto \pi(w, \beta | \text{data}) \\ &\propto \exp \left(\frac{-1}{2} \|w - X \beta\|^2 \right) \prod_{i=1}^n \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right) \\ & \propto \prod_{i=1}^n \exp \left(\frac{-1}{2} (w_i - x_i' \beta)^2 \right) \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right) \end{align*}

Thus w1,,wnw_1, \dots, w_n are independent conditional on β\beta and data\data. Further

π(wiβ,data)=π(wiβ,yi)exp(12(wixiβ)2)(I{wi>0,yi=1}+I{wi0,yi=0}).\pi(w_i|\beta, \data) = \pi(w_i | \beta, y_i) \propto \exp \left(\frac{-1}{2} (w_i - x_i' \beta)^2 \right) \left(I\{w_i > 0, y_i = 1\} + I\{w_i \leq 0, y_i = 0\} \right).

It makes sense now to separate the two cases yi=1y_i = 1 and yi=0y_i = 0. For yi=1y_i = 1, we get

π(wiβ,yi=1)exp(12(wixiβ)2)I{wi>0}.\pi(w_i | \beta, y_i = 1) \propto \exp \left(\frac{-1}{2} (w_i - x_i' \beta)^2 \right) I\{w_i > 0\}.

Thus, conditional on β\beta and yi=1y_i = 1, the random variable wiw_i has the N(xiβ,1)N(x_i' \beta, 1) distribution trucated to the positive half-line.

Here is a fact (from the wikipedia article for truncated normal distributions: Truncated normal distribution#Generating values from the truncated normal distribution) that is useful for simulating from truncated normal distributions.

Using this fact, it is straightforward to verify that wi{β,yi=1}w_i|\{\beta, y_i = 1\} can be simulated by

xiβ+Φ1((1U)Φ(xiβ)+U).x_i' \beta + \Phi^{-1} \left((1 - U) \Phi(-x_i' \beta) + U \right).

This follows directly from the fact above by taking μ=xiβ\mu = x_i' \beta, σ=1\sigma = 1, a=0a = 0 and b=b = \infty (this implies that a0=xiβa_0 = -x_i' \beta and b0=b_0 = \infty).

One can argue analogously to deduce that wi{β,yi=0}w_i | \{\beta, y_i = 0\} can be simulated by

xiβ+Φ1(UΦ(xiβ)).x_i' \beta + \Phi^{-1} \left(U \Phi(-x_i' \beta) \right).

The Gibbs sampler for the probit model can be therefore implemented by the following model:

  1. Initialize at β(0)\beta^{(0)} and w(0)w^{(0)}.

  2. Repeat the following for t=1,,Nt = 1, \dots, N

    1. Pick JJ to be either 1 or 2 with equal probability.

    2. If J=1J = 1, take β(t+1)N((XX)1Xw(t),(XX)1)\beta^{(t+1)} \sim N((X'X)^{-1} X' w^{(t)}, (X' X)^{-1}) and w(t+1)=w(t)w^{(t+1)} = w^{(t)}.

    3. If J=2J = 2, take β(t+1)=β(t)\beta^{(t+1)} = \beta^{(t)} and

      wi={xiβ(t)+Φ1((1Ui)Φ(xiβ(t))+Ui)if yi=1xiβ(t)+Φ1(UiΦ(xiβ(t)))if yi=0w_i = \begin{cases} x_i' \beta^{(t)} + \Phi^{-1} \left((1 - U_i) \Phi(-x_i' \beta^{(t)}) + U_i \right) & \text{if } y_i = 1 \\ x_i' \beta^{(t)} + \Phi^{-1} \left(U_i \Phi(-x_i' \beta^{(t)}) \right) & \text{if } y_i = 0 \end{cases}

      where UiUnif(0,1)U_i \sim Unif(0, 1).