
STAT 238 - Bayesian Statistics Lecture Eleven

Spring 2026, UC Berkeley

Bayesian Inference with Normal Likelihoods

If the data is $y$ and the parameter is $\theta$, the likelihood is given by:

\begin{align*} y \mid \theta \sim N(\theta, \sigma^2) \end{align*}

for a fixed $\sigma > 0$. We will look at the case of unknown $\sigma$ later. The frequentist estimator of $\theta$ is the MLE $y$. For Bayesian inference, we use the normal $N(\mu, \tau^2)$ prior on $\theta$. The basic fact is:

\begin{align*} \theta \sim N(\mu, \tau^2) ~ \text{ and } ~ y \mid \theta \sim N(\theta, \sigma^2) &\implies \theta \mid y \sim N \left(\frac{y/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2}, \frac{1}{1/\sigma^2 + 1/\tau^2} \right) \\ &\qquad \text{ and } ~ y \sim N(\mu, \sigma^2 + \tau^2). \end{align*}

The mean of the posterior distribution is thus

\begin{align*} \frac{y/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2} = \frac{1/\sigma^2}{1/\sigma^2 + 1/\tau^2}\, y + \frac{1/\tau^2}{1/\sigma^2 + 1/\tau^2}\, \mu \end{align*}

which is a weighted linear combination of the prior mean $\mu$ and the data $y$. The weights are inversely proportional to the variances of the corresponding normal distributions. For a normal distribution, the term ``precision'' denotes the inverse of the variance. Thus the weights in the linear combination above are proportional to the precisions of the prior and the likelihood.

Also note that the precision of the posterior equals the sum of the prior and likelihood precisions.

Note that when $\tau^2 \rightarrow \infty$ and $\mu$ is a fixed constant, we have $\theta \mid y \sim N(y, \sigma^2)$ (in this case, the posterior mean coincides with the MLE $y$). Operationally, the $N(\mu, \tau^2)$ prior with fixed $\mu$ and $\tau^2 = +\infty$ has the same behavior as the $\text{uniform}(-\infty, \infty)$ prior. These are uninformative priors in this problem.
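The precision-weighting update above can be sketched in a few lines of Python (the function name `normal_posterior` is hypothetical, not from the course materials):

```python
def normal_posterior(y, sigma2, mu, tau2):
    """Posterior of theta for one observation y ~ N(theta, sigma2)
    with prior theta ~ N(mu, tau2). Returns (mean, variance)."""
    # Precisions (inverse variances) of likelihood and prior.
    prec_lik, prec_prior = 1.0 / sigma2, 1.0 / tau2
    # Posterior precision is the sum of the two precisions.
    post_var = 1.0 / (prec_lik + prec_prior)
    # Posterior mean is the precision-weighted average of y and mu.
    post_mean = post_var * (y * prec_lik + mu * prec_prior)
    return post_mean, post_var

# With equal precisions, the posterior mean is halfway between y and mu.
mean, var = normal_posterior(y=2.0, sigma2=1.0, mu=0.0, tau2=1.0)
# With a very diffuse prior (huge tau2), the posterior approaches N(y, sigma2).
mean_flat, var_flat = normal_posterior(y=2.0, sigma2=1.0, mu=0.0, tau2=1e8)
```

Note how making $\tau^2$ enormous recovers the uninformative-prior behavior numerically.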

Often the data will consist of multiple numbers $y_1, \dots, y_n$ with the likelihood being:

\begin{align*} y_1, \dots, y_n \overset{\text{i.i.d}}{\sim} N(\theta, \sigma^2). \end{align*}

In this case (for the same prior $\theta \sim N(\mu, \tau^2)$), the posterior is

\begin{align*} \theta \mid y_1, \dots, y_n \sim N \left(\frac{n\bar{y}/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2}, \frac{1}{n/\sigma^2 + 1/\tau^2} \right) \end{align*}

where $\bar{y} := (y_1 + \dots + y_n)/n$. This can be proved directly, or by using the fact that $\bar{y}$ is sufficient for $\theta$, so the data can be reduced to the single number $\bar{y}$ with likelihood $N(\theta, \sigma^2/n)$.
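As a quick numerical check (with simulated data; all numbers here are made up), the posterior computed from all $n$ observations agrees with the posterior computed from the reduced statistic $\bar{y}$ with likelihood $N(\theta, \sigma^2/n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=50)   # hypothetical data
sigma2, mu, tau2 = 4.0, 0.0, 1.0
n, ybar = len(y), y.mean()

# Posterior using all n observations.
post_var = 1.0 / (n / sigma2 + 1.0 / tau2)
post_mean = post_var * (n * ybar / sigma2 + mu / tau2)

# Same posterior via the sufficient statistic ybar ~ N(theta, sigma2 / n).
post_var_red = 1.0 / (1.0 / (sigma2 / n) + 1.0 / tau2)
post_mean_red = post_var_red * (ybar / (sigma2 / n) + mu / tau2)

assert np.isclose(post_mean, post_mean_red)
assert np.isclose(post_var, post_var_red)
```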

Multiple Instances of the Same Problem

Consider the problem of estimating $\theta_1, \dots, \theta_N$ from data $y_1, \dots, y_N$ under the likelihood:

\begin{align*} y_i \overset{\text{ind}}{\sim} N(\theta_i, \sigma^2) \end{align*}

for a fixed $\sigma^2$. We further assume:

\begin{align*} \theta_i \overset{\text{i.i.d}}{\sim} N(\mu, \tau^2). \end{align*}

If we fix $\mu$ to be some constant value (say $\mu = 0$) and $\tau^2$ to be very large, then the estimate of $\theta_i$ would be equal to $y_i$.

Instead of fixing these values, we can treat $\mu$ and $\tau^2$ also as unknown parameters and attempt to estimate them from the observed data. Marginalizing over $\theta_i$, it is easy to see that

\begin{align*} y_i \mid \mu, \tau \overset{\text{i.i.d}}{\sim} N(\mu, \sigma^2 + \tau^2). \end{align*}

One can estimate $\mu$ and $\tau$ from this model by some estimates $\hat{\mu}$ and $\hat{\tau}$; $\theta_i$ is then estimated by $\E(\theta_i \mid y_i, \mu = \hat{\mu}, \tau = \hat{\tau})$. How do we obtain $\hat{\mu}$ and $\hat{\tau}$? It is natural to take

\begin{align*} \hat{\mu} = \frac{y_1 + \dots + y_N}{N}. \end{align*}

The estimate of $\tau$ should be based on the sum of squared deviations, whose distribution is:

\begin{align*} \sum_{i=1}^N (y_i - \bar{y})^2 \sim \left(\sigma^2 + \tau^2\right) \chi^2_{N-1}. \end{align*}

The formula for $\E(\theta_i \mid y_i, \mu, \tau)$ is:

\begin{align*} \E(\theta_i \mid y_i, \mu, \tau) = \frac{y_i/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2} = \mu + \left(1 - \frac{\sigma^2}{\sigma^2 + \tau^2} \right) (y_i - \mu). \end{align*}
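The second equality can be verified by multiplying the numerator and denominator by $\sigma^2 \tau^2$ and regrouping:

\begin{align*}
\frac{y_i/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2}
= \frac{\tau^2 y_i + \sigma^2 \mu}{\sigma^2 + \tau^2}
= \mu + \frac{\tau^2}{\sigma^2 + \tau^2}\,(y_i - \mu)
= \mu + \left(1 - \frac{\sigma^2}{\sigma^2 + \tau^2}\right)(y_i - \mu).
\end{align*}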

So we need to estimate $1/(\sigma^2 + \tau^2)$, as opposed to $\tau^2$ directly. The following fact:

\begin{align*}
  \E \left(\frac{1}{\chi^2_{v}} \right) = \frac{1}{v-2} \quad \text{for } v > 2
\end{align*}

shows that

\begin{align*} \E \left(\frac{1}{\sum_{i=1}^N (y_i - \bar{y})^2} \right) = \frac{1}{\sigma^2 + \tau^2}\, \E \left(\frac{1}{\chi^2_{N-1}} \right) = \frac{1}{(N-3)(\sigma^2 + \tau^2)}. \end{align*}

So we use

\begin{align} \text{estimate of } \frac{1}{\sigma^2 + \tau^2} = \frac{N-3}{\sum_{i=1}^N (y_i - \bar{y})^2}. \end{align}

The estimate of $\tau^2$ implied by the display above can be nonpositive (depending on the data and $\sigma^2$); we shall ignore this issue, however.
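The inverse chi-squared moment $\E(1/\chi^2_v) = 1/(v-2)$ used above can be checked by Monte Carlo simulation (a sketch; the choice of $v$, sample size, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
v = 10
draws = rng.chisquare(df=v, size=1_000_000)

# Monte Carlo estimate of E(1 / chi^2_v) versus the exact value 1/(v-2).
mc_mean = (1.0 / draws).mean()
exact = 1.0 / (v - 2)   # = 0.125 for v = 10
```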

The estimates of $\theta_i$ are therefore given by:

\begin{align*} \hat{\theta}_i^{\text{JS}} := \bar{y} + \left(1 - \frac{(N-3)\, \sigma^2}{\sum_{i=1}^N (y_i - \bar{y})^2} \right) \left(y_i - \bar{y} \right). \end{align*}

This is the James-Stein estimator.
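A minimal sketch of this estimator in Python (the function name `james_stein` is hypothetical; the example numbers are made up):

```python
import numpy as np

def james_stein(y, sigma2):
    """James-Stein estimates shrinking each y_i toward the grand mean ybar,
    for y of length N > 3 and known observation variance sigma2."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    ybar = y.mean()
    ss = np.sum((y - ybar) ** 2)
    # Shrinkage factor; can be negative for some data, which we ignore
    # here just as the notes do.
    shrink = 1.0 - (N - 3) * sigma2 / ss
    return ybar + shrink * (y - ybar)

# N = 5, ybar = 2, sum of squares = 10, so the shrinkage factor is 0.8:
theta_hat = james_stein([0.0, 1.0, 2.0, 3.0, 4.0], sigma2=1.0)
# theta_hat is [0.4, 1.2, 2.0, 2.8, 3.6]: every y_i moves toward ybar.
```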

The James-Stein estimator can thus be seen as an empirical Bayes procedure. It has the following remarkable frequentist property: its risk is uniformly better than that of the naive estimator $\hat{\theta}_{i, \text{naive}} = y_i$ in mean squared error, for all values of $\theta_1, \dots, \theta_N$ (this is true for $N \geq 3$; although we only defined it for $N > 3$, one can create a more naive version of James-Stein replacing $\bar{y}$ by any fixed constant like 0, and this would work also for $N = 3$). This property is beyond the scope of this class (and also irrelevant to us, as it is a frequentist property).

Let us now present the full Bayes estimate. This requires specifying (uninformative) priors on $\mu$ and $\tau$. For $\mu$, we use the $\text{uniform}(-\infty, \infty)$ prior. For $\tau$, we use the fact that the marginal distribution of the data given $\mu, \tau$ is $y_i \mid \mu, \tau \overset{\text{i.i.d}}{\sim} N(\mu, \sigma^2 + \tau^2)$, which is in terms of the parameter $\gamma^2 := \sigma^2 + \tau^2$. For the $N(\mu, \gamma^2)$ model, the standard uninformative prior for $\gamma$ is

\begin{align*} \log \gamma \sim \text{uniform}(-\infty, \infty) ~~ \text{ or } ~~ f_{\gamma}(\gamma) \propto \frac{I\{\gamma > 0\}}{\gamma}. \end{align*}

Now we have the additional information that $\gamma > \sigma$ (because $\gamma^2 = \sigma^2 + \tau^2$), so it is natural to modify the above prior to:

\begin{align*} f_{\gamma}(\gamma) \propto \frac{I\{\gamma > \sigma\}}{\gamma}. \end{align*}

It is easy to check that this prior on $\gamma$ leads to the following prior on $\tau = \sqrt{\gamma^2 - \sigma^2}$:

\begin{align*} f_{\tau}(\tau) \propto \frac{\tau}{\sigma^2 + \tau^2}\, I\{\tau > 0\}. \end{align*}

To recap, we are using the following prior for $\mu$ and $\tau$:

\begin{align*} \mu \sim \text{uniform}(-\infty, \infty) ~~ \text{ and } ~~ f_{\tau}(\tau) \propto \frac{\tau}{\sigma^2 + \tau^2}\, I\{\tau > 0\}. \end{align*}

We will combine this with the likelihood $y_i \mid \mu, \tau \overset{\text{i.i.d}}{\sim} N(\mu, \sigma^2 + \tau^2)$ to obtain the posterior of $\mu, \tau$. It can be checked that:

\begin{align*} \mu \mid \gamma, y_1, \dots, y_N \sim N\left(\bar{y}, \gamma^2/N \right) ~ \text{ and } ~ f_{\gamma \mid y_1, \dots, y_N}(\gamma) \propto \gamma^{-N} \exp \left(-\frac{\sum_{i=1}^N (y_i - \bar{y})^2}{2 \gamma^2} \right) I\{\gamma > \sigma\}. \end{align*}
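One simple way to simulate from this posterior (a sketch only, not necessarily the method used in the lab): discretize $\gamma$ on a grid over $(\sigma, \infty)$, draw $\gamma$ with probabilities proportional to the density above, then draw $\mu$ from its conditional normal. All data values and grid settings below are made up.

```python
import numpy as np

rng = np.random.default_rng(2026)

# Hypothetical data with known sigma.
sigma = 1.0
y = rng.normal(0.0, 2.0, size=30)
N, ybar = len(y), y.mean()
ss = np.sum((y - ybar) ** 2)

# Unnormalized log posterior of gamma, evaluated on a grid over (sigma, 10].
gamma_grid = np.linspace(sigma + 1e-6, 10.0, 5000)
log_post = -N * np.log(gamma_grid) - ss / (2 * gamma_grid ** 2)
weights = np.exp(log_post - log_post.max())   # stabilize before exponentiating
weights /= weights.sum()

# Draw gamma from the grid, then mu | gamma, y ~ N(ybar, gamma^2 / N).
gamma_draws = rng.choice(gamma_grid, size=4000, p=weights)
mu_draws = rng.normal(ybar, gamma_draws / np.sqrt(N))
```

The grid upper limit must be wide enough to cover essentially all the posterior mass of $\gamma$; in practice one checks that the weights near the boundary are negligible.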

You will see how to compute these posteriors given data in the next lab.