for a fixed σ > 0. We will look at the case of unknown σ later. The frequentist estimator of θ is the MLE y. For Bayesian inference, we use the normal N(μ, τ²) prior on θ. The basic fact is:
$$\theta \sim N(\mu, \tau^2) \ \text{ and } \ y \mid \theta \sim N(\theta, \sigma^2) \implies \theta \mid y \sim N\left(\frac{y/\sigma^2 + \mu/\tau^2}{1/\sigma^2 + 1/\tau^2}, \ \frac{1}{1/\sigma^2 + 1/\tau^2}\right) \ \text{ and } \ y \sim N(\mu, \sigma^2 + \tau^2).$$
The posterior mean is a weighted linear combination of the prior mean μ and the data y. The weights are inversely proportional to the variances of the corresponding normal distributions. For a normal distribution, we use the term “precision” to denote the inverse of the variance; thus the weights in the linear combination above are proportional to the precisions of the prior and the likelihood.
Also note that the precision of the posterior equals the sum of the prior and likelihood precisions.
Note that when τ² → ∞ and μ is a fixed constant, we have θ ∣ y ∼ N(y, σ²) (in this case, the posterior mean coincides with the MLE y). Operationally, the N(μ, τ²) prior with fixed μ and τ² = +∞ has the same behavior as the uniform(−∞, ∞) prior. These are uninformative priors in this problem.
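As a quick illustration of the update rule (this example is not from the notes; the values of μ, τ, σ, and y below are arbitrary), here is a minimal Python sketch of the precision-weighted computation:

```python
# Conjugate normal update: prior theta ~ N(mu, tau^2), one observation y | theta ~ N(theta, sigma^2).
# All numbers are arbitrary and chosen only to illustrate the formulas above.

mu, tau = 0.0, 2.0        # prior mean and prior standard deviation
sigma = 1.0               # known standard deviation of the observation
y = 1.5                   # the single observation

prior_prec = 1 / tau**2   # prior precision
lik_prec = 1 / sigma**2   # likelihood precision

post_prec = prior_prec + lik_prec                          # precisions add
post_mean = (y * lik_prec + mu * prior_prec) / post_prec   # precision-weighted average
post_var = 1 / post_prec

print(post_mean, post_var)   # 1.2 and 0.8 for these numbers

# As tau^2 grows (prior precision -> 0), post_mean -> y and post_var -> sigma^2,
# matching the uninformative-prior limit discussed above.
```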
Often the data will consist of multiple numbers y₁, …, yₙ, with the likelihood being yᵢ ∣ θ ∼ i.i.d. N(θ, σ²). In this case,

$$\theta \mid y_1, \dots, y_n \sim N\left(\frac{n\bar y/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2}, \ \frac{1}{n/\sigma^2 + 1/\tau^2}\right),$$
where ȳ := (y₁ + ⋯ + yₙ)/n. This can be proved directly, or by using the fact that ȳ is sufficient for θ, so that the data can be reduced to the single number ȳ with likelihood N(θ, σ²/n).
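The sufficiency argument can be checked numerically. The following Python sketch (not from the notes; the simulated data and parameter values are arbitrary) verifies that the posterior computed from all n observations agrees with the one computed from ȳ treated as a single observation with variance σ²/n:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, tau = 0.0, 2.0                    # prior on theta
sigma = 1.0                           # known noise standard deviation
n = 10
y = rng.normal(1.0, sigma, size=n)    # simulated data (true theta = 1, arbitrary)
ybar = y.mean()

# Posterior from all n observations: likelihood precisions add across observations.
prec_full = n / sigma**2 + 1 / tau**2
mean_full = (y.sum() / sigma**2 + mu / tau**2) / prec_full

# Posterior from the sufficient statistic alone: ybar | theta ~ N(theta, sigma^2 / n).
prec_suff = 1 / (sigma**2 / n) + 1 / tau**2
mean_suff = (ybar / (sigma**2 / n) + mu / tau**2) / prec_suff

print(np.isclose(mean_full, mean_suff), np.isclose(prec_full, prec_suff))   # True True
```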
Now consider the many normal means problem: we observe yᵢ ∣ θᵢ ∼ N(θᵢ, σ²) independently for i = 1, …, N, with the prior θᵢ ∼ i.i.d. N(μ, τ²). If we fix μ to be some constant value (say μ = 0) and take τ² to be very large, then the estimate of θᵢ would be (essentially) equal to yᵢ.
Instead of fixing these values, we can treat μ and τ² also as unknown parameters and attempt to estimate them from the observed data. Marginalizing over θᵢ, it is easy to see that

$$y_i \mid \mu, \tau \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2 + \tau^2).$$
One can estimate μ and τ from this model by some estimates μ̂ and τ̂; θᵢ is then estimated by E(θᵢ ∣ yᵢ, μ = μ̂, τ = τ̂). How to obtain μ̂ and τ̂? It is natural to take μ̂ = ȳ := (y₁ + ⋯ + y_N)/N, and to estimate 1/(σ² + τ²) by (N − 3)/∑ᵢ(yᵢ − ȳ)², which is unbiased under the marginal model above (for N > 3). Plugging these into the posterior mean E(θᵢ ∣ yᵢ, μ, τ) = μ + (1 − σ²/(σ² + τ²))(yᵢ − μ) gives

$$\hat\theta_i^{\mathrm{JS}} = \bar y + \left(1 - \frac{(N-3)\sigma^2}{\sum_{j=1}^N (y_j - \bar y)^2}\right)(y_i - \bar y).$$
The James-Stein estimator can thus be seen as an empirical Bayes procedure. It has the following remarkable frequentist property: its mean squared error risk is uniformly smaller than that of the naive estimator θ̂ᵢ,naive = yᵢ, for all values of θ₁, …, θ_N. (This holds for N ≥ 3; the version defined above requires N > 3, but one can create a variant of James-Stein that replaces ȳ by a fixed constant such as 0, and this variant works for N = 3 as well.) The proof of this property is beyond the scope of this class (and is also not directly relevant to us, as it is a frequentist property).
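To make the empirical Bayes recipe concrete, here is a small simulation sketch in Python (not part of the notes). It assumes σ is known, draws arbitrary true means θᵢ purely for illustration, and compares the naive estimates yᵢ with the James-Stein estimates on one simulated data set:

```python
import numpy as np

rng = np.random.default_rng(1)

# Many normal means: theta_i fixed, y_i | theta_i ~ N(theta_i, sigma^2) independently.
N = 50
sigma = 1.0
theta = rng.normal(2.0, 0.5, size=N)        # arbitrary "true" means for illustration
y = theta + rng.normal(0.0, sigma, size=N)  # one observation per mean

# Empirical Bayes / James-Stein: shrink each y_i toward the grand mean ybar.
ybar = y.mean()
S = np.sum((y - ybar) ** 2)
B_hat = (N - 3) * sigma**2 / S              # estimate of the shrinkage factor sigma^2 / (sigma^2 + tau^2)
theta_js = ybar + (1 - B_hat) * (y - ybar)

mse_naive = np.mean((y - theta) ** 2)
mse_js = np.mean((theta_js - theta) ** 2)
print(mse_naive, mse_js)   # the James-Stein MSE is typically the smaller of the two
```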
Let us now present the full Bayes estimate. This requires specifying (uninformative) priors on μ and τ. For μ, we use the uniform(−∞, ∞) prior. For τ, we use the fact that the marginal distribution of the data given μ, τ is yᵢ ∣ μ, τ ∼ i.i.d. N(μ, σ² + τ²), which depends on τ only through the parameter γ² := σ² + τ². For the N(μ, γ²) model, the standard uninformative prior for γ is