STAT 238 - Bayesian Statistics Lab Two

Uniform Prior on $\log \sigma$ or $\sigma$

Consider the problem of estimating a scale parameter $\sigma$ from observations $X_1, \dots, X_n \overset{\text{i.i.d}}{\sim} N(0, \sigma^2)$ . We want to use an uninformative prior for $\sigma$ . There are two options. Option A is $\sigma \sim \text{uniform}(0, +\infty)$ (or $\text{uniform}(0, C)$ for a large $C$ ). Option B is $\log \sigma \sim \text{uniform}(-\infty, +\infty)$ (or $\text{uniform}(-C, C)$ for a large $C$ ). The second prior (option B) is universally preferred to the first prior (option A). Here are some reasons why.

Relative Scale vs Absolute Scale: Consider the two probabilities $\P\{1 < \sigma < 2\}$ and $\P\{101 < \sigma < 102\}$ . Under the first prior (uniform $(0, C)$ ), these two probabilities are the same: each equals $1/C$ .
Under the second prior ( $\log \sigma$ is uniform on $(-C, C)$ ), we have:
$\begin{align*} \P\{1 < \sigma < 2\} = \P\{0 < \log \sigma < \log 2\} = \frac{\log 2}{2C} = \frac{0.693}{2C} \end{align*}$
(1)
and
$\begin{align*} \P\{101 < \sigma < 102\} = \P\{\log 101 < \log \sigma < \log 102\} = \frac{\log(102/101)}{2C} = \frac{0.00985}{2C} \end{align*}$
(2)
The second probability is therefore much smaller than the first. This reflects the fact that Prior B regards the values 101 and 102 as much closer to one another than the values 1 and 2. This behavior aligns well with intuition on a relative scale: moving from $\sigma = 1$ to $\sigma = 2$ corresponds to a doubling, whereas moving from $\sigma = 101$ to $\sigma = 102$ represents only a very small relative change.
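The two interval probabilities can be checked numerically. The following is a minimal sketch, assuming a hypothetical value $C = 1000$ (the text only requires $C$ to be large); the function names are illustrative and not part of the lab. Note that $C$ bounds $\sigma$ under the first prior but bounds $\log \sigma$ under the second.

```python
import math


def interval_prob_uniform_sigma(a, b, C):
    """P{a < sigma < b} when sigma ~ uniform(0, C), assuming 0 <= a < b <= C."""
    return (b - a) / C


def interval_prob_uniform_log_sigma(a, b, C):
    """P{a < sigma < b} when log(sigma) ~ uniform(-C, C), assuming -C <= log(a) < log(b) <= C."""
    return (math.log(b) - math.log(a)) / (2.0 * C)


C = 1000.0  # hypothetical choice of the large constant C

for a, b in [(1.0, 2.0), (101.0, 102.0)]:
    p_first = interval_prob_uniform_sigma(a, b, C)
    p_second = interval_prob_uniform_log_sigma(a, b, C)
    print(f"({a:g}, {b:g}): first prior {p_first:.6g}, second prior {p_second:.6g}")

# Ratio of the two probabilities under the second prior; C cancels out.
ratio = interval_prob_uniform_log_sigma(1.0, 2.0, C) / interval_prob_uniform_log_sigma(101.0, 102.0, C)
print(f"ratio under the second prior: {ratio:.2f}")
```

The ratio under the second prior does not depend on $C$, since it equals $\log 2 / \log(102/101)$.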
Prior Odds: Under the first prior, consider the prior odds that $\sigma < 2$ versus $\sigma \geq 2$ :
$\begin{align*} \frac{\P\{\sigma < 2\}}{\P\{\sigma \geq 2\}} = \frac{2/C}{(C-2)/C} = \frac{2}{C - 2} \end{align*}$
(3)
which is very small (as $C$ is large). This is clearly an informative statement about $\sigma$ : it says that $\sigma$ is far more likely to be at least 2 than to be smaller than 2. On the other hand, the same odds under the second prior are:
$\begin{align*} \frac{\P\{\sigma < 2\}}{\P\{\sigma \geq 2\}} = \frac{\P\{\log \sigma < \log 2\}}{\P\{\log \sigma \geq \log 2\}} = \frac{C + \log 2}{C - \log 2} \end{align*}$
(4)
which is approximately equal to 1, reflecting prior ignorance between the events $\sigma < 2$ and $\sigma \geq 2$ .
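To see how the two prior odds behave as $C$ grows, here is a small sketch that evaluates both closed-form expressions above. The values of $C$ are hypothetical choices for illustration, and the function names are not from the lab.

```python
import math


def odds_uniform_sigma(t, C):
    """P{sigma < t} / P{sigma >= t} when sigma ~ uniform(0, C), assuming 0 < t < C."""
    return t / (C - t)


def odds_uniform_log_sigma(t, C):
    """P{sigma < t} / P{sigma >= t} when log(sigma) ~ uniform(-C, C), assuming |log(t)| < C."""
    return (C + math.log(t)) / (C - math.log(t))


threshold = 2.0
for C in (10.0, 100.0, 1000.0):  # hypothetical values of the large constant C
    first = odds_uniform_sigma(threshold, C)
    second = odds_uniform_log_sigma(threshold, C)
    print(f"C = {C:g}: first prior odds {first:.5f}, second prior odds {second:.5f}")
```

As $C$ increases, the odds under the first prior shrink toward 0 while the odds under the second prior approach 1.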
Posterior Propriety for $n = 1$ : The posterior for $\sigma$ corresponding to the first prior is:
$\begin{align} \frac{2 \left(\frac{S}{2} \right)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2} \right)} \sigma^{-n} \exp \left(-\frac{S}{2 \sigma^2} \right) I\{\sigma > 0\} \end{align}$
(5)
while the posterior corresponding to the second prior is:
$\begin{align} \frac{2 \left(\frac{S}{2} \right)^{n/2}}{\Gamma\left(\frac{n}{2} \right)} \sigma^{-(n+1)} \exp \left(-\frac{S}{2 \sigma^2} \right) I\{\sigma > 0\}. \end{align}$
(6)
In both these formulae, $S := \sum_{i=1}^n X_i^2$ .
These two posteriors behave similarly when $n$ is large. However, when $n = 1$ , the first posterior is ill-defined because its normalizing constant involves $\Gamma(0)$ , which is infinite: the unnormalized density $\sigma^{-1} \exp \left(-\frac{S}{2 \sigma^2} \right)$ behaves like $\sigma^{-1}$ as $\sigma \to \infty$ and therefore has infinite total mass. In other words, the posterior is improper when $n = 1$ . On the other hand, the second posterior is proper even when $n = 1$ .
This is another reason for preferring the second prior in this problem. We would intuitively expect to obtain some concrete information on $\sigma$ when $n = 1$ , so we want the posterior to be proper in that case. This holds for the second prior but not for the first.
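The contrast at $n = 1$ can be illustrated numerically. The sketch below integrates the unnormalized posterior $\sigma^{-p} \exp(-S/(2\sigma^2))$ over $0 < \sigma < U$ for increasing cutoffs $U$, with $p = n$ for the first prior and $p = n + 1$ for the second. The value $S = 1$, the cutoffs, and the midpoint rule on $t = \log \sigma$ are assumptions made for this illustration only.

```python
import math


def unnormalized_mass(p, S, log_upper, log_lower=-10.0, steps=20_000):
    """Midpoint-rule approximation of the integral of sigma^(-p) * exp(-S / (2 sigma^2))
    over exp(log_lower) < sigma < exp(log_upper), computed in t = log(sigma)."""
    h = (log_upper - log_lower) / steps
    total = 0.0
    for k in range(steps):
        t = log_lower + (k + 0.5) * h
        # d(sigma) = sigma dt, so the integrand in t is sigma^(1 - p) * exp(-S / (2 sigma^2)).
        total += math.exp((1.0 - p) * t - 0.5 * S * math.exp(-2.0 * t))
    return total * h


def closed_form_mass(p, S):
    """Integral over (0, infinity) of sigma^(-p) * exp(-S / (2 sigma^2)); finite only for p > 1."""
    return 0.5 * (0.5 * S) ** (-(p - 1) / 2.0) * math.gamma((p - 1) / 2.0)


S = 1.0  # hypothetical value of S = X_1^2 for a single observation
n = 1

for log_upper in (10.0, 20.0, 40.0):  # cutoffs U = exp(log_upper)
    mass_first = unnormalized_mass(n, S, log_upper)       # first prior: sigma^(-n)
    mass_second = unnormalized_mass(n + 1, S, log_upper)  # second prior: sigma^(-(n+1))
    print(f"log U = {log_upper:g}: first prior {mass_first:.4f}, second prior {mass_second:.6f}")

print(f"closed form for the second prior: {closed_form_mass(n + 1, S):.6f}")
```

Under the first prior the mass keeps growing roughly linearly in $\log U$, so no normalizing constant exists. Under the second prior it settles at $\frac{1}{2} (S/2)^{-1/2} \, \Gamma(1/2)$, the reciprocal of the constant in the second posterior with $n = 1$.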

Now consider the problem of estimating both $\theta$ and $\sigma$ from $X_1, \dots, X_n \overset{\text{i.i.d}}{\sim} N(\theta, \sigma^2)$ . Here there are two options for the prior. Prior A is:

\begin{align*} \theta, \sigma \text{ are independent with } \theta \sim \text{unif}(-\infty, \infty) \text{ and } \sigma \sim \text{unif}(0, \infty). \end{align*}

(7)

Prior B is

\begin{align*} \theta, \log \sigma \overset{\text{i.i.d}}{\sim} \text{unif}(-\infty, \infty). \end{align*}

(8)

Again the universally preferred prior is the second one. All of the reasons previously mentioned apply here as well. For the third point, the marginal posterior of $\sigma$ corresponding to the first prior is:

\begin{align} \frac{2 \left(\frac{S}{2} \right)^{(n-2)/2}}{\Gamma\left(\frac{n-2}{2} \right)} \sigma^{-(n-1)} \exp \left(-\frac{S}{2 \sigma^2} \right) I\{\sigma > 0\} \end{align}

(9)

and the marginal posterior for $\sigma$ corresponding to the second prior is:

\begin{align} \frac{2 \left(\frac{S}{2} \right)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2} \right)} \sigma^{-n} \exp \left(-\frac{S}{2 \sigma^2} \right) I\{\sigma > 0\}. \end{align}

(10)

In both the above formulae, $S = \sum_{i=1}^n (X_i - \bar{X})^2$ . The first posterior above is ill-defined when $n = 1$ or $n = 2$ , while the second posterior is ill-defined only for $n = 1$ . It is well known that in this problem at least two observations are needed to estimate $\sigma$ , so we would like the posterior to be well-defined for $n = 2$ , which holds only under the second prior.
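As a sanity check on the second marginal posterior, the sketch below evaluates it for $n = 2$ and confirms numerically that it integrates to 1. The two data points are hypothetical, and the helper names and the midpoint rule on $\log \sigma$ are choices made for this illustration, not part of the lab.

```python
import math


def centered_sum_of_squares(xs):
    """S = sum of (x_i - xbar)^2."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


def marginal_posterior_second_prior(sigma, xs):
    """Marginal posterior density of sigma under the second prior (theta, log sigma flat).
    Requires n >= 2 and S > 0, otherwise the density is not defined."""
    n = len(xs)
    S = centered_sum_of_squares(xs)
    if n < 2 or S <= 0.0:
        raise ValueError("need at least two observations that are not all equal")
    if sigma <= 0.0:
        return 0.0
    half = 0.5 * (n - 1)
    # Work on the log scale to avoid overflow for very small or very large sigma.
    log_density = (math.log(2.0) + half * math.log(0.5 * S) - math.lgamma(half)
                   - n * math.log(sigma) - S / (2.0 * sigma ** 2))
    return math.exp(log_density)


def integrate_over_sigma(density, log_lower=-10.0, log_upper=20.0, steps=20_000):
    """Midpoint rule in t = log(sigma); the factor sigma accounts for d(sigma) = sigma dt."""
    h = (log_upper - log_lower) / steps
    total = 0.0
    for k in range(steps):
        sigma = math.exp(log_lower + (k + 0.5) * h)
        total += density(sigma) * sigma
    return total * h


xs = [1.0, 3.0]  # hypothetical data with n = 2
total_mass = integrate_over_sigma(lambda s: marginal_posterior_second_prior(s, xs))
print(f"S = {centered_sum_of_squares(xs):g}, total posterior mass = {total_mass:.6f}")
```

With a single observation the same function raises an error, since $S = 0$ and the normalizing constant involves $\Gamma(0)$ .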

For more, see Zellner (1971, pages 41-47). See also Jaynes (2003, Section 12.4) for another justification for the $\log \sigma \sim \text{uniform}(-\infty, +\infty)$ prior based on invariance to certain parameter transformations.

References

Zellner, A. (1971). An Introduction to Bayesian Inference in Econometrics. Wiley.
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
