Consider the problem of estimating a scale parameter $\sigma$ from observations $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$. We want to use an uninformative prior for $\sigma$. There are two options. Option A is $\sigma \sim \text{uniform}(0, +\infty)$ (or $\text{uniform}(0, C)$ for a large $C$). Option B is $\log \sigma \sim \text{uniform}(-\infty, +\infty)$ (or $\text{uniform}(-C, C)$ for a large $C$), which is equivalent to a prior density for $\sigma$ proportional to $1/\sigma$. The second prior (Option B) is universally preferred to the first prior (Option A). Here are some reasons why.
Relative Scale vs Absolute Scale: Consider the two probabilities $P\{1 < \sigma < 2\}$ and $P\{101 < \sigma < 102\}$. Under the first prior ($\text{uniform}(0, C)$), these two probabilities are the same: each equals $1/C$.
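Under the second prior ($\log \sigma$ is uniform on $(-C, C)$), we have:

$$P\{1 < \sigma < 2\} = \frac{\log 2 - \log 1}{2C} = \frac{\log 2}{2C} \approx \frac{0.693}{2C}, \qquad P\{101 < \sigma < 102\} = \frac{\log 102 - \log 101}{2C} \approx \frac{0.0099}{2C}.$$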
The second probability is therefore much smaller than the first. This reflects the fact that Prior B regards the values 101 and 102 as much closer to one another than the values 1 and 2. This behavior aligns well with intuition on a relative scale: moving from σ=1 to σ=2 corresponds to a doubling, whereas moving from σ=101 to σ=102 represents only a very small relative change.
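As a quick numerical check of this comparison, the following minimal sketch evaluates both interval probabilities under each prior. The cutoff `C = 1000` and the function names are assumptions chosen for illustration; they are not part of the original discussion.

```python
import math

def interval_prob_prior_a(a, b, C):
    """P{a < sigma < b} when sigma ~ uniform(0, C); assumes 0 <= a < b <= C."""
    return (b - a) / C

def interval_prob_prior_b(a, b, C):
    """P{a < sigma < b} when log(sigma) ~ uniform(-C, C); assumes exp(-C) <= a < b <= exp(C)."""
    return (math.log(b) - math.log(a)) / (2 * C)

C = 1000.0  # hypothetical "large" cutoff, chosen only for illustration

p_a_low = interval_prob_prior_a(1, 2, C)
p_a_high = interval_prob_prior_a(101, 102, C)
p_b_low = interval_prob_prior_b(1, 2, C)
p_b_high = interval_prob_prior_b(101, 102, C)

# Ratio of the two probabilities under Prior B; it does not depend on C.
ratio_b = p_b_low / p_b_high

print("Prior A:", p_a_low, p_a_high)
print("Prior B:", p_b_low, p_b_high, "ratio:", ratio_b)
```

Under Prior A the two probabilities coincide, while under Prior B the interval (1, 2) is roughly 70 times more probable than (101, 102), whatever the value of C.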
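Prior Odds: Under the first prior, consider the prior odds that $\sigma < 2$ versus $\sigma \ge 2$:

$$\frac{P\{\sigma < 2\}}{P\{\sigma \ge 2\}} = \frac{2/C}{(C - 2)/C} = \frac{2}{C - 2},$$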
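which is very small (as $C$ is large). This is clearly an informative statement about $\sigma$: it says that $\sigma$ is much more likely to be larger than 2 than smaller than 2. On the other hand, the same odds under the second prior are:

$$\frac{P\{\sigma < 2\}}{P\{\sigma \ge 2\}} = \frac{(\log 2 + C)/(2C)}{(C - \log 2)/(2C)} = \frac{C + \log 2}{C - \log 2},$$

which is close to 1 when $C$ is large, so the second prior does not favor either side of 2.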
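To get a concrete feel for the two sets of odds, here is a minimal sketch that evaluates them for a hypothetical cutoff `C = 1000`. The cutoff, the threshold argument `c`, and the function names are illustrative assumptions.

```python
import math

def odds_prior_a(c, C):
    """Prior odds P{sigma < c} / P{sigma >= c} when sigma ~ uniform(0, C), with 0 < c < C."""
    return c / (C - c)

def odds_prior_b(c, C):
    """Same odds when log(sigma) ~ uniform(-C, C), with -C < log(c) < C."""
    return (math.log(c) + C) / (C - math.log(c))

C = 1000.0  # hypothetical "large" cutoff, chosen only for illustration

odds_a = odds_prior_a(2, C)
odds_b = odds_prior_b(2, C)

print("Prior A odds of sigma < 2:", odds_a)
print("Prior B odds of sigma < 2:", odds_b)
```

With this cutoff the odds under Prior A are about 0.002, while under Prior B they are within about 0.2% of 1.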
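Posterior Propriety: Write $T = \sum_{i=1}^n X_i^2$, so that the likelihood is proportional to $\sigma^{-n} \exp\left(-T/(2\sigma^2)\right)$. Using the improper versions of the two priors (flat in $\sigma$ for the first, density proportional to $1/\sigma$ for the second), the posterior densities of $\sigma$ are, respectively,

$$\frac{2\,(T/2)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)}\, \sigma^{-n} \exp\left(-\frac{T}{2\sigma^2}\right) \qquad \text{and} \qquad \frac{2\,(T/2)^{n/2}}{\Gamma\left(\frac{n}{2}\right)}\, \sigma^{-n-1} \exp\left(-\frac{T}{2\sigma^2}\right), \qquad \sigma > 0.$$

These two posteriors behave similarly when $n$ is large. However, when $n = 1$, the first posterior is ill-defined because its Gamma function term is $\Gamma(0)$, which is $\infty$. In other words, the first posterior is improper when $n = 1$. On the other hand, the second posterior is proper even when $n = 1$.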
This is another reason for preferring the second prior in this problem. We would intuitively expect to obtain some concrete information on σ when n=1 so we want the posterior to be proper in that case. This is only true for the second prior but not for the first prior.
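The difference between the two posteriors at n = 1 can be seen numerically. The sketch below integrates the unnormalized posterior, that is, the likelihood $\sigma^{-n} \exp(-T/(2\sigma^2))$ with $T = \sum_i X_i^2$ multiplied by either a flat prior (A) or a $1/\sigma$ prior (B), over $0 < \sigma < U$ for increasing truncation points $U$. The single observation $X_1 = 1$, the truncation points, and the helper name `truncated_mass` are assumptions made for illustration.

```python
import math

def truncated_mass(n, T, prior, upper, steps=20000):
    """Integral over 0 < sigma < upper of prior(sigma) * sigma**(-n) * exp(-T / (2 sigma**2)).

    prior "A" is flat in sigma; prior "B" has density proportional to 1/sigma.
    Uses the substitution sigma = exp(t) and a midpoint rule in t.
    """
    power = n if prior == "A" else n + 1
    # For T near 1 the integrand is negligible below sigma = exp(-10).
    t_lo, t_hi = -10.0, math.log(upper)
    h = (t_hi - t_lo) / steps
    total = 0.0
    for i in range(steps):
        t = t_lo + (i + 0.5) * h
        # sigma**(-power) d(sigma) becomes exp((1 - power) * t) dt
        total += math.exp((1 - power) * t - 0.5 * T * math.exp(-2.0 * t)) * h
    return total

T = 1.0  # hypothetical single observation X1 = 1, so n = 1 and T = X1**2
uppers = [1e2, 1e4, 1e6]
mass_a = [truncated_mass(1, T, "A", u) for u in uppers]
mass_b = [truncated_mass(1, T, "B", u) for u in uppers]

for u, a, b in zip(uppers, mass_a, mass_b):
    print(f"upper={u:.0e}  prior A: {a:.4f}  prior B: {b:.4f}")
```

Under Prior A the total mass keeps growing by roughly log(100) ≈ 4.6 each time the truncation point is multiplied by 100, so it cannot be normalized. Under Prior B the mass settles near $\sqrt{\pi/2} \approx 1.2533$, the value given by the closed-form normalizing constant for n = 1 and T = 1.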
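Now consider the problem of estimating both $\theta$ and $\sigma$ from $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\theta, \sigma^2)$. Here again there are two options for the prior. Prior A is:
$\theta, \sigma$ are independent with $\theta \sim \text{unif}(-\infty, \infty)$ and $\sigma \sim \text{unif}(0, \infty)$.
Prior B is:
$\theta, \log \sigma$ are independent with $\theta \sim \text{unif}(-\infty, \infty)$ and $\log \sigma \sim \text{unif}(-\infty, \infty)$.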
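Again the universally preferred prior is the second one. All of the reasons previously mentioned apply here as well. For the third point, integrating $\theta$ out of the joint posterior shows that the marginal posterior of $\sigma$ corresponding to the first prior is:

$$\frac{2\,(S/2)^{(n-2)/2}}{\Gamma\left(\frac{n-2}{2}\right)}\, \sigma^{-(n-1)} \exp\left(-\frac{S}{2\sigma^2}\right), \qquad \sigma > 0,$$

and the marginal posterior of $\sigma$ corresponding to the second prior is:

$$\frac{2\,(S/2)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)}\, \sigma^{-n} \exp\left(-\frac{S}{2\sigma^2}\right), \qquad \sigma > 0.$$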
In both the above formulae, $S = \sum_{i=1}^n (X_i - \bar{X})^2$. The first posterior above is ill-defined when $n = 1$ or $n = 2$, while the second posterior is ill-defined only for $n = 1$. It is well known that in this problem we need at least two observations to estimate $\sigma$, so we would like the posterior to be well-defined for $n = 2$, which is only true if we use the second prior.
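The sample-size requirements can also be checked in code. The sketch below assumes that, after integrating out θ, the marginal posterior of σ is proportional to $\sigma^{-(n-1)} \exp(-S/(2\sigma^2))$ under Prior A and to $\sigma^{-n} \exp(-S/(2\sigma^2))$ under Prior B, and it normalizes each with the corresponding Gamma function constant. The function name and the two-point data set are hypothetical.

```python
import math

def marginal_posterior_sigma(sigma, data, prior):
    """Marginal posterior density of sigma for N(theta, sigma^2) data.

    prior "A": theta and sigma flat; prior "B": theta and log(sigma) flat.
    Raises ValueError when the posterior cannot be normalized.
    """
    n = len(data)
    xbar = sum(data) / n
    S = sum((x - xbar) ** 2 for x in data)
    # Density has the form const * sigma**-(k + 1) * exp(-S / (2 sigma**2)).
    k = n - 2 if prior == "A" else n - 1
    if k <= 0 or S <= 0:
        raise ValueError("posterior is improper for this prior and sample size")
    # log of the constant 2 * (S/2)**(k/2) / Gamma(k/2)
    log_const = math.log(2.0) + 0.5 * k * math.log(S / 2.0) - math.lgamma(0.5 * k)
    return math.exp(log_const - (k + 1) * math.log(sigma) - S / (2.0 * sigma ** 2))

data = [1.0, 3.0]  # hypothetical sample with n = 2, so S = 2

density_b = marginal_posterior_sigma(1.0, data, "B")
print("Prior B density at sigma = 1:", density_b)

try:
    marginal_posterior_sigma(1.0, data, "A")
    prior_a_failed = False
except ValueError:
    prior_a_failed = True
print("Prior A improper with n = 2:", prior_a_failed)
```

With two observations the call for Prior B returns a finite density, while the call for Prior A raises an error, mirroring the fact that the first posterior needs at least three observations.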
For more, see Zellner (1971, pages 41-47). See also Jaynes (2003, Section 12.4) for another justification for the logσ∼uniform(−∞,+∞) prior based on invariance to certain parameter transformations.