
STAT 238 - Bayesian Statistics Lecture Seven

Spring 2026, UC Berkeley

Derivation of the Rules of Probability for Subjective Probability

In the last class, we sketched the derivation of the rules of probability from logical consistency without relying on any imagined frequency considerations for defining probability. The argument (taken from Jaynes) proceeded in the following way.

We denoted the “plausibility” of a proposition $A$ given some information in the form of a proposition $B$ by $(A \mid B)$. We assumed that these plausibilities take values in the set of real numbers (with no restriction to the interval $[0, 1]$) and that a higher value of plausibility represents greater belief.

To deduce the usual product rule of probability, we assumed the existence of a continuous, coordinate-wise increasing function $F$ of two real variables such that

$$(AB \mid C) = F\left((B \mid C), (A \mid BC)\right)$$

for all $A, B, C$. This function can then be employed in two different sequences to calculate $(ABC \mid D)$ for four propositions $A, B, C, D$:

$$\begin{align*} (ABC \mid D) &= F(F((C \mid D), (B \mid CD)), (A \mid BCD)) \\ (ABC \mid D) &= F((C \mid D), F((B \mid CD), (A \mid BCD))). \end{align*}$$

It is therefore natural to require that the function $F$ be such that the right-hand sides of the above two equations produce the same answer. From this, one gets the condition:

$$F(F(x, y), z) = F(x, F(y, z)) \quad \text{for all real numbers } x, y, z.$$

As proved in Jaynes, the only functions $F$ which satisfy the above equation are of the form

$$F(x, y) = w^{-1}(w(x)\, w(y))$$

for a positive, continuous, increasing function $w$. Note that the function $w$ is not unique. For example, if $w$ satisfies the above representation, then $\tilde{w}(x) := (w(x))^{\alpha}$ (which means $\tilde{w}^{-1}(u) = w^{-1}(u^{1/\alpha})$) also satisfies it for every $\alpha > 0$. It can be proved that $w$ is unique up to this power transformation.
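As a quick numerical sanity check (an illustration only, not part of the derivation), any invertible $w$ of this kind yields an associative $F$, since $F(F(x, y), z) = w^{-1}(w(x)\, w(y)\, w(z)) = F(x, F(y, z))$. The sketch below uses the logistic function as an arbitrary example of a positive, continuous, increasing $w$:

```python
import math

# Any positive, continuous, strictly increasing w with an inverse gives an
# associative F, since F(F(x, y), z) = w^{-1}(w(x) w(y) w(z)) = F(x, F(y, z)).
# The logistic function below is an arbitrary example of such a w.

def w(x):
    return 1.0 / (1.0 + math.exp(-x))

def w_inv(u):
    # inverse of the logistic function (logit); defined for u in (0, 1),
    # which contains the product w(x) * w(y)
    return math.log(u / (1.0 - u))

def F(x, y):
    # F(x, y) = w^{-1}(w(x) w(y))
    return w_inv(w(x) * w(y))

x, y, z = 0.3, -1.2, 0.7
assert abs(F(F(x, y), z) - F(x, F(y, z))) < 1e-9  # associativity holds
```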

From here, we derived the following:

  1. $w(AB \mid C) = w(A \mid C)\, w(B \mid AC) = w(B \mid C)\, w(A \mid BC)$.

  2. $w(A \mid C)$ always lies between $0$ and $1$, with $w(\text{impossible}) = 0$ and $w(\text{certain}) = 1$.

In other words, applying the function $w$ to the plausibilities leads to quantities which satisfy the first two rules of probability.

Next, the goal is to derive the sum rule. Here we first assume that there exists a function $S: [0, 1] \to [0, 1]$ such that

$$w(A^c \mid C) = S(w(A \mid C))$$

for all $A$ and $C$. Note that we are working with $w(A \mid C)$ instead of the raw plausibilities $(A \mid C)$. This allows us to use the product rule, which has already been derived. We can assume that $S$ is continuous and strictly decreasing with $S(0) = 1$ and $S(1) = 0$.

To set up the characterizing equation for $S(\cdot)$, consider the setting of Figure 1. We are looking at $A$ and $B$ such that $B^c$ is contained in $A$ (or equivalently, $A^c$ is contained in $B$).

Figure 1: Setting for deriving the Sum Rule

In this setting, there are two different ways of calculating the plausibility of the proposition $R = AB$ in terms of $x := w(A)$, $y := w(B)$, and the function $S$. Both calculations use the product rule, together with the fact that $B^c \subseteq A$ implies $AB^c = B^c$, so that $w(B^c \mid A) = w(AB^c)/w(A) = w(B^c)/w(A)$. The first method for calculating $w(R) = w(AB)$ is:

$$\begin{align*} w(AB) &= w(A)\, w(B \mid A) \\ &= w(A)\, S\left(w(B^c \mid A)\right) \\ &= w(A)\, S\left(\frac{w(B^c)}{w(A)}\right) \\ &= w(A)\, S\left(\frac{S(w(B))}{w(A)}\right) = x\, S\left(\frac{S(y)}{x}\right). \end{align*}$$

The second method for calculating $w(R) = w(AB)$ simply switches the roles of $A$ and $B$ in the first method (now using $A^c \subseteq B$, so that $w(A^c \mid B) = w(A^c)/w(B)$):

$$\begin{align*} w(AB) &= w(B)\, w(A \mid B) \\ &= w(B)\, S\left(w(A^c \mid B)\right) \\ &= w(B)\, S\left(\frac{w(A^c)}{w(B)}\right) \\ &= w(B)\, S\left(\frac{S(w(A))}{w(B)}\right) = y\, S\left(\frac{S(x)}{y}\right). \end{align*}$$

It is therefore natural to assume that $S$ satisfies:

$$x\, S\left(\frac{S(y)}{x}\right) = y\, S\left(\frac{S(x)}{y}\right).$$

Recall that here $x = w(A)$ and $y = w(B)$. The setting is such that $x$ and $y$ cannot be completely arbitrary. Indeed, because $B^c \subseteq A$, we must have

$$w(B^c) \leq w(A) \quad \text{or equivalently} \quad S(y) \leq x.$$

Our condition on $S$ is therefore

$$x\, S\left(\frac{S(y)}{x}\right) = y\, S\left(\frac{S(x)}{y}\right) \quad \text{for all } 0 \leq x, y \leq 1 \text{ with } 0 \leq S(y) \leq x.$$

It is now proved in Jaynes that the above condition implies that

$$S(x) = (1 - x^{\alpha})^{1/\alpha} \quad \text{for } x \in [0, 1]$$

for some $\alpha > 0$. This $\alpha$ is uniquely determined by $S$.
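As a sanity check (a numerical illustration with an arbitrarily chosen $\alpha$, not part of Jaynes' proof), one can verify that this $S$ satisfies the functional equation at an admissible point:

```python
# Check that S(x) = (1 - x**alpha) ** (1/alpha) satisfies the functional
# equation x S(S(y)/x) = y S(S(x)/y) whenever S(y) <= x.
alpha = 2.0  # arbitrary alpha > 0, chosen only for illustration

def S(x):
    return (1.0 - x**alpha) ** (1.0 / alpha)

x, y = 0.9, 0.8
assert S(y) <= x            # admissibility: S(y) = w(B^c) <= w(A) = x
lhs = x * S(S(y) / x)
rhs = y * S(S(x) / y)
assert abs(lhs - rhs) < 1e-12  # the two sides agree
```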

We have thus proved that

$$w(A^c \mid C) = \left(1 - w^{\alpha}(A \mid C)\right)^{1/\alpha}$$

which is equivalent to

$$w^{\alpha}(A^c \mid C) = 1 - w^{\alpha}(A \mid C) \quad \text{or} \quad w^{\alpha}(A^c \mid C) + w^{\alpha}(A \mid C) = 1.$$

It can now be noted that the first two rules satisfied by $w(A \mid B)$ are also satisfied by $w^{\alpha}(A \mid B)$. Thus $w^{\alpha}(A \mid B)$ satisfies all three rules:

  1. $0 \leq w^{\alpha}(A \mid B) \leq 1$, $w^{\alpha}(\text{impossible}) = 0$, and $w^{\alpha}(\text{certain}) = 1$,

  2. $w^{\alpha}(AB \mid C) = w^{\alpha}(A \mid C)\, w^{\alpha}(B \mid AC) = w^{\alpha}(B \mid C)\, w^{\alpha}(A \mid BC)$, and

  3. $w^{\alpha}(A^c \mid C) + w^{\alpha}(A \mid C) = 1$.

Denoting $w^{\alpha}$ by $\mathbb{P}$, we have

  1. $0 \leq \mathbb{P}(A \mid B) \leq 1$, $\mathbb{P}(\text{impossible}) = 0$, and $\mathbb{P}(\text{certain}) = 1$,

  2. Product Rule: $\mathbb{P}(AB \mid C) = \mathbb{P}(A \mid C)\, \mathbb{P}(B \mid AC) = \mathbb{P}(B \mid C)\, \mathbb{P}(A \mid BC)$, and

  3. Sum Rule: $\mathbb{P}(A^c \mid C) + \mathbb{P}(A \mid C) = 1$.
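These are exactly the rules obeyed by ordinary probabilities. As an illustration (a toy finite model with hypothetical events, chosen here for concreteness; not from the lecture), the product and sum rules can be checked with exact rational arithmetic on two fair dice:

```python
from fractions import Fraction
from itertools import product

# Toy model: two fair dice. P is ordinary counting probability; the events
# A and B below are hypothetical, chosen only for illustration.
omega = list(product(range(1, 7), repeat=2))

def prob(event, given=None):
    # conditional probability P(event | given) by counting outcomes
    cond = omega if given is None else [s for s in omega if given(s)]
    return Fraction(sum(1 for s in cond if event(s)), len(cond))

A = lambda s: s[0] >= 4            # first die shows at least 4
B = lambda s: s[0] + s[1] == 7     # the dice sum to 7

# Product Rule: P(AB) = P(A) P(B | A)
assert prob(lambda s: A(s) and B(s)) == prob(A) * prob(B, given=A)
# Sum Rule: P(A^c) + P(A) = 1
assert prob(lambda s: not A(s)) + prob(A) == 1
```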

We have thus derived the rules of probability without invoking any relationship between probability and long-run frequency.

To summarize: if we argue in terms of plausibilities but take care to ensure some natural constraints for logical consistency, then we cannot manipulate the plausibilities arbitrarily; we have to reason according to the usual rules of probability after an appropriate transformation (given by the function $w^{\alpha}$).

The sum rule of probability is usually stated as

$$\mathbb{P}(A \cup B \mid C) = \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(AB \mid C)$$

where $A \cup B$ denotes the proposition “at least one of $A$ and $B$ is true” (Jaynes uses the notation $A + B$ for $A \cup B$). This identity can be derived from the stated rules as follows:

$$\begin{align*} \mathbb{P}(A \cup B \mid C) &= 1 - \mathbb{P}((A \cup B)^c \mid C) \\ &= 1 - \mathbb{P}(A^c \cap B^c \mid C) \\ &= 1 - \mathbb{P}(B^c \mid A^c C)\, \mathbb{P}(A^c \mid C) \\ &= 1 - \left(1 - \mathbb{P}(B \mid A^c C)\right) \mathbb{P}(A^c \mid C) \\ &= 1 - \mathbb{P}(A^c \mid C) + \mathbb{P}(B \mid A^c C)\, \mathbb{P}(A^c \mid C) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(A^c B \mid C) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C)\, \mathbb{P}(A^c \mid BC) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C)\left(1 - \mathbb{P}(A \mid BC)\right) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(B \mid C)\, \mathbb{P}(A \mid BC) = \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(AB \mid C). \end{align*}$$
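The final identity can likewise be confirmed on a toy finite model (the conditioning event and the events $A$, $B$ below are hypothetical, chosen only for illustration):

```python
from fractions import Fraction
from itertools import product

# Toy model: two fair dice; all propositions are events (subsets of omega).
omega = list(product(range(1, 7), repeat=2))

def prob(event, given):
    # conditional probability P(event | given) by counting outcomes
    cond = [s for s in omega if given(s)]
    return Fraction(sum(1 for s in cond if event(s)), len(cond))

C = lambda s: s[1] % 2 == 0        # second die is even
A = lambda s: s[0] <= 2            # first die shows 1 or 2
B = lambda s: s[0] + s[1] >= 8     # the dice sum to at least 8

# General sum rule: P(A or B | C) = P(A|C) + P(B|C) - P(AB|C)
lhs = prob(lambda s: A(s) or B(s), C)
rhs = prob(A, C) + prob(B, C) - prob(lambda s: A(s) and B(s), C)
assert lhs == rhs
```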