
STAT 238 - Bayesian Statistics Lecture Seven

Spring 2026, UC Berkeley

Derivation of the Rules of Probability for Subjective Probability

In the last class, we sketched the derivation of the rules of probability from logical consistency without relying on any imagined frequency considerations for defining probability. The argument (taken from Jaynes) proceeded in the following way.

We denoted the “plausibility” of a proposition $A$ given some information in the form of a proposition $B$ by $(A \mid B)$. We assumed that these plausibilities take values in the set of real numbers (with no restriction to the interval $[0, 1]$) and that a higher value of plausibility represents greater belief.

To deduce the usual product rule of probability, we assumed the existence of a continuous, coordinate-wise increasing function $F$ of two real variables such that

$$(AB \mid C) = F\left((B \mid C), (A \mid BC)\right)$$

for all $A, B, C$. This function can then be employed in two different sequences to calculate $(ABC \mid D)$ for four propositions $A, B, C, D$:

$$\begin{align*} (ABC \mid D) &= F(F((C \mid D), (B \mid CD)), (A \mid BCD)) \\ (ABC \mid D) &= F((C \mid D), F((B \mid CD), (A \mid BCD))). \end{align*}$$

It is therefore natural to require that the function $F$ be such that the right-hand sides of the above two equations produce the same answer. From this, one gets the condition:

$$F(F(x, y), z) = F(x, F(y, z)) \quad \text{for all real numbers } x, y, z.$$

As proved in Jaynes, the only functions $F$ which satisfy the above equation are of the form

$$F(x, y) = w^{-1}(w(x)\, w(y))$$

for a positive, continuous, increasing function $w$. Note that the function $w$ is not unique. For example, if $w$ satisfies the above representation, then $\tilde{w}(x) := (w(x))^{\alpha}$ (which means $\tilde{w}^{-1}(u) = w^{-1}(u^{1/\alpha})$) also satisfies it for every $\alpha > 0$. It can be proved that $w$ is unique up to this power transformation.
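As a quick numerical sanity check (an illustration only, not part of the derivation), any invertible $w$ of this kind yields an associative $F$, since $F(F(x, y), z) = w^{-1}(w(x)\, w(y)\, w(z)) = F(x, F(y, z))$. The sketch below uses the logistic function as an arbitrary example of a positive, continuous, increasing $w$:

```python
import math

# Any positive, continuous, strictly increasing w with an inverse gives an
# associative F, since F(F(x, y), z) = w^{-1}(w(x) w(y) w(z)) = F(x, F(y, z)).
# The logistic function below is an arbitrary example of such a w.

def w(x):
    return 1.0 / (1.0 + math.exp(-x))

def w_inv(u):
    # inverse of the logistic function (logit); defined for u in (0, 1),
    # which contains the product w(x) * w(y)
    return math.log(u / (1.0 - u))

def F(x, y):
    # F(x, y) = w^{-1}(w(x) w(y))
    return w_inv(w(x) * w(y))

x, y, z = 0.3, -1.2, 0.7
assert abs(F(F(x, y), z) - F(x, F(y, z))) < 1e-9  # associativity holds
```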

From here, we derived the following:

  1. $w(AB \mid C) = w(A \mid C)\, w(B \mid AC) = w(B \mid C)\, w(A \mid BC)$.

  2. $w(A \mid C)$ always lies between $0$ and $1$, with $w(\text{impossible}) = 0$ and $w(\text{certain}) = 1$.

In other words, applying the function $w$ to the plausibilities leads to quantities which satisfy the first two rules of probability.

Next, the goal is to derive the sum rule. Here we first assume that there exists a function $S: [0, 1] \to [0, 1]$ such that

$$w(A^c \mid C) = S(w(A \mid C))$$

for all $A$ and $C$. Note that we are working with $w(A \mid C)$ instead of the raw plausibilities $(A \mid C)$. This allows us to use the product rule, which has already been derived. We can assume that $S$ is continuous and strictly decreasing with $S(0) = 1$ and $S(1) = 0$.

To set up the characterizing equation for $S(\cdot)$, consider the setting of Figure 1. We are looking at $A$ and $B$ such that $B^c$ is contained in $A$ (or equivalently, $A^c$ is contained in $B$).

Figure 1: Setting for deriving the Sum Rule

In this setting, there are two different ways of calculating the plausibility of the proposition $R = AB$ in terms of $x := w(A)$, $y := w(B)$, and the function $S$. Both calculations use the product rule, together with the fact that $B^c \subseteq A$ implies $AB^c = B^c$, so that $w(B^c \mid A) = w(AB^c)/w(A) = w(B^c)/w(A)$. The first method for calculating $w(R) = w(AB)$ is:

$$\begin{align*} w(AB) &= w(A)\, w(B \mid A) \\ &= w(A)\, S\left(w(B^c \mid A)\right) \\ &= w(A)\, S\left(\frac{w(B^c)}{w(A)}\right) \\ &= w(A)\, S\left(\frac{S(w(B))}{w(A)}\right) = x\, S\left(\frac{S(y)}{x}\right). \end{align*}$$

The second method for calculating $w(R) = w(AB)$ simply switches the roles of $A$ and $B$ in the first method (now using $A^c \subseteq B$, so that $w(A^c \mid B) = w(A^c)/w(B)$):

$$\begin{align*} w(AB) &= w(B)\, w(A \mid B) \\ &= w(B)\, S\left(w(A^c \mid B)\right) \\ &= w(B)\, S\left(\frac{w(A^c)}{w(B)}\right) \\ &= w(B)\, S\left(\frac{S(w(A))}{w(B)}\right) = y\, S\left(\frac{S(x)}{y}\right). \end{align*}$$

It is therefore natural to assume that $S$ satisfies:

$$x\, S\left(\frac{S(y)}{x}\right) = y\, S\left(\frac{S(x)}{y}\right).$$

Recall that here $x = w(A)$ and $y = w(B)$. The setting is such that $x$ and $y$ cannot be completely arbitrary. Indeed, because $B^c \subseteq A$, we must have

$$w(B^c) \leq w(A) \quad \text{or equivalently} \quad S(y) \leq x.$$

Our condition on $S$ is therefore

$$x\, S\left(\frac{S(y)}{x}\right) = y\, S\left(\frac{S(x)}{y}\right) \quad \text{for all } 0 \leq x, y \leq 1 \text{ with } 0 \leq S(y) \leq x.$$

It is now proved in Jaynes that the above condition implies that

$$S(x) = (1 - x^{\alpha})^{1/\alpha} \quad \text{for } x \in [0, 1]$$

for some $\alpha > 0$. This $\alpha$ is uniquely determined by $S$.
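As a sanity check (a numerical illustration with an arbitrarily chosen $\alpha$, not part of Jaynes' proof), one can verify that this $S$ satisfies the functional equation at an admissible point:

```python
# Check that S(x) = (1 - x**alpha) ** (1/alpha) satisfies the functional
# equation x S(S(y)/x) = y S(S(x)/y) whenever S(y) <= x.
alpha = 2.0  # arbitrary alpha > 0, chosen only for illustration

def S(x):
    return (1.0 - x**alpha) ** (1.0 / alpha)

x, y = 0.9, 0.8
assert S(y) <= x            # admissibility: S(y) = w(B^c) <= w(A) = x
lhs = x * S(S(y) / x)
rhs = y * S(S(x) / y)
assert abs(lhs - rhs) < 1e-12  # the two sides agree
```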

We have thus proved that

$$w(A^c \mid C) = \left(1 - w^{\alpha}(A \mid C)\right)^{1/\alpha}$$

which is equivalent to

$$w^{\alpha}(A^c \mid C) = 1 - w^{\alpha}(A \mid C) \quad \text{or} \quad w^{\alpha}(A^c \mid C) + w^{\alpha}(A \mid C) = 1.$$

It can now be noted that the first two rules satisfied by $w(A \mid B)$ are also satisfied by $w^{\alpha}(A \mid B)$. Thus $w^{\alpha}(A \mid B)$ satisfies all three rules:

  1. $0 \leq w^{\alpha}(A \mid B) \leq 1$, $w^{\alpha}(\text{impossible}) = 0$, and $w^{\alpha}(\text{certain}) = 1$,

  2. $w^{\alpha}(AB \mid C) = w^{\alpha}(A \mid C)\, w^{\alpha}(B \mid AC) = w^{\alpha}(B \mid C)\, w^{\alpha}(A \mid BC)$, and

  3. $w^{\alpha}(A^c \mid C) + w^{\alpha}(A \mid C) = 1$.

Denoting $w^{\alpha}$ by $\mathbb{P}$, we have

  1. $0 \leq \mathbb{P}(A \mid B) \leq 1$, $\mathbb{P}(\text{impossible}) = 0$, and $\mathbb{P}(\text{certain}) = 1$,

  2. Product Rule: $\mathbb{P}(AB \mid C) = \mathbb{P}(A \mid C)\, \mathbb{P}(B \mid AC) = \mathbb{P}(B \mid C)\, \mathbb{P}(A \mid BC)$, and

  3. Sum Rule: $\mathbb{P}(A^c \mid C) + \mathbb{P}(A \mid C) = 1$.
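These are exactly the rules obeyed by ordinary probabilities. As an illustration (a toy finite model with hypothetical events, chosen here for concreteness; not from the lecture), the product and sum rules can be checked with exact rational arithmetic on two fair dice:

```python
from fractions import Fraction
from itertools import product

# Toy model: two fair dice. P is ordinary counting probability; the events
# A and B below are hypothetical, chosen only for illustration.
omega = list(product(range(1, 7), repeat=2))

def prob(event, given=None):
    # conditional probability P(event | given) by counting outcomes
    cond = omega if given is None else [s for s in omega if given(s)]
    return Fraction(sum(1 for s in cond if event(s)), len(cond))

A = lambda s: s[0] >= 4            # first die shows at least 4
B = lambda s: s[0] + s[1] == 7     # the dice sum to 7

# Product Rule: P(AB) = P(A) P(B | A)
assert prob(lambda s: A(s) and B(s)) == prob(A) * prob(B, given=A)
# Sum Rule: P(A^c) + P(A) = 1
assert prob(lambda s: not A(s)) + prob(A) == 1
```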

We have thus derived the rules of probability without invoking any relationship between probability and long-run frequency.

To summarize: if we argue in terms of plausibilities but take care to ensure some natural constraints for logical consistency, then we cannot manipulate the plausibilities arbitrarily; we have to reason according to the usual rules of probability after an appropriate transformation (given by the function $w^{\alpha}$).

The sum rule of probability is usually stated as

$$\mathbb{P}(A \cup B \mid C) = \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(AB \mid C)$$

where $A \cup B$ denotes the proposition “at least one of $A$ and $B$ is true” (Jaynes uses the notation $A + B$ for $A \cup B$). This identity can be derived from the stated rules as follows:

$$\begin{align*} \mathbb{P}(A \cup B \mid C) &= 1 - \mathbb{P}((A \cup B)^c \mid C) \\ &= 1 - \mathbb{P}(A^c \cap B^c \mid C) \\ &= 1 - \mathbb{P}(B^c \mid A^c C)\, \mathbb{P}(A^c \mid C) \\ &= 1 - \left(1 - \mathbb{P}(B \mid A^c C)\right) \mathbb{P}(A^c \mid C) \\ &= 1 - \mathbb{P}(A^c \mid C) + \mathbb{P}(B \mid A^c C)\, \mathbb{P}(A^c \mid C) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(A^c B \mid C) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C)\, \mathbb{P}(A^c \mid BC) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C)\left(1 - \mathbb{P}(A \mid BC)\right) \\ &= \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(B \mid C)\, \mathbb{P}(A \mid BC) = \mathbb{P}(A \mid C) + \mathbb{P}(B \mid C) - \mathbb{P}(AB \mid C). \end{align*}$$
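The final identity can likewise be confirmed on a toy finite model (the conditioning event and the events $A$, $B$ below are hypothetical, chosen only for illustration):

```python
from fractions import Fraction
from itertools import product

# Toy model: two fair dice; all propositions are events (subsets of omega).
omega = list(product(range(1, 7), repeat=2))

def prob(event, given):
    # conditional probability P(event | given) by counting outcomes
    cond = [s for s in omega if given(s)]
    return Fraction(sum(1 for s in cond if event(s)), len(cond))

C = lambda s: s[1] % 2 == 0        # second die is even
A = lambda s: s[0] <= 2            # first die shows 1 or 2
B = lambda s: s[0] + s[1] >= 8     # the dice sum to at least 8

# General sum rule: P(A or B | C) = P(A|C) + P(B|C) - P(AB|C)
lhs = prob(lambda s: A(s) or B(s), C)
rhs = prob(A, C) + prob(B, C) - prob(lambda s: A(s) and B(s), C)
assert lhs == rhs
```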