Video
Links:
YouTube
Twitter
Rumble Part 1
Rumble Part 2 (Rumble only allows files of < 2 GB)
HOMEWORK
1. Finish that last step in defining the value of w(A|B) when A is false given B.
2. (From Bretthorst) Can you expand Pr(C|A+B) like we did with Pr(A+B|C) to form the sum rule?
3. (From Bretthorst) Generalize the sum rule for Pr(A_1 + A_2 + … + A_p|C).
Lecture
We’re doing Chapter 2, pages 24-35, of Jaynes in this lesson, plus a bit of Uncertainty starting in Chapter 4. Jaynes’s book, Chapters 1 – 3, can be downloaded here.
Class length was doubled. For good, even excellent, reasons.
We do something amazing today, which you won’t find done in nearly any other probability course, unless that course is based on either of these two books. We prove:
Probability can be a number;
That this number has a certain numerical form, which is bounded in [0,1];
That all probability is conditional;
That all probability is an extension, nay, completion of logic;
That probability, once the evidence and propositions are subjectively chosen, is like logic and entirely objective;
That the definition of probability is a measure of the certainty of a given proposition with respect to assumed evidence;
That therefore all other interpretations are lacking and in some way wrong.
We do all this starting with Cox’s almost trivial axioms, or, rather, desires. Wokepedia does a somewhat reasonable job covering this development briefly, though in a dull way that misses all subtleties. You must read Jaynes for the full proof. Jaynes, and later authors missed by Wokepedia, take care to handle complaints about non-differentiability and so forth. Those conditions only matter when subtle infinities are involved anyway, which we do not find in actuality, a point often missed.
Cox is a superior starting point to Kolmogorov, whom you meet in ordinary probability courses. The only problem with Kolmogorov’s “axioms” (where that word is used in the mathematical sense of assumption, and not of necessary truth, as we use it) is that Kolmogorov assumes probability can be a real number. Cox proves it can be. Both men end at exactly the same point, but because of the Curse of Notation, something is lost in Kolmogorov.
That all probability is conditional is also tucked away in Kolmogorov, though it’s buried and difficult to see. In the rush to “do math” almost everybody misses this. With Cox, we see it all from the beginning. Also in Cox, we learn where probability can be applied. That’s amazing! Kolmogorov never says a word about what probability is for! To him, it’s all just math. This has resulted in great confusion about the interpretation and use of probability. Not so with Cox: we get it all.
I stress there is nothing, at all, wrong with Kolmogorov’s math; nor Cox’s. It’s what the math means that we care about. It is what we cared about from Day One.
Now we have proved that all reasoning rests on faith, on unprovable notions that we grasp using various forms of induction and intuition. We intuit the rules and forms of logic, which rest on induction. We know there are many things we can prove with logic, a host of necessary and local truths. Like
x,y,z are integers (premise 1)
x > y (premise 2)
y > z (premise 3)
what all these symbols/words mean (tacit premise 4)
————————————————————
x > z (proposition of interest)
The conclusion is true given these premises. Logic is only about the connection between premises and conclusion. It has nothing to say about where the assumptions and conclusion (proposition) of interest come from. For instance, I could have put “x < z” as the proposition of interest (below the bar). We deduce, given these premises, it is false. A local falsity.
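Just for fun, a brute-force illustration of that little argument (my own sketch; the deduction itself needs no computer):

```python
# Brute-force illustration (not a proof; logic needs no computer) that
# the premises x > y and y > z always yield x > z over a range of integers.
from itertools import product

for x, y, z in product(range(-5, 6), repeat=3):
    if x > y and y > z:   # premises hold
        assert x > z      # conclusion follows, every time

print("No counterexample found: x > z whenever x > y and y > z")
```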
Most arguments we entertain do not lead to (local or necessary) truths. Most are more like this:
Witness said, “I saw a bunch of guys come out the door” (premise 1)
What the words etc. mean (tacit premise 2)
————————————————————
There were 12 guys coming out the door
That proposition is not certain. We do not know precisely what “bunch” means. You could add a tacit premise (and most who do so forget they have done so) and try to solve this. But, of course, there are many possible meanings of “bunch”. People do this adding-and-forgetting when you ask them “What is the chance it will rain tomorrow?” They might even give you a number, but not the premises they used. Which is to say, the model they used.
Well, we’ll come back to all that, in depth. For now, it’s clear that many arguments are uncertain. Hence we’re interested in uncertainty. What does it mean? Can it be quantified?
Here are Cox’s desiderata (Jaynes draws these out, as you’ll read), quoting Cox’s book, with my emphasis:
1. “The probability of an inference on given evidence determines the probability of its contradictory on the same evidence.”
2. “The probability on given evidence that both of two inferences are true is determined by their separate probabilities, one on the given evidence, the other on this evidence with the additional assumption the first inference is true.”
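In symbols (my paraphrase, using the notation we adopt just below, not Cox’s own): the first statement says

(certainty of not-A | B) = f[(certainty of A | B)],

for some fixed function f; the second says

(certainty of AB | C) = F[(certainty of B | C), (certainty of A | BC)],

for some fixed function F. Pinning down f and F is the whole game.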
We get that probability can be a number from deducing the consequences of these statements.
Let’s change notation! All we do is take the vertical arguments we’ve been writing and rotate them 90 degrees to the right. So we get, say,
x > z | x,y,z integers & …
With this one simple trick we have conquered probability! We’ll also follow the notation convention that lower case Latin letters late in the alphabet represent numbers; upper case Latin letters early in the alphabet represent propositions (Cox does not, however, follow this). Later, we’ll use Greek letters for unobservable parameters.
We could also write that last argument like this:
A | B.
Or we could write, say (this follows Jaynes), for some other argument:
AB | C,
where we use Boolean notation that “AB” means “A is true and B is true”. Notice very very carefully that “C” could be large. It could be pages long. It contains everything we are assuming.
And most importantly it does not contain anything other than what is stated. We’ll come back to this when we discuss errors in interpretation.
How do we go about discerning whether AB is true given C? Well, we could start by asking if B|C is true. Then we could use what we learned about B to tell us about A, like this: A|BC (this is Cox’s second statement). Or, of course, we could reverse it, since the order shouldn’t matter. We start with A|C, then ask about B: B|AC.
Therefore, if probability is a function, then it would be of the form AB|C = F[B|C, A|BC] or AB|C = F[A|C, B|AC].
At this point, there should be questions in your mind about particularities. Which means you haven’t yet read Jaynes. Do so.
Now if all this works, it should work for any logical argument we care to give. Thus
ABC|D = F[BC|D, A|BCD] = F{F[C|D, B|CD], A|BCD} = F{C|D, F[B|CD, A|BCD]}.
From this we deduce that probability, if it can be a number, is of the functional form:
F[F(x, y), z] = F[x, F(y, z)].
This is a functional equation which, when solved, will produce the probability function (those who have had a lot of math will recognize this; those who haven’t will be lost, so I merely present the answer; I don’t have time to also teach calculus and functional equations). Jaynes gives the step-by-step proof of this in its differentiable form, and shows where to find the generalization that does not assume differentiability. In any case, the unique answer is this function:
w(AB|C) = w(A|BC)w(B|C) = w(B|AC)w(A|C).
Do you see!? It’s Bayes! Right there, step number one from the functional equation. We get Bayes on day one. Well, almost. Because all we know so far is that probability is a function that looks like this w(). We don’t yet know what values to give it, where w() lives, that is. Let’s do that next.
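Before that, a concrete example of the function at work (mine, not Jaynes’s), using the familiar numbers we’re about to justify. Let C = “a standard 52-card deck, from which two cards are drawn without replacement”, A = “the first card is an ace”, and B = “the second card is an ace”. Then

w(AB|C) = w(B|AC)w(A|C) = (3/51)(4/52) = 1/221.

Knowing A (one ace is gone) changes what we know about B, and the function tracks exactly how.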
This function has to work on any logical argument. So suppose we know that A is true given C. Then
AB|C = B|C
and
A|BC = A|C
because adding any information to C (here, B), does not change what we know about A (and no cheating by Redditing and snickering and making C and B mutually contradictory).
Thus our function when we know A is true given C collapses to:
w(B|C) = w(A|C)w(B|C),
which means w(A|C) = 1. Since we assumed A given C was true, certainty, then, is represented by the number 1.
We can do a similar thing to discover what number represents falsity. I won’t hold you in suspense. It’s 0. Prove it, though. (And this is an open-book test, because the answer is right there in Jaynes.)
Thus we can dispense with w() and witness the birth of Probability Itself!
Pr(AB|C) = Pr(A|BC)Pr(B|C) = Pr(B|AC)Pr(A|C).
And that, my friends, is Bayes’s theorem in all its glory, deduced from the simple idea that probability can be a number. Ain’t that sweet?
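Divide both sides by Pr(B|C), assuming that number isn’t 0, and you get the arrangement most textbooks print:

Pr(A|BC) = Pr(B|AC)Pr(A|C) / Pr(B|C).

Same theorem, just solved for the term we usually want.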
There’s a bit more to this, to make sure we have all the numbers just right. Like in developing the Sum Rule (I do this proof in the video):
Pr(A + B|C) = Pr(A|C) + Pr(B|C) – Pr(AB|C),
where “A + B” is Boolean and means “A is true or B is true.” All the niggling details are in Jaynes. (He also writes, say, p(A|C) whereas I always write Pr(A|C) to emphasize the probability.)
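If you like, check both rules by brute force. Here is a minimal sketch (my example, not from Jaynes or Cox), with C = “a fair six-sided die is rolled”, A = “the roll is even”, B = “the roll is greater than 3”:

```python
# Brute-force check (illustration only) of the product and sum rules with
# C = "a fair six-sided die is rolled", A = "roll is even", B = "roll > 3".
from fractions import Fraction

outcomes = range(1, 7)

def pr(event, given=lambda k: True):
    """Pr(event | given & C): count outcomes consistent with `given`."""
    cond = [k for k in outcomes if given(k)]
    return Fraction(sum(1 for k in cond if event(k)), len(cond))

A = lambda k: k % 2 == 0
B = lambda k: k > 3

# Product rule: Pr(AB|C) = Pr(A|BC) Pr(B|C)
assert pr(lambda k: A(k) and B(k)) == pr(A, given=B) * pr(B)

# Sum rule: Pr(A+B|C) = Pr(A|C) + Pr(B|C) - Pr(AB|C)
assert pr(lambda k: A(k) or B(k)) == pr(A) + pr(B) - pr(lambda k: A(k) and B(k))

print(pr(lambda k: A(k) or B(k)))  # prints 2/3
```

Enumeration is all the “model” is here: the evidence C fixes the outcomes, and counting does the rest.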
We also get for free:
Pr(A|B) + Pr(not A|B) = 1.
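That one drops out of the sum rule by putting “not A” in for B: Pr(A + not A|B) = 1, since A-or-not-A is certain on any evidence, and Pr(A(not A)|B) = 0, since a contradiction is false on any evidence. So

Pr(A|B) + Pr(not A|B) – 0 = 1.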
Not for the last time I am going to warn you NEVER to write “Pr(A)”. No such creature can exist. Only arguments like this exist: “Pr(A|B)”. There is no probability, just as there is no logic, without premises, or conditions. Which is to say, ALL probability and ALL logic is conditional. I will continue to shout this, like de Finetti did (like him, I have shouted myself hoarse, but it never seems to stick).
Everybody recognizes this in logic, but it’s forgotten in probability. Again, I blame the desire to rush to the math, à la Kolmogorov. Probability is not math! It’s logic. To which numbers can, sometimes, be applied.
Now my guess is only a scant handful of you will have read this far, and so most will have missed the most important qualification. If you have been reading carefully, you will have noticed I said probability can be a number. I nowhere said it must always be. I do not, and will not, say so. Cox does not say so, though Jaynes is less careful about it. Cox says (p. 29):
It does not follow that all probabilities can be estimated with the same precision. Some probabilities are well defined, others are ill defined and still others are scarcely defined at all except that they are limited, as all probabilities are, by the extremes of certainty and impossibility.
That choice of the word “impossibility” is not optimal. But to that, and to the incorrect notion that probability is a measure of belief, we shall return.
Do the reading.