Video
Thanks to an anonymous donor for the gift of the new equipment, which has resulted in a New & Improved! picture and sound. Thanks, too, to Matthew Bushey of Milwaukee, who showed me how to use it all. You will see that I still have much to learn. I put the walls up mere moments before the video was shot; they are a work in progress.
Links:
YouTube
Twitter LECTURE IS NOT ON TWITTER, BECAUSE I AM ONCE AGAIN IN TWITTER JAIL—FOR A WEEK; FIFTH TIME SINCE MUSK TOOK OVER, THIS TIME FOR TEASING JUSTIN TRUDEAU. SO IF SOME KINDLY SOUL COULD POST THIS ON TWITTER, I WOULD APPRECIATE IT.
HOMEWORK: Continue reading Chapter 2 of Jaynes, up to where we quit today.
Announcement: NO CLASS ON MEMORIAL DAY.
Lecture
I’m starting to get some good questions. Some of them by folks who have clearly had training in probability and physics before.
These people will have the most difficult time following this Class.
For the very excellent reason that we, all of us, when confronted with new information seek to put it into buckets in our mind, if you will, buckets which we have formed over many years. This is entirely natural, and even helpful.
Unless those buckets are the wrong shape. Which if you have had training about “random variables”, “p-values”, and the like, are, I insist, wrongly shaped. Not always, and not always badly, but to some extent.
If I thought in the same way as classical probabilists—whether they call themselves frequentists, Bayesians, information theorists, or physicists—I would not have done this Class. There would be no need. But I do not. I, of course, think treating probability, information, and cause in the Old Way is the right way.
Where by Old, I mean ancient. We need to purge materialistic ideas and restore cause in its proper sense. Which means, at the least, separating the nature of things from our knowledge of the nature of things. These two are not the same thing! But, I claim, the current ways of thinking about probability mix them up and conflate them. Which isn’t even recognized because, well, philosophy is scarcely thought of at all.
Ontology is not epistemology. Logic is a matter of epistemology. And probability, being the completion of logic, is, too. We would never claim logic inheres in objects or processes (like “Nature”), but for some reason we often say probability does. Which it cannot, as we’ll see.
So if you have had training in these areas before in the classical way, pause for a moment to see what we have done so far. Your instincts will be to try and jump ahead and see where everything fits with what you already know. Try not to do this, as difficult as that is.
What we have done so far, and only what we have done, is this:
Posed some logical questions: could logic handle uncertainty?
Demonstrated the crucial differences between local and necessary truths;
That having a philosophy is inescapable, and that belief was an act, a choice; and faith a necessity;
That logic was a mix of subjectivity—picking premises and propositions of interest—and objectivity—rigorously showing the connection between the premises and POIs. That logic was only about those connections; that logic was therefore a matter of the mind and not things;
That induction and intuition were of at least five different kinds, and that induction provides our most certain knowledge (induction provides us with the rules of logic, for instance, as do axioms, for which there is no and can be no empirical proof); and that there was no escaping faith (at least that your senses were working properly at times);
That probability could be represented as a mathematical function, and we discovered the form of that function (Bayes’s Theorem); that certainty was given by the number 1, and falsity by the number 0.
That’s it. That is all we have done. Nothing more. Well, it’s a lot, and much of it subtle and sweeping in consequence. But we haven’t even demonstrated that probability can be represented by numbers other than 0 or 1! We do that next time.
This means there is no way, given what we have learned so far, to put probability into the classical buckets you might already know. All I can do is beg you to be patient. We need to build all this slowly so that you are convinced in the end.
Today, all we do is answer the questions posed in the first lecture, using what we have deduced so far. (This is right from Jaynes.)
Recall we began with a handful of logical arguments. The first:
If A then B
A
—————–
B
Which reads: We assume subjectively that if A is true, then B is, too. Then we subjectively decide that A is true. Then we subjectively form our proposition of interest, B. What does logic objectively tell us about the connection between this POI and our premises?
That B is true. Which can be put in terms of the function we deduced last time:
Pr(AB|C) = Pr(A|BC)Pr(B|C) = Pr(B|AC)Pr(A|C).
For ease of notation, let’s let C = “If A is true then B is true”. Obviously, we are interested here in Pr(B|AC) (if you can’t see this, stop until you do!). We do some manipulation:
Pr(B|AC) = Pr(AB|C)/Pr(A|C).
Using the rules of logic, Pr(AB|C) = Pr(A|C), because given C the proposition AB is equivalent to A alone (whenever A is true, B is too). And so we have:
Pr(B|AC) = Pr(A|C)/Pr(A|C) = 1.
Or B is true given A and C. A logical deduction.
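This modus ponens calculation can be sanity-checked numerically. Here is a minimal Python sketch (not from the lecture; the particular joint probabilities are invented purely for illustration). Premise C, “if A then B”, is encoded by putting zero probability on the one world where A is true and B false.

```python
# Numeric check of modus ponens: Pr(B|AC) = Pr(AB|C)/Pr(A|C) = 1.
# C ("if A then B") is encoded as zero mass on the (A, not-B) world.
# The other three probabilities are arbitrary illustration values.
joint = {
    (True, True):   0.3,  # A and B
    (True, False):  0.0,  # A and not-B: ruled out by premise C
    (False, True):  0.2,  # not-A and B
    (False, False): 0.5,  # not-A and not-B
}

def pr(event):
    """Total probability of the worlds where event(A, B) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))

pr_A  = pr(lambda a, b: a)        # Pr(A|C)
pr_AB = pr(lambda a, b: a and b)  # Pr(AB|C), equal to Pr(A|C) given C

print(pr_AB / pr_A)  # Pr(B|AC) -> 1.0
```

Whatever numbers you pick for the other three worlds, the ratio is always 1, because C forces Pr(AB|C) = Pr(A|C).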
We also started in Lesson 1 with:
If A then B
not-B
—————–
not-A
We want to know about Pr(not-A | not-BC). Which we deduced last week was 1 – Pr(A|not-BC). If our mathematical function (Bayes) works, it must work with any propositions we throw at it. So:
Pr(A |not-BC) = Pr(A not-B|C)/Pr(not-B|C).
We know that Pr(A not-B|C) = 0, because given C it is impossible for A to be true and B false. So, as long as Pr(not-B|C) > 0, we have Pr(A|not-B C) = 0/Pr(not-B|C) = 0, which must mean Pr(not-A|not-BC) = 1 – 0 = 1.
So much for modus tollens and modus ponens, where there isn’t much surprise. Except to demonstrate these are probability judgments.
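Modus tollens checks out numerically the same way. A minimal Python sketch (again, the joint probabilities are invented for illustration; C is encoded as zero mass on the A-and-not-B world):

```python
# Numeric check of modus tollens: Pr(A|not-B C) = 0, so Pr(not-A|not-B C) = 1.
# C ("if A then B") means zero mass on (A, not-B); other values arbitrary.
joint = {
    (True, True):   0.3,
    (True, False):  0.0,  # A and not-B: ruled out by premise C
    (False, True):  0.2,
    (False, False): 0.5,
}

def pr(event):
    return sum(p for (a, b), p in joint.items() if event(a, b))

pr_notB   = pr(lambda a, b: not b)        # Pr(not-B|C)
pr_A_notB = pr(lambda a, b: a and not b)  # Pr(A not-B|C) = 0 by C

pr_A_given_notB = pr_A_notB / pr_notB
print(pr_A_given_notB)      # 0.0
print(1 - pr_A_given_notB)  # Pr(not-A|not-B C) = 1.0
```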
We also had, from Lesson 1:
If A then B
B
—————–
A more plausible
If the POI was A (and recall we always subjectively pick the POI!), then we have a “formal fallacy” in logical language. But note very carefully: this fallacy does not say that A cannot be true; it only says that we cannot know A is true given these premises! This is where classical logic classes let us down, by forgetting to say what we can know about A given these premises.
Here we do not have A, but “A more plausible”. Which means more probable in our language.
Again,
Pr(A|BC) = Pr(A|C) Pr(B|AC)/Pr(B|C)
Now we know Pr(B|AC) = 1. But here’s the real innovation. Pay close attention. We also know that Pr(B|C) ≤ 1. We know that B is contingent from the setup. Because of C, we know that B is possible. Only that it is possible. We also know that probability is a number between 0 and 1—but what numbers we haven’t yet learned.
That being so, logic tells us that Pr(A|C) is multiplied by 1/Pr(B|C), a number greater than or equal to 1. Which means
Pr(A|BC) ≥ Pr(A|C).
And that means that the POI “A is more plausible” is true, given the argument we just made.
Brilliant! Logic is probability; probability is logic.
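The inequality Pr(A|BC) ≥ Pr(A|C) can also be seen in numbers. A minimal Python sketch (the joint probabilities are invented for illustration; C is encoded as before by zero mass on A-and-not-B):

```python
# Numeric check that learning B makes A more plausible:
# Pr(A|BC) >= Pr(A|C), since Pr(A|BC) = Pr(A|C) * 1/Pr(B|C) and Pr(B|C) <= 1.
joint = {
    (True, True):   0.3,
    (True, False):  0.0,  # A and not-B: ruled out by premise C
    (False, True):  0.2,
    (False, False): 0.5,
}

def pr(event):
    return sum(p for (a, b), p in joint.items() if event(a, b))

pr_A         = pr(lambda a, b: a)                 # Pr(A|C)  = 0.3
pr_B         = pr(lambda a, b: b)                 # Pr(B|C)  = 0.5
pr_A_given_B = pr(lambda a, b: a and b) / pr_B    # Pr(A|BC) = 0.6

print(pr_A, pr_A_given_B)  # learning B raised the probability of A
```

Here the multiplier 1/Pr(B|C) = 2, so the probability of A doubles on learning B. With any legal assignment the multiplier is at least 1.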
We had one last example:
If A then B
not-A
—————–
B less plausible
If the POI was just plain B, then we again have a formal fallacy. But that’s where logic stops, forgetting we can still deduce something about B. So:
Pr(B|not-A C) = Pr(B|C) Pr(not-A|BC) / Pr(not-A|C).
Now we have deduced already that Pr(A|BC) ≥ Pr(A|C), a fact we can now use for all time in all future equations. From it we can also deduce (think this through!) Pr(not-A|C) ≥ Pr(not-A|BC). So it follows that Pr(not-A|BC) / Pr(not-A|C) ≤ 1. And so
Pr(B|not-A C) ≤ Pr(B|C).
Wonderful stuff. Probability is logic; logic is probability.
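And the final inequality, Pr(B|not-A C) ≤ Pr(B|C), checks out the same way. A minimal Python sketch with the same invented-for-illustration joint probabilities:

```python
# Numeric check that learning not-A makes B less plausible:
# Pr(B|not-A C) <= Pr(B|C).
joint = {
    (True, True):   0.3,
    (True, False):  0.0,  # A and not-B: ruled out by premise C
    (False, True):  0.2,
    (False, False): 0.5,
}

def pr(event):
    return sum(p for (a, b), p in joint.items() if event(a, b))

pr_B      = pr(lambda a, b: b)                        # Pr(B|C)        = 0.5
pr_notA   = pr(lambda a, b: not a)                    # Pr(not-A|C)    = 0.7
pr_B_notA = pr(lambda a, b: b and not a) / pr_notA    # Pr(B|not-A C) ~= 0.286

print(pr_B_notA <= pr_B)  # True: losing A lowered the probability of B
```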
If you think about this carefully, you realize probability doesn’t have to be a number, but can be an inequality or even an interval. We explore that later.
Subscribe or donate to support this site and its wholly independent host using credit card click here. Or use the paid subscription at Substack. Cash App: $WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank.
“BECAUSE I AM ONCE AGAIN IN TWITTER JAIL”
Thank GOD his highness St. Elon is SAVING FREE SPEECH!
Now I hope he gives us all free rides on the Hype-R-Loop.
I'm late to the party and surely missing something obvious, but I have to ask, why is
`Pr(not-B|C)=1`
Given C = "if A is true, B is true", what is the probability of B being false?
If I know nothing of A, how can I be sure B is false, given only that C is 100% certain?