**This post is really for those adept at math, a necessary avenue to understand this subject entirely. Most will want to skip the video. BUT DO READ THE LECTURE. IT'S SHORT!**

Jaynes’s book (first part): https://bayes.wustl.edu/etj/prob/book.pdf

Permanent class page: https://www.wmbriggs.com/class/

Uncertainty & Probability Theory: The Logic of Science

*Link to all Classes.*

**Video**

Links:

Bitchute (often a day or so behind, for whatever reason)

**HOMEWORK:**

**Lecture**

Many will want to skip today’s video. This lecture is primarily for math bros. I try to minimize math in this series, because as I repeatedly emphasize, probability is not math. It uses math, but so do you when using a credit card, and your spending needs are not math. Our prime focus is always the philosophy of Uncertainty.

However, we cannot do without math, especially as we’ll need it to criticize so-called mainstream procedures. You can get all the math right out of Jaynes, directly following upon last week’s reading, or in the step-by-step in the video lecture. There is no need for me to repeat here anything written better in his book (links above). So I will only add a few clarifications.

Not for the last time I’ll remind us that the way to get any problem onto the continuum, because there the math is easier, is to start finite, and discrete. Which is where *all* our measurements live. *Only* once we have put the problem in its finite, discrete form do we dare to go to the limit, and only go along the path dictated by the evidence we assume. Yes, indeed, the path you take to Infinity matters.

This is Jaynes’s specialty. I, in my mumble-mouthed way, try to show the nifty aspects of how these kinds of Trips To Infinity work.

In ordinary stats books, they *start* with continuum-based models, with their infinitely sized parameters. As if they are real! All of these models are *ad hoc*. They are used with wild abandon. Nobody knows whence they come. They are thus misused in a terrible manner.

Here we slowly built our model, starting from simple premises, such as: we have a machine that will put out widgets, some of which will be bad. That’s it! Then, to take advantage of the easier math on the continuum, we let our sample go to the limit. This is the first approximation.

I repeat: this is our first *approximation*.
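To make the “start finite and discrete” step concrete, here is a minimal sketch, my illustration and not the post’s (or Jaynes’s) exact derivation. It assumes a batch of N widgets with an unknown number M bad, a uniform prior over M, and a sample of n drawn without replacement showing k bad:

```python
# A hedged sketch (assumptions: uniform prior on the unknown count M,
# sampling without replacement): batch of N widgets, M bad, sample n, see k bad.
from math import comb

def posterior_mean_fraction(N, n, k):
    """Posterior mean of M/N: uniform prior on M, hypergeometric likelihood."""
    # P(k bad in sample | M bad in batch) = C(M,k) C(N-M,n-k) / C(N,n);
    # the C(N,n) cancels on normalization, so we drop it.
    weights = [
        comb(M, k) * comb(N - M, n - k) if k <= M <= N - (n - k) else 0
        for M in range(N + 1)
    ]
    total = sum(weights)
    return sum(w * M for M, w in enumerate(weights)) / (total * N)

n, k = 20, 3  # sample of 20 widgets, 3 bad
for N in (50, 500, 5000):
    print(f"N = {N:5d}: posterior mean fraction = {posterior_mean_fraction(N, n, k):.4f}")
```

As N grows, the posterior mean of M/N tends to (k+1)/(n+2), Laplace’s rule of succession: the continuum (beta) answer arrives as the *limit* of the finite, discrete problem, along the path the evidence dictates, which is the whole point.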

Then we did a Taylor series expansion around the first approximation, to further simplify the math. This becomes our second approximation.

I repeat: the normal distribution is an approximation to an approximation.

Let me say it one more time: the resulting normal distribution *is an approximation to an approximation*.
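For the record, here is the shape of that Taylor-expansion step, written for a generic beta-shaped first approximation (my notation, a sketch of the standard move rather than Jaynes’s exact lines):

```latex
% First approximation: p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}.
% Expand L(\theta) = \log p(\theta) about its mode \hat\theta, where L'(\hat\theta) = 0:
L(\theta) \approx L(\hat\theta) + \tfrac{1}{2} L''(\hat\theta)\,(\theta - \hat\theta)^2,
\qquad
\hat\theta = \frac{a-1}{a+b-2}.
% Exponentiating gives the second approximation, a normal density:
p(\theta) \approx \frac{1}{\sqrt{2\pi\sigma^2}}
  \exp\!\left( -\frac{(\theta - \hat\theta)^2}{2\sigma^2} \right),
\qquad
\sigma^2 = -\frac{1}{L''(\hat\theta)} = \frac{\hat\theta(1-\hat\theta)}{a+b-2}.
```

The linear term vanishes because we expand at the mode; everything past the quadratic is thrown away, which is precisely why the normal is an approximation to the (already approximate) beta.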

It’s not a bad one, either. Works well in many applications. It would be the mistake most grievous, and, alas, most common, to say that the “distribution of the fraction of bad widgets *is* normal.” Nothing *is* normal. If you say things like this, slap yourself. Stop.

Say instead, “the probability the fraction of bad widgets is between this and that is approximately P”. Substitute for “this and that” the limits that make sense for your problem. Calculate P using your *approximation to the approximation*.
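Here is one way such a P can be computed, with made-up limits (0.05 and 0.30) and the same toy numbers as before (k = 3 bad in a sample of n = 20); the beta stands in for the first approximation and the normal for the second:

```python
# Hedged sketch with made-up limits: compare the first approximation (beta)
# to the approximation-to-the-approximation (normal), stdlib only.
from math import erf, sqrt, comb

k, n = 3, 20
a, b = k + 1, n - k + 1                  # first approximation: Beta(4, 18)

def beta_prob(lo, hi, a, b, steps=100_000):
    """P(lo <= theta <= hi) under Beta(a, b), by trapezoid rule (integer a, b)."""
    norm_const = 1 / ((a + b - 1) * comb(a + b - 2, a - 1))  # = B(a, b) for integers
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    ys = [x**(a - 1) * (1 - x)**(b - 1) / norm_const for x in xs]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def normal_prob(lo, hi, mu, sd):
    """P(lo <= theta <= hi) under Normal(mu, sd^2), via the error function."""
    Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
    return Phi((hi - mu) / sd) - Phi((lo - mu) / sd)

mode = (a - 1) / (a + b - 2)             # the Taylor expansion sits at the mode
sd = sqrt(mode * (1 - mode) / (a + b - 2))
print(f"beta   P(0.05 <= theta <= 0.30) = {beta_prob(0.05, 0.30, a, b):.3f}")
print(f"normal P(0.05 <= theta <= 0.30) = {normal_prob(0.05, 0.30, mode, sd):.3f}")
```

The two numbers disagree a little, which is the price of the second approximation; for skewed cases (k near 0 or n) the gap widens, and you should go back to the first approximation, or all the way back to the finite problem.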

Speak, instead of unobservable parameters, of Reality: speak of the measurable: speak of *observables.*

**Subscribe or donate to support this site and its wholly independent host using credit card: click here. Or use the paid subscription at Substack. Cash App: $WilliamMBriggs. For Zelle, use my email: matt@wmbriggs.com, and please include yours so I know who to thank.**

I just finished teaching a 3-week "Bayesian Inference and Reasoning" course at the African Institute for Mathematical Sciences in Cape Town. The hardest step, as it has been everywhere else I have taught such a course, was the transition from a discrete to a continuous set of hypotheses. Even knowing how difficult this step is, I needed three attempts to get it mostly right.

(Like your _Uncertainty_ book, the course is very much inspired by Jaynes's approach.)

William, just to clarify—how are we counting defective widgets in your scenario? If we check every single widget, we'd know the exact number, and there'd be no need for statistics at all. So, I assume we're sampling, right? Random sampling, it must be (Yoda). I know you and randomness don’t always get along, but if the sample isn’t random, how can we be sure it represents the entire population of widgets from that machine?

So, let's say we sample (random or not), and that gives us an initial prior. Now, we start updating the posterior. But how many times do we update? From what I gathered in your lecture, it seems there’s no real end to this process. Theoretically, don’t we just end up counting all the widgets anyway? If that’s the case, why even bother with the whole sampling and Bayesian process in the first place? Practically speaking, what’s the point if it leads us to count everything anyway?