Uncertainty & Probability Theory: The Logic of Science
Video
Links:
Bitchute (often a day or so behind, for whatever reason)
The Infamous Coin Flipping Machine!
HOMEWORK: We did Pr(M_6|E) = 1/6, and we did Pr(M_6|E + “fair”) = 1/6 (a circular argument!). Now give us Pr(M_6|E + “unfair”) = ?
Lecture
The homework used as evidence:
E = M_1 or M_2 or M_3 or … or M_n (n < infinity)
One and only one M_i must obtain
n = 6
Pr(M_6|E) = 1/6. Which we conclude from the statistical syllogism. But that syllogism is beloved everywhere, and so various proofs of it, operating from simpler premises, have been offered. Two such approaches, by Jaynes and Diaconis, I think are failures, because they end up being circular arguments. Stove is best, which we’ll do next time. This time we go over Jaynes and mention Diaconis.
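For reference, written out, the statistical syllogism applied to this E is:

$$\Pr(M_i|E) = \frac{1}{n}, \qquad i=1,\dots,n,$$

which with $n=6$ gives $\Pr(M_6|E)=1/6$.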
Meanwhile, adding “fair”, as we’ll see in the excerpt below, does nothing for us. There are no such things as “fair” dice, or coins, or anything. Adding “fair” to a list of premises assumes what it sets out to prove; that is, that each outcome is equally likely. You do NOT get to add information to this E. Somebody commented that one side of the die may be shaved, or whatever. That applies to that die. This die does not have that information: that is, we do not have it. We cannot willy-nilly add to E. We take it as it is. If we do add information to E, then yes indeed, we change the probability.
Here, again, is the one single lesson of this entire class: change the evidence, change the probability.
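Here is a toy sketch of that lesson in code (mine, not from the book; the “loaded toward 6” premise is invented for illustration and is not part of our E):

```python
# Change the evidence, change the probability.
# E: one and only one of six sides must obtain, and nothing more.
n = 6
pr_given_E = 1 / n  # statistical syllogism: 1/6

# E2: E plus the (hypothetical, added) premise "side 6 is twice as
# likely as each other side". New evidence, so a new probability
# follows deductively from it.
weights = [1, 1, 1, 1, 1, 2]
pr_given_E2 = weights[5] / sum(weights)  # 2/7

print(pr_given_E, pr_given_E2)  # 0.1666... versus 0.2857...
```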
All we have done so far mathematically, apart from the epistemology, is prove probability has a certain mathematical form (Bayes), and that we can give it two numbers: 0, for locally or conditionally false, and 1, for locally or conditionally true. We want, however, to see if other numbers work, too. Above, Pr(M_6|E) = 1/6. Which, again, we got from the statistical syllogism. Let’s now see how we try to prove the validity of the SS.
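For reference, the mathematical form in question is Bayes's theorem, with everything conditional on the evidence $E$:

$$\Pr(A|BE) = \frac{\Pr(B|AE)\,\Pr(A|E)}{\Pr(B|E)}.$$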
This is an excerpt from Chapter 4 of Uncertainty. All the references have been removed.
Fairness
Any premises about “fairness” are superfluous to probability, which is to say, to the epistemology of the situation, though they might be important to the ontology. Saying that a Metalunan die is “fair”, if it means anything to the epistemology, is no more than a restatement that each side is “equally likely”, a conclusion we had already reached with the proportional syllogism. That is, given the premise that a device is “fair”, the probability of each outcome is equal, i.e. uniform; a circular definition. It is like saying, “Given the probability of X is p, the probability of X is p”, which is tautological.
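In symbols, the circularity is:

$$\Pr\big(X \mid \text{“the probability of } X \text{ is } p\text{”}\big) = p.$$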
But to the ontology, to call a Metalunan—or any Earthly—die “fair”, what else can it mean but to claim that each side is perfectly symmetric, even down to the quantum level (or whatever, if anything, is below that)? To call an object fair, symmetric, balanced, equally weighted or whatever is to say that no inspection would reveal any conceivable asymmetry. What a remarkable claim! This pristine state of proportionality, which I suppose might exist in some fanciful physics experiment of the future, is impossible in practice to verify. How do you know, except by great expense and effort, whether any device is symmetric across all its constituents? How can you ensure any die or coin toss is “symmetric” or “fair”? How can you even define what that means? How can a toss be “fair” except that it is designed to produce equal numbers of heads and tails, or equal numbers of sides, etc.? The answer is obvious; and contra others.
Now it is a separate question how any particular, necessarily physically real, device produces more or less uniform outcomes. There can be no real tosses of a Metalunan interocitor, but that does not stop us from learning its probabilities. But for a real device, we have to do a lot more thinking. To say a device is “fair” says nothing about the mechanism of how that device will register a state. In a real die toss, even if we claim the die itself is “fair”, i.e. perfectly symmetric, we have said nothing about how it will be tossed. These are ontological matters. There will be a gravitational field, perhaps varying. There will be air at a certain density, temperature, and moisture content through which the die flies. The die will leave some person’s hand, perhaps coated with traces of sweat and skin, with a certain spin and momentum; it will have begun in the hand in a certain orientation. It will hit the floor or table or whatever at a certain angle, and the floor itself will be more or less elastic and will give some level of frictional resistance. And this does not exhaust the characteristics of the physical environment of the real toss. Indeed, the number of things which might influence the outcome is very large (but not infinite). Experience tells us that most of these things will have scant or negligible effect, but perhaps, for this toss, something happens which gives more weight to a previously unconsidered dimension. Who knows? Let’s have no more talk about tossing “fair” dice.
We can extend “fair” to include not only the symmetries of the device but also the environment where the device will be “activated”, but as you can now see, this is to say a lot. To the epistemology, nothing changes: we still have a circular definition of the probability. But to the ontology, it is everything. Perhaps, as in highly controlled experiments, we will have a lot of evidence about the physical set up. But often we do not, especially when investigating the behavior of people, who do not act as predictably as dice. It is boastful to say even of a simple coin or die toss that the environment is “fair.” And we do not have anything like that level of omniscience when it comes to people. Of course, experience over a great many actual dice tosses shows us which environments produce uniform outcomes. Casinos rely on this! That experience feeds into our premises and is then used to deduce probabilities.
So what do we say about the chance this real object comes up this or that number? Well, that is the subject of modeling, which we will do later. A brief summary: we begin with whatever clear evidence (premises) we have, judging that some characteristics are important and others ignorable, and then move forward either to make predictions or to experimentation, and after experimentation we produce more predictions.
Details
SUBSTACK ONLY ALLOWS LATEX EXPRESSIONS IN SEPARATE BLOCKS, AND NOT INLINE (INSIDE A SENTENCE/PARAGRAPH), WHICH MAKES READING THIS HORRIBLE. I THEREFORE HAVE LEFT THE LaTeX CODE AS IS HERE. IF YOU WOULD LIKE TO READ THIS IN A BETTER FORMAT, SEE MY MAIN WEBSITE WMBRIGGS.COM.
The statistical syllogism cannot be escaped, and neither can the symmetry of individual constants from which the syllogism is derived. Yet some authors have attempted escapes. The most noteworthy are Jaynes, Diaconis, and Stove. All were interested in assigning equi-probability to events like die tosses. But since assigning equi-probability, or uniformity, has historically been seen as dogmatic, each author tried to derive the assignment of equi-probability from what they saw as different, less dogmatic premises. These attempts are ultimately failures, as I demonstrate below. Stove's comes closest, and indeed has the answer hidden in his effort. This section is necessarily mathematical and may be skipped by those already convinced of the statistical syllogism's utility; though all should at least skim Stove's effort. Our first notion of "parameters" arises in these proofs, too.
The following arguments start with the definite knowledge $E$ that $M$ is contingent and can be decomposed into a finite number of possibilities (like sides in coin flips or states of interocitors, or whatever) $M_1, M_2,\dots,M_n$, $n<\infty$.
Jaynes gives a permutation argument in an attempt to deduce the statistical syllogism (he does not call it that), but which relies on an unacknowledged assumption. Introduce evidence $E$ which states that either $M_1$ or $M_2$ or etc. $M_n$ can be true, but that only one of them can be true. In the case where $M$ is a coin flip, the result can be either $M_1$="head" or $M_2$="tail". Thus, $\Pr(M_1\vee M_2\vee\dots\vee M_n|E)=\sum_{i=1}^n \Pr(M_i|E)=1$. At this point, there is no assertion that each of these probabilities is equal, only that the sum is 1. We want to assign the probabilities $\Pr(M_i|E)$ for $i=1,\dots,n$. The set of possibilities is $M=\{M_1,M_2,M_3,\dots,M_n\}$. Let $\pi$ be a permutation on the set $\{1,2\}$. Let $M'=\{M_{\pi(1)},M_{\pi(2)},M_3,\dots,M_n\}$. That is, the sets $M$ and $M'$ are the same except the first two indexes have been swapped in $M'$. The evidence $E$ is fixed. Therefore, it must be that $\Pr(M_1|E)_M=\Pr(M_{\pi(2)}|E)_{M'}$ and $\Pr(M_2|E)_M=\Pr(M_{\pi(1)}|E)_{M'}$. Jaynes then makes a crucial step, which is to add to $E$ evidence which states that the total evidence is "indifferent" to $M_1$ and $M_2$, i.e.
if it [the evidence] says something about one, it says the same thing about the other, and so it contains nothing that would give [us] any reason to prefer one over the other (p. 39, emphasis mine).
Accepting this for the moment, $E$ then says that our state of knowledge about $M$ or $M'$ is equivalent, including the order of the indexes. Thus, (note the change in indexes) $\Pr(M_1|E)_M=\Pr(M_{\pi(1)}|E)_{M'}$, $\Pr(M_2|E)_M=\Pr(M_{\pi(2)}|E)_{M'}$, and $\Pr(M_j|E)_M=\Pr(M_j|E)_{M'}$ for $j=3,\dots,n$. Which implies $\Pr(M_1|E)_M = \Pr(M_2|E)_M$: that is to say, equi-probable or uniform prior assignment.
We seem to have proven equi-probability. And this argument is fine if what Jaynes says in the quotation holds. But we can see in it the presence of two tell-tale phrases, "indifferent" and "no reason", which are used, and are needed, to justify the final step. This is just begging the question all over again, for how else could the evidence $E$ be "indifferent"? It cannot mean non-probative or irrelevant. That is, Jaynes has assumed uniform probability (and thus, the statistical syllogism) as part of the evidence $E$, which is what he set out to prove.
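A small check of why the relabeling step alone cannot force uniformity (a toy sketch of mine, with invented numbers; not Jaynes's code): swapping labels leaves the probability attached to each proposition unchanged whatever those probabilities are; only the added "indifference" premise equates them.

```python
# Toy check of the permutation step (illustrative numbers, my sketch).
# A "problem" maps each proposition to its probability on fixed evidence E.
M_probs  = {"M1": 0.5, "M2": 0.3, "M3": 0.2}  # any assignment summing to 1
# M': the same problem with the first two labels swapped in listing order.
Mp_probs = {"M2": 0.3, "M1": 0.5, "M3": 0.2}

# The relabeling equalities hold even for this non-uniform assignment,
# because E is fixed and a proposition keeps its probability wherever
# it sits in the list:
assert M_probs["M1"] == Mp_probs["M1"] and M_probs["M2"] == Mp_probs["M2"]

# Uniformity (M_probs["M1"] == M_probs["M2"]) does NOT follow; it is
# false here. It appears only once "indifference" between M1 and M2 is
# added to E, i.e. once equi-probability is assumed rather than derived.
assert M_probs["M1"] != M_probs["M2"]
```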
De Finetti has a famous "exchangeability" theorem which states that if an "infinite series" of "variables" exists and the order in which the variables arise is not probative, then a "prior" probability of the states exists. The form of the prior is not given by the theorem; that is, how the probabilities are assigned is not stated by the theorem; we know only that it exists. Diaconis investigated finite exchangeability in an attempt to see how assignment might arise.
This argument is more mathematically complicated. De Finetti's theorem, which can be found in many places, states that in an infinite sequence of exchangeable 0-1 variables there is hidden, if you like, a formal (induced) representation as a probability model with a unique measure on the probability model's parameters. The key, of course, is that the sequence must be infinite. Diaconis, after showing that some finite exchangeable sequences fail to be represented as probability models with unique measures, goes on to offer a proof for certain other finite exchangeable sequences that do. The word "hidden" was apropos, for in exchangeability arises the concept of parameters (in parameterized probability models), a concept which relies on the existence of infinite sequences. I investigate this important topic in the chapter on probability models.
Here, I follow Diaconis (1977) as closely as possible, almost copying the theorem as it stands but using my notation; interested readers should consult the original if they desire the details, particularly since the original uses graphical notions which I ignore. Let $\mathcal{P}_n$ represent all probabilities on $M=\prod_{i=1}^n M_i$ where $M_i=\{0,1\}, \forall i$, and where $M$ is a finite ($n<\infty$) sequence of 0-1 variables. $\mathcal{P}_n$ may be thought of as the probability models on $M$: it may be written in coordinate form by $p=(p_0,p_1,\dots,p_{2^n-1})$, where $p_j$ represents the probability of the outcome $j$, $0\le j< 2^n$, with $j$ written as its binary expansion in $n$ binary digits. Diaconis gives the example: if $n=3$, then $j=1$ refers to the point $001$. Let $M(m,n)$ be the set of $j$ with exactly $m$ ones. The number of elements in $M(m,n)$ is ${n \choose m}$: this much is true regardless of what the actual probabilities of any outcomes are.
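To make the coordinates concrete, here is a small enumeration (a sketch; Diaconis gives no code, of course):

```python
from itertools import product
from math import comb

n = 3
# Outcomes j = 0 .. 2^n - 1, written with n binary digits; j = 1 is "001".
points = ["".join(map(str, bits)) for bits in product([0, 1], repeat=n)]

# M(m, n): the set of j with exactly m ones; it has C(n, m) elements.
for m in range(n + 1):
    Mmn = [j for j, pt in enumerate(points) if pt.count("1") == m]
    assert len(Mmn) == comb(n, m)
    print(f"M({m},{n}) = {[points[j] for j in Mmn]}")
# M(0,3) = ['000'], M(1,3) = ['001', '010', '100'],
# M(2,3) = ['011', '101', '110'], M(3,3) = ['111']
```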
Now, let $\mathcal{E}_n$ be the exchangeable measures in $\mathcal{P}_n$: $\mathcal{E}_n$ will take the place of the measure on $\mathcal{P}_n$'s "parameters". The theorem is stated thus: $\mathcal{E}_n$ has $n+1$ extreme points $e_0,e_1,\dots,e_n$, where $e_m$ is the measure putting mass $1/{n \choose m}$ at each of the coordinates $j\in M(m,n)$ and mass 0 at the other coordinates. (Uniqueness of each point in $\mathcal{E}_n$ is also covered, but not of interest here.) How is this theorem proved?
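Before the proof, a concrete instance of what the theorem asserts (my own small example): with $n=3$ and $m=1$, the coordinates in $M(1,3)$ are $001$, $010$, $100$, and

$$e_1(001)=e_1(010)=e_1(100)=\frac{1}{{3 \choose 1}}=\frac{1}{3},$$

with mass 0 at the other five coordinates.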
$e_m$ represents the measure of drawing $n$ balls without replacement from an urn with $n$ balls, $m$ of which are marked 1 and $n-m$ marked 0, so each $e_m$ is exchangeable. If $e_m$ can be written as a proper mixture of other exchangeable points, it has the form $e_m=pg_1+(1-p)g_0$, where $0<p<1$; also, $g_1, g_0$ must assign 0 probability to the outcomes to which $e_m$ assigns 0 probability. But because of exchangeability of the coordinates $j\in M(m,n)$, $g_1$ and $g_0$ must be equal. And because the probabilities over the coordinates $j\in M(m,n)$ must sum to 1---and here is the big assumption used in the proof---the mass at each coordinate is $1/{n \choose m}$.
Clearly, the intuition that gave rise to these particular masses asserted in the proof came from the fact that the number of elements in $M(m,n)$ is ${n \choose m}$. However, other masses work too, as long as they sum to one and assign a probability of 0 to coordinates not in $M(m,n)$. For example, for $j\in M(m,n)$ assign mass $1/(2m)$ to each of the first $m$ coordinates and $1/(2({n \choose m}-m))$ to each of the remaining ${n \choose m}-m$ coordinates.
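A quick check of the arithmetic for those alternative masses (my sketch, with an arbitrary choice of $n$ and $m$):

```python
from math import comb, isclose

n, m = 5, 2
k = comb(n, m)  # number of coordinates in M(m, n); here 10

# Diaconis's choice: mass 1/C(n,m) at each coordinate of M(m, n).
uniform = [1 / k] * k
# The alternative above: 1/(2m) on each of the first m coordinates,
# 1/(2(C(n,m) - m)) on each of the rest. Non-uniform, yet it sums to 1
# and puts 0 mass outside M(m, n), which is all the sum-to-1 step demands.
alt = [1 / (2 * m)] * m + [1 / (2 * (k - m))] * (k - m)

assert isclose(sum(uniform), 1) and isclose(sum(alt), 1)
```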
The reason the $1/{n \choose m}$ mass was chosen is understandable, but there was no explicit reason for it other than having the probabilities sum to 1, plus the desire for symmetry, i.e. the equi-probable assignment. So again, the statistical syllogism/equi-probability is tacitly assumed.
Flashbacks
9 years of K-8
Think I missed the last one D: but it seems to me that with Pr(M_6 | E + "unfair") we're asserting that the probabilities of each state are not evenly distributed, so we can only say that the probability of any state is equal to 1 minus the sum of the probabilities of all the other states. I can't think of a way to go any further than that. Am I missing something?