Class 44: What Causes Cancer Of The Albondigas?
This class is a must watch for all; or at least a must read. I've been showing you how badly cause is misidentified in sciences which use statistics. Today a logic puzzle for you.
You don't need to have watched or read any of the previous material to watch this.
Uncertainty & Probability Theory: The Logic of Science
HOMEWORK: Given below; see end of lecture.
Lecture
This is an excerpt from Chapter 7 of Uncertainty.
This section is relevant for all statistical and probability models which form the conceit that they have identified the cause of some data; the material is based on [an earlier paper of mine]. Suppose we learned that 1,000 people were "exposed" to PM2.5---which is to say, particulate matter 2.5 microns or smaller---at some zero or trace level, and that another group of the same size was exposed to high amounts. Call these two groups "low" and "high PM2.5". Suppose, too, it turns out 5 people in the low group developed cancer of the albondigas, and that 15 folks in the high group contracted the same dread disease. (If you don't love this example, substitute placebo versus drug or some other on-and-off, yes-or-no dichotomous state.)
What caused the observed difference in cancer rates? Some thing or things caused each unfortunate person in our experiment to develop cancer. What could this cause or these causes be? Notice I emphasize that there may be more than one cause present. It needn't be the same thing operating on each individual. Each of the 20 people may have had a different cause of their cancer; or each of the 20 may have had the same cause. And this is so even though it may be that cancer of the albondigas is caused in the human body in only one way. Suppose some particular bit of DNA needs to "break" for the cancer to develop, and that this DNA can only break because of the presence of some compound in just those individuals with a certain genetic structure. Then the cause or causes of the presence of this compound become our main question: how did it come to be in each of these people? That cause may be the same or different.
There is no proof in the data that high levels of PM2.5 cause cancer of the albondigas. If high levels did cause cancer, then why didn't every one of the 1,000 folks in the high group develop it? If high PM2.5 really is a cause---and recall we're supposing every individual in the high group had the same exposure---then it should have made each person sick. Unless it was prevented from doing so by some other thing or things; e.g. perhaps a counter-balancing cause operates that acts "oppositely" of PM2.5. High PM2.5 cannot be a complete cause: it may be necessary, but it cannot be sufficient. And it needn't be a cause at all. The data we have is perfectly consistent with some other thing or things, unmeasured by us, causing every case of cancer. And this is so even if all 1,000 individuals in the high group had cancer.
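A minimal sketch of that last point, in Python. The two "worlds" below are hypothetical constructions, not anything measured: in one, high PM2.5 is always a cause but is blocked unless a person carries an unmeasured "key"; in the other, PM2.5 does nothing and an unmeasured factor (called Z here purely for illustration) causes every case. The function names and the choice of which individuals fall ill are assumptions made only for the example.

```python
from collections import Counter


def tabulate(people):
    """Count cancer cases per exposure group: all an observer records."""
    return dict(Counter(p["group"] for p in people if p["cancer"]))


def world_pm_is_cause():
    # Hypothetical world A: high PM2.5 is always a cause, but it is blocked
    # unless a person carries an unmeasured "key" (15 of 1,000 carry it).
    # The 5 cases in the low group come from some other unmeasured cause.
    people = []
    for i in range(1000):
        other_cause_present = i < 5
        people.append({"group": "low", "cancer": other_cause_present})
    for i in range(1000):
        has_key = i < 15
        people.append({"group": "high", "cancer": has_key})
    return people


def world_pm_is_not_cause():
    # Hypothetical world B: PM2.5 does nothing. An unmeasured factor Z causes
    # every case, and happens to be present in 5 low and 15 high people.
    people = []
    for group, n_with_z in (("low", 5), ("high", 15)):
        for i in range(1000):
            has_z = i < n_with_z
            people.append({"group": group, "cancer": has_z})
    return people


table_a = tabulate(world_pm_is_cause())
table_b = tabulate(world_pm_is_not_cause())
print(table_a)
print(table_b)
print("Observed tables identical:", table_a == table_b)
```

Both worlds hand the analyst the same counts, so no function of those counts alone can say which world we are in.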
This always-or-nothing is true for every hypothesis; that is, every set of data. The proposed mechanism is either always an efficient cause, though it sometimes may be blocked or missing some "key" (other secondary causes or catalysts) or be counterposed by some other cause, or it is never a cause. There is no in-between. Always-or-never a cause is tautological, meaning there is no information added to the problem by saying the proposed mechanism might be a cause. From that we deduce that a proposed cause which, absent knowledge of essence, is said or believed to be a cause based on some function of the data is always a prejudice, conceit, or guess. Because all we know is that the proposed cause is either always (albeit possibly sometimes blocked) or never an efficient cause, and this is tautological, we cannot find a probability that the proposed cause is a cause---conditioned only on that tautology, that is.
Consider also that the cause of the cancer could not have been high PM2.5 in the low group, because, of course, the 5 people there who developed cancer were not exposed to high PM2.5 as a possible cause. Therefore, their cause or causes must have been different if high PM2.5 is a cause. And even if PM2.5 is a cause, it is not necessarily the only cause. The same cause that operated in the low group, or some other cause entirely, might have struck some or all of the afflicted in the high group. In other words, since we don't know if high PM2.5 is a cause, we cannot know whether whatever caused the cancers in the low group didn't also cause the cancers in the high group. Recall that there may have been as many as 20 different causes. We conclude that nothing in the plain observations is of any help in deciding what is or isn't a cause. That statement has tremendous importance when considering standard statistical procedures.
Given the multitude of possible measures we can make on actual people---everything from whatever they've eaten over the course of their life to the environments to which they have been exposed, and on and on almost (but never in reality) endlessly---it is more than reasonable to suppose that we can discover some thing which is also different between the two groups besides exposure levels. Suppose it turns out---and something like this almost surely will---every person in the high group ate at least one more banana than did folks in the low group. That means whatever conclusions we reach via some statistical analysis, we could have equally well put down to having eaten more bananas. This is because the label "low PM2.5" and "high PM2.5" can be swapped for "low banana" and "high banana", a set of measurements just as true and valid. Call this the banana test.
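The banana test can be sketched in a few lines of Python. This is only an illustration under the supposition made above, namely that the banana split lines up exactly with the exposure split; the `summarize` helper and the label strings are made up for the example. Any summary computed from labels and outcomes comes out the same once the labels are swapped.

```python
def summarize(labels, outcomes):
    """Return (cases, group size, rate) for each label."""
    counts = {}
    for label, sick in zip(labels, outcomes):
        cases, n = counts.get(label, (0, 0))
        counts[label] = (cases + int(sick), n + 1)
    return {k: (c, n, c / n) for k, (c, n) in counts.items()}


# The observed outcomes: 5 of 1,000 in the first group, 15 of 1,000 in the second.
outcomes = [True] * 5 + [False] * 995 + [True] * 15 + [False] * 985

# Two equally true sets of labels for the very same people (hypothetical).
pm_labels = ["low PM2.5"] * 1000 + ["high PM2.5"] * 1000
banana_labels = ["low banana"] * 1000 + ["high banana"] * 1000

pm = summarize(pm_labels, outcomes)
banana = summarize(banana_labels, outcomes)

for label, stats in {**pm, **banana}.items():
    print(label, stats)

# Same numbers under either labeling: the data cannot prefer one label.
print(list(pm.values()) == list(banana.values()))
```

Whatever procedure is then run on these summaries, it sees only the numbers, so its verdict attaches to "banana" exactly as well as to "PM2.5".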
Clearly, there was some thing or some things different between the two groups. There must have been, because the number of people who got cancer was different, and the difference was caused, as must be true. But there is absolutely nothing in the observations alone that tells us what this cause was or what these causes were. We are not just discussing PM2.5. The criticisms here apply to every classical statistical analysis ever done.
Yet there is plausible suspicion that PM2.5 and not bananas might cause disease. We know this because we suspect it is in the nature of fine particulate matter to interact with, and possibly interfere with, the functioning of the lungs, the nature of which we also have some grasp. We do not know just based on the raw data---and never forget that we can only know what is true, though we can believe anything---that PM2.5 causes cancer. A reasonable condition, given what we have learned from other dose-response relationships, is that greater exposure to PM2.5 will give more opportunity for whatever it is in PM2.5 that causes cancer to operate. But we don't have that in this experiment. So we can only assume PM2.5 is a cause and make verifiable predictions to test this assumption.
Notice that in this approach we must assume that (high) PM2.5 is always a cause but that sometimes it is stopped from operating because of some lack: say, a person has to have a specific genetic code, or must inhale the dust only when breathing is labored, or some chemical must be present, or whatever---the exact conditions may be exceedingly complex. As we saw above, the only other assumption is that PM2.5 is not a cause, and if it is not, then we must not use a probability model supposing PM2.5 is a cause.
This implies the following curious result. Probability models aren't what you might have thought. If we assume PM2.5 is a cause, then we must conclude that it is sometimes blocked, else all 1,000 in the high group would have become ill. And recall that if we assume PM2.5 is a cause, it necessarily implies there is at least one other cause, a cause which must exist to account for the illnesses in the low group. Saying PM2.5 is a cause thus creates a mystery: what is this other cause (or causes)? But it also means that the probability model in the high group is not a model of cause: it is a model of blocking. The probability model doesn't say, not really, "This person has a this-or-that chance of developing illness if exposed to PM2.5", but rather, "The chance the causal effect of PM2.5 is blocked is this-and-such." And even that pronouncement is still conditional on believing the other cause or causes besides PM2.5 don't operate in the presence of PM2.5, and where is the evidence for that? There is none. Probability models always express uncertainty. They are never proof of cause, which is why automated attempts to "prove" cause in large collections of data must fail. Uncertainty always lingers unless there is knowledge of power and essence. Probability models themselves are explored in depth in the next chapter.

Cancelled indeed, William! That’s what happens when you speak the truth. What a phenomenal lecture—I absolutely love this stuff! Why? Because this is exactly what I’ve been pounding the table about for the last 20 years.
Just a while back at dinner with friends, this very topic came up. Someone casually mentioned, “Oh, that Moderna vaccine has 95% efficacy,” and I couldn’t help but call bullshit! It’s precisely what you’ve laid out here. Those studies don't prove anything. It's crap! They leave out a googolplex of variables. When I look at these vaccine studies I just start laughing. I mean, really, do they expect anyone with a working brain cell to buy this crap?
Keep up the fight, William. This work is crucial, and the world needs more of it!
Variables are the killer in RCTs. Each human being is unique in biochemical makeup, genetic inheritance, miasmatic inheritance, even just blood type. There is no such thing as one size fits all.
"One man's meat is another man's poison".