Today we cover the math, which you don’t strictly need if you can’t calculate, behind how to think about Research Shows headlines and papers. This will be an invaluable exercise. You must try it, even if you have no other statistical, mathematical, or scientific experience.
Happy Thanksgiving. No class on 2 December.
Jaynes’s book (first part): https://bayes.wustl.edu/etj/prob/book.pdf
Permanent class page: https://www.wmbriggs.com/class/
Uncertainty & Probability Theory: The Logic of Science
Link to all Classes.
Video
Links:
Bitchute (often a day or so behind, for whatever reason)
HOMEWORK: Read Jaynes, and see below how you will find a Research Shows paper and apply the methods you learned today.
Lecture
You and I, dear reader, have seen hundreds, and more than hundreds, of Research Shows headlines and papers. We have investigated these works in depth, and shown how to critique them in a rigorous fashion. But we do not always have the ability to do that. We can still think, though, or have experience which contradicts, or even confirms, what these Research Shows claims show. Today, following Jaynes, we show you how to think about these things.
If you want to grasp the entire mathematical argument, you must read Jaynes 5.1 and 5.2. I do not detail it below. There is no point in my repeating material which he covered, and said better. I go over the highlights in the lecture, and add amplifications, but there’s a lot I left out, too.
Here are also clues to what is coming. What we did today could be considered a form of hypothesis testing, only it wasn’t. A hypothesis test is a bizarre blend of probability and decision, where the decision is made for you, using criteria that may have no relevance for you, and which answers a probability question no one ever asks. Except certain statisticians, who have confused the point of their studies. All of which we’ll come to in time.
Here Jaynes simply calculates the probability of a proposition given background information, and then again using that same background information and adding to it new information, in the form of certain claimed experimental evidence. That’s it. There’s nothing more. As I have told you many times, this is it. This is everything. Every lesson in this class is a variation of Pr(Y|X). That is the beauty of logical probability. No special apparatus is needed.
This is what hypothesis testing should be, but isn’t. It will turn out that real hypothesis testing is nothing like this, and is a bizarre, strange, peculiar procedure better termed mathematical magic.
What I want to emphasize here is the technique of adding your own hypotheses to those provided by “Research Shows” headlines and researchers. We do not have to be limited to what is put in papers. We can judge evidence for ourselves. Which doesn’t mean, of course, that we’ll get it right.
If a Research Shows (RS) paper uses ordinary statistics (frequentist or Bayes), it will have a hypothesis it wants to tout. Call it H_r; ‘r’ for research. This is usually “tested” against a simple or trivial hypothesis, usually called a “null”. Often, too often, this is a straw man. In today’s example it wasn’t: it was no skill in ESP, or no ESP ability. Call this H_n; ‘n’ for null.
We have ordinary Bayes theorem, using our background knowledge (or assumptions, etc.) X, and whatever new data D we have in the RS paper. We can write

Pr(H_r|DX) = Pr(D|H_r X)Pr(H_r|X) / [ Pr(D|H_r X)Pr(H_r|X) + Pr(D|H_n X)Pr(H_n|X) + c ],

where c, which collects any other hypotheses we might entertain, is usually 0. Make sure you say to yourself what each of these terms means. (I mean it. Do it.) Now, using ordinary hypothesis testing, you don’t get any of this. You should, but don’t. But you do get the implication from the typical RS paper that Pr(H_r|DX) is high, if not “certain”. Sadly, this calculation is almost never made, and we have to infer it. Once we do, we can deduce Pr(H_n|DX) (since the two must sum to 1).
Jaynes’s point is that we do not have to accept the word of the RS researchers. We are free to add contrary hypotheses that also explain the data D. In the ESP example, we listed a bunch that all came down to various forms of cheating, bias, bad data, or mistakes. These all become our c:

c = sum_i Pr(D|H_i X)Pr(H_i|X),

where the H_i are our added hypotheses.
This notation is a little misleading, because we are also considering c, so when we do, we have to adjust Pr(H_r|X) and Pr(H_n|X), so that Pr(H_r|X) + Pr(H_n|X) + sum(Pr(H_i|X)) = 1. And remember this is our X, not the researchers’. That said, if c is large, if we are able to give good weight to alternate hypotheses beside or beyond those given by the researchers, then Pr(H_r|DX) will have a tough time being large. Which is to say, it will be small, and therefore H_r will be more difficult to believe.
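To see the mechanics, here is a minimal sketch, in Python, of the calculation just described. Every number in it is invented for illustration; none comes from Jaynes or from any paper. The point is only to show what happens to Pr(H_r|DX) when we give our own alternatives some prior weight.

```python
# Bayes theorem over a finite set of competing hypotheses.
# H_r = researchers' hypothesis, H_n = the "null",
# H_alt = our own added alternatives (cheating, bias, bad data, mistakes).
# All numbers below are hypothetical.

def posteriors(priors, likelihoods):
    """Return Pr(H|DX) for each hypothesis H, given Pr(H|X) and Pr(D|HX)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: round(joint[h] / total, 3) for h in joint}

# Pr(D|HX): suppose the data fit H_r and our alternatives equally well.
likelihoods = {"H_r": 0.9, "H_n": 0.1, "H_alt": 0.9}

# Case 1: only the researchers' hypotheses are entertained (c = 0).
print(posteriors({"H_r": 0.5, "H_n": 0.5}, likelihoods))
# {'H_r': 0.9, 'H_n': 0.1}

# Case 2: we add our alternatives, re-summing the priors to 1.
print(posteriors({"H_r": 0.25, "H_n": 0.25, "H_alt": 0.5}, likelihoods))
# {'H_r': 0.321, 'H_n': 0.036, 'H_alt': 0.643}
```

The data have not changed between the two cases; only our willingness to entertain other explanations has, and Pr(H_r|DX) drops accordingly.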
Whether to act like H_r is true is entirely different from the probability we calculate for it. That depends on what you will do with it, or what somebody will do with it to you. Alas, ordinary hypothesis testing makes the decision for you, even if, for you, or for anybody, it is a lousy decision. These are matters we’ll leave for another day.
Finally, our homework. I searched (and you will too) for “research shows” and this was the first study headline that came up: “New research shows younger and middle-aged adults have worse long COVID symptoms than older adults”. This points to the paper “Neurologic Manifestations of Long COVID Disproportionately Affect Young and Middle-Age Adults” in Annals of Neurology by Choudhury et al.
Their conclusion: “Younger and middle-age individuals are disproportionally affected by Neuro-PASC regardless of acute COVID-19 severity.” By “Neuro-PASC” they mean neurological symptoms from the post-acute sequelae of covid. They said technical things like this (with my emphasis):
…10 months from COVID-19 onset, we found significant age-related differences in Neuro-PASC symptoms indicating lower prevalence, and therefore, symptom burden, in older individuals. Moreover, there were significant age-related differences in subjective impression of fatigue (median [interquartile range (IQR)] patient-reported outcomes measurement information system [PROMIS] score: younger 64 [57–69], middle-age 63 [57–68], older 60.5 [50.8–68.3]; p = 0.04) and sleep disturbance (median [IQR] PROMIS score: younger 57 [51–63], middle-age 56 [53–63], older 54 [46.8–58]; p = 0.002) in the NNP group, commensurate with higher impairment in quality of life (QoL) among younger patients.
Those “p”s in the parentheses are the results of ordinary hypothesis testing, which, the ritual tells us, have to be smaller than the magic number of 0.05. Their conclusion said:
Younger and middle-age individuals are disproportionally affected by Neuro-PASC regardless of acute COVID-19 severity. Although older people more frequently have abnormal neurologic findings and comorbidities, younger and middle-age patients suffer from a higher burden of Neuro-PASC symptoms and cognitive dysfunction contributing to decreased QoL.
They ask us to believe, roughly, that the young and those close to middle age experienced subjective impressions of fatigue because of “long covid”, and that old people, who, you will recall, had worse covid symptoms than the young, did not have as many. This is their H_r. Their null, H_n, is that there is no difference in these kinds of symptoms or impressions. Naturally, they believe Pr(H_r|DY) is very high, where Y is their background information about such matters.
My Pr(H_r|DX), however, is low, mainly for two reasons. First, my Pr(H_r|X) was low. I don’t buy that “long covid” is real, in the sense that the symptoms are all over the place, inconsistent, and not really suffered by those who haven’t heard of it, or were skeptical of it. I do leave open the possibility I’m wrong, and that some specific, long-term, hard, measurable effects might exist in some people for whatever reason, as opposed to vague feelings and goofy scores on questionnaires (which try to quantify the unquantifiable).
Second, I also have a few alternate hypotheses that would explain this same kind of data, given these results are meant to apply to the public at large. One of these is the narrow range of people chosen for this study. I’m not confident the same results would be found if somebody else did the picking. I don’t mean fraud, I mean bias. For instance, some of the patients came from “video-based telehealth visit[s].” Another: I put very little trust in quantified questionnaires for unquantifiable things. Another: they tested for a lot of things, and found only a little. Something is always going to be “correlated”, and verified with wee Ps, eventually.
The end result is that, for me, my Pr(H_r|DX) is not that different from my Pr(H_r|X). I started low and ended low. Notice I do not have to quantify any of this. I could, and could use the math, but that’s only necessary if you’re ate up about it. Which I’m not. But if you are, then go ahead and plug some numbers in.
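If you do want numbers, here is one way it might go, using the same kind of calculation as in the sketch above. Every figure is my own invention, chosen only to show that starting low and giving weight to alternatives (selection bias, questionnaire noise, multiple testing) means ending low; nothing here comes from the paper.

```python
# Entirely hypothetical numbers for the long-covid example.
priors = {"H_r": 0.05, "H_n": 0.35, "H_alt": 0.60}   # my Pr(H|X)
likes  = {"H_r": 0.80, "H_n": 0.20, "H_alt": 0.80}   # my Pr(D|HX)

joint = {h: priors[h] * likes[h] for h in priors}
total = sum(joint.values())
print({h: round(joint[h] / total, 2) for h in joint})
# {'H_r': 0.07, 'H_n': 0.12, 'H_alt': 0.81}
```

With these made-up numbers, Pr(H_r|X) = 0.05 becomes Pr(H_r|DX) of about 0.07: low in, low out.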
Indeed, this is your homework. You too must find a Research Shows headline or paper and critique it in a way sensible to you.
Which doesn’t mean sensible to everybody. This kind of analysis doesn’t make you right and the researchers wrong. It only explains how both sides view the evidence.
And that is what probability is! The expression of uncertainties given certain assumptions.
Happy Thanksgiving. Again, no class on 2 December.
"Mathematical magic."
Same as in most areas of Current Year Science:
"Mythematics"
Here are a few alternatives for "Research Shows":
• Evidence suggests
• Findings demonstrate
• Data reveal
• Analysis confirms
• Empirical evidence points to
• Scholarly work highlights
• Research suggests
• Investigations reveal
• Studies support the idea that
• The evidence highlights
• Observations suggest
• Studies have found
• Research backs up
• Findings point out
• The data show
• Studies Indicate
All of these will return equivalent "research" or "studies" loaded with the same garbage Briggs has shown above. I've personally read hundreds of research papers and rare is the incidence of analysis that deviates from the above outline. What I have found is that papers written in the mid-1960s and prior were done more thoughtfully and were less inclined to depend on wee p's. Some of the best research in medicine occurred prior to WWII.