The Difference In Means & Why P-Values Should Not Be Used
Excerpt From The Lake Michigan Dialogues
"Say, Briggs. Since you're Statistician to the Stars!, explain to me how I can tell if two means are different in a simple way that even I can understand."
You got two groups of some measurable observable? And measures from items in each group?
"Yes."
Cinch. Calculate the mean for both groups. Got them?
"Yes."
Are they the same?
"No."
Then you know they are different.
"No. I mean I want to know if they are really different."
Are they the same?
"They are not."
Then they are really different.
"Yeah, okay, ha ha. But how can I tell if they are truly not the same? Isn't there some kind of test?"
Sure. Look at them.
"I don't understand this at all."
I'm not sure how I could make it any simpler. If they're the same, they're the same; if they're different, they're different. End of story.
"Wait. This isn't what other statisticians tell me."
Ain't it? I'll be dogged. What do these other, classical statisticians tell you?
"Look. I want to know if the populations are different."
Aha. You don't want to know if these means are different. You want to know if the means in samples you haven't yet collected will also be different.
"Yes. I think."
And you want to know if they are different now because something caused them to be different?
"Yes."
I got you now. Yes, I understand. I follow you. No, sorry. Those other statisticians were right. They can't tell you these things, or don't. Best they can do is to calculate some bizarre number that tells you whether what you have already observed is different if you assume they are not different.
"Wait, what? They're different if we assume they are the same?"
That's it.
"Oh, wait. You mean the p-value."
Yes, utterly useless.
"Why?"
Doesn't answer any question anybody wants answered.
"They say it does."
And politicians say they want what is best for you. Those statisticians are just as wrong. The p-value can't tell you if the differences you saw have different causes. And it can't tell you the chance new samples might be different. It also can't tell you the chance they will be the same. It assumes they are the same.
"Then why do people calculate them?"
Why do people do anything? Magic, habit, custom, appeal to authority, ignorance of alternatives. Getting a wee p-value is like winning a science lottery. It is always a cause for celebration, but nobody knows why. It is silent on Reality. The p-value has no bearing on the two questions you want to know. At all.
"There has to be more to them than that. There wouldn't be so many smart people using them otherwise. And no smart ass remarks about smart people, please."
Well, think of it like this. If you have a fixed unchangeable unalterable determined set population, where every object in each group has a measurable number that will not change no matter what, then there are actual means for each group. Got that?
"Sure."
Okay, then if you take a sample, as you did, you can make a guess of what the remaining observations would be; and also what the means would be. Here's an example. You know there are 10 in each group---this is a fixed unchangeable number: it will and must be 10 forevermore---and you sample 9 of each. The two means you calculate in your sample will likely be close to the actual means of all 10 each. Yes?
"Yes, I can see that."
You're just one number away from knowing the true actual means, right?
"That's so."
All right. Most of the time, with real numbers in real situations, the means of the whole populations won't be the same. Take the ages of your 10 closest male relatives and your 10 closest female relatives, for example. Yes?
"Not exactly the same. No. But they may be close."
We're not talking "close". We're talking the same or different. Close wins no prizes.
"I'm not following you."
Is it that hard? You have two fixed populations that can be measured, and their actual means, experience shows, likely won't be the same. Not precisely the same. That's easy enough, isn't it?
"I guess so."
All right. Now calculate the p-value on the 9 you observed in each group. That, experience also shows, will produce a non-wee p-value.
"Non wee?"
One bigger than the magic number. Do I have to tell you what this magic number is?
"No, I guess not."
Okay. That means the p-value, since it assumes the means are the same, forces you to say they are the same, even when we see they are different.
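(To make the two-groups-of-10 example concrete, here is a minimal sketch with made-up numbers; the ordinary two-sample t-test from scipy stands in for whatever test a classical statistician would run. The particular values, and the choice of test, are assumptions for illustration only.)

```python
# A sketch of the finite-population point: two fixed groups of 10,
# a sample of 9 from each, and the p-value that results.
from statistics import mean

from scipy.stats import ttest_ind

# Fixed, fully known populations of 10 measurements each (made-up ages).
group_a = [34, 61, 45, 52, 38, 70, 29, 55, 48, 66]
group_b = [31, 58, 47, 50, 40, 68, 33, 53, 49, 60]

print(mean(group_a), mean(group_b))   # the actual means: 49.8 vs 48.9, different

# Observe 9 of each (leave the last one out).
sample_a, sample_b = group_a[:9], group_b[:9]
print(mean(sample_a), mean(sample_b))  # close to the actual means

# The classical test assumes the "true" means are equal.
stat, p = ttest_ind(sample_a, sample_b)
print(p)  # a non-wee p-value, even though the actual means plainly differ
```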
"Yeah, but sometimes means in these small samples can be equal."
Make those populations smaller still. And look, even better, we can specify them in advance. The population is 2 in each group, and there are no more. Ever. You observe only 1 in each. You observe the number 5 in group A and 10 in group B. The other numbers, which are not yet observed, are 5 again in group A and 9 in group B. But the p-value doesn't know or see these. The p-value, if you can even calculate it, and you might not be able to, will certainly be non-wee. We know the means are different. Yet the p-value insists they are the same. It is absurd.
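(The tiny two-per-group example, written out; nothing here beyond the numbers given above.)

```python
# Each population has exactly 2 members; one is observed per group.
group_a = [5, 5]    # observed: 5; unobserved: 5
group_b = [10, 9]   # observed: 10; unobserved: 9

print(sum(group_a) / len(group_a))  # actual mean of A: 5.0
print(sum(group_b) / len(group_b))  # actual mean of B: 9.5, plainly different

# A two-sample t-test on one observation per group has zero degrees of
# freedom and no within-group variance, so the p-value cannot even be formed.
```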
"Yeah, okay, maybe. But your examples are tiny. What about large examples?"
It makes no difference. That's the point. All samples and all populations in real life are in actuality finite. They are fixed. The same criticism thus applies. It's worse, even, because as you increase the population or sample the chance of the means being equal with real numbers on real measurable things decreases fast. You see that, don't you?
"Maybe. They still night be the same. And don't p-value people say something different? I don't think they'd agree with you."
You're right, they wouldn't. Because they believe, and must believe, all populations are infinite, at least potentially. Then, being infinite, they are imbued with "real" or "true" means, which can never be known, but only "estimated." The p-value is making statements about these forever unobservable so-called true means.
"Wait. Now I'm remembering. That sounds more like what I was taught."
It's even better, because these classical statisticians have the power of the gods. They create one of these "true" means every time they imagine a new population or use a p-value, which must have one of these "true" means to justify its use.
"They don't say that."
They don't, but that's what the theory they claim to believe insists upon. That theory is frequentism, and it must have infinite sequences.
"I really don't follow that at all."
Doesn't matter. Ignore it if you don't get it. The point remains the p-value says nothing about the sample you have in hand. It says nothing about the chance future samples will be the same or different. And it says zippety-do-dah about what caused any numbers.
"So what do I do?"
Investigate the cause or calculate the chance new samples are different.
"That's it?"
That's it.
"So I'll ask again: what do I do?"
About what?
"About telling if the two means are different."
I must be doing a poor job. I have already told you. Are they the same?
"No."
Then they are different. Look. It's not complicated. That's the answer. There is no other.
"Oh, right. No, I mean, how do I know they will be different in new observations."
You could reason that since they're different now, they'll likely be different again. And be done with it. No need to quantify the uncertainty.
"But I want to have a solid number behind this. I want to do science."
All right. Are you sure you only want to know if the means will be different? Or are you only asking about means because everybody else does?
"Let's start with saying I really want to know about the means."
It's your party. It's easy any which way. Propose some probability model for the two groups, condition it on the observations you already made, then calculate the probability new observations will be different.
"It's that simple?"
It's that simple.
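(A minimal sketch of that recipe, assuming a normal probability model for each group with a flat prior; that model choice, and the made-up observations carried over from the earlier sketch, are assumptions for illustration, not the only possibility. Everything is conditioned on the data actually observed, and the answer is a probability about new observations.)

```python
import numpy as np

rng = np.random.default_rng(1)

# The same made-up observations as in the earlier sketch: 9 per group.
a = np.array([34, 61, 45, 52, 38, 70, 29, 55, 48], dtype=float)
b = np.array([31, 58, 47, 50, 40, 68, 33, 53, 49], dtype=float)

def predictive_draws(x, size):
    """Draws of one new observation under a normal model with a flat prior:
    a shifted, scaled Student-t with n - 1 degrees of freedom."""
    n, m, s = len(x), x.mean(), x.std(ddof=1)
    return m + s * np.sqrt(1 + 1 / n) * rng.standard_t(df=n - 1, size=size)

new_a = predictive_draws(a, 100_000)
new_b = predictive_draws(b, 100_000)

# Probability the next A observation beats the next B observation.
print("Pr(new A > new B | data, model):", (new_a > new_b).mean())

# Probability the new observations differ by more than 5 units, a difference
# that might actually matter to you.
print("Pr(|new A - new B| > 5 | data, model):", (np.abs(new_a - new_b) > 5).mean())
```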
"And the p-value doesn't do that?"
Nope, no way. The p-value has nothing to do with anything, except for the belief the two means you already measured are the same, even if they're different. Like we just saw, it's so bizarre that nobody ever remembers what it means.
"Well, what about the Bayesian posterior? Isn't that the same idea?"
Nope. Nuh-uh. The posterior says something about model innards. There may be technical, mechanical reasons to look at these, but they are of no real interest to man or beast. The posteriors don't answer your questions either. They aren't as nutty as p-values, but they are just as misleading. Don't play with them.
"Let me get this straight. All I have to do to quantify the probability the means will be different in new observations is to use a probability model to just, what, give me that probability?"
Amazing, ain't it? And not only that. You don't have to do just means. You can do any function of the data---don't forget the mean is just one of an infinite number of functions you can apply to observed data. Everything works in this universal probability scheme. Data can differ in more ways than the means. Like maximums, minimums, the chance of being over or under this or that value. Whatever you want. Don't look at means unless it's means you really want.
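(Continuing the same sketch: once you can simulate new observations from the model conditioned on the old ones, any function of the new data can be asked about. The thresholds below are arbitrary, chosen only for illustration.)

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([34, 61, 45, 52, 38, 70, 29, 55, 48], dtype=float)

nsim, n_new = 100_000, 9
n, m, s2 = len(a), a.mean(), a.var(ddof=1)

# Posterior for (mu, sigma^2) under the normal model with a flat prior,
# then whole simulated future samples of 9 new observations at a time.
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=nsim)
mu = rng.normal(m, np.sqrt(sigma2 / n))
future_a = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(nsim, n_new))

print("Pr(max of 9 new obs > 80 | data, model):", (future_a.max(axis=1) > 80).mean())
print("Pr(min of 9 new obs < 20 | data, model):", (future_a.min(axis=1) < 20).mean())
print("Pr(mean of 9 new obs > 55 | data, model):", (future_a.mean(axis=1) > 55).mean())
```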
"I want means. How do I do this?"
You mean, mechanically?
"Yes."
Oh, it's easy enough. Lots of software out there. I can give you some tips once you show me what the data looks like.
"Isn't there a true probability model I can use?"
Maybe. Depends. If you know the causes of the measures, you have your true probability model. If you don't, you still may be able to deduce one based on other considerations. Stuff you know about the observables, things like that. Or you can do like everybody else: don't think about it and use a standard model.
"That works?"
Maybe. Only way to find out is to try. If you can't deduce a model, though, don't get too excited about the results. You're flying blind. Like everybody else.
"Okay, fine. I'll wait until you show me which software. What about knowing the cause?"
We'll leave that for another day.