Statistics (Philosophy) Quiz: See If You Really Know What You're Doing Using Tests
This is from Gerd Gigerenzer's "Mindless statistics," The Journal of Socio-Economics 33 (2004), 587–606.
Have a go before looking at the answers (I'm giving my own, not quoting Gigerenzer). Send this to anybody you see using null hypothesis significance testing.
Suppose you have a treatment that you suspect may alter performance on a certain task. You compare the means of your control and experimental groups (say 20 subjects in each sample). Further, suppose you use a simple independent means t-test and your result is significant (t = 2.7, d.f. = 18, p = 0.01). Please mark each of the statements below as "true" or "false." "False" means that the statement does not follow logically from the above premises. Also note that several or none of the statements may be correct.
1. You have absolutely disproved the null hypothesis (that is, there is no difference between the population means).
[ ] true  [ ] false
2. You have found the probability of the null hypothesis being true.
[ ] true  [ ] false
3. You have absolutely proved your experimental hypothesis (that there is a difference between the population means).
[ ] true  [ ] false
4. You can deduce the probability of the experimental hypothesis being true.
[ ] true  [ ] false
5. You know, if you decide to reject the null hypothesis, the probability that you are making the wrong decision.
[ ] true  [ ] false
6. You have a reliable experimental finding in the sense that if, hypothetically, the experiment were repeated a great number of times, you would obtain a significant result on 99% of occasions.
[ ] true  [ ] false
No cheating.
No cheating.
No cheating.
No cheating.
No cheating.
No cheating.
No cheating.
1. FALSE. Obviously you have proved nothing about any "null" hypothesis. We don't even know what the means of the two groups were, but we can deduce they were not equal: if they were, we'd have t = 0 (if you can't recall why, see the t-test formula below).
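For reference, here is the standard equal-variance, independent-samples t statistic (the textbook form I assume the quiz intends):

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.
$$

The numerator is the difference in observed sample means, so t = 0 exactly when those means are equal; t = 2.7 tells us they were not.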
The obvious confusion begins here, in thinking this is a "sample" from some "universe", which it might be, or might not. Either way, we know everything about this data, so we don't need to make any probability judgments about it, unless we want to make predictions about new observations. If we're not going to have new observations, again, we don't need probability.
We don't know the cause of every value. We can guess the treatment might be a cause. If it is, it joins the list of all the other causes operating on the measurement (whatever it is).
2. FALSE. The standard null hypothesis is that certain parameters representing the central values of normal distributions, parameters which have a real existence in some kind of Platonic realm, are equal to one another. The real existence of these parameters is taken as a given. They are never observed. Not ever. They cannot be observed. Not ever. So there is no way, not ever, to know whether they are equal or unequal. That they exist is a pure matter of faith.
3. FALSE. Here we go. By "population means" the authors mean those Platonic parameters of the normal distributions representing the uncertainty in the measure. We have proven nothing about them, even if perchance they do exist. We don't know if they're equal, unequal, or anything.
The p-value in particular says nothing about their value.
Memorize this: the p-value is the probability of seeing a t-statistic larger than 2.7, or smaller than -2.7, if the same experiment were repeated an infinite number of times, and if those Platonic parameters existed, and if they were equal to one another. Only that, and nothing more.
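That definition can be checked by brute force. Here is a minimal sketch (my illustration, not Gigerenzer's; it assumes numpy and scipy, normal data, and two groups of 20 as in the quiz, with the null true by construction):

```python
# Minimal sketch of the definition above: two groups drawn from the SAME
# normal distribution, so the "Platonic parameters" are equal by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, t_obs = 20, 100_000, 2.7

count = 0
for _ in range(reps):
    a = rng.normal(size=n)
    b = rng.normal(size=n)
    t, _ = stats.ttest_ind(a, b)  # equal-variance two-sample t-test
    if abs(t) >= t_obs:
        count += 1

print(count / reps)  # fraction of |t| >= 2.7 under a true null: about 0.01
```

Note the finite-repetition frequency only approximates the p-value; it matches exactly only in the limit, which matters again in answer 6 below.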
4. FALSE. You do not deduce it universally, but you can do it locally, in the following sense. If you assume the only two causes operating on the measure are your treatment and whatever is operating on the control, then, because the measured means were not equal (which we know because t does not equal 0), the treatment is a cause.
Which isn't learning much, because we started by assuming the treatment is a cause.
5. FALSE. The null, we saw, was the equality of two unobservable parameters. Rejecting this, and saying they are unequal, is an error when they are in fact equal.
We do not know the probability the two parameters are equal; thus we cannot know the probability they are unequal. The p-value is silent on both these probabilities. Thus we don't know the probability of our mistake, assuming we made one, nor the probability of a correct act, assuming we performed one.
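To see that silence concretely, here is a sketch (mine alone; the fifty-fifty mix of true and false nulls and the 0.5 SD effect size are loud assumptions the p-value knows nothing about) of how often a rejection at p ≤ 0.01 is a mistake:

```python
# Sketch: how often is a rejection at p <= 0.01 wrong? Depends entirely on
# ASSUMPTIONS external to the p-value: the mix of true/false nulls and the
# effect size when the null is false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 20, 50_000, 0.01

wrong = rejections = 0
for _ in range(reps):
    null_true = rng.random() < 0.5        # assumed mix of true and false nulls
    shift = 0.0 if null_true else 0.5     # assumed effect size when false
    a = rng.normal(size=n)
    b = rng.normal(loc=shift, size=n)
    _, p = stats.ttest_ind(a, b)
    if p <= alpha:
        rejections += 1
        wrong += null_true                # rejection was a mistake

print(wrong / rejections)  # not 0.01; it moves with the assumptions above
```

Change the mix or the effect and the answer changes; the p-value stays 0.01 regardless, which is the point.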
6. FALSE. We know nothing about the reliability of the observations. Given how badly much of today's The Science is done, we can make no assumptions, either. But, assuming scrupulosity on the part of the experimenters, the statement is still false.
It is not "a great number of times" the experiment has to be repeated to get to that 0.01---which only works if the null is true. It must be an infinite number of times. Frequentist theory is silent on all finite measures. It only works at the limit.
It must also be true that the Platonic realm holding the parameters is real, as are the parameters.
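Even granting, purely for illustration, everything the frequentist asks for (real parameters whose difference exactly matches the observed effect, a hypothetical value I back out from t = 2.7), a repeat of the experiment comes out significant at 0.01 only about half the time, nowhere near 99%. A sketch under those assumptions:

```python
# Sketch: grant every frequentist assumption, including a true effect sized
# exactly to reproduce t = 2.7 on average. How often does a repeat experiment
# reach significance at 0.01?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 20, 50_000
d = 2.7 / np.sqrt(n / 2)   # hypothetical true effect implied by t = 2.7

hits = 0
for _ in range(reps):
    a = rng.normal(size=n)
    b = rng.normal(loc=d, size=n)
    _, p = stats.ttest_ind(a, b)
    hits += p <= 0.01

print(hits / reps)  # roughly one half, nowhere near 0.99
```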
Conclusion: Do not use p-values or testing of any kind.