Question from Anon (with added paragraphifications):
Message: Dear Dr Briggs, I am a layman in statistics, who has tried to avoid this kind of sport as much as possible, so I would be very grateful if you could help me to understand the following issue: I have a discussion about reporting the number of significant digits of mean values, calculated on the basis of multiple measurements of a given property.
As a biochemist, my take on this is very simplistic: if a device has a resolution of say 1 digit, mean values are reported with a significance of at best 2 digits.
For instance: if volumes are measured on a device having a 1 ml resolution (precision), mean values are reported with a 1 digit precision at best, e.g. 1.3 ml.
On the internet, however, the common opinion is that resolution is increased by repeating a measurement many (infinite) times. Example: sea level is measured with a resolution of 1 cm, but by taking many measurements, it is claimed that one gets a mean value with a precision of, say, up to 1 mm.
My (simplistic) take on this is that the mean value is a mathematical artefact per se and as such does not necessarily have a meaning in the real world.
For instance: in the Netherlands the average number of children in elementary school classes is 30.66 (SD: 1.47), yet there are no classes with this exact number of children (as far as I am aware). I would be much obliged if you would explain to me (as if I am a labrador) what's wrong with my line of reasoning, and what then the sense is of reporting mean values etc. with precisions that exceed the instrument's physical capabilities. With kind regards, Anon
Thanks, Anon. Here's my semi-contrarian take. For the busy: the answer is there is no answer; not a single one-size-fits-all answer.
Start with the idea that there is no such thing as probability. Therefore, there are no such things as probability distributions. And if there are no such things as probability and probability distributions, there are no such things as parameters of probability distributions.
That is, none of these things have existence in Reality. They are useful, at times, mathematical tools to help quantify uncertainty. They are epistemological aids. But they do not exist, i.e. they do not have being.
Now any set of measurements of your beaker, each capable of being measured to 1 ml, or to whatever level, will have a mean. That is, add up all measurements, divide by the number of measurements, and that is the mean. Of those measurements. It can be reported to as many digits as you care to write. Infinite, even. 10.1876156161257852715566561871371215118128712121010121012021 ml. Or whatever.
That is the mean of those measurements. By definition. This many-digit number is also verifiable, in the sense that you can go back and check your calculations and see if you made any mistake. All assuming the measurements are the measurements. I mean, the measurements might not be accurate representations of the contents of those beakers, due to error, bias, or whatever. But they are the measurements you used in calculating the mean.
Well, so far we've said nothing (though it took many words to say this nothing). Except that if you take an average, it is the average of those numbers. Indeed, if you do not report all the digits in that average, you are cheating, in a sense. You have shaved away information. If the mean was 10.1876156161257852715566561871371215118128712121010121012021 ml, and you report only 10.2 ml, then you have said what was not so.
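To make that concrete, here is a minimal sketch in Python (the readings are invented, not Anon's) of taking the mean of some 1-ml-resolution measurements and reporting it at full precision versus rounded:

    from statistics import mean

    # Invented beaker readings, each recorded to the device's 1 ml resolution.
    measurements = [10.0, 11.0, 10.0, 10.0, 11.0, 9.0, 10.0]

    m = mean(measurements)

    # The mean of *these* numbers, to as many digits as the arithmetic gives.
    print("full-precision mean:", repr(m))      # 10.142857142857142
    # Rounding shaves away information about the calculated mean itself.
    print("rounded to 1 ml:    ", round(m))     # 10
    print("rounded to 0.1 ml:  ", round(m, 1))  # 10.1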
We're done, really.
Unless we want to say something about future (or unknown) measurements. Then we can use those old measurements to inform our uncertainty in future measurements.
Any set of future measurements will have a mean, just like the old set. The one big assumption we must make is, if we want to use the old measurements, that future measurements will have the same causes in the same proportions as in the old set. (I'll stop here today with just saying that on that large subject.)
If we believe in the similarity of causes, we might model our uncertainty in that new mean using the old measurements. We can also, of course, model our uncertainty in the new measurement values, or of any function of them (the mean is one among an infinity of functions).
So then. Given the old data and old mean, what is the probability the new mean will be 10.1876156161257852715566561871371215118128712121010121012021 ml? Can't answer, because it depends on the model you use, naturally. Though we might guess that the probability of 10.1876156161257852715566561871371215118128712121010121012022 ml (note the last digit) is not too different. And very likely, again depending on the model, the probability of either is weer than wee. Mighty small numbers. Negligibly small, I would guess, for almost any decision you care to make.
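A sketch of that, assuming, purely for illustration, a normal model fit to some invented old measurements (the numbers here are mine, not the post's; pick another model and the answers change):

    import random
    from statistics import mean, stdev

    random.seed(1)

    # Invented old measurements, recorded to 1 ml resolution.
    old = [10.0, 11.0, 10.0, 10.0, 11.0, 9.0, 10.0]
    mu, sigma, n = mean(old), stdev(old), len(old)

    # Simulate many future sets of n measurements under one assumed model.
    sims = [mean(random.gauss(mu, sigma) for _ in range(n))
            for _ in range(100_000)]

    target = 10.1876156161257852715566561871371215118128712121010121012021
    # Chance the new mean hits that exact many-digit value: weer than wee.
    print(sum(abs(s - target) < 1e-12 for s in sims) / len(sims))
    # Chance it lands within +/- 0.1 ml of the old mean: not so small.
    print(sum(abs(s - mu) <= 0.1 for s in sims) / len(sims))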
And it is the decision you make with your model-slash-prediction that drives everything.
For ease, let's call our old calculated mean m. The probability the new one will be exactly equal to m is probably exceedingly small (depending on the model, of course). The probability it is m +/- 0.000000000000000000000000000000000000000000000000000000001 ml (a number an order of magnitude more precise than m; i.e. one less zero) is also likely negligibly small.
And so on up to some point. Maybe that point is +/- 0.01 ml, or maybe it is +/- 0.1 ml, or it could even be +/- 1 ml, such that we are, say, 90% sure the new mean will be in the old mean +/- some window. The window depends on the decision you would make. If you would not do anything differently for a new mean of (say) 10.18 ml or 10.13 ml or 10.25 ml, then maybe buckets of 0.25 ml are fine. Then you'd report to the nearest 0.25 ml. Or maybe you're in some new wild physics experiment of the disappearingly small, and then you want that tight, tight window.
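Here is one way to put numbers on that, continuing the same made-up normal-model assumption (the 0.25 ml bucket is only an example of a decision-relevant window, not a recommendation):

    import random
    from statistics import mean, stdev

    random.seed(2)

    old = [10.0, 11.0, 10.0, 10.0, 11.0, 9.0, 10.0]  # same invented readings
    mu, sigma, n = mean(old), stdev(old), len(old)

    # Simulated future means under the assumed normal model.
    sims = sorted(mean(random.gauss(mu, sigma) for _ in range(n))
                  for _ in range(100_000))

    # A window covering the middle 90% of the simulated future means.
    lo, hi = sims[int(0.05 * len(sims))], sims[int(0.95 * len(sims))]
    print(f"90% window for the new mean: {lo:.2f} to {hi:.2f} ml")

    # Report only to the bucket your decision needs (0.25 ml here, by assumption).
    bucket = 0.25
    print(f"old mean to the nearest {bucket} ml: {round(mu / bucket) * bucket} ml")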
Of course, you cannot verify the actual future observations closer than +/- 1 ml, but you can verify the mean to greater accuracy. The calculated mean, I mean (a small pun). You can't verify the actual contents' mean to closer than 1 ml, at least not with just the measurements you take (you may be able to verify it if other things downstream in the causal chain from the contents' mean can be measured). So if verification against the instrument is your only goal, then there's no reason to report other than to +/- 1 ml. If that's too confusing, ignore it for now.
And why 90%? Why indeed? Why not 80%? Or 99%? There is no window confidence number that should be picked, except the one that matches the decision you will make, and the consequences you face based on that decision. Not that I will make. That you will make. (This is analogous to betting.)
The whole point of this is that there is no answer to your question. That is, there is no one correct answer. The answer is: it depends.
It's not only your particular data or model, it's all of them. Everywhere and every time.
Bonus! It's pretty easy to do math on this in some standard problems. Maybe we do this some day. But you try it first.
Buy my new book and learn to argue against the regime: Everything You Believe Is Wrong.
Visit wmbriggs.com.
The example of the number of children in school classes provided by Anon is illustrative. Neither the average nor the standard deviation is of any use in this case. There are only a finite number of possibilities (most likely, there is an upper bound; for Germany this bound is around 33; any school class larger than this is due to exceptional circumstances). Any decent reporting of class sizes should come in the form of a histogram, period.
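For instance, a minimal sketch (the class sizes are invented) of the kind of report meant here, a plain count per class size rather than a mean and SD:

    from collections import Counter

    # Invented class sizes; only the form of the report matters here.
    class_sizes = [28, 29, 29, 30, 30, 30, 31, 31, 31, 31, 32, 32, 33]

    # A plain-text histogram: each distinct size and how often it occurs.
    for size, count in sorted(Counter(class_sizes).items()):
        print(f"{size:2d} children | {'#' * count}")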
Mr. Anon,
Regarding your remark:
"On the internet, however, the common opinion is that resolution is increased by repeating a measurement many (infinite) times. Example: sea level is measured with a resolution of 1 cm, but by taking many measurements, it is claimed to get a mean value with a precision of, say, up to 1 mm"
This is possible because uncertainty is multiplicative (Bayesian). So if there is a 0.25 error per measurement, then two measurements give (0.25 x 0.25) = 0.0625, and so on, until you converge on 1.11 cm with a 0.0000001 error. It is horse-patooty, of course, and most times disingenuous.
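For reference, the usual textbook form of that internet claim is not a multiplicative rule but the standard error of the mean shrinking like sigma divided by the square root of n; a minimal sketch with invented numbers (whether it settles anything is, per the post, another question):

    import math
    import random
    from statistics import mean

    random.seed(3)

    # Invented setup: a true level of 111.4 cm read on a gauge with 1 cm resolution.
    def one_reading():
        return round(random.gauss(111.4, 0.5))  # rounded to whole centimetres

    for n in (1, 100, 10_000):
        m = mean(one_reading() for _ in range(n))
        # The textbook claim: the spread of the mean shrinks like sigma / sqrt(n).
        print(f"n={n:6d}  mean of readings = {m:.3f} cm  "
              f"nominal standard error ~ {0.5 / math.sqrt(n):.3f} cm")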
What if there is a huge difference between the data points? Random scatter? Fraud? Obviously, this is abused and used in all sorts of ways, as Mr. Briggs said. What does a "mean" mean when the data is a total scatter or skewed? This is why housing prices and sales are (or should be) reported as medians, not means.
I likely don't have all this right in my head at this time of day, with only 0.90025 cup of coffee in me.