Reader question:
Dear Dr. Briggs,
I regularly read your blog and have purchased both your books. I especially liked your book on Uncertainty, although I need to study it a second time.
I recently came across this statement on an EPA site, and wondered immediately what you would make of it:
"Available software cannot distinguish between variability and uncertainty. Some factors, such as body weight and tap water ingestion, show well-described differences among individuals. These differences are called 'variability'. Other factors, such as frequency and duration of trespassing, are simply unknown. This lack of knowledge is called 'uncertainty'. Current Monte Carlo software treats uncertainty as if it were variability, which may produce misleading results."
Source: [EPA link]
I am really interested to know your thoughts on this statement.
Kind regards,
[Anon]
This is from an EPA article on the "Use of Monte Carlo Simulation in Risk Assessments". Before we get to that, if you haven't already, read "The Gremlins Of MCMC [Markov Chain Monte Carlo]: Or, Computer Simulations Are Not What You Think".
Don't be lazy. Read it.
There aren't any such things as "random numbers", so MCMC models are just like all other models: they only say what they are told to say. If you feed a model "random normals", or whatever, you are just giving it numbers, which are manipulated exactly as you say they should be manipulated. And the numbers you give the model are exactly the numbers you specify. There is nothing mystical to them.
In other words, attempts to feed models "random" numbers so that they behave like nature "picking" distributions or whatever aren't anything like that at all. They are just models doing precisely what they are told to do by modelers. That modelers sometimes don't know, or can't anticipate, what the outcomes of their models are doesn't mean the model isn't doing what it was told.
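To see this in miniature, here is a small Python sketch. It is not from the EPA article; the function name `toy_model` and the seeds are invented for illustration. It uses the standard library's pseudo-random generator, and the only point is that the "random normals" are fully fixed by the seed the modeler supplies.

```python
import random

def toy_model(seed, n=5):
    """A toy 'simulation': draw n 'random normals' and average them.

    Hypothetical example; the name and defaults are illustrative only.
    """
    rng = random.Random(seed)  # the seed fully determines every draw
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n, draws

mean_a, draws_a = toy_model(seed=42)
mean_b, draws_b = toy_model(seed=42)  # same instructions, same numbers
mean_c, draws_c = toy_model(seed=43)  # different instructions, different numbers

print("same seed, same draws:", draws_a == draws_b)
print("different seed, same draws:", draws_a == draws_c)
```

Run it as often as you like: the same seed gives the same "random" numbers and the same answer. The model did what it was told.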
So much for the throat clearing. Let's examine EPA's terms. The article starts like this:
EPA's current risk assessment methods express health risks as single numerical values, or "single-point" estimates of risk. This technique provides little information about uncertainty and variability surrounding the risk estimate.
Now risk is just this: the probability of a bad thing (death, disease) given certain premises or assumptions. Risk is not cause. If it were, then this probability would always be 0 or 1. Given those certain premises, we can come to, as they say, a "single-point" estimate.
For instance, the probability (risk) of dying a horrible death given you are walking blindfolded on a soaring cliff in a blizzard is high. If you quantify all the connections of those premises, you can come to a single-point estimate. If you don't quantify them, you use fuzzy words like "high".
If those premises vary, or are changed, then the risk changes. Change the premises, change the probability. Add the premise that you can fly, then the risk drops to zero (because cause). Add the premise that you are wearing waxed flat bottom shoes, then the "high" becomes "Egads".
Something closer to the EPA example. Being fat and having, say, diabetes. Knowing only that you are fat, quantified in some way, such as BMI, we can quantify a risk of diabetes. As BMI varies, so does the risk. It's still not cause, just correlation. This risk is a model. And all models only say what they are told to say.
Does the model apply to you? Only if you want it to. All we know, as said, is BMI and probability of diabetes. We don't even know what "data" informed this model. We just have the model.
But in this model, as BMI varies, so does the risk. This is one kind of variability the EPA mentions.
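A sketch of what such a model might look like, assuming a logistic curve in BMI. The functional form and both coefficients are made up for illustration; they are not fitted to any data and should not be read as real risks.

```python
import math

# Hypothetical coefficients, invented for illustration only (not fitted to data).
INTERCEPT = -6.0
SLOPE_PER_BMI_UNIT = 0.15

def diabetes_risk(bmi):
    """Pr(diabetes | BMI, this toy model): a logistic curve in BMI."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE_PER_BMI_UNIT * bmi)))

# The "variability" in the model: vary the BMI premise, and the risk varies.
for bmi in (22, 30, 38):
    print(bmi, round(diabetes_risk(bmi), 3))
```

The function knows nothing except BMI. It returns whatever the two coefficients tell it to return, which is the sense in which the model only says what it is told to say.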
The unknowns, or unknown unknowns, are everything not in the model. Such as whether the model applies to you. If you knew it applied to you, that is because there must necessarily be other premises in that model that allowed you to say it applied to you. For instance, suppose you learn the data that informed the model was of 40-55 year olds, Americans, men. And you say to yourself, "That's me!"
Well all those things vary, too. That range of age is a second kind of variability. But not in the model. In the model, they are fixed. It says nothing about 56 year olds, or Russians, or women. If you decide to apply the model to yourself, and you are Canadian, male, and 56, you have created a new model. One that also includes the premise "And Canadian man, 56". Maybe it's a useful model, maybe not.
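Here is a small sketch of "change the premises, change the probability". The records are invented, and relative frequency in this tiny table is used only as a stand-in for the probability given "these records and nothing else". The names `RECORDS` and `probability` are hypothetical.

```python
# Hypothetical records, invented for illustration only.
RECORDS = [
    {"age": "40-55", "sex": "M", "high_bmi": True,  "diabetes": True},
    {"age": "40-55", "sex": "M", "high_bmi": True,  "diabetes": False},
    {"age": "40-55", "sex": "M", "high_bmi": False, "diabetes": False},
    {"age": "40-55", "sex": "F", "high_bmi": True,  "diabetes": False},
    {"age": "56+",   "sex": "M", "high_bmi": True,  "diabetes": True},
    {"age": "56+",   "sex": "M", "high_bmi": False, "diabetes": True},
    {"age": "56+",   "sex": "F", "high_bmi": False, "diabetes": False},
    {"age": "40-55", "sex": "M", "high_bmi": True,  "diabetes": True},
]

def probability(**premises):
    """Relative frequency of diabetes among records matching every premise.

    Returns None when no record matches: the table is silent on that case.
    """
    matching = [r for r in RECORDS
                if all(r[key] == value for key, value in premises.items())]
    if not matching:
        return None
    return sum(r["diabetes"] for r in matching) / len(matching)

print(probability())                                       # no premises beyond the table
print(probability(high_bmi=True))                          # add a premise
print(probability(high_bmi=True, sex="M", age="40-55"))    # add more premises
print(probability(high_bmi=True, sex="F", age="56+"))      # outside what the table covers
```

Each call is a different model, because each carries different premises. The last call returns nothing at all: to get a number for that case you would have to add premises of your own, which makes a new model.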
Now you want to know whether you will get diabetes. That has a cause, or causes. If all causes were in the model, and the model applied to you, again the probability would be 0 or 1. Supposing it isn't, then at least one of the causes of diabetes is unknown to the model.
Those are the unknowns the EPA means.
I'll end with my usual harangue. Nobody has a probability/risk of diabetes, or of anything. Probabilities are always conditional on the premises you assume. Once you make assumptions, i.e. create premises, you create the probability. Change the premises, change the probability.
Very detailed and interesting explanation.
It is the goal of scientists, in general, to try to eliminate all the variables but one - then they have a conclusion. This is typically done under controlled conditions.
This is so often difficult or impossible to do in real life. There are just too many unknowns. Life is not a controlled condition.