Simple (Optimal) Decisions
Class 90
Reminder: The Thursday Class is only for those interested in studying uncertainty. I don’t expect all want to read these posts. Please don’t feel like you must. Yet, I have nowhere else to put them. Your support makes this Class possible for those who need it. Thank you. Much much math alert!
How to make the best decisions based on goodness criteria you pick.
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: Read!
Lecture
“Suppose an observer is given a voltage varying with time during a prescribed observation interval,” said Peterson, Birdsall and Fox, “and is asked to decide whether its source is noise or is signal plus noise. What method should the observer use to make this decision, and what receiver is a realization of that method?”
Excellent questions. The voltage level here was displayed on a scope hooked to a radar, which is to say a radio (transmitter and) receiver, and the signal plus noise was the identification of an aircraft during one of our rulers’ Twentieth Century wars. The “noise” is misnamed, for all noise is signal of some kind (as we discussed before). We only call unwanted signal “noise.”
The wanted signal here is a real aircraft, which results in a voltage which itself is displayed on a screen (think oscilloscope). The unwanted signal is everything else, all of which adds to the voltage level.
The idea, in strict probability, is easy, though implementation can be more or less difficult. We have a stream of data from which we calculate, at any time, the probability the data represents the signal we want. Such as a plane (from the voltage level) or a disease (from the PSA) or whatever. I.e. Pr(Y = signal | X = data, E), where as always the E is whatever other evidence we bring to the problem.
After Pr(Y|XE), and only sometimes, comes the decision, which is not the probability. If Pr(Y|XE) ≥ p, we might decide to act as if “Y = signal” is (or will be, or was) true, else we act as if it is false. From which only four things can happen:
We act as if “Y=signal” is true; it is true; a true positive;
We act as if “Y=signal” is true; it is false; a false positive;
We act as if “Y=signal” is false; it is true; a false negative;
We act as if “Y=signal” is false; it is false; a true negative.
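The four outcomes above can be sketched in a few lines. This is only an illustration: the function name `classify`, its arguments, and the choice of "at least p" as the comparison are assumptions made for the example, not anything fixed by the lecture.

```python
def classify(prob_signal, threshold, signal_present):
    """Name the outcome of one decision.

    prob_signal    : Pr(Y = signal | X, E), assumed already computed
    threshold      : the chosen p (hypothetical value picked by the user)
    signal_present : whether "Y = signal" turned out to be true
    """
    # Assumed rule: act as if the signal is there when the probability
    # is at least the threshold.
    act_as_signal = prob_signal >= threshold
    if act_as_signal and signal_present:
        return "true positive"
    if act_as_signal and not signal_present:
        return "false positive"
    if not act_as_signal and signal_present:
        return "false negative"
    return "true negative"


outcome = classify(prob_signal=0.9, threshold=0.5, signal_present=False)
print(outcome)
```

Note the decision is a separate step from the probability: the same `prob_signal` gives different outcomes as the threshold moves.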
Now here is where it can be tricky. Suppose, as is often the case, your Pr(Y|XM) is a one-to-one function of x; I mean the level of x, which can be a voltage or some other measure such as some antigen or whatever (and where I use ‘M’ to indicate we have a model). This measure goes from small to large (or vice versa), with greater (lesser) values indicating higher probability of Y (shorthand for “‘Y=signal’ is true”). Then picking a p or picking an x_p, the subscript indicating the one-to-one relationship, to decide or act as if Y, is the same.
That is, using the probability or using the measure itself leads to the same decisions. That means any verification method used to judge your judgements would lead to the same outcome. And you will often see the measure or score used directly, and not the probability. But it is all the probability after all. It is probability even if you never compute a quantitative number for the probability. And it is probability because there is uncertainty: it is not certain that Y if X > x_p, nor if Pr(Y|XM) > p.
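A small check of that equivalence, under a loudly assumed model: here M is taken to be a logistic curve in x, with made-up `slope` and `midpoint` values and made-up measurements. Any strictly increasing function would serve; the logistic is just convenient because its inverse is easy to write.

```python
import math


def prob_signal(x, slope=1.5, midpoint=2.0):
    # Hypothetical model M: strictly increasing (one-to-one) in x.
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def x_threshold(p, slope=1.5, midpoint=2.0):
    # Inverse of the logistic: the x_p for which prob_signal(x_p) = p.
    return midpoint + math.log(p / (1.0 - p)) / slope


measurements = [0.3, 1.1, 1.9, 2.4, 3.0, 4.2]  # invented values of x
p = 0.7
x_p = x_threshold(p)

# Decide with the probability, then decide with the measure directly.
by_probability = [prob_signal(x) > p for x in measurements]
by_measure = [x > x_p for x in measurements]

print(by_probability == by_measure)
```

Change the assumed model (a different `slope`, say) and the same p maps to a different x_p, which is the sense in which the probability is still there even when only the measure is used.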
It’s not that one must first calculate a p from the X in order to decide or verify, but we should remind ourselves that it’s as if we had done that very thing. We are not escaping probability, nor decision theory, by working with X. Which brings us right back to our initial discussion of judgment functions and worthiness premises. Because, as you can now see, it’s the same thing.
So we could use any of the tools we’ve already developed to verify our model (or implied model).
What is, therefore, the best p or X to use? There is no answer. No universal one-size-fits-all answer. It is true that p is entirely objective, once X and M are specified, as all models are entirely objective. But what is best depends on the uses to which your model (the p, or X indirectly) is put.
A good part of that discussion was reminding you that not all decisions and consequences can be quantified. I remind you of that, and of the strict warning that one must not lather on quantification merely for the sake of putting numbers on things. I double that warning as we do just that, because of custom and because sometimes quantification is desired.
Assuming we want or need quantified verification, like before there are two situations: (a) we judge past performance of a model (picking a winner, as in a footrace), or (b) we use the past data and verification to improve future judgements. The two (as we said before) are often confused.
Let’s revisit the table (in bullet points) above, only now assigning quantitative measures to the outcomes:
Y = 1 & X = 1 :: a_11;
Y = 1 & X = 0 :: a_10;
Y = 0 & X = 1 :: a_01;
Y = 0 & X = 0 :: a_00.
Where I trust the shorthand is obvious: “X” is overloaded as notation for the decision: X here is a function (of the measure x or p). Most worthiness premises (but see Paul Ehrlich) would say a_11 and a_00 are desirable, and thus positive. They would also say a_10 and a_01 are undesirable, and thus negative. A user has gained more than lost if
$$a_{11} + a_{00} > |a_{10}| + |a_{01}|.$$
There is no harm if we assume a_{10} and a_{01} are positive if we remember what we are about. This allows us to drop the annoying absolute value notation, and is the approach I will use henceforth.
If we are in situation (a) and judging past decisions, then our job is simple. We calculate
$$S = a_{11} n_{11} + a_{00} n_{00} - a_{10} n_{10} - a_{01} n_{01},$$
where the n_{ij} are the obvious counts of the table. If S is positive, then our decisions gave more benefit than they produced harm. If we are rating different models, then the one with the largest S is the winner.
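As a worked instance of situation (a), here is a minimal sketch. It assumes the score is benefits minus costs, with a_10 and a_01 entered as positive magnitudes as agreed above, and with every key written as (Y, X) to match the table. The counts and values are invented for illustration.

```python
def total_score(counts, values):
    """Benefits minus costs over the four cells, keys are (Y, X)."""
    benefits = (values[(1, 1)] * counts[(1, 1)]
                + values[(0, 0)] * counts[(0, 0)])
    # a_10 and a_01 are positive magnitudes, so they are subtracted.
    costs = (values[(1, 0)] * counts[(1, 0)]
             + values[(0, 1)] * counts[(0, 1)])
    return benefits - costs


# Hypothetical past performance of one model: n_ij keyed by (Y, X).
counts = {(1, 1): 40, (1, 0): 10, (0, 1): 5, (0, 0): 45}
# Hypothetical worth of each outcome: a_ij keyed by (Y, X).
values = {(1, 1): 1.0, (1, 0): 2.0, (0, 1): 1.0, (0, 0): 1.0}

S = total_score(counts, values)
print(S)  # 60.0 = (40 + 45) - (2*10 + 5)
```

To rate rival models, compute `total_score` for each model's own table of counts with the same `values` and compare.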
If we are in situation (b) the job is harder. For one, we do not have a fixed S, but a suite of them, one for every value of p or x that could have been used; i.e. S_x. For instance, if the measure X had (or has) ordered values (x_{(0)}, x_{(1)}, …, x_{(q)}) we have the decision rules: “Act as if Y = 1/signal if X > x_{(0)}” (or less than or equal to) which gives S_{(0)}, “Act as if Y = 1/signal if X > x_{(1)}” (or less than or equal to) which gives S_{(1)}, and so on.
This will produce a string of S: S_{(0)}, S_{(1)}, …, S_{(q)}. One or more of these will be maximum, and will correspond to an x_{(j)}.
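The sweep over candidate cut points can be sketched as follows. Everything here is an assumption for illustration: the data are invented, the rule is "act as if Y = 1 when x is greater than the threshold", all four a_ij default to 1, and the score is again benefits minus costs with a_10 and a_01 as positive magnitudes.

```python
def score_for_threshold(xs, ys, threshold, a11, a10, a01, a00):
    """Score of the rule: act as if Y = 1 when x > threshold."""
    s = 0.0
    for x, y in zip(xs, ys):
        decision = 1 if x > threshold else 0
        if y == 1 and decision == 1:
            s += a11          # true positive
        elif y == 1 and decision == 0:
            s -= a10          # false negative
        elif y == 0 and decision == 1:
            s -= a01          # false positive
        else:
            s += a00          # true negative
    return s


def sweep(xs, ys, a11=1.0, a10=1.0, a01=1.0, a00=1.0):
    """One score per ordered observed value of x, plus the maximizers."""
    thresholds = sorted(set(xs))
    scores = [score_for_threshold(xs, ys, t, a11, a10, a01, a00)
              for t in thresholds]
    best = max(scores)
    best_thresholds = [t for t, s in zip(thresholds, scores) if s == best]
    return thresholds, scores, best, best_thresholds


# Invented measurements and outcomes.
xs = [0.2, 0.5, 0.9, 1.4, 1.8, 2.3, 2.9, 3.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

thresholds, scores, best, best_thresholds = sweep(xs, ys)
print(scores)           # [2.0, 4.0, 6.0, 4.0, 6.0, 4.0, 2.0, 0.0]
print(best_thresholds)  # [0.9, 1.8]
```

In this toy data two cut points tie for the maximum, which is why the text says "one or more". Different a_ij will generally move the maximizing cut point.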
If all S_{(j)} ≤ 0, then my friend your measure x stinks.
Or your p, hence your M and X, if that is what you were using. Above I said p and X are equivalent, and that’s true. But it’s a local truth, because you must remember Y does not have a probability. Nor does Y given X. We only get the probability by adding premises M.
This is utterly crucial to keep in mind, because when you read other authors on this subject, such as Peterson, Birdsall and Fox above, they either make the mistake or give the impression that Y (given X) or that X have distributions. And so universally recognized optimal decisions can be had. They cannot. Because all probabilities are conditional on the assumptions you, as in you, make and most decisions are local and do not involve necessary moral truths. For instance, there is no moral law about the voltage represented on a CRT.
So while again it’s true that p and x can be one-to-one, there can be many different p for the same x. How many different p? How many? That’s right: infinity.
This being so, we can introduce skill in this situation. Skill is when S_max(M2) > S_max(M1) for two competing models M2 and M1.
We always have the natural model M0 to compare against here. WATCH THE VIDEO FOR CLARIFICATION HERE. Suppose we always acted as if Y = 0/no signal. Which is another way of saying X = 0 always. Then every Y = 0 is a true negative and every Y = 1 is a false negative, and we would score
$$S_{M0_0} = a_{00}(n_{01} + n_{00}) - a_{10}(n_{11} + n_{10}),$$
and if instead we picked X = 1 always, we’d score
$$S_{M0_1} = a_{11}(n_{11} + n_{10}) - a_{01}(n_{01} + n_{00}).$$
Let n_0 = n_01 + n_00 (i.e. all the times Y = 0), and n_1 = n_11 + n_10 (i.e. all the times Y = 1). Then we would pick M0_0 as our best model iff
$$a_{00} n_0 - a_{10} n_1 \ge a_{11} n_1 - a_{01} n_0.$$
Which is when
$$n_0 (a_{00} + a_{01}) \ge n_1 (a_{11} + a_{10}).$$
If all a_ij are equal, the decision rule becomes simple: choose X = 0 (M0_0) if
$$n_0 \ge n_1,$$
else choose X = 1 (M0_1). Example: if you are deathly afraid of false positives, then you might have a large a_01, and so would tend to pick M0_0.
If you have a rival model M1 (or rival models), it or they ought to beat M0, or the rival model has no skill.
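A sketch of the comparison against M0, under stated assumptions: the always-X=0 rule is scored as a_00 n_0 minus a_10 n_1, the always-X=1 rule as a_11 n_1 minus a_01 n_0 (indices are (Y, X), costs as positive magnitudes), and M0 is taken to be whichever constant rule scores higher. The function names and all numbers are invented for the example.

```python
def baseline_scores(n0, n1, a11, a10, a01, a00):
    """Scores of the two constant rules.

    n0, n1 : number of times Y = 0 and Y = 1
    a_ij   : worth of outcome (Y = i, X = j), costs as positive numbers
    """
    s_always_0 = a00 * n0 - a10 * n1   # X = 0 always (M0_0)
    s_always_1 = a11 * n1 - a01 * n0   # X = 1 always (M0_1)
    return s_always_0, s_always_1


def best_baseline(n0, n1, a11, a10, a01, a00):
    s0, s1 = baseline_scores(n0, n1, a11, a10, a01, a00)
    # Ties go to M0_0, matching a "greater than or equal" rule.
    return ("M0_0", s0) if s0 >= s1 else ("M0_1", s1)


def has_skill(s_max_rival, n0, n1, a11, a10, a01, a00):
    """Assumed reading of skill: the rival's best score beats M0's."""
    _, s_baseline = best_baseline(n0, n1, a11, a10, a01, a00)
    return s_max_rival > s_baseline


# Invented counts: Y = 0 thirty times, Y = 1 seventy times.
equal_worth = best_baseline(30, 70, a11=1.0, a10=1.0, a01=1.0, a00=1.0)
costly_a01 = best_baseline(30, 70, a11=1.0, a10=1.0, a01=5.0, a00=1.0)
print(equal_worth)  # ('M0_1', 40.0)
print(costly_a01)   # ('M0_0', -40.0)
print(has_skill(55.0, 30, 70, 1.0, 1.0, 1.0, 1.0))  # True
```

With equal a_ij the larger class wins, as the simple rule says. Raising a_01 (the Y = 0, X = 1 cell) to 5 flips the choice to M0_0 even though Y = 1 is more common.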
Next time: examples and such things as sensitivity and specificity.
Note: Let A_t = (a_11, a_10, a_01, a_00). Thus A itself can vary from observation to observation or in time, as happens in real life, too.