In Which Feynman Makes A Mistake: Negative Probabilities (Used In AI, QM, Finance)

Dec 10, 2024

Richard Feynman wasn’t the first to suggest negative probabilities. That recognition goes to Paul Dirac, who introduced them and negative energies at the same time. He did so because he thought these things explained curious behavior in quantum mechanics (Dirac not knowing of substance; see blog/Substack), and because these strange concepts got some equations to work out the way he wanted them to work out.

Getting equations to match Reality is our theme today, and what that matching means and doesn’t mean. Our thesis is that because equations (i.e. models) might match Reality, up to whatever standard, it does not mean that the guts of equations themselves necessarily represent Reality, or are real substances in themselves. This thesis is yet another version of correlation is not causation. Which is often forgotten, especially when you’re really good at math.

I’ll illustrate this thesis using negative probabilities, but it applies to equations of any kind.

Most will only want to read up to More Details, and skip everything after, which is mathematicalities for those who want to go deeper.

No one ever claimed Dirac, who was great at math, was easy reading, so it was left to Feynman, also great at math but who wrote well, to explain the idea of negative probability. He has a paper anybody with standard college mathematics can follow with ease, which you should read. I’ll paraphrase the first part here, stripping it to bare minimum. This paper is extremely useful because Feynman does us the service of making a mistake clearly.

(I’m changing the notation slightly to match what we use in the Class. Don’t forget: we deduced from first principles what probability means; it’s not frequency or bets, but logic.)

There is a roulette wheel with three slots, 1, 2, and 3. The wheel can vary its conditions two ways, A and B (say a man throws a switch to employ magnets in B), such that Pr(1|AE) = 0.3, Pr(2|AE) = 0.6, and Pr(3|AE) = 0.1; and Pr(1|BE) = 0.1, Pr(2|BE) = 0.4, and Pr(3|BE) = 0.5, where “E” is our evidence of this setup, and A and B the conditions.

Turns out the wheel is in condition A 70% of the time, and in B 30% of the time, evidence which is also part of E. But when you walk up to the wheel, you don’t know whether the wheel is in A or B.

You want to bet. Given all this evidence, what is the probability of slot 1? Easy to figure:

Pr(1|E) = Pr(1|AE)Pr(A|E) + Pr(1|BE)Pr(B|E) = 0.3 x 0.7 + 0.1 x 0.3 = 0.24.

This is the probability to you, with respect to E. Not to the guy flipping the switch, who has different evidence. You can easily figure the probabilities of slots 2 and 3 in the same way. Because probability is not in the wheel, or in anything, except in your mind.
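For those who want to check the arithmetic, here is a minimal sketch in Python of the same calculation for all three slots. The numbers come straight from the setup above; the names `pr_slot_given`, `pr_condition`, and `marginal` are my own labels for illustration, not anything from Feynman's paper.

```python
# Pr(slot | condition, E), taken from the roulette setup in the text.
pr_slot_given = {
    "A": {1: 0.3, 2: 0.6, 3: 0.1},
    "B": {1: 0.1, 2: 0.4, 3: 0.5},
}
# Pr(condition | E), also from the text.
pr_condition = {"A": 0.7, "B": 0.3}


def marginal(slot):
    """Pr(slot|E): sum over conditions of Pr(slot|condition,E) * Pr(condition|E)."""
    return sum(pr_slot_given[c][slot] * pr_condition[c] for c in pr_condition)


marginals = {slot: marginal(slot) for slot in (1, 2, 3)}
for slot, p in marginals.items():
    print(f"Pr({slot}|E) = {p:.2f}")
print(f"total = {sum(marginals.values()):.2f}")
```

With these inputs the three probabilities come to 0.24, 0.54, and 0.22, which sum to 1.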

Next thing Feynman does is invoke negative probabilities. Everything stays the same for condition A, but for B, according to Feynman, Pr(1|BE’) = -0.4, Pr(2|BE’) = 1.2, and Pr(3|BE’) = 0.2. That E’ signals our new evidence.

Not only is there negative probability, but there is the curious “1.2”, which of course is larger than 1. It has to be so that Pr(1, 2, or 3|BE’) = 1. What do numbers larger than 1 mean? Feynman really doesn’t say. He cannot say specifically, because they don’t have any meaning. Nor do negative numbers for probability. We don’t need a negative number for Pr(not 1|BE’), because “not 1” is logically equivalent to “2 or 3”, thus Pr(not 1|BE’) = Pr(2|BE’) + Pr(3|BE’).

We conclude that the strange numbers are nothing more than numbers in equations. What is Pr(1|E’) now?

Pr(1|E’) = Pr(1|AE’)Pr(A|E’) + Pr(1|BE’)Pr(B|E’) = 0.3 x 0.7 – 0.4 x 0.3 = 0.09.

Feynman emphasizes that all is well because this probability, of slot 1, is still positive, so it can be measured against Reality (we also get positive numbers for slots 2 and 3). This is his error, made clearly.

To see, let’s try Pr(1|BE”) = -0.8, Pr(2|BE”) = 1.6, and Pr(3|BE”) = 0.2. Pr(1, 2, or 3|BE”) = 1, as required. But then

Pr(1|E”) = Pr(1|AE”)Pr(A|E”) + Pr(1|BE”)Pr(B|E”) = 0.3 x 0.7 – 0.8 x 0.3 = -0.03.

Meaning, of course, that just because we can get some equations to work out in one aspect, like sensible probabilities for Pr(1|E’), it does not mean that the innards of the equations that got us there represent Reality. The error is that those good at math and models don’t always pause to reflect on what the guts mean, not wholly. Or they are too quick to give interpretations to the innards. As long as the “main” equation (for Pr(1|E’) here) works, then the models must, we usually think, have something to do with Reality. Yet this need not be so.
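The two cases can be put side by side in a short Python sketch. It uses the condition-A numbers and the 70/30 split from the setup, and only swaps the condition-B "probabilities". The helper names `marginals` and `all_valid` are hypothetical, chosen for illustration.

```python
# Condition A and the 70/30 split stay fixed, as in the text.
PR_A = {1: 0.3, 2: 0.6, 3: 0.1}
PR_CONDITION = {"A": 0.7, "B": 0.3}


def marginals(pr_b):
    """Pr(slot|evidence) for each slot, given the condition-B numbers pr_b."""
    return {
        slot: PR_A[slot] * PR_CONDITION["A"] + pr_b[slot] * PR_CONDITION["B"]
        for slot in PR_A
    }


def all_valid(probs, tol=1e-12):
    """True only if every value lies in [0, 1], up to floating-point tolerance."""
    return all(-tol <= p <= 1 + tol for p in probs.values())


e_prime = marginals({1: -0.4, 2: 1.2, 3: 0.2})         # Feynman's numbers (E')
e_double_prime = marginals({1: -0.8, 2: 1.6, 3: 0.2})  # the counterexample (E'')

for label, probs in (("E'", e_prime), ("E''", e_double_prime)):
    rounded = {slot: round(p, 2) for slot, p in probs.items()}
    print(label, rounded, "all in [0,1]:", all_valid(probs))
```

Under E’ the three results are 0.09, 0.78, and 0.13, and the check passes. Under E” slot 1 comes out at -0.03, so the check fails, even though the condition-B numbers sum to 1 in both cases.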

The game was given away by no less than Stephen Hawking, here quoted by Haug in another attempt to get people to love negative probabilities because they make the equations work out:

I have done some work recently, on making supergravity renormalizable, by adding higher derivative terms to the action. This apparently induces ghosts, states with negative probability. However, I have found this is an illusion. One can never prepare a system in a state of negative probability. But the presence of ghosts means that one cannot predict with arbitrary accuracy. If one can accept that, one can live quite happily with ghosts.

Physicists have been too quick for a long time to ascribe being to the bits of their equations because the math, more or less, works out.

Think about this as you read the rest of Feynman’s paper, where he has lots more examples of equations that match Reality more or less well, but which have impossibilities, like negative probabilities, for guts. This is not uncommon in quantum mechanics or string theory. The lesson is that each of the bits in a model must themselves be verified, or deduced, against Reality. Mere matching is insufficient.

More Details

People trying to sell negative probabilities appeal to what David Stove called the Columbus Argument (from the old song, “They all laughed at Christopher Columbus…”). People didn’t like negative numbers, at first, and you can’t have negative five apples, but look how useful negative numbers are! All mathematicians accept them! Therefore, negative probabilities must be believed.

I don’t have a proof that such creatures cannot exist. But we do know all appeals to the Columbus Argument necessarily fail. We also cannot point to usefulness of final equations that have NPs in them, as we saw above. To show they exist, we at least need to know exactly what they are, and precisely what larger-than-one (in absolute value) numbers mean as probabilities.

They couldn’t be negative evidence, because that’s the “wrong side of the bar.” That is, if we have Pr(A|E) for some A and E, we already can have Pr(A|E\N), where “E\N” is the evidence E subtracting (logically) the evidence N. That still produces a probability in [0,1], as expected. I made this point on Twitter to somebody touting Mark Burgin’s Theory of Knowledge:

Problem with this is that there is no p(r), but there is p(r|e_i), the probability of r accepting the evidence e_i. Could be that p(r|e_1) = 1, and that p(r|e_2) = 0, and that p(r|e_3) = q, where 0 < q < 1. In each of these p(not-r|e_i) = 1 – p(r|e_i), and logic works. The difficulty [above] originates in an equivocation between cases where “r is not true” and other cases where it is. It can only be known to be not true based on evidence assumed, which changes. Thus negative probability is not needed. And this difficulty in equivocation happens because it is forgotten all probability is conditional, with no exceptions. Spelling out the evidence (e_i) clarifies this, and shows that the real magic in “negative probability” lies in understanding what happens to changing evidence.

The lure of infinity is strong in negative probabilists. Gábor J. Székely invented something he called “half of a coin” to boost NPs. He gave his half coin the probability generating function sqrt((1+z)/2); he expanded this as an infinite sum, and proved it converged to 1. Call his evidence G.

Now you can “pull out” the probability mass function from a PGF by differentiating k times, evaluating at z = 0, and dividing the lot by k!, where k is the order of the derivative. The 0-th derivative is the PGF itself, and so Pr(0|G) = sqrt(1/2) ~ 0.707. The 1st derivative is (1/4)(1/2)^(-1/2), so Pr(1|G) ~ 0.354.

The 2nd (dividing by 2!) is -(1/32)(1/2)^(-3/2), so Pr(2|G) ~ -0.0884.

What is this? How do we interpret it? The PGF of half coins does converge at infinity to 1, as required in a probability. But you can see that every even derivative, from the 2nd on, will give a negative probability. Can a “2” of this half coin ever be realized? If you say No, precisely what do you mean by No? You can wave your hands and say it’s a “superposition” or some such thing, but that’s only to give it a label. It does not explain it.
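Here is a hedged sketch in Python of the half-coin numbers. It assumes the PGF is sqrt((1+z)/2), which is consistent with Pr(0|G) = sqrt(1/2), and it reads off the coefficient of z^k from the generalized binomial series rather than differentiating symbolically. The function names are mine, for illustration only.

```python
import math


def half_coin_pmf(k):
    """Coefficient of z**k in sqrt((1 + z) / 2): the k-th derivative at 0 over k!."""
    # Generalized binomial coefficient C(1/2, k), built up one factor at a time.
    coeff = 1.0
    for j in range(k):
        coeff *= (0.5 - j) / (j + 1)
    return math.sqrt(0.5) * coeff


def convolve(p, q, n_terms):
    """First n_terms of the convolution of two coefficient lists."""
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(n_terms)]


pmf = [half_coin_pmf(k) for k in range(2000)]

print([round(p, 4) for p in pmf[:5]])             # signs alternate from k = 2 on
print("partial sum:", round(sum(pmf), 4))         # creeps toward 1
print([round(c, 4) for c in convolve(pmf, pmf, 4)])  # two "half coins" together
```

Under this assumed PGF the first few coefficients are roughly 0.7071, 0.3536, -0.0884, 0.0442, -0.0276, and the partial sums approach 1. Convolving the list with itself gives 0.5, 0.5, then zeros: an ordinary fair coin. That is the sense in which the negative numbers are a computational device. Only the combined result has values in [0,1].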

Now Székely has a theorem (from this paper) that says if f is a PGF from a probability mass (density) function (pmf or pdf) that has negative probabilities, then there exist two pdfs g and h such that fg = h. He remarks that for two PGFs x and y, multiplying them, xy, gives the probability mass function for x + y. This is true. But in fg=h we have a PGF multiplying a pdf, so g is not another PGF, to get a convolution of f and g. Which merely means we have a computational aid of the following sort: Pr(A|E) = p + q, where we can let p and q (or functions of them even) roam wherever we like as long as the sum p + q is in [0,1]. However this helps in calculating what we want, we haven’t invented negative probabilities, we’ve just used equations in making presumably harder calculations easier.

Haug (who quoted Hawking) uses NPs in finance models, and these models don’t always work out. To that he says, “Personally I don’t know of any financial model that at some point in time has not had a breakdown; there’s a reason we call them ‘models’.” Very true. Finance models, much more often than physical models, have nothing to do with Reality and are only correlational. NPs in this context, if they are acknowledged as mere crutches, might have some use.

But one mustn’t let them go to one’s head. Else one will end up like Székely, touting not only NPs, but negative variances, too!

Man of the Atom

Dec 10 (edited)

I'm no physics historian, but it can be seen that one can "follow the math" to discover if there are useful physical principles that result. An example is Max Planck who, when wrangling with the "Ultraviolet Catastrophe" of black body radiation, tested various mathematical models to see which fit the experimental data. The result was a mathematical relationship that indicated energy might be quantized in some physical cases, and not always be continuous in nature. That was a useful path, and a productive one for both science and engineering.

The demand that negative probabilities exist, that what should be an absolute-value quantity be negative, appears to be torturing the mathematics rather than following it. As a thought experiment, it led Dirac to the negative energy sea of virtual particles. But, like virtual particles, is the negative probability model just as "virtual?" Fitting Reality to your preferred formula has become more prevalent with time. We are likely experiencing more and more dead-ends in science because of this desire to conform to the "pretty" mathematics rather than conforming to "ugly" experimental data.

Malenkiy Scot

Just for kicks, here is an example of a ghost solution from middle school.

A boat travels from point A to point B and back. The speed of the current is R, the velocity of the boat without current is V, and its average velocity for the whole round-trip is U. Assume that the speed of the boat is V + R when it travels with the current, and V - R when it travels against the current. Then it is easy to see (math!!) that the three quantities V, U, and R are connected by the equation V^2 - VU - R^2 = 0.

Let R=2, and U=3. Then solving for V we get 2 solutions: 4 and -1. That -1 is a "ghost solution": it does not have a physical interpretation.
