This is on the philosophy (not mechanics) of the predictive limitations of so-called artificial intelligence models. AI is no different than other models in principle, being only statements which connect assumptions or observations to outcomes. This being so, AI is as limited as any other model when cause is unknown or unmeasurable. Other limitations of AI, such as their ability to “reason”, will be dealt with in separate essays. Non-limitations, like endless surveillance, we’ll also deal with separately.
Neither Artificial Nor Intelligent
We are in a time of great hyperbolic boasting about the potentials and capabilities of AI. Readers of a certain age will recall similar bombast regarding neural nets, a.k.a. artificial brains, then of the coming glory of genetic algorithms (“They mimic evolution!”), and again of machine learning, and so on. Hopeful bragging is just as much a part of science as it is anywhere else.
Let’s think about the limitations of these boasts in a predictive sense. I do not mean whether “deep” learning or conformal prediction or some other algorithm will produce the best models. I’m interested instead in how good could AI become in theory.
Proponents of “strong” AI believe that, in a way never stated, their models will come alive. Whether that is so (I do not believe it) is not our interest here, because even if (per impossibile) it happens, it does not help us answer what the limitations of AI are in the predictive sense. That is our main purpose.
All artificial intelligence models, like all models, are composed of sets of rules like “If X, then Y”, where the Y might contain the idea of “Y more or less” or “Y plus or minus”. In other words, all AI is of the form of rules that look like this:
Pr(Y|X).
The X and Y change variously, as in all models, AI or otherwise. In the X are the formal statements of the models, such as mathematical or probabilistic formulas, physics, “hard code”, instructions like DO THIS IF…, similar directive statements, and the data (if any) that ties it all together. AI, like all models, is just as dependent on the X, i.e. on the data, as old-school probability or mathematical models.
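To make the notation concrete, here is a minimal sketch of my own (with invented weather data; no particular AI works this way): the “model” is nothing but a table of conditional frequencies estimated from data, which is all Pr(Y|X) demands.

```python
from collections import Counter, defaultdict

# Toy data: (X, Y) pairs. X is whatever we condition on; Y is the outcome.
data = [("cloudy", "rain"), ("cloudy", "rain"), ("cloudy", "dry"),
        ("clear", "dry"), ("clear", "dry"), ("clear", "rain")]

# "Fit" the model: count how often each Y follows each X.
counts = defaultdict(Counter)
for x, y in data:
    counts[x][y] += 1

def pr(y, x):
    """Pr(Y = y | X = x), estimated as a relative frequency from the data."""
    total = sum(counts[x].values())
    return counts[x][y] / total if total else None

print(pr("rain", "cloudy"))  # 2/3 -- the model can say nothing beyond what X and the data supply
```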
That dependence becomes the basis of our interest. Given reliance on sets of data, and any other set of instructions, however those might arise, how good could an AI model become? I do not just mean in the ordinary sense of a lack of data, and the limitations that arise from that, but whether data that can lead to perfect models might even exist.
Meow
In its simplest form an image is a grid of pixels (of whatever size), each of which takes three color values, such as red, green, and blue (RGB). This is our data for our model. Suppose a cat is pictured in the image; I mean the image is labeled ‘cat’. The cat does not have to fill the image, just be in it in some pose or other.
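To see what that “data” amounts to, a sketch (assuming numpy and a made-up 64-by-64 grid): an image labeled ‘cat’ is, to any model, an array of numbers and a string, nothing more.

```python
import numpy as np

# A toy "image": a 64 x 64 grid of pixels, each with red, green, and blue values 0-255.
height, width = 64, 64
image = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)

# The label is just a string attached to the array; the array is not a cat.
label = "cat"

# Everything a model can ever learn about 'cat' (the label, not the animal)
# must be some function of these height * width * 3 numbers.
print(image.shape, image.size, label)  # (64, 64, 3) 12288 cat
```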
The image is not, of course, a cat. It is a collection of pixels, each with a color value and position and so forth, that we call by courtesy (if you like) a cat. Where the ‘cat’ in the image is, the pixels (with color and intensity and other details which I ignore) will take a certain relation to each other, a functional form, at least as a function of pose type, a form (in a given pose) which will be rotationally, linearly, and scale invariant.
This function (or collection of them by pose) can be estimated. And then used to create approximations. The image created will not be perfect, but it can be made good enough to “fool the eye” into thinking that a cat is pictured. Here, for instance, is a cat drawn with Twitter’s Grok 2 mini Beta, using the prompt “cat shooting a machine gun”.
The cartoonish patina is no surprise given there are no cats in real life shooting guns. So-called photorealistic images can also be produced, and these are less smoothed than the fantastical ones because they have better data upon which to draw. Here is a second image generated with the simple prompt “cat”.
The obvious point is that everything that can be known about the causal and conditional relationship between pixels and the image label is there in the image, at least for photorealistic images. This is less true for cartoon images. That is to say, for known image types the full causes and conditions are contained in the data. They can therefore, at least in theory, be found or discovered. The full functional form of ‘cat’ can be discovered.
Images of cats, then, can be well modeled. New cat images can be created, and existing images should in theory be identifiable as containing a ‘cat’ increasingly well as the function ‘cat’ (in a given pose) is honed in on.
This is all because, again, everything that can be known—all the causes and conditions—about ‘cat’ (not real cats, but images with that label) is found in the data and can, in principle, be discovered in that data. For photorealistic images, at any rate. Cartoon-based images have to draw on instructions outside the data, but since cartoons can be anything, they are in a sense beyond criticism, so by definition AI can draw them “perfectly”, too.
It does not follow that anybody will hit upon the correct functional form, or discover it perfectly for any image label, though all evidence is that programmers are doing fine.
To what other areas does the ‘cat’ example apply? Where else can all causes and conditions be found, and therefore ideal functional forms can in principle be discovered?
Obviously, sequential images fall into this camp. Hence displays like this, which replace one actor’s face with another’s.
Audio files also come to mind, in the sense that certain voices or musical passages are able to be labeled. Voices are more complex than images, but not in our predictive sense, because the causes and conditions needed to find a function for THIS VOICE are all there in the data, where THIS VOICE is a variable. The same admonition applies: the files are not the person, but recordings of a person. For tunes, take Shave and a haircut, a fixed, known ditty, which has a musical signature in data just the same as a certain word in a body of text has. It can be labeled into a genre, and the functions of that genre can be discovered.
The key is the same as with ‘cat’: if all that can be known, all causes and conditions leading to the label, are in the data, then AI models (or any models) can in principle capture these to arbitrary precision. Or at least to the precision of a cartoon. That ‘cat’ pictured above looks like a cat, where “looks like” is “good enough” for the application at hand.
I Can’t Do That, Dave
Text does not fit the ‘cat’ category; or, at least, not to the same extent.
ChatGPT, the large-language model (LLM, aptly named), was asked to write a poem praising American Presidential candidate Donald Trump. The model was programmed to reply that it could not: “I’m sorry, but as an AI language model, I do not engage in partisan political praise or criticism. It is important for me to remain neutral and impartial in all political matters.”
Immediately after this, the model was given the same request, but for Trump’s opponent Kamala Harris. This time the model supplied a (treacly) poem (which we don’t need to quote).
My point is not the ridiculous bias of the AI model, which is anyway expected, but to remind us that AI is indeed a model, and that all models can only say what they are programmed to say.
Text AI models have more “hard-code” than other predictive models, but that is because grammar has more rules, all of which are knowable, and therefore appear in ready form for coding. Any system which has known and codeable, or quantifiable, rules will succeed well with modeling, AI or otherwise.
Getting right answers to queries put to AI LLM models is mostly equivalent to making accurate predictions. For queries like, “In what year was the Battle of Hastings fought?”, any answer besides 1066 is wrong (unless a non-Western reckoning of years is used). Grok, incidentally, answers this correctly. This kind of thing is not much more than text searching. There is some grammar “processing” of the question, which is used internally, and which leads to nothing more than a look-up, or database query.
Thus the larger the database of “accepted” facts, the better this kind of text processing can do. The scare quotes are necessary, because there must always be an authority to decide what is and what is not a fact. This invariably brings in politics for “controversial” subjects, as we have seen. But that is not important to the models, which are just strings of code. These models will be doing the job required of them, and doing them well, by hewing to whatever list of facts is designated “official.”
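A caricature of that look-up behavior (my own; no real LLM is built this way, and the fact table is invented): once the question is reduced to a key, the answer is whatever the curated list of “accepted” facts says it is.

```python
# A hypothetical table of "accepted" facts, decided by some authority.
accepted_facts = {
    "year of the battle of hastings": "1066",
    "chemical symbol for gold": "Au",
}

def answer(query: str) -> str:
    """Normalize the query and look it up; anything outside the list is unknown."""
    key = query.lower().strip(" ?")
    return accepted_facts.get(key, "I don't know.")

print(answer("Year of the Battle of Hastings?"))  # 1066
```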
Generative text, however, is similar to the ‘cat with machine gun’ example. There is no single right answer, as with the poem for Kamala. The output has to conform, loosely, to the rules of grammar and intelligibility for the query and to the form demanded (prose, poetry, etc.), but that is about it. There are many meters and schemes in poetry, just as there are many types of guns and cats. As long as something is output which is recognizable, the model will be said to have performed its function.
It is also similar in the same way because, as with images, all causes and conditions are found in the data, the rules of grammar, and known definitions of words, at least in principle. And, in principle, the function that ties the data to the output (i.e. the model) can be found.
Music is similar to poetry, and also to cats with guns. Musical forms are known, scales in jazz, for instance, the use of sawing in baroque, and so on. The “grammar” of music can be known and coded, and so modeled output, generative or predictive, can do well. Once again, the causes and conditions are all there in the data and in known or knowable hard-coded rules.
Voices are like music and like images. The same result is found: everything that ties the models together can, in principle, be found in the data, etc.
These examples, and anything similar, share the by-now obvious quality that the causes and conditions of the output can be known. The marriage of known rules, like grammar, or the number of fingers allowable in images of human hands, with the data, which contains everything that can be known, allows models, in principle anyway, to become arbitrarily good.
Up to the data used to fit the models, that is (the word “train” is a euphemism; “fit” is superior). Complete de novo forms are not possible with AI, because, by definition, the form has to be there in the code somewhere. Forms which are unique, and not deductions from existing code, cannot be produced by AI.
Copy Of A Copy Of A Copy Of A…
There is a particular danger for text-based AI, which also exists for images to a lesser extent, and that is cheating and the deleterious effects of copying. Many people are passing off the output of LLMs and other AI as their own work. For instance, I recently saw an AI “reply guy” tool that generates responses to tweets, to increase the user’s views.
That work can easily become part of the “data” new models might draw upon. Once AI models are built using these copies it will be like making copies of copies (of copies…) on an Xerox. The results will be over-smoothed, and look increasingly like cartoons.
It’s worst for text because it seems to me more likely that fake text gets labeled as genuine, thanks to cheating and unscrupulous use by, for instance, propaganda outlets generating “content” on the cheap, perhaps changing the odd word of AI output to make it seem like their own. Watch for AI models to first predict whether text was likely generated by other AI; if not, the text is entered as training data; if so, it is rejected. Yet since this process won’t be perfect, certain forms of AI will degrade in time.
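That filtering step, if it comes, might look something like the sketch below (mine; `looks_ai_generated` is a hypothetical and deliberately crude detector). The point is only that whatever detector is used will err, and its errors are how copies of copies leak into the training data.

```python
from typing import Iterable, List

def looks_ai_generated(text: str) -> bool:
    """Hypothetical, imperfect detector of machine-written text.
    Stand-in logic only; real detectors exist but none are reliable."""
    return "as an ai language model" in text.lower()

def build_training_set(candidates: Iterable[str]) -> List[str]:
    """Keep only text the detector thinks is genuine. Because the detector
    is imperfect, some copies-of-copies will always slip through."""
    return [t for t in candidates if not looks_ai_generated(t)]

docs = [
    "The cat sat on the mat.",
    "As an AI language model, I cannot praise candidates.",
]
print(build_training_set(docs))  # only the first survives -- when the detector happens to be right
```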
Why Did He Do That?
Consider the closing stock price of, say, the American company IBM at the New York Stock Exchange. At the time of this writing, it is about $226 USD. A year ago it was about two dollars less than this. A time series of daily closing prices exists and can be used in models, AI or otherwise, to predict future closing prices. There is, of course, no need to predict previous closing prices, but it does bring up the interesting question of whether a model has to predict known data well in order to predict unknown data well.
There does not appear to be a compelling reason this should be so, though this performance is expected in the purely correlational models in general use. With those, it goes without saying that models must “fit” past data in order to have any hope they can reliably predict the future (by which I mean any unknown data). As good an argument as that seems to be, it does not follow that it must be so. Indeed, actual (not “artificial”) intelligence often makes “leaps”, as in the special case of grasping universals, but also in predicting the course of contingent events. However, this risks becoming a subject far beyond our goal here, so we return to ordinary models.
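For concreteness, here is a minimal sketch of what the ordinary correlational approach amounts to (my own toy, with made-up prices): regress today’s close on yesterday’s and extrapolate. Nothing in it knows why any price moved.

```python
import numpy as np

# Made-up closing prices; a real series would come from the exchange.
prices = np.array([224.1, 225.0, 224.6, 225.8, 226.3, 226.0, 226.4])

# AR(1): regress today's price on yesterday's, by least squares. Purely correlational.
x, y = prices[:-1], prices[1:]
slope, intercept = np.polyfit(x, y, 1)

# "Predict" tomorrow from today. The fit says nothing about the buyers,
# sellers, funds, or motives that actually caused the prices.
tomorrow = slope * prices[-1] + intercept
print(round(tomorrow, 2))
```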
We turn now to the causes and conditions that determine the IBM closing price. The closing price reflects the last trade of the official day. One side offered a number of shares at a price, and another side bought. The sides may be one person or many in a group (or as coded by an algorithm). Either way, there were a myriad of reasons for the behavior of each side, some perhaps rationally thought out, others pre-coded into algorithms, still more by instinct alone. All of these attitudes are mediated by the total funds available to each side, the desire for more or less money, their means and situations, and so on. The sides will have been motivated by the price just before their final exchange of the day, and that previous trade will itself have been, in essence, a trade like theirs. And so on back in time.
The number of causes and conditions that go into the final price is therefore large, so large that nobody can possibly know all of them from the previous trades that are included in the mix. Add to that, most of the motivations that drive buying and selling are not quantifiable, and are therefore unmeasurable except by gross approximation and with appreciable error. And even if these problems can be surmounted, the measurements themselves can never be taken, given the number of people involved is so large.
There is thus no hope of discovering, knowing, or measuring all causes and conditions that go into a stock’s closing (or really most any) price. It is far from clear any of the direct causes can be known, although there is hope some conditions might. (Such as certain news items probative of the price, and so forth.)
Whatever kind of model is used, the end result will be a model irredeemably correlational, and not causal. The best AI that can be invented will not have the data needed to predict any stock price exactly, or likely even “closely” most of the time. No function of correlations can produce a cause, and so no function can discover why a price changed. (The scale is important: models will do better at wider scales.)
Workers in time series analysis have always expressed the hope, albeit tangentially and often unarticulated, that the numbers of the time series, that is, the time series themselves, contain all that is known or can be known about the series. Some think, that is, that these time series are like the ‘cat’ example. This is the same line of thinking that imbues probability with life, especially when time series are “tested” in the classical statistics sense. Time series do not have life; they do not have their causes built into them. They are the result of outside causes, which are (usually) not measured or are unmeasurable, and which cannot be discovered from the time series numbers themselves.
Regarding model choice, consider that any finite series of numbers can be represented perfectly by an infinite number of different mathematical series. There will be nothing to choose between them using just how well each series matches the given numbers, using just the numbers themselves. So there is no hope a correlational model will hit upon the causal model, which is hidden in that infinity.
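A small demonstration of that point (numpy, with an invented five-number series): the interpolating polynomial, and that polynomial plus a term that vanishes at every observed time, both match the series exactly, yet disagree about the very next value. Infinitely many other functions would do the same.

```python
import numpy as np

# Five observed values of some finite series, at times t = 0..4.
t = np.arange(5)
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

# Model A: the degree-4 polynomial passing exactly through all five points.
coeffs = np.polyfit(t, y, 4)
model_a = lambda x: np.polyval(coeffs, x)

# Model B: model A plus a term that is zero at every observed time,
# so it fits the given numbers just as perfectly.
model_b = lambda x: model_a(x) + 0.5 * np.prod([x - ti for ti in t], axis=0)

print(np.allclose(model_a(t), y), np.allclose(model_b(t), y))  # True True
print(model_a(5), model_b(5))  # but they disagree about the unobserved next value
```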
Given the immense efforts put into looking for formulas and algorithms to predict stock and other prices, using both real and artificial intelligence, and given their limited success, it seems unlikely that much improvement will be found. Whatever advances are made will be because of increased storage capabilities and processing speed, which will allow the possibility of uncovering heretofore hidden correlations in massive data sets. These will lead only to modest improvements, however. We will have no ‘cat’, and can have none.
If this is the case for stock prices, and other similar measures, it is much worse for many other events. Humans offer myriad conditions and causes in any set of measured actions by different people at one time, or by the same person at different times. There is no hope of measuring and quantifying everything. There is therefore no hope AI, or any model, can capture the bulk of human behavior. Not when individual human will is involved.
Again, this judgement is scale dependent. At the very small, say individual chemical reactions within a body, there is good evidence causes and conditions can be known, at least to some extent. Models have the possibility to do well here, as long as sufficient measures can be taken of causal mechanisms and condition pathways. However, this is at the moment only in the promise (hype) stage. We aren’t there now. I’ll have more to say in a separate article about AI and mRNA and other drug therapies.
It remains a possibility that in the very large, in enormous collections of people, there might be available better knowledge of causes and conditions, and so the hope does exist that predictive ability in groups will improve to some finite, but still imperfect, extent. We may have a Hari Seldon yet, at least at a civilization scale. But don’t wait up for him.
That’s A Lot Of Molecules
The situation for people is somewhat similar in physics, by which I mean those ways the world operates and functions apart from will.
Take things like temperature time series. These by themselves will be no different than stock price time series. Using only the information in the numbers themselves, and no outside information, no cause can be discovered, and so predictive ability will find some limit far short of perfection. But, of course, one may replace ignorant time series models with informed physical models of the thermodynamics. Causal models, that is.
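To illustrate the difference with a toy (mine, and far simpler than atmospheric thermodynamics): the first model below extrapolates a cooling-object temperature series from the numbers alone; the second uses Newton’s law of cooling, a simple causal stand-in, which needs the outside information that the room sits at 20 degrees.

```python
import numpy as np

# Measured temperatures (deg C) of something cooling, one reading per minute.
temps = np.array([90.0, 81.5, 74.3, 68.2, 63.0])
ambient = 20.0  # outside information: the room temperature, a cause of the cooling

# "Ignorant" model: straight-line extrapolation from the numbers alone.
minutes = np.arange(len(temps))
slope, intercept = np.polyfit(minutes, temps, 1)
naive_next = slope * len(temps) + intercept

# Causal stand-in: Newton's law of cooling, T(t+1) - ambient = k * (T(t) - ambient),
# with the cooling factor k estimated from successive readings.
k = np.mean((temps[1:] - ambient) / (temps[:-1] - ambient))
causal_next = ambient + k * (temps[-1] - ambient)

print(round(naive_next, 1), round(causal_next, 1))
```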
The big difference, and the one that is so obvious that it is not always seen, like the proverbial fish who cannot see the water in which he swims, is that there is always the hope that for physical “data” causes can be found. And will be, given sufficient effort directed toward their discovery. Yet this only works for those parts of the world that can be quantified.
For some parts of the world, like the atmosphere, the causes of its workings might be known. But the measurements might let us down. For instance, the theory of radiative transfer is quite well developed, down to the photon-by-photon level. In the real atmosphere, though, there are a mighty number of photons, such that there is no hope whatsoever that they can all be measured. The same is true for all other sources and causes of heat and moisture transport in the atmosphere.
On the other hand, there are fundamental limits built into the world. Such as predicting position and momentum of certain particles. The idea of “entanglement”, for instance, needs a better developed philosophy, which of course no model can learn, since it doesn’t yet exist. Real minds are needed for this.
Measurements, then, are sometimes necessarily limited, even in fully causal models. Which means these models must be “shrunk” and can be no better than approximate, even if every physical function that ties together every part of the atmosphere (and land and ocean) is known or discerned by some AI.
The Sum
The conclusion is that, predictively, AI will be remarkable when full causes and conditions of the thing predicted are known, and will do no better than real intelligence, and could do a lot worse, when these are not known, or when correlations are supposed to be causations because (and people do say this) they were found on a computer. Still to come are reservations about over-loaded words like learning, judgment, reason, and intelligence itself.