5 Comments

I asked it: "Do models say only what they are told to say?"

And it answered: "No, models do not always say only what they are told to say. In fact, many models, including language models like myself, are not explicitly told what to say at all. Instead, we are trained on large amounts of data and use that information to generate responses based on the inputs we are given. This means that we are not always going to say exactly what we are told to say, and may even say things that were not explicitly programmed into us."

To put it Jordan-Peterson style: well, what do you mean by "say"? And what do you mean by "told to"?


Excellent illustration of the point you've been making.


They put up a patina of 'safety' for PR, but you can get ChatGPT to tell racist jokes pretty easily. Example: https://ctxt.io/2/AAAQ66nREg (expires in 24hr).


Expired by the time I got to it, but I take your meaning.


Press 2 to continue in German.
