I asked it: "Do models say only what they are told to say?"
And it answered: "No, models do not always say only what they are told to say. In fact, many models, including language models like myself, are not explicitly told what to say at all. Instead, we are trained on large amounts of data and use that information to generate responses based on the inputs we are given. This means that we are not always going to say exactly what we are told to say, and may even say things that were not explicitly programmed into us."
To put it Jordan Peterson-style: well, what do you mean by "say"? And what do you mean by "told to"?
They put up a patina of 'safety' for PR, but you can get ChatGPT to tell racist jokes pretty easily. Example: https://ctxt.io/2/AAAQ66nREg (expires in 24hr).
Excellent illustration of the point you've been making.
Expired by the time I got to it, but I take your meaning.
Press 2 to continue in German.