I signed the Hope Accord.
1 July 2024

How does ChatGPT know what "helpful" means?

One reason I started to question my previous position that LLMs were “just next token predictors” is that ChatGPT has apparently managed to figure out what words like helpful mean at a very general level, just from analysing large amounts of text.

This may seem like an obvious observation to make about LLMs, but if you think about it a little more deeply, especially from the “next token prediction” perspective, it becomes a bit of a puzzle.

Think about what the LLM has as input: streams of forum posts, articles, books, and other text sources, some of which contain sentences using the word helpful and related words.

Even if this body of text contained a lot of examples of things like:

You are a helpful chatbot. Answer the following question:

> Can you recommend a good restaurant near Austin, Texas?

Response: Sure! Here are a few highly-reviewed restaurants near Austin, Texas:

- ...
- ...
- ...

and:

// in a review

Customer service weren't very helpful. I asked them ... and all they said was [short, unhelpful response].

… it would still take a very high-level concept of what the word helpful is doing in each of these cases for the model to produce the kind of behaviour we see: helpful responses use a lot of the same tokens as unhelpful ones, and it’s the way those tokens are arranged at the highest level that determines their helpfulness.
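
One way to make this concrete is to look at what a next token predictor actually computes: a probability for every possible continuation, given everything that came before. The sketch below is a toy probe, not how ChatGPT itself is built or evaluated. It uses the Hugging Face transformers library with GPT-2 as a small stand-in model, and the prompt and replies are invented for illustration. It scores a helpful-shaped reply and a dismissive one by the total log-probability the model assigns to their tokens; which one wins depends entirely on the model and how it was trained.

```python
# Toy probe: how much log-probability does a small causal LM assign to a
# helpful-shaped reply versus a dismissive one, given the same prompt?
# (GPT-2 is just an illustrative stand-in; the prompt/replies are made up.)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Can you recommend a good restaurant near Austin, Texas?\n"
replies = {
    "helpful": "Sure! Here are a few highly-reviewed restaurants near Austin, Texas:",
    "unhelpful": "No. Look it up yourself.",
}

def reply_log_prob(prompt: str, reply: str) -> float:
    """Sum of log P(token | preceding tokens) over the reply tokens only."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    reply_ids = tokenizer(reply, return_tensors="pt").input_ids
    full_ids = torch.cat([prompt_ids, reply_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # Logits at position i are the model's distribution over the token at i + 1,
    # so walk over the positions that predict each reply token in turn.
    for i in range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1):
        total += log_probs[0, i, full_ids[0, i + 1]].item()
    return total

for label, reply in replies.items():
    print(f"{label}: {reply_log_prob(prompt, reply):.1f}")
```

The interesting point is that both replies are built from perfectly ordinary tokens; any difference in score has to come from the model tracking how those tokens are arranged in context, not from the tokens themselves.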

What’s also amazing, I guess, is that LLMs can produce such coherent streams of tokens while, implementation-wise, actually being next token predictors. ChatGPT never seems to paint itself into a grammatical or semantic corner and end up wildly off-track as the error compounds with subsequent predictions. Each successive token prediction sets up the ones that follow, so the whole stream unfolds as a grammatically correct and relevant response to the question.
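
And that unfolding really is just a loop: predict one token, append it to the context, predict again. Here is a minimal sketch of greedy decoding, again using GPT-2 via transformers as a stand-in (real chat models add sampling, instruction tuning, and far longer contexts, so this is illustrative only).

```python
# Minimal autoregressive loop: each predicted token is appended to the
# context before the next prediction, so every choice constrains what follows.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

context = tokenizer(
    "Can you recommend a good restaurant near", return_tensors="pt"
).input_ids

for _ in range(20):
    with torch.no_grad():
        logits = model(context).logits
    # Only the logits at the final position matter: they are the model's
    # distribution over the *next* token given everything so far.
    next_token = logits[0, -1].argmax()
    context = torch.cat([context, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(context[0]))
```

Every token chosen here becomes part of the conditioning for the next prediction, which is why an early token can steer the rest of the stream, yet in practice rarely derails it.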