> thoughts on agi & llms

10 July 2022

Thoughts on AGI

Update 28th April 2023

In the Cells section below I describe a possible approach to modelling an organism with “Cell” objects which would send signals to each other. It occurred to me that the current standard of neural network architectures may be the logical result of that line of thinking—I suppose the distinction in my mind was that the inputs and outputs to the cells would be a simulation of the world in real time (or something analagous). I think the ability to affect the world and get immediate feedback from your actions might be an important element of general intelligence, and this seems more likely than traditional NN architectures to facilitate that.

I’ve updated my thinking on large language models somewhat since writing this. I was initially a bit too dismissive of the possibility that they might be representing concepts, in some sense, as opposed to just being good at prediction. I still think they’re nothing like true AGI, but certain emergent abilities—for example, the ability to follow instructions given in a language other than that of the main training set—certainly give one pause for thought. I’m now much more open to the idea that there might be something interesting going on even if it isn’t AGI.

Complex questions are easier to think and talk about the more concrete they are, and there’s nothing more concrete than working code. If you’ve written code yourself you might also be familiar with the power of just writing anything to break through mental fog and begin to clarify a problem. Suspending disbelief that the problem is tractable for long enough to make some small steps toward solving it can be surprisingly fruitful.

Programs take input and produce output, and the usual unit of abstraction is the function. A function accepts some input, does some calculations, and returns some output. The question of where the input comes from and what happens with the output is left undefined, so we can focus on the logic. Here’s a JavaScript function called “agi” that takes no input, does nothing, and produces no output:

function agi(/* names of input variables go here */) {
	/* logic and output go here */
}

Before we can start writing the logic, we need to know what form the input and output will take. For a function called “addTwoNumbersTogether” this is trivial; the input will be two numbers and the output will be another number. The logic is also trivial as all CPUs and programming languages have arithmetic functions built in:

function addTwoNumbersTogether(a, b) {
	return a + b;
}

For an AGI it’s less clear, and we have to make a somewhat arbitrary decision based on what our idea of AGI is. If we were trying to pass the Turing test we might decide on text as both the input and output, so our function can be used to respond to natural language prompts with natural language responses:

function agi(prompt) {
	return `Interesting! I'd never thought about "${prompt}" before.`;
}

Or to retain context:

function agi(entireChatHistory) {
	return `Hmm, I'm not sure about that one. I might have to get back to you, as all this talk about ${entireChatHistory} has made me sleepy.`;
}

Obviously, these chat bots would not pass the Turing test. For them to pass, the outputs would have to depend on the inputs in more complex ways than merely being echoed back in the responses. What would that look like? We’d have to address some simple test cases, such as responding to a greeting with a greeting:

function agi(entireChatHistory) {
	if (entireChatHistory === "Hello") {
		return "Hello";
	} else {
		return `Hmm, I'm not sure about that one. I might have to get back to you, as all this talk about ${entireChatHistory} has made me sleepy.`;
	}
}

Now our program has some intelligence to it, but will quickly revert to echoing the entire chat history once the history contains both the human’s and the program’s “Hello”s. What if the person’s next prompt is “How are you”? Let’s add a check for that:

function agi(entireChatHistory) {
	if (entireChatHistory === "Hello") {
		return "Hello";
	} else if (entireChatHistory.endsWith("How are you?")) {
		return "Fine, thanks. How are you?";
	} else {
		return `Hmm, I'm not sure about that one. I might have to get back to you, as all this talk about ${entireChatHistory} has made me sleepy.`;
	}
}

It’s clear that to make something that could pass off as intelligent we’d need a lot more ifs and elses. The current approach also breaks down if there are even slight variations in the way prompts are written, such as “How are you?” vs. “How’s it going?” or even “How are you?” vs. “how are you?”—not to mention the fact that the computer will always be “Fine, thanks” and will never get impatient, no matter how many times in a row you ask it how it is. If we don’t want to write an infinite number of if-else branches to cover all of these possible scenarios, we’ll need to take a more considered approach.

How do humans know what to say to each other? Can we just copy that calculation somehow? If we don’t want our code to have an explicit, hard-coded condition that says “if the person says Hello, say Hello back” then how else can we make that happen? One way would be to encode everything we know about human conversation at a slightly higher level of abstraction, so instead of “if Hello then Hello” and “if How are you? then Fine, thanks”, we’d say “if greeting then greet back” and “if question then answer”.

Before we start adding more complex logic, let’s assume we have access to both the most recent prompt and the entire chat history, for convenience, and refactor our code a bit:

function agi(entireChatHistory, lastPrompt) {
	if (lastPrompt === "Hello") {
		return "Hello";
	} else if (lastPrompt === "How are you?") {
		return "Fine, thanks. How are you?";
	} else {
		return `Hmm, I'm not sure about that one. I might have to get back to you, as all this talk about ${entireChatHistory} has made me sleepy.`;
	}
}

Let’s also assume we have some code available for figuring out sentence type and parts of speech; a large list of declarative truth statements; and a way of mapping a question prompt to the relevant proposition:

let knowledge = {
	colourOfSky: "blue",
	ownEmotionalState: "fine",
	// etc...
};

function calculatePromptType(prompt) {
	// return one of "statement", "question", "instruction", and "greeting"
}

function mapQuestionToRelevantKnowledge(prompt) {
	/*
	return value from the knowledge map based on prompt
	
	this somehow figures out that if the prompt is "How are you?"
	or "How's it going?" or similar, then the piece of knowledge being
	requested is "ownEmotionalState"
	*/
}

Our code can now look like this:

function agi(entireChatHistory, lastPrompt) {
	let promptType = calculatePromptType(lastPrompt);
	
	if (promptType === "greeting") {
		return "Hello";
	} else if (promptType === "question") {
		return mapQuestionToRelevantKnowledge(lastPrompt);
	} else {
		return `Hmm, I'm not sure about that one. I might have to get back to you, as all this talk about ${entireChatHistory} has made me sleepy.`;
	}
}

Note that there are still a few inflexibilities that will quickly expose our bot as a bot:

We always give the same greeting.
We don’t have any way of appropriately phrasing or dressing up answers to questions based on context and tone, so for example the question “How are you?” will always be met with the single all-lowercase word “fine”.
We still don’t take context into account—the entireChatHistory variable is never used—and will repeat the same answers to the same prompts ad infinitum.

It’s also far from guaranteed that the prompt type and relevant knowledge key can be calculated directly from the prompt, without any context or supporting common-sense knowledge. What pronouns like “it” refer to, for example, must usually be inferred from context and prior shared knowledge.

The further you go with this approach, the more you realise that you are basically back to the original if-else idea. Almost every rule has to be explicitly accounted for, or the system will make basic errors. Humans clearly have something else besides grammatical understanding and propositional knowledge that helps us think and communicate.

How LaMDA does it

AI researchers have long recognised the above challenges with developing convincing chat bots. Still, natural language conversations do seem to follow some surface-level patterns—when was the last time you thought you had a truly original conversation?—and it seems reasonable to suppose that those patterns could be encoded somehow in a computer program even if defining them all by hand would be impossibly long-winded.

This is the problem that so-called “neural networks” solve. Given a bunch of inputs and desired outputs—prompts and responses, say—you can choose a scheme to map the inputs and outputs to zeroes and ones, and then use a mathematical technique called back-propagation to create a set of “neurons”—functions that accept one or more inputs in the form of zeroes and ones, and output another zero or one either to be passed to the next neuron along or to act as the output—that, given those inputs, approximate the desired outputs.

Once trained on a large set of example inputs and outputs, neural networks can achieve a high degree of accuracy in dealing with novel inputs. For example, a neural network back-propagated with pictures and the true/false statement “this picture is of a cat” as examples can be reasonably accurate in classifying images that weren’t part of the example set as “cat” or “not a cat”. The input is the pixels of the image encoded as ones and zeroes, and the output is a single one or zero for “cat” or “not a cat”.

LaMDA is an enormous neural network whose examples consist of all of the text on the internet. Its input is one or more words and its output is a prediction of what the next word is likely to be, based on examples from the internet. It uses a clever neural network architecture to implement this—the structure of natural language makes text prediction an interesting engineering problem in its own right—but the principle of using neurons and back-propagation to map inputs to outputs is the same.

The problem of AGI has not been solved by LaMDA, but rather sidestepped. Instead of implementing an algorithm for intelligence, LaMDA uses an extremely large number of examples of text written by humans to produce something that looks intelligent, by being very well tuned to the surface-level patterns that occur in the examples. If you could have somehow influenced the relative frequency of words following other words on the internet when LaMDA was built, you would have influenced the “what word comes next” predictions that are now causing some people to question whether LaMDA is sentient.

Back to JavaScript

So we could fake it with a massive neural network, but that’s not what we set out to do: we want to create an actual AGI. Are there any other interesting avenues to explore? One try would be this: if basic grammatical understanding and declarative knowledge are not enough, maybe we need to go a level deeper and understand the meaning, intention, and emotional content of a prompt. If we can parse those from the input, maybe we can use them to drive the output. After all, humans are not text processing systems—we ultimately respond to prompts because we feel like saying something, not because of anything inherent in the structure of the prompt.

Here are some emotions:

Image credit: Emotion Wheel by Geoffrey Roberts.

Maybe coding an initial state where each of these emotions is set to some baseline (0 or 0.5, say), and then writing some rules for how different combinations of words affect emotions and vice versa, could help an AGI answer the question what do I feel like saying? The next approach I’m going to consider is to follow this line of thought for one emotion, the frustration/annoyance caused by repeated pointless questioning, and see if some patterns emerge that might help with the more general problem.

Here’s a map of emotions to their current level:

const EMOTION_BASELINE = 0; // scale goes from 0 to 1 and all emotions are 0 in a hypothetical AGI's "resting" state

let emotionalState = {
	helpless: EMOTION_BASELINE,
	frightened: EMOTION_BASELINE,
	concerned: EMOTION_BASELINE,
	worried: EMOTION_BASELINE,
	inadequate: EMOTION_BASELINE,
	inferior: EMOTION_BASELINE,
	worthless: EMOTION_BASELINE,
	insignificant: EMOTION_BASELINE,
	excluded: EMOTION_BASELINE,
	persecuted: EMOTION_BASELINE,
	nervous: EMOTION_BASELINE,
	exposed: EMOTION_BASELINE,
	
	betrayed: EMOTION_BASELINE,
	resentful: EMOTION_BASELINE,
	disrespected: EMOTION_BASELINE,
	ridiculed: EMOTION_BASELINE,
	indignant: EMOTION_BASELINE,
	violated: EMOTION_BASELINE,
	furious: EMOTION_BASELINE,
	jealous: EMOTION_BASELINE,
	provoked: EMOTION_BASELINE,
	hostile: EMOTION_BASELINE,
	annoyed: EMOTION_BASELINE, // this will be increased by repeated questioning
	withdrawn: EMOTION_BASELINE,
	numb: EMOTION_BASELINE,
	skeptical: EMOTION_BASELINE,
	dismissive: EMOTION_BASELINE,
	
	judgemental: EMOTION_BASELINE,
	disgusted_embarrassed: EMOTION_BASELINE,
	appalled: EMOTION_BASELINE,
	revolted: EMOTION_BASELINE,
	nauseated: EMOTION_BASELINE,
	detestable: EMOTION_BASELINE,
	horrified: EMOTION_BASELINE,
	hesitant: EMOTION_BASELINE,
	
	sad_embarrassed: EMOTION_BASELINE,
	disappointed: EMOTION_BASELINE,
	unseen: EMOTION_BASELINE,
	empty: EMOTION_BASELINE,
	remorseful: EMOTION_BASELINE,
	ashamed: EMOTION_BASELINE,
	powerless: EMOTION_BASELINE,
	grief: EMOTION_BASELINE,
	fragile: EMOTION_BASELINE,
	victimised: EMOTION_BASELINE,
	abandoned: EMOTION_BASELINE,
	isolated: EMOTION_BASELINE,
	
	inspired: EMOTION_BASELINE,
	hopeful: EMOTION_BASELINE,
	intimate: EMOTION_BASELINE,
	sensitive: EMOTION_BASELINE,
	thankful: EMOTION_BASELINE,
	loving: EMOTION_BASELINE,
	creative: EMOTION_BASELINE,
	courageous: EMOTION_BASELINE,
	valued: EMOTION_BASELINE,
	respected: EMOTION_BASELINE,
	confident: EMOTION_BASELINE,
	successful: EMOTION_BASELINE,
	inquisitive: EMOTION_BASELINE,
	curious: EMOTION_BASELINE,
	joyful: EMOTION_BASELINE,
	free: EMOTION_BASELINE,
	cheeky: EMOTION_BASELINE,
	aroused: EMOTION_BASELINE,
	
	energetic: EMOTION_BASELINE,
	eager: EMOTION_BASELINE,
	awe: EMOTION_BASELINE,
	astonished: EMOTION_BASELINE,
	preplexed: EMOTION_BASELINE,
	disillusioned: EMOTION_BASELINE,
	dismayed: EMOTION_BASELINE,
	shocked: EMOTION_BASELINE,
	
	unfocused: EMOTION_BASELINE,
	sleepy: EMOTION_BASELINE,
	outOfControl: EMOTION_BASELINE,
	overwhelmed: EMOTION_BASELINE,
	rushed: EMOTION_BASELINE,
	pressured: EMOTION_BASELINE,
	apathetic: EMOTION_BASELINE,
	indifferent: EMOTION_BASELINE,
};

Our code needs to figure out the meaning of the prompt, modify its emotional state accordingly, and then use some combination of the emotional state and the prompt to generate a response.

It’s hard to come up with ways of coding that without reaching for brute-force options like hard-coding a big list of responses that would be expected under various conditions, like:

let responses = {
	// ...
	questions: {
		ownEmotionalState: [
			{
				isDefault: true,
				response: "Fine, thanks", // we would still need to base this on our actual emotional state somehow
			},
			{
				response: "Why do you keep asking me that?",
				mostLikelyIf: {
					emotion: "annoyed",
					isGreaterThan: 0.05, // the range goes to 1, which is literally the most annoyed the AGI can be
				},
			},
			// ...
		],
	},
	// ...
};

The first of many issues to note about this code is that “annoyed” is not enough information to determine that WDYKAMT is the right response: what we’re really looking for is “annoyed because the person keeps asking me the same question, and I think they’re doing it just to annoy me”.

This introduces intention into the mix, which means that the AGI will need not only a record of its own emotional state but a model of the other person’s emotional state and intentions, at a high enough resolution to capture the concept of having fun at someone else’s expense. The AGI’s model of the other person needs to contain at least a rudimentary model of the AGI itself.

Something called the “four ears” model holds that there are four components to any verbal message:

Factual information
Self-revelation: implicit or explicit statements about the sender (“I’m hungry”, “what do you want for dinner tonight?”)
Appeal: implicit or explicit appeals to the receiver (“what time is it?”, “we’re out of milk”)
Relationship: implicit statements about the relationship between the sender and the receiver

Taking all of this into account, what should our AGI’s code and data structures look like? I would suggest the following high-level approach:

Parse the raw meaning, emotional content, four-ears components—including a high-resolution model of another person’s mind!—and linguistic style from the prompt.
Use them to update our emotional state and generate an appropriate raw meaning, emotional content, four-ears components, and linguistic style for the response.
Use those to generate a natural language response.

Or in code:

function agi(entireChatHistory, lastPrompt) {
	let promptDetails = {
		rawMeaning: getRawMeaning(lastPrompt, entireChatHistory), // meaning is context-dependent
		emotionalContent: getEmotionalContent(lastPrompt, entireChatHistory),
		fourEars: getFourEars(lastPrompt, entireChatHistory),
		linguisticStyle: getLinguisticStyle(lastPrompt),
	};
	
	emotionalState = updateEmotionalStateFromPrompt(emotionalState, promptDetails);
	
	let responseDetails = {
		rawMeaning: getResponseMeaning(emotionalState, promptDetails),
		emotionalContent: getResponseEmotionalContent(emotionalState, promptDetails),
		fourEars: getResponseFourEars(emotionalState, promptDetails),
		linguisticStyle: getResponseLinguisticStyle(emotionalState, promptDetails),
	};
	
	return generateNaturalLanguageResponse(responseDetails);
}

That’s a lot of undefined functions, some easier to imagine implementations for than others. Raw meaning and response generation—for trivia questions, at least—has been done fairly well by IBM’s Watson. Emotional content can be estimated reasonably accurately by neural networks. It doesn’t seem beyond the realm of possibility that some combination of Watson, emotion detection, and LaMDA—all acting as subroutines to the above algorithm—could produce something approaching a believable AGI chat bot.

If LaMDA, Watson, and other neural networks like emotion detection (ED) are not “actual” AGIs, though, would LaMDA + Watson + ED + half a page of JavaScript be one? It is becoming clear that there’s a large gulf between the current state of the art in AI and the kind of intelligence we use to communicate and navigate in the world.

Back to the current approach: is it feasible in principle to write down a set of rules for how different meanings, four-ears components, and emotional states should combine to produce a variety of human-like responses to arbitrary prompts? My sense is that yes, given some painstaking initial work manually writing down very specific rules, some patterns would emerge to make this a relatively tractable problem—though still difficult and time-consuming.

The Real World

Imagine an AI that could suggest a suitable date for you—not a recommender system that measured the cross-sections of some pairs of data points, but something that actually knew you well enough to know who you were likely to be compatible with. That would be impressive and useful, and it seems like as good a benchmark as any for deciding whether a program can be thought of as a real AGI.

A debate I keep having with myself goes something like this:

AI needs to be connected to the world in order to do anything useful
But we can simulate anything we want!
But then it wouldn’t really be connected to the world…

I suspect that one of the things we want when we say we want AGI is something that relates to the world in the same way that we do. An AI whose entire world consisted of textual symbols—no matter how computationally powerful—would not relate to the world in anything like the same way we do, so its answers to questions would likely be nonsensical in at least some important ways.

Finding a suitable date is about a lot more than language, so an AI whose whole world was text would have to be explicitly spoon-fed appropriate text in order to come up with a sensible answer. This would be like entering sums into a calculator for which you already knew the answers.

Cells

Natural language-based systems can be thought of as a high-level approach to the problem of creating intelligence: we see a complex emergent behaviour that obviously requires intelligence, and we try to replicate it using what we know or guess about how it works. What about a lower-level approach? Evolution builds life out of cells that are each smaller and simpler than the whole organism, so perhaps AI engineers could try something analogous.

Cells assemble proteins, release hormones, and emit electrical signals to communicate with other cells. As far as any individual cell is concerned the cells around it are black boxes—the hormones and electrical signals could be coming from anywhere—and the organism’s movement and behaviours emerge from the activity of cells en masse. Cells specialise into various types to perform different functions—creating skin; calcifying into bone; secreting acid for digestion; reacting to various stimuli from the outside world; and generating power for movement.

Cells are complex, but it seems at least worth playing with the idea that their interfaces might be simple enough for us to produce a workable simulation of them. We don’t necessarily need to worry about how DNA and mitochondria work under the hood if we can create an object that responds to hormone concentrations and pulses of electricity in a way that roughly mimics the functions DNA and mitochondria perform.

Say we created a simple model of a cell like this:

class Cell {
	constructor(neighbouringCells) {
		this.neighbouringCells = neighbouringCells;
		
		this.hormoneConcentrations = {
			adrenalin: 0,
			ghrelin: 0,
			testosterone: 0,
			oestrogen: 0,
		};
		
		this.recentSignalsReceived = [];
	}
	
	secreteHormone(hormone, amount) {
		for (let neighbour of this.neighbouringCells) {
			neighbour.increaseHormoneConcentration(hormone, amount / this.neighbouringCells.length);
		}
	}
	
	increaseHormoneConcentration(hormone, amount) {
		this.hormoneConcentrations[hormone] += amount;
	}
	
	receiveElectricalSignal() {
		// do stuff, depending on hormone concentrations
		// and recent signals (this could be thought of as something like an overall stimulation level)
		// different cell types would override this to implement cell-specific behaviours
		
		this.recentSignalsReceived.push(Date.now());
		
		if (recentSignalsReceived.length > 10) {
			this.recentSignalsReceived.shift();
		}
	}
	
	sendElectricalSignal() {
		for (let neighbour of this.neighbouringCells) {
			neighbour.receiveElectricalSignal();
		}
	}
	
	buildProtein() {
		// to be overridden by different cell types
	}
}

In this very basic scenario you could imagine that receiveElectricalSignal was the cell’s cue to do something that would affect the outside world—contracting a muscle, say—and specialised cells could call sendElectricalSignal in response to external stimuli.

If we were to create a large number of these objects and assign their cell types and interconnections randomly, the result would be a random noise of activity. It might flash on and off in a single cascade or it might crash the machine with an infinite loop of signal chatter. It would be highly unlikely to achieve a state analogous to biological homeostasis, in which many cells coordinate to keep the organism alive and functioning in an unpredictable environment.

It’s not beyond the realms of imagination that we might be able to get these objects to organise into a viable virtual organism—perhaps we could use an evolutionary algorithm—and if we did, maybe simulated hormone balances would be better suited to the love matching problem than a pure text processing system.

In taking this approach, though, we would massively increase the burden of figuring out how to connect the AI to the world. Assuming we’d used an evolutionary algorithm to generate the structure of the cells—and figuring that out manually does seem beyond the realm of possibility to me—we wouldn’t know which cells did what. The purpose and organisation of the cells—presumably numbering in the billions or trillions—would be opaque to us, so the question of “where to plug the wires in” in order to connect a webcam to use as an eye, for example, would be impossible to answer. Again, we can simulate anything we like—but then it’s not connected to the world and so is of limited use.

Conclusion

Evolution created humans through an extremely long incremental process involving things as crazy as two separate life forms merging into one and going from a single cell to trillions. At each stage there was an organism that successfully regulated its internal state and navigated in the world, responding to light, gravity, chemicals and electrical impulses and moving itself around accordingly. Systems for self-regulation and survival in the world is in some sense all we are, so it seems reasonable that this should have some bearing on the question of intelligence.

Right from the beginning—from when the molecules that would go on to be alive were just another part of the world—we’ve had an extremely high-bandwidth, direct connection to the world. Every evolutionary step toward what we now call intelligence was made because the resulting organism was better suited to surviving and self-regulating. For at least the last few hundred thousand years, intelligence has been kind of “our thing”—our evolutionary niche—and for at least the last few tens of thousands of years we’ve been refining it with culture, all while directly connected to the world.

To claim that our current tools of logic and fuzzy pattern recognition could be used to create an intelligence that’s relevant to the world while being cleanly disconnected from it sometimes seems preposterous and at other times tantalisingly plausible. If I had to place a bet, I’d say that if you want human-level intelligence, you’re going to need a human.

Some More Thoughts

This post asks more questions than it answers, and many of these would make good topics for future posts. Here are a couple of points I’d like to revisit with a more thorough treatment.

Comparing Flexible with Hard-Coded Approaches

It’s interesting to think about the differences and similarities between a hard-coded approach with if-else statements and explicit references to high-level concepts (ownEmotionalState, fourEars), and the lower-level or bottom-up “cell” approach.

Bottom-up approaches seem more likely to be able to exhibit surprising, emergent behaviours and adapt to novel situations.
Both are ultimately executed as a series of logical operations on “meaningless” symbols.

Possible-in-Principle vs. Useful or Desirable

Leaving aside questions of quantum/non-computational theories of intelligence/consciousness, there’s a compelling argument that human-level intelligence must be achievable by a computer in principle, if only by simulating the brain neuron-for-neuron (and bath-for-bath of neurotransmitters and other chemicals).

That would leave only the question of whether such a system would be useful or desirable. Given how finely tuned the human body is this seems unlikely as a path to either a) exceeding human intelligence and capabilities or b) avoiding any of the frailties and limitations of our biology. Barring those, there is no benefit to such a system versus the traditional method of creating new intelligence—having kids.

Intelligence vs. Consciousness

I’ve used various senses of “intelligence”, “true AGI”, and “consciousness” throughout this post in order to talk about a wide range of topics without spending too many paragraphs on definitions. As well as properly delineating these terms, a more thorough investigation would do well to choose a single metric for “true AGI”. Here are a few I would find interesting:

A chat bot that’s driven by meaning as opposed to pattern-recognition or other tricks.
A self-driving car that could never make a robot-like error such as mistaking a lorry for the sky.
A system that could, without any explicit knowledge having been programmed in, be taught to perform tasks like loading a washing machine or buying groceries.

Each of these calls back to my idea that what we want when we say we want AGI is something very human; something that relates to the world in the same way that we do. A good question to explore would be whether this desire can be fulfilled at all or is self-contradictory: perhaps by the salient definition of relate to the universe, only humans can do it the same way as other humans.