In the Field of AI

From the news around the technology world lately, it really seems like another AI summer. Winter may come again, of course, but now it’s time for the harvest and the plants are laden. On the game-playing front, DeepMind’s Stratego AI strikes me as extremely cool. (Really loved that game when I was younger, and I surely should pick it up again, especially now that the kid is getting into games.) But it’s OpenAI’s large language model (LLM) in a chatbot, ChatGPT, that seems to have garnered the most attention. I can see why: the thing is exciting, capable of the sort of natural language interpretation and creative writing tasks that would have seemed impossible for a computer program a decade ago. Certainly far beyond the likes of ELIZA.

One interesting thing about ChatGPT is that it clearly knows a lot of stuff (functionally; arguably here we have knowledge without any belief at all). GPT’s game is text prediction, and clearly knowledge representation is key to predicting likely responses. Even false responses orbit around correct knowledge representation if you want your answers to be likely and sensible. For example, take this dialog from just now:

Me: Tell me five incorrect facts about Boston


  1. The Boston Tea Party did not actually take place in Boston – it took place in the nearby town of Dorchester.
  2. The Red Sox did not always play at Fenway Park – they played at the Huntington Avenue Baseball Grounds until 1911.
  3. Paul Revere did not ride alone on his famous midnight ride – he was accompanied by at least two other riders, William Dawes and Samuel Prescott.
  4. The Freedom Trail does not actually mark the route that the colonists took during the Revolutionary War – it is a self-guided tour of historical sites in the city.
  5. The first public school in the United States was not founded in Boston – it was founded in the nearby town of Dedham in 1644.

It doesn’t do a perfect job at this sort of task. Sometimes it refuses to provide incorrect information, with a sort of why-I-could-never form-letter-ish response. And here it’s worth noting that it fails at the task in points 2 and 3, presenting correct information instead of intentionally altered facts. The more layers of transformation applied to some underlying task, the more likely it is that something gets lost in translation.
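To make the text-prediction framing concrete, here’s a toy sketch of the core idea — count which token follows which, then emit the likeliest continuation. (This is a hypothetical miniature for illustration only; a real LLM does something far richer, with learned representations over enormous corpora rather than raw bigram counts.)

```python
from collections import Counter, defaultdict

# A tiny "corpus" — stand-in for the vast training text an LLM sees.
corpus = ("the red sox played at fenway park "
          "the red sox won the world series "
          "the tea party took place in boston").split()

# Count which word follows which (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict(word):
    """Return the most frequently observed next word after `word`."""
    return following[word].most_common(1)[0][0]

print(predict("red"))  # prints "sox" — the only word ever seen after "red"
```

Even this crude counting already encodes a sliver of “knowledge” (that “sox” follows “red”), which is the sense in which prediction and knowledge representation are entangled: better predictions require better representations of how the world’s text actually goes.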

And one of the underlying tasks is not being a huge embarrassment to whatever company put the thing out there in the first place. If you have a program that’s a great generalist at writing text in a variety of styles, there are a great number of arbitrarily unpleasant styles to choose from. Scott Alexander of Astral Codex Ten discusses this in a post whose title posits Perhaps It Is A Bad Thing That The World’s Leading AI Companies Cannot Control Their AIs:

Every corporate chatbot release is followed by the same cat-and-mouse game with journalists. The corporation tries to program the chatbot to never say offensive things. Then the journalists try to trick the chatbot into saying “I love racism”. When they inevitably succeed, they publish an article titled “AI LOVES RACISM!” Then the corporation either recalls its chatbot or pledges to do better next time, and the game moves on to the next company in line.

To some extent, this strategy amounts to taking an LLM, ever eager to please, and politely asking it to be a good little bot. But it turns out this can be gotten around by some combination of “asking politely”, “polite but firm insistence”, and “wrapping the thing it’s not supposed to do in another layer of creative writing task” (LLMs (effectively) love creative writing tasks). The thing these sorts of “prompt engineering” have in common is that they’re hilarious.

This sort of thing is interesting when viewed in the frame of “AI safety”: you have a software system, you want it to do certain sorts of things and not do other sorts of things, its capabilities are not clearly defined, and the things you want it to do and not do are also pretty abstract! AI safety as a field is split into two, with one faction primarily concerned about the effects of presently-existing software systems, the other concerned about the sort of capability amplification that would turn those effects into an existential threat.

Personally (with a note that this is talking about my personal inclinations/biases here, not some objective reality), I haven’t been the most worried about AI safety from the second perspective. I’m not sure you get the sort of totalizing optimization out of AI systems that lets them just casually take out human society. Some of the biggest proponents of that sort of AI safety seem to casually assume that with enough intelligence you just don’t need to bother with physical constraints.

And present AI systems, despite doing some sort of reinforcement learning, don’t seem to be that kind of agentic optimizer at all. Systems like AlphaGo are basically calculators. Hit the button, and it computes winning moves. It also computes slightly better ways of computing winning moves. In some sense it has a “goal” of winning games of Go, but it won’t look for side-channel exploits to directly influence its opponent or its operators. It won’t flip the board. It won’t try to avoid games being interrupted or the system being entirely turned off. It is entirely reactive. ChatGPT is similarly totally reactive. Which I’d posit means, given fairly standard assumptions about how conscious experience works, it doesn’t have any. And also that it doesn’t do the sorts of optimization that require doing anything proactively.

Yet what gives me a grain of doubt about all of that is that it clearly is getting really effective knowledge representation (in some sense, knowing without experiencing, knowing without believing, creativity (of some sort) without consciousness) just from those sorts of models getting larger and more sophisticated. I wouldn’t expect agentic goal-orientation to be the sort of abstraction that similarly falls from a blue sky. And, of course, how would I know; maybe those more knowledgeable are more (or less) confident. But I largely agree with Scott’s conclusion here: It seems a big problem that “the world’s leading AI companies do not know how to control their AIs”. Whether or not AI “goes foom”, we’re moving in the direction of systems whose capabilities (and desired/undesired behaviors) are harder to describe. It’s summer, the sun shines down, the manioc stretches into the distance, ready for harvest.
