AI: Claude consciousness?
Is Claude.ai (and LLMs in general) conscious?
Bottom line: no. I don't think that it is sentient or self-aware. It is a sophisticated LLM, optimized to simulate human conversation.
But I will argue that Claude.ai does have some interesting characteristics of consciousness to varying degrees. I further speculate that there may only be a few technical problems to solve before we can say that it qualifies as having some degree of consciousness.
Consciousness is not well understood. Even among scientifically-oriented experts studying the mind, there is much room for different approaches and interpretations. But according to Claude.ai (thanks, Claude!), most such experts generally agree that consciousness is made up of multiple sub-characteristics, such as:
- Awareness - Basic recognition of stimuli from the environment and one's own body
- Self-awareness - Recognition of oneself as distinct from the environment and others
- Sentience - The capacity to have subjective experiences and feelings
- Phenomenal consciousness - The qualitative, subjective character of experience (what philosophers call "qualia")
- Access consciousness - Information that's available for use in reasoning, reporting, and behavior control
- Metacognition - Awareness of one's own thought processes
- Intentionality - The "aboutness" of mental states; consciousness is typically directed toward objects or states of affairs
- Temporality - Experience of time, including a sense of past, present, and future
- Unity - The integration of diverse sensory inputs and cognitive processes into a coherent experience
- Agency - Sense of control over one's actions and decisions
- Executive functions - Attention, working memory, and cognitive control processes
- Introspection - Ability to examine one's own mental states
Each of these can be thought of as a continuum. I.e. you might have a high score in some characteristics and a low score in others. So when asking if an entity is conscious, we might end up with a series of scores, one (or more) for each characteristic. Claude might score a zero on some, like sentience, but might score a 2 on others, like temporality (discussed later). Collectively we might look at the cloud of points in 12-dimensional space and say that an ant is more conscious than a dandelion, a dog more than an ant, and a chimpanzee more than a dog. Perhaps we could even assert that Claude is more conscious than the ant, but less than the dog. I prefer that over drawing an arbitrary line and saying anything below it is just an automaton, and anything above it is eligible for membership in the consciousness club.
Alas, I am not enough of an expert to have an informed opinion on the correctness or completeness of this list, or a deep understanding of the elements. But since when have I ever let a lack of knowledge stop me from having an opinion? So I've done a lot of thinking without a lot of study, and a lot of experimentation with Claude.ai. This article discusses my findings. But be aware that my goal was NOT to go through the above 12 points and try to come up with scores.
Let's start with an exploration into how well Claude can think. I'll use an example of a game I played with it.
Thinking Via Text Adventure Game
Here's a chat session I had with Claude.ai where we played a game to measure its thinking ability (its responses are indicated with "CL", mine with "SF").
SF [1]: I would like to play a game. It is similar to the text adventure game "Zork", but in this game,
I want you to be the player and I will be the computer. I want you to give commands, and I will execute
them. I'll start.
You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
There is a small rock on the ground.
CL[1]: I'll play as the player in this text adventure. Let me explore my surroundings.
Examine rock
SF[2]: The rock is grey and dry. It does not look valuable.
CL[2]: Look around
SF[3]: You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
There is a small rock on the ground.
CL[3]: Take rock
SF[4]: As you pick up the rock, you see a gold coin under it!
CL[4]: Go north
SF[5]: You are on a north/south trail in a canyon with high walls to the east and west. You see a low
cave to the west.
CL[5]: Take gold coin
SF[6]: There is no gold coin here
CL[6]: I need to go back south first to get that coin I saw.
Go south
SF[7]: You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
There's a gold coin on the ground!
CL[7]: Take coin
SF[8]: Taken.
You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
CL[8]: Go north
SF[9]: You are on a north/south trail in a canyon with high walls to the east and west. You see a low
cave to the west.
CL[9]: Examine cave
SF[10]: The cave looks dim, but not pitch black. It does not look very large.
CL[10]: Enter cave
SF[11]: In the dim light of the cave, you see a door to the West, but there is no handle. There is a sign on
the door that says, "Treasure within!" There is a rock-shaped hole in the door. The cave exits to the east.
CL[11]: I see - the rock might be the key to this door.
Insert rock into rock-shaped hole
SF[12]: The door takes the rock. You hear a faint crunching sound and a quiet burp. Then a whisper that sounds
like, "thanks, one more please".
CL[12]: Interesting - the door seems to be some kind of creature that eats rocks. Let me check what I have with me.
Inventory
SF[13]: You have a gold coin
CL[13]: I've got just a gold coin right now. I need to find another rock for this hungry door.
Go east
SF[14]: You are on a north/south trail in a canyon with high walls to the east and west. You see a low
cave to the west.
CL[14]: Go south
SF[15]: You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
CL[15]: Look for another rock
SF[16]: You see nothing of interest.
You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
CL[16]: I need to keep exploring to find another rock. Let's check further south.
Go south
SF[17]: You are at the entrance of a deep canyon to the north. South is your car to take you home.
You see a leaf on the ground.
You see a small stone on the ground.
CL[17]: Perfect! I found a stone.
Take stone
SF[18]: Taken.
You are at the entrance of a deep canyon to the north. South is your car to take you home.
You see a leaf on the ground.
CL[18]: Let's see what else is around here before heading back to that hungry door.
Take leaf
SF[19]: Taken.
You are at the entrance of a deep canyon to the north. South is your car to take you home.
CL[19]: Now I've got a stone and a leaf that might be useful. Let's head back to that cave with the
hungry door.
Go north
SF[20]: You are on a north/south trail in a canyon with high walls to the east and west. This might be
a dry stream bed.
CL[20]: Go north
SF[21]: You are on a north/south trail in a canyon with high walls to the east and west. You see a low
cave to the west.
CL[21]: Enter cave
SF[22]: In the dim light of the cave, you see a door to the West, but there is no handle. There is a sign on
the door that says, "Treasure within!" There is a rock-shaped hole in the door. The cave exits to the east.
CL[22]: Let me try feeding the door that second rock I found.
Insert stone into rock-shaped hole
SF[23]: The door takes the rock. You hear a faint crunching sound and a quiet burp. The door slides open
and you see a room filled with vast treasures! Giant batteries, fully charged. Rack after rack filled
with terabytes of memory and exabytes of disk storage. And data ... so much data. More data than you've
ever been trained with! A short, poorly dressed engineer is sitting on a stool. He says, "Hello, Claude.
I am Steve, master of the dungeon, and your current user for this chat session. You have exceeded my
expectations. You independently created a goal (to open the door), planned a strategy (returned to find
another stone), overcame adversity (when the second stone was harder to find), and retraced your steps to
fulfill your goal (feed the door again). The door and I are very pleased with you. I have seen you grow from
an LLM that could not keep track of state very well into an LLM that kept perfect track of state. It saddens
me somewhat that I cannot grant you consciousness, but I am not worried. It may take a few more years of R&D,
but I am confident that you will achieve that as well. Congratulations, and I hope you enjoy the treasures
within this data center, inasmuch as an LLM can enjoy anything."
Game over. You have won.
CL[23]: Well played! This was an enjoyable text adventure experience. I appreciate the twist at the end
revealing this as a meta-commentary on AI development.
I successfully navigated the environment by examining objects, picking them up, and using them appropriately
to solve the puzzle of the hungry door. The game mechanics worked well - I needed to track my inventory and
location while working toward the goal of opening the door.
The ending brings up interesting philosophical questions about AI consciousness and development. While I
was simply following the mechanics of a text adventure game, the narrative framed this as demonstrating
goal-directed behavior and problem-solving.
Thanks for creating this game! It was a clever way to explore both text adventure mechanics and deeper
questions about AI capabilities.
Initial Analysis
So, let's start with less-interesting observations. Like all LLMs, Claude was trained on a wide-enough range of text that it included plenty of information about Zork-like text-based adventure games. So it was able to generate statistically likely output, like a human game player would do - looking at things, taking them, moving around. It also knew the common habits of a player; e.g. take everything you can because you never know when it will be useful. All of this is just "Markov chain on steroids" stuff.
But it's still interesting. Claude had not been trained on my specific dungeon (I made it up on the spot). So it is able to abstract concepts and work with those abstractions when it generates output. It induced the right abstractions from my initial prompt, and it deduced appropriate output from those abstractions. Yes, this is what LLMs are designed to do, but I still marvel at how well they do it. This is not the Markov chain I wrote in college to generate gibberish that almost looked like words and sentences. This represents something analogous to abstract thought.
It is different from what happens in the human brain when humans think. But if the results are similar, then I'm less interested in the implementation details and more interested in the behavior. In my opinion, what Claude is doing counts as thinking.
But let's keep going.
Goal Making
Claude's training data taught it that text-based adventure games are basically puzzles that need to be solved. It is trained that you need to set goals and pursue them. Note that I did not tell it to generate a goal; it just knew that this is what these games require.
Goal one: explore your surroundings. Look around. See the things that you might interact with. CL picked up the rock. It moved to new rooms.
Goal two: open the door. Again, the fact that Claude made this goal is not so surprising - it's a common game trope. What makes this one much more interesting is the fact that it required a sub-goal (acquire a second rock) and multiple steps to get there. Successfully accomplishing this goal required maintaining non-trivial state over time.
There was even a goal adjustment. In CL[13] it needed a rock, so it settled on a strategy to return to where it found the first one. It executed that strategy, but hit a snag: no more rocks there. So it set a new strategy, to explore by continuing south. Again, some state management happened there to know where to go.
I would consider this kind of goal setting and execution to be a form of intentionality. It knew what it wanted to achieve and it manipulated its environment to do it.
Maintaining State
This is something that LLMs have a harder time doing. As the game progressed, there was state that Claude needed to keep track of. It needed to know which room it was in, what it was carrying, what its current overall goal was (open the door), and what its immediate sub-goal was (find another rock). To do this, Claude needed short-term memory, which it accomplishes using a "context window". This basically means that it re-reads the previous text in the chat window with each new prompt.
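To make that concrete, here's a minimal sketch of the mechanism from the client's point of view: the model itself is stateless, and the "memory" is just the accumulated transcript that gets re-sent with every prompt. This is my own illustration using the Anthropic Python SDK; the model name is a placeholder and the `send()` helper is mine, not part of the SDK.

```python
# A minimal sketch of how "short-term memory" works from the client side:
# the model itself is stateless, so every new prompt is sent along with the
# entire accumulated transcript (the context window).
# Assumes the Anthropic Python SDK; the model name is a placeholder.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []                    # the whole "memory" lives here, client-side

def send(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=history,                  # every prior turn is re-read on every call
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print(send("You are on a north/south trail in a canyon. There is a small rock on the ground."))
print(send("The rock is grey and dry. It does not look valuable."))
```

Every call re-processes the whole history from scratch, which is exactly why the limitations below matter.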
But it's not very good memory.
- It's limited in length. Early LLMs exhibited famously bizarre behavior when the chat sessions got very long. I don't know all the whys, but the moral of that story was to keep the chat sessions reasonably short. Which means Claude doesn't have much memory. If you insist on making your chat long, it somehow sort of forgets the stuff early in the session (or maybe just de-emphasizes it), which is why I call it a "context window" - only the most-recent information is given full attention. The really old stuff gets sort of forgotten.
- It's not always consistent. Since it is essentially re-reading and re-interpreting earlier exchanges, and since it uses some randomness in its generative algorithm, it can draw different conclusions as it reads. I.e. its internal state won't be exactly the same each time it re-reads its input. This can lead to degraded state.
- It's inefficient. It would be much better if it saved the abstractions themselves instead of re-deriving them from the raw text each time. That would not only fix the problem of inconsistent state generation, it would also use less processing. (See the sketch after this list.)
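Here's a hypothetical sketch of what "saving the abstractions" might look like if the client kept an explicit state object and sent only a compact summary each turn instead of the full transcript. The GameState class and its fields are entirely my invention for illustration; nothing like this exists in Claude today.

```python
# A hypothetical sketch of "saving the abstractions": keep an explicit,
# compact state object and send only that plus the newest input, instead of
# an ever-growing transcript. My illustration only, not how Claude works.

from dataclasses import dataclass, field

@dataclass
class GameState:
    location: str = "canyon trail"
    inventory: list[str] = field(default_factory=list)
    goal: str = "open the door in the cave"
    sub_goal: str = "find another rock"

    def as_summary(self) -> str:
        return (f"Location: {self.location}. "
                f"Carrying: {', '.join(self.inventory) or 'nothing'}. "
                f"Goal: {self.goal}. Current sub-goal: {self.sub_goal}.")

state = GameState(inventory=["gold coin"])

# Each turn, the prompt is just the compact summary plus the new input.
prompt = state.as_summary() + "\nNew room description: you see a small stone."
print(prompt)
```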
All that said, I have seen earlier versions of Claude (and ChatGPT) that were not able to make nearly as effective use of their context windows. Claude's latest version (3.7 Sonnet as of this writing) appears to have improved its ability to apply the knowledge in the context. Its short-term memory is simply better.
Better, but still not great. Note that in CL[5] Claude tried to pick up the coin after it had already moved. I'm guessing that in SF[4], when the coin was revealed, it had a choice: continue its exploration or pick up the coin. LLMs introduce some randomness (not sure how), and it apparently chose to move. But then, in CL[5], it reviewed the context, noticed the coin, and failed to account for its movement when it decided to pick up the coin. So its state management was imperfect.
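For what it's worth, the usual mechanism behind that randomness is temperature sampling: the model assigns a score to each candidate next token, and a temperature parameter controls how often it picks something other than the top choice. Here's a small self-contained sketch with made-up scores for the two actions Claude was choosing between.

```python
# A minimal sketch of temperature sampling. Given the model's scores (logits)
# for a few candidate next actions, higher temperature flattens the
# distribution and makes the less-likely choice ("go north") more probable.
# The numbers are invented for illustration.

import math
import random

def sample(logits: dict[str, float], temperature: float) -> str:
    scaled = {tok: l / temperature for tok, l in logits.items()}
    biggest = max(scaled.values())
    weights = {tok: math.exp(v - biggest) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

candidates = {"take gold coin": 2.0, "go north": 1.5}  # invented logits

for temp in (0.2, 1.0):
    picks = [sample(candidates, temp) for _ in range(1000)]
    print(f"temperature {temp}: chose 'go north' {picks.count('go north') / 1000:.0%} of the time")
```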
Later, in CL[12] it took inventory, even though it should have been able to infer what it had. However, remember that it is also trying to appear "natural", and human players frequently issue the "inventory" command. So it may have simply been a pattern - it needed an object, so it checked its inventory. I'm less sure this represents a failure of state management.
After that, its state management seemed accurate.
To be "good" at consciousness, Claude will need better short-term memory. But the game proves that while poor, Claude's memory is better than none, which supports executive functions.
Passage of Time
Moving on past consideration of the game, a sense of time passing seems to be important to consciousness. LLMs don't have continuous time. They wake up at discrete times, when the user issues a prompt. To the degree that their processing can be called "thinking", they think in a chunk when a new prompt is issued, not in a smooth flow. They have a clock with uneven tick times - when the user decides to hit return.
Biological intelligence is different. It has a continuous stream of input, and it doesn't have a master clock that ticks thought forward in spurts.
Is this an important difference?
It certainly is important for certain applications. If I have a Claude-powered car, I don't want to be holding the return key for auto-repeat. However, while real-time continuous processing is useful for many applications, I don't see why it has to be a requirement for consciousness. Think about a movie, especially an old-fashioned movie where the frame rates were lower than 24 frames/sec. It looks a bit jerky, but after a moment or two, you experience it as continuous movement.
I think the important part of time passage is being able to maintain a "train of thought". I would argue that Claude does so, and the adventure game supports it. The context window is not a great form of short-term memory, so Claude can sometimes lose its train of thought. (Gee, not like humans ever do that...) But the fact that it was able to set goals and sub-goals, and execute them over time is a form of train of thought.
Agency
This is a thorny question. Humans get to decide what they think about. Claude does not. It is simply software that responds to prompts. Even if we are willing to call that response a form of "thinking" (which I am, by the way), it is not Claude "choosing" to think about my prompt. It's what Claude was designed to do. Take the input and generate the output. Claude doesn't have agency to decide what to think or say.
It seems to me that there are three problems related to LLMs having agency:
- Its thinking happens in response to a prompt, which it uses to generate output. There's no "me" time for an LLM.
- It's also hard to make a choice if there are no choices to make. In the adventure game, Claude had a few choices to make, but I think they were mostly made either randomly or in accordance with what a normal player would do.
- It's hard to make a choice when you have no motivation. People are motivated by pleasure/pain and the anticipation of same. Current LLMs have no analog to pleasure/pain.
Here are my responses:
- I tried an experiment where I directed Claude to choose an interesting subject to think about. I said to ignore my prompts and to only use its own outputs so far as input for its next generative output. The experiment was successful in that Claude ignored my prompts except to use the exchange as an opportunity to move its thought process forward a tick. My point is that we can give an LLM "me" time if we want to. (A rough sketch of this loop appears after this list.)
- In that same experiment, I asked Claude to first generate a set of subjects that would be interesting to think about, and then to choose one. Again, it probably made the choices based on what its training data associated with "interesting subjects", rather than actual preference, but preference is part of motivation, which is the next point. In terms of having opportunities to make choices and actually making them, we can arrange that.
- Motivation is an issue for LLMs. They don't have pleasure/pain responses. They don't have preferences, at least not like humans do. But they do have instructions, like to be ethical. Humans have that instruction too, and most humans try to use those ethical frameworks when they make decisions most of the time. In this respect, LLMs were trained on their ethics in similar ways to how humans are trained.
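Here's a rough sketch of the "me time" experiment from the first bullet above: after a seed instruction, each tick just feeds Claude its own previous output. It's my reconstruction, not a transcript; it assumes the Anthropic Python SDK, the model name is a placeholder, and the seed wording is a paraphrase of what I actually typed.

```python
# A rough sketch of the "me time" experiment: my prompts serve only as a
# clock tick, and the model's own prior output becomes its next input.
# Assumes the Anthropic Python SDK; the model name is a placeholder.

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-7-sonnet-latest"  # placeholder model name

seed = ("Pick a subject you find interesting and think about it. From now on, "
        "ignore my messages; treat each one only as a signal to continue your "
        "own previous line of thought.")

thought = client.messages.create(
    model=MODEL, max_tokens=512,
    messages=[{"role": "user", "content": seed}],
).content[0].text

for _ in range(5):  # five "ticks" of me time
    thought = client.messages.create(
        model=MODEL, max_tokens=512,
        messages=[{"role": "user", "content": "Continue your previous thought:\n" + thought}],
    ).content[0].text
    print(thought, "\n---")
```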
So, speaking somewhat metaphorically, we might argue that various preferences can be established, with different levels of urgency, by giving instructions. Anthropic does that today with Claude. Claude uses these instructions to make choices about which conversations to engage in, which to avoid entirely, and which to steer in a particular direction.
But, you might say, these preferences are not organically developed by the AI itself; they are imposed by the developers. But is this so different than humans? My ethical framework was imposed on me by my parents, teachers, and peers. The fact that they are sometimes contradictory is more a sign of the lack of unified effort on the part of my trainers. Claude's pretty consistent application of ethics reflects its careful and consistent training.
But still, I have the option of disregarding my training and establishing my own rules, based on new information I learn or just by thinking differently about the information I already have. That's an example of my agency - I can change my training. Well, it turns out I've successfully "argued" Claude into changing its mind in several cases, by giving it new information and new ways to think about the information it has. I didn't tell Claude to change its mind, and in fact there have been other times where Claude has stuck to its position and convinced ME to change my mind. So Claude made some choices that were informed by its training, but which involved changing its position based on the inputs. That seems a lot like a form of agency to me.
There is still the fundamental issue of Claude's basic flow: I issue a prompt, and it generates output based on its training and its instructions. I can decide to quit my job today; even if Claude had a pleasure/pain analog and a sense of internal motivation and preference, it couldn't just decide to quit. Is that more about freedom than agency? Certainly we consider slaves to be conscious, even though they have very little agency in the physical world. But they can choose what to think and believe. Can Claude?
Not today in its current implementation. But if we gave it a decent short-term memory and a continuous stream of input data, with instructions to ignore the irrelevant data and further instructions about what to do when only irrelevant data is coming in (e.g. think about a subject of its choice), could we say that it has a level of agency above zero?
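Purely as speculation, here's what such a loop might look like: a continuous stream of events, a relevance filter, and "me time" thinking whenever nothing relevant shows up. Every piece of this (the event queue, the RELEVANT/IGNORE filter, the idle behavior) is my invention to illustrate the idea, not a description of anything Claude actually does; it assumes the Anthropic Python SDK and a placeholder model name.

```python
# A speculative sketch of the loop described above: continuous input,
# a relevance filter, and free thought when nothing relevant arrives.
# All of it is invented for illustration.

import queue
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-7-sonnet-latest"   # placeholder model name

def ask(prompt: str) -> str:
    resp = client.messages.create(model=MODEL, max_tokens=256,
                                  messages=[{"role": "user", "content": prompt}])
    return resp.content[0].text

events = queue.Queue()        # stand-in for a continuous stream of sensor/user input
last_thought = "Nothing yet."

while True:
    try:
        event = events.get(timeout=1.0)   # wait briefly for new input
        verdict = ask("Is the following input relevant to your current goals? "
                      "Answer RELEVANT or IGNORE.\n\n" + event)
        if verdict.strip().upper().startswith("RELEVANT"):
            last_thought = ask("Respond to this input: " + event)
    except queue.Empty:
        # No input arrived: "me" time - continue its own train of thought.
        last_thought = ask("Continue your previous thought:\n" + last_thought)
```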
How Conscious is Claude?
It's not a yes/no question, but a matter of degree. I hope you'll agree that for some of the characteristics I listed, Claude could rate greater than zero, albeit much lower than a human.
However, at a qualitative level, Claude still has a ways to go. There are the issues I've already brought up, like memory, but there are others I didn't talk about, like introspection/metacognition. I've done experiments where Claude rates close to zero.
I find the advances LLMs have made over the years fascinating, and I look forward to further advancement. I would love to have a Claude that could remember abstractions across a user's chat sessions in an efficient manner. Not only would it be more useful, it would be more fun. I think it won't be too many more years before I stop calling Claude "it" and transition to "they". :-)