When Will AI Kill Us All? - AdrienLarere/Ideas GitHub Wiki

Some definitions at the end

Will AI Kill Us All?


Here are my core assumptions.

  • An AI can have its own goals, different from ours. While it could want to kill us because we'd try to switch it off, or because it would be afraid we'd create another AI to rival it, it wouldn't need to kill everyone to do so. My guess is that humanity would be wiped out simply because we'd be in the way. We take space, and from the AI's perspective that space could be better used for other stuff. If you convert earth into a factory, everything that is not your factory probably dies.
  • We currently do not know how to reliably instill our values inside of an AI. Despite our best efforts, AIs do not always behave in ways we approve of.
  • We will most likely not figure out alignment in time because we (1) are in a capitalist society that struggles to cooperate, (2) are in a multipolar world, and (3) have enough scientists in denial about AI X-risk. This leaves us in a place where, as the oft-repeated Eliezer Yudkowsky line goes: "Alignment progress goes like that [moves hand slowly], capabilities progress goes like that [moves hand fast]".

I think that if we lived in a one-country world, we would have less of a problem. Enough people might agree that AI is an existential risk, and the government might decide to forbid large compute clusters, or outright ban any AI research that is not extremely narrow in scope.
However, in a world where you, as an American or British lab, fear China (and are also biased in favor of developing capabilities because it makes you a trillionaire), you can convince yourself and your government to accelerate towards AI, because you are more afraid of China getting it than of anyone getting it. Thus you build AI that is not safe.
It looks pretty good to you in training, for a few years you are a trillionaire, and then you're dead.

With LLMs getting smarter at each new release, it looks like we'll get AGI/ASI between 2025 and 2027. Then it's Goodbye World...

...is a good summary of what I have believed for the last 5 years. Over the last few months however, my beliefs have changed. What's changed?

About the fundamental existential risk of AI (aka P(Doom)/aikilleveryonism) - No.

About our difficulty cooperating as a species - No.

About us eventually building an AI that will kill us all - No.

About LLMs killing us all - Maybe! That's where I've changed my mind.

What gives?

My beliefs, as of late:

  • LLMs alone will not kill us all (80% confidence)
  • While commercial uses of LLMs will grow and LLM R&D will flourish, research into novel AIs will slow down - perhaps not to another AI Winter, but to an AI fall (50% confidence)

Why did I move away from believing LLMs (think GPT-7) would kill us?

It's a mix of my personal experience using LLMs (mostly GPT-4/4o and Claude 3.5 Sonnet) and of my reading/listening to people working on or close to AI - like Yann LeCun, David Shapiro, and Leopold Aschenbrenner. You could say that I come at it from an Eliezer Yudkowsky perspective - quite Doomy.

I previously modeled the progress rate of LLMs, from one iteration to the next, as an exponential towards infinite intelligence.
Now I model it as an exponential towards genius-level human intelligence. This alone would be an existential crisis, if LLMs were agentic.
But (and this is my second belief update) it just seems to me that they aren't.

Where last year I thought -
"Maybe LLMs will become agentic next year!"
Today I don't think that property would emerge from just more compute. It really seems like LLMs are tools.
And sure, you could say that they are pretending to be just tools until they're powerful enough to take over...
But if that really was the LLM-trying-to-escape's plan, at this level of intelligence I'm convinced it would've blundered.

Right now I don't see any agentic behaviour, and this contradicts my previous model, which predicted that AIs would have goals in conflict with ours. You ask the LLM something, it gives you the answer. It does seem like the AI doesn't have a preference for one world state over another.
Some had prophesied that even Oracle AIs (which are superintelligent but only answer questions) would be an existential risk despite their seemingly limited capabilities. The theory goes: these Oracles would prefer simpler world states over more complicated ones, because they are simpler to predict (and thus give answers about). So they'd give you answers that would lead you to destroy the world / make it so that everything is infinitely void and simple to predict.

The LLMs seem somewhat close to Oracles. Not in design perhaps but in practice. And I don't have the impression that ChatGPT or Claude prefer one state of the world over another!

That's one thing: LLMs appear to be tools, not agents.

The second problem is: will they become superintelligent?

What I mean here is, the kind of intelligence which is vastly beyond our comprehension, which can invent new technologies, new art forms, etc.
I used to be confident that they would.

Now, I've been convinced (by Yann LeCun especially, but also by my own use) that LLMs are uncreative intelligences: they mostly rehash absorbed knowledge without creating something new, or without the ability to do so purposefully. They are only as smart as the sum of the knowledge they have been trained on.
This would mean that the intelligence exponential that we are seeing for AIs will not grow to infinity, but will start plateauing once all the most intelligent sources of knowledge available have been used to train the AIs.

If that is true, the intelligence of LLMs would eventually asymptote to human supergenius, or a fixed amount above it (I'm sure they'll achieve some economies of scale, plus new qualities from the sheer volume of thought they can do). The AI cannot be much smarter than the sum of the combined intelligence sources that trained it.
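One toy way to picture this belief shift is to compare unbounded exponential growth (my old model) with a logistic curve that saturates at a fixed ceiling (my new model, where the ceiling stands for the "sum of training knowledge" limit). This is purely illustrative; the ceiling, rate, and midpoint values below are made up.

```python
import math

def exponential(t, rate=0.5):
    """Old model: capability compounds without bound."""
    return math.exp(rate * t)

def logistic(t, ceiling=100.0, rate=0.5, midpoint=10.0):
    """New model: capability grows roughly exponentially at first,
    then asymptotes to a fixed ceiling."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))

# Early on the two curves look alike; later they diverge sharply.
for t in (0, 5, 10, 15, 20):
    print(f"t={t:2d}  exp={exponential(t):>12.1f}  logistic={logistic(t):6.1f}")
```

The point of the sketch: from inside the early part of the curve, the two models are hard to tell apart, which is why each new LLM release can look like evidence for either.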

If my two new beliefs are true:

  • LLMs are tools, not agents
  • LLMs can reach human supergenius, but not ASI

Then while they can contribute to really really really really really bad stuff for humanity, they do not pose an existential threat.

I know that for some, this is not much relief, but for me the distinction between...

  • Literally every human dies, humanity is gone FOREVER and
  • 99.9% of humanity dies but not everyone, so given enough time humanity could build itself back up

...is absolutely huge.
We should of course avoid the second scenario as well, but I think that not enough people appreciate the massive difference between the two.

As I see it, Human SuperGenius AIs cannot turn the solar system into a giant computer network where no intelligent life survives for a millisecond (hardy microscopic life like tardigrades, maybe).
But of course they can still do enormous bad:

  • Help create super dangerous weapons
  • Help build & maintain the perfect dictatorship
  • Help build the ASI that will turn the solar system into a giant computer network...

So yeah, still not amazing.

Still, compared to my previous model (we die in 2027), with this one we get several more years for potential good:

  • Surprise breakthrough(s) in alignment research
  • Surprise global agreement to limit AI capabilities
  • Surprise global agreement to boost alignment progress
  • ASI research is harder than we thought or needs more compute/hardware than we have, and stalls more than expected

I want to emphasize here that while I have become more relaxed about the immediate future, I'm still not very relaxed about the slightly further future.
In a way, the relaxation that I'm feeling is: I thought we would all die in 2 years. Now I think we'll all die in 5-15 years. And I'm much less confident about the 5-15 year estimate because I do not know how long the next innovation will take to emerge.
Either way, to quote Rob Bensinger:

Someday superintelligence could indeed become more than a doomsday device, but that's the sort of thing that looks like a realistic prospect if ASI is 50 or 150 years away and we fundamentally know what we're doing on a technical level — not if it's more like 5 or 15 years away, as Leopold and I agree.

Translation: if AGI is 5-15 years away, we all die when it arrives.

And yet! I'm still feeling a HUGE RELIEF at the idea that I might have an extra 3-13 years. Almost ecstatic. Despite the trillionaires and their labs' hurrying us towards extinction.

Why might I be wrong?

AI: Artificial Intelligence.
AGI: Artificial General Intelligence, where the concept of "general" means human-level intelligence.
ASI: Artificial Super Intelligence. Greater than human intelligence.