Why AI Safety Can’t Work

At least, not AI safety as people talk about it with me on social media and in person.

I’ve had this conversation often enough, online and in person, that I don’t believe I am attributing a strawman framing of the problem to people who are generally interested in the space.  (That is, I believe this is the steelman: the best framing of the problem commonly in use.)

If there are better framings out there, please let me know!  I would love to be wrong about this.  In particular I hope that there are specialists out there who are further along than this.

But:

The standard way that people on social media frame the problem goes something like, “we believe that, in the future, we may create an AI system that is so far advanced beyond what we are currently capable of that it may have it within its power to destroy all of humanity; how do we ensure that it can’t?”

And the answer is: ???

(I am honestly not aware of a good, concise summary of the best current thinking on answers that I could point you to as the steelman here.  I am, however, aware that many, many pixels have been spilled discussing this online, and increasingly in more respectable places.  That even a rough consensus is not more evident is, I believe, itself a symptom of what I’m about to describe.  Again, pointers to others’ work in this space are welcome.)

I submit to you that the reason the answers are so hard to come by and so unsatisfying when they do come is that the question assumes its own conclusion.  To ask the question, in this way, is to answer it.  And the answer is: we can’t.  But this is not the right question.

Let’s unpack this.

In Western Christian philosophy, the Christian God is often defined as a unique being who is omniscient, omnipotent, and omnibenevolent.  (This is sometimes called the “three-O” conception of God; occasionally ‘omnipresent’ is substituted for ‘omnibenevolent’.)

This leads to a variety of paradoxes which Christian philosophers and people who like to argue with Christian philosophers have debated over the millennia.

The Problem of Evil is the most obvious one, or at least the one that bothered me most growing up as a liberal Christian.  Very roughly: if God wants only good things to happen (omnibenevolence), has full knowledge of everything (omniscience), and has the power to do anything (omnipotence), why do not-good things happen in the world?

People have explored a variety of answers over the years, and there’s almost certainly a Master of Divinity degree with your name on it if you want to answer it yourself.  I am not going for an M.Div., and I’m certainly not going to solve it here; I offer it only to illustrate what all the different words mean.

Getting back to our future AI, and comparing it to this conception of the Christian God: I assert that, without substantially constraining what “so far advanced beyond what we are currently capable of” means, current AI safety proponents are in fact arguing that this future AI will be effectively omniscient and omnipotent.

So, the good news: there’s no Problem of Evil here, because the AI isn’t omnibenevolent!  (That is, in fact, kind of the problem—but I’m getting ahead of myself.)

Unfortunately there are other contradictions inherent in ‘omniscience’ and ‘omnipotence’.  One is literally called the Omnipotence Paradox—the old, “Can God create a stone so big that He can’t lift it?”

Fortunately, our concerns are more practical.  Given that we might create an omnipotent and omniscient entity, how do we ensure that it can’t destroy us?

And again, the answer is, we can’t.  It’s right there in the framing.  If this entity is omnipotent, then it, definitionally, can destroy us, and there’s no way to ensure that it can’t, because, definitionally, it can.

(It doesn’t even need to be omniscient to do so!)

Now, there are two ways out of this thicket.

One is to relent and put some pretty significant, but realistic, constraints on the potential future capabilities of this AI… let’s call it an AI system, rather than an entity.  What are its inputs?  What are its outputs?  Who built it, and is therefore responsible for it?  Who pays its power bills?  How many backhoes does it take to sever its connection to the public internet?
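To make those questions concrete, here is a minimal sketch, in Python, of what an inventory of such constraints might look like.  Every name in it (the `AISystemThreatModel` class, its fields) is a hypothetical illustration of mine, not any real framework; the point is only that a finite system has enumerable inputs, outputs, owners, and dependencies.

```python
# A minimal, hypothetical sketch of a constraint inventory for a concrete
# AI system.  None of these names come from a real framework.
from dataclasses import dataclass


@dataclass
class AISystemThreatModel:
    inputs: list[str]         # channels the system can read from
    outputs: list[str]        # channels the system can act through
    operators: list[str]      # who built and runs it, and is accountable
    power_sources: list[str]  # who pays the bills; what can be shut off
    network_links: list[str]  # physical connections a backhoe can sever

    def human_controls(self) -> list[str]:
        # Every finite dependency is a point where humans retain leverage.
        return self.power_sources + self.network_links


model = AISystemThreatModel(
    inputs=["user prompts", "web search API"],
    outputs=["text responses"],
    operators=["ExampleCorp on-call SRE team"],
    power_sources=["one datacenter utility feed"],
    network_links=["two fiber trunks to the public internet"],
)
print(model.human_controls())
```

A system you can describe this way is, whatever else it may be, not omnipotent; and each field is a place where safety work can actually get traction.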

The other is to relax our goal.  What if, instead of ensuring that our omnipotent AI entity can’t destroy us, we merely try to ensure that our omnipotent AI entity doesn’t want to destroy us?

And the answer, again, is that we can’t ensure that our omnipotent AI entity wants any particular thing, because, again, it is omnipotent, and an omnipotent entity is capable of wanting whatever it wants, definitionally.

For some outside force, like humanity as a whole, or at least AI safety practitioners, to constrain its power like that, even optionally, is definitionally impossible.  That’s why the definition of the Christian God needs the additional, explicit omnibenevolence constraint.

We might as well make burnt sacrifices on Mount Moriah.  (Notwithstanding that there’s another God who lays claim to that particular ritual spot.)  It would certainly be more emotionally satisfying.

And because omnipotence is inherently contradictory, once we take it as a premise we can prove anything we want.  We can also disappear up our own navels.
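That’s the logicians’ principle of explosion, ex falso quodlibet: from a contradictory premise, every conclusion follows.  It’s a standard two-liner in a proof assistant; here it is sketched in Lean, where `P ∧ ¬P` stands in for “omnipotent, yet constrainable”:

```lean
-- Principle of explosion: a contradiction proves any proposition Q.
example (P Q : Prop) (h : P ∧ ¬P) : Q :=
  absurd h.left h.right
```

Start from “it can do anything” and “we can stop it” together, and you can derive whatever conclusion you came in wanting.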

So.  We have to significantly constrain our understanding of the capabilities of our future AI system before we can think, or act, meaningfully to ensure its safety, and the safety of anything or anyone that it interacts with.

We are not going to create God, or an omnipotent AI, because neither can exist: not God as defined here, and not an omnipotent AI, again by definition.

And if there ever comes to be convincing evidence that we have created an omnipotent AI, or a God, well.  Go, fetch a ram from the flock as offering, without defect and of the proper value.  It’s the only way.

 

(Inb4: “But what about a finite entity so far advanced beyond us as to be effectively omnipotent relative to us?”  You just said ‘omnipotent’ using more words.  It rounds to the same thing.  We need to define constraints.)

 

For more: What AI Safety Should Be, Real AI Safety: Threat Modeling a Retrieval Augmented Generation (RAG) System