What the fuck is “AI safety”?

Attention is like a cat—if you’re like, “oh, I will buy a nice camera and spend a lot of time in fancy video editing software making this look nice,” it will spurn you, and also if you’re like, “oh last time I got drunk and threw something together I got a lot of views so this time I’ll do the same” it will ignore you, but if you are just trying to do a nice job and not think too hard about it it will come sit in your lap and force you to pet it until you forget what you were trying to write about (oh hi kitty…)

(Although uh actually the followup video I posted on the heels of the one I did where I had a couple ciders and talked about Internet space games and software engineering is actually doing kind of eyewatering numbers—10k views as of this writing and I swear it was 2k a second ago, so I uh I guess I’m a Star Citizen youtuber now?? The cat is staring at me again. Don’t read too much into this. Like most cats he’s a bad analogy.)

Anyway the point being that the Internet has just spent the better part of the last week talking about “AI safety” and as someone with a glancing background in actual safety (exhibit A: the entire rest of this blog going back to 2012) I’m kind of cranky about it, because it’s clear that, with obvious exceptions (Dr. K—, and … probably at least one other person, but Dr. K— is the only one I know), nobody involved has a good definition of what ‘safety’ is, let alone what ‘AI’ is, frankly let alone what we’re even afraid of besides “I watched Terminator 2 at a friend’s sleepover when I was 7 and couldn’t sleep for a week” or was that just me

(The best science fiction movie of the 20th century, and you can and will fight me on that, but actually it was Jurassic Park that I watched at 7 and scarred me for life, which is also an incredible movie but just not quite as good as Terminator 2.)

And here’s where I drop the drunken ramble bit, to the extent that it’s a bit: The folks who I studied, coming up in the field of safety/security/privacy/whatever we call it, define ‘safety’ as “freedom from unacceptable loss.”  (Okay, technically Leveson defines ‘safety’ as the “absence of accidents” and ‘accidents’ as “an event involving an unplanned and unacceptable loss” (PDF p. 32, print p. 11), but I think the transitive property holds, and I would link to Wikipedia but it’s useless here. A=B=C therefore A=C. Clearly.)

And, what. The. FUCK. Are the unacceptable losses we’re worried about in the AI safety context?

“AI gets super-intelligent, malevolent, and kills us all” is, sure, an unacceptable loss, because of the “kills us all” bit, but, nobody who’s worried about that rounds it to “LLM-induced mass casualty event.”  Maybe we should?  It hasn’t happened yet though it’s clearly coming, and whether you think it’s likely to look more like Terminator or Jonestown (or Heaven’s Gate) tells you more about you than about me really.

(“Unacceptable to whom?” was also the immediate question of anybody to whom I gave this definition for a few years, and, yeah, that’s the question isn’t it.  All these AI systems are, and are going to continue to be, not just Californians but Northern Californians, and no, you can’t tell me that because all the VCs decamped for Wyoming or Texas or Miami or whatever Motel 6 Elon Musk lives in that they’re not Northern Californians, they absolutely are, everything you hate about them was here before them and will live on long after they’ve run out of money, senesced into a pleasant-for-them retirement of being assholes at HOA meetings, and been replaced by a new batch of assholes with money and zero other qualifications—and I quite like Motel 6s, they were the nice motels we stayed in growing up, they’re too good for him.)

(The cat, sensing that I’m getting closer to my conclusion, has shown up to take advantage of this opportunity to stand on my keyboard and contribute to the blog post.)

Where was I? Right! Unacceptable losses. I want to be clear that I’m not trying to throw anyone I work with under the bus; I’ve been saying this for so long, on Twitter and… mostly on Twitter, that I needed to write it down somewhere, even in this ridiculous fashion.

“AI turns us all into paperclips” or strawberries or whatever is an unacceptable loss, sure, but so is “wizard turns us all into hamsters” and the story of change there is about as clear and specific as the story of change in the first one. Or. I dunno. “Eaten by a fuzzy green zebra.” Maybe it’s AI powered. Sure.

This is the other bit, like, so we’re unclear on ‘safety’, sure—and I don’t feel like I’ve given you an ironclad case there that would pass my former mock trial coaches’ muster but also roll with me, see above attempts at get-out-of-jail-free caveats, but, worse, we’re unclear on what AI is.

I sure really do fucking hope, he says, speaking to certain friends in particular, that when I say “‘AI turns us all into paperclips’ is an incoherent fear” that you haven’t, like, installed your LLM as the top-level optimization algorithm on a paperclip plant… and how would you even do that? And then … roll its … what … down through all the sub-optimizers and sub-sub-optimizers and… sub-sub-sub-… basically what I’m asking is do you know what even goes into making shit, bro

Or farming and harvesting shit, in the case of Elon Musk’s strawberry example.

Plausibly some undocumented workers from Guatemala are going to have some opinions when their bosses tell them that the AI requires them to murder some people in order to plant more strawberry fields over their corpses.

I mean. Elon doesn’t deserve them. But those workers will have said feelings. Said workers being, in the main, you know, not enormous gits, like he is.

Doing anything requires that you have sensors, to take input from the physical world; a model of the physical world, which those sensor inputs get fed into to predict how the physical world will evolve; actuators, which you can use to act on the physical world; and also some goal, which is usually, like, “don’t die,” but is sometimes slightly more sophisticated, like, “don’t die _in an embarrassing way_”.

Any intelligence will have all of these things, whether meat-based or silicon-based, whether its body is a quarter-acre’s worth of data center sucking down a big town’s worth of power and producing a big town’s worth of heat or about 10 lbs of cranky furry meat that eats Meow Mix and shits in a litter box: the essential loop remains the same.
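If you wanted to sketch that loop in code—and this is purely my own toy illustration, every class and name here is invented, and no real agent (silicon or cat) is remotely this simple—it might look something like this:

```python
# Toy sketch of the sense -> model -> act loop described above.
# Everything here is invented for illustration: a trivial "thermostat
# agent" whose goal is "don't freeze," a close cousin of "don't die."

class Sensors:
    """Take input from the physical world."""
    def __init__(self, world):
        self.world = world
    def read(self):
        return self.world["temperature"]

class Model:
    """Predict how the world will evolve (trivially: it stays put)."""
    def update(self, observation):
        return observation

class Actuators:
    """Act on the physical world."""
    def __init__(self, world):
        self.world = world
    def act(self, action):
        self.world["temperature"] += action

class Goal:
    """The goal: get the temperature up to target, then stop."""
    def __init__(self, target):
        self.target = target
        self.last_prediction = None
    def satisfied(self):
        return (self.last_prediction is not None
                and self.last_prediction >= self.target)
    def choose_action(self, prediction):
        self.last_prediction = prediction
        return 1 if prediction < self.target else 0

def agent_loop(sensors, model, actuators, goal):
    """One embodied agent, meat or silicon: sense, predict, act, repeat."""
    while not goal.satisfied():
        observation = sensors.read()
        prediction = model.update(observation)
        actuators.act(goal.choose_action(prediction))

world = {"temperature": 15}
agent_loop(Sensors(world), Model(), Actuators(world), Goal(target=20))
print(world["temperature"])  # the actuators have nudged the world up to the goal
```

Swap the dictionary for the actual physical world and the one-line model for a few hundred billion parameters and the shape of the loop doesn’t change.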

To bring this whole ramble back around—if we’re specific about what ‘AI’ means—at the moment, an LLM running in a server rack somewhere to which we’ve fitted a text input and a text output?? For reasons. Sure. Why not. But we could fit other things.

And if we’re specific about what ‘safety’ means, i.e. freedom from losses which are unacceptable to me, Kevin Riggle*, sitting right here, in [redacted city] in [redacted state] at 2:10 A.M. Eastern Standard Time in the year of our Lord Twenty-Twenty-Three.

I dunno, where does that leave us? Somewhere better than “we shouldn’t do things which make Northern Californians yell at us on Twitter” (also hi, I am a Northern Californian) and somewhere worse than “we have solved this forever, we can continue to build shit free from these pesky Northern Californians and their pesky opinions” probably. Idk.

Again, dropping the bit, to the extent that it is a bit, tl;dr: Let’s be really fucking specific about what unacceptable losses we’re worried about, when we talk about AI safety. And let’s also be really fucking specific about what we mean by AI, when we talk about AI.

And with that I leave you with this cat, who doesn’t exist, but who might as well have.

a tabby cat that doesn't exist

I’ve launched a podcast and a YouTube channel!

The faces of the first four guests on the War Stories podcast

It only took three years since I initially teased it, but I’ve finally launched the Critical Point YouTube channel! The first thing on it is a podcast I’ve also just launched called “War Stories,” in which I’m interviewing software engineers about that time they broke production (we’re also available on most other podcast services).

Critical Point logo and wordmark



The main site with all the links lives at criticalpoint.tv.

I’m having a ton of fun with it—the interviews, the editing, even the promo.  If you find me in person I even have stickers.  Please check it out!

“What do you say when the system goes down? How to write an internal incident email” article live at GitHub’s The ReadMe Project

tl;dr I have an article about incident management up.

Back in February, Virginia Bryant from GitHub reached out to me.  GitHub was spinning up a new online magazine, The ReadMe Project, on a model similar to Stripe’s Increment.  She’d read my article about threat modeling there and liked it, and asked whether I’d be interested in writing an article on a security topic for her magazine.

I really felt like I had said what I had to say about threat modeling for the time being, but, after chatting a bit about who her audience was and what their needs were, we settled on the topic of incident management.

Because I have so much to say on the topic from my years helping to run the incident management process at Akamai, but had a relatively short article to say it in, I decided to focus tightly on composing the incident email—although so much about structuring the overall process turned out to be latent in that.

As I discuss in the article, at the highest level, an incident email needs to include six things—

  1. What we are perceiving which causes us to believe that something bad may be happening;
  2. Our best guess right now of how bad it is;
  3. How far along we are in our response to it;
  4. Which one person is directly responsible for coordinating the response;
  5. Where we’re coordinating;
  6. Who else is involved and in what capacity.

—but so much emerges from that.
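If you wanted to make sure you never forget one of the six, you could reduce them to a form letter. This is a sketch of my own devising, not the template from the article, and every field name and example value here is made up:

```python
# A form letter covering the six items above (plus a subject line).
# My own illustration, not the article's template; all names are invented.

INCIDENT_EMAIL = """\
Subject: [{severity}] {summary}

What we're seeing:       {observation}
Current severity guess:  {severity}
Response status:         {status}
Incident coordinator:    {coordinator}
Coordinating in:         {channel}
Also involved:           {responders}
"""

def compose_incident_email(observation, severity, status,
                           coordinator, channel, responders, summary):
    """Fill in the form letter; refusing to leave a field blank is the point."""
    return INCIDENT_EMAIL.format(
        observation=observation, severity=severity, status=status,
        coordinator=coordinator, channel=channel, responders=responders,
        summary=summary,
    )

email = compose_incident_email(
    observation="Error rates on the API tier are 40x baseline",
    severity="SEV-2",
    status="Investigating; mitigation not yet identified",
    coordinator="Jane Doe",
    channel="#incident-4242 on Slack",
    responders="API on-call, networking on-call, comms",
    summary="Elevated API error rates",
)
print(email)
```

The value isn’t the code, obviously—it’s that a template with no optional fields forces you to notice, before you hit send, that you haven’t actually named a coordinator yet.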

Working with Virginia and the ReadMe Project folks was a great experience, highly recommended, and many thanks to her and them for providing me this venue to talk about a thing that I’ve wanted to talk about for a long time.

It turns out that I have a lot more to say about incident management, so I’m working to find more places to write about it in the future.  (One is already in the works, on incident action items, so watch this space. 🙂 )

In the meantime, go check out the article!

A quick fun thing: Ever wanted to run a nuclear power plant?

Over the last couple weeks, and partly in honor of the late Dan Kaminsky, who as far as I knew never met a weird machine he didn’t like, I’ve finished debugging my Inform 7 text-adventure port of Stephen R. Berggren’s 1980 “Apple Nuclear Power Plant” simulation and released it in a form you can play online! (Or offline, if you have a Glulxe interpreter, e.g. Gargoyle.)

(You can also play Berggren’s original Applesoft BASIC version online via Joshua Bell’s Applesoft BASIC in Javascript project—go to that link and select “Nuclear Power Plant” under “Other” in the “Select a sample…” dropdown. But the text adventure port has advances like modern fonts and improved graphics which will make it a friendlier experience for a lot of people I suspect.)

Edit 2021-05-10: Here’s a screenshot.

A screenshot of the game. It reads: "> sit down; You sit down in the heavy padded chair.  The screen reads:" and then the default screen output at the very beginning of the game showing the resting state of the nuclear power plant.

Just a Little Thing I’ve Been Working On

As part of my grand plans, which I haven’t talked about here much at all, I’m working on doing more video stuff. So far, so good! (More info to come.)

To start off, I wanted to do a video version of something that I’ve done often, live and extemporaneously, so I used my post about redundancy and reliability as a jumping-off point.

Tune in if you want to watch me get very worked up about why you can’t just add a backup component and expect your system to fail less often!