Implementation Detail or Invariant?; or, The Hardware Interlock Fallacy

If you are the kind of person who enjoys reading accident reports for fun (hi), you may have heard of the Therac-25, a medical radiation device. Some of us learned about it in school, including yours truly, by reading this paper by Leveson & Turner describing how it had some software bugs and ultimately killed some people.

In 2026 I think we’re all pretty aware that software can have serious negative real-world consequences up to and including death, but when I first read the paper in 2007 or so it was a bit of a wake-up call for me.

When I started at Akamai in 2012, my manager had just gotten really into the work of a Prof. Nancy Leveson, whose book Engineering a Safer World (available as an open-access PDF) was just out. Eventually he convinced me to read it, but I didn’t realize until well after I finished that the Prof. Leveson of the book was one of the authors of the Therac-25 paper I’d read in school. Safety is a small world.

Some time after that, what I remembered of the paper and what I’d learned from EaSW suddenly came together in my head with a force like an X-ray burst, and I realized that one of the lessons I’d originally taken from the paper was quite incorrect and, better yet, EaSW taught me why.

If you talk about software and safety a lot (also hi), the Therac-25 will come up in conversation from time to time, and I’m always very interested now to hear what lessons people take from the paper. There are a few common observations, but perhaps the most common is the one I now describe as the Hardware Interlock Fallacy.

Yeah, that’s the real issue.
I should’ve phrased it to point to the replacement of proven safeties being replaced by unproven ones, with those responsible dismissing reports that the new ones weren’t working.
Altytwo Altryness, BS (@whereIsTheSpai)

This is the tweet that set me off (and in fact this blog post is a cleaned-up version of my tweet thread), but I want to be really explicit when I say that I’m not trying to rag on @whereIsTheSpai here. I believed the Hardware Interlock Fallacy too, when I first read the paper. The paper’s authors maybe even believed it when they wrote the paper! We get to watch the field advance in real time, how cool is that?

So what is the Hardware Interlock Fallacy? First we need to understand a little bit about the Therac-25. The best way to do this is to go read the paper, and then I’ll recap the important bits. (I hadn’t read the paper in a decade; I had to go read it again before I could finish the thread.)

Okay. You’re back. Very quickly: the Therac-25 is a medical radiation machine. Its purpose is to apply a narrow and metered beam of radiation to a small area of a patient’s body in order to kill the cancer cells growing there. The Therac-25 in particular output an electron beam at various powers and then either delivered it directly to the patient or converted it to X-rays and then delivered it.

Therac-25 (side view). The electron beam source emits a full-strength beam, which is processed by devices on a turntable and redirected as an attenuated beam onto the patient.

There’s a turntable between the beam source and the patient which looks like this and controls what happens to the beam before it hits the patient. And it is very, very important that the patient not be exposed to the unfiltered beam.

Therac-25 (top view). The turntable, with three equally spaced devices: a mirror, an electron filter, and an X-ray converter.

The mirror is there so staff can position the patient correctly under the beam, so that only the treatment areas are exposed and they’re exposed correctly. In mirror mode, the electron beam source is inactive and a lightbulb simulates the beam instead. Or rather, the beam source is supposed to be inactive.

You can probably tell where I’m going with this already.

When we read the Therac-25 paper for class, essentially everybody came away with the idea that the “root cause” of the problem was that the hardware interlocks had been removed between the Therac-20, which had similar software bugs but no accidents, and the Therac-25. The paper itself argues against this implicitly but not explicitly, and given the prominence of the interlocks throughout the paper and the authors’ own recommendation that they be reinstated, it’s a seductive reduction to make.

The software was supposed to check that the turntable wasn’t exposing the patient to the full-strength beam before turning the beam on. However, under certain machine configurations, there was a race condition with a 1/256 probability that the turntable position wouldn’t be checked. (This was not the only software bug, nor the only one implicated in some of the accidents.)
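Where does a number like 1/256 come from? Leveson & Turner trace the Yakima accident to a one-byte flag that the software incremented on every pass through a set-up routine instead of setting it to a fixed nonzero value. Every 256th pass the flag wrapped around to zero, a zero flag meant the turntable position check was skipped, and if the operator pressed “set” at that moment the beam could fire unchecked. The paper’s fix was to set the flag to a fixed nonzero value. Below is a minimal Python sketch of that pattern, assuming a much-simplified model; the function names are hypothetical and this is not the machine’s actual code.

```python
# Simplified model of a one-byte "needs checking" flag that is
# incremented instead of set. All names here are hypothetical.

def buggy_setup_pass(flag: int) -> tuple[int, bool]:
    """One pass of a hypothetical set-up loop.

    The flag lives in one byte, so incrementing it wraps 255 -> 0.
    The turntable check runs only when the flag is nonzero.
    Returns the new flag and whether the turntable was checked.
    """
    flag = (flag + 1) & 0xFF  # increment, not set: wraps to 0 every 256 passes
    turntable_checked = flag != 0
    return flag, turntable_checked


def fixed_setup_pass(flag: int) -> tuple[int, bool]:
    """Same pass, but the flag is set to a fixed nonzero value."""
    flag = 1  # setting can never wrap around to zero
    turntable_checked = flag != 0
    return flag, turntable_checked


def count_skipped_checks(step, passes: int) -> int:
    """Run `passes` set-up passes and count how many skipped the check."""
    flag, skipped = 0, 0
    for _ in range(passes):
        flag, checked = step(flag)
        if not checked:
            skipped += 1
    return skipped


if __name__ == "__main__":
    print(count_skipped_checks(buggy_setup_pass, 256 * 4))  # 4
    print(count_skipped_checks(fixed_setup_pass, 256 * 4))  # 0
```

Note that nothing in the buggy version looks like a safety check being removed; the check is still there, it just silently stops running once in every 256 passes.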

And this is really important, so I’ll say it again: It is really, really important that the patient not be exposed to the unfiltered beam. People who were so exposed died fast, horrific, and painful deaths. You could almost say that “the patient must never be exposed to the unfiltered beam” is a design invariant of the machine, which it must uphold in order to be operating safely.

And while hardware fails differently, it does fail. The machine’s designers were intimately familiar with hardware failure and seem to have believed that software, which is not subject to wear, weathering, etc., should be more reliable. At many years’ remove that may seem silly to us, but it doesn’t stop us from implicitly making the same mistake in reverse: trusting hardware more than software, with similarly bad consequences (see, e.g., the mistake of trusting the hardware encryption of your SSDs).

So the Hardware Interlock Fallacy is this: the belief that removing the hardware interlocks was the root cause. What actually matters is not the precise mechanism by which a safety design invariant is maintained, only that it is maintained.
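One way to make “only that it is maintained” concrete is to write the invariant down once and judge every enforcement mechanism, whether you picture it as a microswitch or a software routine, solely by whether it ever permits a state the invariant forbids. The Python sketch below assumes a heavily simplified, hypothetical model of the turntable and beam modes (any beam with the wrong device, or none, in the path counts as a violation); it is not the Therac-25’s real logic, and the two check functions are invented for illustration.

```python
from enum import Enum
from itertools import product
from typing import Callable, List, Tuple


class Turntable(Enum):
    MIRROR = "mirror"              # field light only: the beam must be off
    ELECTRON_FILTER = "electron filter"
    XRAY_CONVERTER = "x-ray converter"
    IN_MOTION = "in motion"        # between positions: nothing in the path


class Beam(Enum):
    ELECTRON = "electron"
    XRAY = "x-ray"


# Simplified design invariant: the beam may fire only if the device that
# matches its mode sits in the beam path.
REQUIRED_DEVICE = {
    Beam.ELECTRON: Turntable.ELECTRON_FILTER,
    Beam.XRAY: Turntable.XRAY_CONVERTER,
}


def invariant_allows(position: Turntable, beam: Beam) -> bool:
    return REQUIRED_DEVICE[beam] == position


# Any mechanism, hardware or software, is modeled as a function that
# decides whether to permit firing in a given state.
Mechanism = Callable[[Turntable, Beam], bool]


def violations(mechanism: Mechanism) -> List[Tuple[Turntable, Beam]]:
    """Every state where the mechanism permits a firing the invariant
    forbids. What kind of mechanism it is never enters into it."""
    return [
        (pos, beam)
        for pos, beam in product(Turntable, Beam)
        if mechanism(pos, beam) and not invariant_allows(pos, beam)
    ]


def careless_check(position: Turntable, beam: Beam) -> bool:
    # Hypothetical flawed mechanism: only refuses to fire in mirror mode.
    return position is not Turntable.MIRROR


def strict_check(position: Turntable, beam: Beam) -> bool:
    # Hypothetical mechanism using an explicit allow-list of safe states.
    allowed = {
        (Turntable.ELECTRON_FILTER, Beam.ELECTRON),
        (Turntable.XRAY_CONVERTER, Beam.XRAY),
    }
    return (position, beam) in allowed


if __name__ == "__main__":
    print(len(violations(careless_check)))  # 4
    print(violations(strict_check))         # []
```

The design choice here is that the verifier only ever sees a function from state to “fire or don’t”; you could swap a model of a physical interlock in for `strict_check` and the question asked of it would be exactly the same.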
