A Case-Study in Securing LLM Applications From Consumer Reports

This is not my work, but some colleagues and friends at Consumer Reports and Include Security have a nice post up about their, uh, security adventures developing an LLM-powered chatbot application. (They independently discovered the vulnerabilities recently published as LLM4Shell at Black Hat Asia.)

Who’s Verifying the Verifier: A Case-Study in Securing LLM Applications

tl;dr:

The code is a little confusing, but it basically executes (exec()) all of the LLM-generated code except the last line as Python statements, and then evaluates (eval()) the last line as a Python expression. We also notice the sanitize function, which should be doing something to reduce risk; however, we found that it only removes spaces and “python” from the beginning of the code, as well as backtick marks around the code.
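
To make that concrete, here is a minimal sketch of the pattern being described, not the actual Consumer Reports code; the function names (sanitize, run_llm_code) and structure are my own assumptions about what such a verifier might look like:

    # Hypothetical reconstruction of the pattern described above -- not the
    # real application code. Names and structure are assumptions.

    def sanitize(code: str) -> str:
        # Only strips leading whitespace, a leading "python" marker, and
        # surrounding backticks; it does nothing to restrict what the code does.
        code = code.strip()
        code = code.removeprefix("```python").removeprefix("```")
        code = code.removesuffix("```")
        return code.strip()

    def run_llm_code(llm_output: str):
        code = sanitize(llm_output)
        lines = code.splitlines()
        # Every line except the last is executed as Python statements...
        exec("\n".join(lines[:-1]))
        # ...and the last line is evaluated as a Python expression whose
        # value is returned. Whatever the model emits reaches the interpreter.
        return eval(lines[-1])

    # Anyone who can steer the model's output gets arbitrary code execution:
    # run_llm_code("import os\nos.system('id')")

The point of the sketch is that the "sanitizer" only normalizes formatting (markdown fences, a stray "python" prefix); it never constrains what the code is allowed to do before exec()/eval() run it.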

And it gets, uh, “better” from there.