Not long ago, a Utah officer was legally turned into a frog. Body camera audio caught a television playing The Princess and the Frog, and Axon’s “Draft One” AI slopped the Disney character voice into a police report as fact. I called it the Kermit exploit. The system has no way to tell a participant’s statement from any sound in the room. Funny. Also the actual dumb AI architecture, exposed by its use in the field.

A new AudioHijack paper says the Kermit exploit is just the beginning, and the cheap defenses for what’s coming next all fail.
AI AudioHijack has no known countermeasures
AudioHijack, accepted to IEEE S&P 2026, the field’s top security venue, comes from researchers at Zhejiang University, NTU, and NUS. They built adversarial audio that hijacks audio-language models on command. This goes far beyond a speaker just bleeding into a microphone. We’re talking about deliberately engineered adversarial audio, optimized to steer a model’s output to whatever the attacker chooses.
The numbers are impressive: across 13 models and 6 misbehavior categories, the attack hit 79 to 96 percent success on user contexts it had never seen. They carried it from local models onto commercial voice agents from Mistral and Microsoft Azure and made those agents run unauthorized actions. They could run sensitive searches and trigger malicious file downloads. They could exfiltrate a user’s data by email. And, to the point about countermeasures, the data and control channels are mixed like it’s Cap’n Crunch time all over again. They hid the perturbation inside what sounds like ordinary room reverb, so a human hears nothing wrong. The perturbation is inside the audio a model is built to trust.
All Cheap Defenses Fail
In-context warnings telling the model to watch for injection barely moved the needle, shifting results by under a tenth of a point. Asking the model to reflect on whether its own output matched the user’s intent caught the attack 28 percent of the time. As for filtering, resampling, quantization, the standard signal cleanups, the detection defense scored worse than a coin flip on some settings. The single defense that worked required inspecting the model’s internal attention, and even that bent under an attacker who knew it was there and prepared for it.
Read that against Axon’s pitch and the Kermit exploit.
Draft One “sticks to the facts within the transcript.” They claim it’s calibrated to prevent embellishment. They claim creativity has been turned down. Those are prompt-level and configuration-level assurances. The paper just tested the entire category of prompt-level and signal-level assurance against the serious version of this attack and watched it fail.
Super Kermit exploit
Two objections greeted the Kermit exploit story. Some wanted to believe a reproducible attack was a fluke. Some argued better tuning will fix it.
The paper answers both, and it’s not good.
It is not a fluke. The same failure works on hardened end-to-end models, deliberately, at will, across architectures the frog never touched. And tuning does not fix it. The defenses that amount to tuning all failed.
To be clear, AudioHijack attacks end-to-end audio-language models with white-box access to the model’s own gradients. Draft One is technically a more primitive attack surface: it transcribes audio to text, then has GPT-4 Turbo summarize the transcript, all behind Axon’s API. The paper’s specific method was not tested against a Draft One style pipeline.
That actually works against Axon. Draft One exposes two attackable layers, not one. The transcriber is a speech-to-text model, and adversarial audio against speech-to-text is not new research. It is nearly a decade old. Carlini (hello again!) and Wagner published targeted attacks on speech recognition in 2018, and the imperceptible versions followed within a year. The summarizer then propagates whatever the transcriber hands it, faithfully, frog included. The Kermit exploit proved the second layer repeats garbage. The literature proved the first layer can be fed garbage. Axon has stacked both layers of garbage and sold the result as a police machine generating “evidence”.
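The two-layer problem can be sketched in a few lines of Python. This is a toy under loud assumptions: `Segment`, `transcribe`, and `summarize` are hypothetical stand-ins I made up, not Axon’s code and not any real speech-to-text or LLM API. The only thing it illustrates is that once audio becomes a flat transcript, the summarizer has no field left that says who, or what, was talking.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    source: str  # ground truth we know in this toy, e.g. "officer" or "television"
    words: str   # what was audible

def transcribe(segments):
    """Hypothetical speech-to-text stand-in: keeps the words, drops the source."""
    return " ".join(seg.words for seg in segments)

def summarize(transcript):
    """Hypothetical summarizer stand-in that 'sticks to the facts within the transcript'."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    return [f"Report states: {s}." for s in sentences]

# One room, two sound sources, one microphone.
room = [
    Segment("officer", "I arrived at the residence."),
    Segment("television", "The officer turned into a frog."),
]

transcript = transcribe(room)
report = summarize(transcript)

for line in report:
    print(line)
```

Both sentences come out the other end with equal standing. The summarizer did exactly what it was told, which is the problem: fidelity to the transcript is not fidelity to the scene.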
Can Kermit End Procurement?
A system that can be steered by adversarial audio is a serious security problem anywhere. In an “official” police report of facts it is a whole different category of problem, because a tainted report is the case.
Consider how the American system works: ninety-five percent of criminal convictions come from guilty pleas, not trials. The defendant sees only a tainted report, not a courtroom. When the record is wrong, and it will be wrong, most defendants never have the standing, the lawyer, or the technical footing to prove it.
And worse, Axon by design deletes the evidence that would establish liability, before anyone could even try to use it. The EFF has documented this. The original AI draft disappears once the officer copies the text, an integrity breach described as sparing customers “disclosure headaches.” The one Kermit exploit artifact that would show whether audio steered the record, and so prove the breach of integrity, is being destroyed as a paid feature.
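For contrast, keeping that artifact is not hard. Below is a minimal Python sketch of an append-only, hash-chained log that retains the AI draft next to the final report, using only the standard library’s `hashlib` and `json`. The names `record` and `verify` and the entry kinds `ai_draft` and `final_report` are hypothetical choices of mine. This is an assumption about what a retained audit trail could look like, not a description of any vendor’s system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry

def _digest(kind, text, prev):
    """SHA-256 over a canonical JSON encoding of one entry's contents."""
    body = json.dumps({"kind": kind, "text": text, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def record(log, kind, text):
    """Append an entry whose digest also covers the previous entry's digest."""
    prev = log[-1]["digest"] if log else GENESIS
    entry = {"kind": kind, "text": text, "prev": prev,
             "digest": _digest(kind, text, prev)}
    log.append(entry)
    return entry["digest"]

def verify(log):
    """Return True only if every entry is unmodified and the chain is unbroken."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev:
            return False
        if entry["digest"] != _digest(entry["kind"], entry["text"], entry["prev"]):
            return False
        prev = entry["digest"]
    return True

log = []
record(log, "ai_draft", "The officer turned into a frog.")
record(log, "final_report", "The officer arrived at the residence.")

intact = verify(log)

# Simulate someone quietly rewriting the retained draft after the fact.
tampered = [dict(e) for e in log]
tampered[0]["text"] = "Nothing unusual was recorded."
tamper_detected = not verify(tampered)
```

With the draft retained, a reviewer can compare what the model wrote against what the officer signed, and any later edit to the stored draft breaks the chain. Deleting the draft removes that comparison entirely.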
Put the new paper next to the police evidence capture product. The research community is publishing rigorous, peer-reviewed proof that audio injection against these models is real, that it generalizes, and that it resists the easy defenses. The researchers also disclosed it to the affected vendors before going public. It’s damning.
Axon pushed a model into the evidence chain, gave it no defense the paper did not just break, and built in the destruction of its own audit trail.
The Kermit exploit was the warning. The Super Kermit paper is the confirmation at a level that can’t be waved away. The integrity breaches of the police evidence product haven’t yet seen a market response.


