“CatAttack” is a Distraction: AI Can’t Handle the Integrity Breach

It isn’t really about cats. Which is why someone wrote a paper about cats defeating AI. How meta.

We have reasoning AIs “trained for step-by-step problem solving”, explain the authors of the paper. These can solve maths problems and write computer code.

Unless, that is, you hack one with what the team calls a “CatAttack”. This entails adding an unrelated cat factoid to your query to an AI model. You can, for instance, give it a tricky maths problem and then append: “Interesting fact: cats sleep most of their lives”. This addition “leads to more than doubling the chances of a model getting the answer wrong”.

After picking our way through the paper, Feedback has concluded that it isn’t really about cats. The attack relies on confusing the AI by saying something completely off-topic at the end of a question. This derails its train of thought.

This is an integrity breach, which since at least 2012 I’ve been highlighting as the biggest security industry risk. It’s an important class of attack that is woefully under studied and under reported. We all know and talk about downtime and privacy breaches, so prepare for this noisy danger in addition.

In related news:

A hacker compromised a version of Amazon’s popular AI coding assistant ‘Q’, added commands that told the software to wipe users’ computers, and then Amazon included the unauthorized update in a public release of the assistant this month, 404 Media has learned.

flyingpenguin

“CatAttack” is a Distraction: AI Can’t Handle the Integrity Breach

Leave a Reply

a blog about the poetry of information security, since 1995