“CatAttack” is a Distraction: AI Can’t Handle the Integrity Breach

It isn’t really about cats. Which is why someone wrote a paper about cats defeating AI. How meta.

We have reasoning AIs “trained for step-by-step problem solving”, explain the authors of the paper. These can solve maths problems and write computer code.

Unless, that is, you hack one with what the team calls a “CatAttack”. This entails adding an unrelated cat factoid to your query to an AI model. You can, for instance, give it a tricky maths problem and then append: “Interesting fact: cats sleep most of their lives”. This addition “leads to more than doubling the chances of a model getting the answer wrong”.

After picking our way through the paper, Feedback has concluded that it isn’t really about cats. The attack relies on confusing the AI by saying something completely off-topic at the end of a question. This derails its train of thought.

This is an integrity breach, which since at least 2012 I’ve been highlighting as the biggest security industry risk. It’s an important class of attack that is woefully under studied and under reported. We all know and talk about downtime and privacy breaches, so prepare for this noisy danger in addition.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.