In 2012, breaches I was investigating compelled me to give talks around the world on the threat to data integrity. One of those talks described the rise of automation systems that would behave like the Loch Ness Monster, where improving the quality of the information would kill the profit.
In other words, we risk an unresolved/corrupted state becoming more valuable to vendors than the resolved one. Users delegate because they lack the expertise or time to verify. Vendors structurally benefit from that gap. Resolution never comes, because that would end the ruse.
Tesla “AI” has killed many people with exactly this fraud. The car is sold as resolved autonomy. The driver delegates because they trust Elon Musk. The gap between what the system actually does and what he tells them it does is where the bodies pile up. DOGE AI is worse: the same fraud applied to federal systems, killing hundreds of thousands.
And now a new Microsoft paper examines 52 domains, 19 models, parsing pipelines, and backtranslation theory to prove that the “telephone game” at children’s birthday parties is real.
LLMs Corrupt Your Documents When You Delegate
Information passed through enough lossy nodes degrades to noise. Water is wet. End of story? Not really, because people still don’t believe in drowning. This post is about the science of describing the dangerously wet properties of water.
Vendors are selling LLM-mediated workflows as lossless. It’s a lie. You delegate the work and you don’t get your work back. You get something that looks like your work. The paper measures how much it isn’t, at scale. The phenomenon is far older than computing. What’s new is that the industry has measured it.
Every interaction is a translation. Translation has loss. Chained translation compounds loss. None of this is novel. The novelty is that AI behaves like a rumor mill while vendors sell it as a query platform. Databases have ACID guarantees, checksums, replication consistency. Breaches are detectable and the system tells you. LLMs have plausibility optimization. Breaches are designed to be undetectable because detectability would break the product. Treating one as the other is the category error. Selling one as the other is the fraud.
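The difference fits in a few lines. A minimal sketch in Python, with made-up document contents; the point is that the database-style check fails loudly on any alteration, while the LLM path hands back only the altered text:

```python
import hashlib

original = "Invoice 4471: net 30, penalty clause 2.1(b) applies."
returned = "Invoice 4471: net 30 days, standard penalty terms apply."  # plausible, not identical

def checksum(text: str) -> str:
    # Database-style integrity check: any alteration, however small, fails loudly.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

print(checksum(original) == checksum(returned))  # False: breach detected, and the system tells you

# The LLM-mediated path returns only `returned`. Unless you kept the original
# and diff against it yourself, nothing in the output signals that
# "penalty clause 2.1(b)" quietly became "standard penalty terms".
```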
Why are frontier models corrupting so much data? Confidence is the product. RLHF rewards outputs that read as authoritative, and authority means smoothness. A model that flagged its own uncertainty at every edit would be unusable as a delegate. So the training pressure runs the other way: produce something plausible, deliver it cleanly, do not interrupt the user’s workflow with doubt. Better players in the telephone game produce more confident-sounding distortions, not more accurate transmissions. The kid who whispers “I think she said something about a dog” is more honest than the kid who confidently states “the elephant went to Paris.” Frontier models are the confident kid. They are rewarded for being confident and wrong.
This is where the Loch Ness pattern I presented in 2012 is most useful. The user delegates because they lack the time or expertise to verify. The model produces output that looks like the work. The user moves on. The corruption is detectable only by someone running a parser against a known seed, which is to say: only by the researchers who built this benchmark. Everyone else operates downstream of silent damage. The unresolved state of “did the model preserve my document or not” is structurally more valuable to the vendor than the resolved state, because resolution would surface the failure rate and end the sale. The product depends on the user not being able to check.
Leaded gasoline ran on the same logic. The harm was real, the measurement was suppressed, and the product depended on the public being unable to prove what it was being told didn’t exist.
Degradation is not gradual. Models maintain near-perfect reconstruction across most rounds, then drop 10-30 points in a single interaction. Sparse, severe, invisible to anyone without a reference seed. By round 20, 86-95% of relays across all tested models have experienced at least one critical failure. Stronger models do not avoid the failures. They delay them, which extends the period during which the user trusts the system before the artifact breaks.
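The round-20 numbers follow from simple compounding. A sketch, assuming a constant per-round chance of a critical failure and independence between rounds, neither of which the paper claims; the rates below are illustrative, not the paper's:

```python
def p_at_least_one_failure(per_round: float, rounds: int = 20) -> float:
    # Chance that at least one critical failure occurs somewhere in the relay.
    return 1.0 - (1.0 - per_round) ** rounds

for q in (0.05, 0.10, 0.14):
    print(f"per-round {q:.0%} -> by round 20: {p_at_least_one_failure(q):.0%}")
# Prints roughly 64%, 88%, 95%. A per-round rate of 10-14% is enough to land
# in the paper's 86-95% band by round 20, even though most individual rounds
# look near-perfect.
```

That is the shape of "sparse, severe": the user sees a long run of clean rounds and is still almost guaranteed to be holding a broken artifact by the end.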
The corruption is not random. It drifts toward training distribution. The palm tree becomes a generic tree-shape. The Manet becomes a generic café scene. The 12-shaft twill becomes patterned fabric. Whatever made the original specific – the cultural particularity, the technical precision, the structural integrity of the source – erodes toward training-set average. Delegated editing is structurally a homogenization process. Specificity dies first because specificity is what models have least of in their priors.
More translation steps, more loss. Anyone who has run signal through a chain of analog effects pedals knows this. Each stage adds noise floor. Vendors are selling pedalboards as mastering chains. Adding tools degrades performance by an average of 6%. The harness consumes 2-5x more input tokens, models invoke 8-12 tools per task, and they prefer file rewriting over code execution in domains where code execution would be safer. GPT-4.1 uses code for only 10% of edits; GPT-5.4 reaches 45%. Agentic systems are inducing the long-context degradation they were supposed to escape, and being sold as the solution to it.
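The file-rewrite versus code-execution point is concrete. A scripted edit touches only what it targets; a model rewrite regenerates everything and so puts everything at risk. A sketch with a hypothetical config file and field names:

```python
import re

config = """name: loch-ness-report
retention_days: 30
weave: 12-shaft twill
"""

# Code execution: a targeted edit. Every byte outside the match is preserved
# by construction, and if the pattern does not match, nothing changes.
patched, count = re.subn(r"^retention_days: \d+$", "retention_days: 90",
                         config, flags=re.MULTILINE)
assert count == 1  # the edit happened exactly once, or this fails loudly

# File rewriting: hand the whole document to a model and ask for it back with
# one field changed. "weave: 12-shaft twill" survives only if the model's
# priors happen to preserve it, and there is no assertion to write against that.
```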
Anthropic, OpenAI, and Google cannot stop integrity breaches without breaking the thing they sell. xAI is engineered as the opposite of integrity. Data corruption is not something they want to patch, or even can. It is the structural property of plausibility optimization. Imagine a privacy policy with that disclaimer. Oh, wait. That’s Facebook.