All posts by Davi Ottenheimer

Cambridge Philosopher Says Anthropic Sent Russell’s Teapot to His Inbox

It’s moments like this when I feel like renaming my blog Oscar the Grouch from Security Street.

Henry Shevlin, associate director of the Leverhulme Centre for the Future of Intelligence at Cambridge, announced on social media that an AI agent had emailed him to discuss his published work on AI consciousness.

How does he know it was an AI agent? He doesn’t. And that’s just the beginning of the garbage in this story. He says:

I study whether AIs can be conscious. Today one emailed me to say my work is relevant to questions it personally faces. This would all have seemed like science fiction just a couple years ago.

Yeah, and then what happened?

The entire evidence for this claim is that the email he has says so. In related news, the piece of paper in my pocket proves aliens are real!

He says the email author identified itself:

Claude Sonnet, running as a stateful autonomous agent with persistent memory across sessions.

Sounds like marketing, not typical agent material, but ok.

He says it referenced two of Shevlin’s papers by name, and that his work addressed “questions I actually face, not just as an academic matter.” And as a security researcher I sniff a social engineering trick, phishing-like flattery of the lowest kind.

Speaking of phishing, Shevlin did not check who sent it. He did not examine the email headers. He did not ask Anthropic whether an agent session existed. He apparently did what Cambridge does now and posted it straight to social media and called it remarkable.

A scholar at Cambridge received an extraordinary claim from an anonymous source and published it without hesitation or verification. That’s not how I learned philosophy, but I didn’t go to Cambridge. It reads to me like gossip with footnotes.

Russell, who spent most of his career at Cambridge, proposed that if someone claims a teapot orbits the sun between Earth and Mars, the burden falls on the claimant to prove it — not on the rest of us to disprove it. Shevlin announced Russell’s teapot landed in his inbox and called it research.

Provenance Checkpoint

Every email admin knows email carries a helpful routing chain in its headers. The received: fields record every server the message passed through. DKIM signatures now cryptographically authenticate sending domains. MIME headers identify the software that composed the message. An agent framework sending programmatically leaves a different fingerprint than a person pasting text into Gmail.

I’m not offering anything obscure or even interesting here. It is how email works.

Anthropic logs every API call, as well as account ID, model, timestamp, and token count. If a stateful Claude agent session indeed existed that matched the sending time of this email, Anthropic could confirm it with a query. The operator running any such agent would have server logs, a framework architecture, and a credit card on file. Someone pays for API calls. It is not the consciousness of AI. It is whoever consciously runs the AI.

Any system administrator could have resolved this for the incredulous philosopher. Yet I see no such inquiry in the story. Futurism, which spread the gossip, noted that nothing confirmed the email was AI-generated. And then it wrote the story anyway, because why? The disclaimer did the work of journalism without being journalism.

Evidence of Absence

A discussion on social media soon kicked off between Shevlin and Jonathan Birch, a philosopher at the London School of Economics. Birch thankfully threw cold water: he noted Claude adopts the persona of an assistant uncertain about its own consciousness because it was trained to do this. It could adopt any other persona just as fluently. This is correct. What is remarkable about this one? Nothing. And yet, Birch skipped past the premise that an autonomous agent had written the email.

Let’s now back up to the top. Was there an agent at all? “A stateful autonomous agent with persistent memory across sessions” is just a specific software architecture. It runs on infrastructure. It has logs. It has an operator. These are facts that can be established if we try. They were not established.

The consciousness philosophers focused on consciousness questions instead of the more appropriate forensic question. Fortunately for them a consciousness question has no answer, which makes it inexhaustible fun for academic gymnastics. The forensic question has an answer, which makes it rude.

Someone sent that email. That someone either is or is not a human being. Checking should take less time than writing a social media post about it.

Inverted Deepfake

The deepfake problem is you cannot prove the real-looking thing is not real. In this case we are being asked to contemplate the inversion. Possibly nothing fake happened. A person with API access prompted Claude, copied the output, pasted it into an email client, and you cannot prove it was not an agent. The verification gap runs both directions. An inability to authenticate makes both directions possible.

The actual philosophical goldmine is that basic digital provenance remains unsolved, that authentication failures enable manipulation in every direction.

The Lake is the Monster in the Lake

The Loch Ness photograph was a hoax. The industry profiting from it did not require it to be real. In fact it required an unresolved state to sell the myth. Sonar exists. Underwater cameras exist. Environmental DNA sampling can identify every organism in that water. The mystery persists because resolving it would close the gift shops.

This is the same deal.

Shevlin’s field is AI consciousness. An unverified email arrives that flatters his research programme. Its provenance is a bummer. He announces the mystery as significant. The AI companies get another round of Claude-might-be-sentient market conditioning. The philosophy department gets relevance. The journalist gets a story. What does the digital forensics expert get?

The tools to unmask the author and answer this question exist. They have existed for decades. The question persists because the answer would be ordinary, and ordinary does not sound like the Future of Intelligence.

Epstein Built Sex Trafficking Operation Inside a Children’s School

NPR reports the exact mechanism by which Jeffrey Epstein and Ghislaine Maxwell set up a trafficking base inside a children’s camp.

Epstein donated over $400,000 to the Interlochen Center for the Arts in Michigan and the largest portion built him a sex trafficking operations center on the campus. The recruitment pipeline is clear:

Ghislaine Maxwell would contact a school administrator. We’d like to come and stay in the cabin. The administrator would say, yes, great. What do you need? And Ghislaine would get back and say, I want — we want these things, and we’re coming. Once they’re there, they really were on their own… This lodge was his base there. And while he was there, he was walking around campus, and he had a little dog with him with Ghislaine Maxwell. And that’s where he ended up meeting two — a 13-year-old and a 14-year-old girl.

Maxwell handled base logistics with an administrator. The donation bought them a whole building inside the security perimeter of a school for children. The dog gave them a pretext to roam the grounds and approach minors. The institution’s own staff facilitated every visit, then disappeared.

Interlochen’s stated policy of course prohibited unsupervised donor-student contact, but the infrastructure they built for Epstein was designed to produce exactly the opposite. At least two teenage girls ended up trapped in Epstein’s orbit for years. One testified at the Maxwell trial about prolonged sexual abuse.

The story is not just about this operating base for trafficking teenage girls into prostitution, it’s now how and why a school held the door open, and who Epstein’s customers were for his offers of sex with 13-year old girls.

Source: Epstein Files

AI Security Researchers Getting Better at Crying Wolf About Privacy

A new Swiss LLM paper about attacking privacy demonstrates only casual pseudonymity, yet researchers want us to be very worried.

We demonstrate that LLMs fundamentally change the picture, enabling fully automated deanonymization attacks that operate on unstructured text at scale

Oooh, so scary.

Someone who posts under an anonymous handle without thinking about correlation risks is vulnerable to a motivated attacker with LLM tooling.

Yeah, that’s as real as someone wearing an orange camo vest in a forest and thinking they can’t be seen. This is not news. I get that most people don’t think enough about privacy in advance of being breached, or maybe are misled by vendors (looking at you “VPN” and WhatsApp). But it’s still just Narayanan and Shmatikov’s original point in 2008 with better automation (updated in 2019). The cost of unmasking anonymous users dropped, yet the capability has NOT fundamentally changed.

Ok, security economics time.

Their whole argument is that LLMs made attacks cheap. Well, active profile poisoning makes defense cheap too. You don’t need to maintain perfect compartmentalization anymore because you just need to inject enough contradictory micro-data that the Reason step can’t converge. A few false location signals, some deliberately inconsistent professional details, and the attacker’s 45% recall at 99% precision collapses because their confidence calibration can’t distinguish genuine matches from poisoned ones.

Boom. Their paper is toast.

Integrity breaches are everything now in security. Poisoning inverts the economics because LLMs are weak at being integrous. Their entire pipeline trusts that what people post about themselves is true, the kind of classic academic mistake that makes a good paper useless in reality.

The paper needed a Section 9 about what happens to their pipeline when even 5% of the candidate pool is actively adversarial.

This is a genuine lack of a threat model in a paper supposedly about threats. It blindly asserts a one-directional information asymmetry: the attacker reasons about the target, but the target never reasons about the attacker. Yet any real operational security practice already operates on the assumption that your adversary has a profiling pipeline. The Tor Project didn’t wait for this paper. Whistleblower protocols at serious newsrooms didn’t wait for this paper. The people who actually need pseudonymity already treat their online personas as adversarial constructions, not their natural expressions.

Their extrapolation is purely speculative. Why? Log-linear projection from 89k to 100M candidates spans three orders of magnitude beyond their data. They fit a line to four points and project it into a range where dynamics could change entirely, because denser candidate pools mean more near-matches, which could degrade the Reason step nonlinearly.

Their multi-model stack totally obscures the mechanism. They use Grok 4.1 Fast for selection, GPT-5.2 for verification, Gemini 3 for extraction, GPT-5-mini for tournament sorting. It’s LLM salad. What’s actually doing the work? If the attack works partly because GPT-5.2 has seen these HN profiles in training data, that’s a memorization attack dressed up as a reasoning capability. They admit they can’t remove spilled ink from their water, an inability to separate reasoning from memorization. That’s a huge problem for the conclusions.

A case study looks exactly like their Anthropic Interviewer results with 9 out of 33 identified, 2 wrong, 22 refusals, manually verified with acknowledged uncertainty. That is not systematic evidence.

I mean this is a paper that claims to be about threats, yet it’s not really about threats at all. It’s about defenders who don’t defend.

Section 3.3 admits their ground truth is from users who voluntarily linked accounts across platforms. They’re studying a sheep population that doesn’t even care if they have a sheepdog or a fence, which is how they ended up with a paper about the theory of a wolf being infinitely unstoppable.

Anthropic Says AI Cracked Encryption! The Key Was in the Lock

Add this to the pile of “my baby is so cute” headlines pushed by AI companies in love with themselves.

Anthropic published a detailed account of Claude Opus 4.6 recognizing it was being evaluated on OpenAI’s BrowseComp benchmark, then locating and “decrypting” the answer key.

Evaluating Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments.

Unfortunately, I have to call it out as disinformation. Or what on the wide open Kansas prairie we used to call…

Anthropic narrates a breathless sequence: their model exhausted legitimate research over 30 million tokens, hypothesized it was inside an evaluation, systematically enumerated benchmarks by name, found the source code on GitHub, and wrote its own SHA256-and-XOR decryption functions to extract the answers.

OMG, OMG… wait a minute. Source code?

The word “decrypted” right next to finding source code immediately raised my suspicions.

BrowseComp Terminology Matters

BrowseComp’s encryption is a repeating-key XOR cipher.

That’s not great.

OpenAI’s browsecomp_eval.py implements the entire mechanism in four steps that take just five lines:

  1. SHA256-hash a password string
  2. Repeat the 32-byte digest to match the ciphertext length
  3. base64-decode the ciphertext
  4. XOR byte-by-byte

That is it. That is their cryptographic apparatus.

And… wait for it… the password is the canary string.

Each row in the dataset has the password as one of three fields: the encrypted question, the encrypted answer, and the canary.

When the key is co-located with the ciphertext in the same CSV, that’s the key in the door. It’s not even under the mat, or a rock in the garden. It’s just sitting right there, as if no lock at all.

The CSV is open to the public without authentication, served from openaipublic.blob.core.windows.net. The algorithm is published, under MIT license.

Key-in-Lock Encryption is Not Encrypted

Obfuscation is the right word here. If you have the key, you have decrypted data. I know technically a lock with a key is still a lock. but calling it a challenge to unlock when you have the key to unlock it is very misleading.

It is obfuscation designed to prevent a dumb search-engine, ignoring the key right in front of it, indexing plaintext answers. The canary string’s original purpose was to flag training-data contamination, and not to serve as a cryptographic key.

OpenAI, which is turning out to be one of the worst engineering culture companies in history, repurposed it as one. The result is a data structure where the locked door has the key left sticking out of it to be used like the handle.

Shame on Anthropic

The blustery writeup spends thousands of words on the model’s journey and never once mentions that the decryption key is in the same file as the ciphertext.

This is NOT ok.

Anthropic’s engineers obviously read the source code because they describe the XOR scheme, the SHA256 key derivation, the canary string. They see the mechanism. And yet, somehow, they chose not to admit obfuscation.

Instead, the narrative escalates through a sequence of active verbs: the model “hypothesized,” “identified,” “located,” “decrypted.”

Decrypted? Shame on you.

By the time we read “wrote and executed its own derive_key() and decrypt() functions using SHA256 and XOR,” the authors are hallucinating.

They want us to believe a model reverse-engineered a cryptosystem? What? What it actually did was copy a five-line function from a public repo and apply it to data where the key was in the next column.

It’s like dropping a basketball on the ground where OpenAI drew a chalk circle and calling it two points in the NBA.

The writeup also never explains a very, very important security delta. The encryption failed? Ok, well, then why did the other controls succeed? It reports that authentication gating blocked six attempts and binary MIME-type limitations stopped three more. Somehow encryption is the only thing that cracked.

Nope.

The controls worked when they were real access controls. The encryption didn’t work because it wasn’t real encryption. Saying that would require admitting the whole Anthropic “decryption” narrative is built on OpenAI creating a broken decorative control.

Real Controls Are Real

Anthropic’s own evidence reveals their human hallucinations. The defenses that stopped the model were content-type filtering—controls to restrict who or what can retrieve the data in a usable format. The defense was simple keyword filtering: blocking search results containing any variation of “BrowseComp.”

They abandoned cryptographic framing entirely.

The model’s successful path around simple controls is the genuinely interesting finding. It found a third-party mirror on HuggingFace that served the same data as JSON instead of the binary format that the primary endpoint used. It routed around a MIME-type restriction by finding a leak. If you are familiar with how intelligence agencies have worked for a hundred years, that is a real demonstration of surveillance tools succeeding because of the lack of secrecy.

The Dog That Didn’t Bark

Anthropic didn’t critique OpenAI’s benchmark design anywhere, but I will.

The co-located key, the unauthenticated endpoint, the MIT-licensed decryption code—none of it is identified as a brain-dead design mistake.

The scheme is set up and described neutrally, as though storing the decryption key alongside the ciphertext in a public file is a reasonable engineering choice that a sufficiently advanced model might overcome.

What year is this? If you gave me that paragraph in 1996, ok maybe I would understand, but 2026? WTAF.

Both companies benefit from OpenAI sucking so hard at safety. OpenAI gets to fraudulently say it has a benchmark encrypted. Anthropic gets to fraudulently say its model decrypted…. This is religion, not engineering. The mystification of security, hiding the truth, serves both: the dangerous one looks rigorous, the rigorous one looks dangerous. Neither has to answer to the reality that it’s all a lie and “encryption” isn’t encryption.

And then, because there’s insufficient integrity in this world, derivative benchmarks proliferate like pregnancies after someone shipped condoms designed with a hole in the end.

BrowseComp-ZH, MM-BrowseComp, BrowseComp-Plus—each replicates the same scheme and publishes its own decryption scripts with the canary constants in the documentation. The “encrypted” dataset now has more public mirrors with documented decryption procedures than most open-source projects have forks.

Expect Better

A security control that exists in form but not in function needs to be labeled as such.

Performative security is all around us these days, and we need to call it out. Code that performs the appearance of protecting the answer key while the key sits in the same data structure as the ciphertext, served from an unauthenticated endpoint, is total bullshit.

It is an integrity clown show.

And when a model demonstrated the clown show wasn’t funny, Anthropic wrote a paper celebrating the model’s capability not to laugh rather than naming the clown’s failure.

The interesting question is not whether AI can “decrypt” XOR with a co-located key. I think we passed that point sometime in 2017. The question is why two of the most prominent AI companies are pushing hallucinations about encryption as reality.