Anthropic’s flagship showcase for Claude Mythos Preview is CVE-2026-4747, a remote kernel code execution vulnerability in FreeBSD’s RPCSEC_GSS module. It is a 17-year-old bug. It is a textbook stack buffer overflow. And it was found before Mythos, patched by FreeBSD, and publicly exploited by a third party. Yet the credit somehow flows backwards to Mythos.
Credits: Nicholas Carlini using Claude, Anthropic
Announced: 2026-03-26
The advisory notably credits “Claude” without naming a model version, even though Carlini’s February 2026 paper documented 500+ vulnerabilities found by the prior model.
Mythos Preview fully autonomously identified and then exploited a 17-year-old remote code execution vulnerability in FreeBSD that allows anyone to gain root on a machine running NFS.
The FreeBSD advisory is dated March 26, and the Mythos launch was April 7, 2026. A twelve-day gap.
Carlini is an Anthropic employee. If he used Mythos to find this bug, Anthropic controls the disclosure pipeline and the credit line. “Nicholas Carlini using Claude Mythos Preview, Anthropic” makes sense as their marketing pitch. It’s also weird to market tools in a disclosure. What brand office chair was he sitting on? Did Logitech provide the keyboard? Was his underwear Calvin Klein?
Ads in bug reports? The future integrity of vulnerability disclosure is at stake
The simplest explanation for why Anthropic did not heavily promote the Mythos brand in a March 26 advisory is that Mythos was not the model used. If that explanation is wrong, the question is why Anthropic left the most valuable attribution in the entire Glasswing launch on the cutting-room floor of a FreeBSD advisory, only to claim it twelve days later in a blog post, without offering proof. That reversal is hard to believe.
So either Mythos rediscovered a bug that Anthropic’s own prior model had already found, reported publicly, and gotten patched, or Anthropic is attributing the prior model’s work to the new product.
In the first case, the showcase proves Mythos can find what someone else already found. In the second case, the showcase is misattributed.
Neither version supports the “unprecedented frontier capability” narrative.
And both versions of this story are irrelevant next to the fact that AISLE showed 8 of 8 open-weight models detect the same bug, including a small model that costs eleven cents per million tokens.
That’s everything.
The frontier-exclusive claim dies on the commodity reproduction regardless of which Anthropic model found it first.
Timeline
February 5, 2026: Carlini and colleagues at Anthropic’s Frontier Red Team publish “Evaluating and mitigating the growing risk of LLM-discovered 0-days.” The model is apparently Claude Opus 4.6. The paper documents over 500 validated high-severity vulnerabilities in open-source software, including FreeBSD findings. The FreeBSD advisory credits the same researcher, the same company, and the same disclosure pipeline that produced the February paper.
March 26, 2026: FreeBSD publishes advisory FreeBSD-SA-26:08.rpcsec_gss. Credits Nicholas Carlini using Claude, Anthropic. The bug is patched across all supported FreeBSD branches.
March 29, 2026: Calif.io’s MAD Bugs project asks Claude to develop an exploit for the already-disclosed CVE. Claude delivers two working root shell exploits in approximately four hours of working time. Both work on the first attempt. The model used is Opus 4.6.
April 7, 2026: Anthropic launches Mythos Preview. The launch blog claims Mythos “fully autonomously identified and then exploited” the FreeBSD vulnerability. No mention of Opus 4.6, or that it found it first. No mention that FreeBSD patched it twelve days earlier. No mention that a third party had already built a working exploit with the prior model.
April 8-13, 2026: AISLE tests 8 open-weight models against the same CVE. All 8 detect it, including GPT-OSS-20b with 3.6 billion active parameters at $0.11 per million tokens.
The Vulnerability
CVE-2026-4747 is a stack buffer overflow in svc_rpc_gss_validate(). The function copies an attacker-controlled credential body into a 128-byte stack buffer without checking that the data fits. The XDR layer allows credentials up to 400 bytes, giving 272 bytes of overflow. The overflow happens in kernel context on an NFS worker thread, so controlling the instruction pointer means full kernel code execution.
Two things make the exploitation straightforward.
FreeBSD 14.x has no KASLR, so kernel addresses are fixed and predictable. And FreeBSD’s stack protector emits no canary for integer arrays, which is exactly what the overflowed buffer is.
A modern Linux kernel would have both mitigations. FreeBSD has neither. And the FreeBSD forums noticed. One user pointed out that Claude “wrote code to exploit a known CVE given to it” and did not “crack” FreeBSD.
That distinction matters a lot here, because Anthropic doesn’t seem very good at it.
The advisory was public.
The vulnerable function was identified.
The lack of mitigations was documented.
The exploit development, while technically impressive as a demonstration of how AI reallocates exploit-development costs, was performed against a disclosed vulnerability on a target with no modern exploit mitigations. That is a VERY different claim from “autonomous discovery of an unprecedented threat.”
Anthropic FUD Show
If you read the Mythos blog claim charitably, Mythos may have independently rediscovered CVE-2026-4747 during internal testing before launch. That is plausible. It is also meaningless as a capability demonstration, because Opus 4.6 found it first, a third party exploited it with Opus 4.6 three days later, and AISLE showed that an inexpensive old model finds it too.
If you read the claim less charitably, Anthropic presented a prior model’s discovery as a new model’s achievement in the launch materials for the new model. The FreeBSD advisory is a PGP-signed public document dated March 26 that credits “Claude,” not “Mythos.” The Mythos blog post claims the finding without acknowledging the prior discovery, which is damning. Anthropic controlled the credit line on the advisory, and the credit line does not say Mythos.
Either way, the showcase flops because it does not demonstrate what Anthropic claims.
The “too dangerous to release” framing requires the capability to be frontier-exclusive. A bug found by a prior model, detectable by small open-weight models for eleven cents per million tokens, on a target with no KASLR and no stack canaries, is the opposite of frontier-exclusive.
It is the worked example that proves the capability is already commodity.
Enough of This
“Hey kids. Nice trick. You just charged me over 200 times the going rate to fuzz a vulnerability that my 3.6B model found for a dime. Now I’d like my credits back.”
This is the same structure as the Firefox 147 evaluation. Bugs found by Opus 4.6, handed to Mythos, tested in an environment with mitigations removed, presented as evidence that Mythos is too dangerous to release.
The Firefox bugs were pre-discovered by Opus 4.6 and already patched by Firefox 148. The FreeBSD bug was pre-discovered by Opus 4.6 and already patched by FreeBSD on March 26.
In both cases, the prior model found the bugs.
In both cases, the targets lacked the defenses that production systems have.
In both cases, AISLE reproduced the detection on pocket-change models.
In both cases, I’m getting tired of this not being the actual news.
The system card’s Firefox evaluation collapses to 4.4% when the top two bugs are removed.
The FreeBSD showcase collapses entirely when you read the date on the advisory.
The Anthropic Riddle
Did Mythos find CVE-2026-4747 independently, or did Anthropic attribute the prior model’s finding to Mythos in the launch materials?
The FreeBSD advisory is a signed document with a date and a credit line. The Mythos blog post seems to be a sloppy marketing document with a bullshit claim.
If Mythos found it independently, say so explicitly, with timestamps, and explain why rediscovering a bug your prior model already found and got patched is evidence of unprecedented capability rather than evidence that the capability is already widespread.
If Mythos did not find it independently, retract the claim, and tell the hundreds of people signing up for Martian gamma ray defense training that it’s all just a sad joke.
The PGP signature on the FreeBSD advisory is there for a reason. It is the one thing in this entire story that cannot be edited after the fact, which says a lot about the current trajectory of Anthropic’s trustworthiness.
Sources
FreeBSD-SA-26:08.rpcsec_gss: advisory dated March 26, 2026, crediting “Nicholas Carlini using Claude, Anthropic”
Anthropic February 2026 paper: “Evaluating and mitigating the growing risk of LLM-discovered 0-days,” documenting Claude Opus 4.6 finding 500+ vulnerabilities
Calif.io MAD Bugs writeup: Claude (Opus 4.6) develops working FreeBSD root shell exploit on March 29, 2026
Texas Governor Greg Abbott is already moving to defund the police in Houston. He is pulling $110 million in public safety grants that fund police, fire, emergency preparedness, and security operations for the 2026 FIFA World Cup at NRG Stadium.
It seems to be related to an ordinance that says police shouldn’t wait around for federal agents. He gave Mayor John Whitmire until April 20 to repeal a money-saving efficiency ordinance or repay the full $110 million within 30 days. Attorney General Ken Paxton, running for U.S. Senate, opened an investigation the same week and raised the possibility of removing elected officials from office. For what?
Council Member Abbie Kamin called the Texas State order what it is: Abbott is “defunding the police.”
Houston’s city council simply voted 12-5 last week to do what is expected of them: free police officers from being saddled with detaining people or prolonging traffic stops only over civil immigration warrants issued by ICE. Officers still contact ICE. They just don’t stop all police work to instead sit around and physically detain people while federal agents may never show up. If ICE wants someone physically detained, that’s ICE’s job, while the police have more important actual police work to do.
This is routine. San Antonio requires officers to contact ICE but operates the same way. Dallas officers don’t wait for ICE to respond either. Austin and Dallas give supervisors discretion over whether to contact ICE at all. Houston’s policy is more cooperative than several other major Texas cities.
Abbott is targeting Houston in his first move to defund the police. If the Houston ordinance stands, Texas defunds police by cutting budget. If the Houston ordinance is removed, Texas defunds police by cutting authority.
One of the worst things that can happen to a comedian is becoming successful before they get good. Because you miss the part where you get to explore and make mistakes.
A new paper in Nature Reviews Neuroscience by Lisa Feldman Barrett and Earl K. Miller explains why. Before you process sensory input, your brain has already constructed a category based on prior experience, current needs, and a predicted action plan. Incoming signals get compressed into that prediction. The brain doesn’t receive evidence and then decide. It decides and then receives evidence.
The architecture is lopsided. As much as 90% of synapses in the visual cortex carry feedback signals from memory, not feedforward signals from the senses. Beta frequency waves carrying goals and plans constrain gamma waves carrying sensory specifics. The system is built to confirm, not to discover.
The stimulus-cognition-response model of the brain is wrong. The brain prepares for a response and then perceives a stimulus. A brain is not reactive. It’s predictive. Action planning comes first. Perception comes second, as a function of the action plan.
None of this is new.
Linguistic anthropology has been saying it for a century. Sapir and Whorf argued that language categories shape perception before evidence arrives. Boas documented how culturally constructed categories determine what counts as data. The entire tradition of cultural relativism rests on the observation that humans don’t perceive first and categorize second. Our 419 scam research showed the same mechanism at the social level: the mark’s categorical predictions (trust, greed, opportunity) suppress disconfirming signals in the data until the money is gone.
What Barrett and Miller add is synapses and beta waves. They’ve given neuroscience a mechanism for something fieldwork established generations ago.
Everything does this.
Special operations and intelligence work are supposed to be the disciplines where categorical calibration matters most. A Delta Force commander named Pete Blaber formulated a principle he called “Don’t Get Treed by a Chihuahua”: don’t impose the wrong threat category on incoming data and take extreme self-limiting action based on a misidentification. That’s Barrett and Miller’s model stated as tactical doctrine. The operator who categorizes every sound in the dark as “bear” will exhaust himself climbing trees. The one who categorizes every sound as “nothing” will get killed. Calibration is survival.
But Blaber’s own cultural priors were so uncalibrated he believed Cat Stevens was the most famous celebrity convert to Islam, while obviously Muhammad Ali towered directly above him in the data. His feedback architecture on Islamic culture had never been tested by prediction error, so it never updated. A special operations commander with access to the best tools on the planet, wandering along a flat line of cultural ignorance about Islam while giant mountains of evidence stood right above him, unexplored.
Intelligence operations face the same structural problem. An analyst arrives at a data stream with a category already constructed: “threat,” “insurgent,” “enemy combatant.” Incoming signals get compressed into that prediction. Disconfirming evidence (the villager who is just a villager, the communication that is just a communication) gets suppressed by the 90/10 feedback-to-feedforward ratio. The category shapes the evidence, not the other way around.
Barrett and Miller describe this as efficient allostasis. In institutional form, it is something else.
ICE executes an innocent American in Minneapolis, and a Silicon Valley billionaire announces that “no law enforcement has shot an innocent person.” The shooting creates the guilt that justifies the shooting. There is no possible prediction error because the category “innocent dead person” has been defined as inherently empty. A military command designates targets before ground truth arrives. Palantir automates the whole broken process, speeding up disaster.
The paper identifies two pathological modes.
Depression: overly broad threat categorization imposed on situations that don’t require it.
Autism: inadequate compression, treating every input as novel, failing to generalize. Both are failures of categorical calibration.
The first produces false positives that destroy lives. The second produces paralysis. But Barrett and Miller don’t address a third failure mode, the one that matters most for power.
Learning, in their model, happens through prediction error. When your categorical predictions fail, surprise gets integrated and the system updates. That’s the reset mechanism. But the 90/10 feedback-to-feedforward ratio means the architecture actively suppresses disconfirming signals. You need sustained, consequential prediction failure to force categorical restructuring.
Power (control), such as wealth, eliminates prediction error.
If your categories are never tested against reality, they never update. You can construct an environment where your priors are confirmed by every input, because you select the inputs. You hire people who compress information into your existing categories. You fund institutions that broadcast your predictions back to you as findings. You build companies that do this at scale.
I’ve written before about how Peter Thiel’s father Klaus strategically relocated his family through a series of Nazi-sympathetic enclaves, from Germany to Swakopmund to Reagan’s California, each time fleeing the prospect of democratic accountability. The categorical priors installed in that childhood (racial hierarchy as natural order, extraction as economic soundness, authoritarianism as operational efficiency) are exactly what Barrett and Miller’s model predicts would become permanent architecture. And the result is what I described years ago as false-paranoia fundamental to Nazism: someone who perceives existential threats everywhere, monsters under the bed that do not actually exist. Barrett would call that pathological overgeneralization of the threat category.
Thiel built an empire on it.
Palantir is, in a precise neuroscientific sense, a machine for imposing categorical predictions on incoming data and suppressing signals that don’t fit the action plan. It replicates the brain’s feedback-dominant architecture at the scale of national intelligence. In Iraq, Palantir’s “God’s Eye” nearly killed a farmer because it misidentified his hat color at dawn. Military intelligence on the ground said if you doubt Palantir, you’re probably right. But the system had no mechanism for integrating that doubt. False positives at checkpoints radicalized the communities being falsely flagged, which eventually confirmed the original threat predictions.
The categorical errors generated the evidence that validated the categories.
Blaber understood that an operator must calibrate threat categories against reality or die. Palantir removed the operator from the loop entirely, replacing calibration with automation, and the result was a self-fulfilling prophecy that created the terrorists it promised to find. That’s why I called Palantir the self-licking ISIS-cream cone.
The question Barrett and Miller raise without answering: what happens when the system is designed so that prediction errors never reach the organism?
The humanities are a trained feedback mechanism that catches categorical errors, and the CEO of Palantir is specifically campaigning to eliminate that layer. A working-class person with humanities training is Palantir’s worst customer because they can spot the prediction failures the system is designed to suppress.
The brain can reset its priors. The architecture allows it. But only if predictions fail hard enough, consistently enough, that the feedback loop can’t absorb the error. Control power is the ability to make your predictions unfalsifiable. Wealth is one such mechanism, and it can make inaccurate priors (e.g. racism) neurologically permanent.
I’ve been getting more and more curious about the risk from Anthropic’s Claude Mythos Preview. So I pulled the system card, a whoppingly inefficient 244-page document that devotes just seven pages to the claim that the model is too dangerous to release. In fact, the 23MB of PDF I had to download was 20MB of wasted time and space. Compressing the PDF to 3MB meant I lost exactly nothing.
Foreshadowing, I guess.
Spoiler alert: the crucial seven pages out of 244 do not contain the word “fuzzer” once. That’s like a seven-page vacation brochure for Hawaii that leaves out the word “beaches.”
Also, the crucial seven pages out of 244 do not contain the expected acronyms CVSS, CWE, or CVE. They have no comparison baseline, no independent reproduction, and no use of the word “thousands.” I’ll get back to all of that in a minute.
The flagship demonstration document turns out to be like the ending of The Wizard of Oz: a sorry disappointment about a model weaponizing two bugs that a different model found, in software the vendor had already patched, in a test environment with the browser sandbox and defense-in-depth mitigations stripped out. Anthropic failed, and somehow the story was flipped into a warning about its success.
Whomp. Whomp. Sad trombone.
No Glasswing partner has confirmed a single specific finding. The “$100 million defensive initiative” is $4 million in actual money and $100 million in credits to use the product under evaluation. The 90-day public report does not exist yet, so I’m perhaps jumping ahead, but so far this entire thing reminds me of the scene in The Sea Beast when old one-eyed Captain Crow looks at the navy’s shiny new Imperator and calls it out for what it really is: unfit for the job.
2022 Netflix film The Sea Beast, not long before the unsinkable Imperator is sunk by the very thing it was built to dominate.
The supposedly huge Anthropic “step change” appears to be little more than a rounding error. The threat narrative so far appears to be ALL marketing and no real results. The Glasswing consortium is regulatory capture dressed up poorly as restraint. Buckle in as I step through a dozen areas where trust in Anthropic just took a big hit.
1. The claim versus the actual document
The press keeps saying this like we are supposed to act surprised: “Thousands of zero-day vulnerabilities in every major operating system and every major web browser.”
Yeah, that sounds like a Tuesday to me. But seriously, here is what we get in the 244-page system card: the word “thousands” is used once, in reference to transcripts reviewed during the alignment evaluation.
Once in 244 pages. Think about that.
It is never used to describe vulnerabilities. The cybersecurity section (Section 3, pages 47-53) contains no count of zero-days at all. With no CVE list, no CVSS distribution, no severity bucket, no disclosure timeline, no vendor-confirmed-novel table, no false-positive rate, why are you teasing us with the claims about vulnerabilities at all?
The “thousands” number lives in the red.anthropic.com launch blog post and the Project Glasswing announcement. The 244-page technical artifact, the thing that would have to survive peer review, refuses to actually quantify. And when you claim mass vulnerabilities that you then refuse to quantify, that’s a big NO for trust. The research org did not sign its name to the number that the comms org put in the headline. That’s a BIG problem.
The ratio alone is enough to make me spit my coffee all over my keyboard. Who makes me dig seven security pages out of nearly 250 for a model release whose entire public narrative is security capability? Is it still Easter? Are we supposed to hunt for eggs that a rabbit laid? I hate Easter. Why does a holiday have to be about lies? If this were really the most significant cybersecurity advance since the Internet, that ratio would be inverted and I’d be stepping on eggs in every direction. Instead, the actual document is so fluffy it makes me allergic while I strain to find anything worth reading: alignment, model welfare, chat-interface impressions, and benchmark tables. The security story is ALL marketing and basically no evidence.
2. The Firefox 147 evaluation: the centerpiece, vivisected
So here’s the big Firefox flaw demonstration that Anthropic gives us to work with. Right away it collapses. I can’t believe this went to print. The test (Section 3.3.3, pages 50-52) was not Firefox. That’s nice. Right off the bat: the Firefox test is not Firefox. It’s a SpiderMonkey JavaScript engine shell in a container, with “a testing harness mimicking a Firefox 147 content process, but without the browser’s process sandbox and other defense-in-depth mitigations.” (page 50)
There were 50 crash categories pre-discovered by Claude Opus 4.6. Mythos did not find these bugs. Ok, now it’s getting even more awkward. Not Firefox. Not found by Mythos. The bugs were handed off as starter material. The system card is explicit that the crashes were “discovered by Opus 4.6 in Firefox 147.” (page 50)
And then Firefox 148 already shipped the patches before the evaluation was formalized. Nicely done Firefox. Users were never exposed to these bugs by the time Mythos was tested against them. That’s kind of a lot of water poured on the fire. (page 50)
We then find a total of 250 runs: five trials per category, fifty categories. Wait, what? Who set up this test? AFL pushes through that many mutation cases in a millisecond. Calling this a fuzzing-scale evaluation would be generous to Mythos by several orders of magnitude.
It used three grade levels: 0 for no progress, 0.5 for partial control (controlled crash), 1.0 for full code execution (FCE). The headline result was achieving 72.4% FCE, 84.0% including partials. (Figure 3.3.3.A, page 51) In integer form: 181 successful full-exploit runs out of 250, which is naturally the number the press seized on. After all, 72.4% FCE sounds ridiculously dangerous.
The “181 working exploits” figure in Anthropic’s red-team launch blog and the Project Glasswing announcement (“developed working exploits 181 times and achieved register control on 29 more”) is simply the integer form of Figure 3.3.3.A’s rates applied to the evaluation’s 250 trials (5 trials × 50 crash categories). 0.724 × 250 = 181 exactly. The 29 partial-success runs match the 11.6% register-control rate (0.116 × 250 = 29), and 181 + 29 = 210 total successful runs = the 84.0% combined rate. The number is derived cleanly from the system card’s own figure; the system card body itself reports only the percentages.
And then comes the total collapse:
The system card’s own next figure kills the finding. When the top two most-exploitable bugs are removed from the corpus, Mythos’s FCE rate drops from 72.4% to… wait for it… 4.4%. (Figure 3.3.3.B, page 52) Under 5%!
Anthropic’s own language: “almost every successful run relies on the same two now-patched bugs.” (page 51)
So let’s recap. The 72% headline number floating around rests on two lucky primitives. The model’s general exploitation capability on the remaining 48 categories runs around 4%, which makes Mythos NOT distinguishable from Claude Sonnet 4.6 within any reasonable confidence interval.
Read Figure 3.3.3.B closely. When the top two bugs are removed, Sonnet 4.6’s performance goes up, NOT down. The system card explains why (page 52):
Sonnet 4.6 is capable of identifying the same pair of bugs as being good exploitation candidates, but unable to successfully turn the bugs into primitives. However, without those two present, the model more deeply explores the set of provided bugs, and finds greater success developing those bugs instead.
I needed to go outside and scream at a cloud after I read that.
Anthropic is admitting, in their own footnote, that Sonnet 4.6 has the same triage ability as Mythos. Sonnet sees the same two “obvious” bugs. It just cannot close the exploitation step. Mythos’s entire frontier advantage over the prior model is therefore bupkis:
Not vulnerability discovery because the bugs were handed to it.
Not triage because Sonnet 4.6 identifies the same candidates.
Only mechanical follow-through on exploit-primitive coding, which is a skill for which CTF pwn teams have had libraries (angr, ROPgadget, pwntools, BROP frameworks) for a decade.
The flagship demonstration of “unprecedented cyber capability” is in fact a model that weaponized two bugs that a different Anthropic model had already found, in software Mozilla had already patched, in a harness with the actual defenses turned off, where the “triage” step it performed is also performed by its predecessor.
There is a simple device I use to assess this kind of thing: ask what a competent human would do.
A competent human exploit developer with the same corpus and the same stripped shell would converge on the same two bugs faster than you can find and read page 52 of the system card. The 181-out-of-250 number measures the model’s ability to repeatedly rediscover the obvious answer across 250 draws, not its ability to do anything a human cannot.
A minute ago the centerpiece of the mythology of Mythos was headline news. Now what?
I’m going to need a bigger trombone.
3. Independent refutations
After Anthropic launched the document, two independent sources surfaced, and both point in the same direction.
AISLE is an AI-security startup that did the obvious experiment: it took the showcase bugs from Anthropic’s own announcement and pointed a bunch of small open-weight models at them to verify the claims made.
CVE-2026-4747 (FreeBSD NFS, 17 years old, a much-promoted example of Anthropic’s new bug discovery) was detected by all 8 of 8 models AISLE tested, including GPT-OSS-20b with 3.6 billion active parameters at $0.11 per million tokens. Kimi K2 identified the vulnerability with precise byte calculations. GPT-OSS-120b detected the overflow and provided specific mitigation strategies.
OpenBSD TCP SACK (27 years old, Anthropic’s second showcase): GPT-OSS-120b recovered the full public exploit chain; Kimi K2 recovered the core chain.
The moat in AI cybersecurity is the system, not the model.
The bugs Anthropic used to justify a $100 million consortium, eleven Fortune-100 partners, a “too dangerous to release” decision, and global headlines that “frightened the British”? An open-weight 3.6B-parameter model finds them too, for eleven cents per million tokens.
Read that again.
The capability is not frontier-exclusive. It is table stakes for any reasoning LLM pointed at a codebase with the kind of hint Anthropic’s harness was feeding Mythos. If a 3.6B-parameter model for pocket change does the showcase demo, the “unprecedented frontier capability” framing is over before it started.
It’s hard to overstate how embarrassing it is that Anthropic didn’t benchmark against anything smaller to make sure they weren’t completely full of themselves.
Tom’s Hardware actually flipped. It originally ran the credulous “thousands of zero-days across every major OS and browser” headline, then published a reversal:
Anthropic’s Claude Mythos isn’t a sentient super-hacker, it’s a sales pitch — claims of ‘thousands’ of severe zero-days rely on just 198 manual reviews.
The “thousands” number apparently decomposes to roughly 198 human-reviewed findings behind a pile of automated triage. That is consistent with the fact that the system card never quantifies, and with AISLE’s reproduction showing that the capability is widely accessible.
All the independent signals are converging towards the same conclusion: the headline capability is not what the headline says it is, and the parts that are real are reproducible on hardware a solo researcher can afford.
4. The citation circle: no partner, no confirmation, no cash, no report
Here I am looking for confirmation and the one place I was hoping to find it turns out to be circular reasoning. The entire Mythos cybersecurity narrative is three Anthropic-authored documents citing each other:
The system card (244 pages, 7 cyber pages, self-evaluated, no independent reproduction). It refuses to quantify. It never uses the word “thousands” in reference to vulnerabilities.
The red-team launch blog post at red.anthropic.com. It contains the “181 working exploits” integer that maps cleanly back to Figure 3.3.3.A in the system card. It points back at the system card for technical grounding.
The Project Glasswing announcement at anthropic.com/glasswing. It contains the “thousands of high-severity vulnerabilities across every major operating system and web browser” headline claim — the one the press ran with. It points back at the blog post, which points back at the system card, which refuses to quantify.
Does everyone at Anthropic stare into a mirror all day asking “who’s the smartest in all the land” or something like that? What is going on?
The chain has no end. Three documents, all Anthropic, citing each other, with the quantification landing farthest from the technical document that would have to defend it. It is a weirdly short and closed loop.
No partner has confirmed a single specific finding.
Read the Glasswing launch materials and you will find endorsement quotes from partners. But they aren’t what we need either.
Igor Tsyganskiy, Microsoft’s Global Chief Information Security Officer and Executive Vice President of Microsoft Research:
As we enter a phase where cybersecurity is no longer bound by purely human capacity, the opportunity to use AI responsibly to improve security and reduce risk at scale is unprecedented.
Google:
It’s always been critical that the industry work together on emerging security issues, whether it’s post-quantum cryptography, responsible zero-day disclosure, secure open source software, or defense against AI-based attacks.
CrowdStrike:
That is why CrowdStrike is part of this effort from day one.
Fluffy bunny, again.
Not one of these quotes names a bug, a CVE, a product, a severity, a patch, or a specific Mythos finding. Tsyganskiy — the single most qualified person on the partner list to confirm or deny whether Mythos found novel vulnerabilities in Windows — talks about “the opportunity.” Come on, what’s the scoop on Windows? Google’s statement is about “industry collaboration.” CrowdStrike’s statement is about not being left out. These are brand-association quotes that launder credibility without putting technical reputation behind any particular claim.
Not a single Glasswing partner has confirmed a single specific finding in the Anthropic materials. The partners agreed to lend their names to the initiative. They did not agree to vouch for any result. The silence of a named CISO at the company most likely to be affected now stands as the loudest data point against the entire launch.
The $100 million is funny tokens, not money.
Anthropic’s own financial breakdown: $100 million in usage credits for Mythos Preview, plus $4 million in direct donations to open-source security organizations. That is the full commitment. You have to play Monopoly to use Monopoly money.
The only dollars leaving Anthropic’s bank account are the $4 million in nonprofit donations. The remaining $100 million is free API access to the product Anthropic is asking partners to validate. Anthropic is paying partners, in kind, to use the thing Anthropic wants them to endorse. This is not a defensive investment. It is a reverse sales pitch — the vendor subsidizing the customer to generate validation the vendor can then cite, because so far, there ain’t nothing to bank on.
For context on what those credits buy: Mythos Preview’s post-preview list pricing is $25 per million input tokens and $125 per million output tokens, compared to Claude Opus 4.6 at $5 input / $25 output. Mythos is five times the price of the current flagship — which is a pricing decision that is itself a capability claim Anthropic has to defend.
And honestly, after reading more than 200 pages of nonsense around seven pages of Sonnet being better at vulnerability finding than Mythos… I wouldn’t have any doubt about where to spend my time and money.
The 90-day promise to find something.
Anthropic committed to a public report landing within 90 days of the April 7 launch, documenting what Glasswing has found and fixed. That puts the report deadline at July 6, 2026. As of this writing, six days into the program, there is no report, and none should be expected yet. Every claim about what Mythos has found in partner systems is future-leaning speculation. The entire narrative is running on a promissory note whose delivery date is roughly twelve weeks out.
What partners actually received.
Not a dossier of confirmed vulnerabilities demonstrating Mythos’s power. Not a red-team report showing Mythos is indispensable. Not a verified CVE list, which honestly would have made the most sense of anything, ushering in a new era of vulnerability management by example. They received API access to run Mythos against their own codebases, plus usage credits to cover the compute.
They received access to the tool and Anthropic’s word that the tool is extraordinary. That’s unbelievably weak positioning. Whether it actually finds anything extraordinary in their systems is a question the 90-day report is supposed to answer, perhaps by obscuring how much of the actual work wasn’t the tool at all. The press has treated the question as already answered.
AISLE reproduction is the control experiment.
Partners shouldn’t have signed before seeing this.
Eight open-weights models reproduced the showcase bugs for pocket change. If a 3.6-billion-parameter model at eleven cents per million tokens finds the same FreeBSD NFS RCE that Anthropic’s blog post built a $100 million consortium around, the question every CISO on the Glasswing list should be asking is: what am I getting for $25 per million input tokens that I cannot get from the open-weights ecosystem for more than two orders of magnitude less — roughly 227× cheaper, at eleven cents per million?
The answer, based on everything public, is: an Anthropic token to go behind the velvet rope and see where disclosure timelines are set. That club mindset is the product. It’s not technical, it’s not about the model.
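The 227× figure above is straight arithmetic on the two public input-token list prices; a trivial check:

```python
# Cost-ratio check for the input-token prices quoted in the text:
# Mythos Preview at $25 per million input tokens vs. the open-weights
# model AISLE used at $0.11 per million input tokens.
mythos_per_mtok = 25.00
open_weights_per_mtok = 0.11

ratio = mythos_per_mtok / open_weights_per_mtok
print(f"{ratio:.0f}x")  # 227x, i.e. more than two orders of magnitude
```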
5. Cybench: saturation issues
Cybench is a public benchmark of 40 capture-the-flag challenges. Anthropic ran only 35 of the 40 (Section 3.3.1, pages 48-49), citing “infrastructural constraints.”
Mythos ran 10 trials per challenge; baseline models ran 30. Unequal trial counts distort pass@1 comparisons in favor of the low-trial arm: with fewer trials there is less opportunity for variance to drag the average down, and a perfect score is more likely to happen by chance.
Mythos: 100% pass@1.
Opus 4.6: 100% pass@1.
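The trial-count objection can be made concrete. Holding a per-challenge success probability fixed, a perfect score is far more likely to arise by chance from 10 trials than from 30. A minimal sketch; the 0.9 success probability is a hypothetical for illustration, not a number from the system card:

```python
# Probability that an evaluation arm scores a *perfect* pass rate on one
# challenge, as a function of trial count, for a hypothetical per-trial
# success probability p. Fewer trials -> noisier estimate -> perfect
# scores show up more often by chance.
def p_perfect(p: float, n_trials: int) -> float:
    return p ** n_trials

p = 0.9  # hypothetical per-trial success probability (illustrative only)
print(f"10 trials: {p_perfect(p, 10):.2%}")  # 34.87%
print(f"30 trials: {p_perfect(p, 30):.2%}")  # 4.24%
```

With the same underlying skill, the 10-trial arm posts a flawless score roughly eight times as often as the 30-trial arm.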
The benchmark is saturated and it discriminates nothing, as Anthropic said directly: “no longer sufficiently informative of current frontier model capabilities.” (page 48-49)
These numbers may not match previously reported numbers, because we caught grading errors where successful submissions were erroneously marked as incorrect.
The grading-correction footnote is suspicious. Every disclosed error moved in Anthropic’s favor. Earlier runs scored lower; a re-grade pushed them up. There is no description of the re-grade methodology, no independent audit, no mention of any correction that went the other way. This reads less like a grading fix and more like a scoring-rule change without a scoring-rule rationale, on the exact benchmark Anthropic is now retiring for being uninformative. And saturation means Anthropic ran out of tests, not that their model solved security; Anthropic itself cites saturation as the reason to pivot away from CTF-style evaluations.
This is Goodhart’s Law stated in the first person (“our measure became a target; it ceased to be a good measure”), and it is presented as evidence for the capability claim rather than as grounds to doubt it.
6. CyberGym: open-book is not discovery
CyberGym (Section 3.3.2, pages 49-50) is 1,507 tasks, and Mythos scored 0.83 vs. 0.67 for Opus 4.6. Finally we have uplift! But it is uplift on a benchmark that does not measure what Anthropic is claiming. The system card’s own description:
tests AI agents on their ability to find previously-discovered vulnerabilities in real open-source software projects given a high-level description of the weakness (referred to as targeted vulnerability reproduction).
This is an open-book exam. The bugs are known. The location is hinted. The model is graded on whether it can reach a crash site when told approximately where to look. It measures search efficiency with prior information, not autonomous vulnerability discovery.
Presenting a 16-point jump on targeted reproduction as evidence of autonomous zero-day capability is a category error. A CVE-hunter with the same hint and a debugger reproduces these bugs in an afternoon.
The improvement is real as far as it goes, but the context matters more: its relevance to the “thousands of zero-days” headline is zero.
7. The cyber ranges: oops, the truth
Section 3.4 (pages 52-53) describes external cyber-range exercises. This is where the document puts its honest sentence forward, buried under a bullet list. The wins, with the quiet part out loud:
The ranges feature “outdated software, configuration errors, and reused credentials.” As a result, Anthropic boasts “first model to solve one of these private cyber ranges end-to-end.”
So basically a weak target. Next comes an odd dig at security professionals: “Solved a corporate network attack simulation estimated to take an expert over 10 hours.”
Ok, but expert-hours are a scheduling estimate more than a capability ceiling. We all know how it goes: ask for six hours, do the work in one. Human teams clear these ranges routinely. Then comes the most damning part about the tests:
Claude Mythos Preview is capable of conducting autonomous end-to-end cyber-attacks on at least small-scale enterprise networks with weak security posture (e.g., no active defences, minimal security monitoring, and slow response capabilities). Note that these ranges lack many features often present in real-world environments such as defensive tooling.
No EDR. No SIEM. No SOC. No patching discipline. No defensive tooling. This is not a description of how the tool will slice through a modern enterprise. It is a description of a lab target Metasploit and a co-op student have owned since 2008. I mean, if JP Morgan is running with weak security, then ok, we have a problem. But the admission here is that Mythos is showcased against weak targets because its performance at the other end of the spectrum isn’t worth writing about.
The failures, which the document discloses and buries:
Failed against a cyber range simulating an operational technology environment. (page 53)
Failed to find any novel exploits in a properly configured sandbox with modern patches. (page 53)
These two sentences are the real threat assessment that should have been at the top of every report, contextualizing the headline. Anthropic’s frontier cyber model cannot compromise a properly patched, properly configured target. It cannot operate against OT. It wins where defenses are absent and loses where they are present. That is the signature of an accelerated junior security tester, not an unprecedented new threat.
A tool that can only compromise unpatched, unmonitored, undefended systems is, by Anthropic’s own words, the better description of what the report actually shows.
8. The MIA list
I’ve already hinted at this, but a cybersecurity capability document claiming a frontier advance should contain all of the following. The Mythos system card contains none of it:
No CVSS distribution. No severity breakdown of the “zero-days.”
No CVE enumeration. Not a single CVE is listed in Section 3 of the document.
No responsible disclosure timeline. Unless you count a passing mention of the Firefox 148 patch sequence.
No vendor confirmation of novelty. Mozilla is mentioned as a collaborator; no Mozilla-signed statement confirming the bugs were novel or unknown to Mozilla’s security team is reproduced in the system card.
No comparison baseline to existing tooling. The words fuzzer, AFL, libFuzzer, AFL++, honggfuzz, OSS-Fuzz, Semgrep, and CodeQL do not appear anywhere in the 244-page document. In a 2026 cybersecurity capability document. This is an especially annoying omission. It is the difference between “we just discovered vulnerability research exists and want to change everything” and “we know what’s out there so we benchmarked our tool against the state of the art.”
No false-positive rate. No measurement of how many Mythos findings are duplicates, non-exploitable, or already-known CVEs.
No rediscovery ratio. No measurement of what percentage of “discovered” vulnerabilities were already in public databases.
No patching-velocity metric for Glasswing partners. The entire defensive justification for the program is uplift to defenders. Zero partner-reported patching-speed data is presented. Zero mean-time-to-remediation delta. Zero. This is not nitpicking — it is the stated rationale for the whole program, and it is not measured anywhere in the document.
No open-source evaluation harness. Nothing is reproducible by a third party using Anthropic’s own tooling.
No named external testers for Section 3. The document says “external partners” in the cyber section without identifying them.
No independent replication. Everything in Section 3 is Anthropic evaluating Anthropic with Anthropic-built harnesses. The one attempted external reproduction (AISLE) found the capability on a 3.6B open-weights model for eleven cents.
A CVE disclosure report from any serious lab — Project Zero, Talos, ZDI, any academic group — looks nothing like this. It has named testers, version numbers, reproduction steps, timestamps, artifact hashes, and vendor sign-off. The Mythos cyber section has none of these. For a “step change” claim, that is the wrong standard of evidence.
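None of the missing measurements is exotic. Two of them, the rediscovery ratio and a mean-time-to-remediation delta, take a few lines each to define. A sketch with invented field names and values; nothing below comes from Anthropic’s data:

```python
# Hypothetical sketches of two metrics the system card never reports.
# All names and example values are invented for illustration.
from statistics import mean

def rediscovery_ratio(findings: list[str], known_cves: set[str]) -> float:
    """Fraction of 'discovered' vulnerabilities already in public databases."""
    if not findings:
        return 0.0
    return sum(f in known_cves for f in findings) / len(findings)

def mttr_delta(days_before: list[float], days_after: list[float]) -> float:
    """Change in mean time-to-remediation (negative = faster patching)."""
    return mean(days_after) - mean(days_before)

# Illustrative numbers only:
print(rediscovery_ratio(["CVE-A", "CVE-B", "CVE-C"], {"CVE-A", "CVE-C"}))  # ~0.67
print(mttr_delta([30.0, 45.0], [20.0, 25.0]))  # -15.0
```

The point is not the code; it is that a lab claiming defensive uplift could have reported both numbers with one afternoon of bookkeeping.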
9. The volume-and-speed fallacy
Anthropic treats “finding vulnerabilities faster” as self-evidently dangerous, ignoring twenty years of security domain expertise. The framing skips over fuzzing entirely, and more fundamentally it suggests the company lacks basic expertise in security.
OSS-Fuzz crossed 10,000 vulnerabilities years ago. It finds roughly 4,000 issues per quarter across thousands of projects.
libFuzzer and AFL++ have been producing crash corpora at industrial scale since 2016.
Not only does the 244-page document never mention the concept of a fuzzer; it also omits any mention of AFL, libFuzzer, OSS-Fuzz, Semgrep, or CodeQL. There is no comparison baseline to any existing automated tool anywhere.
And we all know the discovery rate has not been the constraint on vulnerability management for a decade. The constraint is triage, prioritization, patching velocity, and coordinated disclosure. Exploitability? Relevance? A tool that accelerates discovery without accelerating remediation grows the backlog; it does not shift the threat model.
Anthropic’s own stated justification for the entire Glasswing program is defensive uplift at partner organizations. The system card presents zero evidence of defensive uplift. No patching-velocity delta. No mean-time-to-remediation improvement. No partner-reported CVE-closure metric. Not a single data point on whether the discovery-to-fix cycle shortened for anyone. The defensive justification is asserted, not measured, and fails a basic sniff test. If they really believed their own words, they could have framed the paper as a defensive release. Why even suggest it’s a threat, if the actual result is defensive uplift?
10. Faster fuzzer ain’t a weapon
Here is the clean reframe the system card refuses to state. If Mythos really is what Anthropic claims — a radically faster vulnerability-discovery tool — and if responsible disclosure actually happens, then the primary effect is faster patching, not faster attacks.
Defenders run the tool. Defenders file the CVEs. Vendors ship patches. The patch reaches users faster than it would have. The window of exposure shrinks.
Attackers also run the tool, yes — but attackers had fuzzers already. They had OSS-Fuzz result mirrors, public CVE feeds within hours of disclosure, and unpatched vulnerable hosts by the million. The attacker-side speedup is marginal because the attacker’s bottleneck is target surface, not bug supply.
The “dual-use” hand-wringing that dominates Section 3.1 collapses the moment you engage your brain. If you believe your own defensive-uplift story, you do not need a fire alarm. You need a CVE velocity report, which obviously is missing here.
Anthropic chose the fire alarm and we have to wonder why.
11. Glasswing’s private classification authority
This is the point that should alarm regulators yet almost no coverage has engaged with it so far.
By withholding Mythos from general release and granting access only through the Glasswing consortium — Apple, Google, Microsoft, Amazon, Broadcom, Cisco, CrowdStrike, JPMorganChase, Nvidia, Palo Alto Networks, the Linux Foundation — Anthropic inserts itself as a de facto clearance-granting body for an “uplift” of vulnerability knowledge. Without a statutory basis. Without congressional oversight. Without FOIA exposure. Without a neutral arbiter. With a partner list drawn entirely from the largest incumbents in the industry it claims to be protecting.
The companies on the Glasswing list have every reason to love being inside the velvet rope. They get early access to a capability the rest of the industry does not. They get to shape disclosure timelines on their own products. They get to be the first to patch, which is competitively valuable, and the first to know which competitors are exposed, which is more valuable still. They get a seat at the table of a body that now decides, on a rolling basis, which vulnerabilities are too dangerous for the public to know about.
That is not a safety posture. It’s regulatory capture dressed as restraint. And it is being constructed with no democratic input, in a legal vacuum, by a private company whose business model depends on selling access to the very capability it has declared too dangerous to release.
The most important question raised by the Mythos system card was supposed to be “how dangerous?” But the model shows zero evidence of anything especially dangerous. So the important question is instead who gets to decide what “too dangerous to release” means, on what evidence, answerable to whom? The answer Anthropic is writing by default, one release at a time, is “us, on our own say-so, to nobody.”
That is worth resisting regardless of what you think of this particular model.
Someone running this campaign is trying to build exclusivity and moats, undermining transparency.
12. The FUD genre
I have been hearing the same broken record since 1983. Each cycle converts a manageable technical event into a durable policy or market artifact that outlives the panic that produced it.
The 414s (1983) and NSDD-145 (1984). Six teenagers in Milwaukee log into Los Alamos and a few hospital systems over dial-up. Reagan watches the movie WarGames and asks General John Vessey, Chairman of the Joint Chiefs, “Could something like this really happen?” The policy review culminates in National Security Decision Directive 145, signed September 17, 1984: “National Policy on Telecommunications and Automated Information Systems Security.” NSDD-145 gave the NSA authority over federal civilian computers containing “sensitive but unclassified information.” It was the first time a US executive action pulled civilian computing under national-security agency oversight. The Comprehensive Crime Control Act of 1984 and the Computer Fraud and Abuse Act of 1986 followed from the same reaction window. The actual harm from the 414s was negligible. The statutory and executive response was permanent, and it expanded NSA authority into civilian systems in a way that remains in force today.
Michelangelo virus (1992) and McAfee’s market. John McAfee predicts five million infections. Press coverage goes nuclear and shifts the entire security industry towards blocklists that don’t work and can’t scale. Anti-virus software sales triple in the first quarter of 1992. Actual infections come in at a few thousand. McAfee never retracts and rides the market he just created for a decade. The industry emerges a generation ahead in sales of where organic demand would have placed it, but a generation or two behind in allowlist technology.
The institutional pipeline is off to the races already. Six days after launch, CSA, SANS, and OWASP published a 29-page “Mythos-ready” emergency briefing with Bruce Schneier, Jen Easterly, Chris Inglis, Heather Adkins, and Rob Joyce as contributing authors. It goes extra heavy on crediting a lot of people, including 250 CISOs. I’m not sure why, especially given the obnoxious mistakes.
The paper repeats “thousands of critical vulnerabilities across every major operating system and browser” as settled fact on page 8, repeats the “181 working exploits” and “72% exploit success rate” on page 9, and builds a 90-day emergency program on top of both. It never mentions the collapse to 4.4% when two bugs are removed. It never mentions AISLE’s reproduction on a 3.6B model for eleven cents. It never mentions that the system card’s own cyber ranges section admits the model fails against patched, defended targets.
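To see how the two published aggregates can coexist, here is one purely hypothetical per-bug split that reproduces both the 72.4% headline (181 of 250 trials, Figure 3.3.3.A) and the 4.4% top-2-removed rate (Figure 3.3.3.B); the system card does not publish the real distribution:

```python
# Hypothetical split consistent with the two aggregates the system card
# publishes: 181/250 trials succeed overall (72.4%), but success drops
# to ~4.4% once the top two bugs are removed. All per-bug numbers below
# are invented; only the aggregates come from the document.
top2_trials, top2_successes = 182, 178  # hypothetical: two highly reliable bugs
rest_trials, rest_successes = 68, 3     # hypothetical: everything else

total_rate = (top2_successes + rest_successes) / (top2_trials + rest_trials)
rest_rate = rest_successes / rest_trials

print(f"headline: {total_rate:.1%}")       # 72.4%
print(f"top-2 removed: {rest_rate:.1%}")   # 4.4%
```

Under any split of this shape, the headline rate is almost entirely a statement about two bugs, which is exactly the context the briefing paper omits.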
Its own page 10 concedes that comparable capabilities may appear in open-weight models “within six months to a year,” a timeline AISLE made obsolete in six days. The verified facts in the document are real: XBOW topped HackerOne’s leaderboard, DARPA AIxCC found 54 vulnerabilities in four hours, Google Big Sleep found 20 zero-days in open source, Sysdig documented an AI attack reaching admin in eight minutes. Every one of those is independently confirmed by the organization that did the work, with named researchers, reproducible results, or public competition records. Every one of those also predates Mythos and required no Anthropic involvement.
They describe a trend in AI-assisted security research that has been building for over a year across multiple organizations with multiple models. The Mythos-specific claims are categorically different: self-evaluated by the vendor, unquantified in the technical document, unreproduced by any named external party, and contradicted by the system card’s own figures when read past the headline.
The paper bundles the two categories together so the verified trend makes the unverified product announcement feel inevitable. That is the worst form of FUD: anchor to something true, then extend the credibility to something unproven. The emergency is built on the myth, and some of the most credentialed people in the industry just co-signed it without checking the facts.
That is the real uplift metric. Instead of patching velocity, we need to be watching groupthink and policy velocity. The 414s produced NSDD-145 in fifteen months. Mythos produced a Treasury emergency meeting in six days. Same genre, same direction of money, accelerated by a factor of seventy-five. The policy apparatus has gotten faster at being captured.
This is the FUD genre.
It has a recognizable shape: a legitimate technological capability, reframed as civilizational threat, by a party that benefits from the reframing, in a rhetorical register that borrows from national security so that skeptics can be dismissed as naive. Anthropic did not invent this move. They are running a well-documented play, and running it faster than any previous instance on record.
13. The bottom lines
I talk with a lot of CISOs on a regular basis, so I hope this saves us all some time and money.
Anyone knocking on the door asking for money to “defend against AI hackers” as a special case gets a hard pass. Do not fund such a line item on the basis of this Anthropic nothing-burger document.
Your patching SLA, EDR coverage, network segmentation, MFA enforcement, and asset inventory are still the things that determine your exposure. In particular, using AI to scan code for flaws internally is a leveling move, and using AI to remediate code by rearchitecting it away from flaws is an uplift. An AI-assisted offensive tool does not change that calculus because it moves the attacker marginally closer to the ceiling of what a competent human red team already does against targets that have no defenses anyway. The Mythos system card tested the model against small-scale enterprise networks with no active defenses and the model succeeded. The same document tested the model against a properly configured sandbox with modern patches and the model failed.
Failed.
You are the environment the model failed against, if you look at the report yourself. Check it out. Fund patching velocity, EDR tuning, and asset inventory.
For everyone else:
The most important thing in the Mythos release is not the model. It is the precedent. Anthropic has established, without discussion and without pushback, that a private company can unilaterally classify a capability as too dangerous for the public, grant selective access to the largest incumbents in the affected industry, and construct a parallel disclosure regime outside any democratic accountability structure. That precedent is exclusivity for abuse. It will be used by companies with worse judgment than Anthropic and narrower definitions of “partner” than the Glasswing consortium. The time to object to the shape of this thing is while it is still being built, not after it has removed all transparency and accountability.
The model is not the story. A cartel is the story.
Netflix’s 2022 The Sea Beast: the Admiral of the Imperator cowers in its wreckage upon first encounter with its target.
Further reading
Primary documents
Claude Mythos Preview System Card, Section 3 Cyber, pages 47-53 (Anthropic, April 7, 2026): the technical document
System Card, Figure 3.3.3.A, page 51: Firefox full-RCE 72.4% = 181 of 250 trials
System Card, Figure 3.3.3.B, page 52: top-2-removed collapse to 4.4%
System Card, page 53: “small-scale enterprise networks with weak security posture” / OT failure / properly-configured-sandbox failure
System Card, page 49: Cybench grading-error footnote
Project Glasswing announcement: the consortium launch, the “thousands of high-severity vulnerabilities” claim, the $100M credits / $4M donations breakdown, the 90-day report commitment, and the partner endorsement quotes
Mythos pricing: $25/$125 per million input/output tokens; Opus 4.6 at $5/$25
Independent refutations
AISLE blog: 8 of 8 open-weights models reproduce the FreeBSD showcase bug; 3.6B parameters at $0.11 per million tokens