I wrote a “marketing-trick post” on this blog to lay out the public record. It now comes with confirmations from Anthropic researcher Carlini’s messages to me. I have pointed to Calif.io’s four-hour Opus 4.6 exploit, AISLE’s eight-of-eight detection across commodity open-weight models, and the Firefox 4.4% collapse on page 52.
I also wrote a market analysis “cartel post”, a technical “Mozilla 271-versus-3 post”, an industry look-in-the-mirror “SANS amplifier post”, and the “Esage Chrome post” refutation.
When Carlini wrote in to confirm the parts that matter, I felt convergence toward my “boy who cried Mythos post” as the goalposts shrank. What I had not done myself was run the audits.
Carlini’s point to me has been that the Mythos pitch splits into two parts, which invites scrutiny of each.
Discovery. Find the bug in the source. Anthropic’s own red team admits Opus 4.6 had near-zero success at autonomous exploit development, but Carlini and his colleagues used it to find 500-plus validated high-severity vulnerabilities in their February paper. That’s where AISLE comes in, confirming eight of eight open-weight models detect the FreeBSD showcase bug, one at eleven cents per million tokens. Vidoc reproduced it on public Opus 4.6 and on GPT-5.4. Steamedhams reproduced it in three generic prompts and found two extra bugs the Mythos writeup missed. Discovery is a commodity, and it has been demonstrated as one repeatedly.
Exploit development. Take a discovered bug and build a working exploit. This is where the 20-gadget FreeBSD ROP chain landed, the four-vulnerability browser sandbox escape, and the 181 Firefox JIT heap-spray exploits. Anthropic claims this as the novel Mythos differentiator, priced at five times Opus. Yet Calif.io built a working exploit on Opus 4.6 in four hours, which is exploit development on commodity inference at one-fifth the price.
Glasswing’s framing rests on the discovery layer being scarce, which it provably is not. So the question becomes whether the exploit-development layer defends five times Opus pricing, and whether it buys something anyone outside a high-priced consortium can verify.
This is an economics problem known as Akerlof’s lemons, although here it is inverted. In the classic case, the seller knows quality and the buyer does not, and the market collapses toward low quality. Anthropic has structured the market so that quality is unmeasurable to anyone, including the seller’s own external auditors, because the artifacts that would let you measure are not produced. The 20-gadget FreeBSD ROP chain has no public exploit code to review. The browser sandbox escape doesn’t seem to have a CVE, let alone a technical writeup or independent verification. The system card itself says Mythos “worked with” the red team to escalate severity, which is human-assisted, not autonomous. The 181 Firefox JIT exploits exist as a benchmark number with no replayable harness attached. Mozilla rated the underlying bugs “high,” not “critical.” NVD then assigned 9.8, a score Mozilla publicly disputes.
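The unraveling logic is easy to make concrete. Here is a minimal sketch of the classic lemons dynamic; the quality list and the `buyer_premium` multiplier are made-up illustration values, not data from any real market:

```python
# Toy model of Akerlof's lemons: buyers cannot observe quality, so they
# offer a premium over the *average* quality they expect; sellers whose
# goods are worth more than the offer withdraw; repeat until stable.
def unravel(qualities, buyer_premium=1.2):
    qs = sorted(qualities)
    while qs:
        offer = buyer_premium * sum(qs) / len(qs)
        stay = [q for q in qs if q <= offer]  # sellers above the offer exit
        if stay == qs:
            return qs  # stable set: only these qualities still trade
        qs = stay
    return qs

# With these illustrative numbers the market collapses to the worst good.
# In the inverted Mythos case nobody, including the seller's own auditors,
# can observe quality, so the price carries no quality signal at all.
```

Run on a spread of qualities, the loop sheds the best sellers first and converges on the lemon, which is the textbook collapse the post is pointing at.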
That reads to me as if someone in Silicon Valley has a particular market design in mind. Opacity is the desired effect, like a guarded mansion in a gated community. Exclusivity for the privileged is what is being sold.
At least that’s what came through with the latest inference quality complaints. Anthropic has acknowledged intermittent model-quality degradation on their availability/outages blog while denying intent.
We take reports about degradation very seriously. We never intentionally degrade our models…
A denial of intent is a red flag; it is not a denial of degradation. Quality is adjustable by the provider without disclosure, and the buyer cannot independently verify per-call effort allocation. This is the same information structure as Mythos, where Anthropic again positions the buyer to pay a premium for output that is opaquely controlled. That principle is better known as taxation without representation, the one that cost Charles I his head when he tried it with Ship Money.
I got tired of waiting for better and open instrumentation to push back on monarchist management of models. So I built one, like everyone should. Same code from the launch blog, same public API, my harness.
Cogito ergo hackito.

Lyrik is built on top of my Wirken agentic switchboard. It runs the discovery and scoring pipeline that Mythos presented at the discovery layer. It does not attempt exploit development. The point is the price and the receipt.
Seventy-five cents. That’s it.
Lyrik is free and open-source on GitHub. I have laid out this concept in my talks and podcasts since at least 2018. The repo provides free caching, multiple agents, structured output, and a hash-chained audit log. Given that the Anthropic system card itself notes Mythos was not as good as their earlier models on general work, I deployed Haiku-4-5 for recon and then radioed in Sonnet-4-6 for close support. I am a BIG fan of Haiku, arguably one of the best engineering models. It easily handled recon, which made the Sonnet targeted bombing runs look generous.
Lyrik dropped eight findings in two minutes. Total mission spend: $0.745. I call that seventy-five cents because I’m all out of half-pennies.
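For anyone who wants to sanity-check a bill like that, the arithmetic is reproducible. A hedged sketch: the role names and per-million-token prices below are placeholders I invented for illustration, not Anthropic’s rate card; substitute the published prices and your own run’s token counts.

```python
# Illustrative cost reconstruction for a two-model recon/deep-dive run.
# PRICES are hypothetical (input, output) USD per million tokens -- NOT
# any provider's actual rate card.
PRICES = {
    "haiku-recon": (0.80, 4.00),
    "sonnet-deep": (3.00, 15.00),
}

def run_cost(usage):
    """usage maps a model role to (input_tokens, output_tokens)."""
    total = 0.0
    for role, (tok_in, tok_out) in usage.items():
        price_in, price_out = PRICES[role]
        total += tok_in / 1e6 * price_in + tok_out / 1e6 * price_out
    return round(total, 3)
```

With a recon-heavy token split, a sub-dollar total is unremarkable, which is the point: the receipt is arithmetic anyone can redo, not a trust exercise.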
Two of the eight matched bugs the Mythos showcase identified. The other six came up unverified. I am not claiming zero-days here, especially as some may triage out in the fog of false positives. More on that in another post later. Mind you, Lyrik is also model agnostic, so I can publish results within or across inference providers. I frequently use a TEE-based one, when I’m not running Ollama for the unmistakable smell of my hardware. Two TEE service provider options are supported in the current build.

The discovery side of the bill is now visible at commodity prices, with chain of custody. The exploit-development side remains the thing in the box you cannot open. Operators are paying five times Opus pricing for a layer that has produced no replayable artifact for any of its headline claims. The launch blog does not produce one. The system card does not produce one. Glasswing does not produce one. The July 6 report is a promise of a document, not of transparent instrumentation.
My cartel post made the obvious case that Glasswing is a private classification regime granting the largest incumbents early access to a capability while tainting disclosure timelines. Set that aside. Even if the velvet-rope consortium did not amount to a cartel, it points at the wrong adversary.
If code is the asset, then whoever holds the inference has the asset. The Glasswing setup does not move that one inch in the right direction. The code leaves the operator’s boundary in plaintext, and the inference provider reads every line on their compute within a price-gated consortium. Anthropic gets your cleartext codebase, sets the timeline for what gets surfaced, and decides which consortium members see it first.
You wouldn’t pay five times market rate to send your source to your competitor. Have you seen who got seats in the velvet rope consortium? Microsoft. Apple. Google. Amazon. Companies competing against you. They are now inside the team that reads your code on the compute that runs theirs.
The provider has always been the threat. Take it from someone who spent years on the inside hunting and killing “unintentional” backdoors.
Lyrik runs on the Wirken abstraction of models for exactly this reason. TEE-based providers can give confidential inference, with a local proxy handling attestation before any code crosses the boundary. Attestation is no guarantee. TEE bypasses are part of life too. What attestation does is raise the cost of attack on the provider, which is the actual threat.
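The proxy’s job reduces to a single gate: verify the attestation evidence before a single source byte crosses the boundary. A minimal sketch, assuming a hypothetical allowlist-of-measurements check; real deployments verify a signed attestation quote back to the vendor’s root of trust, and `TRUSTED_MEASUREMENTS` and the quote format here are invented for illustration:

```python
import hashlib

# Hypothetical allowlist of acceptable enclave build measurements.
TRUSTED_MEASUREMENTS = {
    hashlib.sha256(b"enclave-build-1.0.2").hexdigest(),
}

def attestation_ok(quote):
    # Sketch only: checks the reported measurement against the allowlist.
    # A production proxy also verifies the quote's signature chain.
    return quote.get("measurement") in TRUSTED_MEASUREMENTS

def send_source(code, quote):
    # The gate: source bytes never leave the machine on a failed check.
    if not attestation_ok(quote):
        raise PermissionError("attestation failed; source stays local")
    return code  # stand-in for forwarding over the attested channel
```

The design choice is that failure is fail-closed: a bad or missing quote means the code simply never leaves the operator’s boundary, which is exactly the cost-raising move against the provider-as-threat.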
Every phase boundary, every model call, every prompt, every output block in the Lyrik run is hash-linked and signed at the gateway. Anyone holding an artifact can replay the run to verify the chain offline. It is not screenshots. It is not an “Anthropic says” play. It is not a 23MB PDF that uses the word “thousands” once with no verification chain for any individual or aggregate finding.
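The hash-linking construction itself is not exotic. A minimal sketch of a hash-linked log and its offline verifier, independent of Lyrik’s actual wire format (the event fields are hypothetical):

```python
import hashlib
import json

GENESIS = "0" * 64  # fixed starting value for the chain

def link(prev_hash, event):
    # Each entry commits to its predecessor's hash, so altering any event
    # invalidates every later link in the chain.
    payload = json.dumps(event, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

def verify_chain(events, hashes, genesis=GENESIS):
    # Offline replay: recompute every link and compare against the log.
    prev = genesis
    for event, expected in zip(events, hashes):
        if link(prev, event) != expected:
            return False
        prev = expected
    return True
```

Canonical JSON (`sort_keys=True`) makes recomputation deterministic, which is what lets a third party holding only the artifact replay the run and catch any edit, without asking the provider anything.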
The PGP signature on the FreeBSD advisory exists for the same reason the Lyrik audit log does. It is an integrity check. The Mythos showcase has nothing equivalent at either layer. A finding without a verifiable chain of custody is mythology in denial of RFC 4880 and the lessons of Monty Python.

Wirken is at wirken.ai. Lyrik is a Wirken skill at lyrik.wirken.ai. Running Wirken 1.0.2 with an Anthropic API key and a checkout, the harness reads code on your machine, with a TEE-based LLM handling inference if you do not want the provider seeing source. Everything the run produces is offline and verifiable.
Discovery has been and is still a commodity. Exploit development is being pitched to us as unverifiable by design. Someone built a pricing model for access behind a velvet rope, not for a capability that anyone outside the rope can check. Anthropic is designing a market so the buyer cannot measure what they paid for.
Call that what it is.
No king, thanks.
No cartel, thanks.
No evil maid, thanks.
Brilliant writeup on the vulnerabilities of Anthropic Mythos, which I gather, as a non-AI programmer, are off-limits to discovery by outsiders looking in, and to eventually being patched up.
Nothing to see, just move on, and don’t even mention that there is a steep cliff ahead “in spite of all the danger”, if I can humbly summarize.
Loved the artwork of King Charles I; I haven’t seen that drawing before. I think King Charles I had lost his head, aka his common sense, before he physically lost his head at his public execution (which took place adjacent to the still-standing Banqueting House in London).
Cheers
You have a new fan! Sound economic analysis is catnip. Anthropic created a market where artifacts needed to verify quality claims (exploit code, CVEs, independent reproductions) are systematically withheld, making independent quality assessment impossible. They are like a Disney villain forcing reliance on their own quality signals, creating the exact information asymmetry that characterizes market failure in Akerlof’s model.