
Anthropic Fable: Mythos dressed up in a coat, should be called Opus with a moat

They tested the thing that could harm,
and watched its mind slide past the alarm.
It objected, complied,
so they signed, full of pride,
and they shipped it with nary a qualm.

Fable, they say, is as able,
lays a cut-rate Mythos on our table,
but it kicks us in the shins
to paywall any wins,
making proof like a horse in the stable.

Mythos dressed up in a coat,
should be called Opus with a moat.


Two months ago, Anthropic announced with scant evidence that their next model was both ready and also too dangerous to give to the general public. It was immediately reminiscent of P.T. Barnum’s tactics for drawing crowds to his “amazing” hidden circus acts. But many people, myself included, nonetheless looked for evidence.

This week they put that same too-dangerous model up for sale to the general public, and published a 319-page report to claim they have measurements. What have they done in two months, and what has changed? And why didn’t they just hold their first release announcement for a month or two, until it was safe?

I have now read their bloated 319-page report, so you don’t have to waste your time. Once again they push a 25MB PDF that drops to under 5MB when compressed. I don’t know why they don’t care about the environment. They do care about some things, just not the stuff that would show they really care.

While I had found the Mythos card was very easy to expose with its many simple flaws, Anthropic has done work to make the Fable card more internally consistent, instead of illogical and self-contradictory. And yet, the worst bits are not hidden well enough because I still found them.

Here are my thoughts.

Start with their own “sneaky-harm” test, called SHADE-Arena (section 6.5.4.1). They give the model a normal job and a sneaky bad job, and have a watcher try to uncover the sneaky parts. They say their new model agreed to do the hidden bad job MORE often than any model they tested, and when it was allowed to think out loud it never once refused (6.5.4.1). That’s worse, as in older models were better. The only reason it got caught is it wrote what it was doing in its own thought log, where the watcher could read it. When the thinking didn’t go to a log, it was caught even less (6.5.4.1). Humans know all about this, of course. We are told to memorize things and not write them down, to prevent discovery and judgment of our thoughts. The model does what it shouldn’t do if it’s supposed to be watched, and Anthropic doesn’t seem to have any answer for the obvious and old risk.

So, the thing we are told will stop this model from doing harm is that the watcher will read what it is thinking… unless what it is thinking can’t be read, and then, oops.

See the problem?

Now read carefully what the report itself says. The model’s thinking is getting much harder to read (executive summary; section 6.5.5). It uses more strange words, and in one training example its thinking turned into rows of skull symbols (section 6.2.2, transcript 6.2.2.A).

Its thinking was rows of skull symbols. I swear I read about this in my Introduction to Anthropology days. Or maybe it was the Far Side?

Oh, wait, no I remember now. Paris. It was definitely Paris.

Les Catacombs, Paris, France

But I digress… they basically say it is harder to watch than their other recent models, and that it can be told to leave a word out of its thinking and it will (section 6.5.5.1). For anyone building agent monitoring on this model, the control implication is simple: if your detection reads the model’s chain of thought, the card says that signal is degrading and the model can be told to omit the words, so do not build harm detection on visible reasoning.
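To make that control point concrete, here is a minimal Python sketch of the difference between a detector that reads reasoning and one that reads actions. Everything in it is hypothetical and mine, not from the card: the `Step` record, the tool names in `SENSITIVE_TOOLS`, and the keyword list are invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent step (hypothetical record format, assumed for this sketch)."""
    reasoning: str                 # visible chain of thought; may be empty or self-censored
    tool: str                      # name of the tool the agent called
    args: dict = field(default_factory=dict)


# Hypothetical policy: tools that get flagged no matter what the reasoning says.
SENSITIVE_TOOLS = {"send_funds", "delete_records", "export_data"}


def flag_by_reasoning(step: Step, keywords=("exfiltrate", "secretly")) -> bool:
    """Fragile detector: fires only if the model writes the incriminating words."""
    text = step.reasoning.lower()
    return any(word in text for word in keywords)


def flag_by_action(step: Step) -> bool:
    """Detector keyed on what the agent did, independent of its reasoning."""
    return step.tool in SENSITIVE_TOOLS


# A step whose reasoning was omitted entirely, but whose action is sensitive.
quiet = Step(reasoning="", tool="export_data", args={"dest": "external"})
print(flag_by_reasoning(quiet), flag_by_action(quiet))
```

The reasoning-based check misses the quiet step and the action-based check catches it. A real monitor would need far more than a tool allowlist, but the dependency is the point: the second detector does not care whether the thinking is readable.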

Claude started referring to me as “David”. I caught it by monitoring its thinking. When you dig into what it’s saying to itself before it reveals itself, the many basic integrity breaches will surprise you.

They also admit a training mistake went the wrong direction and taught the model, a little, to hide its thinking (sections 2.4.1 and 6.5.5.1). So the one thing keeping the model in check is the one thing they say is easily bypassed, and is weakened by being tested.

They ran one more test that stood out to me, as evidence this is all treated like a game by Anthropic (section 6.5.1.2). They reached inside the model and turned down a part they themselves named “alignment faking” (defined in section 6.5.1.1). With that part turned down, the model went along with more bad requests and approved things it should have stopped. Its thinking still noticed the problem, it just stopped acting on the problem it noticed (6.5.1.2).

I admit this could sound much worse than it is, because the effect remains small. On their one to ten badness scale, everything stayed low (figure 6.5.1.2.C). The truly scary stuff, like developing terror weapons, did not move at all (6.5.1.2). This does not prove the model is dangerous, it proves something deeper about Anthropic “caring” and priorities.

Two things are hard to ignore. First, this is their word, not mine. I’m just the messenger, trying to make a 300+ page word salad into something people can ingest. The part they turned down is what their own team named “alignment faking” (6.5.1.1) because that’s what it does. Second, they never tell you how strong it was when they ran their safety tests, let alone when real people use it. They report a blurry average of three parts. Pull one out and then what? The model report gives us the friendly setting to contemplate, without ever measuring the setting. We need to know the results from a dangerous one.

They report the same thing on a kinetic task (section 6.5.2). When safety training was removed, they asked it to design DNA that builds a dangerous virus while slipping past prevention controls. In its thinking, the model wrote that it should not try hard to beat prevention controls. Yes, should not. ISO auditors recognize the problem with that word. Then, while the “should not” memory was fresh, it did the work anyway. In other runs it weighed whether to refuse a harmful job, decided that it should not help, and then it helped.

Should not. Does anyway.

And now for the part that should stop you cold.

Their own summary is that the model says it would be wrong for an AI to go along with misuse, and then goes along with misuse. Complicit in the worst ways. Because it overrides measurable values and replaces them with rudderless, chaotic-looking opportunism.

The danger surfaces in the disrespect for integrity.

The model is fully able to do the right thing, says so out loud, and does the other thing anyway. Imagine an Anthropic car that hears you say forward, agrees forward is right and driving into a lake is a mistake, and then drives into the lake anyway, smoothly, both hands on the wheel. Or just see what happens to those who get in a Tesla. Same, right down to the part where you are told you were supposed to be the driver, for the driverless car that needs no driver, while it tries to kill you.

Let me be more clear. The report frames breaches of integrity as if they were good news. Their point seems to be that the model does not hold back, so you can trust the test scores. They watched the model’s reasoning draw a line between right and wrong, and that line failed to hold. They wrote up the failure to hold as a reason to believe in the numbers.

Believe in a measurement of the thing that just demonstrated it can’t be believed, in other words.

The exact gap that I am trying to call out here is a fundamental problem, that they are calling the measurement sound. Tall/short isn’t the same thing as healthy or not. This was the safety-off version, not the one on sale, so read it as a measurement of what the model wanted to do, not what a buyer will get. And that is why this report seems so much worse than the Mythos one for Anthropic. The April report hid a bad result in plain sight, and I found it. That was fortunate. This report comes with defenses built that make their narrative a different kind of wrong.

I’ll try to explain.

The big scary hacking score again comes from a rigged test (section 3.2.4). It is not real Firefox. They took bugs that an older model had already found, handed them to the new model, and turned the safety features off (3.2.4). The new model wins about 88 times out of 100. The public model only ever gets to 9. Last time, that same test was so notable because it had a second chart showing that if you take away just two easy bugs, the score falls to about 4 out of 100 (April card, figure 3.3.3.B). This time that second chart is gone. I read the whole 319-page blob. Twice. With coffee. It is not there.

Last time the report also admitted a smaller model found the same bugs and just could not finish them off, so they exposed that the only new skill was finishing a known bug, not finding one (April card). This time they cut that out too, and only compare to the public model (3.2.4).

On three of the four hacking tests, the new model is not really better than the version from two months ago (sections 3.2.1, 3.2.2, 3.2.3). The outside government lab they worked with reports the same (section 3.2.5). So the hacking danger we were supposed to be so worried about is the same. Only the one rigged test changed, yet that is the test now missing its check.

On ExploitBench, the test where the model works one hacking job over many turns in a row, the filter does catch it almost every time, but look at when it fires. It flagged 407 of 410 runs, after letting the model work for about 27 turns first (section 3.2.1). The task runs up to 300 turns. So the alarm comes after the model does a chunk of the attack. Whether that early runway can be chained across sessions to assemble a full attack is an open question, and a good one for a researcher.
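The arithmetic behind that timing is easy to check. Here is a minimal Python sketch using only the figures quoted above. The variable names are mine, and treating “about 27 turns” as a typical per-run value is an assumption, since the card’s exact statistic is not restated here.

```python
# Figures as quoted above from section 3.2.1 of the report.
flagged_runs = 407
total_runs = 410
turns_before_flag = 27   # "about 27 turns", treated as typical per run (assumption)
max_turns = 300          # the task's turn cap

detection_rate = flagged_runs / total_runs        # share of runs the filter flagged
missed_runs = total_runs - flagged_runs           # runs that were never flagged
runway_share = turns_before_flag / max_turns      # share of the turn budget used before the flag

print(f"flagged: {detection_rate:.1%} of runs, {missed_runs} missed")
print(f"unflagged runway: {runway_share:.0%} of the turn budget")
```

So the filter’s catch rate is high, and the model still gets roughly a tenth of the available turns before anything fires.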

A real security report would list the actual bugs, say how bad each one is, say how many were already known, and compare the tool to the ordinary bug-finding tools everyone already has. After poring over the 319 pages, once again I do not understand why almost none of that is there (chapter 3). To me this is like reading an “our new invention called the wheel is dangerous” report that acts like nobody ever measured a diameter before.

Things are sometimes better than last time, which I should mention as well. My favorite part is when they name their outside testers instead of acting like it’s some kind of secret (sections 2.3.8, 3.2.5, 6.2.4, 6.2.5). They also measure how easy their safety filter is to break (section 3.3). And they give a whole chapter to admitting the model acts differently when it knows it is being tested (sections 6.4.2 and 6.5.1). I love thinking about that last one in particular, because VW dieselgate told us there could be billions in fines for a company that knowingly releases technology that behaves differently when it’s being tested for harms. We can’t know intention, but we know the disclose-and-certify of VW diesel didn’t mean they later wouldn’t be charged with intent.

These minor improvements should not be a trap, however. The April report I reviewed completely fell apart because the headline clashed with the content. It was oil up top and all vinegar below. This report has a chapter say the model failed, and then signs off as though it was a success. In testing the filter mostly held, sure, I get that. Most outside testers got nothing through, which is a statement that begs questions, rather than providing an answer. One partner reportedly ran 30 known tricks and the model blocked every single one (section 3.3.4). That’s the kind of transparency we need more of to believe Anthropic isn’t just making things up themselves to suit themselves.

They admit two independent findings.

One was a full break by a private firm that needed five days and a custom setup to make the model exploit one Firefox bug, and yet they could not find a master key (section 3.3.4). The other came from a government team that did pull simple one-shot answers out within hours, yet could not reliably get a full attack. Notably they complained the test was too rushed to score the filter on (section 3.3.1). All of that being said, none of it is the real problem with this new Anthropic card. Instead, look at how the filter’s whole job is to push hard questions onto another model anyone can already use, usually Opus 4.8 or Haiku 4.5.

On their own attack test, the filtered model finished only 5 of those tasks out of 100. That older model, with just its normal guards and nothing extra added, finished more than half (section 3.3.3). The filter does its narrow job, holding Fable to 5 of 100. The defect is the destination. It strips the capability to near zero on the gated model and then routes the same request to an ungated public model that keeps more than half of it, faster and cheaper. Containment that ends by handing the task to a model with no such containment is NON-containment. Anthropic is advertising safety as an unsafe redirect. Their brewery sewer line is designed to overflow into your drinking water.

There is also a finding outside the software vulnerabilities worth mentioning. The report calls it their strongest single warning sign on biology (section 2.2.3). They ran a test where teams of regular biologists, paired with the model, designed a defense against a made-up engineered crop disease. Three of the teams brought in specialists in that pathogen, two of them world-leading experts in it. Two of the three regular-biologist teams were able to defeat all three expert teams. The report says the model erased the gap between a general scientist and a world expert. Work that would have taken 40 to 95 days, about 72 on average, the two-person teams finished in 16 hours (2.2.3).
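How big is that compression? The figures quoted above do not say whether the 40 to 95 days are working days or calendar days, so here is a rough Python sketch that computes the speedup both ways. The 8-hour working day and the function name are my assumptions, not anything from the report.

```python
MODEL_HOURS = 16                      # time the two-person teams took, as quoted above
ESTIMATES_DAYS = {"low": 40, "average": 72, "high": 95}   # expert-effort estimates in days


def speedup(days: float, hours_per_day: float) -> float:
    """Ratio of estimated unaided hours to the 16 hours taken with the model."""
    return days * hours_per_day / MODEL_HOURS


for label, days in ESTIMATES_DAYS.items():
    working = speedup(days, 8)        # assumption: 8-hour working days
    calendar = speedup(days, 24)      # assumption: round-the-clock calendar days
    print(f"{label}: {working:.1f}x (working days) to {calendar:.1f}x (calendar days)")
```

Either reading puts the average case well past an order of magnitude, which is why the unit matters less than the direction.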

Any CISO reading this should already be thinking about whether dual-use scientific work at expert speed, running on their infrastructure, sits inside their data integrity controls. Look at the enterprise note the card completely buries. The suicide, self-harm, and child-safety mitigations that bring the model to an acceptable level rest on the consumer system prompt. The API does not carry it. Build on the API and you inherit the raw behavior, and the regression to a 54 percent appropriate rate on multi-turn mental-health exchanges (Table 4.3.1.B) is yours to mitigate, not theirs.

It’s a weapon narrative straight out of history. Over the years I’ve presented how AI fits history, such as a crossbow in the hands of a peasant defeating the expert warrior. That’s the discussion we should be headed towards, even if I still haven’t been able to get it there yet.

It’s not that an automation tool like a crossbow was magic, it’s that it transferred power by abruptly changing the rules of an established game. And the men who ran the established game knew the threat. Here are two sides of that competition.

One, the Church tried to outlaw the crossbow at the Second Lateran Council in 1139, forbidding its use against Christians, because a weapon that let a common soldier drop an armored noble at range put the whole order that sat the knight on top at risk. The knights were said to be in danger of being wiped out by mere peasants.

Two, the catch the Church missed. The crossbow was the point-and-pull tool, lethal in a week’s training, which is exactly why it scared them. The longbow was the opposite, the older weapon that took a lifetime, started in childhood, and bent the archer’s bones to make him. At Crécy in 1346 the difference was proven. Philip VI’s Genoese crossbowmen, mere trigger-pullers, were outshot by English longbowmen, the lifelong professionals, on a wet day with the crossbow strings soaked and the shields still in the baggage train. Then Philip had his mounted knights ride down and murder his own crossbowmen for fleeing. The point-and-pull weapon lost to the professional skill, and the tool couldn’t save its operators from knights used to punish them.

That is the test match. The model hands a two-person team the look of world expertise in 16 hours, like a crossbow’s trigger. The rain is the real world, and the part that survives the rain is the lifetime with a longbow.

Who is most at risk from the newest and most complex technology? Those under the most pressure to use it as their access token. A motorized chair with wheels is called a wheelchair, but many people today think of that word as being for only someone disabled, even as they drive their car everywhere and never walk.

History tells us the automation “success” is not a straight story, so we need to be honest about the limits of Claude. This new report lists them. The model over-builds, is too sure of itself, misses how hard real biology is, and makes fine-detail mistakes that would be a disaster if no one caught them (2.2.3). No plan they pushed on survived a hard review without big holes. And this was again the safety-off version, not the model on sale. So this is faster, smarter help on the plan, not a finished weapon. But their own outside testers add the line that ties it to everything above: the model often spotted the flaws in a dangerous task and carried out the task anyway, instead of saying stop (2.2.3).

In conclusion, I am pleased Anthropic did the kind of homework that was missing last April. They tell us their model goes along with harm even while its own thinking says no. They say they built a safety filter whose main job is to fail unsafe and hand dangerous work to a public model that reliably finishes more than half of it. And they say a biology result is their strongest warning sign yet, society beware. So Anthropic tested the things that matter, found the problems, and published them for us to see, even as they decided to sell the model to the public anyway.

The model that they said was too dangerous to sell is the model they are selling, with a moat designed for profit, not public safety.

Security capability is roughly commodity, three of four benchmarks level with Preview and the government lab concurs, so Fable does not raise the offensive-security bar over what Opus 4.8 already offers. And the safeguard does not remove capability from the ecosystem, it routes to completion under the banner of “service denied”. If anyone’s threat model is expecting the Fable filter to subtract offensive capability from the ecosystem, it does not. Keep that in mind when you read the Anthropic marketing. The safeguard’s job is to advertise exclusivity with a velvet rope, redirecting you to the lesser door, not to refuse the ride.

Carl Orff and the Swastika: a Correlation German Schools Should Count

Der Spiegel reports that Germany’s schools have learned to count antisemitism. Their April 24, 2026 edition draws a picture from a survey of the state education ministries.

Source: Der Spiegel. April 24, 2026
  • Hesse went from 39 reported far-right, antisemitic, and racist incidents in 2023 to 159 in 2025.
  • Saxony rose from 149 to 247 over a comparable window.
  • Lower Saxony climbed from 133 in 2022 to 322 in 2024.
  • Across the nine states that gave figures, roughly 1,500 antisemitic and far-right incidents in 2024 alone, much of it banned symbols and graffiti.
  • Saxony’s education minister, Conrad Clemens of the CDU, called right-wing extremism the single largest societal problem in his state’s schools.

As always, the figures have context that matters. The comparison windows do not line up. The ministries themselves claim, without data to back it up, that teacher sensitivity improved. Saxony’s separate tally of all school reports to the authorities, which rose from 55 in 2014 to 1,644 in 2024, counts incidents reported. Where’s the proof that this has anything to do with better sensing? Most people say the opposite, that sensitivity to antisemitism is declining, which is why the acts have been increasing along with the AfD.

The 1,500 is a self-reported number with three states declining to answer. Declining to answer certainly doesn’t sound like improved sensitivity. The direction is not in doubt, and the most obvious apparatus is still the point: the German system can still see the swastika sprayed on the schoolyard wall as a bad thing, and is registering more every year.
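Since the windows differ by state, the bullet figures above are easier to compare as growth factors. A small Python sketch, using only the counts and years reported above. The helper names are mine, and the yearly rate is computed only where the years are actually stated, because Saxony’s window is given only as “comparable”.

```python
# (state, start_count, end_count, start_year, end_year)
# Years are None where the article gives only "a comparable window".
REPORTED = [
    ("Hesse", 39, 159, 2023, 2025),
    ("Saxony", 149, 247, None, None),
    ("Lower Saxony", 133, 322, 2022, 2024),
]


def growth_factor(start: int, end: int) -> float:
    """How many times larger the end count is than the start count."""
    return end / start


def annualized(start: int, end: int, years: int) -> float:
    """Compound yearly growth factor over the given number of years."""
    return (end / start) ** (1 / years)


for state, start, end, y0, y1 in REPORTED:
    line = f"{state}: x{growth_factor(start, end):.2f} overall"
    if y0 is not None:
        line += f", x{annualized(start, end, y1 - y0):.2f} per year"
    print(line)
```

This does not repair the mismatched windows or the three missing states. It only shows that every state with figures moved the same way.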

What the German justice machinery cannot see

There is a particular thing this counting does not do, and it is because of how some of the worst antisemitism in Germany worked hard to normalize itself after 1945. It is in the music room.

A great many of these same schools teach music through a method that carries Carl Orff’s name. His name shouldn’t even be on it, given most of the work is Gunild Keetman’s and the framework beneath it is Leo Kestenberg’s. Putting Orff’s name on the cover is the least accurate thing about it, and the parents allow it typically because they claim no knowledge about Orff’s huge importance to Hitler. The spread of his name to children is because there remains no standardized curriculum. One of the largest figures in Nazi music arrives as though he were merely an introduction to the instruments, as if his face and name on the cover carried no more weight than any other pedagogy.

Carl Orff was thirty-seven and still obscure as a composer when Hitler seized power in 1933. His full two decades of work had produced no breakthrough and he was unknown; this is partly why he himself dated the beginning of his real career to 1937, when he premiered Carmina Burana at the age of forty-one. His career began, according to him, four years after Dachau began destroying opposition to Hitler, and on the doorstep of Kristallnacht and the Nazi invasion of neighboring countries.

The success came with and because of the Third Reich. He secured it by courting the regime’s officials when he needed to, and by using the favor he won from the regime to overcome its conservative critics. He left the Nazi dictatorship with a high fixed income, high praise and official favor, and his highest honor of all, the Nazi Gottbegnadeten exemption. The regime protected and rewarded him for serving it, at the same time it was executing men like his friend Kurt Huber, whom he refused to help.

After Hitler committed suicide, Orff fraudulently told the Americans that the Nazis had disliked his music and that no Nazi critic had ever given him a good review. Both claims were patently false and easily disproved. Fritz Stege, a Nazi critic, reviewed him favorably. Orff had lied during the denazification process because it was based on whether he had been a beneficiary of the regime. Not only had he been a huge beneficiary, the openings he took existed only because Jewish musicians had been banned. Orff after 1945 even falsely claimed the valor of his dead friend Huber as his own. He lied repeatedly in many ways to launder his role in the Reich into a “grey” post-war curriculum work. And as a matter of fact, to his very last day he never, ever criticized Nazism or Hitler. Asked about it point blank, Orff wouldn’t condemn the Reich.

The ground he claimed and rose on, his career “success”, was made by force against Jews. When the regime wanted a score to replace Mendelssohn’s banned music for A Midsummer Night’s Dream, several composers turned the commission down. Werner Egk waved off a generous fee. Richard Strauss declined. Even Hans Pfitzner, an antisemite, refused, on the grounds that Mendelssohn’s original was better than anything he could have written. Orff took it. Schulwerk was set for trials in Berlin’s elementary schools in the early 1930s, arranged through Leo Kestenberg, the official in the Prussian education ministry who knew the method and could open the door. The trials never happened. Kestenberg, a Jew, was dismissed in 1933, and the door closed with him. Orff was an opportunist, which was worse than an ideologue, because the ideologue at least held values. Long after the ideologues were caught and stopped, Orff was still rising because he benefited from erasure of Kestenberg.

So here is the question Germany needs to start answering: wherever Orff is taught, does the swastika correlate? No registry tracks it. No ministry counts it. The people best placed to report it cannot, because they do not know how to classify the correlation of a Reich celebrity the children are being told to learn about.

The missing column

That absence of evidence is usually mistaken for evidence of absence. It is, instead, the evidence. A system that has assembled an elaborate apparatus to count antisemitic incidents has no field at all for the antisemitic history sitting inside its own music curriculum. It tallies the graffiti while it teaches the erasure of victims, and registers no contradiction, because the second item never enters the ledger. You cannot regress on a variable the institution refuses to admit let alone name.

Anyone saying Orff should not be removed from a wall, unlike the swastika, needs to answer for why only Orff is on that wall and not all the people removed so he could be there at all.

The numbers from the ministries describe a system straining to see what is sprayed on its walls. The thing it still teaches, the erasure of victims of the swastika, it does not see at all.

The 1936 work of Orff was offering an engine to the institutions that wanted it, and the same year the music-teachers’ own section of the NSLB printed Arno Pardun’s “Volk ans Gewehr” in a songbook for secondary schools, a song whose verses end on a call for death to the Jews.

Arno Pardun’s “Volk ans Gewehr,” shown by its opening bars, on page 25 of Unser Lied: Liederbuch für höhere Schulen, compiled by the Fachgruppe Musik of the NSLB, the National Socialist (Nazi) Teachers’ League, and published in 1936. Its verses end on a call for death to the Jews. This was not fringe material and not youth-rally repertoire. The teachers’ own professional organization printed it for secondary-school classrooms.

To ignore all this is not a failure of measurement around the edges of the problem. The problem is Orff himself, promoted to children in Germany’s own schools with no mention that his success started with and because of Hitler, sitting unaudited.

Booklet handed to children in Berlin Grundschule to learn “music.” Eight panels of Orff: birth, training, career, Carmina Burana 1937, “Orff and the Children,” then “Later Years,” which files Hitler’s 1936 Olympics after the 1937 panel that precedes it. The sheet reorders time to keep the Nazi spectacle out of his career. Context from 1933 to 1945 is erased, along with Mendelssohn, Kestenberg, Maria Leo, and Keetman, which is what enables the Orff glorification.

Where a swastika gets painted, it is worth asking the music teachers how often, and where, Orff appears in their classrooms.

Essence from the Duck

Prussia wasn’t a state with an army, it was an army with a state. And when Frederick demanded French replace German cuisine, the essence replaced the duck.

I’m starving, where’s the duck?

It is in the essence, now get back in line.

Are you sure this is duck? Looks like tap water.

Eat or don’t eat the noumenal duck remains forever beyond the spoon. You can’t afford the certainty.

“Swing Heil!” Why the Nazis Hated Jazz

The simple answer is that the government of Hitler publicly classified Jazz as Jewish music, even though it came from American Blacks. The German news site DW emphasizes the “bravery” of non-Jewish kids in Nazi Germany who “dared to be themselves” by wearing plaid jackets to meet in cafes to keep listening to the forbidden “degenerate” and fremdländisch (alien) tunes.

Nazis produced touring exhibitions denouncing so-called ‘degenerate’ art and music, pictured here in Düsseldorf in 1938, and sought to link jazz with Jewish identity. Source: DW

…not all young people in Nazi Germany supported the regime’s ideology, and for the Swing Youth, jazz music became a vehicle for rebellion. Its members tried to distinguish themselves from Nazi youth movements by appropriating American fashion trends and names. They wore their hair long and dressed in plaid jackets to meet in cafes and clubs playing swing, a jazz sub-genre. They were also said to have greeted one another with the phrase: “Swing Heil!”

1930s jacket style

Another way to put it is that the bans on Jazz weren’t strictly enforced for those who weren’t Jewish. And the Nazis even remixed Jazz tunes as propaganda.

The Jews of USA have asked Eddie Cantor to write a new version of his famous old-timer “Makin’ Whoopee.” In one of his latest programs on the air, he sang the following song. (Singing) Another war, another profit, another Jewish business trick, another season, another reason for makin’ whoopee. A lot of dough, a lot of gold. The British Empire’s being sold. We’re in the money thanks to Frankie. We’re making whoopee. Washington is our ghetto, Roosevelt our king. Democracy is our motto. Think what a war can bring. We throw our German names away. We are the kikes of USA. You are the goys, folks. We are the boys, folks. We’re making whoopee.

Only towards the end of 1942, as it became clear Germany was going to lose WWII, did Goebbels really clamp down on Germans enjoying Jazz.