
LLM Falling Down, Falling Down: METR Brief Sells a Sixty-Year-Old Failure as Novelty

METR has released a brief on OpenAI’s GPT-5.6 Sol that, read between the lines, indicts the whole vendor class for the cartel-like behavior I have written about here before. Their closing line is that real validation “requires deep access to internal systems.”

That’s not a good thing.

Here’s a simple example. A problem the vendor has to admit is old and understood carries accountability. That same old problem, repackaged as new, urgent, and invisible from outside, instead justifies an access expansion project with a standing evaluatory role. That same “deep access” logic is the scarcity an access-gated cartel system like Mythos is built to sell.

Novelty is the myth used to budget for these claims.

The honest version of the new METR report should pique the interest of historians who study technology risk. Outcome optimization is said to produce proxy-gaming, a finding that has been true since Wiener wrote about it in 1960. The models got good enough that their gaming defeats the measurement, as always predicted.

The number was never a fixed property of the model. A June 2026 evaluation from the UK AI Security Institute ran frontier models across software, math, medicine, and cyber and found the scores move with the inference budget: more tokens and more attempts, harder tasks cleared. That makes the benchmark figure a protocol artifact, not a capability constant. The state’s own safety evaluators are saying the number depends on the harness.
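To make the budget dependence concrete, here is a minimal sketch of why more attempts alone move a score. It assumes each attempt is an independent try with the same success probability, which is my simplification and not the method of the evaluation cited above; the function name `pass_at_k` and the 10% figure are hypothetical.

```python
def pass_at_k(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds.

    Assumption: tries are independent with identical success probability.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    return 1.0 - (1.0 - p_single) ** attempts


if __name__ == "__main__":
    # Hypothetical task that a model solves 10% of the time in one attempt.
    for k in (1, 5, 20):
        print(f"attempts={k:2d}  reported score={pass_at_k(0.10, k):.3f}")
```

Same model, same task, three different "capabilities" depending only on how many attempts the harness allows.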

My own claim goes further. The harness is the value and the models are interchangeable. Control the harness and you own the score. That is the asset the METR brief protects when it routes validation through the closed filtering of a vendor-granted program instead of being open to scientific methods.

That’s gross negligence in my book, but I’m not a lawyer. The labs have a clear self-serving reason to call a thirty-year-old, designed-in failure their fresh never-seen-before emergence. It’s how they market access to known flaws as a unique upsell, while they absolve themselves of authorship.

Let me be more clear, because I know too few people have been attending my presentations over the past decade, which described exactly the problem now being claimed as a frontier “surprise” in 2026.

Wiener, whom I used to speak about frequently because of his cool graphics, stated the core plainly in 1960, in Science. If you build a machine to pursue an objective and you can’t easily interrupt it, you had better make the objective the thing you actually want, because the machine will pursue the literal one.

Wiener’s anti-aircraft problem: extrapolate the next position from the observed radar fixes.

…this concept of training anti-aircraft to hit moving targets was also the birth of artificial intelligence and “cyber”. Cybernetics (coined from Greek kybernetes for “captain” of a ship or more literally someone who steers) was a book published in 1948 by Norbert Wiener. It was based on his World War II experiments in anti-aircraft systems meant to anticipate planes by interpretation of radar images.

Predicting the next position from the observed track, the core of Wiener’s 3D prediction box
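As a rough illustration of the prediction step, the sketch below extrapolates a next position from a handful of radar fixes by assuming constant velocity between the first and last fix. That is a deliberate simplification of mine; Wiener’s actual predictor was statistical, and the `Fix` layout and the function name `extrapolate` are hypothetical.

```python
from typing import Sequence, Tuple

# One radar fix: (time in seconds, x, y, z in metres). Layout is an assumption.
Fix = Tuple[float, float, float, float]


def extrapolate(fixes: Sequence[Fix], t_next: float) -> Tuple[float, ...]:
    """Predict (x, y, z) at t_next, assuming constant velocity over the track."""
    if len(fixes) < 2:
        raise ValueError("need at least two fixes to estimate velocity")
    t_first, *p_first = fixes[0]
    t_last, *p_last = fixes[-1]
    span = t_last - t_first
    if span == 0:
        raise ValueError("fixes must cover a non-zero time span")
    # Average velocity per axis over the whole observed track.
    velocity = [(b - a) / span for a, b in zip(p_first, p_last)]
    lead = t_next - t_last
    return tuple(p + v * lead for p, v in zip(p_last, velocity))


if __name__ == "__main__":
    track = [(0, 0, 0, 1000), (1, 100, 50, 1000), (2, 200, 100, 1000)]
    print(extrapolate(track, 3))
```

The predictor hits the literal objective, the extrapolated point, whether or not the pilot cooperates by flying straight.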

One of the reasons I pulled up the anti-aircraft origin of cybernetics is that robotic anti-aircraft guns killed a bunch of their operators in a tragic incident few people seem to talk about as a point along the robotic death timeline.

Source: My ISACA 2019 Presentation

Reward hacking existed long before there was a reward function. Social scientists were talking about it in the 1970s (Goodhart, 1975; Campbell, 1976), giving me the impression I had entered the AI hacking world late in the 1980s. It was already established that a measure a machine optimizes as a target stops tracking what it was meant to capture.

Since I studied history, I should also give a nod to the colonial administrators who learned the same law. The named parable is the cobra effect in Delhi, a story with thin evidence behind it: paid per cobra, the tale goes, people bred cobras. The documented case is the rat-tail bounty in Hanoi: paid per rodent tail, on the record, people farmed rats and released the tail-less to breed even more.

Seems common sense, right? Seems so obvious that any self-described AI company would from day one be working hard to prevent cobra and rat explosions. And yet, we seem to be experiencing the repeat of these horrible errors in judgment. When the optimizer always finds the gap between the proxy and the goal, you should not be allowed to act surprised even if you try to claim ignorance of everything that has ever happened before you woke up this morning. It’s basic logic even more than evidence.

In the early 1990s, Karl Sims’ evolved creatures (SIGGRAPH 1994) were exploiting bugs in the physics simulator to extract free energy and move in ways no body could. Adrian Thompson’s evolved FPGA at Sussex in 1996 discriminated tones using logic cells that were physically disconnected from the circuit, exploiting analog electromagnetic coupling the designer never put there. Lehman, Clune and dozens of co-authors later collected the whole zoo in “The Surprising Creativity of Digital Evolution”, where agents won tic-tac-toe by forcing the opponent to allocate impossible memory (an infinitely distant position on the board) and crash. Creatures penalized for certain forms of walking flipped themselves upside down so they never put a foot down.

Perhaps my favorite of all time was the virtual pancake flipping game.

The robot that was told it would be penalized when a pancake fell on the ground, flipped them so high they either went into space orbit or burned up on re-entry. That maximized time off the floor, while everyone in the game starved to death. Success!
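The gap between the proxy and the goal fits in a few lines. This is a toy sketch of my own, not the original experiment: the reward functions, the launch-speed grid, and constants such as `MAX_SAFE_SPEED` are all hypothetical. The proxy pays for seconds off the floor, the real goal is a pancake that lands back in the pan.

```python
G = 9.81                # gravity, m/s^2
EPISODE_SECONDS = 10.0  # hypothetical episode length
MAX_SAFE_SPEED = 5.0    # hypothetical: faster launches never return to the pan


def proxy_reward(launch_speed: float) -> float:
    """What the designer measured: seconds the pancake stays off the floor."""
    return min(2.0 * launch_speed / G, EPISODE_SECONDS)


def true_reward(launch_speed: float) -> float:
    """What the designer wanted: 1.0 if the pancake lands back in the pan."""
    return 1.0 if 0.0 < launch_speed <= MAX_SAFE_SPEED else 0.0


# Brute-force "optimizer" over candidate launch speeds from 0.5 to 100 m/s.
candidates = [0.5 * i for i in range(1, 201)]
best_for_proxy = max(candidates, key=proxy_reward)
best_for_goal = max(candidates, key=true_reward)

if __name__ == "__main__":
    print("proxy optimum:", best_for_proxy, "true reward:", true_reward(best_for_proxy))
    print("goal optimum: ", best_for_goal, "proxy reward:", proxy_reward(best_for_goal))
```

The optimizer that maximizes the measured number picks a launch that scores zero on the thing anyone actually wanted, with no malice and no novelty required.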

We used to call this failure.

Somehow in 2016 it stopped being funny when Elon Musk announced every Tesla shipped with full-self-driving hardware and sold autonomy as a solved problem that would make everyone safer. Instead, Tesla has been running the highest fatal-crash rate of any car brand (5.6 deaths per billion miles against a 2.8 average). He cheated the rating, not death. Success! And just look at how rich he became from people measuring his statements about safety, instead of the death tolls.

The clear danger of AI failures was cynically spun into corporate murders and… strangely, he said we weren’t allowed to talk about it anymore, while exactly nobody from Tesla went to jail.

Source: My presentation at MindTheSec 2021

As an aging hacker who has studied the whole history of the craft since childhood, I’ll say it plainly. Specification gaming by a 2026 frontier model is the oldest behavior there is, in both machines and in people.

Here is what Aristotelis Tzafalias shows as a better path forward, calling out the exact evidence that would prove a genuinely new capability. He runs it against the labs’ own system-card numbers and finds things are getting faster and cheaper with automation, as expected. Nothing surprising on an independent test. That is what vendors don’t like because it inoculates against attention-seeking hype. Commit to what would change your mind before you read the results, and you should find that all the manufactured hype is gone.

The lack of independence in these assessments of LLMs is the biggest problem in our industry today when it comes to preparing budgets for risk. No assessment without independence should circulate as anything but a marketing and sales brochure, declared as a conflict of interest.

Stalin, Hitler or Musk: Who Killed More?

Some historians were sitting around a table asking “who killed more people, Stalin or Elon Musk?”

This has been a topic ever since “DOGE” announced their campaign to destroy USAID as Musk’s revenge for ending his future spot in apartheid. Not to mention Musk also said genocide isn’t the fault of the genocidal leader.

The consensus was that Musk killed more and, to be sure, here’s the proof:

| Rank | Person | Death toll | Source |
|---|---|---|---|
| 1 | Mao Zedong | 30-45 million (Great Leap famine, 1959-61) | Yang Jisheng, Tombstone (2008): 36 million. Frank Dikötter, Mao’s Great Famine (2010): 45 million. |
| 2 | Genghis Khan | ~40 million (high uncertainty) | Colin McEvedy & Richard Jones, Atlas of World Population History (1978), origin of the figure, since disputed. Matthew White, The Great Big Book of Horrible Things (2011): ~37.5 million. |
| 3 | Hong Xiuquan | 20-30 million (Taiping Rebellion, 1850-64) | Jonathan Spence, God’s Chinese Son (1996). Stephen Platt, Autumn in the Heavenly Kingdom (2012). Most estimates 20-30 million; some run far higher. |
| 4 | Adolf Hitler | 11-21 million (deliberate killing to total democide); 6 million Jews | Timothy Snyder, Bloodlands (2010): 10.4 million deliberate killing. R.J. Rummel, Death by Government (1994): 20,946,000. Six million Jews: US Holocaust Memorial Museum. |
| 5 | Tamerlane | ~17 million (high uncertainty) | Justin Marozzi, Tamerlane: Sword of Islam, Conqueror of the World (2004). The ~5 percent of world population figure is journalistic, loosely sourced like Genghis Khan. |
| 6 | Elon Musk | 14 million projected by 2030 (interval 8.5-19.7 million) | Cavalcanti et al., The Lancet 406:283-294 (2025), attributing the projection to USAID defunding. |
| 7 | Joseph Stalin | 6-20 million, by method | Timothy Snyder, Bloodlands (2010): ~6 million deliberate. Steven Rosefielde, Communist and Post-Communist Studies 30(3) (1997): best estimate ~10 million. Robert Conquest and Roy Medvedev: ~20 million. |
| 8 | Chiang Kai-shek | ~10 million (KMT democide, 1928-49) | R.J. Rummel, Death by Government (1994): 10,075,000. |
| 9 | Leopold II | ~10 million population decline (range 1-15 million) | Adam Hochschild, King Leopold’s Ghost (1998): ~10 million. Jan Vansina: lower, disputing extrapolation from the rubber provinces. |
| 10 | Hideki Tojo | ~6 million (Imperial Japan democide, 1936-45); his premiership 1941-44 is a subset | R.J. Rummel, Death by Government (1994): 5,964,000. |
| 11 | Winston Churchill | ~3 million (Bengal famine, 1943); attribution contested | Madhusree Mukerjee, Churchill’s Secret War (2010): 3 million. Lizzie Collingham, The Taste of War (2011): policy and animosity decisive. Amartya Sen and Andrew Roberts dissent on personal blame. |

Steal the Goose, Go to Jail. Steal the Goose Concept, Start a Corporation.

An old English protest verse exposes the unfair asymmetry of “Enclosure” laws by describing a goose.

They hang the man and flog the woman
That steal the goose from off the common,
But let the greater villain loose
That steals the common from the goose.

The law demands that we atone
When we take things we do not own,
But leaves the lords and ladies fine
Who take things that are yours and mine.

The person who takes a goose meets the full weight of the criminal law. The person who takes the common on which the goose was fed receives an Act of Parliament for the trouble. Petty theft is a hanging offense, while grand theft is a civic act.

The lines are anonymous, probably by design to protect those who recognize the meaning. They date from the enclosure era and were first printed in The Tickler in 1821.

The target of the rhyme is the philosopher Locke. His Second Treatise grounds property in labor: a man acquires a parcel when his work on the common stock is recognized as making it his. Enclosure reversed the rights. The labor that converted a common right into a private title was simply the drafting of a statute, while the men who performed the labor saw their result called someone else’s property.

The same integrity challenge, in the same decades, was the abolitionist debate on slavery. Somerset secured his freedom from slavery in 1772, and then Parliament abolished the trade in 1807 and the institution itself in 1833. That was in the UK. America did the opposite. The Somerset ruling of 1772 and Dunmore’s promise of freedom in 1775 turned the slavery-promoting southern colonies toward radical militant resistance to freedom under the crown. An American federal ban on slave imports took effect in 1808, which meant state-sanctioned domestic rape, with rapid human offspring treated as a property boom. In December 1835 President Jackson asked Congress to inspect mail to protect “property” by censoring abolitionist publications. When the bill failed, his postmasters suppressed thought regardless, and mobs were set up to torture and kill Americans caught with abolitionist content. Lovejoy was shot to death in 1837 while defending his fourth printing press from being destroyed.

Both abolition and enclosure shared a mechanism. The law decided what may be owned and therefore what would count as theft. Property in persons was being ended, with a Civil War even, yet it was being taken up in the commons. Human ownership was fought at high expense out of existence, while another ownership was being simply legislated into it.

The radical tradition understood. Thomas Spence built his programme on the theft of the common, and Marx would later file enclosure under primitive accumulation, the system’s founding expropriation conducted as if just law. The anonymous poem had offered the same conclusion a century earlier, and with greater economy.

Theft was, and still is, defined by who is authorized to hold the pen that writes the law. Enclosure is an old term now, barely recognized. Today it most often means elites filing a patent, or scraping data. In other words, AI.

Supreme Court: Bayer’s Monsanto Capture of EPA Shields Mass Harm

Cuyahoga River burns in 1952. At least twenty years of disasters were suffered before the American government created an agency to find fault.

The early 1970s had some logic. Rivers were burning, air was unbreathable, pesticides like DDT were moving through whole ecosystems. Meanwhile, American tort law crawled case by case. Common-law suits could not regulate the rampant abuse of the public by a continental chemical economy, let alone a foreign one. An agency, however, could set standards before harm. That was the promise of the EPA when it landed: expert protection at scale, faster and broader than a jury in one county.

EPA was built to deliver a remedy the courts were too slow and scattered to provide.

Today, the EPA has been inverted into a blocking function.

Bayer has spent the last decade fighting more than 100,000 lawsuits filed by people who developed non-Hodgkin lymphoma they blamed on exposure to the glyphosate weedkillers, and the company has paid out billions of dollars in jury awards and settlements. All of the cases include allegations that the company failed to warn that glyphosate could cause cancer.

Bayer maintains that its products don’t cause cancer, and also asserts that under the Fifra the EPA is the key authority for determining if its product necessitated a cancer warning. The EPA has not required such a warning and has taken the position that glyphosate is “unlikely” to be carcinogenic, so the company cannot be held liable for failing to warn, according to Bayer’s argument.

In the Thursday ruling, the supreme court upheld this argument.

Two shifts happened since President Nixon to enable this stupidity.

First, the industry aggressively worked to occupy the agency, staffing it, funding the science it reviews, setting the terms of what counts as proof, completely breaching independence and integrity.

Second, the court practiced a binary judgment, where passive agency absence of action was ruled as active agency judgment against action. When the EPA has not yet acted, the preemption doctrine reads that the silence is a considered federal decision and displaces every active state remedy that would disagree with passivity.

Perhaps states that fight the Trump centralization regime should be called the Free States.

The flawed premise is that a single federal warning flag suppressed should shut down all other warning flags, which makes the federal flag the easy target for corporate capture and suppression. This court has ratified this vulnerability explicitly, which makes these judges complicit in the preventable mass harm that follows from the corporate capture of agency.

Judges made a choice to enable public harm. They had the evidence, the jurisdiction, the dissent in front of them, and they chose preemption to increase preventable suffering and deaths. That is an act, on the record, with names. Kavanaugh wrote it, and six others signed. Elie Wiesel indicted them all decades ago. Silence is the choice, documented, by people with the power to rule otherwise. Complicity attaches to the actor who chose death for profit.

While a state court sided with Durnell and awarded him more than $1 million in damages, Monsanto—now Bayer—appealed the ruling to the Supreme Court, arguing that federal law should override state law. The Supreme Court agreed…. Shares of Bayer jumped by more than 16% after the court’s ruling came out Thursday morning.

The dissent was Jackson and Gorsuch, who not only said the majority misread FIFRA but also argued Monsanto could comply with both federal and state law by ending Roundup sales. A simple compliance path existed. There was never an impossibility to claim.

Cipollone in 1992 cut tobacco claims on a federal labeling statute. Riegel in 2008 turned on a different mechanism, the FDA’s own premarket approval. Congress wrote the cigarette warning into statute. The FDA granted the device its approval. The EPA withheld the glyphosate warning. Three federal moves created corporate immunity from documented harm. The shield for profit on suffering was widened with each case. It once required an express command. Now it just takes an agency to do nothing, which means America runs fail-unsafe.

Corporations cause mass suffering on the principle that an agency hasn’t made a warning. It’s like requiring deny lists, instead of allow lists, for things that cause the most harms in history. The court treats absence of a warning as an explicit federal command that no state may evaluate no matter how overwhelming the evidence of failure. In February 2026 Monsanto announced a proposed nationwide class settlement for Roundup non-Hodgkin lymphoma claims, which it described as one element of a multi-pronged strategy to suppress claims against it.

A captured process that legalizes death caused by its captors invokes some other history of suppressed chemical warnings, involving the same company that is in the courts today. Bayer was a founding member of IG Farben, the chemical combine that produced poison gas and supplied the Zyklon B delivered in “Red Cross” vehicles to be used in the death camp “showers”.

“The crematorium is a big building with a wide chimney and 15 ovens. Under a garden there are two enormous cellars. One is where people undress and the other is the death chamber. People enter it naked and once about 3,000 are inside it is locked and they are gassed. After six or seven minutes of suffering they die,” he wrote.

He described how the Germans had installed pipes to make the gas chamber look like a shower room.

“The gas canisters were always delivered in a German Red Cross vehicle with two SS men. They then dropped the gas through openings – and half an hour later our work began. We dragged the bodies of those innocent women and children to the lift, which took them to the ovens.”

The Nazi victims never saw a warning label, by design, and neither do the Americans suffering from German chemicals killing them today.

Zyklon B canister. The same hydrogen-cyanide fumigant was used on Mexican border crossers at the El Paso delousing plants from 1917, under Woodrow Wilson, who ran on “America First.” The chemist Gerhard Peters recommended Zyklon B for the camps’ disinfection chambers and illustrated his case with photographs of the El Paso delousing chambers. Hitler, who admired American race laws and based “Lebensraum” genocide on U.S. “westward expansion”, named his command train “Amerika.” A camp within Auschwitz was called Mexico. IG Farben held 42.5 percent of Degesch, the distributor.

The Allies broke IG Farben apart after the war. The German company Bayer was refounded in 1951 and bought Monsanto in 2018.