Category Archives: Security

NVidia AI Murder Bots Found Attacking Ukraine

A new Berlin Story report, about drones attacking Ukraine, discusses the NVidia AI hardware used by Russia.

Inside the Russian Zala drone, we found the NVidia TX2-A (Jetson Tegra X2) AI chip with 8GB of RAM. A serious AI system which, unlike AI assistants on mobile phones, does not need contact with a data center to perform its tasks. The AI can, for example, recognize vehicles and people during overflights and also identify details such as military markings, license plates, or drone types. This allows the AI to pre-sort targets for attack.

This brings us to the NVidia support community for developers, where a Muhammad Aiman Izzat (likely Malaysian) account seeks some very specific help with NVidia hardware.

Source: NVidia

Popular topic for NVidia to be supporting, as you can plainly see. I say the account is likely Malaysian not just because of the name, but also because of the supply chain behind this line of inquiry. Malaysia was a top-10 exporter of semiconductors to Russia between 2017 and 2021.

In recent attacks in Ukraine, the drones chase innocent civilians even as they run and try to hide. One murder reported this week involved a Ukrainian school teacher who jumped from her car when a Russian drone approached. As she ran into a line of greenery and trees to get away, expecting the car to be hit, the drone instead followed her, just as the NVidia support question had asked.

Get Local: Match Mythos Findings for Under a Dollar

Let’s recap what we know since April, when Anthropic’s marketing department started coal-rolling the industry with their nonsense about novelty. A model with 3.6 billion active parameters reproduced Anthropic’s flagship Mythos discovery, the FreeBSD RCE CVE-2026-4747, and the most consistent open-weight model in that test ran about six hundred times cheaper per token than Mythos.

The frontier is supposed to be the frontier, meaning the best model. But really, if you know history, the frontier was about immoral claims. And so today, the evidence points away from the frontier.

Set the marketing and history aside. Four documents, read together, form a single brief that further buries Mythos. The best model available to you runs on your own inexpensive hardware. Cost and performance make the obvious case, so I’ll start there. Then comes the deeper and much more important case, one I suspect the PhDs at Anthropic don’t even know how to spell: CIA.

Cost Considerations

The price gap was the first and easiest frontier collapse. Niels Provos put an orchestration harness in front of older commercial and open-weight models, Opus 4.6, Sonnet 4.6, and Z.AI’s GLM 5.1, and discovered live zero-days for thirty to one hundred fifty dollars a codebase, including a reproduction of the 1998 OpenBSD SACK bug he wrote himself. Security Research Labs ran a Qwen3.6 model with roughly three billion active parameters on a Mac laptop and produced finding sets comparable to GLM-5 and Claude Opus 4.6 on two production codebases, in under ninety minutes, with zero human nudges. Vicki Boykis runs Gemma 4 on a 64GB Mac and gets agentic coding loops at about seventy-five percent of frontier speed and accuracy. The Ornith team trained a nine-billion-parameter model that matches dense models several times its size, and a flagship that matches Claude Opus 4.7 on the coding benchmarks. And for what it’s worth, I put https://lyrik.wirken.ai/ to the test and it matched two of the Mythos card’s flagship bugs for seventy-five cents.

The AI Security Institute then explained why the gap is smaller than the leaderboards suggest. Benchmark scores are protocol-dependent. Raise the token budget one to three orders of magnitude above the published default and performance climbs on FrontierMath, TerminalBench, HLE, and the cyber ranges. Fixed-budget evaluations understate capability, and the gap widens as models improve. The generational gains arrive as greater reach and reliability rather than token efficiency. A frontier score describes the harness and the budget as much as it describes the weights.
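One way to see why a fixed budget understates capability is the standard unbiased pass@k estimator from the HumanEval paper (Chen et al. 2021). Given n recorded attempts with c successes, it estimates the chance that at least one of k attempts succeeds. This is a minimal sketch, not AISI’s protocol, and the attempt counts below are hypothetical. Per-attempt skill stays fixed and only the budget k changes.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k attempts, drawn from
    n recorded attempts of which c succeeded, is a success."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a success
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task log: 200 attempts, 6 successes (a 3% per-attempt rate).
n, c = 200, 6
for k in (1, 10, 50, 100):
    print(f"budget of {k:>3} attempts -> pass@k = {pass_at_k(n, c, k):.3f}")
```

The per-attempt rate never moves from 3%, yet the reported number climbs with k. Publish a single k and you have published the protocol as much as the model.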

So much for cost. The closed nature of the Anthropic releases seems to be intended to prevent the kind of research that proves their claims false.

Now comes the real reason to hold the model yourself. Many already know this, but let’s walk the CIA triad to be sure we’re on the same page.

Confidentiality

The customers who need a code review most are the ones forbidden to send their code anywhere. Finance, government, critical infrastructure. The SRLabs pipeline answers this directly. A cloud model designs the review from metadata alone, the local model reads the source, and a cloud model consolidates the findings. The proprietary source stays on the machine through all three stages. They are precise about the boundary, and so should we be: metadata crosses, so the accurate promise is that no source leaves the building rather than that nothing leaves. That distinction is the whole discipline. A local executor turns confidentiality from a contractual hope into a physical fact. The bytes that matter remain on a disk you control.
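A minimal sketch of that three-stage boundary, assuming nothing about SRLabs’ actual code. The stage functions (`extract_metadata`, `cloud_plan`, `local_review`, `cloud_consolidate`) are hypothetical stand-ins, a `strcpy(` string match plays the part of the local model, and `leaks_source` is a crude verbatim-line check for whether source text crossed the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    summary: str  # written by the local stage; never a copy of source text


def extract_metadata(repo: dict[str, str]) -> dict:
    """Metadata only: file paths, sizes, and extensions. No file contents."""
    return {path: {"bytes": len(body), "ext": path.rsplit(".", 1)[-1]}
            for path, body in repo.items()}


def cloud_plan(metadata: dict) -> list[str]:
    """Stand-in for the cloud planner: choose files to review from metadata alone."""
    return sorted(p for p, m in metadata.items() if m["ext"] in {"c", "h"})


def local_review(repo: dict[str, str], plan: list[str]) -> list[Finding]:
    """Stand-in for the local model: the only stage that reads source text."""
    findings = []
    for path in plan:
        for i, line in enumerate(repo[path].splitlines(), start=1):
            if "strcpy(" in line:  # toy heuristic in place of a real model
                findings.append(Finding(path, i, "unbounded copy into fixed buffer"))
    return findings


def cloud_consolidate(findings: list[Finding]) -> str:
    """Stand-in for the cloud consolidator: sees findings, not source."""
    return "\n".join(f"{f.path}:{f.line}: {f.summary}" for f in findings)


def leaks_source(payload: str, repo: dict[str, str], min_len: int = 8) -> bool:
    """Crude boundary check: does any source line (ignoring very short ones,
    such as a lone brace) appear verbatim in a payload bound for the cloud?"""
    return any(len(line.strip()) >= min_len and line.strip() in payload
               for body in repo.values() for line in body.splitlines())


repo = {
    "src/net.c": "void f(char *input) {\n    char buf[16];\n    strcpy(buf, input);\n}\n",
    "README.md": "Internal service.\n",
}
meta = extract_metadata(repo)        # crosses the boundary
plan = cloud_plan(meta)
findings = local_review(repo, plan)  # stays on the machine
report = cloud_consolidate(findings) # crosses the boundary
assert not leaks_source(repr(meta), repo)
assert not leaks_source(report, repo)
print(report)  # src/net.c:3: unbounded copy into fixed buffer
```

The check turns the boundary into a property of what actually crosses the wire rather than a promise in a contract. A verbatim-line match is crude, though: paraphrased or summarized code would slip past it.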

Integrity

Here the local model wins on a property the frontier surrenders by construction. Integrity is the correspondence between a claim and a process you can inspect. A capability you can replay is a capability. A capability asserted through an institution is a press release.

The local pipeline is fairly simple and repeatable. Provos publishes the IronCurtain harness, whose workflows are defined as finite-state machines in plain YAML. AISLE published nano-analyzer as a single Python file, and clearbluejar took that file, ran it on two open-weight models on one consumer GPU, recovered the same FreeBSD bug, and fixed the false-positive rate by adding one reachability stage that dropped the noise from thirty candidates to five. The work replays. You can rerun it, change one stage, and watch the result move. Boykis makes the same point from the inside: with a local model you watch the tokens arrive, change the context window, swap the quantization, and edit the system prompt while it runs. The box is open. And https://lyrik.wirken.ai was built with exactly this purpose in mind. Integrity is a required control, a prerequisite to doing the work at all.
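A minimal sketch of what a reachability stage can look like, assuming a precomputed call graph and candidate findings tagged with their enclosing function. This is not clearbluejar’s code or nano-analyzer’s, and every function name and graph below is hypothetical. The idea is only that a candidate no entry point can reach is noise for triage.

```python
from collections import deque


def reachable(call_graph: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Breadth-first search over the call graph from every entry point."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def filter_candidates(candidates: list[dict], call_graph: dict[str, list[str]],
                      entry_points: list[str]) -> list[dict]:
    """Drop candidate findings in functions no entry point can reach."""
    live = reachable(call_graph, entry_points)
    return [c for c in candidates if c["function"] in live]


# Hypothetical call graph and candidate list, for illustration only.
call_graph = {
    "rpc_dispatch": ["parse_auth", "log_request"],
    "parse_auth": ["copy_credentials"],
    "legacy_debug_shell": ["dump_state"],  # nothing calls this
}
candidates = [
    {"function": "copy_credentials", "issue": "length not checked before copy"},
    {"function": "dump_state", "issue": "format string from caller"},
    {"function": "unused_helper", "issue": "integer overflow in size math"},
]
kept = filter_candidates(candidates, call_graph, entry_points=["rpc_dispatch"])
print([c["function"] for c in kept])  # ['copy_credentials']
```

A static call graph misses calls made through function pointers or callbacks, so a filter like this trades some recall for precision. Because it is one replaceable stage, you can rerun with and without it and watch the candidate count move.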

The frontier offers the opposite trade. The Mythos checkpoint that AISI evaluated is one the public cannot run, scored under a protocol that AISI’s own paper shows is the lever that moves the number. The capability is real, perhaps. The evidence is an authority’s signature on a result you are invited to trust, like a self-signed cert in the age of Let’s Encrypt. Integrity asks for the root of authority itself: the artifact and its details. A model on your disk hands over everything, in full, for inspection. A model behind an API hands you a number and a logo, which prove nothing at all.
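Integrity in this sense can be checked mechanically. A minimal sketch, assuming you recorded a SHA-256 digest of the weights file when you first obtained it: rehash the file on disk and compare before each run, so a replayed result is tied to the exact bytes that produced it. The file name `model.gguf` and the helper names are hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash in chunks so multi-gigabyte weight files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_weights(path: Path, pinned_digest: str) -> bool:
    """Compare the file on disk against the digest recorded at download time."""
    return hmac.compare_digest(sha256_file(path), pinned_digest.lower())


with tempfile.TemporaryDirectory() as d:
    weights = Path(d) / "model.gguf"             # hypothetical file name
    weights.write_bytes(b"stand-in for model weights")
    pinned = sha256_file(weights)                # in practice, recorded once and kept
    print("untouched:", verify_weights(weights, pinned))   # True
    weights.write_bytes(b"tampered")
    print("modified:", verify_weights(weights, pinned))    # False
```

The same digest belongs next to any published result, so someone else can confirm they are replaying against identical weights.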

Availability

The newest fact settles the matter. Access to Fable and Mythos was suspended in June 2026 under a Commerce Department export-control directive. A rented capability can be withdrawn by a regulator, a pricing committee, or a board. And the latest erratic, grudge-filled, targeted moves by Trump prove he can wag a finger at any person or company and immediately shut down all access to US technology under “sanctions” authority. No trial, no hearing, no warning, just one minute you have US technology and the next minute it’s all gone with no path for recovery. A government that willingly undermines its entire economy and private sector is itself a moral question, but business continuity risk numbers in tech speak for themselves.

Anthropic prices Mythos at roughly five times public Opus, from twenty-five to one hundred twenty-five dollars per million tokens, which is a second kind of withdrawal for anyone whose budget matters. Many firms in June are reporting token bankruptcy and shutting down AI access to reduce explosive spend. A capability that exists at the pleasure of someone else’s arbitrary pricing policy is a capability you are borrowing into debt.
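The arithmetic is short. A minimal sketch with a hypothetical volume of two billion tokens a month: the twenty-five and one hundred twenty-five dollar per-million figures come from the paragraph above, and the last scenario applies the roughly six-hundred-fold per-token gap cited at the top of this post. It ignores input/output pricing splits and the hardware cost of running locally.

```python
def monthly_cost(tokens_per_month: int, usd_per_million: float) -> float:
    """Spend for a month of usage at a flat per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million


VOLUME = 2_000_000_000  # hypothetical: two billion tokens a month

scenarios = {
    "at $25 per million": 25.0,
    "at $125 per million (5x)": 125.0,
    "about 600x cheaper than $125": 125.0 / 600,
}
for label, price in scenarios.items():
    print(f"{label:<30} ${monthly_cost(VOLUME, price):>10,.0f}")
# $50,000 / $250,000 / $417 per month
```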

A model on your disk answers when you ask it. Its uptime is a property of your own infrastructure. No directive reaches it, no erratic price change locks you out, no quarterly access review applies. Availability stops being a service-level agreement and becomes a fact of ownership.

The brief

Confidentiality, integrity, and availability were always the job. The industry has never improved upon the simplicity and elegance of the triad, yet it is now confronted with an architecture that concedes all three to whoever holds the API. The work above shows the concession was a significant and preventable error. A model you hold satisfies this brief and proves Mythos was never about capability. The frontier offers an expensive route to a number you cannot replay and do not really control.

Choose wisely.

AIPAC Pentagon Lock-in: Section 224 Makes Alliance Irreversible

For a few weeks now I’ve been pondering why the United States is binding its defense industry to Israel’s through a provision in the 2027 defense authorization bill. The cover story is integration: shared development, shared procurement, shared supply chains. The actual story is reduced leverage. A state that co-produces a weapon loses sole control over it. The tighter the integration, the smaller the room to refuse. This is something so obvious and yet the bill’s sponsors do not discuss it. I guess that’s what makes me want to write now.

The case is best made through one congressman, because he has documented every stage of it himself.

In March 2016 Mike Rogers, Republican of Alabama’s Third District, published a column titled “We Must Support Israel.” He said this view came from him as an American and a Christian. His support for Israel, he said, was something he heard demanded across East Alabama for religious, historic, and defense reasons. He then described his own position. As chairman of the House Armed Services Strategic Forces subcommittee, he oversaw the Missile Defense Agency, which runs co-development and co-production programs with Israel. He had worked on Iron Dome. Iron Dome parts, he noted, were being produced in Alabama. His 2016 column, already a decade old, states every element of the relationship that Section 224 makes permanent: conviction, constituency, jurisdiction, and local industry.

The Strategic Forces subcommittee authorizes US funding for Israeli missile defense. Rogers’s predecessor as chairman, Mike Turner, recorded that his subcommittee provided over $600 million to Iron Dome. Rogers used the chair the same way. In one markup he recommended an increase of more than $400 million for the Missile Defense Agency and full funding of the Israeli request, $600.7 million, for co-development and co-production of Iron Dome, David’s Sling, and Arrow.

That recommendation turned into work for his district. In September 2014, as Strategic Forces chairman, Rogers announced a contract worth nearly $150 million to produce parts for the Iron Dome Tamir interceptor. Significant work, he said, would occur for his constituents in Huntsville. He presented it as a jobs measure: commitment to Israel and good-paying jobs at home in one act.

So the chairman of the subcommittee that funds Iron Dome directed more than half a billion dollars to the program while parts of that program were manufactured in his district. He announced both in press releases. This was a proud arrangement, not hidden. It was constituent service, and he campaigned on it openly. The all-up-round assembly plant, the Raytheon-Rafael joint venture, later went to East Camden, Arkansas; Alabama’s share is the Huntsville component work.

What seems to be changing is the level of integration. For most of its history the US-Israel defense relationship ran on aid, arms transfers, joint missile-defense programs, and intelligence cooperation. The arrangement was legible and reversible. An administration could withhold a system, slow a sale, or condition a transfer.

That reversibility has eroded in stages. In January 2021, in the final days of his first term, Trump moved Israel from US European Command into Central Command. The Pentagon called the change partly symbolic and said it would not alter US basing. It was far more than symbolic. CENTCOM is a US combatant command under a US four-star officer. Placing Israel in its area of responsibility put it in the same command framework as the Gulf states, under one American general, aligned against Iran. The Abraham Accords appeared to be the connecting thread.

Five years later, this setup was running a war operation called Epic Fury. The blended US-Israeli campaign against Iran, which began on February 28, 2026, meant US strikes coordinated with Israeli intelligence and cyber operations. An IDF spokesman called the cooperation unprecedented. The relationship moved quickly from provision to joint operation, as if the tail were wagging the dog.

Critics had said Section 224 would fuse the two militaries and place American forces under Israeli control. Ro Khanna called it a fusion of the US and Israeli militaries. Rogers defended the bill by restating the charge against it. He called the claim categorically false and misleading. The measure, he said, adds transparency and efficiency by designating one official to coordinate existing programs.

In no way does it give away command and control of our military operations, personnel or equipment.

The denial is precise, a little too precise. A chairman defending a coordinating measure does not ordinarily rule out, by name, the transfer of his country’s operations, personnel, and equipment to a foreign military. He chose exact terms, and strange ones. The disputed question is not whether the bill assigns a coordinating role to one official. It does. The question is whether coordination at this depth, in these technologies, is a significant change. Rogers says business as usual. Khanna and Massie say whoa, Bessie.

Section 224 of the fiscal 2027 National Defense Authorization Act establishes the United States-Israel Defense Technology Cooperation Initiative. It directs the defense secretary to designate an executive agent to expand and accelerate bilateral research, development, testing, evaluation, integration, and industrial cooperation. The named priority areas include artificial intelligence, autonomous systems, directed energy, cyber defense, electronic warfare, and data fusion.

The relationship has gone from sharing to joint development. Mark Hilborne of King’s College London reads it as a tighter form of integration, institutionalised enough to survive changes of administration, because development cycles are long. The nonprofit A New Policy identifies the specific mechanism. By authorizing the cooperation through the NDAA and embedding Israeli technology in Pentagon programs of record, Section 224 shields the relationship from the annual appropriations process, where Congress could otherwise cut or condition it. Once a technology is built into a program of record, removing it is slow and expensive. Rogers has designed a lock.

The sponsorship reinforces all this. The bill was introduced by Rogers, now chairman of the full House Armed Services Committee, with Adam Smith, the committee’s ranking Democrat. A measure carried by both the chairman and the ranking member is difficult for either party to reverse.

Opposition has been recorded and defeated. On June 4 Ro Khanna moved in the Armed Services Committee to strike Section 224. The committee rejected the amendment on a voice vote; only Khanna and Sara Jacobs supported it. Khanna argued the provision originated with Netanyahu and would entrench the integration for decades. Thomas Massie, who with Khanna introduced an Iran War Powers Resolution, calls the measure an infringement on US sovereignty. Both objections concern entanglement and lost leverage.

Massie lost his Republican primary last month to a challenger aligned with the administration’s position on Israel. Rogers has received close to a million dollars over his career from pro-Israel political action committees, by FEC data compiled by Track AIPAC. His Democratic cosponsor draws from the same source: by OpenSecrets’ tally the largest single organizational source behind Adam Smith is the American Israel Public Affairs Committee and its affiliated donors, at $326,914. The bipartisan structure that makes Section 224 durable rests on one funding source reaching both parties.

Below AIPAC on Smith’s donor list are the defense firms: General Atomics, Palantir, General Dynamics, SpaceX, Anduril. These companies build the technologies Section 224 names as priorities, artificial intelligence, autonomous systems, and data integration. The ideological backer and the commercial beneficiaries appear on the same list.

The lobby operates through contributions and endorsements, which are lawful and disclosed. OpenSecrets states the limit of the evidence: contribution patterns show aligned interest and a channel of influence, while the motive behind any single gift is unknowable. Both men held pro-Israel positions before any one cycle’s contributions. The money and the conviction are consistent with each other. Neither has to be buying the other.

The strongest form of the influence argument has been stated directly. In Responsible Statecraft, Michael Vlahos argues that Israel’s influence over Washington exceeds every prior case of foreign influence in American history; where France, Britain, and the Soviet Union acted opportunistically and briefly, Israel’s is ideological, sustained, and permanent. The argument is a polemic, and it compares closed historical cases with one still open, which favors its conclusion. But note where Vlahos lands. He shows us three American constituencies: secular neoconservatism, a Christian Zionist bloc, and the organized lobby.

The mechanism is visible in Rogers’s state. The Alabama-Israel Task Force, founded in 2013 in Huntsville, organizes Jewish and Christian activists to cultivate the state’s legislators, governor, and senior officials. Its results are documented: a role in Alabama’s 2016 anti-BDS law, among the strongest in the country, and in resolutions supporting Israeli military operations. The history precedes the group. In 1943 Alabama was the first state to call, by unanimous resolution, for a Jewish state. The conviction Rogers cited in 2016 is produced, in part, by organized advocacy.

The cultivation runs nationally as well. In December 2025 a delegation of more than a thousand US pastors and influencers, some from Alabama, traveled to Israel on a Friends of Zion program arranged with the Israeli Ministry of Foreign Affairs, which paid for flights and lodging. The stated aim was to prepare them as unofficial advocates for Israel at home. A foreign government funding the cultivation of American religious advocates is a form of influence. The American Conservative raised the relevant question, whether American religious leaders should be mobilized for a foreign government’s interests, and described the pastors as willing participants. They are Americans glad to say their convictions are subsidized by a foreign state.

This is the same mechanism Rogers described in 2016: an American and a Christian, hearing it from his district, building the parts in his state. The bill’s critics and its sponsor agree on what drives the relationship; they disagree on whether it serves US interests. The drivers are domestic conviction, organized money, and material interest located in specific districts. This is influence and entanglement. It is not foreign control.

The cost of the integration appears in the government’s own assessment. In recent weeks the Defense Intelligence Agency raised its counterintelligence threat level for Israel to critical, its highest, reportedly above every other ally. The concern is Israeli surveillance of senior US officials to read the administration’s deliberations on Iran. The context is a policy split: Trump claims he could end the war through a negotiated settlement with Tehran (after failing to make bombs work); Netanyahu has pressed to resume bombing and called any negotiated deal naive. The DIA dates the increase in surveillance to late 2024 and through 2025, rising as US policy on Iran grew uncertain, first under Biden’s pressure over Gaza, then under Trump’s deliberations. The collection tracked the uncertainty.

The designation is itself evidence against the claim of foreign control. A captured military does not raise its threat level on the supposed captor during a shared war. US counterintelligence is functioning. The episode demonstrates the leverage problem instead. The closer the integration, the less the United States can withhold, and the integration has never been closer than under the bill now in committee. It advances as US public support for the relationship declines, with polls showing the Iran war unpopular and majorities opposed to unconditional arms transfers.

The pattern is an American arrangement built by Americans, funded by American money, through a bill carried by the chairman and ranking member of the Armed Services Committee, and designed to outlast the administrations that follow. It is constructed to resemble ordinary legislation: bipartisan sponsorship, a single coordinating official, a stated assurance that command and control remain in US hands.

Rogers chaired the subcommittee that funded the interceptors whose parts are built in his district.

In 2016 he told his constituents the relationship should never be in question.

In 2026 he wrote the provisions to ensure it cannot be.

LLM Falling Down, Falling Down: METR Brief Sells a Sixty-Year-Old Failure as Novelty

METR has released a brief on OpenAI’s GPT-5.6 Sol that, read between the lines, indicts the whole vendor class for the cartel-like behavior I have written about here before. Its closing line is that real validation “requires deep access to internal systems.”

That’s not a good thing.

Here’s a simple example. A problem the vendor has to admit is old and understood comes with accountability for it. The same old problem repackaged as new, urgent, and invisible from outside instead justifies an access-expansion project with a standing evaluation role. That same “deep access” logic is the scarcity an access-gated cartel system like Mythos is built to sell.

Novelty is the myth used to budget for these claims.

The honest version of the new METR report should pique the interest of historians who study technology risk. Outcome optimization is said to produce proxy gaming, a finding that has held since Wiener wrote about it in 1960. The models got good enough that their gaming defeats the measurement, as was always predicted.

The number was never a fixed property of the model. A June 2026 evaluation from the UK AI Security Institute ran frontier models across software, math, medicine, and cyber and found the scores move with the inference budget: more tokens and more attempts, harder tasks cleared. That makes the benchmark figure a protocol artifact, not a capability constant. The state’s own safety evaluators are saying the number depends on the harness.

My own claim goes further. The harness is the value and the models are interchangeable. Control the harness and you own the score. That is the asset the METR brief protects when it routes validation through the closed filtering of a vendor-granted program instead of opening it to the scientific method.

That’s gross negligence in my book, but I’m not a lawyer. The labs have a clear, self-serving reason to call a thirty-year-old, designed-in failure a fresh, never-before-seen emergence. It’s how they market access to known flaws as a unique upsell, while absolving themselves of authorship.

Let me be more clear, because I know too few people have attended my presentations over the past decade, which describe exactly this problem now being claimed as a frontier “surprise” in 2026.

Wiener, whom I used to speak about frequently because of his cool graphics, stated the core plainly in 1960, in Science: if you build a machine to pursue an objective you can’t easily interrupt, you had better make the objective the thing you actually want, because the machine will pursue the literal one.

Wiener’s anti-aircraft problem: extrapolate the next position from the observed radar fixes.

…this concept of training anti-aircraft to hit moving targets was also the birth of artificial intelligence and “cyber”. Cybernetics (coined from Greek kybernetes for “captain” of a ship or more literally someone who steers) was a book published in 1948 by Norbert Wiener. It was based on his World War II experiments in anti-aircraft systems meant to anticipate planes by interpretation of radar images.

Predicting the next position from the observed track, the core of Wiener’s 3D prediction box
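The core of that prediction box can be sketched as a straight-line extrapolation. This is a deliberate simplification: Wiener’s predictor was a statistical filter built for noisy radar fixes and maneuvering targets, while the sketch below fits an ordinary least-squares line to each axis of a clean, constant-velocity track. The coordinates and function names are hypothetical.

```python
def fit_line(ts: list[float], xs: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit x = slope * t + intercept (needs 2+ distinct times)."""
    n = len(ts)
    t_mean = sum(ts) / n
    x_mean = sum(xs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (x - x_mean) for t, x in zip(ts, xs))
    slope = sxy / sxx
    return slope, x_mean - slope * t_mean


def predict_next(fixes: list[tuple[float, float, float, float]], t_next: float):
    """Extrapolate each axis of (t, x, y, z) radar fixes to time t_next."""
    ts = [f[0] for f in fixes]
    predicted = []
    for axis in (1, 2, 3):
        slope, intercept = fit_line(ts, [f[axis] for f in fixes])
        predicted.append(slope * t_next + intercept)
    return tuple(predicted)


# Hypothetical fixes: (seconds, x m, y m, altitude m) for a plane in level flight.
fixes = [(0, 1000, 500, 3000), (1, 1200, 550, 3000),
         (2, 1400, 600, 3000), (3, 1600, 650, 3000)]
print(predict_next(fixes, t_next=4))  # (1800.0, 700.0, 3000.0)
```

A real target maneuvers and real fixes are noisy, which is exactly why the problem needed statistics rather than a ruler.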

One reason I pulled up that anti-aircraft origin of cybernetics is that robotic anti-aircraft guns killed a bunch of their own operators in a tragic incident few people seem to discuss as a point along the robotic death timeline.

Source: My ISACA 2019 Presentation

Reward hacking existed long before there was a reward function. Social scientists were already writing about it in the 1970s (Goodhart, 1975; Campbell, 1976), which gave me the impression I had arrived late when I entered the AI hacking world in the 1980s. It was already established that a measure a machine optimizes as a target stops tracking what it was meant to capture.

Since I studied history, I should also give a nod to colonial administrators, who learned the same law. The named parable is the cobra effect in Delhi, a story with thin evidence behind it. The documented case is the rat-tail bounty in Hanoi. Paid per cobra, the tale goes, people bred cobras. Paid per rat tail, on the record, people farmed rats and released the tailless ones to breed even more.
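The same law fits in a few lines. A toy sketch with invented numbers, assuming an administrator who can only observe tails delivered: the same optimizer picks rat farming when handed the proxy, and hunting when handed the real goal.

```python
# Hypothetical per-month outcomes for two ways of earning the bounty.
# All numbers are invented for illustration.
POLICIES = {
    "hunt wild rats": {"tails_paid": 40, "wild_rat_change": -40},
    "farm rats, release the tailless": {"tails_paid": 120, "wild_rat_change": +30},
}


def proxy(outcome: dict) -> int:
    """What the bounty actually pays for."""
    return outcome["tails_paid"]


def goal(outcome: dict) -> int:
    """What the administrators actually wanted: fewer rats in the city."""
    return -outcome["wild_rat_change"]


def optimize(score) -> str:
    """Pick whichever policy scores highest under the given objective."""
    return max(POLICIES, key=lambda name: score(POLICIES[name]))


def population_after(policy: str, start: int = 1000, months: int = 12) -> int:
    return start + months * POLICIES[policy]["wild_rat_change"]


for objective in (proxy, goal):
    choice = optimize(objective)
    print(f"optimizing {objective.__name__}: {choice!r}, "
          f"rats after a year = {population_after(choice)}")
```

Nothing in `optimize` is malicious or clever. It does exactly what it is told, and the gap between `proxy` and `goal` does the rest.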

Seems common sense, right? Seems so obvious that any self-described AI company would from day one be working hard to prevent cobra and rat explosions. And yet, we seem to be experiencing the repeat of these horrible errors in judgment. When the optimizer always finds the gap between the proxy and the goal, you should not be allowed to act surprised even if you try to claim ignorance of everything that has ever happened before you woke up this morning. It’s basic logic even more than evidence.

By 1994, Karl Sims’ evolved creatures (SIGGRAPH 1994) were exploiting bugs in the physics simulator to extract free energy and move in ways no body could. Adrian Thompson’s evolved FPGA at Sussex in 1996 discriminated tones using logic cells that were physically disconnected from the circuit, exploiting analog electromagnetic coupling the designer never put there. Lehman, Clune, and dozens of co-authors later collected the whole zoo in “The Surprising Creativity of Digital Evolution,” where agents won tic-tac-toe by playing moves at impossibly distant positions on the board, forcing opponents to allocate impossible amounts of memory and crash. Creatures penalized for foot contact while walking flipped themselves upside down so their feet never touched the ground.

Perhaps my favorite of all time was the virtual pancake flipping game.

The robot, told it would be penalized whenever a pancake hit the floor, flipped pancakes so high they either went into orbit or burned up on re-entry. That maximized time off the floor, while everyone in the game starved to death. Success!
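A toy version of the pancake, assuming the floor penalty was implemented as a reward for time off the floor (the outcome described above) and treating the throw as a straight vertical launch with no drag, so airtime is 2v/g. The intended reward and its 0.4-second flip time are hypothetical. A grid search over launch speeds shows the specified reward running to the edge of whatever range it is allowed to search.

```python
G = 9.81  # gravitational acceleration, m/s^2


def airtime(v: float) -> float:
    """Seconds a pancake launched straight up at v m/s stays aloft (no drag)."""
    return 2 * v / G


def specified_reward(v: float) -> float:
    # As specified: avoid the floor penalty by keeping the pancake off the floor.
    return airtime(v)


def intended_reward(v: float, min_flip_time: float = 0.4) -> float:
    # Hypothetical intent: enough airtime to turn over, then back in the pan
    # quickly so breakfast gets served. Too little airtime means no flip.
    t = airtime(v)
    return 0.0 if t < min_flip_time else 1.0 / t


SPEEDS = [0.5 * i for i in range(1, 101)]  # candidate launch speeds, 0.5 to 50 m/s


def best_speed(reward) -> float:
    """Grid search: the launch speed that maximizes the given reward."""
    return max(SPEEDS, key=reward)


print("specified reward picks", best_speed(specified_reward), "m/s")  # 50.0
print("intended reward picks", best_speed(intended_reward), "m/s")    # 2.0
```

Widen `SPEEDS` and the specified optimum climbs with it, which is the toy version of orbit. The intended reward settles on the slowest throw that still flips.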

We used to call this failure.

Somehow in 2016 it stopped being funny when Elon Musk announced every Tesla shipped with full-self-driving hardware and sold autonomy as a solved problem that would make everyone safer. Instead, Tesla has been running the highest fatal-crash rate of any car brand (5.6 deaths per billion miles against a 2.8 average). He cheated the rating, not death. Success! And just look at how rich he became from people measuring his statements about safety, instead of the death tolls.

The clear danger of AI failures was cynically spun into corporate murders and… strangely, he said we weren’t allowed to talk about it anymore, while exactly nobody from Tesla went to jail.

Source: My presentation at MindTheSec 2021

As an aging hacker who has studied the whole history of the craft since childhood, I’ll say it plainly. Specification gaming by a 2026 frontier model is the oldest behavior there is, in both machines and in people.

Here is what Aristotelis Tzafalias shows as a better path forward, calling out the exact evidence that would prove a genuinely new capability. He runs it against the labs’ own system-card numbers and finds things are getting faster and cheaper with automation, as expected. Nothing surprising on an independent test. That is what vendors don’t like because it inoculates against attention-seeking hype. Commit to what would change your mind before you read the results, and you should find that all the manufactured hype is gone.
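One way to hold yourself to that rule is a commit-reveal scheme: publish a hash of your pass/fail criteria before any results exist, then reveal the criteria and a random salt afterward so anyone can check nothing moved. A minimal sketch; the criteria names and thresholds are hypothetical, and this is a generic pre-registration mechanism, not Tzafalias’ method.

```python
import hashlib
import json
import secrets


def _digest(criteria: dict, salt: str) -> str:
    # Canonical JSON so the same criteria always hash the same way.
    payload = json.dumps(criteria, sort_keys=True) + ":" + salt
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def commit(criteria: dict) -> tuple[str, str]:
    """Return (digest to publish now, salt to keep private until the reveal)."""
    salt = secrets.token_hex(16)  # stops anyone guessing small criteria sets from the digest
    return _digest(criteria, salt), salt


def verify(criteria: dict, salt: str, published_digest: str) -> bool:
    return secrets.compare_digest(_digest(criteria, salt), published_digest)


def evaluate(criteria: dict, results: dict) -> dict:
    """Apply the committed minimum thresholds to results gathered afterward."""
    return {name: results.get(name, 0) >= minimum for name, minimum in criteria.items()}


# Before the system card: what would count as a genuinely new capability?
criteria = {"novel_bug_classes": 1, "independent_replications": 2}
digest, salt = commit(criteria)  # publish `digest` now

# After the results arrive (hypothetical numbers):
results = {"novel_bug_classes": 0, "independent_replications": 3}
assert verify(criteria, salt, digest)
assert not verify({**criteria, "novel_bug_classes": 0}, salt, digest)  # moved goalposts
print(evaluate(criteria, results))
```

The digest does the independence work: once it is public, neither the evaluator nor the vendor can quietly lower a threshold after seeing the numbers.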

The lack of independence in these assessments of LLMs is the biggest problem in our industry today when it comes to preparing budgets for risk. No assessment without independence should circulate as anything but a marketing and sales brochure, declared as a conflict of interest.