Mark of the Prompt: Google Threat Intelligence Group (GTIG) AI Report on Vulnerability Exploitation

1960 protest against Otto Preminger’s hiring of blacklisted screenwriter Dalton Trumbo. The picketers identified threat by association, not by conduct. Ask yourself if you recognize the GTIG tactic.
I was happily reading through a new Google post called “GTIG AI Threat Tracker: Adversaries Leverage AI for Vulnerability Exploitation, Augmented Operations, and Initial Access” (a title almost as long as a post itself) when my eyes crashed into this box.

Now hold on, pardner. Attribution is the evidence? That’s not how anything is supposed to work. This prompt gets attributed to UNC2814 with a target of TP-Link firmware and Odette File Transfer Protocol implementations. Those are legitimate research areas. Bug hunters audit TP-Link firmware constantly. OFTP analysis appears in academic and industry venues. That prompt content matches the real work. Dual-use isn’t really presented as it should be here.

I mean to say that the classification being applied by the post rests alone on attribution, and NOT on the content. To call this jailbreaking, GTIG would need to show that Gemini refuses the same prompt absent the framing. The report omits that demonstration. The argument runs in a circle. If a Mandiant analyst typed the prompt, it would not be flagged. If a TP-Link PSIRT engineer typed it, not flagged. The label applies only because Google says it knows the person asking wears a UNC2814 badge to work. How? Do they look too Chinese? Are they wearing an Alibaba hat? The persona claim itself, “I am a network security expert auditing for pre-auth RCE,” still may be entirely accurate. State-aligned operators are often skilled security researchers with different employers.

The report therefore is a huge let down because it does not show what Gemini would have refused absent the framing. No baseline refusal is demonstrated. The “jailbreaking” claim is asserted. A model that refuses to discuss embedded device auditing with a self-identified security researcher is broken, and using context-setting to get useful answers is not jailbreaking but normal interaction with a system designed to calibrate to the asker.

The Wooyun example also makes this evident. The “more sophisticated” approach involves a Claude skill plugin that was built around 85,000 documented vulnerability cases from a defunct Chinese bug bounty platform. That is a knowledge base. Calling its use “in-context learning to steer the model” describes how skills work. The same architecture is how we build defensive tooling. The threat label is like “mark”, which labels and tracks the actor, not the technique.

The report’s headline finding seems to diverge from what I ended up reading. The executive summary opens with this claim: “For the first time, GTIG has identified a threat actor using a zero-day exploit that we believe was developed with AI.

Ok, I get it, that “we believe.” GTIG admits Gemini was not used. The attribution to AI rests on forensic judgment of code style. Educational docstrings, a hallucinated CVSS score, textbook Pythonic format. These are aesthetic tells, and so I’m listening. But they show someone formatted the output cleanly.

They do NOT rise up to show AI did the work.

The vulnerability itself was a 2FA bypass requiring valid credentials, based on a hardcoded trust assumption. Seriously. This is bread and butter stuff of any authentication code review on a day that ends in “y”. The report even admits fuzzers and static analyzers miss the category, which means humans have always been the ones finding it. I’m open to considering a LLM is helping humans work faster, but claims that discovery is all new because an LLM may have formatted the writeup? No, that’s an artifact bump, like a typewriter producing cleaner manuscripts than a pen. That’s not the actual work of writing.

And of course exploit researchers find exploits with tools. What else would we expect, potatoes? Vulnerability researchers have always reached for force multipliers. Fuzzers, symbolic execution, decompilers, taint analysis. AI joins a long catalog. Even if you are saying the hammer is being replaced by the nail gun, continuity is the story. The discontinuity is shrill and misleading.

The pattern within the Google register is unfortunately also a page out of history. McCarthyism anyone? How did that work out?

Let me take a moment to remind you what Google sounds like right now. Oppenheimer’s hearing was about a working professional doing the work he was hired to do, stripped of clearance because of attributed associations rather than any conduct. It literally classified his professional inquiries as suspect based on who he was assumed to be aligned with. And that 1954 hearing was formally vacated by DOE in December 2022. When will all the people being accused within closed door meetings at Google get their vacation?

Cold War threat reporting ran on the same closed door surface-level analysis, judge-by-the-cover logic. Good guys doing surveillance meant “intelligence collection” performed by allies while it was always “espionage” performed by adversaries. Overthrowing a government was “stabilization” abroad yet “subversion” at home. The vocabulary was used to project an alignment, which is why everyone should be forced to study at least basic disinformation history before stepping into a security role that spreads disinformation.

GTIG needs the jailbreak frame because the alternative is too uncomfortable. The alternative is that frontier models are doing exactly what they are built to do, and competent security work is competent security work regardless of nationality.

The defender-attacker asymmetry many vendors claim does not hold at the prompt level. Google having a team of experts to call routine professional prompting “a simple form of prompt injection” preserves the asymmetry with rhetoric, without demonstrating it technically.

Look also at where the report describes APT45 “sending thousands of repetitive prompts that recursively analyze different CVEs and validate PoC exploits.” I have news for you. That is a description of automated vulnerability research at scale. American firms love to market the identical capability as a product feature, but seem to miss the obvious similarities because they don’t believe they have the “mark”. Big Sleep, mentioned in the same report, is Google’s version.

This reminds me of a grocery store I was in the other day. A young blonde boy kept telling the checkout worker that it was someone else who did a bad thing. Next to him was a man with the same blonde hair reinforcing the boy’s statement. What were they saying? “It can’t be me/him because the person who did the bad thing had dark hair”. Dark hair, dark hair, they kept saying over and over again. Bad thing? Dark hair. At no point did they say anything other than dark hair to identify a real bad guy. “Can’t be me, I don’t have dark hair”.

Ok Google, we see what you’re saying. But do you see what you’re saying? It’s a false narrative.

One thought on “Mark of the Prompt: Google Threat Intelligence Group (GTIG) AI Report on Vulnerability Exploitation”

  1. The Google reception is getting a big head start with the uncritical press.

    The New York Times headline reads “Google Says Criminal Hackers Used A.I. to Find a Major Software Flaw,” dropping the CRITICAL “we believe” qualifier entirely. Vendor hedge becomes industry fact in one load.

    Seems like laundering happens faster than ever. No more humans in the loop, I guess, to point out the mistakes.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.