Linguistics as a Tool for Cyber Attack Attribution

Update August 2020: Latest research can be found in a new blog post called Cultural Spectrum of Trust.


From 2006 to 2010 my mother and I presented a linguistic analysis of Advance Fee Fraud (the 419 scam).

One of the key findings we revealed (also explained in other blog posts and our 2006 paper) is that intelligence does not prevent someone from being vulnerable to simple linguistic attacks. In other words, highly successful and intelligent analysts have a predictable blind spot that leads them to mistakes in attribution.

The title of the talk was usually “There’s No Patch for Social Engineering” because I focused on helping users avoid being lured into phishing scams and fraud. We attracted very little press attention, and in retrospect, instead of raising awareness through talks and papers alone (the peer-review model), we perhaps should have open-sourced a linguistic engine for detecting fraud. I suppose it depends on how we measure impact.

Despite the lack of journalist interest, we received a lot of positive feedback from attendees: investigators, researchers and analysts. That felt like success. After presenting at the High Technology Crime Investigation Association (HTCIA), for example, I had several ex-law enforcement and intelligence officers thank me profusely for explaining in detail, and with data, how intelligence can actually make someone more prone to misattribution and to falling victim to bias-laced attacks. They suggested we go inside agencies to train staff behind closed doors.

In other words, since long before the Sony breach news started breaking I have tried to raise the importance of linguistic analysis for attribution, as I tweeted here.

I’m told my sense of humor doesn’t translate well under the constraints of Twitter.

Recently the significance of our work has taken a new turn: my blog post from 2012 is seeing a spike in interest right now, coupled with news about linguistics being used to analyze Sony attack attribution. Ironically the news is by a “journalist” at the NYT who blocked me on Twitter.

I’m told by friends she blocked me after I used a Modified Tweet (MT) to parody her headline.

Allegedly she didn’t find my play on words amusing, but a block seems kind of extreme for that MT if you ask me.

And then at the start of the Sony breach story breaking on December 8, I tweeted a slide from our 2010 presentation.

Also recently I tweeted

good analysis causes anti-herding behavior: “separates social biases introduced by prior ratings from true value”

Tweets unfortunately are disjointed and reach a far smaller audience than my blog posts, so perhaps it is time to return to this topic here instead. I am thus posting the full presentation again:

Download: RSAC_SF_2010_HT1-106_Ottenheimer.pdf

I look forward to discussing this topic further, as it definitely needs more attention in the information security community. Kudos to Jeffrey Carr for pursuing the topic and for the invitation to join the crowds that have been rushing into the Sony breach analysis fray with linguistics.

Updated to add: Perhaps it also would be appropriate here to mention my mother’s book, The Anthropology of Language: An Introduction to Linguistic Anthropology:

Ottenheimer’s authoritative yet approachable introduction to the field’s methodology, skills, techniques, tools, and applications emphasizes the kinds of questions that anthropologists ask about language and the kinds of questions that intrigue students. The text brings together the key areas of linguistic anthropology, addressing issues of power, race, gender, and class throughout. Further stressing the everyday relevance of the text material, Ottenheimer includes “In the Field” vignettes that draw you in to the chapter material via stories culled from her own and others’ experiences, as well as “Doing Linguistic Anthropology” and “Cross-Language Miscommunication” features that describe real-life applications of text concepts.

Big Data Security in 1918: How Far Off Is That German Gun?

Recently I wrote here about the ill-fated American operation “IGLOOWHITE” from the Vietnam War, which cost billions of dollars in an attempt to locate enemies using information gathered from many small sensors.

It’s in fact an old pursuit as you can see from this news image of the Japanese Emperor inspecting his big 1936 investment in anti-aircraft data collection technology.

Even earlier, this month in 1918, Popular Science published a story called “How Far Off Is That German Gun? How sixty-three German guns were located by sound waves alone in a single day.”

How Far Off Is That German Gun? How 63 German guns were located by sound waves alone in a single day, Popular Science Monthly, December 1918, page 39
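The sound-ranging idea behind that 1918 story can be sketched in a few lines: two listening posts on a known baseline record the same gun blast at slightly different times, and the arrival-time difference yields a bearing to the gun (crossing bearings from separate baselines then fixes its position). The function name, the 340 m/s speed of sound, and the far-field approximation below are my own illustrative assumptions, not the actual 1918 procedure:

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, a rough field value; varies with temperature


def bearing_from_tdoa(delta_t, baseline):
    """Estimate the bearing of a distant gun (radians, measured from the
    perpendicular bisector of the baseline) from the arrival-time
    difference delta_t (seconds) between two microphones spaced
    `baseline` meters apart.

    Uses the far-field (plane wave) approximation:
        sin(theta) = c * delta_t / baseline
    """
    ratio = SPEED_OF_SOUND * delta_t / baseline
    if abs(ratio) > 1.0:
        raise ValueError("time difference too large for this baseline")
    return math.asin(ratio)


# A blast heard simultaneously at both posts lies straight ahead:
print(bearing_from_tdoa(0.0, 1000.0))  # 0.0 radians
```

With several such baselines reporting bearings, the guns' positions fall out of simple triangulation, which is plausibly how 63 guns could be fixed in a single day.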

Picking up where the WWI and Vietnam War narratives leave off, we should expect the Defense Department soon to start exhibiting how it uses the latest location technology (artificial intelligence) to hit enemy targets.

The velocity of information between a sensor picking up signs of enemy movement and the counter-attack machinery…is the stuff of constant research, probably as old as war itself.

Popular Mechanics, for its part, also ran a cover story on acoustic locator devices, highlighting a pre-radar contraption as the future way to find airplanes.

The cover style looks to be from the 1940s, although so far I have only found the image, not the exact text.

That odd-looking floral arrangement meant for war was known as a Perrin acoustic locator (named for French Nobel prizewinner Jean-Baptiste Perrin), and it used four large clusters of 36 small hexagonal horns (six groups of six).

Such a complicated setup might have seemed like an improvement to some. Here, just for comparison, are German soldiers in 1917 using a single personal field acoustic and sight locator to exploit the “flash bang” of enemy artillery.
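The “flash bang” principle those soldiers were applying is simple arithmetic: the muzzle flash arrives essentially instantly, while the sound lags by roughly range divided by the speed of sound, so timing the gap gives distance. A minimal sketch (the function name and the 340 m/s figure are my assumptions for illustration):

```python
SPEED_OF_SOUND = 340.0  # m/s, approximate


def flash_bang_range(delay_seconds):
    """Distance in meters to a gun, given the delay between seeing its
    muzzle flash and hearing its report; light travel time is negligible
    at battlefield ranges."""
    return SPEED_OF_SOUND * delay_seconds


print(flash_bang_range(3.0))  # a 3-second gap puts the gun ~1 km away (1020.0 m)
```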

Source: “Weird War One” by Peter Taylor, published by Imperial War Museum

Obviously the use of many small sensors gave way to the common big-dish design we see everywhere today. Could Igloo White perhaps be seen as the Perrin data locator of its day?

They are a perfect example of how simply multiplying the number of small sensors feeding a single processing unit is not necessarily the right approach, versus designing a very large sensor fit for purpose.


Update September 2020: “AI-Accelerated Attack: Army Destroys Enemy Tank Targets in Seconds”

…”need for speed” in the context of the well known Processing, Exploitation and Dissemination (PED) process which gathers information, distills and organizes it before sending carefully determined data to decision makers. The entire process, long underway for processing things like drone video feeds for years, has now been condensed into a matter of seconds, in part due to AI platforms like FIRESTORM. Advanced algorithms can, for instance, autonomously sort through and observe hours of live video feeds, identify moments of potential significance to human controllers and properly send or transmit the often time-sensitive information.

“In the early days we were doing PED away from the front lines, now it’s happening at the tactical edge. Now we need writers to change the algorithms,” Flynn explained.

“Three years ago it was books and think tanks talking about AI. We did it today,” said Army Secretary Ryan McCarthy.

Three years ago? I am not sure why he uses that time frame. FIRESTORM promises to be an interesting new twist on IGLOOWHITE from around 50 years ago, and we would be wise to heed the severe “fire, ready, aim” mistakes made back then.