Linguistics as a Tool for Cyber Attack Attribution

From 2006 to 2010, my mother and I presented a linguistic analysis of Advance Fee Fraud (the 419 scam).

One of the key findings we revealed (also explained in other blog posts and our 2006 paper) is that intelligence does not prevent someone from being vulnerable to simple linguistic attacks. In other words, highly successful and intelligent analysts have a predictable blind spot that leads them to mistakes in attribution.

The title of the talk was usually “There’s No Patch for Social Engineering” because I focused on helping users avoid being lured into phishing scams and fraud. We had very little press attention. In retrospect, instead of raising awareness through talks and papers alone (the peer review model), we perhaps should have open-sourced a linguistic engine for detecting fraud. I suppose it depends on how we measure impact.

Despite the lack of journalist interest, we received a lot of positive feedback from attendees: investigators, researchers and analysts. That felt like success. After presenting at the High Technology Crime Investigation Association (HTCIA), for example, I had several ex-law enforcement and intelligence officers thank me profusely for explaining in detail, and with data, how intelligence can actually make someone more prone to misattribution and more likely to fall victim to bias-laced attacks. They suggested we go inside agencies to train staff behind closed doors.

Recently the significance of the work has taken a new turn: I see a spike in interest in my blog post from 2012, coupled with news that linguistics is being used to analyze attribution of the Sony attack. Ironically the story is by a “journalist” at the NYT who blocked me on Twitter. I’m told by friends I was blocked because I once used a Modified Tweet (MT) to parody her headline.

Since long before the beginning of the Sony attack I have tried to raise the importance of linguistic analysis for attribution, as I tweeted here.

NSA, @Mandiant and @FireEye analysts say no English or bad grammar means u not no American

And then at the start of the Sony news on December 8, I tweeted a slide from our 2010 presentation. More recently I also tweeted:

good analysis causes anti-herding behavior: “separates social biases introduced by prior ratings from true value”

Tweets unfortunately are disjointed and reach a far smaller audience than my blog posts, so perhaps it is time to return to this topic here instead? I am thus posting the full presentation again:


I look forward to discussing this topic further, as it definitely needs more attention in the information security community. Kudos to Jeffrey Carr for pursuing the topic and for the invitation to participate.

Updated to add: Perhaps it also would be appropriate here to mention my mother’s book, The Anthropology of Language: An Introduction to Linguistic Anthropology.

Ottenheimer’s authoritative yet approachable introduction to the field’s methodology, skills, techniques, tools, and applications emphasizes the kinds of questions that anthropologists ask about language and the kinds of questions that intrigue students. The text brings together the key areas of linguistic anthropology, addressing issues of power, race, gender, and class throughout. Further stressing the everyday relevance of the text material, Ottenheimer includes “In the Field” vignettes that draw you in to the chapter material via stories culled from her own and others’ experiences, as well as “Doing Linguistic Anthropology” and “Cross-Language Miscommunication” features that describe real-life applications of text concepts.
