
Exposing Anonymous With Frequent Pattern

Eight years ago, in 2003, we proposed and presented the use of linguistic analysis for email author identification. Our use case started with the investigation of Advance Fee Fraud (AFF), also known as Nigerian 419 scams. We proved, albeit from a small data set, that language can identify a message author using several key indicators. We further proved that bias made victims far more susceptible to social engineering attacks.

About five years later, in 2008, an educational institution in Quebec picked up this theme of email author identification by applying pattern analysis to data sets. They released an online paper called “A novel approach of mining write-prints for authorship attribution in e-mail forensics”:

In this paper, we introduce an innovative data mining method to capture the write-print of every suspect and model it as combinations of features that occurred frequently in the suspect’s e-mails. This notion is called frequent pattern, which has proven to be effective in many data mining applications, but it is the first time to be applied to the problem of authorship attribution.

Er, well, they are obviously wrong. The first time was not 2008. It probably was not even 2006 (when we wrote our paper) or 2003. I would be far more impressed if they gave a little credit to the long history of language and data analysis, not to mention our published and presented work. Our presentations on pattern frequency for authorship attribution predate not only their paper but, for at least two or three of the authors, their entire careers.
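For readers unfamiliar with the term, the frequent-pattern write-print described in the quoted abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Quebec authors' implementation and not our own method: each e-mail is assumed to be already reduced to a set of discretized feature labels (the labels below are hypothetical), patterns are found by brute-force enumeration rather than a real miner such as Apriori or FP-growth, the `min_support` and `max_size` values are arbitrary, and we assume a write-print keeps only patterns that are frequent for one suspect and for no other.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(emails, min_support, max_size=3):
    """Return every feature combination (up to max_size features) that appears
    in at least min_support (a fraction) of the given e-mails.

    Each e-mail is a set of discretized feature labels. Brute-force enumeration
    is fine for toy data; real miners use Apriori or FP-growth.
    """
    counts = Counter()
    for features in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(features), size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(emails)
    return {pattern for pattern, n in counts.items() if n >= threshold}

def writeprints(emails_by_author, min_support=0.6, max_size=3):
    """Assumed write-print: patterns frequent for one author but for no other."""
    frequent = {author: frequent_patterns(mails, min_support, max_size)
                for author, mails in emails_by_author.items()}
    result = {}
    for author, patterns in frequent.items():
        # Union of every other author's frequent patterns.
        others = set().union(*(p for a, p in frequent.items() if a != author))
        result[author] = patterns - others
    return result

# Hypothetical, pre-extracted feature labels for two suspects' e-mails.
corpus = {
    "suspect_a": [
        {"greeting:dear_friend", "caps:high", "exclaim:many"},
        {"greeting:dear_friend", "caps:high", "exclaim:many"},
        {"greeting:dear_friend", "closing:god_bless"},
    ],
    "suspect_b": [
        {"greeting:hello", "exclaim:many"},
        {"greeting:hello", "exclaim:many", "closing:regards"},
    ],
}
prints = writeprints(corpus)
print(sorted(sorted(p) for p in prints["suspect_b"]))
# [['exclaim:many', 'greeting:hello'], ['greeting:hello']]
```

The shared habit `exclaim:many` drops out of both write-prints on its own, yet combinations that include it can still be distinctive, which is the reason to mine combinations of features rather than single features.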

At the start of 2010 we presented our findings at the RSA Conference in San Francisco and showed how anonymous authors could be distinguished using linguistic analysis. We pulled apart email messages, classified them by their use of language (including stylometric features), and presented a taxonomy that predicts fraud based on key indicators.

Our audiences always get a quiz at the end, and many seem surprised that they are suddenly able to see uniqueness in messages where none seemed to exist before.

I just noticed that the Quebec crew have republished their paper under a more contemporary title, “Mining writeprints from anonymous e-mails for forensic investigation”, with almost the same specific use case in mind:

In this paper, we focus on the problem of mining the writing styles from a collection of e-mails written by multiple anonymous authors. The general idea is to first cluster the anonymous e-mail by the stylometric features and then extract the writeprint, i.e., the unique writing style, from each cluster. We emphasize that the presented problem together with our proposed solution is different from the traditional problem of authorship identification, which assumes training data is available for building a classifier.
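The cluster-then-extract idea in that abstract can be illustrated with a small sketch. Everything here is an assumption for illustration rather than the paper's method: the four features are hypothetical stand-ins for a real stylometric feature set, single-linkage clustering with a fixed distance threshold (using `math.dist`, Python 3.8+) replaces whatever clustering the authors use, and each cluster is summarized by its average feature vector instead of by mined frequent patterns.

```python
import math
import re

def stylometric_vector(text):
    """Four simple stylometric features (hypothetical choice, not the paper's set)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = [c for c in text if c.isalpha()]
    n_sent = max(len(sentences), 1)
    return [
        sum(len(w) for w in words) / max(len(words), 1),           # mean word length
        len(words) / n_sent,                                        # words per sentence
        text.count("!") / n_sent,                                   # exclamations per sentence
        sum(c.isupper() for c in letters) / max(len(letters), 1),   # uppercase ratio
    ]

def normalize(vectors):
    """Min-max scale each feature to [0, 1] so no single feature dominates distance."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(vec, lows, spans)] for vec in vectors]

def cluster(vectors, threshold):
    """Single-linkage clustering: link any two e-mails within threshold distance."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def cluster_profiles(raw, clusters):
    """Average the raw feature vectors per cluster (a crude stand-in for a writeprint)."""
    return [[sum(raw[i][k] for i in members) / len(members) for k in range(len(raw[0]))]
            for members in clusters]

# Hypothetical anonymous e-mails with two obviously different styles.
emails = [
    "DEAR FRIEND! I AM THE SON OF A LATE MINISTER! URGENT REPLY NEEDED!",
    "DEAR FRIEND! YOUR ASSISTANCE IS REQUIRED! KINDLY SEND DETAILS!",
    "hi bob. lunch at noon? see you there.",
    "hey. running late today. call me.",
]
raw = [stylometric_vector(e) for e in emails]
clusters = cluster(normalize(raw), threshold=1.2)
print(clusters)  # [[0, 1], [2, 3]]
profiles = cluster_profiles(raw, clusters)
```

No labeled training data is involved; the distance threshold (1.2, picked by hand for this toy data) does the separating, which is both the appeal of this setup and its fragility on real mail.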

Here is a major differentiation point. We did not assume that a massive amount of training data was available or necessary to build a classifier. Our system can be taught to virtually anyone, who can then start identifying authorship immediately. We have applied and presented it around the world, from Turkey to Brazil, with success.

Here is another major differentiation point. We were not begging for “first time” innovation recognition, because we built on the extant body of knowledge in linguistics and security (social engineering). We combined them in a novel way to help reduce fraud, stopping people from falling victim to 419 scams, but we gave attribution.

We could have saved them a lot of time and hassle; we have been reporting on this for eight years now. Perhaps there is a chance for collaboration in the future.

I could go on with differentiation points, but here’s one more. We don’t charge you to read our paper or presentation.

Hard Math CAPTCHAs – Easy As Pie

I mean pi. Here is a funny example of a security control failure:

It seems these scientists want to ward off ruffians who can’t do advanced math. After all, the service they’re offering is access to truly random numbers — a difficult computer science feat on its own, and one that only responsible adults should have access to.

The scientists thought it would be a good idea to give their viewers a math challenge — solve a basic calculus problem to prove they are human. An equation version of the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is posted on their sign up page. Would this be called a CAPECHA?

[Image: “Solve Me”, the calculus CAPTCHA from the sign-up page]

But these math elitists may have a problem on their hands. As calculus teachers around the world are now discovering, the Internet will now do your math homework for you. Just go to WolframAlpha and pop in the problem, and boom, you’ll have access to all the random numbers your heart could desire.

Perhaps they did it for the publicity, or just for the humor. Or maybe they did it to drive up the market price for hired CAPTCHA-solution labor. I wonder if the next Craigslist ad for “easy money, work from home” will include basic calculus as a required skill.
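As a rough illustration of why a calculus challenge does not keep bots out, here is a sketch of a script answering one. The actual challenge from the sign-up page is not reproduced here, so a hypothetical definite integral (sin x from 0 to π, whose exact value is 2) stands in for it, and plain composite Simpson's rule stands in for WolframAlpha's symbolic solver; the function name and step count are illustrative choices.

```python
import math

def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule.

    n is the number of subintervals and must be even.
    """
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior points get weight 4, even interior points weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Hypothetical challenge: integrate sin(x) from 0 to pi (exact answer: 2).
answer = simpson(math.sin, 0.0, math.pi)
print(round(answer, 6))  # 2.0
```

A real bot would also have to parse the displayed expression, but for a fixed problem format that is routine work, so the "hard" math adds little protection.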

It’s China! It’s Israel! It’s…

Pick your favorite bogeyman. The latest outsider attack is probably their fault…

My presentation at BSidesSF this year argued that attribution online is harder than ever. Attackers make extensive use of proxies and remote control, so it can be very difficult to trace all the points back to an actual person…and even if you do, that person may be only one of a thousand mules following instructions. It was gratifying, after my presentation, to hear General Alexander admit to his audience at the RSA keynote on February 17th, “We don’t have situational awareness.”

I could go into the complicated philosophy of why attribution is a double-edged sword (e.g. users on the Internet do not want to sacrifice their privacy) or go into the long history of technical issues with attribution (e.g. smurfing), but instead I just want to point out the two most recent spectacular attribution failures.

First, WordPress suffered a denial of service attack that came from systems in China. I asked my audience at BSidesSF, “How many people here use products made in China?” and the entire room raised their hands. Granted, there were only three people in the room (jk), but my point is that “it came from China” should immediately be discounted as a strong attribution link. If a weapon found after an attack has “from China” stamped on it, investigators should not jump to the conclusion that the attacker must therefore also be from China. Even worse is to superimpose Chinese state motives onto a suspected Chinese attacker, all because the weapon is “from China”.

WordPress said last week the attacks might have been politically motivated and aimed at an unnamed Chinese-language blog, but it no longer has that view.

“Don’t think it’s politically motivated anymore,” WordPress Founder Matt Mullenweg said in an e-mail to IDG News Service. “However the attacks did originate in China.”

Mullenweg did not elaborate on the change in view or offer details on the source of the attacks.

I had tried to warn against this in my Operation Sloppy Night Dragon post.

Second, I have a lot of respect for Ralph Langner, who is credited with exposing the details of the Stuxnet attack. In a recent interview he made points such as that Stuxnet was very basic because it did not need to be complex, and that it was directed at Natanz, never at Bushehr. Why did he say at first that it was probably directed at Bushehr? In the interview he said it was because he assumed Bushehr would be a Mossad target…in other words, his bias on international politics overshadowed his analysis of the facts. He recently reiterated that it was the Mossad.

“My opinion is that the Mossad is involved,” Ralph Langner said while discussing his in-depth Stuxnet analysis at a prestigious TED conference in the Southern California city of Long Beach.

We should not lose sight of the fact that he has already admitted to one serious mistake, made because he believed the Mossad was to blame before his investigation even started. The Mossad certainly has a lot of people spooked, but not every suspicious bird and rock is their handiwork.

Every piece of dog poop you see, on the other hand, should in fact be attributed to the CIA.

I appreciate Langner’s honest, clear and open style; yet when he switches to geopolitical analysis he seems to overlook important data points, such as the significance of Pakistan and of German intelligence operations.

Note the recent mass exodus of US special forces and operatives from Pakistan after the arrest of Raymond Davis. The US denies he was anything more than a diplomat, but let’s face the fact that a fight with Afghans and Iranians makes Pakistan a really good proxy. The British certainly made this point when they told the CIA under Tenet that Iran was stealing nuclear secrets from Pakistan. Without the Davis incident (he killed two motorcyclists who were probably trying to assassinate him) we would have far less data on how Pakistani operations might be attributed back to American objectives. As it is, some suggest the exodus of US operatives is related to the drop in US drone attacks in Afghanistan (e.g. disruption of intelligence channels); it is probably also hampering other Pakistan-originated operations that could affect Iran (e.g. Stuxnet).

While there is a case to be made that Pakistan has been a proxy for US and Israeli objectives, that falls far short of attribution. Maybe Britain was acting on its own, with the support of Germany, on behalf of the US. Time will tell, and will probably reveal a more complicated picture than we might believe today; and that is just for the physical world. Take, for example, the 1953 overthrow of Iran’s Mossadegh. It served British objectives, but today we know it was an American-led operation masked to look like an insider revolt against nationalism, even though the year before, Iran’s nationalist movement had fit American interests. Attribution of crowd events was hard. Attribution of Internet crowd events is even harder.

How the US Fell Behind in Broadband

The CEO of Sonic.net, a broadband provider, has written a blog post with some interesting details, including his argument for why the US has such slow broadband:

In 1996, the US Congress kicked off the broadband revolution when it passed the Telecom Act. The 1996 Act created a level playing field for competitive carriers, and brought about widespread deployment of DSL and other broadband technologies.

Then in 2003 and 2004, the then Republican led FCC reversed course, removing shared access to essential fiber infrastructure for competitive carriers and codifying instead a policy of exclusive use and “multi-modal competition”.

[…]

Elsewhere in the world, regulatory bodies followed the lead of the US Congress and separated essential copper and fiber infrastructure from the services and providers who used them, and the result has been amazing. In Asia and Europe, Gigabit services are becoming common, and the price paid by consumers per megabit is a tiny fraction of what we pay here at home.

The bottom line seems to be a failure of politicians to fight for better management of shared (collective) resources. The US needs a national broadband policy that aggressively promotes true competition, based upon the separation of retail network services and wholesale network transport. Greater freedom and innovation clearly can come from shared roads, shared electric lines, shared stoplights, shared fire hydrants…why not shared fiber?

We must build new fiber all the way to your home, passing by along the way the idle fiber infrastructure that the FCC set aside nearly a decade ago.

In 1992, the American government and phone companies said they were working to bring fiber to homes:

The phone companies painted a bright picture of the wonders of fiber optics and the Information Age — the latest movies available at the flick of a remote control, the Library of Congress via a personal computer and picture phones out of “2001: A Space Odyssey.”

It was a good start under President Clinton, but serious impediments stood in the way. In 2002 the Brookings Institution tried to get the Bush Administration to turn up the heat and focus on improving broadband speeds:

The principal source of the problem is monopolistic structure, entrenched management, and political power of the ILEC and CATV sectors, worsened by major deficiencies in the policy and regulatory systems covering these industries.

The Sonic.net CEO explained above how that all turned out, as the US watches the world pass it by. One might think the following sentence would have received more attention, even from a President busy starting two wars:

Failure to improve broadband performance could reduce U.S. productivity growth by 1% per year or more, as well as reducing public safety, military preparedness, and energy security.

Alas, while the US rapidly grew domestic broadband subscribership from 9% in 2001 to 63.5% in 2009, it has actually declined relative to the rest of the world. Today it does not even make the top ten; it trails fifteen or more other countries.

[Chart: US Broadband in 16th Place]

Even European mobile broadband penetration (e.g. smartphones) is twice that of the Americas.