Barracuda Investigation of Romney’s Twitter Followers

Barracuda Labs (BLabs) has posted a statistical summary about Twitter accounts that pay money for followers. They call it an underground economy.

In short, they set up a Twitter account and paid for followers to find others who did the same. They then recorded this correlation as data on “fake” accounts.

  • There were 72,212 unique fake accounts identified
  • 61% of these fake accounts are less than 3 months old (since April 16th, 2012)
  • Average age of these fake accounts is 19 weeks or about 5 months
  • 55% of fake accounts have ~2000 followings
  • The average number of following for a fake account is 1,799

One issue I see with this summary is the lack of a thorough definition of what constitutes a fake account. They only say the following:

(created by dealers for selling followings or tweets business)

Is an account fake because it is not a one-to-one ratio with a user? Is it fake because it is created for business purposes? Their brief sentence, which we are forced to use as a definition, seems to contradict a BLabs Facebook fake profile infographic one-liner.

…the sole purpose of this fake profile is to entice you into befriending them.

Sole purpose? What if there is also a business purpose, like with Twitter accounts?

A definition of fake needs to be clarified. And on that note, an interesting question is whether BLabs could define as fake the account they set up for research.

Why am I reminded of the Three Laws of Robotics?

There also is no statement of why BLabs consider this an underground economy other than to say in their conclusion that it is a violation of Twitter’s ToS.

Finally, creating fake Twitter accounts and buying/selling followers is against Twitter’s ToS, and gradually erodes the overall value of the social network. Twitter keeps on detecting fake accounts and followings, and suspending them in last few years. However, if they do not move faster and smarter, these fake accounts will continue to be created, blended into the massive Twitter population, bringing bigger and bigger impact.

And the impact is? Eroded value of a social network? At the start of the BLabs post it seemed to say that they were out to protect their customers. Yet their analysis of impact suggests only a weakened Twitter.

They are right about the Twitter ToS, which includes the Twitter Help Center Rules. The section on Spam and Abuse is quite clear.

Username Squatting: You may not engage in username squatting…creating accounts for the purpose of selling those accounts

[…]

Selling user names: You may not buy or sell Twitter usernames

[…]

Your account may be suspended for Terms of Service violations if any of the above is true.

The Twitter profile economy thus could be a grey market activity (legal but an unauthorized/unintended use of goods) unless BLabs investigators are able to prove that all those selling accounts are in violation of actual legal statutes.

The California EDD offers this definition, which emphasizes a lack of government oversight and regulation.

“Underground economy” is a term that refers to those individuals and businesses that deal in cash and/or use other schemes to conceal their activities and their true tax liability from government licensing, regulatory, and taxing agencies. Underground economy is also referred to as tax evasion, tax fraud, cash pay, tax gap, payments under-the-table, and off-the-books.

The BLabs data is very informative and interesting but lacks thorough analysis. A clear definition and more detailed report on the economics would help support their final conclusion. I can agree with their accusation against Romney but only if I agree with some open and uncomfortable assumptions.

…we believe most of these recent followers of Romney are not from a general Twitter population but most likely from a paid Twitter follower service.

Romney's Friend Creation Timeline

Romney is probably buying his popularity and Twitter is taking its time to shut down accounts that violate its ToS. We have confirmation of what we already knew to be highly likely. We do not have proof of illegal activity or an underground economy.

Ultimately it is easy to find fraud and tell Twitter there is a problem without carefully defining it. It is much harder to profile a threat in order to tell them exactly how to detect and prevent fraud without alienating real users.


Update 8/14: Alex Hutton tweeted about Status People’s faker tool. I used it on Romney’s account; it samples up to 500 accounts and offers the following results:

@MittRomney Faker Scores
Fake 12%
Inactive 30%
Good 58%

To be fair, although this is like Gawker’s report last year (based on PeekYou analysis) that 92% of Gingrich’s followers are fake, PeekYou’s CEO put it like this:

Using algorithms to determine whether an online presence is real or fake is obviously more art than science
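The faker scores above come from a sample of at most 500 followers, so sampling alone adds some uncertainty on top of any classification error. The sketch below is a rough illustration, not a description of how Status People computes anything: it assumes a simple random sample of exactly 500 accounts and uses the normal approximation for a proportion. The function name and the 1.96 multiplier (a conventional 95% interval) are my own choices.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a sampled proportion.

    Assumes a simple random sample (an assumption; the tool's actual
    sampling method is not documented in the post).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

# Scores reported for @MittRomney in the update above.
scores = {"Fake": 0.12, "Inactive": 0.30, "Good": 0.58}
SAMPLE_SIZE = 500  # the tool "samples up to 500 accounts"

margins = {label: margin_of_error(p, SAMPLE_SIZE) for label, p in scores.items()}

for label, p in scores.items():
    print(f"{label}: {p:.0%} plus or minus {margins[label]:.1%}")
```

Under those assumptions each score carries roughly three to four percentage points of sampling error. That says nothing about whether an individual account was labeled correctly, which is the part PeekYou’s CEO calls more art than science.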

New Diesel Taxi for London, NYC, and Tokyo

London makes it very clear that it has chosen a 1.5 dCi EuroV diesel engine with a 6-speed manual as its new taxi standard. Tokyo and NYC are said to be getting the same Nissan NV200 taxi in a new 10-year contract.

Although it is a fairly large van that can carry five adult passengers and a significant load, it is expected to average 53.3 mpg. Clean air is a major marketing point in London’s campaign.

The New York site, however, doesn’t say anything about the engine. Will the NV200 in the Big Apple also get cutting-edge diesel technology? Strangely, there is no mention of an mpg number in any of the U.S. press.

Even worse, the “Taxitistics graphic” for NYC makes no mention of air quality or engine efficiency at all!

Note the black cloud imagery. Thus I suspect the vehicles in different cities will actually be very different. Unlike the high-tech London taxis, NYC passengers should expect to be dragged around by a dated and anemic gas engine that takes more pit-stops because it gets no better than 25 mpg. If it is the same 2.0L 4-cylinder gas engine as in Nissan’s Sentra, then the match-up would look like this:

City     London              NYC
Engine   1.5L diesel EuroV   2.0L gas PZEV
Torque   216                 142
mpg      53.3                25

More power, more efficiency…Londoners will get to their destination at less cost and more quickly and cleanly, because they chose a diesel fleet. For reference, the torque of Nissan’s new diesel is comparable to a Dodge Power Wagon.


A quick calculation:
13,237 taxis traveling 70,000 miles a year is 926,590,000 total miles per year.
At 25 mpg that would be 37,063,600 gallons.
At 53.3 mpg that would be 17,384,428 gallons (19,679,172 gallons saved)

At $4/gallon (the current price of both gasoline and diesel in NYC is about $3.90/gal) the savings from diesel engines in NYC would be $78,716,688 a year ($0.30 per passenger trip).

In other words, all else being equal, it will cost NYC $6,000/year more per taxi to run gasoline instead of diesel.

In just five years the NYC taxi will cost $30,000 more to operate than a London taxi, yet depreciate in value faster.

Likewise, waiting until electric engines are available (estimated in 2017 for NYC) would waste 78,716,688 gallons of gasoline at a loss of $314,866,752 ($0.40 per passenger trip).

That’s before calculating the emission harm differences. Again, NYC has said nothing about clean air. Taxis with diesel engines also are able to drive further without stopping to refuel, saving significant time and making them more available.
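The quick calculation above can be reproduced with a few lines of Python. This is a minimal sketch using only the figures from the post (13,237 taxis, 70,000 miles a year, 25 vs. 53.3 mpg, $4/gallon); the function and variable names are my own. The post does not say what span it used for the electric-engine comparison. The four-year multiplier is my inference, because the 78,716,688-gallon figure is exactly four times the annual saving.

```python
# Inputs taken from the post.
TAXIS = 13_237
MILES_PER_TAXI = 70_000
GAS_MPG = 25
DIESEL_MPG = 53.3
PRICE_PER_GALLON = 4   # dollars, rounded up from about $3.90
YEARS_UNTIL_ELECTRIC = 4  # assumption inferred from the post's totals

def fleet_fuel_comparison(taxis, miles_per_taxi, gas_mpg, diesel_mpg, price):
    """Return yearly fleet fuel use and savings, rounded to whole gallons."""
    total_miles = taxis * miles_per_taxi
    gas_gallons = round(total_miles / gas_mpg)
    diesel_gallons = round(total_miles / diesel_mpg)
    gallons_saved = gas_gallons - diesel_gallons
    dollars_saved = gallons_saved * price
    return {
        "total_miles": total_miles,
        "gas_gallons": gas_gallons,
        "diesel_gallons": diesel_gallons,
        "gallons_saved": gallons_saved,
        "dollars_saved": dollars_saved,
        "dollars_saved_per_taxi": dollars_saved / taxis,
    }

yearly = fleet_fuel_comparison(
    TAXIS, MILES_PER_TAXI, GAS_MPG, DIESEL_MPG, PRICE_PER_GALLON
)

# Cumulative waste while waiting for electric taxis, under the assumed span.
gallons_wasted = yearly["gallons_saved"] * YEARS_UNTIL_ELECTRIC
dollars_wasted = gallons_wasted * PRICE_PER_GALLON

for name, value in yearly.items():
    print(f"{name}: {value:,.0f}")
print(f"gallons_wasted: {gallons_wasted:,}")
print(f"dollars_wasted: {dollars_wasted:,}")
```

The per-taxi figure comes out a little under the $6,000 quoted above, which is a rounded number. The per-passenger-trip figures are not reproduced here because the post does not give a trip count.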

Chinese Security Mysteries and Social Networking

The old saying is that “information wants to be free.” In China this has proven to be helpful for citizen activists who use social networks to report and try to collaboratively solve mysteries in security.

First I noticed in What’s On in Ningbo (WON) the case of a corn cob thrown from a speeding police car.

The corn cob incident

一辆车从我们身边呼啸而过,车上飞下一只白色塑料袋,里面有一根啃完的玉米棒。正谴责,定睛一看,是辆警车,浙B牌照。我们的车速是每小时120公里,此车车速估计在140公里。

That basically says “a white bag with a corn cob was thrown out of a window while driving 90 mph. It was a police car with Zhejiang B (Ningbo) plates.” My favorite part of the story is a quantified risk analysis offered as perspective.

再看玉米棒可能造成的后果,一颗10克的枪弹发射出去后所产生的能量,跟从车速为每小时120公里的汽车上投出一个4千克西瓜所产生的能量不相上下

I’m not sure but I think that says “The energy of a fired 10 g (0.35 oz) bullet is about the same as that of a 4 kg (9 lb) watermelon thrown from a car traveling 120 km/h (75 mph).” The watermelon not only is a reference commonly used for impact calculations but also a very popular fruit in China.
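The figures in the quoted comparison (a 10 g bullet, a 4 kg watermelon, a car at 120 km/h) are enough for a rough sanity check using the standard kinetic energy formula, E = ½mv². The sketch below is mine, not part of the original report. It assumes the watermelon moves at the car’s speed relative to whatever it hits, ignores air resistance, and solves for the bullet speed that would give the same energy, since the quote does not state one.

```python
import math

def kinetic_energy(mass_kg, speed_m_s):
    """Kinetic energy in joules: E = 1/2 * m * v^2."""
    return 0.5 * mass_kg * speed_m_s ** 2

# Figures from the quoted passage.
WATERMELON_MASS_KG = 4.0
BULLET_MASS_KG = 0.010
CAR_SPEED_KMH = 120

# Convert km/h to m/s.
watermelon_speed = CAR_SPEED_KMH / 3.6
watermelon_energy = kinetic_energy(WATERMELON_MASS_KG, watermelon_speed)

# Speed a 10 g bullet would need to carry the same energy
# (the quote gives no bullet speed, so this is derived, not sourced).
bullet_speed = math.sqrt(2 * watermelon_energy / BULLET_MASS_KG)

print(f"Watermelon energy: {watermelon_energy:.0f} J")
print(f"Equivalent bullet speed: {bullet_speed:.0f} m/s")
```

Under those assumptions the watermelon carries a bit over 2,200 joules, and a 10 g bullet would need to travel at roughly 670 m/s to match it. That is in the range of rifle rounds, so the comparison is at least plausible for the watermelon. A corn cob weighs far less than 4 kg and would carry proportionally less energy.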

Anyway, photos of the car, along with a description of the incident including details like “Hangzhou-Ningbo Expressway to the direction of Ningbo, Yuyao near the toll station,” were posted to a Sina Weibo microblog site. Soon after, the topic gained hundreds of comments and thousands of views, prompting the police to issue a statement.

6月6日下午,我局民警驾驶该警车从杭州押解六名违法嫌疑人返甬,途中一名嫌疑人随手将民警为其提供的已吃完的食品扔出窗外

That says on June 6th a police car was traveling with 6 suspects in their car (really, 6 suspects on the 6th day of the 6th month?). One of the suspects was said to have thrown a corn cob out the window without being detected.

The statement was obviously not satisfactory. It raised questions like 1) why can a detainee throw anything out a window, 2) how did a corn cob get in the car, 3) why are six detainees in one car, and 4) why are biodegradable corn cobs in a non-biodegradable plastic bag…eh, never mind the last question.

Eventually the police made more statements and copped to performing below expectations. They apologized for the incident and asked the public for continued supervision and support.

Second, I also noticed in Ningbo news a story about a mysterious gap in surveillance.

The disappearing passenger incident

Soon after a taxi stopped for two passengers the police were called in to investigate.

Twenty minutes into the drive, Hong, who had gotten into the back seat, had disappeared. The door was closed and no sound had been heard.

Having failed to locate Hong or connect with him via his phone, Liang and Peng called the police. The police found that the video on the car’s monitoring system was blank from 7:49 pm to 9:10 pm, spanning the time when Hong disappeared.

The story was posted to a Sina Weibo microblog site and speculations reached into the thousands. Could it have been a wormhole? Was Hong a ghost? Did he throw himself out the window in a plastic bag?

Instead, it turned out to be a very simple problem.

On the night of August 4, the police declared that Hong did not in fact disappear. He was left on the curb when the taxi pulled away, with Liang and Peng assuming he was inside. Hong’s mobile phone happened to be broken that night.

To better understand the Chinese perspective on this story (and how it became so popular) you have to include some cultural elements: 1) it is common for taxi passengers to sit in the front seat and not look at the back seat, 2) Chinese are not accustomed to voicemail and expect people to answer the phone immediately, and 3) infrastructure tends to be trusted but not verified.

Dangers in Predicting the Future With Data

Mike Greenfield has some really insightful things to say on his blog about big data statistical risk and the difficulty in predicting human behavior. Take for example his experience with starting a company, which proved how dangerous it was to rely on a sole supplier.

So Facebook acted rationally, optimizing for their own best interests and those of their users. They killed the notifications feature (which we used to tell someone her friend’s child was turning two). They removed boxes and tabs from profile pages (which over a million moms had added to show off their kids’ accomplishments). And they hid invitations (which moms used to tell their friends about our product).

At that time, we were almost completely dependent on Facebook’s channels to communicate with our users and find new ones. We felt like a beer maker preparing for the government banning beer sales in markets, shutting down bars, and only allowing people to drink in restaurants on Tuesdays. Not quite prohibition, but pretty darned close.

I want to reiterate that Greenfield is in the business of predicting human behavior based on data analysis. Although he says “Facebook acted rationally” he actually started his blog post with “Facebook, the VCs said, could suddenly turn off all of their communication channels and we’d collapse. We thought they were full of it…”.

Why didn’t he see it coming?

It sounds to me like the VCs predicted the danger of losing a sole supplier. That makes sense in a simple predictive risk model. A “rational” behavior model for suppliers who see economic opportunity, however, is a complex and messy business. It really shouldn’t be described so casually, as if a supplier killing its distribution channel were easily predicted or were rational/optimizing.

Although I love the prohibition analogy it probably is not for the reasons Greenfield uses it. Prohibition is a good example of bad regulation and resulting security risks.

Consider for a moment how the consumption of alcohol actually increased in America after it was banned. If Facebook’s regulation of data were like prohibition then we should predict an illegal data running/smuggling boom.

That didn’t happen, as documented by Greenfield. Instead his story centers on “cutting the cord” and walking away from Facebook forever.

Also consider that prohibition in America was led by popular religious extremists (well, popular in Kansas anyway) who violently forced into power a bunch of blatant hypocrites.

A “conservative” politician who said he favored a “dry” country ended up meaning someone who drank but refused to admit it. In today’s terms it is similar in nature to the radically homophobic politicians.

Those calling for regulation thus can be mired in complex psychological and cultural issues, which makes “rational” predictions of their economic behavior less than obvious. Was Greenfield accounting for a fundamentalist Carrie Nation element to Facebook when he was threatened by “hatchetation” of his data?

The really interesting point of Greenfield’s story is that at the same time he (like most people) predicted a demise of email and replacement with social networking (risk of staying on email), he also was using the venerable traditional direct-communication path of email to save his company from destruction.

As 2010 came to a close, the proverbial feces was hitting the proverbial fan, and we started to look at email as a way out of the ditch. […] Over the course of 2011, we streamlined our content-writing and emailing operations, in the process turning email into a viable re-engagement channel for millions of moms.

The lesson of course is to predict and manage risk related to distribution channels to your customers, which is what the VCs told them in the first place. It sounds to me as though, had he followed his own risk analysis based on a prediction of the future, he would have been far worse off. In other words, don’t stop using email unless you realize the true risk of giving up ownership and control over your communication.

Fast forward to Greenfield’s more recent post called “Predicting the Future is the Future” and he extols automation.

Automation is incredibly important. It democratizes the process of building and using statistical models, so that a small startup (with lots of data) can build pretty good statistical models without a team of statisticians. These automated statistical models will almost inevitably perform more poorly than their human-built counterparts, but they’re close enough to be competitive.

I really want to agree with him, because technology can make data more accessible and therefore more democratic. Giving out statistical model tools to everyone means they too can start a company and make money from mining your personal data.

But again he leaves out an essential part of behavior — who gets to own and control access to data. This part of risk has to be better defined before we can celebrate democracy and a risk reduction.

His description of the troubles with Facebook gives a clear example of how automation can be rendered completely useless: it runs straight into severe power inequality in terms of resource control and management risks.

Alas, back to the Facebook prohibition analogy, every farm in America used to have an apple tree, if not an orchard. Yet the saying “as American as apple pie” is a subtle reminder of the strange story of hard cider in America.

150 years ago, in the 1840s, hard cider held the position now held by beer as the preferred alcoholic beverage of the working class.

Where did it go? It turns out that while technology democratized the process of building farms and making goods it alone was unable to prevent the extinction of the preferred beverage in America.

…the temperance movement remains as a major culprit responsible for the decline of cider consumption in the U.S., but the association of cider with rural WASP culture was the added factor which distinguishes cider from beer or wine. Add to this the economics of beer production, growing urbanization, German immigration, a predatory beer industry, and a substitute drink in coca-cola, and there seems to be enough factors working together to explain why and how cider so completely disappeared.

A statistician looking at data in 1840 might have said cider was the future, but the question is whether they could or would have predicted a much more complicated mix of risk factors related to irrational human behavior (e.g. religious fervor and ethnic prejudice) that killed the market.


England’s farmers were insulated from the risk of politics and industry in early 1900s America, so they still make cider:

cider at Broome farm
Source: Broome Farm on Flickr.

Mother Earth News says it is not too late to learn how to make your own American cider…assuming you can find a reliable apple distributor.