Spamdog Millionaire – The geography of social media spam

Posted on February 28, 2009, 8:32 pm, by Philip Jacob, under social media, spam.

It seems that the Nigerians are busy with other work these days.

A few months ago, we noticed some content starting to appear on StyleFeeder that we weren’t comfortable with. Usually, people post about clothes, shoes and furniture and that kind of thing, but we were seeing posts about illegal movie downloads, bedroom drugs and more of the usual suspects. Now, I’ve written about email spam elsewhere in the past and Savage has a lot of experience with email in his own right, but there was one big, noticeable trend that jumped out as we started investigating the problem.

We have over a million registered users, so “digging in” was something that was well beyond the means of a manual effort on our part. Definitely a needle in the haystack situation. In just a few days, we developed a tool that we call Assassin (I guess we were not feeling terribly original) that digs through users’ accounts, extracts a bunch of features and does some analysis on them using some AI techniques. Because we were just getting started with this software, we didn’t want to start auto-killing accounts until we were comfortable with it. But that was a few months ago. What we now have is a mature social media spam detection facility that runs against our production dataset and is remarkably adaptive in how it ferrets out the bad guys.
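To make the "extract features, then score" idea a bit more concrete, here is a minimal Python sketch. It is not Assassin's actual code: the feature names, the term list, the weights and the threshold are all hypothetical, and a hand-weighted logistic score stands in for whatever AI techniques the real tool uses. Note that it only flags accounts for review, in the spirit of not auto-killing anything until you trust the output.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    posts: list  # raw text of each post made by the account


# Hypothetical term list; a real system would learn its signals from data.
SPAMMY_TERMS = ("download", "pills", "free movie")

# Hypothetical hand-set weights standing in for a trained model.
WEIGHTS = {"links_per_post": 1.5, "spammy_terms_per_post": 2.0}
BIAS = -3.0


def extract_features(account):
    """Turn an account's posts into a few numeric features."""
    n = max(len(account.posts), 1)  # avoid dividing by zero for empty accounts
    text = " ".join(account.posts).lower()
    return {
        "links_per_post": text.count("http") / n,
        "spammy_terms_per_post": sum(text.count(t) for t in SPAMMY_TERMS) / n,
    }


def spam_score(features):
    """Squash a weighted sum of the features into a 0..1 score."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def triage(account, threshold=0.5):
    """Flag for human review rather than closing the account automatically."""
    return "review" if spam_score(extract_features(account)) >= threshold else "ok"
```

With these made-up weights, an account that posts about shoes and sofas stays under the threshold, while one whose posts are mostly links and download offers gets queued for review.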

Last week, I did some analysis of the accounts that we have closed so far and it confirmed my suspicions. For each account that we have closed due to spammy activity, I ran their source IP addresses through a GeoIP lookup and graphed the data using DabbleDB (which I had been meaning to play with for some time – more on that later). The result: India, in a word. Pakistan, too.
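For anyone who wants to run the same kind of tally, here is a rough Python sketch of the idea: map each closed account's source IP to a country and compute each country's share. The lookup table is a toy stand-in that I am assuming purely for illustration (documentation-only address ranges mapped to placeholder codes); in practice you would swap in a real GeoIP database lookup.

```python
import ipaddress
from collections import Counter

# Hypothetical stand-in for a GeoIP database: documentation-only networks
# (RFC 5737) mapped to placeholder country codes. Not real geolocation data.
TOY_GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "AA",
    ipaddress.ip_network("198.51.100.0/24"): "BB",
    ipaddress.ip_network("203.0.113.0/24"): "CC",
}


def lookup_country(ip, table=TOY_GEO_TABLE):
    """Return the country code for an IP string, or "unknown" if unmapped."""
    addr = ipaddress.ip_address(ip)
    for network, country in table.items():
        if addr in network:
            return country
    return "unknown"


def country_shares(ips, lookup=lookup_country):
    """Return {country: percentage of the given IPs}, most common first."""
    counts = Counter(lookup(ip) for ip in ips)
    total = sum(counts.values())
    return {country: 100.0 * n / total for country, n in counts.most_common()}
```

Feeding `country_shares` the source IPs of closed accounts gives the numbers behind a pie chart like the one below; passing a different `lookup` function is how you would plug in a real GeoIP source.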

Originating countries of social media spammers on StyleFeeder

Here’s where this gets particularly interesting. Someone asked us why we didn’t just firewall India and China on the basis that these users aren’t monetizable in any meaningful way for us (many products listed on our site are sold by retailers that won’t ship to India or other countries outside of North America or Europe). Only 0.54% of our legitimate traffic comes from India, too. There’s a low signal to noise ratio.

This is not an entirely unreasonable reaction. It takes a simple cost/benefit equation to realize that users in these countries are more trouble than they’re worth. I felt that this approach, however, was too heavy-handed in that it was inevitably going to be a blunt instrument that caused a lot of collateral damage. Plus, we needed a solution for social media spammers within North America and those using proxies and cracked hosts, so it wasn’t going to save us much.

But this suggestion stuck with me. Developing software like Assassin is beyond the capability of many startups, both technically and in terms of the resources that we were able to bring to bear against this problem. They use the Craigslist model for policing content: rely on your users to complain or flag it for you. For those companies, shutting off 40% of the global population may be an easy decision.

What concerns me is that easy decisions like this may result in a fragmented Internet if small, innovative startups are forced to make them. Let’s say that Twitter decided that there were too many social media spammers in India (perhaps picking on the wrong guy, like this case). What’s the answer? Block India? If this kind of decision were made again and again with any regularity, the long-term impact outside of North America and Europe could be material.

The irony here is that we saw cases of companies who had hired social media spammers in India to put spammy content on Squidoo (based in North Carolina) and then link to it from StyleFeeder (based in Boston) in order to help a car dealer in Connecticut sell cars made in Japan. The global relationships in this problem are inescapable. Who’s to blame? There are a number of ways to answer that question, most of which can be at least partially addressed by looking through the familiar lenses that we use to talk about the email spam problem. I’ll avoid touching that, because I think that “Who’s to blame?” is not the big question this time.

Who suffers? Well, if country-level blocking was deployed by a large number of startups, it’s clearly the Indian non-spammer regular users who would suffer, at least in part due to actions of their own countrymen. When you look at companies like Friendster – with something like 80% of their traffic coming from outside the USA – you have to wonder if opportunities might be inadvertently lost. Or Orkut, which is hugely popular in Brazil. Whether explicit or not, most startups adopt a usage policy that keeps their sites open to the entire Internet and then add in specialized rules to limit access over time.

Social media spam differs from email spam in a few key respects. With email, I’m the only one bothered by the spam in my inbox. When comment spam appears on a blog or when crappy posts appear on a social website, it becomes a part of a shared experience. The reach can be so much greater when the SEO effects are factored in; perhaps this explains why we’re seeing entire companies dedicated to this kind of spammy behavior set up. I’ll avoid mentioning them by name or linking to them, because I don’t want to give them publicity.

How widespread is this kind of blocking by startups who are susceptible to the armies of computer-literate Indian social media spammers? I’m wondering what other small companies do when faced with annoying users in countries that aren’t explicitly part of their target markets. If our experience is representative, this challenge may be more widespread than most people realize.

12 Comments

Philip Jacob / Whirlycott » A brief analysis of social media spam says:

February 28, 2009 at 10:43 pm

[…] A fun little post over on the StyleFeeder tech blog about social media spam. […]
Spamdog Millionaire - The geography of social media spam … | dsecure.net says:

March 1, 2009 at 7:48 am

[…] Read the original: Spamdog Millionaire – The geography of social media spam … […]
datalibre.ca · Datatainment! says:

March 5, 2009 at 9:43 am

[…] for the latter, I came across a title called Spamdog Millionaire – The geography of social media spam, which I could not resist reading! In this case Philip Jacob on the StyleFeeder Tech Blog did the […]
Where’s the badware? | Security Hero says:

March 22, 2009 at 10:48 pm

[…] at social shopping site StyleFeeder, Philip Jacob posted some stats about the geographic origins of spammy accounts. It turns out that the majority of the spam […]
Andre Mesarovic says:

May 19, 2009 at 1:17 pm

The New York Times just had an article on the cost of non-monetized customers: “In Developing Countries, Web Grows Without Profit” – http://www.nytimes.com/2009/04/27/technology/start-ups/27global.html.

Excerpt:

Call it the International Paradox.

Web companies that rely on advertising are enjoying some of their most vibrant growth in developing countries. But those are also the same places where it can be the most expensive to operate, since Web companies often need more servers to make content available to parts of the world with limited bandwidth. And in those countries, online display advertising is least likely to translate into results.
Justin Cooke says:

June 6, 2009 at 7:43 pm

I like your thought process in whether to “ban India” or not. When you take a closer look at who the real “culprit” is, you find that there are many companies in many countries that could be involved and it’s hard to say who’s to blame.

As we get closer to Thomas Friedman’s “Flat Earth” we’re going to find that even the smaller companies will have a global presence. What do you think is the best way to politely explain to our friends and co-workers in India and elsewhere that enough’s enough…spam is bad for the community?
Richard says:

August 20, 2009 at 4:00 am

I agree with Justin. It’s not about whether to “ban India” or not; that just isn’t the solution to the problem. I think we have to find the root cause.
MercuryDragon says:

March 16, 2011 at 8:54 pm

I am a moderator on a social networking website that has restricted several Asian countries including India due to spamming. While I see that most spam does come from Indian users, using that graph as a reference for country restriction just doesn’t look very reliable to me.

This blog lists Pakistan as a prominent spammer at 5.71% meanwhile USA has 15.4%, which is the second highest percentage on that pie graph. Should we restrict USA too if we’ve placed a restriction upon Pakistan?

The thing is, the Canadian-based social networking website I moderate on makes more money from the USA than Pakistan from advertisement affiliates. Is it spam or the money backing who is decided to be blocked? Most websites have no problem with spam if it is from an affiliate, but if done by an outside company, it is blocked, banned, restricted, or suspended.

Also, the blog starts with the sentence, “It seems that the Nigerians are busy with other work these days.” but Nigeria isn’t even listed on the pie graph with a percentage. How does that work, beginning a statement implying people of Nigeria are at large spamming our computers when there is no mention of them statistically in the blog?
Philip Jacob says:

March 17, 2011 at 8:19 am

To your first point: I am against country-level restrictions and made that clear in the blog post. I agree that doing this on a country level is a bad idea precisely because it is not reliable, nor will it actually solve the problem in question. Not everybody sees it that way, unfortunately.

Re: Pakistan – no, we should not restrict Pakistan any more than we should restrict the US. I fear that you have misinterpreted my message if you came away thinking this.

Re: Nigerians – it is a well-known joke in the anti-spam world that some of the most outrageous spam messages one receives are from people in Nigeria purporting to be the holders of huge amounts of money that they would be happy to share with you if you could pay some administrative fees to get it into a Swiss bank, etc. If you will kindly do some reading on the subject, you will discover many examples of this kind of spam.
MercuryDragon says:

March 20, 2011 at 9:21 am

Either way, this blog was used by our Administrators to explain the reasoning behind the restriction of multiple countries, so it would seem the content was misleading if the point was anti-country restriction.

And yes, I have received these Nigerian bank account scammer e-mails myself, but still, the country isn’t even listed on the graph. :/
Philip Jacob says:

March 21, 2011 at 9:27 am

Your administrators either misunderstood or purposefully distorted my message. Reread the post, especially the following:

“I felt that this approach [country-level blocking], however, was too heavy-handed in that it was inevitably going to be a blunt instrument that caused a lot of collateral damage. Plus, we needed a solution for social media spammers within North America and those using proxies and cracked hosts, so it wasn’t going to save us much.”

The entire rest of the post above continues on to show why country level blocking tactics are a Bad Idea.

The reference to Nigerian spam is related to email. Note that the title of the post refers to “social media spam,” which is another variety entirely. Perhaps reading this page will give you the insight you need to fully understand what was intended to be a humorous reference.

http://www.snopes.com/fraud/advancefee/nigeria.asp
Joe says:

May 7, 2011 at 10:06 am

Having a little knowledge in the area of fraud, I understand the reference to Nigeria as it was a joke that was mentioned more than once. I would have to say though that it’s not the country that’s at fault. Figure out the source, and you’d be surprised to find out where it all began. They may simply be the hired gun.

Traveler Joe
http://www.wildplanettours.com/

StyleFeeder Tech Blog
