It seems that the Nigerians are busy with other work these days.
A few months ago, we noticed some content starting to appear on StyleFeeder that we weren’t comfortable with. Usually, people post about clothes, shoes and furniture and that kind of thing, but we were seeing posts about illegal movie downloads, bedroom drugs and more of the usual suspects. Now, I’ve written about email spam elsewhere in the past and Savage has a lot of experience with email in his own right, but there was one big, noticeable trend that jumped out as we started investigating the problem.
We have over a million registered users, so “digging in” was something that was well beyond the means of a manual effort on our part. Definitely a needle in the haystack situation. In just a few days, we developed a tool that we call Assassin (I guess we were not feeling terribly original) that digs through users’ accounts, extracts a bunch of features and does some analysis on them using some AI techniques. Because we were just getting started with this software, we didn’t want to start auto-killing accounts until we were comfortable with it. But that was a few months ago. What we now have is a mature social media spam detection facility that runs against our production dataset and is remarkably adaptive in how it ferrets out the bad guys.
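To make the shape of that pipeline concrete, here is a minimal sketch and not Assassin itself. The feature names, the keyword list, the weights and the threshold are all assumptions invented for illustration, and a hand-weighted score stands in for whatever AI techniques a real system would use. What it does show is the "flag first, auto-kill later" switch described above.

```python
from dataclasses import dataclass, field

# Hypothetical keyword list, invented for this example.
SUSPECT_TERMS = ("movie download", "pills")


@dataclass
class Account:
    user_id: int
    posts: list = field(default_factory=list)


def extract_features(account):
    """Turn an account's posts into a few numeric features (all hypothetical)."""
    texts = [post.lower() for post in account.posts]
    n = len(texts)
    links = sum(t.count("http://") + t.count("https://") for t in texts)
    suspect = sum(any(term in t for term in SUSPECT_TERMS) for t in texts)
    return {
        "post_count": n,
        "links_per_post": links / n if n else 0.0,
        "suspect_post_ratio": suspect / n if n else 0.0,
    }


def spam_score(features):
    """Combine features into a 0..1 score using made-up weights."""
    link_signal = min(features["links_per_post"] / 3.0, 1.0)
    return 0.6 * features["suspect_post_ratio"] + 0.4 * link_signal


def review(accounts, threshold=0.5, auto_close=False):
    """Score every account; only close accounts when auto_close is enabled."""
    report = []
    for account in accounts:
        score = spam_score(extract_features(account))
        if score >= threshold:
            action = "close" if auto_close else "flag_for_review"
        else:
            action = "ok"
        report.append((account.user_id, round(score, 2), action))
    return report
```

Running `review(accounts)` with `auto_close` left at `False` only produces a report for a human to check, which is roughly the mode we stayed in until we trusted the results.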
Last week, I did some analysis of the accounts that we have closed so far and it confirmed my suspicions. For each account that we have closed due to spammy activity, I ran its source IP address through a GeoIP lookup and graphed the data using DabbleDB (which I had been meaning to play with for some time – more on that later). The result: India, in a word. Pakistan, too.
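The aggregation step is simple enough to sketch. The version below is an illustration under stated assumptions and not the actual analysis: `SAMPLE_GEO_TABLE` is a hypothetical stand-in for a real GeoIP database, built from documentation-only address ranges with placeholder country labels, and the sample IPs are made up.

```python
import ipaddress
from collections import Counter

# Hypothetical stand-in for a GeoIP database: documentation-only network
# ranges mapped to placeholder labels. A real lookup would query GeoIP data.
SAMPLE_GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Country A",
    ipaddress.ip_network("198.51.100.0/24"): "Country B",
    ipaddress.ip_network("203.0.113.0/24"): "Country C",
}


def lookup_country(ip, table=SAMPLE_GEO_TABLE):
    """Return the label of the first network containing ip, else 'Unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, country in table.items():
        if addr in network:
            return country
    return "Unknown"


def country_shares(source_ips):
    """Count closed accounts per country and compute each country's share."""
    counts = Counter(lookup_country(ip) for ip in source_ips)
    total = sum(counts.values())
    return [(country, n, 100.0 * n / total) for country, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up source IPs, one per closed account.
    closed_account_ips = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "10.0.0.1"]
    for country, n, pct in country_shares(closed_account_ips):
        print(f"{country}: {n} accounts ({pct:.1f}%)")
```

The resulting list of (country, count, percentage) rows is the kind of table you can drop straight into a charting tool.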
Originating countries of social media spammers on StyleFeeder
Here’s where this gets particularly interesting. Someone asked us why we didn’t just firewall India and China on the basis that these users aren’t monetizable in any meaningful way for us (many products listed on our site are sold by retailers that won’t ship to India or other countries outside of North America or Europe). Only 0.54% of our legitimate traffic comes from India, too. There’s a low signal to noise ratio.
This is not an entirely unreasonable reaction. It takes a simple cost/benefit equation to realize that users in these countries are more trouble than they’re worth. I felt that this approach, however, was too heavy-handed in that it was inevitably going to be a blunt instrument that caused a lot of collateral damage. Plus, we needed a solution for social media spammers within North America and those using proxies and cracked hosts, so it wasn’t going to save us much.
But this suggestion stuck with me. Developing software like Assassin is beyond the capability of many startups, both technically and in terms of the resources that we were able to bring to bear against this problem. Those startups use the Craigslist model for policing content: rely on your users to complain or flag it for you. For those companies, shutting off 40% of the global population may be an easy decision.
What concerns me is that easy decisions like this may result in a fragmented Internet if small, innovative startups are forced to make them. Let’s say that Twitter decided that there were too many social media spammers in India (perhaps picking on the wrong guy, like this case). What’s the answer? Block India? If this kind of decision were made again and again with any regularity, the long-term impact outside of North America and Europe could be material.
The irony here is that we saw cases of companies that had hired social media spammers in India to put spammy content on Squidoo (based in North Carolina) and then link to it from StyleFeeder (based in Boston) in order to help a car dealer in Connecticut sell cars made in Japan. The global relationships in this problem are inescapable. Who’s to blame? There are a number of ways to answer that question, most of which can be at least partially addressed by looking through the familiar lenses that we use to talk about email spam. I’ll avoid touching that, because I think that “Who’s to blame?” is not the big question this time.
Who suffers? Well, if country-level blocking was deployed by a large number of startups, it’s clearly the Indian non-spammer regular users who would suffer, at least in part due to actions of their own countrymen. When you look at companies like Friendster – with something like 80% of their traffic coming from outside the USA – you have to wonder if opportunities might be inadvertently lost. Or Orkut, which is hugely popular in Brazil. Whether explicit or not, most startups adopt a usage policy that keeps their sites open to the entire Internet and then add in specialized rules to limit access over time.
Social media spam differs from email spam in a few key respects. With email, I’m the only one bothered by the spam in my inbox. When comment spam appears on a blog or when crappy posts appear on a social website, it becomes a part of a shared experience. The reach can be so much greater when the SEO effects are factored in; perhaps this explains why we’re seeing entire companies set up that are dedicated to this kind of spammy behavior. I’ll avoid mentioning them by name or linking to them, because I don’t want to give them publicity.
How widespread is this kind of blocking by startups that are susceptible to the armies of computer-literate Indian social media spammers? I’m wondering what other small companies do when faced with annoying users in countries that aren’t explicitly part of their target markets. If our experience is representative, this challenge may be more widespread than most people realize.