KDD 2008: Mining Social Networks

One of the most popular sessions at KDD was the Social Networks panel. Social network data has long existed and has proven useful for discovery and prediction, but only recently have its scale, availability and value reached tremendous heights. One of the best-known sources of social network data is telephone call records. Telephone companies found that by examining call records they could better identify fraudsters, people who hijacked telephone equipment to make free calls. But how best to use social network data is not always obvious. In the call record data, the fraudsters did not call each other (they were not directly connected); rather, they were often found to call the same seedy number, say, a pay phone in the back of a certain bar. Other kinds of “anomaly detection” discussed by the panel included identifying “bad” stockbrokers (who were often found to work at the same branch office) and computer network intrusion detection.
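The pay-phone pattern can be illustrated with a small sketch. It assumes call records are simple (caller, callee) pairs and that a few fraudsters are already known; it flags other callers by how many distinct numbers they share with those fraudsters. The record format, the names and the scoring rule (a raw count of shared callees) are hypothetical illustrations, not the method the panelists described.

```python
from collections import defaultdict


def shared_callee_scores(calls, known_fraudsters):
    """Score each unlabeled caller by how many distinct numbers it shares
    with known fraudsters.

    calls: iterable of (caller, callee) pairs (hypothetical record format).
    known_fraudsters: set of caller IDs already labeled as fraudulent.
    Returns a dict mapping caller -> number of shared suspect numbers,
    omitting callers with no overlap.
    """
    # Build the set of distinct numbers each caller has dialed.
    callees_of = defaultdict(set)
    for caller, callee in calls:
        callees_of[caller].add(callee)

    # Numbers dialed by at least one known fraudster (e.g. the bar's pay phone).
    suspect_numbers = set()
    for fraudster in known_fraudsters:
        suspect_numbers |= callees_of.get(fraudster, set())

    # Fraudsters need not be connected to each other; the signal is a shared callee.
    scores = {}
    for caller, callees in callees_of.items():
        if caller in known_fraudsters:
            continue
        overlap = len(callees & suspect_numbers)
        if overlap:
            scores[caller] = overlap
    return scores


calls = [
    ("mallory", "bar_payphone"),
    ("trent", "bar_payphone"),
    ("alice", "bar_payphone"),
    ("alice", "pizza"),
    ("carol", "pizza"),
]
print(shared_callee_scores(calls, {"mallory"}))  # {'trent': 1, 'alice': 1}
```

A raw count like this would over-flag callers of popular legitimate numbers; a practical version would presumably down-weight callees that many people dial, but that weighting is an assumption beyond what the panel discussed.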

Historically, social network data has been applied only in specialized settings. One question asked repeatedly during the session was: “Where’s the money?!?!” One of the largest on-line gold mines is search advertising, yet social network data offers relatively little promise for improving search ads, because such ads are already tightly targeted to the user’s query. The panelists pointed out that there is much money to be made in non-search ads, product recommendations and direct marketing. Given information about on-line users’ relationships, one can better predict which ads they will click on, how they will respond to direct marketing and, ultimately, which products they will buy. If Alice and Bob are friends and Alice owns an iPhone, chances are good that Bob will buy one soon, and iPhone marketing is more likely to convert Bob than Chris, someone with no iPhone-owning friends.
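The Alice/Bob/Chris intuition can be expressed as a simple feature: for each user who does not yet own the product, the fraction of their friends who do. The sketch below is a minimal illustration under assumed inputs (a hypothetical friendship dictionary and owner set); a real marketing system would more likely feed a feature like this into a learned model than rank on it directly.

```python
def adoption_scores(friends, owners):
    """Fraction of each non-owner's friends who already own the product.

    friends: dict mapping user -> set of friends (hypothetical, undirected data).
    owners: set of users who already own the product.
    """
    scores = {}
    for user, circle in friends.items():
        if user in owners:
            continue  # already converted, so not a marketing target
        # Users with no friends get 0.0 rather than a division by zero.
        scores[user] = len(circle & owners) / len(circle) if circle else 0.0
    return scores


friends = {
    "alice": {"bob"},
    "bob": {"alice", "dave"},
    "chris": {"dave"},
    "dave": {"bob", "chris"},
}
owners = {"alice"}

# Rank prospects from most to least socially "primed" for the product.
ranked = sorted(adoption_scores(friends, owners).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('bob', 0.5), ('chris', 0.0), ('dave', 0.0)]
```

Bob, whose friend Alice owns an iPhone, ranks ahead of Chris, who has no iPhone-owning friends.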

Research on social networks has largely been limited to problems of prediction and detection. But any marketer or retailer will naturally wonder: can we modify the network? Can we create relationships that will drive rapid adoption of our product(s)? The panelists had relatively little to say in response to questions of this sort, beyond noting that it is a difficult problem that social psychologists have been working on for decades. I wonder whether it is so difficult because it requires significant interaction between a learning system and a real social network. Simulation isn’t enough: if one could accurately simulate the effects of changes to the network, one could easily determine the optimal changes, so an accurate simulator is precisely what is missing. Yet typical machine learning and data mining research can’t enact changes to a live social network, and a static social network data set, even one that includes a log of changes, is of limited value. So the initial data mining work on modifying social networks will likely happen behind closed doors. But maybe this problem is too interdisciplinary and context-dependent to be treated as a data mining problem. On the other hand, one need look no further than the likes of Disney, Nintendo and Apple to see the rewards of properly seeding the social network. Maybe one day we’ll be able to prove theorems about what certain businessmen have known for years…