Facebook has been serving up public listings for over a year now. Unlike most of the site, anybody can view public listings, even non-members. They offer a window into the Facebook world for those who haven’t joined yet, since Facebook doesn’t allow full profiles to be publicly viewable by non-members (unlike MySpace and others). Of course, this window into Facebook comes with a prominent “Sign Up” button, growth still being the main mark of success in the social networking world. The goal is for non-members to stumble across a public listing, see how many friends are already using Facebook, and then join. Economists call this a network effect, and Facebook is shrewdly harnessing it.
Of course, to do this, Facebook is making public every user’s name, photo, and 8 friendship links. Affiliations with organizations, causes, or products are also listed; I just don’t have any on my profile (though my sister does). This is quite a bit of information given away by a feature many active Facebook users are unaware of. Indeed, it’s more information than Facebook’s own privacy policy indicates is given away. When the feature was launched in 2007, every over-18 user was automatically opted in, as have been new users since then. You can opt out, but few people do: out of more than 500 friends of mine, only 3 had taken the time to opt out. It doesn’t help that most users are unaware of the feature, since registered users don’t encounter it.
Making matters worse, public listings aren’t protected from crawling. In fact, they are designed to be indexed by search engines. In our own experiments, we were able to download over 250,000 public listings per day using a desktop PC and a fairly crude Python script. For a serious data aggregator, getting every user’s listing is no sweat. So what can one do with 200 million public listings?
I explored this question along with Jonathan Anderson, Frank Stajano, and Ross Anderson in a new paper which we presented today at the ACM Social Network Systems Workshop in Nuremberg. Facebook’s public listings give us a random sample of the social graph, leading to some interesting exercises in graph theory. As we describe in the paper, it turns out that this sampled graph allows us to approximate many properties of the complete network surprisingly well: degree and centrality of nodes, small dominating sets, short paths, and community structure. These are all things marketers and sociologists alike would love to know for the complete Facebook graph.
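To make the sampling idea concrete, here is a minimal sketch in Python. It is not the code or the method from the paper: it builds a toy friendship graph (an assumption standing in for real data), reveals at most 8 randomly chosen friends per user as a public listing would, and checks how well the highest-degree users in the sampled graph match those in the full graph. The function names `toy_graph`, `sample_listings`, and `top_overlap` are hypothetical, chosen only for this illustration.

```python
import random


def toy_graph(n, m, rng):
    """Build a toy friendship graph (an assumption, not Facebook data).

    Each new node befriends m existing nodes, chosen with probability
    proportional to their current degree, so a few hubs emerge.
    """
    friends = {i: set() for i in range(n)}
    endpoints = []  # each node appears once per friendship it has
    for i in range(m + 1):  # seed: a small clique
        for j in range(i):
            friends[i].add(j)
            friends[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for old in sorted(chosen):
            friends[new].add(old)
            friends[old].add(new)
            endpoints += [new, old]
    return friends


def sample_listings(friends, k, rng):
    """Reveal at most k random friends per user and merge the revealed links."""
    sampled = {u: set() for u in friends}
    for u in sorted(friends):
        shown = rng.sample(sorted(friends[u]), min(k, len(friends[u])))
        for v in shown:
            # A friendship shown on either listing becomes an edge.
            sampled[u].add(v)
            sampled[v].add(u)
    return sampled


def top_overlap(full, sampled, top):
    """Fraction of the top-degree nodes that both graphs agree on."""
    def top_nodes(graph):
        ranked = sorted(graph, key=lambda u: (-len(graph[u]), u))
        return set(ranked[:top])
    return len(top_nodes(full) & top_nodes(sampled)) / top


if __name__ == "__main__":
    rng = random.Random(0)
    full = toy_graph(2000, 10, rng)
    sampled = sample_listings(full, 8, rng)
    print("overlap of top-100 nodes by degree:", top_overlap(full, sampled, 100))
```

The point of the sketch is that a well-connected user shows up in many other people's listings, so their degree in the sampled graph stays high even though each listing reveals only 8 links. How closely a toy graph like this mirrors the real Facebook graph is an open assumption.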
This result leads to two interesting conclusions. First, protecting a social graph is hard. Consistent with previous results, we found that giving away a seemingly small amount can allow much information to be inferred. It’s also been shown that anonymising a social graph is almost impossible.
Second, Facebook is developing a track record of releasing features and then being surprised by the privacy implications, from Beacon to NewsFeed and now Public Search. Just as new code in security-critical software is extensively tested and evaluated before being deployed, social networks should have a formal privacy review of all new features before they are rolled out (as, indeed, should other web services which collect personal information). Features like public search listings shouldn’t make it off the drawing board.
Another interesting point is this. The Government wants to spend 15 billion pounds on the IMP database of all traffic data – email headers, itemized phone bills, and the like – so that they can track the UK social graph. This paper shows that you don’t need to spend all that money – you can get the social graph just by scraping the public data from Facebook.
Interesting second point. One would suppose that at least a crude review of possible privacy implications would take place before new features/applications are introduced. Which means privacy concerns were either gravely underestimated or blatantly ignored.
You might think so, but Facebook doesn’t seem to value user privacy over the ability to roll out features quickly. I have it from a reliable source that there is no privacy review process for such applications.
Also, you may be interested to know that the story has been picked up by The Guardian!
Also covered on Dark Reading, with a response from Facebook: http://www.darkreading.com/security/privacy/showArticle.jhtml?articleID=216402556
I’m always concerned about joining Facebook because of Facebook’s public page view. Why should anyone looking for me be able to access my friends’ public profiles? I went through your study Eight Friends are Enough just when I was finishing up my article. I have mentioned it as recommended reading in my post http://blogs.itworldcanada.com/idol/2009/04/16/you-are-on-google-you-are-internetnal/
I’ve just written my own python script to do exactly this: pull off a social network from people’s public facebook listings. Using the BeautifulSoup library it isn’t actually that hard.
As an extension, I’d love to be logged into Facebook whilst trawling. Some users have open profiles on networks, or at least did. University networks are common for this and the amount of information you can pull from people’s profiles rapidly mounts up.
As a second extension, I’d like to do twitter. Whilst real names aren’t the order of the day on twitter it would be interesting and the API makes it much easier to trawl.
I do have one question though: you’ll no doubt have come across users who add everyone they ever meet. As such, a Facebook network might not necessarily represent the true nature of the individual’s social life; we need more information to make this truly useful, such as how often these people talk.
So you’ll know who’s met who, but not who’s best friends with who, given the above methods.