Category Archives: Privacy technology

Anonymous communication, data protection

How to vote anonymously under ubiquitous surveillance

In 2006, the Chancellor proposed to invade an enemy planet, but his motion was anonymously vetoed. Three years on, he still cannot find out who did it.

This time, the Chancellor is seeking re-election in the Galactic Senate. Some delegates don’t want to vote for him, but worry about his revenge. How to arrange an election such that the voter’s privacy will be best protected?

The environment is extremely adverse. Surveillance is everywhere. Anything you say will be recorded and traceable to you. All communication is essentially public. In addition, you have no one to trust but yourself.

It may seem mind-boggling that this problem is solvable in the first place. With cryptography, anything is possible. In a forthcoming paper to be published by IET Information Security, we (joint work with Peter Ryan and Piotr Zielinski) described a decentralized voting protocol called “Open Vote Network”.

In the Open Vote Network protocol, all communication data is open, and publicly verifiable. The protocol provides the maximum protection of the voter’s privacy; only a full collusion can break the privacy. In addition, the protocol is exceptionally efficient. It compares favorably to past solutions in terms of the round efficiency, computation load and bandwidth usage, and has been close to the best possible in each of these aspects.

With the same security properties, it seems unlikely to have a decentralized voting scheme that is significantly more efficient than ours. However, in cryptography, nothing is ever optimal, so we keep this question open.

A preprint of the paper is available here, and the slides here.

Tor on Android

Andrew Rice and I ran a ten week internship programme for Cambridge undergraduates this summer. One of the project students, Connell Gauld, was tasked with the job of producing a version of Tor for the Android mobile phone platform which could be used on a standard handset.

Connell did a great job and on Friday we released TorProxy, a pure Java implementation of Tor based on OnionCoffee, and Shadow, a Web browser which uses TorProxy to permit anonymous browsing from your Android phone. Both applications are available on the Android Marketplace; remember to install TorProxy if you want to use Shadow.

The source code for both applications is released under GPL v2 and is available from our SVN repository on the project home page. There are also instructions on how to use TorProxy to send and receive data via Tor from your own Android application.

User complaints about photos in Facebook ads

I wrote about the mess caused by Facebook’s insecure application platform nearly 2 months ago. I also wrote about the long-term problems with “informed consent” for data use in social networks. In the past week, both problems came to a head as users began complaining about multiple third-party ad networks using their photos in banner ads. When I mentioned this problem in June, Facebook had just shut down the biggest ad networks for “deceptive practices,” specifically by duping users into a US$20 per month ringtone subscription. The void created by banning SocialReach and SocialHour apparently led to many new advertisers popping up in their place, with most carrying on the practice of using user photos to hawk quizzes, dating services, and the like. The ubiquitous ads annoyed enough users that Facebook was convinced to ban advertisers from using personal data. This is a welcome move, but Facebook underhandedly inserted a curious new privacy setting at “Privacy Settings->News Feed and Wall->Facebook Ads”:

Facebook does not give third party applications or ad networks the right to use your name or picture in ads. If this is allowed in the future, this setting will govern the usage of your information.

With this change, Facebook has quietly reserved the right to re-allow applications to utilise user data in ads in the future and opted everybody in to the feature. We’ve written about social networks minimising privacy salience, but this is plainly deceptive. It’s hard not to conclude this setting was purposefully hidden from sight, as ads shown by third-party applications have nothing to do with the News Feed or Wall. The choices of “No One” or “Only Friends” are also obfuscating, as only friends’ applications can access data from Facebook’s API to begin with; this is a simple “opt-out” checkbox dressed up to making being opted in seem more private. Meanwhile, Facebook has been showing users a patronising popup message on log-in:

Worried about privacy? Your photos are safe. There have been misleading rumors recently about the use of your photos in ads. Don’t believe them. These rumors were related to third-party applications, and not ads shown by Facebook. Get the whole story at the Facebook Blog, or check out the Help Center.

This message is misleading, if not outright dishonest, and shows an alarming dismissal of what was a widespread practice that offended many users. People weren’t concerned with whether their photos were sent to advertisers by Facebook itself or third-parties. They don’t want their photos or names used or stored by advertisers regardless of the technical details. The platform API remains fundamentally broken and gives users no way to prevent applications from accessing their photos. Facebook would be best served by fixing this instead of dismissing users’ concern for privacy as “misleading rumors.”

The Economics of Privacy in Social Networks

We often think of social networking to Facebook, MySpace, and the also-rans, but in reality there are there are tons of social networks out there, dozens which have membership in the millions. Around the world it’s quite a competitive market. Sören Preibusch and I decided to study the whole ecosystem to analyse how free-market competition has shaped the privacy practices which I’ve been complaining about. We carefully examined 45 sites, collecting over 250 data points about each sites’ privacy policies, privacy controls, data collection practices, and more. The results were fascinating, as we presented this week at the WEIS conference in London. Our full paper and complete dataset are now available online as well.

We collected a lot of data, and there was a little bit of something for everybody. There was encouraging news for fans of globalisation, as we found the social networking concept popular across many cultures and languages, with the most popular sites being available in over 40 languages. There was an interesting finding from a business perspective that photo-sharing may be the killer application for social networks, as this features was promoted far more often than sharing videos, blogging, or playing games. Unfortunately the news was mostly negative from a privacy standpoint. We found some predictable but still surprising problems. Too much unnecessary data is collected by most sites, 90% requiring a full-name and DOB. Security practices are dreadful: no sites employed phishing countermeasures, and 80% of sites failed to protect password entry using TLS. Privacy policies were obfuscated and confusing, and almost half failed basic accessibility tests. Privacy controls were confusing and overwhelming, and profiles were almost universally left open by default.

The most interesting story we found though was how sites consistently hid any mention of privacy, until we visited the privacy policies where they provided paid privacy seals and strong reassurances about how important privacy is. We developed a novel economic explanation for this: sites appear to craft two different messages for two different populations. Most users care about privacy about privacy but don’t think about it in day-to-day life. Sites take care to avoid mentioning privacy to them, because even mentioning privacy positively will cause them to be more cautious about sharing data. This phenomenon is known as “privacy salience” and it makes sites tread very carefully around privacy, because users must be comfortable sharing data for the site to be fun. Instead of mentioning privacy, new users are shown a huge sample of other users posting fun pictures, which encourages them to  share as well. For privacy fundamentalists who go looking for privacy by reading the privacy policy, though, it is important to drum up privacy re-assurance.

The privacy fundamentalists of the world may be positively influencing privacy on major sites through their pressure. Indeed, the bigger, older, and more popular sites we studied had better privacy practices overall. But the desire to limit privacy salience is also a major problem because it prevents sites from providing clear information about their privacy practices. Most users therefore can’t tell what they’re getting in to, resulting in the predominance of poor-practices in this “privacy jungle.”

Open letter to Google

I am one of 38 researchers and academics (almost all of whom are far more important and famous than I will ever be!), who has signed an Open Letter to Google’s CEO, Eric Schmidt.

The letter, whose text is released today, calls upon Google to honour the important privacy promises it has made to its customers and protect users’ communications from theft and snooping by enabling industry standard transport encryption technology (HTTPS) for Google Mail, Docs, and Calendar.

Google already uses HTTPS for sign-in, but the options to make the whole of the session secure are hidden away where few people will ever find them.

Hence, at the moment pretty much everyone who uses a public WiFi connection to read their Gmail or edit a shared doc has no protection at all if any passing stranger decides to peek and see what they’re doing.

However, getting everyone to change their behaviour will take lots of explaining. Much simpler to have Google edit a couple of configuration files and flip a default the other way.

The letter goes into the issues in considerable detail (it’s eleven pages long with all the footnotes)… Eric Schmidt can hardly complain that we’ve failed to explain the issues to him !

Security and Human Behaviour 2009

I’m at SHB 2009, which brings security engineers together with psychologists, behavioral economists and others interested in deception, fraud, fearmongering, risk perception and how we make security systems more usable. Here is the agenda.

This workshop was first held last year, and most of us who attended reckoned it was the most exciting event we’d been to in some while. (I blogged SHB 2008 here.) In followups that will appear as comments to this post, I’ll be liveblogging SHB 2009.

Attack of the Zombie Photos

One of the defining features of Web 2.0 is user-uploaded content, specifically photos. I believe that photo-sharing has quietly been the killer application which has driven the mass adoption of social networks. Facebook alone hosts over 40 billion photos, over 200 per user, and receives over 25 million new photos each day. Hosting such a huge number of photos is an interesting engineering challenge. The dominant paradigm which has emerged is to host the main website from one server which handles user log-in and navigation, and host the images on separate special-purpose photo servers, usually on an external content-delivery network. The advantage is that the photo server is freed from maintaining any state. It simply serves its photos to any requester who knows the photo’s URL.

This setup combines the two classic forms of enforcing file permissions, access control lists and capabilities. The main website checks each request for a photo against an ACL, it then grants a capability to view a photo in the form of an obfuscated URL which can be sent to the photo-server. We wrote earlier about how it was possible to forge Facebook’s capability-URLs and gain unauthorised access to photos. Fortunately, this has been fixed and it appears that most sites use capability-URLs with enough randomness to be unforgeable. There’s another traditional problem with capability systems though: revocation. My colleagues Jonathan Anderson, Andrew Lewis, Frank Stajano and I ran a small experiment on 16 social-networking, blogging, and photo-sharing web sites and found that most failed to remove image files from their photo servers after they were deleted from the main web site. It’s often feared that once data is uploaded into “the cloud,” it’s impossible to tell how many backup copies may exist and where, and this provides clear proof that content delivery networks are a major problem for data remanence. Continue reading Attack of the Zombie Photos

Location privacy

I was recently asked for a brief (4-page) invited paper for a forthcoming special issue of the ACM SIGSPATIAL on privacy and security of location-based systems, so I wrote Foot-driven computing: our first glimpse of location privacy issues.

In 1989 at ORL we developed the Active Badge, the first indoor location system: an infrared transmitter worn by personnel that allowed you to tell which room the wearer was in. Every press and TV reporter who visited our lab worried about the intrusiveness of this technology; yet, today, all those people happily carry mobile phones through which they can be tracked anywhere they go. The significance of the Active Badge project was to give us a head start of a few years during which to think about location privacy before it affected hundreds of millions of people. (There is more on our early ubiquitous computing work at ORL in this free excerpt from my book.)
The ORL Active Badge

Location privacy is a hard problem to solve, first because ordinary people don’t seem to actually care, and second because there is a misalignment of incentives: those who could do the most to address the problem are the least affected and the least concerned about it. But we have a responsibility to address it, in the same way that designers of new vehicles have a responsibility to address the pollution and energy consumption issue.

Facebook Giving a Bit Too Much Away

Facebook has been serving up public listings for over a year now. Unlike most of the site, anybody can view public listings, even non-members. They offer a window into the Facebook world for those who haven’t joined yet, since Facebook doesn’t allow full profiles to be publicly viewable by non-members (unlike MySpace and others). Of course, this window into Facebook comes with a prominent “Sign Up” button, growth still being the main mark of success in the social networking world. The goal is for non-members to stumble across a public listing, see how many friends are already using Facebook, and then join. Economists call this a network effect, and Facebook is shrewdly harnessing it.

Of course, to do this, Facebook is making public every user’s name, photo, and 8 friendship links. Affiliations with organizations, causes, or products are also listed, I just don’t have any on my profile (though my sister does). This is quite a bit of information given away by a feature many active Facebook user are unaware of. Indeed, it’s more information than the Facebook’s own privacy policy indicates is given away. When the feature was launched in 2007, every over-18 user was automatically opted-in, as have been new users since then. You can opt out, but few people do-out of more than 500 friends of mine, only 3 had taken the time to opt out. It doesn’t help that most users are unaware of the feature, since registered users don’t encounter it.

Making matters worse, public listings aren’t protected from crawling. In fact they are designed to be indexed by search engines. In our own experiments, we were able to download over 250,000 public listings per day using a desktop PC and a fairly crude Python script. For a serious data aggregator getting every user’s listing is no sweat. So what can one do with 200 million public listings?

I explored this question along with Jonathan Anderson, Frank Stajano, and Ross Anderson in a new paper which we presented today at the ACM Social Network Systems Workshop in Nuremberg. Facebook’s public listings give us a random sample of the social graph, leading to some interesting exercises in graph theory. As we describe in the paper, it turns out that this sampled graph allows us to approximate many properties of the complete network surprisingly well: degree and centrality of nodes, small dominating sets, short paths, and community structure. These are all things marketers and sociologists alike would love to know for the complete Facebook graph.

This result leads to two interesting conclusions. First, protecting a social graph is hard. Consistent with previous results, we found that giving away a seemingly small amount can allow much information to be inferred. It’s also been shown that anonymising a social graph is almost impossible.

Second, Facebook is developing a track record of releasing features and then being surprised by the privacy implications, from Beacon to NewsFeed and now Public Search. Analogous to security-critical software, where new code is extensively tested and evaluated before being deployed, social networks should have a formal privacy review of all new features before they are rolled out (as, indeed, should other web services which collect personal information).  Features like public search listings shouldn’t make it off the drawing board.