Category Archives: Academic papers

How much did shutting down McColo help?

On 11 November 2008 McColo, a Californian server hosting company, was disconnected from the Internet. This took the controllers for 6 major botnets offline. It has been widely reported that email spam volumes were markedly reduced for some time thereafter. But did disconnecting McColo only get rid of “easy to block” spam?

In a paper presented this week at the Sixth Conference on Email and Antispam (CEAS), I examined email traffic data for incoming email to a UK ISP to see what effect the disconnection had.
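To give a flavour of the sort of before-and-after comparison involved, here is a minimal sketch in Python. The CSV file name and fields are hypothetical, and the paper's actual analysis of the ISP logs is considerably more careful; this just shows the basic shape of the measurement.

```python
# Hedged sketch: compare mean daily spam volume before and after the
# McColo disconnection. The file "daily_spam.csv" and its columns
# (date, spam_count) are hypothetical, not the dataset used in the paper.
import csv
from datetime import date

CUTOFF = date(2008, 11, 11)  # the day McColo was disconnected

before, after = [], []
with open("daily_spam.csv") as f:
    for row in csv.DictReader(f):
        day = date.fromisoformat(row["date"])
        (before if day < CUTOFF else after).append(int(row["spam_count"]))

print("mean daily spam before:", sum(before) / len(before))
print("mean daily spam after: ", sum(after) / len(after))
```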

The Economics of Privacy in Social Networks

We often think of social networking as synonymous with Facebook, MySpace, and the also-rans, but in reality there are tons of social networks out there, dozens of which have membership in the millions. Around the world it’s quite a competitive market. Sören Preibusch and I decided to study the whole ecosystem to analyse how free-market competition has shaped the privacy practices which I’ve been complaining about. We carefully examined 45 sites, collecting over 250 data points about each site’s privacy policies, privacy controls, data collection practices, and more. The results were fascinating, as we presented this week at the WEIS conference in London. Our full paper and complete dataset are now available online as well.

We collected a lot of data, and there was a little bit of something for everybody. There was encouraging news for fans of globalisation, as we found the social networking concept popular across many cultures and languages, with the most popular sites being available in over 40 languages. There was an interesting finding from a business perspective that photo-sharing may be the killer application for social networks, as this feature was promoted far more often than sharing videos, blogging, or playing games. Unfortunately the news was mostly negative from a privacy standpoint. We found some predictable but still surprising problems. Too much unnecessary data is collected by most sites, with 90% requiring a full name and date of birth. Security practices are dreadful: no sites employed phishing countermeasures, and 80% of sites failed to protect password entry using TLS. Privacy policies were obfuscated and confusing, and almost half failed basic accessibility tests. Privacy controls were confusing and overwhelming, and profiles were almost universally left open by default.

The most interesting story we found, though, was how sites consistently hid any mention of privacy until we read their privacy policies, which touted paid privacy seals and offered strong reassurances about how important privacy is. We developed a novel economic explanation for this: sites appear to craft two different messages for two different populations. Most users care about privacy but don’t think about it in day-to-day life. Sites take care to avoid mentioning privacy to them, because even mentioning privacy positively will cause them to be more cautious about sharing data. This phenomenon is known as “privacy salience”, and it makes sites tread very carefully around privacy, because users must be comfortable sharing data for the site to be fun. Instead of mentioning privacy, new users are shown a huge sample of other users posting fun pictures, which encourages them to share as well. For the privacy fundamentalists who do go looking by reading the privacy policy, though, it is important to drum up privacy reassurance.

The privacy fundamentalists of the world may be positively influencing privacy on major sites through their pressure. Indeed, the bigger, older, and more popular sites we studied had better privacy practices overall. But the desire to limit privacy salience is also a major problem, because it prevents sites from providing clear information about their privacy practices. Most users therefore can’t tell what they’re getting into, resulting in the predominance of poor practices in this “privacy jungle.”

Location privacy

I was recently asked for a brief (4-page) invited paper for a forthcoming special issue of the ACM SIGSPATIAL on privacy and security of location-based systems, so I wrote Foot-driven computing: our first glimpse of location privacy issues.

In 1989 at ORL we developed the Active Badge, the first indoor location system: an infrared transmitter worn by personnel that allowed you to tell which room the wearer was in. Every press and TV reporter who visited our lab worried about the intrusiveness of this technology; yet, today, all those people happily carry mobile phones through which they can be tracked anywhere they go. The significance of the Active Badge project was to give us a head start of a few years during which to think about location privacy before it affected hundreds of millions of people. (There is more on our early ubiquitous computing work at ORL in this free excerpt from my book.)
[Figure: The ORL Active Badge]

Location privacy is a hard problem to solve, first because ordinary people don’t seem to actually care, and second because there is a misalignment of incentives: those who could do the most to address the problem are the least affected and the least concerned about it. But we have a responsibility to address it, in the same way that designers of new vehicles have a responsibility to address the pollution and energy consumption issue.

Security economics video

Here is a video of a talk I gave at DMU on security economics (and the slides). I’ve given variants of this survey talk at various conferences over the past two or three years; at last, one of them recorded the talk and put the video online. There’s also a survey paper that covers much of the same material. If you find this interesting, you might enjoy coming along to WEIS (the Workshop on the Economics of Information Security) on June 24–25.

Temporal Correlations between Spam and Phishing Websites

Richard Clayton and I have been studying phishing website take-down for some time. We monitored the availability of phishing websites, finding that while most phishing websites are removed within a day or two, a substantial minority remain for much longer. We later found that one of the main reasons why so many websites slip through the cracks is that the take-down companies responsible for removal refuse to share their URL lists with each other.

One nagging question remained, however. Do long-lived phishing websites cause any harm? Would removing them actually help? To get that answer, we had to bring together data on the timing of phishing spam transmission (generously shared by Cisco IronPort) with our existing data on phishing website lifetimes. In our paper co-authored with Henry Stern and presented this week at the USENIX LEET Workshop in Boston, we describe how a substantial portion of long-lived phishing websites continue to receive new spam until the website is removed. For instance, fresh spam continues to be sent out for 75% of phishing websites alive after one week, attracting new victims. Furthermore, around 60% of phishing websites still alive after a month keep receiving spam advertisements.
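As a rough sketch of the kind of calculation behind these figures (the data structures and field names are illustrative, not the paper’s actual code), one can count how many sites that survive past a given age still attract fresh spam afterwards:

```python
# Hedged sketch: fraction of phishing sites still alive after `days`
# that receive fresh spam beyond that point. Inputs are illustrative;
# the paper's measurement over the IronPort data is more careful.
from datetime import timedelta

def frac_still_spammed(lifetimes, spam_times, days=7):
    """lifetimes: site -> (first_seen, taken_down) datetimes;
       spam_times: site -> list of spam-sighting datetimes."""
    cutoff = timedelta(days=days)
    survivors = [s for s, (up, down) in lifetimes.items() if down - up > cutoff]
    spammed = [s for s in survivors
               if any(t > lifetimes[s][0] + cutoff
                      for t in spam_times.get(s, []))]
    return len(spammed) / len(survivors) if survivors else 0.0
```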

Consequently, removal of websites by the banks (and the specialist take-down companies they hire) is important. Even when the sites stay up for some time, there is value in continued efforts to get them removed, because this will limit the damage.

However, as we have pointed out before, the take-down companies cause considerable damage by their continuing refusal to share data on phishing attacks with each other, despite our proposals addressing their competitive concerns. Our (rough) estimate of the financial harm due to longer-lived phishing websites was $330 million per year. Given this new evidence of persistent spam campaigns, we are now more confident of this measure of harm.

There are other interesting insights discussed in our new paper. For instance, phishing attacks can be broken down into two main categories: ordinary phishing hosted on compromised web servers and fast-flux phishing hosted on a botnet infrastructure. It turns out that fast-flux phishing spam is more tightly correlated with the uptime of the associated phishing host. Most spam is sent out around the time the fast-flux website first appears and stops once the website is removed. For phishing websites hosted on compromised web servers, there is much greater variation between the time a website appears and when the spam is sent. Furthermore, fast-flux phishing accounted for 68% of the total email spam detected by IronPort, despite fast-flux sites making up only 3% of all the phishing websites.

So there seems to be a cottage industry of fairly disorganized phishing attacks, with perhaps a few hundred people involved. Each compromises a small number of websites, while sending a small amount of spam. Conversely there are a small number of organized gangs who use botnets for hosting, send most of the spam, and are extremely efficient on every measure we consider. We understand that the police are concentrating their efforts on the second set of criminals. This appears to be a sound decision.

Facebook Giving a Bit Too Much Away

Facebook has been serving up public listings for over a year now. Unlike most of the site, anybody can view public listings, even non-members. They offer a window into the Facebook world for those who haven’t joined yet, since Facebook doesn’t allow full profiles to be publicly viewable by non-members (unlike MySpace and others). Of course, this window into Facebook comes with a prominent “Sign Up” button, growth still being the main mark of success in the social networking world. The goal is for non-members to stumble across a public listing, see how many friends are already using Facebook, and then join. Economists call this a network effect, and Facebook is shrewdly harnessing it.

Of course, to do this, Facebook is making public every user’s name, photo, and 8 friendship links. Affiliations with organizations, causes, or products are also listed; I just don’t have any on my profile (though my sister does). This is quite a bit of information given away by a feature many active Facebook users are unaware of. Indeed, it’s more information than Facebook’s own privacy policy indicates is given away. When the feature was launched in 2007, every over-18 user was automatically opted in, as have been new users since then. You can opt out, but few people do: out of more than 500 friends of mine, only 3 had taken the time to opt out. It doesn’t help that most users are unaware of the feature, since registered users don’t encounter it.

Making matters worse, public listings aren’t protected from crawling. In fact, they are designed to be indexed by search engines. In our own experiments, we were able to download over 250,000 public listings per day using a desktop PC and a fairly crude Python script. For a serious data aggregator, getting every user’s listing is no sweat. So what can one do with 200 million public listings?
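Before turning to that question, here is a hedged sketch of how simple such crawling can be. The URL pattern and ID scheme are invented for illustration, and this is not the script we actually used; the point is only that no special infrastructure is needed.

```python
# Hedged sketch of a public-listing crawler of the sort described above.
# The base URL and numeric ID scheme are hypothetical.
import time
import urllib.request

BASE = "https://www.example-social-site.com/people/{}"  # hypothetical

def fetch_listing(user_id):
    req = urllib.request.Request(BASE.format(user_id),
                                 headers={"User-Agent": "research-crawler"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

for uid in range(1000):          # sweep a range of candidate IDs
    try:
        page = fetch_listing(uid)
        # ... parse out the name, photo URL and 8 listed friends here ...
    except Exception:
        continue                 # missing or private listing: skip it
    time.sleep(0.3)              # ~3 requests/second is over 250,000/day
```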

I explored this question along with Jonathan Anderson, Frank Stajano, and Ross Anderson in a new paper which we presented today at the ACM Social Network Systems Workshop in Nuremberg. Facebook’s public listings give us a random sample of the social graph, leading to some interesting exercises in graph theory. As we describe in the paper, it turns out that this sampled graph allows us to approximate many properties of the complete network surprisingly well: degree and centrality of nodes, small dominating sets, short paths, and community structure. These are all things marketers and sociologists alike would love to know for the complete Facebook graph.
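To give a flavour of why this works, the sketch below is an illustrative simulation, not the paper’s methodology or data: it builds a synthetic social graph with networkx, has each node reveal at most 8 friends as a public listing would, and checks how well the sampled graph preserves the identity of the high-degree nodes.

```python
# Illustrative simulation (not the paper's code): does a graph sampled
# from 8-friend public listings preserve who the high-degree nodes are?
import random
import networkx as nx

random.seed(0)
G = nx.barabasi_albert_graph(n=5000, m=5)  # synthetic "social" graph

# Build the sampled graph: each user reveals at most 8 random friends.
S = nx.Graph()
S.add_nodes_from(G.nodes())
for u in G.nodes():
    friends = list(G.neighbors(u))
    for v in random.sample(friends, min(8, len(friends))):
        S.add_edge(u, v)

# Compare the top-100 nodes by degree in the true vs. sampled graph.
top_true = {n for n, _ in sorted(G.degree, key=lambda x: -x[1])[:100]}
top_sampled = {n for n, _ in sorted(S.degree, key=lambda x: -x[1])[:100]}
print("overlap of top-100 high-degree nodes:", len(top_true & top_sampled))
```

The intuition is that a popular user appears in many other users’ listings, so even though each listing reveals only 8 links, hubs remain clearly visible in the sampled graph.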

This result leads to two interesting conclusions. First, protecting a social graph is hard. Consistent with previous results, we found that giving away a seemingly small amount can allow much information to be inferred. It’s also been shown that anonymising a social graph is almost impossible.

Second, Facebook is developing a track record of releasing features and then being surprised by the privacy implications, from Beacon to News Feed and now Public Search. Analogous to security-critical software, where new code is extensively tested and evaluated before being deployed, social networks should have a formal privacy review of all new features before they are rolled out (as, indeed, should other web services which collect personal information). Features like public search listings shouldn’t make it off the drawing board.

The Snooping Dragon

There’s been much interest today in a report that Shishir Nagaraja and I wrote on Chinese surveillance of the Tibetan movement. In September last year, Shishir spent some time cleaning out Chinese malware from the computers of the Dalai Lama’s private office in Dharamsala, and what we learned was somewhat disturbing.

Later, colleagues from the University of Toronto followed through by hacking into one of the control servers Shishir identified (something we couldn’t do here because of the Computer Misuse Act); their report relates how the attackers had controlled malware on hundreds of other PCs, many in government agencies of countries such as India, Vietnam and the Philippines, but also in US firms such as AP and Deloitte.

The story broke today in the New York Times; see also coverage in the Telegraph, the BBC, CNN, the Times of India, AP, InfoWorld, Wired and the Wall Street Journal.

Hot Topics in Privacy Enhancing Technologies (HotPETs 2009)

HotPETs – the 2nd Hot Topics in Privacy Enhancing Technologies (co-located with PETS) will be held in Seattle, 5–7 August 2009.

HotPETs is the forum for new ideas on privacy, anonymity, censorship resistance, and related topics. Work-in-progress is welcomed, and the format of the workshop will be to encourage feedback and discussion. Submissions are especially encouraged on the human side of privacy: what do people believe about privacy? How does privacy work in existing institutions?

Papers (up to 15 pages) are due by 8 May 2009. Further information can be found in the call for papers.

Optimised to fail: Card readers for online banking

A number of UK banks are distributing hand-held card readers for authenticating customers, in the hope of stemming the soaring levels of online banking fraud. As the underlying protocol — CAP, the Chip Authentication Program — is secret, we reverse-engineered the system and discovered a number of security vulnerabilities. Our results have been published as “Optimised to fail: Card readers for online banking”, by Saar Drimer, Steven J. Murdoch, and Ross Anderson.

In the paper, presented today at Financial Cryptography 2009, we discuss the consequences of CAP having been optimised to reduce both the costs to the bank and the amount of typing done by customers. While the principle of CAP — two factor transaction authentication — is sound, the flawed implementation in the UK puts customers at risk of fraud, or worse.
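To fix ideas, here is a toy sketch of the general shape of such a scheme: a one-time code derived from a MAC over the transaction details, truncated to something a customer can type. This is emphatically not the real CAP algorithm, which is secret and built on EMV cryptograms; the key, message fields and truncation are all invented for illustration.

```python
# Toy illustration of a CAP-style one-time code: a MAC over the
# transaction details, truncated to 8 decimal digits. NOT the real
# CAP algorithm; key, fields, and truncation are all invented.
import hmac
import hashlib

def toy_cap_response(card_key: bytes, challenge: str, amount: str) -> str:
    msg = (challenge + "|" + amount).encode()
    mac = hmac.new(card_key, msg, hashlib.sha256).digest()
    # Truncate the MAC to an 8-digit code the customer types into the site.
    return str(int.from_bytes(mac[:4], "big") % 10**8).zfill(8)

print(toy_cap_response(b"\x00" * 16, "12345678", "150.00"))
```

Binding the code to the challenge and amount is what makes the scheme “transaction authentication” rather than a bare one-time password; the paper shows how the UK implementation weakens this binding.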

When Chip & PIN was introduced for point-of-sale transactions, the effective liability for fraud was shifted to customers. While the Banking Code says that customers are not liable unless they were negligent, it is up to the bank to define negligence. In practice, the mere fact that Chip & PIN was used is considered enough. Now that Chip & PIN is used for online banking, we may see a similar reduction in consumer protection.

Further information can be found in the paper and the talk slides.