All posts by Richard Clayton

ePolicing – Tomorrow the world?

This week has finally seen an announcement that the Police Central e-crime Unit (PCeU) is to be funded by the Home Office. However, the largesse amounts to just £3.5 million of new money spread over three years, with the Met putting up a further £3.9 million — but whether the Met’s contribution is “new” or reflects a move of resources from their existing Computer Crime Unit I could not say.

The announcement is of course Good News — because once the PCeU is up and running next Spring, it should plug (to the limited extent that £2 million a year can plug) the “level 2” eCrime gap that I’ve written about before. viz: that SOCA tackles “serious and organised crime” (level 3), your local police force tackles local villains (level 1), but if criminals operate outside their force’s area — and on the Internet this is more likely than not — yet they don’t meet SOCA’s threshold, then who is there to deal with them?

In particular, the PCeU is envisaged to be the unit that deals with the intelligence packages coming from the City of London Fraud Squad’s new online Fraud Reporting website (once intended to launch in November 2008, now scheduled for Summer 2009).

Of course everyone expects the website to generate more reports of eCrime than could ever be dealt with (even with much more money), so the effectiveness of the PCeU in dealing with eCriminality will depend upon their prioritisation criteria, and how carefully they select the cases they tackle.

Nevertheless, although the news this week shows that the Home Office have finally understood the need to fund more ePolicing, I don’t think that they are thinking about the problem in a sufficiently global context.

A little history lesson might be in order to explain why.

Root of Trust ?

I’ve given some talks this year about the Internet’s insecure infrastructure — stressing that fundamental protocols such as BGP and DNS cannot really be trusted at the moment. Although they work just fine most of the time, they are susceptible to attacks which can mean, for example, that you visit the wrong website, or your email is intercepted.

Steps are now being taken, rather faster since Dan Kaminsky came up with a really effective DNS poisoning attack, to secure DNS by using DNSSEC.

The basic idea of DNSSEC is that when you get an answer from the DNS it will be signed by someone you trust. At some point the “trust anchor” for the system will be “.” the DNS root, but for the moment there’s just a handful of “trust anchors” one level down from that. One such anchor is the “.se” country code domain for Sweden. Additionally, Brazil (.br), Puerto Rico (.pr), and Bulgaria (.bg) have signed their zones, but that’s about it for today.
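For the curious, here is a rough sketch (using the dnspython library) of what asking for signed answers from the .se zone looks like. The resolver address is a placeholder, and this only shows the signatures coming back rather than validating them up to a trust anchor:

```python
import dns.message
import dns.query
import dns.rdatatype

# Hedged sketch: ask a DNSSEC-aware recursive resolver for the DNSKEY records
# of the signed .se zone, with the DNSSEC-OK bit set, and print the RRSIG
# signatures that come back alongside the keys.  Validating the chain up to a
# trust anchor is a bigger job and is not attempted here.
RESOLVER = "192.0.2.53"  # placeholder: substitute your own resolver's address

query = dns.message.make_query("se.", dns.rdatatype.DNSKEY, want_dnssec=True)
response = dns.query.tcp(query, RESOLVER, timeout=5)

for rrset in response.answer:
    if rrset.rdtype == dns.rdatatype.RRSIG:
        print("signatures:", rrset)
    else:
        print("keys:", rrset)
```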

So, wishing to get some experience with the brave new world of DNSSEC, I decided that Sweden was the “in” place to be, and to purchase “cloudba.se” and roll out my first DNSSEC signed domain.

The purchase wasn’t as easy as it might have been — when you buy a domain, Sweden insists that individuals provide their identity numbers (albeit they have absolutely no way of checking whether you’re telling the truth), or, for a company, a VAT or registration number (which is checkable, albeit I suspect they didn’t bother). I also found that they don’t like spaces in the VAT number — which held things up for a while!

However, eventually they sent me a PGP signed email to tell me I was now the proud owner of “cloudba.se”. Unfortunately, this email wasn’t in RFC3156 PGP/MIME format (or any other format that my usually pretty capable email client understood).

The email was signed with key 0xF440EE9B, which was reassuring because the .se registry gives the fingerprint for this key on their website here. Rather less reassuringly, the footnote (*) next to the fingerprint says “.SE signature for outgoing e-mail. (**) June 1 through August 31.” (the (**) indicates a second level of footnote, which is absent — and of course it is now September).

They also enable you to fetch the key through a link on this page to their “PGP nyckel-ID” at http://subkeys.pgp.net.

Unfortunately, fetching the key shows that the signature on the email is invalid. [Update 1 Oct: I’ve finally now managed to validate it, see comment.]

Since the email seems to have originated in the Windows world, but was signed on a Linux box (giving it a mixture of 0D 0A and 0A line endings), and was then pushed through a three-year-old copy of MIME-tools, I suppose the failure isn’t too surprising. But strictly the invalid signature means that I shouldn’t trust the email’s contents at all — because the contents have definitely been tampered with since the signature was applied.
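If you want to experiment with this sort of failure yourself, a crude approach (sketched below; the filename is a placeholder and gpg is assumed to already hold the registry’s key) is to re-run the verification with the line endings forced one way and then the other:

```python
import subprocess

# Hedged sketch: verify an inline/clearsigned PGP message as received, then
# again with line endings forced to bare LF and to CRLF, since mixed endings
# are exactly the sort of mangling that breaks a signature.
with open("registry-mail.txt", "rb") as f:   # placeholder filename
    raw = f.read()

lf_only = raw.replace(b"\r\n", b"\n")
crlf = lf_only.replace(b"\n", b"\r\n")

for label, body in (("as received", raw), ("LF only", lf_only), ("CRLF", crlf)):
    result = subprocess.run(["gpg", "--verify"], input=body, capture_output=True)
    print(label, "->", "good signature" if result.returncode == 0 else "bad signature")
```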

Since the point of the email was to get me to login for the first time to the registry website and set my password to control the domain, this is a little unfortunate.

Even if the signature had been correct, should I trust the PGP key?

Well, it is pointed to from the registry website, which is a Good Thing. However, they do themselves no favours by referencing a version on the public key servers. I checked who had signed the key (which is an alternative way of trusting its provenance — since the email had arrived at a non-DNSSEC-secured domain). It turned out there was no-one I knew, and of the 4 individual signatures, 2 were from expired keys. The other signature was from the IIS root key — which sounds promising. That has 8 signatures, once again not from people I know — but only 1 from a non-expired key, so perhaps I can get to know some of the other 7?

Of course, anyone can sign a key on a public key server, so perhaps it makes sense for .se to suggest that people fetch a key with as many signatures as possible — there’s more chance of it being signed by someone they know. Anyway, I have now added my own signature, using an email address at my nice shiny new domain. However, it is possible that I may not have increased the level of trust 🙁

An A to Z of confusion

A few days ago I blogged about my paper on email spam volumes — comparing “aardvarks” (email local parts [left of the @] beginning with “A”) with “zebras” (those starting with a “Z”).

I observed that provided one considered “real” aardvarks and zebras — addresses that received good email amongst the spam — then aardvarks got 35% spam and zebras a mere 20%.

This has been widely picked up, first in the Guardian, and later in many other papers as well (even in Danish). However, many of these articles have got hold of the wrong end of the stick. So besides mentioning A and Z, it looks as if I should have published this figure from the paper as well…

[Figure 3 from the academic paper: percentage of incoming email that is spam, by first letter of the local part]

… the point being that the effect I am describing has little to do with Z being at the end of the alphabet, and A at the front, but seems to be connected to the relative rarity of zebras.

As you can see from the figure, marmosets and pelicans get around 42% spam (M and P being popular letters for people’s names) and quaggas 21% (there are very few Quentins, just as there are very few Zacks).

There are some outliers in the figure: for example, “3” relates to spammers failing to parse HTML properly and ending up with “3c” (a < character) at the start of names. However, it isn’t immediately apparent why “unicorns” get quite so much spam; it may just be a quirk of the way that I have assessed “realness”. Doubtless some future research will be able to explain this more fully.

Zebras and Aardvarks

We all know that different people get different amounts of email “spam”. Some of these differences result from how careful people have been in hiding their address from the spammers — putting it en clair on a webpage will definitely improve your chances of receiving unsolicited email.

However, it turns out there are other effects as well. In a paper I presented last week to the Fifth Conference on Email and Anti-Spam (CEAS 2008), I showed that the first letter of the local part of the email address also plays a part.

Incoming email to Demon Internet where the email address local part (the bit left of the @) begins with “A” (think of these as aardvarks) is almost exactly 50% spam and 50% non-spam. However, where the local part begins with “Z” (zebras) then it is about 75% spam.

However, if one only considers “real” aardvarks and zebras, viz: where a particular email address was legitimate enough to receive some non-spam email, then the picture changes. If one treats an email address as “real” if there’s one non-spam email on average every second day, then real aardvarks receive 35% spam, but real zebras receive only 20% spam.
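In case it helps to see the shape of the calculation, here is a small sketch of the kind of tally involved. It is not the analysis code from the paper: the input format, filename and 60-day sample length are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Illustrative sketch only, not the analysis code from the paper.  The input
# is assumed to be lines of "address,verdict" (verdict being "spam" or "ham")
# covering a 60-day sample; an address counts as "real" if it averaged at
# least one non-spam email every second day, i.e. 30 or more ham messages.
SAMPLE_DAYS = 60
per_address = defaultdict(lambda: [0, 0])        # local part -> [spam, ham]

with open("maillog.csv") as f:                   # hypothetical input file
    for line in f:
        address, verdict = line.strip().split(",")
        local = address.split("@")[0].lower()
        if local:
            per_address[local][0 if verdict == "spam" else 1] += 1

by_letter = defaultdict(lambda: [0, 0])          # first letter -> [spam, ham]
for local, (spam, ham) in per_address.items():
    if ham >= SAMPLE_DAYS // 2:                  # keep only the "real" addresses
        by_letter[local[0]][0] += spam
        by_letter[local[0]][1] += ham

for letter in sorted(by_letter):
    spam, ham = by_letter[letter]
    print(f"{letter}: {100 * spam / (spam + ham):.1f}% spam")
```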

The most likely reason for these results is the prevalence of “dictionary” or “Rumpelstiltskin” attacks (where spammers guess addresses). If there are not many other zebras, then guessing zebra names is less likely.

Aardvarks should consider changing species — or asking their favourite email filter designer to think about how this unexpected empirical result can be leveraged into blocking more of their unwanted email.

[[[ ** Note that these percentages are way down from general spam rates because Demon rejects out of hand email from sites listed in the PBL (which are not expected to send email) and greylists email from sites in the ZEN list. This reduces overall volumes considerably — so YMMV! ]]]

An insecurity in OpenID, not many dead

Back in May it was realised that, thanks to an ill-advised change to some random number generation code, for over 18 months Debian systems had been generating crypto keys chosen from a set of 32,768 possibilities, rather than from billions and billions. Initial interest centred around the weakness of SSH keys, but in practice lots of different applications were at risk (see long list here).

In particular, SSL certificates (as used to identify https websites) might contain one of these weak keys — and so it would be possible for an attacker to successfully impersonate a secure website. Of course the attacker would need to persuade you to mistakenly visit their site — but it just so happens that one of the more devastating attacks on DNS has recently been discovered; so that’s not as unlikely as it must have seemed back in May.

Anyway, my old friend Ben Laurie (who is with Google these days) and I have been trawling the Internet to determine how many certificates there are containing these weak keys — and there’s a lot: around 1.5% of the certs we’ve examined.
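The sort of test involved is easy enough to sketch. The snippet below is illustrative only: the blacklist file and its format (one SHA-1 hash of the modulus per line) are assumptions, and real tools such as openssl-vulnkey use their own formats, but the principle of comparing each certificate’s modulus against a precomputed list of the 32,768 weak possibilities is the same.

```python
import hashlib
import subprocess

# Illustrative sketch: extract a certificate's RSA modulus with the openssl
# command line tool, hash it, and look the hash up in a precomputed list of
# weak (Debian-generated) moduli.  The blacklist file and its format are
# assumptions made for this example.
modulus = subprocess.run(
    ["openssl", "x509", "-in", "site.pem", "-noout", "-modulus"],
    capture_output=True, text=True, check=True,
).stdout.strip()                     # a line of the form "Modulus=ABCD..."

digest = hashlib.sha1(modulus.encode()).hexdigest()

with open("weak_moduli.sha1") as f:  # hypothetical precomputed blacklist
    weak = {line.strip() for line in f}

print("WEAK (Debian bug)" if digest in weak else "not in the blacklist")
```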

But more on that another day, because earlier this week Ben spotted that one of the weak certs was for Sun’s “OpenID” website, and that two more OpenID sites were weak as well (by weak we mean that a database lookup could reveal the private key!).

OpenID, for those who are unfamiliar with it, is a scheme that allows you to prove your identity to site A (viz: provide your user name and password) and then use that identity on site B. There’s a queue of people offering the first bit, but rather fewer offering the second: because it means you rely on someone else’s due diligence in knowing who their users are — where “who” is a hard sort of thing to get your head around in an online environment.

The problem that Ben and I have identified (advisory here), is that an attacker can poison a DNS cache so it serves up the wrong IP address for openid.sun.com. Then, even if the victim is really cautious and uses https and checks the cert, their credentials can be phished. Thereafter, anyone who trusts Sun as an identity provider could be very disappointed. There’s other attacks as well, but you’ve probably got the general idea by now.

In principle Sun should make a replacement certificate and that should be it (and so they have — read Robin Wilton’s comments here). Except that they need to put the old certificate onto a Certificate Revocation List (CRL), because otherwise it will still be trusted from now until it expires (a fair while off). Sadly, many web browsers and most of the OpenID codebases haven’t bothered with CRLs (or they don’t enable checking by default, so it’s as if the CRL wasn’t there for most users).
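Checking a CRL is not hard, which makes the lack of support all the more disappointing. Here is a minimal sketch using the Python “cryptography” library; the filenames are placeholders and, for brevity, it does not verify the CRL’s own signature:

```python
from cryptography import x509
from cryptography.hazmat.backends import default_backend

# Minimal sketch: is this certificate's serial number listed on the CRL we
# downloaded?  Filenames are placeholders; the CRL's signature is not checked.
with open("ca.crl", "rb") as f:
    crl = x509.load_der_x509_crl(f.read(), default_backend())
with open("openid-server.pem", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())

entry = crl.get_revoked_certificate_by_serial_number(cert.serial_number)
print("revoked" if entry is not None else "not listed on this CRL")
```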

One has to conclude that Sun (and the other two providers) should not be trusted by anyone for quite a while to come. But does that matter ? Since OpenID didn’t promise all that much anyway, does a serious flaw (which does require a certain amount of work to construct an attack) make any difference? At present this looks like the modern equivalent of a small earthquake in Chile.

Additional: Sun’s PR department tell me that the dud certificate has indeed been revoked with Verisign and placed onto the CRL. Hence any system that checks the CRL cannot now be fooled.

Listening to the evidence

Last week the House of Commons Culture, Media and Sport Select Committee published a report of their inquiry into “Harmful content on the Internet and in video games“. They make a number of recommendations including a self-regulatory body to set rules for Internet companies to force them to protect users; that sites should provide a “watershed” so that grown-up material cannot be viewed before 9pm; that YouTube should screen material for forbidden content; that “suicide websites” should be blocked; that ISPs should be forced to block child sexual abuse image websites whatever the cost, and that blocking of bad content was generally desirable.

You will discern a certain amount of enthusiasm for blocking, and for a “something must be done” approach. However, in coming to their conclusions, they do not, in my view, seem to have listened too hard to the evidence, or sought out expertise elsewhere in the world…

Personal Internet Security: follow-up report

The House of Lords Science and Technology Committee have just completed a follow-up inquiry into “Personal Internet Security”, and their report is published here. Once again I have acted as their specialist adviser, and once again I’m under no obligation to endorse the Committee’s conclusions — but they have once again produced a useful report with sound conclusions, so I’m very happy to promote it!

Their initial report last summer, which I blogged about at the time, was — almost entirely — rejected by the Government last autumn (blog article here).

The Committee decided that in the light of the Government’s antipathy they would hold a rapid follow-up inquiry to establish whether their conclusions were sound or whether the Government was right to turn them down, and indeed, given the speed of change on the Internet, whether their recommendations were still timely.

The written responses broadly endorsed the Committee’s recommendations, with the main areas of controversy being liability for software vendors, making the banks statutorily responsible for phishing/skimming fraud, and how such fraud should be reported.

There was one oral session where, to everyone’s surprise, two Government ministers turned up and were extremely conciliatory. Baroness Vadera (BERR) said that the report “was somewhat more interesting than our response” and Vernon Coaker (Home Office) apologised to the Committee “if they felt that our response was overdefensive”, adding “the report that was produced by this Committee a few months ago now has actually helped drive the agenda forward and certainly the resubmission of evidence and the re-thinking that that has caused has also helped with respect to that. So may I apologise to all of you; it is no disrespect to the Committee or to any of the members.”

I got the impression that the ministers were more impressed with the Committee’s report than were the civil servants who had drafted the Government’s previous formal response. Just maybe, some of my comments made a difference?

Given this volte-face, the Committee’s follow-up report is also conciliatory, whilst recognising that the new approach is very much in the “jam tomorrow” category — we will all have to wait to see if they deliver.

The report is still in favour of software vendor liability as a long term strategy to improving software security, and on a security breach notification law the report says “we hold to our view that data security breach notification legislation would have the twin impacts of increasing incentives on businesses to avoid data loss, and should a breach occur, giving individuals timely information so that they can reduce the risk to themselves“. The headlines have been about the data lost by the Government, but recent figures from the ICO show that private industry is doing pretty badly as well.

The report also revisits the recommendations relating to banking, reiterating the committee’s view that “the liability of banks for losses incurred by electronic fraud should be underpinned by legislation rather than by the Banking Code”. The reasoning is simple: the banks choose the security mechanisms and how much effort they put into detecting patterns of fraud, so they should stand the losses if these systems fail. Holding individuals liable for succumbing to ever more sophisticated attacks is neither fair nor economically efficient. The Committee also remained concerned that where fraud does take place, reports are made to the banks, who then choose whether or not to forward them to the police. They describe this approach as “wholly unsatisfactory and that it risks undermining public trust in the police and the Internet”.

This is quite a short report, a mere 36 paragraphs, but it comes bundled with the responses received, all of which, from Ross Anderson and Nicholas Bohm through to the Metropolitan Police and Symantec, are well worth reading to understand more about a complex problem, yet one where we’re beginning to see the first glimmers of consensus as to how best to move forward.

Slow removal of child sexual abuse image websites

On Friday last week The Guardian ran a story on an upcoming research paper by Tyler Moore and myself which will be presented at the WEIS conference later this month. We had determined that child sexual abuse image websites were removed from the Internet far more slowly than any other category of content we looked at, excepting illegal pharmacies hosted on fast-flux networks; and we’re unsure if anyone is seriously trying to remove them at all!

Twisty little passages, all alike

Last month, on the 4th April, I published a document describing how the Phorm system worked and blogged about what I thought of the scheme. The document had been run past Phorm’s technical people to ensure it was correct, but — it turns out — there were still a handful of errors in it. A number of helpful people pointed out that I’d misdescribed third-party cookies (which didn’t matter much because Phorm specifically uses first-party cookies), and I’d managed to reference RFC2695 rather than RFC2965 !

In my original document, I’d waved my hands a little bit about how the system worked if people had blocked cookies for specific domains, and so I swapped some more email with Phorm to better understand, and then published a revised version on 23rd April — so that the correct information would be available to accompany FIPR’s press release and paper on the various laws that the Phorm system breaks. However, there was one final thing that wasn’t dealt with by press time, and that’s now been explained to me….

The Phorm system does some of its tracking magic by redirecting browser requests using HTTP 307 responses. When this was first explained to me at the meeting with Phorm there were two redirections (a scan of my notes is here), but having thought about this for a while, I asked for it to be explained to me again later on, and it turned out that I had previously been misled, and that there were in fact three redirections (here’s my notes of this part of the meeting).

It now turns out, following my further emails with Phorm, that there are in fact FOUR redirections occurring! This is not because my notes are rubbish — but because Phorm have managed to recall more of the detail of their own system!

For full details of how I understand the system works (at least until some more detail comes to light), see the latest version of my explanatory document, but to give you a flavour of it, consider an example visit to www.cnn.com:

  • The user wants to visit www.cnn.com, but their request does not contain a cookie (for www.cnn.com) with a Phorm unique identifier within it. They are redirected (ONE) by the Phorm system to www.webwise.net.
  • The user visits webwise.net by following the redirection. If they do not have a Phorm identifier cookie, then they will be issued with a new identifier and redirected (TWO) elsewhere on webwise.net.
  • The user visits webwise.net for the second time. If they still don’t have a Phorm identifier cookie then their IP address is marked as wishing to opt-out and they will be redirected to www.cnn.com and they won’t be redirected again for at least 30 minutes. If they do have a cookie (or if they had one at the previous stage) they are redirected (THREE) to a special URL within www.cnn.com.
  • The user visits the special URL, which the Phorm system redirects to a fake version of www.cnn.com that sets a www.cnn.com cookie with their Phorm identifier in it, and redirects (FOUR) them to the URL they wanted to visit all along.

For the moment, this appears to be the grand total; there can be up to four redirections, and it is deducible from this description what happens if you refuse (or delete) cookies in the webwise.net and www.cnn.com domains. It is also apparent that if you resolve webwise.net to 127.0.0.1 then you’ll never get past the first redirection; and you will need to rely on the Phorm system spotting these repeated failures and turning off redirection for your IP address.
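If you want to watch the chain for yourself, something along the lines of the sketch below will print each hop. It is only a sketch: it does not store and resend cookies between hops (which the real chain depends upon), it assumes the Location headers are absolute URLs, and on a line without the Phorm equipment you will simply get the page straight back.

```python
import http.client
from urllib.parse import urlsplit

# Rough probe: fetch a URL, print the status plus any Location and Set-Cookie
# headers, and follow redirects up to a small limit.  Cookie handling is
# omitted, so on a Phorm-enabled line this would fall into the opt-out path
# described in the post rather than completing the full four-hop dance.
url = "http://www.cnn.com/"
for hop in range(6):
    parts = urlsplit(url)
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    conn.request("GET", path)
    resp = conn.getresponse()
    print(hop, resp.status, parts.hostname + path)
    for name, value in resp.getheaders():
        if name.lower() in ("location", "set-cookie"):
            print("    " + name + ":", value)
    if resp.status not in (301, 302, 303, 307):
        break
    url = resp.getheader("Location")
    conn.close()
```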

direct adjective: Straightforward in manner or conduct; upright, honest.

indirect adjective: Mechanism by which Phorm fools your system into accepting tracking cookies from third-party websites, even when those websites promise never to track you!

Stealing Phorm Cookies

Last week I gave a talk at the 80/20 Thinking organised “town hall meeting” about the Phorm targeted advertising system. You can see my slides here, and eventually there will be some video here.

One of the issues I talked about was the possibility of stealing Phorm’s cookies, which I elaborate upon in this post. I have written about Phorm’s system before, and you can read a detailed technical explanation, but for the present, what it is necessary to know is that through some sleight-of-hand, users whose ISPs deploy Phorm will end up with tracking cookies stored on their machine, one for every website they visit, but with each containing an identical copy of their unique Phorm tracking number.

The Phorm system strips out these cookies when it can, but the website can access them anyway, either by using some straightforward JavaScript to read their value and POST it back, or by the simple expedient of embedding an https image ( <img src="https://..." /> ) within their page. The Phorm system will not be able to remove the cookie from an encrypted image request.
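To make the https-image point concrete, here is a hypothetical sketch of the website’s side of it: a trivial handler that answers the image request and logs whatever Cookie header arrives. Because the request is encrypted end to end, an in-path system has no opportunity to strip the tracking cookie out. TLS setup (wrapping the socket with a certificate) and returning a real 1x1 image are omitted for brevity.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sketch of a site logging the cookies that arrive with an
# (encrypted) image request.  A real deployment would wrap the socket with
# the ssl module and a certificate, and return an actual 1x1 image.
class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        print("cookies received:", self.headers.get("Cookie"))
        self.send_response(204)   # "no content" stands in for the image here
        self.end_headers()

HTTPServer(("", 8443), PixelHandler).serve_forever()
```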

Once the website has obtained the Phorm cookie value, then in countries outside the European Union where such things are allowed (almost expected!), the unique tracking number can be combined with any other information the website holds about its visitor, and sold to the highest bidder, who can collate this data with anything else they know about the holder of the tracking number.

Of course, the website can do this already with any signup information that has been provided, but the only global tracking identifier it has is the visiting IP address, and most consumer ISPs give users new IP addresses every few hours or few days. In contrast, the Phorm tracking number will last until the user decides to delete all their cookies…

A twist on this was suggested by “Barrie” in one of the comments to my earlier post. If the remote website obtains an account at the visitor’s ISP (BT, Talk Talk or Virgin in the UK), then they can construct an advert request to the Phorm system, using the Phorm identifier of one of their visitors. By inspecting the advert they receive, they will learn what Phorm thinks will interest that visitor. They can then sell this information on, or serve up their own targeted advert. Essentially, they’re reverse engineering Phorm’s business model.

There are of course things that Phorm can do about these threats, by appropriate use of encryption and traffic analysis. Whether making an already complex system still more complex will assist in the transparency they say they are seeking is, in my view, problematic.