Text mining is harder than you think

Following last year’s row about Apple’s proposal to scan all the photos on your iPhone camera roll, EU Commissioner Johansson proposed a child sex abuse regulation that would compel providers of end-to-end encrypted messaging services to scan all messages in the client, and not just for historical abuse images but for new abuse images and for text messages containing evidence of grooming.

Now that journalists are distracted by the imminent downfall of our great leader, the Home Office seems to think this is a good time to propose some amendments to the Online Safety Bill that will have a similar effect. And while the EU planned to win the argument against the paedophiles first and then expand the scope to terrorist radicalisation and recruitment, Priti Patel goes for the terrorists from day one. There’s some press coverage in the Guardian and the BBC.

We explained last year why client-side scanning is a bad idea. However, the shift of focus from historical abuse images to text scanning makes the government story even less plausible.

Detecting online wickedness from text messages alone is hard. Since 2016, we have collected over 99m messages from cybercrime forums and over 49m from extremist forums, and these corpora are used by 179 licensees in 55 groups from 42 universities in 18 countries worldwide. Detecting hate speech is a good proxy for detecting terrorist radicalisation. In 2018, we thought we could detect hate speech with an accuracy of typically 92%, which implies a false-alarm rate of around 8%. But the more complex models of 2022, based on Google’s BERT, don’t do significantly better when tested on the better collections we have now; indeed, now that we understand the problem in more detail, they often do worse. Do read that paper if you want to understand why hate-speech detection is an interesting scientific problem. Some specific kinds of hate speech are even harder to detect; an example is anti-semitism, thanks to the large number of synonyms for Jewish people. So if we were to scan 10bn messages a day in Europe, there would be maybe a billion false alarms for Europol to look at.
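To see why even that error rate is hopeless at this scale, here is a back-of-the-envelope calculation using the round numbers above. This is a minimal sketch; the prevalence figure is my own illustrative assumption, not a measured value.

```python
# Back-of-the-envelope: alarms generated by scanning at Internet scale.
# The volume and error rates are the round numbers from the text above;
# the prevalence of genuinely bad content is an illustrative assumption.

messages_per_day = 10_000_000_000   # 10bn messages scanned per day in Europe
false_alarm_rate = 0.08             # ~8% of benign messages wrongly flagged
true_positive_rate = 0.92           # ~92% of bad messages correctly flagged
prevalence = 1e-6                   # ASSUMED: 1 in a million messages is actually bad

bad = messages_per_day * prevalence
benign = messages_per_day - bad

false_alarms = benign * false_alarm_rate
true_alarms = bad * true_positive_rate

print(f"false alarms/day: {false_alarms:,.0f}")   # ~800,000,000
print(f"true alarms/day:  {true_alarms:,.0f}")    # ~9,200
print(f"share of alarms that are false: {false_alarms / (false_alarms + true_alarms):.4%}")
```

Even with a generous assumed prevalence, well over 99.99% of the alarms landing on an investigator’s desk would be false.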

We’ve been scanning the Internet for wickedness for over fifteen years now, and looking at various kinds of filters for everything from spam to malware. Filtering requires very low false positive rates to be feasible at Internet scale, which means either looking for very specific things (such as indicators of compromise by a specific piece of malware) or by having rich metadata (such as a big spam run from some IP address space you know to be compromised). Whatever filtering Facebook can do on Messenger given its rich social context, there will be much less that a WhatsApp client can do by scanning each text on its way through.
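To illustrate the difference: matching exact indicators of compromise, such as known-bad file digests, has an essentially zero false-positive rate, which is what makes it workable at scale, unlike a probabilistic text classifier. A minimal sketch follows; the "known-bad" digest here is just the SHA-256 of the empty string, standing in for a real indicator.

```python
import hashlib

# Sketch: exact-match filtering against known-bad indicators.
# The placeholder digest below is SHA-256 of the empty payload,
# standing in for an indicator from a real threat feed.
KNOWN_BAD_SHA256 = {
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

def is_known_bad(payload: bytes) -> bool:
    """Flag a payload only on an exact digest match: specific, cheap, near-zero FP."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD_SHA256

print(is_known_bad(b""))       # True: matches the placeholder digest
print(is_known_bad(b"hello"))  # False: anything else sails through
```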

So if you really wish to believe that either the EU’s CSA Regulation or the UK’s Online Safety Bill is an honest attempt to protect kids or catch terrorists, good luck.

Reporting cybercrime is hard: NCA link to Action Fraud broken for 3 years

Screenshot: the archived version of the Action Fraud website, as linked from the NCA contact us page.

Yesterday I was asked for advice on anonymously reporting a new crypto scam that a potential victim had spotted before they lost money (hint: to a first approximation, all cryptocurrencies and cryptoassets are a scam). In the end they got fed up with the difficulty of finding someone to tell, and gave up. However, to give the advice, I thought I would check what the National Crime Agency’s National Cyber Crime Unit suggested, so I searched for “NCA NCCU report scam”; the first result was the NCA’s Contact us page. Sounds good. It has a “Fraud” section which, as expected, talks about Action Fraud. However, since 2019 this page has linked to the National Archives’ copy of an old version of the Action Fraud website. So for three years, anyone following the NCA website’s advice on how to report fraud would have been thoroughly confused until they worked out that they were on a (clearly labelled) archive rather than the live website, which is why none of the forms work.

I reported the problem yesterday, and I do not expect it to have been fixed by the time you read this; but a reporting link going unrepaired for three years is a clear example of the difficulties faced by victims of cybercrime.

2019 is also the year in which Police Scotland declined to pay for Action Fraud, as they did not consider it to provide value for money, and decided instead to handle fraud reporting internally.

I am PI of a PhD project on Improving Cybercrime Reporting, jointly supervised between the University of Strathclyde and the University of Edinburgh and funded by the Scottish Institute for Policing Research and the University of Strathclyde. Do get in touch if you have other stories of the difficulties of reporting cybercrime. The student, Juraj Sikra, has published a systematic literature review on Improving Cybercrime Reporting in Scotland. It is clear that there is a long way to go to provide person-centred cybercrime reporting for victims and potential victims. However, UK law enforcement in general, and Police Scotland in particular, know there is a problem and want to fix it.

European Commission prefers breaking privacy to protecting kids

Today, May 11, EU Commissioner Ylva Johansson announced a new law to combat online child sex abuse. This has an overt purpose, and a covert purpose.

The overt purpose is to pressure tech companies to take down illegal material, and material that might possibly be illegal, more quickly. A new agency is to be set up in the Hague, modelled on and linked to Europol, to maintain an official database of illegal child sex-abuse images. National authorities will report abuse to this new agency, which will then require hosting providers and others to take suspect material down. The new law goes into great detail about the design of the takedown process, the forms to be used, and the redress that content providers will have if innocuous material is taken down by mistake. There are similar provisions for blocking URLs; censorship orders can be issued to ISPs in Member States.

The first problem is that this approach does not work. In our 2016 paper, Taking Down Websites to Prevent Crime, we analysed the takedown industry and found that private firms are much better at taking down websites than the police. We found that the specialist contractors who take down phishing websites for banks would typically take six hours to remove an offending website, while the Internet Watch Foundation – which has a legal monopoly on taking down child-abuse material in the UK – would often take six weeks.

We have a reasonably good understanding of why this is the case. Taking down websites means interacting with a great variety of registrars and hosting companies worldwide, and they have different ways of working. One firm expects an encrypted email; another wants you to open a ticket; yet another needs you to phone their call centre during Peking business hours and speak Mandarin. The specialist contractors have figured all this out, and have got good at it. However, police forces want to use their own forms, and expect everyone to follow police procedure. Once you’re outside your jurisdiction, this doesn’t work. Police forces also focus on process more than outcome; they have difficulty hiring and retaining staff to do detailed technical clerical work; and they’re not much good at dealing with foreigners.

Our takedown work was funded by the Home Office, and we recommended that they run a randomised controlled trial where they order a subset of UK police forces to use specialist contractors to take down criminal websites. We’re still waiting, six years later. And there’s nothing in UK law that would stop them running such a trial, or that would stop a Chief Constable outsourcing the work.

So it’s really stupid for the European Commission to mandate centralised takedown by a police agency for the whole of Europe. This will make everything really hard to fix once they find out that it doesn’t work, and it becomes obvious that child-abuse websites stay up for longer, causing real harm.

Oh, and the covert purpose? That is to enable the new agency to undermine end-to-end encryption by mandating client-side scanning. This is not evident on the face of the bill, but it is evident in the impact assessment, which praises Apple’s 2021 proposal. Colleagues and I already wrote about that in detail, so I will not repeat the arguments here. I will merely note that Europol coordinates the exploitation of communications systems by law enforcement agencies, and the Dutch National High-Tech Crime Unit has developed world-class skills at exploiting mobile phones and chat services. The most recent case of continent-wide bulk interception was EncroChat; although reporting restrictions prevent me from telling that story, there have been multiple similar cases in recent years.

So there we have it: an attack on cryptography, designed to circumvent EU laws against bulk surveillance by using a populist appeal to child protection, appears likely to harm children instead.

Hiring for iCrime

A Research Assistant/Associate position is available at the Department of Computer Science and Technology to work on the ERC-funded Interdisciplinary Cybercrime Project (iCrime). We are looking to appoint a computer scientist to join an interdisciplinary team reporting to Dr Alice Hutchings.

iCrime incorporates expertise from criminology and computer science to research cybercrime offenders, their crime type, the place (such as online black markets), and the response. Within iCrime, we sustain robust data collection infrastructure to gather unique, high quality datasets, and design novel methodologies to identify and measure criminal infrastructure at scale. This is particularly important as cybercrime changes dynamically. Overall, our approach is evaluative, critical, and data driven.

Successful applicants will work in a team to collect and analyse data, develop tools, and write research outputs. Desirable technical skills include:

– Familiarity with automated data collection (web crawling and scraping) and techniques to sustain the complex data collection in adversarial environments at scale.
– Excellent software engineering skills, including familiarity with Python, Bash scripting, and web development, particularly NodeJS and ReactJS.
– Experience in DevOps to integrate and migrate new tools within the existing ecosystem, and to automate data collection/transmission/backup pipelines.
– Working knowledge of Linux/Unix.
– Familiarity with large-scale databases, including relational databases and ElasticSearch.
– Practical knowledge of security and privacy to keep existing systems secure and protect against data leakage.
– Expertise in cybercrime research and data science/analysis is desirable, but not essential.

Please read the formal advertisement (at https://www.jobs.cam.ac.uk/job/34324/) for the details about exactly who and what we’re looking for and how to apply — and please pay special attention to our request for a covering letter!

A striking memoir by Gus Simmons

Gus Simmons is one of the pioneers of cryptography and computer security. His contributions to public-key cryptography, unconditional authentication, covert channels and information hiding earned him an honorary degree, fellowship of the IACR, and election to the Rothschild chair of mathematics when he visited us in Cambridge in 1996. And this was his hobby; his day job was a mathematician at Sandia National Laboratories, where he worked on satellite imagery, arms-control treaty verification, and the command and control of nuclear weapons.

During lockdown, Gus wrote a book of stories about growing up in West Virginia during the depression years of the 1930s. After he circulated it privately to a few friends in the cryptographic community, we persuaded him to put it online so that everyone can read it. In that desolate time, coal mines closed and fired their workers, who took over abandoned farms and survived as best they could. Gus’s memoir is a gripping oral history of a period when some parts of the U.S.A. were just as poor as rural Africa is today.

Here it is: Another Time, Another Place, Another Story.

Security course at Cambridge

I have taken over the second-year Security course at Cambridge, which is traditionally taught in Easter term. From the end of April onwards I will be teaching three lectures per week. Taking advantage of the fact that Cambridge academics own the copyright and performance rights on their lectures, I am making all my undergraduate lectures available at no charge on my YouTube channel frankstajanoexplains.com. My lecture courses on Algorithms and on Discrete Mathematics are already up and I’ll be uploading videos of the Security lectures as I produce them, ahead of the official lecturing dates. I have uploaded the opening lecture this morning. You are welcome to join the class virtually and you will receive exactly the same tuition as my Cambridge students, at no charge. 


The philosophy of the course is to lead students to learn the fundamentals of security by “studying the classics” and gaining practical hands-on security experience by recreating and replicating actual attacks. (Of course the full benefits of the course are only reaped by those who do the exercises, as opposed to just watching the videos.)


This is my small contribution to raising a new generation of cyber-defenders, alongside the parallel thread of letting bright young minds realise that security is challenging and exciting by organising CTFs (Capture-The-Flag competitions) for them to take part in, which I have been doing since 2015 and continue to do. On that note, any students (undergraduate, master’s or PhD) currently studying at a university in the UK, Israel, the USA, Japan, Australia or France still have a couple more days to sign up for our 2022 Country to Country CTF, a follow-up to the Cambridge to Cambridge CTF that I co-founded with Howie Shrobe and Lori Glover at MIT in 2015. The teams will mix people at different levels, so no prior experience is required. Go for it!

CoverDrop: Securing Initial Contact for Whistleblowers

Whistleblowing is dangerous business. Whistleblowers face grave consequences if they’re caught and, to make matters worse, the anonymity set – the set of potential whistleblowers for a given story – is often quite small. Mass surveillance regimes around the world don’t help matters either. Yet whistleblowing has been crucial in exposing corruption, rape and other crimes in recent years. In our latest research paper, CoverDrop: Blowing the Whistle Through A News App, we set out to create a system that allows whistleblowers to securely make initial contact with news organisations. Our paper has been accepted at PETS, the Privacy Enhancing Technologies Symposium.

To work out how we could help whistleblowers release sensitive information to journalists without exposing their identity, we conducted two workshops with journalists, system administrators and software engineers at leading UK-based news organisations. These discussions made it clear that a significant weak point in the whistleblowing chain is the initial contact by the source to the journalist or news organisation. Sources would often get in touch over insecure channels (e.g., email, phone or SMS) and then switch to more secure channels (e.g., Tor and Signal) later on in the conversation – but by then it may be too late. 

Existing whistleblowing solutions such as SecureDrop rely on Tor for anonymity and expect a high degree of technical competence from their users. But in many cases, simply connecting to the Tor network is enough to single out the whistleblower from a small anonymity set.

CoverDrop takes a different approach. Instead of connecting to Tor, we embed the whistleblowing mechanism in the mobile news app published by respective news organisations and use the traffic generated by all users of the app as cover traffic, hiding any messages from whistleblowers who use it. We implemented CoverDrop and have shown it to be secure against a global passive network adversary that also has the ability to issue warrants on all infrastructure as well as the source and recipient devices.

We instantiated CoverDrop in the form of an Android app with the expectation that news organisations embed CoverDrop in their standard news apps. Embedding CoverDrop into a news app provides the whistleblower with deniability as well as providing a secure means of contact to all users. This should nudge potential whistleblowers away from using insecure methods of initial contact. The whistleblowing component is a modified version of Signal, augmented with dummy messages to prevent traffic analysis. We use the Secure Element on mobile devices, SGX on servers and onion encryption to reduce the ability of an attacker to gain useful knowledge even if some system components are compromised.
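To make the core idea concrete, here is a minimal sketch of fixed-size, fixed-schedule cover traffic. This is not the actual CoverDrop implementation: the packet size, the schedule, the plaintext framing, and the names are all illustrative assumptions, and random bytes stand in for encrypted dummies.

```python
import os
import secrets
import time

MESSAGE_SIZE = 512       # fixed packet size -- illustrative, not the paper's value
SEND_INTERVAL = 60.0     # seconds between sends, independent of user behaviour

outgoing: list[bytes] = []   # real messages queued by the (rare) whistleblower

def pad(plaintext: bytes) -> bytes:
    """Frame and pad to MESSAGE_SIZE: a 2-byte length prefix, then random filler."""
    if len(plaintext) > MESSAGE_SIZE - 2:
        raise ValueError("too long for one slot; a real client would fragment")
    prefix = len(plaintext).to_bytes(2, "big")
    return prefix + plaintext + os.urandom(MESSAGE_SIZE - 2 - len(plaintext))

def next_packet() -> bytes:
    """A real padded message if one is queued, else a dummy of identical size.

    In the real system both would then be encrypted so that real and dummy
    packets are indistinguishable on the wire; here random bytes stand in
    for the dummy ciphertext.
    """
    if outgoing:
        return pad(outgoing.pop(0))
    return secrets.token_bytes(MESSAGE_SIZE)

def run(send, rounds: int, interval: float = SEND_INTERVAL) -> None:
    """Send one fixed-size packet per interval, whether or not real traffic exists."""
    for _ in range(rounds):
        send(next_packet())
        time.sleep(interval)

if __name__ == "__main__":
    outgoing.append(b"hello, this is a real message")
    run(lambda pkt: print(f"sent {len(pkt)} bytes"), rounds=3, interval=1.0)
```

Because every app user emits identical-looking packets on the same schedule, an observer cannot tell which clients are actually whistleblowing, which is what gives the source deniability.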

The primary limitation of CoverDrop is its messaging bandwidth, which must be kept low to minimise the networking cost borne by the vast majority of news app users who are not whistleblowers. CoverDrop is designed to do a critical and difficult part of whistleblowing: establishing initial contact securely. Once a low-bandwidth communication channel is established, the source and the journalist can meet in person, or use other systems to send large documents.
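For a rough sense of scale, the arithmetic below shows why the overhead is tolerable per user yet adds up across an entire install base. None of these figures come from the paper; they are purely illustrative assumptions.

```python
# Illustrative cover-traffic overhead; all figures are assumptions, not the paper's.
msg_size_bytes = 512        # one fixed-size cover packet
msgs_per_day = 24           # one send per hour
users = 10_000_000          # assumed install base of a large news app

per_user_kib_day = msg_size_bytes * msgs_per_day / 1024
aggregate_gib_day = msg_size_bytes * msgs_per_day * users / 1024**3

print(f"per user:  {per_user_kib_day:.0f} KiB/day")    # 12 KiB/day
print(f"aggregate: {aggregate_gib_day:,.1f} GiB/day")  # ~114.4 GiB/day
```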

The full paper can be found here.

Mansoor Ahmed-Rengers, Diana A. Vasile, Daniel Hugenroth, Alastair R. Beresford, and Ross Anderson. CoverDrop: Blowing the Whistle Through A News App. Proceedings on Privacy Enhancing Technologies, 2022.

Arm releases experimental CHERI-enabled Morello board as part of £187M UKRI Digital Security by Design programme

Professor Robert N. M. Watson (Cambridge), Professor Simon W. Moore (Cambridge), Professor Peter Sewell (Cambridge), Dr Jonathan Woodruff (Cambridge), Brooks Davis (SRI), and Dr Peter G. Neumann (SRI)

After over a decade of research creating the CHERI protection model, hardware, software, and formal models and proofs, developed over three DARPA research programmes, we are at a truly exciting moment. Today, Arm announced first availability of its experimental CHERI-enabled Morello processor, System-on-Chip, and development board – an industrial-quality, industrial-scale demonstrator of CHERI merged into a high-performance processor design. Not only does Morello fully incorporate the features described in our CHERI ISAv8 specification to provide fine-grained memory protection and scalable software compartmentalisation, but it also implements an Instruction-Set Architecture (ISA) with formally verified security properties. The Arm Morello Program is supported by the £187M UKRI Digital Security by Design (DSbD) research programme, a UK government and industry-funded effort to transition CHERI towards mainstream use.
