I am at the Privacy Enhancing Technologies Symposium (PETS 2016) in Darmstadt until Friday, and will try to liveblog some of the sessions in followups to this post. (I can’t do them all as there are some parallel sessions.)
In the session on censorship, the first speaker was Tariq Elahi, with a systematisation-of-knowledge paper on censorship resistance systems. (Declaration: my research student Sheharbano Khattak was a coauthor.) Censors face political, economic, social and technological bounds; their mechanisms are imperfect, and while the censor tries to push the ROC curve towards the top left corner, a censorship resistant system (CRS) tries to make it a straight line. A common strategy is to put content in services like Google and Amazon to maximise collateral damage; the censor can try to establish local alternatives or threaten the third party. In short, the games one can play with third parties can get quite complex, and the paper explores them.
Richard McPherson spoke next on CovertCast. Quite a lot of encrypted services cross common censorship boundaries, including Skype and live streaming video such as YouTube and Twitch. YouTube already supports password-protected and invite-only streams, so he used it. The stegotext is encoded two bits at a time into a pixel field; at that density it survives compression and coding, giving about 10k bytes per image. The downside is about 20–30s latency, and the fix is to crawl and cache. It’s best for getting news sites past the censor (though the Wall Street Journal is rather slow, thanks to the high-resolution images).
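To make the encoding concrete, here’s a minimal sketch of the idea (my own illustration, not the CovertCast code; the frame size, block size and grey levels are assumptions, and the throughput depends on them – the 10k bytes per image quoted in the talk implies smaller blocks or a larger frame):

```python
import numpy as np

# Illustrative parameters: a 1280x720 greyscale frame with 8x8 blocks gives
# 14400 blocks = 28800 bits = 3600 bytes per frame.
LEVELS = [32, 96, 160, 224]          # four well-separated grey levels = 2 bits

def encode_frame(data: bytes, height=720, width=1280, block=8):
    """Pack a byte stream into an image, two bits per block of pixels,
    so the symbols survive lossy video compression."""
    img = np.zeros((height, width), dtype=np.uint8)
    symbols = [(byte >> shift) & 0b11 for byte in data for shift in (6, 4, 2, 0)]
    i = 0
    for by in range(0, height, block):
        for bx in range(0, width, block):
            if i >= len(symbols):
                return img
            img[by:by + block, bx:bx + block] = LEVELS[symbols[i]]
            i += 1
    return img

def decode_frame(img, block=8) -> bytes:
    """Recover the 2-bit symbols by quantising each block's mean
    (unused trailing blocks decode to zero bytes; a real framing layer
    would strip them)."""
    symbols = []
    for by in range(0, img.shape[0], block):
        for bx in range(0, img.shape[1], block):
            mean = img[by:by + block, bx:bx + block].mean()
            symbols.append(min(range(4), key=lambda s: abs(mean - LEVELS[s])))
    out = bytearray()
    for i in range(0, len(symbols) - 3, 4):
        out.append((symbols[i] << 6) | (symbols[i+1] << 4) |
                   (symbols[i+2] << 2) | symbols[i+3])
    return bytes(out)
```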
Fred Douglas was next with Salmon, a volunteer-based proxy distribution system. Assume the censor can block any server it discovers; a central server distributes VPN servers to users, with encrypted email as the low-bandwidth control channel, and user entry via trusted introduction or a plausible Facebook account. Server blocking raises the suspicion level of the users who used it; otherwise their trust level climbs. It’s said to be better than rBridge, the previous best. In questions, it was remarked that real censors don’t ban every server they observe, but wait until a failed coup attempt and then lock everyone up; it was admitted that this was beyond the threat model.
Shuai Li described Mailet as “instant social networking under censorship”. Like Sweet (Hotpets 13), it’s a server to which you can tunnel using email to services like Twitter and Facebook; the improvement includes mechanisms to protect your password from the server, as well as more attention to architectural consistency and interaction integrity.
The last speaker was Tariq Elahi again, presenting a framework for the game-theoretic analysis of censorship resistance. The dynamic behaviour of a censor and a censorship resistance system might be modeled as a strategic game between rational actors. One equilibrium strategy is for the circumventor never to violate a certain threshold below which the censor always plays “allow”; this works in indefinite-horizon games as it’s always worth waiting one more round. However, when this strategy is used across multiple channels, the censor can play eviction games by throttling some of them; monotonic traffic allocation reduces the benefit of this.
Albert Kwon presented Riffle, an Efficient Communication System With Strong Anonymity; it’s an improved mix design with verifiable shuffling and private information retrieval where symmetric keys are shared with mixes in a setup phase to make performance reasonable. In questions, it was admitted that dealing with users leaving and arriving would be hard.
Micah Sherr was next, talking about BGP-level traffic confirmation attacks on Tor, where the adversary is trying to enumerate Tor users. The adversary will logically target the highest-volume guard nodes, and not tamper with communications too much, but use standard BGP attacks when it does; it turns out that 92% of guard bandwidth is vulnerable to more-specific attacks, and the average AS can increase the share of client-guard traffic it can see from 0 to 13% with only six attacks. As for attacks involving advertisements of shorter paths, guards can discover their AS neighbourhoods and put them in Tor directories. In short, Tor may be an overlay, but you have to pay attention to the underlay too.
Tommaso Frassetto spoke about Selfrando, which defends against the attack the FBI used on Tor in 2013, where a compromised server sent code that compromised the user’s browser, causing it to beacon outside the Tor channel. Selfrando uses a custom linker-wrapper to make code-reuse attacks harder on complex C/C++ apps such as browsers by means of code randomisation. Every instance of the browser can be different. Simple ROP attacks are blocked, forcing the attacker to use attacks such as JIT-ROP, against which other defences are available.
Micah Sherr came back for the next talk, on group communication with MTor. Doing videoconferencing with Tor to Hangouts is a pain, as Tor isn’t a drop-in replacement for TCP/IP; so what can we do? His idea is to replace the exit relays with a multicast relay. How do you handle churn, control flow and give decent anonymity? He argues that multicast intrinsically requires less security than general communications, as the multicast group itself can be traced by a global passive adversary. The new design uses less than half the bandwidth of classic Tor.
Sebastian Meiser has been doing large-scale anonymity modeling of Tor path selection against structural attacks. He has written a tool, MaTor 2.0, to analyse the anonymity of different path selection and adversary strategies. Thus, for example, he can compare an adversary who corrupts k nodes with one whose bandwidth is limited in Mbit/sec, and compare five different path selection algorithms in both cases; alternatively he can compare an adversary that compromises every node in Germany with one whose attack budget is a certain number of dollars. Second-order effects include ways of spotting groups of users who choose different path-selection algorithms.
Yaoqi Jia has developed inference attacks on the peer-assisted CDNs Swarmify, BemTV and P2PSP; an adversary on the same LAN can observe all resources going to and from the victim. He has therefore developed an anonymous peer-assisted CDN which unlinks a participant’s ID from their online trace. It uses local onion routing but with careful parameter selection. He has some performance figures from a test implementation.
Armon Barton is concerned that a well-positioned AS might sit between a user and the guard, and also between the exit node and the destination; tier-1 transit providers are only one example. Various researchers have simulated the probability of a Tor client building a vulnerable path or stream, and it comes out at perhaps a sixth. Excluding them by client-side calculation, to check that client-guard doesn’t collide with exit-target, is tiresome. His solution is to note that vulnerable paths are dominated by six particular ASes, and to propose destination-naive AS-aware path selection that simply avoids them. You can’t blacklist a lot more than this or you have too little choice of guard, leading to a guard placement attack. Even so you only reduce the vulnerable stream rate to 8–14%.
Raphael Toledo has been working on lower-cost epsilon-private information retrieval for use in certificate transparency. Google’s certificate transparency and the Let’s Encrypt project have each generated over 4 million published certificates; PIR doesn’t scale to this. A user can use Tor to contact a database with each genuine query hidden among several dummies, but still would have to query much of the database to get a decent epsilon value. It turns out that he can greatly improve this by making combinatorial requests and get more privacy with less traffic.
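As a sketch of the baseline dummy-query approach that the paper improves on (my own illustration; the function and parameter names are made up, and the combinatorial construction in the paper is cleverer than this):

```python
import random

def epsilon_pir_query(real_index: int, db_size: int, dummies: int):
    """Hide one genuine index among uniformly chosen dummy indices before
    sending the batch over Tor; the server cannot tell which of the
    returned records was actually wanted.  Privacy grows with the number
    of dummies, but so does the traffic, which is the trade-off the
    combinatorial scheme attacks."""
    indices = {real_index}
    while len(indices) < dummies + 1:
        indices.add(random.randrange(db_size))
    batch = list(indices)
    random.shuffle(batch)     # don't reveal the genuine index by position
    return batch
```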
The fourth speaker was Carlos Aguilar-Melchor, talking on private information retrieval for everyone. Information retrieval can be private either information-theoretically or computationally; Devet and Goldberg showed how to combine the two in 2014. His version, called XPIR, uses lattice-based cryptography to replace the Paillier encryption with matrix multiplication; the underlying primitive is ring learning with errors, optimised with a fast polynomial kernel. However, instead of the encryption expanding the message size by a factor of two, the expansion is almost three orders of magnitude.
Tuesday’s last talk was by Ryan Henry, who’s been trying to make information-theoretic PIR more efficient using batch codes. One idea is “rampified” Shamir secret sharing with a piecewise linear combination function rather than a polynomial; it’s claimed to give some privacy (but less than the perfect variety) at much lower computational cost. Another is to send a vector of polynomials rather than a matrix; yet another is u-ary encoding.
The first of Wednesday’s keynotes was from Vijay Varadharajan, talking about the complexity of protecting modern systems with things, apps, cloud services and social components as well as public policy aspects. This is leading to increased threat velocity: attacks happen faster, and are reported faster. Meanwhile attribution gets harder, cascading effects less predictable and punishment for wrongdoing less certain. The needed countermeasures start with better basic hygiene everywhere and continue through using cloud services to monitor what’s going on, and app updates to fix holes. Centralisation holds out the prospect of some real expertise at the centre, even if just in the service provider rather than in each entrepreneurial firm. In recent years he has been working on secure cloud storage, melding role-based access control with cryptographic mechanisms, drawing initially on Goyal’s attribute-based encryption work. His latest work, role-based encryption, uses ID-based mechanisms; he worked through an example of how crypto can support role inheritance. The storage overhead is a fixed 432 bytes per file, but with 1000 users decryption takes ten seconds or more. So this is at the threshold of usability, and Vijay is working with a health authority in Australia on a feasibility study for a patient record system. He suggested it could also support more fine-grained digital subscriptions: different departments at a university could subscribe to different journals.
Angela Sasse gave the second keynote. She has found that in many organisations security gets in the way of productivity; people organise their work schedule to minimise the number of authentications they have to do, and this reduced productivity by as much as 30% in one case, as people refused to take corporate laptops and so could not check email when on the road. Widespread non-compliance and workarounds at other organisations lead to a background noise level in which bad guys can hide. And usability is getting worse: most sessions are now on touchscreens, where the dogma of mixed-case passwords with special characters makes life even more miserable (with smart users figuring out acceptable passwords that require only one toggle but have low entropy). Bank tokens are widely hated, leading banks to stampede to apps, with the iPhone taking biometrics mainstream. With face biometrics you have to smile to pay, but you must not smile in the passport booth. And when evaluating schemes, watch the failure-to-enrol rate as well as the failure-to-recognise rate, as the trade-off between these is real but tends to be neglected. Anyway, we are now finding that behavioural biometrics such as typing biometrics have improved from an equal error rate of 5% to a false-accept rate of 0.03% with false alarms at 1.5%, thanks to Roy Maxion. This makes them more viable than static biometrics such as face recognition. The trend is for banks to pass their customers’ keystreams to continuous-authentication companies such as Biocatch. She is a bit worried about the profiling such firms use, which includes everything from cognitive traits to physiological factors to context such as geolocation. For example, if they are worried about you, they may yank the cursor off to the side of the screen and watch how you drag it back, thus doing an authentication without the customer’s awareness, let alone consent. Related creepy things include alertness monitoring in cars and “engagement monitoring” for kids in US schools – wristbands that give teachers feedback on what arouses the students, or sends them to sleep (that was a Gates Foundation project; but Knewton already monitors student engagement by tracking their clicks and keystrokes). Such things are sold on face validity: “Surely the more data the better?” rather than on the basis of evidence. The use of social sorting in everything from police deployment to the targeting of leaflets against teenage pregnancy raises further issues, especially as essentially statistical techniques are assumed by many to be deterministic. All this circumvents the privacy calculus people actually use, which is about who will use their information and for what purpose; when we no longer have any idea what our data will be used for, this fails. Technology is great at saying no to people, its shield and veil being less painful to authority than face-to-face refusal: in a world that combines growing urbanism, hypermobility, pressure on resources and technophilia, with both citizens and surveillers going for all sorts of gadgets, the strains on tech policy will increase. Power increasingly comes to lie with those who dominate the tech; Stephen Graham’s “new military urbanism” replaces human rights and legal systems with risk profiling based on the citizen’s association with various kinds of disruption and resistance. It’s increasingly difficult to control stuff; opting out becomes suspicious; and when we kill people based on metadata, we kill innocents too.
In short, we need to continuously reappraise and re-scope the privacy problem, and we also need to get better at communicating to people that the benefits of much of this technology are oversold. Is the way forward consumer unionisation, or civil disobedience? (The data protection movement was sparked when activists volunteered to be census enumerators in Germany and all called in sick on the day.)
The third keynote was by Jean-Pierre Hubaux, talking on healthcare security. He started off describing a number of medical data breaches in the USA and Switzerland, including a ransomware incident. IBM reckoned in 2015 that over 33% of all attacks are against healthcare providers, compared with 32% on computer services and 24% on government; the sector is the top target. Yet health insurers become ever greedier for data, offering discounts in Switzerland to citizens who sign up for fitness tracking. This creates the incentive for users to cheat, quite apart from both malicious and honest-but-curious surveillance. What’s more, Apple’s mHealth isn’t just an app but an ecosystem involving multiple apps, wearable devices and medical institutions. Google Fit and Samsung’s S-Health are headed the same way, though in Google’s case 64% of apps send data in clear and 82% share data with third parties such as Amazon (see Dongjing He et al). In general there’s a big grey zone between regulated apps and the “quantified self” stuff such as Fitbit; yet it’s the same data and the same bodies. Add to this the UK’s 100,000-genome project, Obama’s precision medicine initiative which involves ten times as many, and the Global Alliance for Genomics and Health, where Google, IBM, Microsoft, Apple and Amazon are setting standards for data formats and security. They are talking of 0.5Tb/person and a need to protect data for a century; there are many semi-trusted stakeholders, often with conflicting requirements. The new CRISPR-Cas9 genome editing technique holds out the prospect of departing from Darwinian evolution; although there’s a moratorium on using it on people, it’s been used on monkeys, and it’s bound to be used on people eventually. Now given all this, what might the system model be for personalised medicine? Attackers may go after all sorts of stakeholders and with all sorts of motives; but in general attackers will mostly follow the clinical trajectory (looking at data by person) rather than the research one (looking at data by phenotype, or by SNP), which may give us some leverage. He has an “HIVDemo” system that uses homomorphic techniques and provides pharmacogenomic reports. He’s also been looking at kin genomic privacy; sites like OpenGenome.org let people upload their genome, but family members may have different privacy preferences, and finding worthwhile trade-offs is a really hard problem. For more, see http://www.genomeprivacy.org.
The human-factors session started with Crowdsourcing for context by Emmanuel Bello-Ogunu. Bluetooth low-energy beacons continuously send a signal that triggers action; the app checks the cloud to understand it. But do you want to share your location every time you’re shopping and you encounter one? Emmanuel’s idea is that users can collaborate to label beacons. He recruited 90 participants and paid them a $5 gift card to shop at a Barnes and Noble, and “click to label me” at test beacons. They had to decide which circles of friends they’d be prepared to share this label with. The task was gamified with participants rewarded for labeling as many beacons as possible. He found the crowd could in fact contribute accurate labels. Some labels (the ATM, the restrooms, the health and beauty section and the shot glasses) were significantly more sensitive, in that participants were less willing to share the fact that they’d been there.
Sam Grogan contrasts data access in the USA and Ireland; he surveyed 873 US and Irish residents to see whether they were aware of data access mechanisms and interested in using them. Over half of Irish participants knew they could get data via the data protection laws; 18% of Americans mistakenly thought they could too. More Americans were aware of their right to get copies of their credit reports. Most Americans wrongly think they have more access rights; almost as many Americans think they should have a right, and almost all (95% US, 94% Irish) said they’d use it. They were shown a tool to select the Yahoo or Google ad types they’d see, and a “reclaim your name” data broker app. The last one was least popular (still over 60%) as most people aren’t aware of data brokers.
Susan McGregor’s topic is privacy concerns in journalism, and the tension between individual and organisational computer security. 84% of the top 20 news sites have been the target of state-sponsored attacks, while individual reporters have been targeted too. The attacks vary from legal through technical, and the available tools aren’t well adopted by either reporters or editors, half of whom admitted to not using any of them. Following a previous paper at Usenix last year, she talked to fifteen journalists, seven editors and seven technologists. There are tensions between journalists and editors: they agree on source protection and brand integrity, but individual journalists are uninterested in phishing, password sharing and resource limits, which vex editors. Editors don’t want to know passphrases in case they are subpoenaed, and don’t want to be too prescriptive about how journalists communicate with sources. However journalists can’t use privacy-enhancing technologies if they can’t install them, as they don’t have administrator rights on their PCs; and they don’t have the expertise or bandwidth to evaluate products. Journalists also work with multiple editors. To create really good tools for the press, technologists need to consider all the stakeholders, and how people actually work.
Keith Ross has been studying the right to be forgotten. Google’s implementation started in May 2014 and has honoured 43% of 1.5m requests so far; if it approves removal of a trip you made to Shanghai, it will remove queries that name you but not searches on “trip to Shanghai”. Keith investigated 283 UK articles that are known to have been delisted (as the BBC, the Telegraph, the Daily Mail and the Guardian have identified them); the mean number of names per article is 2.1 and the median is 1. The topics range from financial misconduct to pedophilia and other sexual matters, with a few on spying or drugs. Latent Dirichlet Allocation was used for text analysis, which gave similar results; the top topics were violent crime, drunk/drugged driving, drug use, murder and prostitution. He has also found that with moderate effort a transparency activist can determine delisted URLs and the names of people who requested delisting. One trick is to test whether a search on google.co.uk for “name” and “article title” returns no result; in that case the name is the requester (as set out in the law and Google’s stated procedures). To find delisted articles, crawl a newspaper, search for articles on topics typically selected for delisting, then test each name in the article with the title against the local Google until you get a hit. (Some newspapers help by providing all their articles on a DVD.) He did a proof-of-concept against El Mundo in Spain, testing 85,000 articles for 37 crime terms and making 6,410 queries on 4,164 articles from a single machine based in Brazil. He discovered two previously unknown delisted links, which were cross-checked by searching for the title only on google.com against google.es. So RTBF is only weakly effective for news media. This does not however affect the 95% of delistings that are for personal private media, for which RTBF appears to work. (In previous work Keith showed that online privacy laws such as RTBF and COPPA can actually increase the risks to minors.)
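A rough sketch of that detection loop (my reconstruction, not Keith’s code; `search_local_google` and `extract_names` are hypothetical helpers standing in for a search scraper and a named-entity recogniser):

```python
# Sketch of the delisting-detection procedure described above.
CRIME_TERMS = ["fraud", "assault", "drink driving"]   # illustrative only

def find_delistings(articles, search_local_google, extract_names):
    hits = []
    for art in articles:
        # only bother with topics typically selected for delisting
        if not any(term in art["text"].lower() for term in CRIME_TERMS):
            continue
        for name in extract_names(art["text"]):
            # If the local Google front end returns nothing for name + title
            # while the article is still indexed globally, the article has
            # probably been delisted and `name` is likely the requester.
            if not search_local_google(f'"{name}" "{art["title"]}"'):
                hits.append((art["url"], name))
    return hits
```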
The last talk of the session was by William Melchior, on Do Not Track Me Sometimes. He’s interested in how users actually understand online tracking via third-party cookies, so we can build better privacy tools. They interviewed subjects about their understanding of first- and third-party cookies; the perceived outcomes, whether beneficial or harmful, and whether hidden; and their preferences about the nuances of specific situations. They found, for example, that users are less comfortable with third-party tracking than first-party, particularly in sensitive contexts. They looked at whether various tools such as Adblock, Ghostery or private browsing could tackle the problems; they can, but are bad at allowing the benefits. William believes this is because the tools don’t give sufficiently fine-grained controls to enable selective blocking based on situational factors. They looked at whether situational preferences could be predicted using machine learning and found that AdaBoost worked after a fashion, so there is some hope for automated tool support.
Jens Grossklags started the second human-factors session with a talk on interdependent privacy. How much do people actually care about each others’ privacy in practice, and how is it affected by the context of collection? They split 295 mTurkers between treatments where friends’ personal data was or was not relevant to an app’s performance. People valued their own information more highly than their friends’, but valued friends’ information less highly in the relevant case. Their valuation of others’ privacy is affected both by their concern for their own privacy and their concern for others generally. However their valuation vastly underestimates the amount of information shared about friends; they are privacy egoists when it comes to decisions about sharing friends’ data.
Yaxing Yao’s topic was Flying Eyes and Hidden Controllers: Privacy Perceptions of Drones. They asked sixteen subjects about their knowledge, experience and views of drones. One participant considered public space to be only parades; walking normally down the street she considered private, and similarly sitting in a shopping mall, as someone coming too close can eavesdrop on her phone call or read her email. Another expected that a drone operator would have to get consent to use footage of his face; but another said recording people at a party was OK. Creepiness factors included the visibility of the drone, whether it invaded their personal space, whether the controller is hidden or visible, and whether its purpose is known to bystanders. He suggests that drone makers and operators should devise ways of making drones more discoverable, approachable and accountable, so they can be sensitive to local social and cultural norms.
Frank Kargl was next, talking on privacy dark patterns. He studies abusive websites that bully you into handing over too much data, rapacious privacy policies, disguised ads, and apps that grab access to much more data than they need. Can we identify patterns in such practices? Sure; the bad guys are adept at maximising data collection; publishing private data; centralising it; preserving interrelationships between data; and obscuring things so that data subjects can’t find out what’s going on. In detail, address book leaching, bad defaults, forced registration and other abuses are skilfully engineered to secure broad compliance using behavioural mechanisms. See https://dark.privacypatterns.eu for more.
Hamza Harkous told us about The Curious Case of the PDF Converter that Likes Mozart. He started off from online complaints about apps demanding apparently excessive access, and looked at the top 100 Google Drive apps in the Chrome store. Examples include a PDF-to-Word converter that wants access to all my files in order to convert one of them. This is not needed where the app writer uses Google’s file picker API. Overall, 76 of the 100 top apps were overprivileged. How can this be deterred? He has developed a tool that sorts the requested permissions into those that are needed and those that aren’t; for the latter he displays what information this leaks about you, such as photos or locations from their metadata, the far-reaching insights that could be inferred, and the most prominent faces to which the app writer would now have access. They tested these as notifications. They also analysed over 600 apps in the Chrome store and found that 40% had changed their access – almost all to switch from partial to full access. The overall conclusion is that we need privacy dashboards that make overprivilege visible, and that also make permission changes salient.
Wednesday’s last speaker was Lawrence Saul, speaking On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction. Most people choose redaction tools based on their peers’ behaviour, and there are many examples of their getting it wrong. Do blurring and mosaicing actually protect text? The answer is no: straightforward machine-learning techniques allow fairly trivial text recovery for all the parameter values seen on the web. He uses hidden Markov models where the observable output is a window sliding along the text (as the mosaic and character boundaries typically don’t align), while the hidden state is the underlying plaintext alphabet. We train the model and use the Viterbi algorithm to recover the most likely string. Technical details and optimisations are in the paper. In fact, any redaction after rendering is unfortunate, as even solid-bar redaction leaks the size of the covered text; this leaked the identity of an intelligence agency that had tipped off the USA about bin Laden.
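For readers who want the shape of the recovery step, here is a bare-bones Viterbi decoder over a character HMM (a sketch only; the paper’s observation model over mosaic windows is far richer, and all the probability tables here are caller-supplied, in log space):

```python
import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden character sequence for a sequence of observed
    mosaic-window feature indices.  start_p (S,), trans_p (S, S) and
    emit_p (S, O) are log-probability arrays; obs is a list of
    observation indices."""
    V = np.full((len(obs), len(states)), -np.inf)   # best log-score so far
    back = np.zeros((len(obs), len(states)), dtype=int)
    for s in range(len(states)):
        V[0, s] = start_p[s] + emit_p[s, obs[0]]
    for t in range(1, len(obs)):
        for s in range(len(states)):
            scores = V[t - 1] + trans_p[:, s] + emit_p[s, obs[t]]
            back[t, s] = int(np.argmax(scores))     # best previous state
            V[t, s] = scores[back[t, s]]
    path = [int(np.argmax(V[-1]))]                  # backtrack from the end
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]
```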
Thursday’s talks started with Andreas Kurtz and Hugo Gascon reporting a study of fingerprinting local devices. They collected 13,000 fingerprints from 8,000 iOS devices, identifying devices by configuration, installed apps and music tastes, and using cookies to spot recurring devices. They also re-identified fingerprints of different length by flattening them, computing a Jaccard coefficient, and using optimal threshold learning; they got 97% accuracy on apps and 94% on music taste. They conclude that although Apple has made user tracking much harder (by blocking the AppleID from the sandbox since iOS 8), it is still perfectly feasible, using features that users cannot feasibly block. Starting from iOS 10, Apple will ask users whether an app can access their music library, so as to prevent music taste being used as an identifier.
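The re-identification step essentially compares sets of installed apps or artists; a minimal sketch, with the decision threshold assumed to be learned elsewhere:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient of two fingerprints of possibly different length."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def match(new_fp: set, known_fps: dict, threshold: float):
    """Return the known device whose fingerprint is most similar to the
    new one, if the similarity clears the learned threshold."""
    best_id, best_sim = None, 0.0
    for device_id, fp in known_fps.items():
        sim = jaccard(new_fp, fp)
        if sim > best_sim:
            best_id, best_sim = device_id, sim
    return best_id if best_sim >= threshold else None

# e.g. match({"WhatsApp", "Signal", "Spotify"}, seen_devices, threshold=0.7)
```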
Tao Wang was next, studying whether one can attack Tor with website fingerprinting. Website fingerprinting attacks are hard in practice as the recorded packet sequences are noisy, for example if the target is downloading background music as well as visiting a target page; 10–20 noise cells a second is enough to really cut accuracy. He’s tried various strategies for separating signal from noise, and explored the trade-off between a small fresh training set for the signal and a larger but less up-to-date one.
Gabor Gyorgy Gulyas was next with a talk on Near-Optimal Fingerprinting with Constraints. Apple’s iOS 9 lets apps have limited access to other apps, but there are leaks. He presents a number of heuristics for targeted fingerprinting, and found that users ended up in an anonymity set of size one between 85% and 95% of the time, depending on the length of the fingerprint used. He found he could track Tor users using web bugs.
Giulia Fanti’s topic was Building a RAPPOR with the Unknown. Privacy-preserving data aggregation involves adding random noise to data (perhaps using a differentially private mechanism) followed by an aggregation phase in which you can learn marginals; an example is the Rappor scheme shipped in Chrome. However it can’t estimate joint distributions (which you need, for example, to spot malware) and you have to know the dictionary of strings. She has developed a toolbox for estimating joint distributions using expectation maximisation; she can then estimate distributions on unknown strings by hacking them into n-grams and patching the distributions together.
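As a toy version of the first stage, plain randomised response (rather than RAPPOR’s Bloom-filter encoding) already shows how noisy individual reports still give an unbiased marginal estimate:

```python
import random

def randomise(bit: int, p_keep: float = 0.75) -> int:
    """Report the true bit with probability p_keep, else a fair coin flip,
    so no individual report reveals the user's true value."""
    return bit if random.random() < p_keep else random.randint(0, 1)

def estimate_marginal(reports, p_keep: float = 0.75) -> float:
    """Invert the noise: E[report] = p_keep*true_mean + (1 - p_keep)*0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_keep) * 0.5) / p_keep

# e.g. with 100,000 users of whom 30% have the property, the estimate
# comes out close to 0.3 even though every individual report is noisy.
```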
The morning’s last speaker was Pedro Moreno-Sanchez, who is Listening to Whispers of Ripple. The Ripple credit network allows people to transact not just cryptocurrencies but regular ones like dollars or Euros, and user-defined currencies such as cows. It allows transactions only along a path from sender to receiver where every node has enough credit; transaction locality makes things faster and more scalable. Yet all the transactions are public as in bitcoin, and although people use pseudonyms, these do not guarantee privacy. Pedro set out to measure the relative privacy of bitcoin and ripple. Various heuristics are available such as whether wallets are hot or cold, and how people’s cold wallet and their hot one might be linked. He clustered 934,484 ripple transactions (7% of them) and managed to de-anonymise tens of thousands of transactions linked to their publicly-avowed wallets; Bitstamp acknowledged this and now all their wallets are publicly identified.
The afternoon session on mobility and location privacy started with a systematization-of-knowledge talk on Privacy on Mobile Devices by Chad Spensky. Is it possible to have privacy on mobile devices? His hypothesis was that many real leaks were due to interactions at different layers in the stack, as the top layers visible to the user have little access, while the more opaque low-level functions can get the lot. Only 32% of the top 100 iOS apps and top 50 Android apps are accessible to people without a college education; and not only do a third of apps request data they don’t need, but most users don’t understand this. Of the top 50 banking apps, 4 iOS versions and 2 Android versions still validate SSL certs incorrectly, opening up man-in-the-middle attacks. On Android, they found that apps with no permissions could access wallpaper, network activity, directory structure and kernel crashes. Application vetting has been circumvented in multiple ways. At the OS level, there have been cases of root-level malware and infected developer tools, as well as a lot of side channels. Firmware may be interesting as both hardware people and software people tend to ignore it: it could be compromised to permit covert data capture by specialised coprocessors. So they fuzzed the NFC drivers and got lots of crashes in older handsets. Finally, at the hardware level, TEE processes have unlimited access, and the hardware crypto has low visibility and little regulation. The baseband may also share memory with the main CPU. Can Apple break their own crypto? Nobody really knows. Oh, and SIM cards can communicate directly with the baseband; how do we know they can’t get up to no good? To sum up, modern phones are complex, with ill-defined trust relationships; reducing these might be one way forward, while mechanism design for privacy might be a complementary mitigation at the application layer.
The next speaker was Laurent Simon, and his topic was Don’t Interrupt Me While I Type: Inferring Text Entered Through Gesture Typing on Android Keyboards (declaration: I’m a coauthor). Android supports gesture typing, in which the user drags their finger from one letter of a word to the next; this becomes a series of hardware interrupts from the screen to the kernel, and software interrupts from the kernel to the keyboard app. These are visible to all apps, so a curious app can try to reconstruct text typed into another app by looking at the distribution of zero-speed events. The long zero-speed events give the word boundaries and the short ones (with their locations) give the letters. The best classifier was found to be a recurrent neural network, as this enables us to model the dependency of a letter not just on the previous letters in that word, but the previous words in the sentence. This was evaluated with the most common 200 words in a 10,000-word chat corpus. 34% of the time, the correct word appears top of the list; it’s second 10% of the time. The question was whether we could link posts on the anonymous messaging board Yik Yak to users; we can re-identify 80% of users from a three-word sentence among 35 sentences, or 95% from a corpus of 10 sentences. So the interrupts in the Android global virtual files do leak information.
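A sketch of the segmentation heuristic only (the actual classifier was a recurrent neural network, and the thresholds below are made-up values for illustration):

```python
def segment_pauses(pauses, word_gap=0.4, letter_pause=0.08):
    """Group a sequence of (time, pause_length) zero-speed events into
    words: long pauses mark word boundaries, short ones mark candidate
    letters.  Thresholds here are illustrative, not the paper's values."""
    words, current = [], []
    for t, pause in pauses:
        if pause >= word_gap:
            if current:
                words.append(current)   # a word just ended
            current = []
        elif pause >= letter_pause:
            current.append(t)           # a letter was probably entered here
    if current:
        words.append(current)
    return words
```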
Frederik Möllers and Christoph Sorge are tackling Privacy Challenges in the Quantified Self Movement. What should the providers do, what do they do, how does it look to users, and does it work? Under GDPR, data processing for service provision is unproblematic unless it concerns health, in which case you need consent; but what’s health? If your location trace shows you never run or cycle, is that health data? Most quantified-self firms store data exclusively in the cloud. They surveyed 393 UK residents, who considered sleep patterns, weight, muscle mass and even mood more sensitive than location data (and way more sensitive than financial). 75% were worried about transfer of identifiable data, and 51% about anonymised data; 30% would pay a one-off fee, and 21% a monthly fee, to prevent data transfer. 85% said they would not use a QS provider that financed itself by selling data to third parties; two-thirds of the Runtastic users said they’d stop using the service if it sold data (which the company does, at least according to its privacy policy). They’re interested in exploring when data are sufficiently anonymised to count as no longer personal.
Berker Ağır’s talk was On the Privacy Implications of Location Semantics. He implemented an inference attack on location cloaking that uses semantics to leak user location; whether a cloaked cell contains one cinema or two leaks some data. So semantic information can help reconstruct geographical history. He trained a Bayesian network with semantic web data, having built a dataset from Foursquare check-ins and geo-tagged tweets in regions of six big cities. This is the first empirical demonstration of how location semantics affects location privacy.
Hao Wu is interested in Location Privacy with Randomness Consistency. Users of mobile social networks can be identified by query patterns, which can be mitigated by adding location noise and distance offset; he proposes an entropy minimisation attack that is an order of magnitude more efficient than previous attacks (about 100 queries versus 1400 for RANDDP). It can also lower the probability that the attacker will guess right from 100% to 80% even with many queries.
The last regular speaker was Kassem Fawaz, talking on Privacy vs. Reward in Indoor Location-Based Services. Indoor localisation is supposed to make shopping better but privacy concerns have hindered adoption. So he asked shoppers at Nordstrom and Walmart; 61% rejected tracking, 24% would allow some tracking and 15% full tracking. Yet on closer questioning, 40% of privacy-oriented participants would share some of their location for some benefits; while only 26% of the service-oriented participants would share data with third parties. In short, most people are privacy pragmatists, but it’s about how you frame things. He has a mobility model PR-LBS in which users move between zones and zone boundary transitions can be seen as transactions that may or may not be disclosed; decisions can be based on price or on a differential privacy mechanism. He claims this hits the sweet spot of the three classes of users (though the service-oriented participants are most satisfied).
The first speakers were Marc and Gunes, asking for collaborators on work to extend the Tor browser with a crawler.
Jeremy Epstein was next, asking us to read the new US privacy strategy; fifteen agencies fund research including NSF, DHS and DARPA, all of them working through NITRD. They have a privacy research strategy which people wanting US research funds might care to read.
Nick Mathewson spoke about Tor guard selection. Many real-world factors are ignored by academics, such as guards running on laptops whose connection is intermittent; users changing network settings; fascist firewalls; and people going between places with and without IPv6 support. He wants more people to work on this; google Tor bug 12595.
Tariq Elahi has a humorous demo: LitigaTor! Search for /.*.Tor*/! and threaten people.
Jason Cronk has a startup, Microdesic, which does anonymous tokens for systems such as subways, linked to the blockchain.
Roger Dingledine told us that the Tor Research Safety Board has been working on a process for people who want to measure the Tor network ethically and want to present an analysis of the benefits and any residual risks. The TRSB will not be an IRB, but will try to break your proposal to see if it’s safe or whether its safety can be improved. This can provide input to an IRB, or a section in your research paper.
Cecilia Bocovich talked about Slitheen, a new decoy routing system for censorship resistance. Systems such as Telex are vulnerable to latency attacks. Her system has an overt user simulator that fetches the overt site in its entirety; the relay replaces leaf data such as images that won’t prompt for further content. There’s a proof-of-concept at http://slitheenzyanwize.onion
Susan Landau has written a legal paper on the erosion of the distinction between content and metadata. This is breaking down with IP. For example, how do you deal with DNS (is a third party involved?) and domain fronting (www.google.com/maps needs a warrant while maps.google.com can be got with a subpoena)? Oh, and the US government violates its own rules. There is much more.
Susan McGregor has been doing some posters to communicate the technical design of the Internet in various languages to lay people.
Carmela Troncoso wants to hire four postdocs in Madrid.
Zinaida Beneson is organising STAST16, a workshop on socio-technical security, in Luxembourg on September 23rd. She is also collecting browser fingerprints and is seeking more participants in this project.
Ian Goldberg presented powerpoint karaoke, with Linda Lee as the victim, talking on Hořava-Lifshitz gravity to a deck that had lots of tensor calculus and some floating cats.
Elizabeth picked up on Susan McGregor’s paper yesterday on the tensions between reporters and editors. How do you turn this knowledge into training? How can an institution train staff whose goals are at odds with the firm’s? Perhaps we can learn from the way doctors and lawyers are trained. She’s interested in pointers at relevant literature, and in meeting potential collaborators.
Rachel Greenstadt is the incoming program chair and promises to “make PETS great again”. She’s gonna build a paywall and make IEEE pay for it! We are bringing PETS back to America!
Pedro Moreno-Sanchez has been working on P2P mixing for unlinkable transactions. He builds on dining cryptographers rather than mix nets or onion routing; existing mixing protocols such as Dissent and CoinShuffle need rounds linear in the number of peers and quadratic in the number of malicious peers. The idea is to use power sums: to get m1 and m2, it’s enough to know their sum and the sum of their squares.
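A worked example of the power-sum trick: from s1 = m1 + m2 and s2 = m1² + m2², the two messages are the roots of x² − s1·x + (s1² − s2)/2 (over a finite field in the real protocol; over the integers here for illustration):

```python
from math import isqrt

def recover_two_messages(s1: int, s2: int):
    """Recover {m1, m2} from their power sums s1 = m1+m2 and s2 = m1^2+m2^2.
    The elementary symmetric functions are e1 = s1 and e2 = (s1^2 - s2)/2,
    so m1 and m2 are the roots of x^2 - e1*x + e2 = 0."""
    e1, e2 = s1, (s1 * s1 - s2) // 2
    disc = e1 * e1 - 4 * e2
    root = isqrt(disc)
    assert root * root == disc, "inputs are not consistent power sums"
    return (e1 - root) // 2, (e1 + root) // 2

# e.g. m1, m2 = 17, 42 gives s1 = 59, s2 = 2053;
# recover_two_messages(59, 2053) == (17, 42)
```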
Steven Murdoch has been teaching a four-hour infosec bootcamp for journalists. To get OTR across, he set up an XMPP server with full logging; the tools are here.
Claudia Diaz presented the Andreas Pfitzmann Best Student Paper Award to Samuel Grogan for Access Denied! Honourable mentions go to Oleksii Starov for Are you sure you want to contact us? and to Laurent Simon for Don’t Interrupt Me While I Type.
David Wu has been working on order-revealing encryption schemes with some leakage that provide practical performance, and that can defeat the inference attacks presented at CCS 2015. Each database element has a left ciphertext with practical security and a right ciphertext with semantic security.
Dan from Cybernetica in Estonia has developed Sharemind, a statistical analysis system based on multiparty computation.
Isabella from De Montfort is interested in the strength of privacy metrics. She has been working on ways to display multiple metrics of attackers in 3 by 4 rectangles which enable people to visualise more complex attributes of attack power and risk.
Iraklis Symeonidis from KU Leuven has been studying the collateral damage of Facebook apps. Now 70% of users are concerned about free apps stealing their personal data.
Amir Herzberg asks “Can Johnny text securely?” and is calling for people to participate in a usability study of anonymous and secure messaging and texting. He’ll be comparing WhatsApp, Telegram, Viber, Secret and Whisper, and wants to know whether users understand the aspects of protection, or care. He has an extended version of Telegram as it’s open source; it’s interoperable with legacy Telegram but instrumented. This is a long-term, real-use experiment; if interested contact Amir.Herzberg@gmail.com.
Rob from the US Naval Research Lab wrote Shadow, a simulator that runs Tor. He wants people who use it to contact him so he can put links to us on the website, and to support the funding case.
PETS was followed by the HotPETS workshop, with the first talk given by Linus Gasser on managing large numbers of ssh keys without necessarily being able to link devices. With one key per service per device, it does not use a CA but a private blockchain run by a cothority – a collective authority. Mining is done by proof of user confirmation and blocks are signed by all nodes once a majority of signatures are present, giving a signed forward link to the next block. Subchains will be used to pseudonymize keys.
Pedro Moreno-Sanchez talked about Ripple; yesterday he talked about de-anonymising ripple transactions and in this talk he’s sketching how to define privacy properly for a distributed credit network. The linkage between path choice and the credit available to each node on it complicates matters; the solution is to have “landmarks”, like credit reference agencies, but to split knowledge so that landmarks can do partial signatures on a user’s credit rating, that are aggregated to present a proof of a correct path to a bank.
Florian Dold spoke on Taler, a system being developed at INRIA for sender-anonymous payments. This can allow micropayments for content and assuage worries about recipients avoiding taxes or laundering money. It’s based on standard crypto: Chaum-style blind signatures on tokens purchased from the exchange which redeems them as soon as the merchant presents them. Everything’s kept simple; tokens that are partially spent or whose spending failed are tainted and must be replaced. The prototype is a browser extension and the business model is small transaction fees for the exchange.
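The core of the Chaum-style construction, in textbook RSA notation (a sketch of the principle only, not Taler’s exact protocol, which adds hashing, denomination keys and refresh operations):

```latex
% Exchange key: (N, e) public, d private; customer wants a signature on token m.
\begin{align*}
  m' &= m \cdot r^{e} \bmod N              && \text{customer blinds $m$ with random $r$}\\
  s' &= (m')^{d} = m^{d} \cdot r \bmod N   && \text{exchange signs without seeing $m$}\\
  s  &= s' \cdot r^{-1} = m^{d} \bmod N    && \text{customer unblinds; $s^{e} = m \bmod N$ verifies}
\end{align*}
```

The exchange thus cannot link the coin it later redeems to the withdrawal it signed, which is where the sender anonymity comes from.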
The HotPETS keynote was by Dan Meredith of the Open Technology Fund. He started by thanking the PETS community for all the good that systems like Tor have done for human rights round the world. The fund, started in 2012, is predominantly supported by the US government and promotes public-interest and human-rights technology; its remit is free expression under article 19 of the universal declaration of human rights, and specifically free speech online. Its focus is technology-centred efforts to increase access to the Internet and combat repressive surveillance. They get 100–150 applications every two months, and now over a billion people use technologies that they’ve supported, spending $35–40m over the past four years. Fundees range from an Iranian woman running a website and measuring local interference through the Noise protocol, to Let’s Encrypt, now the third-biggest certificate issuer after only eight months. The sort of problem he’s faced is how to enable a Chinese resident to research a human-rights issue despite the Great Firewall and the language gap; this involves everything from VPN and proxy tools to domain fronting. The GreatFire project simply puts censored websites on Amazon Web Services to make them quickly accessible to all. Encrypted text had a visible effect; in Vietnam, for example, dissidents got closely tailed until a decade ago, when the state figured out how to tap and track mobile phones and the tails vanished. After the arrival of TextSecure, and later RedPhone and Signal, the tails suddenly appeared again. The fund has supported over a dozen crypto projects.
One issue is that the people who will be most helped, or harmed, by the technology the PETS community develops and maintains are not members of our community; failures take time to come to our attention, and the gap between government capabilities and those of private persons is bigger than ever before (even in Zimbabwe). When developing stuff, take account of unscrupulous suppliers like Hacking Team in the design and development process. When designing technologies to shift power, be realistic, and think about scale: how many people do you have to affect to have impact? A hundred journalists, or seven million members of the population? Do we need deniability? Will the protocol scale up to millions if suddenly needed? Messaging services may seem like they’re decentralising but they aren’t necessarily, if they shift power to Google’s push service. Should you use Orbit on IPFS instead? And what sort of mechanisms might we create to ensure that the products we create are actually doing good? Apart from the technical stuff like red-teaming and security audits, developers should involve end users in a codesign process, and indeed more and more of the fund’s applicants are coming from the target countries themselves. But there is much value in just getting stuff out there; get a hundred flowers blooming, and be compassionate and humble about it.
The afternoon’s session was started by Harry Halpin talking about the responsibility of open standards in the era of surveillance. There has been recent discussion about whether extensibility hurts deployment and whether open standards slow down implementation. Is it not simpler to just get to large numbers of users through the major platforms? Harry discussed W3C standards such as web authentication; these might welcome more scrutiny from the research community, not just for technical review but for human-rights aspects (such as WebRTC busting the VPNs used by Iranian human-rights activists). The big controversy right now is the forthcoming incorporation of DRM into browsers, and the petition to ensure that participant firms don’t sue security researchers who disclose vulnerabilities responsibly. Protocol design isn’t getting any easier, with all sorts of issues from rampant algorithm agility to worries about post-quantum crypto. We need standards for everything from blockchains to software updates, and we don’t have enough independent academics standing up for human rights. In questions, it was noted that rather than asking everyone else to change to suit the standards bodies, one might ask how the standards bodies should change to fit the world.
Ame Elliott is fostering a community of UX professionals and users, Simply Secure. We have to learn to deal with customers who have an adversarial relationship with, for example, their phone carrier, which is constantly pushing malware into cheap Android handsets to try and milk them. We need to be careful that research subjects are engaged as people empowered to complain, rather than as subjects. User studies can have real effects, and she has a number of tricks; when taking photographs, get a model release and spend 3–5 minutes taking some flattering photos which you share with the subject, or use non-identifiable photography of hands, feet, etc. See her resources here.
Marios Isaakidis thinks that Tor has become too much of a Swiss army knife and that we need more diverse anonymity primitives. He is developing CENO, a censorship-resistant system on top of Freenet. It can provide anonymity in the face of a global adversary to both producers and consumers of information, as well as deniability and persistence, as resources are replicated. On top of this sit some special Freenet nodes – channel creators, request handlers, inserters and backbone nodes – connected by high-trust links and with higher bandwidth and storage. There’s no need to publish proxy / bridge addresses; there’s an existing user base; there’s fifteen years of history; and there’s existing resistance to traffic analysis. Drawbacks are poor support for dynamic content (two-to-fifteen-minute latency for updates); and that unpopular content eventually gets discarded.
David Lazar spoke on Alpenhorn, which lets users establish a shared secret with strong metadata privacy, no prior out-of-band communication, and strong forward secrecy. Pond assumes physical meetings; Vuvuzela assumes a PKI; and Ricochet leaks information via cleartext IDs. Alpenhorn instead uses identity-based encryption; a server provides a private key to any user who proves she owns an email address with a DKIM signature. The rest is math; the details involve onion-encrypting stuff with three servers with transient public keys, the private parts of which they are trusted to delete. Given one honest and competent server, it should all work, though it’s heavyweight (it takes 300Kb even after optimisation to set up a session key to talk to someone). More here.
Amirali Sanatinia has been working on hidden service directories, and the problem that some can be malicious. His proposal is honey onions generated for a day, a week or a month; 1500 of them will cover 95% of HSDirs, so there are 4500 active onions. Working out which onions were visited following a suspicious incident enables you to find the smallest set of HSDirs explaining it. A deployment from February to April this year detected 111 HSDirs that may have snooped on content, across over 40,000 visits (there was a peak in hidden services in March). 30 of them were also found by the Tor team. No snooping HSDirs were found in China, the Middle East or Africa (where Tor is often blocked); an Alibaba node was found, but in their California data centre. Most were automated and some tried aggressively to find vulnerabilities. More broadly, there is a public benefit from identifying misbehaving relays.
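Finding the smallest set of HSDirs that explains the observed visits is an instance of set cover; a greedy sketch, under the assumption that we know which HSDirs held each honey onion’s descriptor (names and data layout are mine, not the paper’s):

```python
def suspect_hsdirs(visited_onions, placements):
    """Greedy set cover: pick a small set of HSDirs that between them held
    the descriptors of every honey onion that was visited.  `placements`
    maps each HSDir fingerprint to the set of honey onions whose
    descriptors it stored during the measurement period."""
    uncovered = set(visited_onions)
    suspects = []
    while uncovered:
        # pick the HSDir that explains the most still-unexplained visits
        best = max(placements, key=lambda d: len(placements[d] & uncovered))
        gained = placements[best] & uncovered
        if not gained:
            break   # remaining visits cannot be explained by any known HSDir
        suspects.append(best)
        uncovered -= gained
    return suspects
```

Greedy set cover is only an approximation of the true minimum, but it is the standard cheap heuristic for this kind of attribution.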
The last speaker at HotPETS was Aylin Caliskan, talking on the discrimination and unfairness embedded in language models. We rely heavily on machine-learning algorithms, trained on human data, to drive applications like predictive policing. Do the algorithms pick up unfair and discriminatory biases from human language? Aylin uses language models that represent semantic spaces by word embeddings. She focuses on word2vec and GloVe, the two leading models, which are used in sentiment analysis, named entity recognition and much else. As random examples, translations from (gender-neutral) Turkish to English arrived with gender stereotyping added by the machine translation; and the TayTweets robot trained itself to be a genocidal racist from Internet chat. Stereotype threat is another issue, as is subconscious bias: the implicit association test shows many people to be biased. She’s been mining text for evidence of subconscious bias against black Americans, and for gender bias by profession. She does indeed find a strong (89%) correlation with ground truth about the proportion of female members of each profession. Women are also associated with family, and men with career, with high effect size, while whites are associated with being pleasant and blacks with being unpleasant. Similar results are found from turning her machine-learning tools on text about disability and sexual orientation. What should policymakers do about this? We face an avalanche of applications that include biased text processing. Should firms clean the data beforehand, or the model afterwards? It’s not clear how to do either without losing utility.
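The measurement boils down to comparing cosine similarities between target words and attribute words in the embedding space; a minimal sketch, where `vec()` is a hypothetical lookup into a pre-trained word2vec or GloVe model (this is the flavour of the implicit-association-style test, not Aylin’s exact statistic):

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word_vec, attr_a, attr_b):
    """How much closer a target word sits to attribute set A than to
    attribute set B, measured by mean cosine similarity."""
    return (np.mean([cosine(word_vec, a) for a in attr_a]) -
            np.mean([cosine(word_vec, b) for b in attr_b]))

# e.g. association(vec("programmer"),
#                  [vec(w) for w in ("he", "man")],
#                  [vec(w) for w in ("she", "woman")])
# a positive score means the profession word sits closer to the male terms.
```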
This last paper won the best paper award by 233 votes to the runner-up’s 90.