Category Archives: Security engineering

Bad security, good security, case studies, lessons learned

Pico part III: Making Pico psychologically acceptable to the everyday user

Many users are willing to sacrifice some security to gain quick and easy access to their services, often in spite of advice from service providers. Users are somehow expected to use a unique password for every service, each sufficiently long and consisting of letters, numbers, and symbols. Since most users do not (indeed, cannot) follow all these rules, they fall back on coping strategies that providers advise against but that make passwords more usable: writing passwords down, reusing the same password across several services, and choosing easy-to-guess passwords, such as names and hobbies. But usable passwords are not secure passwords, and users are blamed when things go wrong.

This isn’t just unreasonable, it’s unjustified, because even secure passwords are not immune to attack. A number of security breaches have had little to do with user practices and password strength, such as the Snapchat hacking incident, theft of Adobe customer records and passwords, and the Heartbleed bug. Stronger authentication requires a stronger and more usable authentication scheme, not longer and more complex passwords.

We have been evaluating the usability of our more secure, token-based system: Pico, a small, dedicated device that authenticates you to services. Pico is theft-resistant because it only works when it is close to its owner, which it detects by communicating with other devices you own – Picosiblings. These devices are smaller and can be embedded in clothing and accessories. They create a cryptographic “aura” around you that unlocks Pico.

For people to adopt this new scheme, we need to make sure Pico is psychologically acceptable – unobtrusive and easily and routinely used, especially relative to passwords. The weaknesses of passwords have not been detrimental to their ubiquity because they are easy to administer, well understood, and require no additional hardware or software. Pico, on the other hand, is not currently easy to administer, is not widely understood, and does require additional hardware and software. The onus is on the Pico team to make our solution convenient and easy to use. If Pico is not sufficiently more convenient to use than passwords, it is likely to be rejected, regardless of improvements to security.

This is a particular challenge because we are not merely attempting to replace passwords as they are supposed to be used, but as they are actually used by real people – a baseline that is more usable, though much less secure, than our current conception of Pico.

Small electronic devices worn by the user – typically watches, wristbands, or glasses – have had limited success (e.g. the Rumba Time Go Watch and the Embrace+ Wristband). Reasons include the inaccuracy of the data they collect, the time spent managing that data, the lack of control over appearance, the sense that these technologies are gimmicks rather than genuinely useful, the monetary cost, and poor battery life. All of these issues need to be carefully considered if Pico is to pass the user’s cost-benefit analysis of adoption.

To ensure the psychological acceptability of Pico, we have been conducting user studies from the very early design stages. The point of this research is to make sure we don’t impose any restrictive design decisions on users. We want users to be happy to adopt Pico, and this requires a user-centred approach to research and development based on early and frequent usability testing.

Thus far, we have qualitatively investigated user experiences of paper prototypes of Pico, which eventually informed the design of three-dimensional plastic prototypes (Figure 1).

Figure 1. Left: Early paper and plasticine Pico prototypes; Right: Plastic (Polymorph) low-fidelity Pico prototypes

This exploratory research provided insight into whether early Pico designs were sufficient for allowing the user to navigate through common authentication tasks. We then conducted interviews with these plastic prototypes, asking participants which they preferred and why. In the same interviews, we presented participants with a range of pseudo-Picosiblings (Figure 2) to get an idea of the feasibility of Picosiblings from the end-user’s perspective.

Figure 2. The range of pseudo-Picosiblings including everyday items (watch, keys, accessories, etc.) and standalone options (magnetic clips and free-standing coins)

The challenge seems to be in finding a balance between cost, style, and usefulness. Consider, for example, the usefulness of a watch. While we expect a watch to tell us the time (to serve a function), what we really care about is its style and cost. This is the difference between my watch and your watch, and it is where we find its selling power. Most wearable electronic devices, such as smartwatches and fitness gadgets, advertise function and usefulness first, and then style, which is often limited to one, or maybe two, designs. And cost? Well, you’ll just have to find a way to pay for it. Pico, like these devices, could provide the usefulness required for widespread and enduring adoption; paired with low cost and personal style, it should have a greater degree of success than previous wearable electronic devices.

Initial analysis of the results reveals polarised opinions on how Pico and Picosiblings should look, ranging from fashionable and personalisable to disguised and discreet. Interestingly, users seemed more concerned about the security of Pico than about the security of passwords. Generally, however, the initial research indicates that users do see the usefulness of Pico as a standalone device, provided it is reliable and can be used for a wide range of services; the hardware is of no benefit unless it replaces most, if not all, passwords – otherwise it becomes just another thing people have to remember.

A legitimate concern for users is loss or theft; we are working to ensure that such incidents do not cause inconvenience or pose a threat to the user, by making the system easily recoverable. Related concerns relevant to possessing physical devices are durability, physical ease of use, the awkwardness of having to handle and aim the Pico at a QR code, and the everyday convenience of remembering and carrying several devices.

Interviews revealed that, to make remembering and carrying several devices easier and more worthwhile, Picosiblings should have more than one function (e.g. watches, glasses, ID cards). By making Picosiblings practical, users are more likely to remember to take them, and to perceive the effort of carrying them as outweighed by the benefit. Three or four items was typically the maximum number of Picosiblings users said they would be happy to carry; the aim would be to reduce the required number to one or two (depending on what they are), allowing users to carry more as “backups” if they were going to use those items anyway.

Though suggested by some, the same emphasis on dual function was not observed for Pico, since this device serves a sufficiently valuable function in itself. However, while many found it perfectly reasonable to carry a dedicated, secure device (given its function), some did express a preference for the convenience of an app on their smartphone. To create a more streamlined experience, we are currently working on such an app, which should give these potential Pico users the flexibility they seem to desire.

By taking into account these and other user opinions before committing to a single design and implementation, we are working to ensure Pico isn’t just theoretically secure – secure only if we can rely on users to operate it properly despite any inconvenience. Instead, we can make sure Pico is actually secure, because we will be creating an authentication scheme that requires users to do only what they would do (or better, want to do) anyway. We can achieve this by taking seriously the capabilities and preferences of the end-user.

 

Pico part II: What’s wrong with QR code password replacement schemes, and how to fix them!

Users don’t want to authenticate, they want to do useful or enjoyable things like sending emails, ordering groceries or playing games. To alleviate the burden of having to type passwords, Pico and several other schemes, such as SQRL and tiQR, let the user simply scan a QR code; then a cryptographic protocol authenticates the user behind the scenes and initiates a session. But users, unless they are on the move, may prefer to run their email or web browsing sessions on their full-size computer instead of on their smartphone, whose user interface is relatively limited. Therefore they don’t want an authenticated session between their smartphone and the website but between their computer and the website, even if it’s the smartphone that scans the QR code.

In the original 2011 Pico paper (footnote 37), the website kept track of which “page impression” from a web browser was related to which Pico authentication by including a nonce in each login page QR code and having the Pico sign and return it as part of the authentication. Since then, within the Pico team, there has been much discussion of the so-called Page Impression Nonce or PIN, infamous both for the attacks it enables and for its unfortunate, overloaded acronym. While other schemes may have called it something different, or not called it anything at all, it was always present in one form or another, because they all used it to solve this same problem of linking browser sessions to authentications.

For example, in the SQRL system each QR code contains a URL, part of which is a random nonce (the PIN in this system). The SQRL app must sign and return this URL, thus associating the nonce with the app’s per-verifier public key. The web browser then starts its session by making another request which includes the URL (and thus the PIN) and gets back a session cookie.
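
To make the pattern concrete, here is a minimal Python sketch of the nonce-linking flow these schemes share (the names, URL and signature handling are illustrative, not any particular scheme’s actual API):

    import secrets

    # sessions: PIN -> authenticated user (bound once a phone scans and the
    # server has verified its signature); cookies: session cookie -> user.
    sessions, cookies = {}, {}

    def make_login_page():
        pin = secrets.token_urlsafe(16)  # the Page Impression Nonce
        return pin, "https://example.com/auth?pin=" + pin  # rendered as a QR code

    def phone_authenticates(qr_url, user):
        # The app signs qr_url with its per-verifier key; after checking the
        # signature, the server binds the embedded PIN to the user's account.
        pin = qr_url.split("pin=")[1]
        sessions[pin] = user

    def browser_claims_session(pin):
        # Whoever presents the PIN gets the cookie: the server cannot tell
        # the victim's browser from an attacker who relayed the QR code.
        if pin in sessions:
            cookie = secrets.token_urlsafe(16)
            cookies[cookie] = sessions.pop(pin)
            return cookie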

So what’s the problem?

The problem with this kind of mechanism is that anyone else who learns the PIN can also make that second request, thus logging themselves in as the user who scanned the QR code. For example, a bad guy can obtain a QR code and its PIN from the login page of bank.com and display it somewhere, like the login page of randomgameforum.com, for a victim to scan. Now, assuming the victim had an account at bank.com, the attacker obtains a bank.com session that the victim unsuspectingly initiated with their smartphone.

Part of the problem is that QR codes are not human-readable. Some have suggested that a simple confirmation step (“Do you really want to login to bank.com?”) might prevent such attacks, but we decided this wasn’t really good enough from a security or a usability perspective. We don’t want users to have to read the confirmation dialog and press the OK button every time they authenticate, and realistically they won’t, especially if they never normally do anything other than press OK.

Moreover, the confirmation step doesn’t help at all when the relaying of the QR code is combined with traditional phishing techniques. Consider receiving this email:

From: security@bank.com
To: victim@example.com
Subject: Urgent: Account security threat
---
Dear Customer

<compelling phishing mumbo jumbo>

To keep your account secure, please scan this QR code:

<login QR code with PIN known by the sender>

Kind regards,

Account security department

and if you oblige:

Do you really want to login to bank.com?

Now the poor user thinks “Well yes, I do, that’s exactly what the account security team asked me to do” and even worse: “I’m definitely not being phished, I remember what those security people kept telling me about checking the address of the website before logging in”.

How to fix it

The solution we came up with is called session delegation. Instead of having a nonce in each QR code, which anyone can later trade in for an authenticated session, we have the website return a session delegation token to the Pico (not the web browser) as part of the authentication protocol. The Pico may then delegate the session to the browser on the bigger computer by sending it this token, via a secure channel. (For further details see section 4.1 of our “lousy phish” paper.) The price to pay for this strategy is that it requires a channel from the Pico to the browser, which is much harder to provide than the one in the opposite direction (the visual “QR code” channel).
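
Continuing the illustrative sketch above, delegation changes the flow as follows (again a sketch under assumed names, not the actual protocol, which is specified in the paper):

    import secrets

    delegable, cookies = {}, {}  # delegation token -> user; cookie -> user

    def pico_authenticates(user):
        # The delegation token is returned to the Pico over the mutually
        # authenticated protocol channel; it never appears in the QR code
        # or on the login page.
        token = secrets.token_urlsafe(16)
        delegable[token] = user
        return token

    def browser_redeems(token):
        # Only the party the Pico chose to send the token to can start a
        # session; relaying the QR code no longer yields anything redeemable.
        if token in delegable:
            cookie = secrets.token_urlsafe(16)
            cookies[cookie] = delegable.pop(token)
            return cookie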

We made a prototype which used Bluetooth for the delegation channel but, because Bluetooth was sometimes difficult to set up and not universally available, we even thought about using an audio cable plugged into the microphone jack of the computer. However, we were still worried about the availability and usability of these hardware-based solutions. We did a lot of research into NAT and firewall traversal techniques (such as STUN and TURN) to see if we could use peer-to-peer IP connectivity, but this is not possible in all cases without a separate signalling channel. In our latest prototype we’re using a “rendezvous point”, which is a very simple relay server we’ve designed, running in the public Internet. The rendezvous point is the most universal and usable solution, but does come with some privacy concerns, namely that the untrusted rendezvous server gets to see the Pico/computer IP address pairs which are communicating. So we still allow privacy-conscious users to adopt less convenient alternatives if they’re willing to pay the price of setting up Bluetooth, connecting cables or changing their firewall/NAT settings, but we don’t impose that cost on everyone.
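
For the curious, the essence of such a relay fits in a few lines of Python (a toy sketch only – our actual rendezvous protocol, framing and port differ):

    import socket, threading

    # Each party connects and sends a 16-byte channel ID; the relay pairs
    # the two connections sharing an ID, then pipes bytes between them
    # blindly. Note that it necessarily learns which IP address pairs are
    # talking -- exactly the privacy cost described above.
    waiting, lock = {}, threading.Lock()

    def pipe(src, dst):
        while data := src.recv(4096):
            dst.sendall(data)
        dst.close()

    def handle(conn):
        channel = conn.recv(16)
        with lock:
            peer = waiting.pop(channel, None)
            if peer is None:
                waiting[channel] = conn  # first arrival waits for its peer
                return
        threading.Thread(target=pipe, args=(peer, conn), daemon=True).start()
        pipe(conn, peer)

    server = socket.create_server(("", 8400))
    while True:
        conn, _addr = server.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()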

The drawback of this approach is that the user’s computer requires some Pico software to receive the delegation tokens, via the rendezvous point or whatever other channel. Having to install this software hurts the “deployability” of the system as a whole and could render it completely useless in situations where installing new software is not possible. But another innovation, making the delegation token take the form of a URL, means there is always a last-resort fallback channel: manual transcription. If a Pico user can’t install the software on, or doesn’t want to trust, a particular computer, they can always still retype the token URL. There are other security concerns related to having URLs which will log your browser into someone else’s account, but you’ll have to read the lousy phish paper for a more detailed discussion of this topic.

There is clearly much interest in finding a replacement for passwords and several schemes (such as US 8261089 B2, Snap2Pass, tiQR, US 20130219479 A1, QRAuth and SQRL) propose using QR codes. But upon close inspection, all of the above use a page impression nonce, making them vulnerable to session hijacking attacks. We rejected the idea that this could be solved simply by getting the user to carry out more checks, and instead we propose an architectural fix which provides a more secure basis for the design of Pico.

For more information about Pico, have a look at our website, sign up to our mailing list and stay tuned for more Pico-related posts on Light Blue Touchpaper in the near future.

The CHERI capability model: Revisiting RISC in an age of risk (ISCA 2014)

Last week, Jonathan Woodruff presented our joint paper on the CHERI memory model, The CHERI capability model: Revisiting RISC in an age of risk, at the 2014 International Symposium on Computer Architecture (ISCA) in Minneapolis (video, slides). This is our first full paper on Capability Hardware Enhanced RISC Instructions (CHERI): collaborative work between Simon Moore’s team and my own, composed of members of the Security, Computer Architecture, and Systems Research Groups at the University of Cambridge Computer Laboratory; Peter G. Neumann’s group at the Computer Science Laboratory at SRI International; and Ben Laurie at Google.

CHERI is an instruction-set extension, prototyped via an FPGA-based soft processor core named BERI, that integrates a capability-system model with a conventional memory-management unit (MMU)-based pipeline. Unlike conventional OS-facing MMU-based protection, the CHERI protection and security models are aimed at compilers and applications. CHERI provides efficient, robust, compiler-driven, hardware-supported, and fine-grained memory protection and software compartmentalisation (sandboxing) within, rather than between, address spaces. We run a version of FreeBSD that has been adapted to support the hardware capability model (CheriBSD), compiled with a CHERI-aware Clang/LLVM that supports C pointer integrity, bounds checking, and capability-based protection and delegation. CheriBSD also supports a higher-level hardware-software security model permitting sandboxing of application components within an address space, based on capabilities and a Call/Return mechanism supporting mutual distrust.

The approach draws inspiration from Capsicum, our OS-facing hybrid capability-system model now shipping in FreeBSD and available as a patch for Linux courtesy of Google. We found that capability-system approaches matched extremely well with least-privilege oriented software compartmentalisation, in which programs are broken up into sandboxed components to mitigate the effects of exploited vulnerabilities. CHERI similarly merges research capability-system ideas with a conventional RISC processor design, making accessible the security and robustness benefits of the former, while retaining software compatibility with the latter. In the paper, we contrast our approach with a number of others, including Intel’s forthcoming Memory Protection eXtensions (MPX). We pursue a RISC-oriented design instantiated against the 64-bit MIPS ISA, but the ideas should be portable to other RISC ISAs such as ARMv8 and RISC-V.

Our hardware prototype is implemented in Bluespec System Verilog, a high-level hardware description language (HDL) that makes it easier to perform design-space exploration. To facilitate both reproducibility for this work and future hardware-software research, we’ve open sourced the underlying Bluespec Extensible RISC Implementation (BERI), our CHERI extensions, and a complete software stack: operating system, compiler, and so on. In fact, support for the underlying 64-bit RISC platform, which implements a version of the 64-bit MIPS ISA, was upstreamed to FreeBSD 10.0, which shipped earlier this year. Our capability-enhanced versions of FreeBSD (CheriBSD) and Clang/LLVM are distributed via GitHub.

You can learn more about CHERI, BERI, and our larger clean-slate hardware-software agenda on the CTSRD Project Website. There, you will find copies of our prior workshop papers, full Bluespec source code for the FPGA processor design, hardware build instructions for our FPGA-based tablet, downloadable CheriBSD images, software source code, and also our recent technical report, Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture, and Jon Woodruff’s PhD dissertation on CHERI.

Jonathan Woodruff, Robert N. M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, and Michael Roe. The CHERI capability model: Revisiting RISC in an age of risk, Proceedings of the 41st International Symposium on Computer Architecture (ISCA 2014), Minneapolis, MN, USA, June 14–16, 2014.

EMV: Why Payment Systems Fail

In the latest edition of Communications of the ACM, Ross Anderson and I have an article in the Inside Risks column: “EMV: Why Payment Systems Fail” (DOI 10.1145/2602321).

Now that US banks are deploying credit and debit cards with chips supporting the EMV protocol, our article explores what lessons the US should learn from the UK experience of having chip cards since 2006. We address questions like whether EMV would have prevented the Target data breach (it wouldn’t have), whether Chip and PIN is safer for customers than Chip and Signature (it isn’t), whether EMV cards can be cloned (in some cases, they can) and whether EMV will protect against online fraud (it won’t).

While the EMV specification is the same across the world, the way each country uses it varies substantially. Even individual banks within a country may make different implementation choices which have an impact on security. The US will prove to be an especially interesting case study because some banks will be choosing Chip and PIN (as the UK has done) while others will choose Chip and Signature (as Singapore did). The US will act as a natural experiment addressing the question of which of Chip and PIN or Chip and Signature is better – and better from whose perspective.

The US is also distinctive in that the major tussle over payment card security is over the “interchange” fees paid by merchants to the banks which issue the cards used. Interchange fees are about an order of magnitude higher than losses due to fraud, so while security is one consideration in choosing different sets of EMV features, the question of who pays how much in fees is a more important factor (even if the decision is later claimed to be justified by security). We’re already seeing results of this fight in the courts and through legislation.

EMV is coming to the US, so it is important that banks, customers, merchants and regulators know the likely consequences and how to manage the risks, learning from the lessons of the UK and elsewhere. Discussion of these and further issues can be found in our article.

PhD studentship: Model-based assessment of compromising emanations

We have a fully funded 3.5-year PhD Studentship on offer, from October 2014, for a research student to work on “Model-based assessment of compromising emanations”. The project aims to improve our understanding of electro-magnetic emissions that are unintentionally emitted by computing equipment, and the eavesdropping risks they pose. In particular, it aims to improve test and measurement procedures (TEMPEST) for computing equipment that processes extremely confidential data. We are looking for an Electrical Engineering, Computer Science or Physics graduate with an interest in electronics, software-defined radio, hardware security, side-channel cryptanalysis, digital signal processing, electromagnetic compatibility, or machine learning.

Check the full advert and contact Dr Markus Kuhn for more information if you are interested in applying, quoting NR03517. Application deadline: 23 June 2014.

Light Blue Touchpaper now on HTTPS

Light Blue Touchpaper now supports TLS, so as to protect passwords and authentication cookies from eavesdropping. TLS support is provided by the Pound load-balancer, because Varnish (our reverse-proxy cache) does not support TLS.

The configuration is intended to be a reasonable trade-off between security and usability, and gets an A– on the Qualys SSL report. The cipher suite list is based on the very helpful Qualys Security Labs recommendations and Apache header re-writing sets the HttpOnly and Secure cookie flags to resist cookie hijacking.
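
The cookie-flag rewriting uses Apache’s mod_headers; the rule is of roughly this shape (a sketch of the standard idiom, not our exact configuration):

    Header edit Set-Cookie ^(.*)$ "$1; HttpOnly; Secure"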

As always, we might have missed something, so if you notice problems such as incompatibilities with certain browsers, then please let us know at lbt-admin@cl.cam.ac.uk or in the comments.

The pre-play vulnerability in Chip and PIN

Today we have published a new paper: “Chip and Skim: cloning EMV cards with the pre-play attack”, presented at the 2014 IEEE Symposium on Security and Privacy. The paper analyses the EMV protocol, the leading smart card payment system with 1.62 billion cards in circulation, known as “Chip and PIN” in English-speaking countries. As a result of the Target data breach, banks in the US (which have lagged behind in Chip and PIN deployment compared to the rest of the world) have accelerated their efforts to roll out Chip and PIN capable cards to their customers.

However, our paper shows that Chip and PIN, as currently implemented, still has serious vulnerabilities, which might leave customers at risk of fraud. Previously we have shown how cards can be used without knowing the correct PIN, and that card details can be intercepted as a result of flawed tamper-protection. Our new paper shows that it is possible to create clone chip cards which normal bank procedures will not be able to distinguish from the real card.

When a Chip and PIN transaction is performed, the terminal requests that the card produce an authentication code for the transaction. Part of this transaction is a number that is supposed to be random, so as to stop an authentication code being generated in advance. However, there are two ways in which the protection can be bypassed: the first requires that the Chip and PIN terminal has a poorly designed random number generator (which we have observed in the wild); the second requires that the Chip and PIN terminal or its communications back to the bank can be tampered with (which again, we have observed in the wild).

To carry out the attack, the criminal arranges that the targeted terminal will generate a particular “random” number in the future (either by predicting which number will be generated by a poorly designed random number generator, by tampering with the random number generator, or by tampering with the random number sent to the bank). Then the criminal gains temporary access to the card (for example by tampering with a Chip and PIN terminal) and requests authentication codes corresponding to the “random” number(s) that will later occur. Finally, the attacker loads the authentication codes on to the clone card, and uses this card in the targeted terminal. Because the authentication codes that the clone card provides match those which the real card would have provided, the bank cannot distinguish between the clone card and the real one.
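
The following toy model (illustrative Python, not the EMV wire format or MAC algorithm) shows why predictability of the “random” number is fatal:

    import hmac, hashlib

    # The card MACs transaction data that includes the terminal's
    # "unpredictable number" (UN); if the UN can be predicted or forced,
    # authentication codes can be harvested ahead of time.
    def card_auth_code(card_key, amount, date, un):
        msg = "{}|{}|{}".format(amount, date, un).encode()
        return hmac.new(card_key, msg, hashlib.sha256).digest()

    # With brief access to the real card, harvest codes for the UNs the
    # targeted terminal will later produce:
    card_key = b"secret-kept-inside-the-chip"
    clone_table = {un: card_auth_code(card_key, "99.99", "2014-05-20", un)
                   for un in ["0bad5eed", "0bad5eef"]}

    def clone_respond(amount, date, un):
        # The clone holds no key, only the table, yet its responses are
        # byte-for-byte what the real card would have produced.
        return clone_table[un]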

Because the transactions look legitimate, banks may refuse to refund victims of fraud. So in the paper we discuss how bank procedures could be improved to detect whether this attack has occurred. We also describe how the Chip and PIN system could be improved. As a result of our research, work has started on mitigating one of the vulnerabilities we identified; the certification requirements for random number generators in Chip and PIN terminals have been improved, though old terminals may still be vulnerable. Attacks making use of tampered random number generators or communications are more challenging to prevent and have yet to be addressed.

Update (2014-05-20): There is press coverage of this paper in The Register, SC Magazine UK and Schneier on Security.
Update (2014-05-21): Also now covered in The Hacker News.

Heartbleed and RSA private keys

UPDATE:

Matthew Arcus pointed out that a freed buffer will always have the first 16 bytes (x86_64) corrupted by two free list pointers, according to glibc’s malloc implementation. This means the leak mechanism I described below will only explain the partial leak I exploited, but not complete prime number leaks others have seen. In fact, the uncorrupted numbers come from the long-lived copy of p and q in the Montgomery context.

——–

By now everyone knows about the OpenSSL Heartbleed vulnerability: a missing bounds check in one of the most popular TLS implementations has made millions of web servers (and others) leak all sorts of sensitive information from memory. This could leak login credentials, authentication cookies and web traffic to attackers. But could it be used to recover the site’s TLS private key? This would enable complete decryption of previously recorded traffic where perfect forward secrecy was not negotiated, and man-in-the-middle attacks on all future TLS sessions otherwise.

Since this would be a much more serious consequence of Heartbleed, I decided to investigate. The results were positive: I was able to extract private keys from a test Nginx server after a few days’ work. Later I applied my techniques to solve the CloudFlare Challenge. Along with a few other security researchers, we independently demonstrated that RSA private keys are indeed at risk. In this blog post, I’ll provide some detail on how to extract the private key and why the attack is possible.

How to extract the private key

Readers not familiar with RSA can read about it here. To simplify things a bit: a large (2048-bit) number N is constructed by multiplying together two large randomly generated prime numbers p and q. N is made public while p and q are kept secret. Finding p or q allows recovery of the private key. A generic attack is just factorising N, but this is believed to be difficult. However, with a vulnerability like Heartbleed, the attack is much simpler: because the web server needs the private key in memory to sign the TLS handshake, p and q must live in memory and we can try to obtain them with Heartbleed packets. Then the problem simply becomes how to identify them in the returned data. This is easy, as we know p and q are 1024 bits (128 bytes) long, and OpenSSL represents big numbers little-endian in memory, so a brute-force approach of treating every run of 128 consecutive bytes in the Heartbleed packets as a little-endian number and testing whether it divides N is sufficient to spot potential leaks. This is how most people solved the CloudFlare challenge.
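
In Python the scan is only a few lines (a sketch; a real exploit batches this over many captured heartbeat responses):

    def find_prime_factor(leak: bytes, N: int, size: int = 128):
        # Slide a 128-byte window over the leaked data, read it as a
        # little-endian integer, and test whether it divides the modulus N.
        for i in range(len(leak) - size + 1):
            candidate = int.from_bytes(leak[i:i + size], "little")
            if candidate > 1 and N % candidate == 0:
                return candidate  # p or q
        return None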

Coppersmith improvement

But wait, isn’t our scenario just like a cold boot attack? And there has been a lot of research on recovering RSA private keys with partial information. One of the most famous papers, from Coppersmith, presents message recovery attacks on related messages or insufficient padding, as well as factorisation with partial knowledge, with the help of lattice basis reduction. With the Coppersmith attack, N can be efficiently factorised if either the top or the bottom half of the bits of p is known. To put this in context: we only need the top or bottom 64 bytes of p in order to compute the private keys, compared to the naive brute force which requires all 128 bytes. In practice, reaching Coppersmith’s limit is computationally expensive (although still much better than factorisation), but assuming 77 bytes (60%) known we can comb the Heartbleed packets for potential private key segments very quickly.

In retrospect, there were 242 instances of private key remnants suitable for the Coppersmith attack, out of the 10,000 packets (64 KB each) that I had collected. Implementation of the Coppersmith attack was made easy thanks to the comprehensive computer algebra building blocks from Sage (although I later found out that Sage already has Coppersmith implemented).
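
Using that built-in implementation, the recovery step looks roughly like this (a sketch to be run under Sage, whose small_roots method implements Coppersmith’s lattice technique; parameters are indicative, and in practice one tunes epsilon and may brute-force a few extra bits):

    def recover_p(N, p_top, k):
        # Sage code: p = p_top * 2^k + x0 for some unknown x0 < 2^k.
        R = PolynomialRing(Zmod(N), 'x')
        x = R.gen()
        f = p_top * 2**k + x  # monic; has the small root x0 modulo p
        for r in f.small_roots(X=2**k, beta=0.5):  # p >= N^0.5
            p = int(p_top * 2**k + r)
            if p > 1 and N % p == 0:
                return p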

Can we do even better? If you have ever viewed an RSA private key using openssl rsa -text -in server.key, you will notice that there are quite a few numbers other than the two prime factors p and q. In fact, they are precomputed values for RSA’s Chinese Remainder Theorem optimisation. If some of them are leaked they can also be used to deduce p. And what about the Montgomery representations of p and q that OpenSSL uses for fast multiplication? They also admit a variant of Coppersmith’s attack, so partial bits are useful as well. With this in mind I set out to search for them in the packets collected from my test server, where these values are all known. But not even a single one of them appears partially (> 16 bytes) in the dataset. How is this possible?

Note: all my experiments and the CloudFlare challenge target Nginx, which is single-threaded. It is possible that for a multi-threaded web server, more leakage can be observed.

Why p and only p leaks

When Heartbleed first came out, people argued that RSA private keys should not be leaked because they are loaded when the web server starts, so they occupy a lower memory address; and because the heap grows upwards, a later-allocated buffer leaked by Heartbleed shouldn’t reach them. This argument is consistent with my inability to find any CRT precomputed values, but with one exception: p is definitely leaked somehow. If we assume the argument is correct, the question becomes: why is p leaked?

To add more to the mystery, OpenSSL apparently scrubs all temporary BigNums it uses! To reduce the overhead of dynamically allocating temporary values, OpenSSL provides a BN_CTX, a pool of BigNums operating in stack (LIFO) fashion. Upon finishing, the context is destroyed and all allocated buffers are scrubbed. This would mean that by the time the Heartbleed packet is constructed, there shouldn’t be any temporary values left in memory (again assuming a single thread), because the BN_CTX has long been released.

I won’t bother you with all the pain I went through to identify the cause, so here is the spoiler: when a BigNum is being extended to a bigger buffer size, its original buffer is not zeroised before being freed. The control-flow path that leads to the leak of p is even more subtle. During the initial TLS handshake, the server key exchange is signed with the private key. The CRT signing performs a modulo-p operation, which causes p<<BN_BITS2 to be stored in a temporary variable allocated from the BN_CTX pool. Later, in the CRT fault-injection check, that temporary variable is reused (remember BN_CTX operates like a stack) as val[0]. An interesting fact is that a reallocated temporary variable only gets its lowest nibble zeroised, and in the case of p<<BN_BITS2 nothing is destroyed. val[0] immediately receives a Montgomery-reduced value, but since the original buffer cannot accommodate the new value, it gets extended and p gets released to free heap space, waiting to be grabbed. And because this happens on every TLS handshake, it can be spilled everywhere.

Since it is difficult to statically determine which BigNums may be extended and leaked, I instrumented OpenSSL and experimented a bit. It turned out that a shifted version of the Montgomery representation of p is also freed at the leaky point, but only during the first RSA exponentiation, when the Montgomery context is initialised; it therefore lives at a low memory address and I was unable to locate it in any collected packets.

The OpenSSL team has been notified about the above leak, although to be fair it is not exactly a security bug by itself, as OpenSSL was never explicitly designed to prevent sensitive data from leaking to the heap.

I’d like to thank Joseph Bonneau for his useful comments and proofreading.