Recent advances in Machine Learning (ML) have taught us two main lessons: a large proportion of the things that humans do can actually be automated, and a substantial part of that automation can be done with minimal human supervision. One no longer needs to hand-pick features for models to use; in many cases people are moving away from selecting the models themselves and instead run a Neural Architecture Search. This means a non-stop search across billions of dimensions, continually improving different properties of deep neural networks (DNNs).
However, progress in automation has brought a spectre to the feast. Automated systems seem to be very vulnerable to adversarial attacks. Not only is this vulnerability hard to get rid of; worse, we often can’t even define what it means to be vulnerable in the first place.
Furthermore, finding adversarial attacks on ML systems is really easy, even if you have no access to the models at all. There are only so many things that make a cat a cat, and all the different models that deal with cats will be looking at roughly the same set of features. This has an important implication: learning how to trick one model that deals with cats often transfers over to other models. Transferability is a terrible property for security, because it makes adversarial ML attacks cheap and scalable. If the camera in the bank runs a similar ML model to the camera you can buy in Costco for $5, then the cost of developing an attack is $5.
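To make the transfer idea concrete, here is a minimal sketch in PyTorch. The two torchvision classifiers, the random input and the label are placeholders of my own choosing (in practice the images would also need the usual ImageNet normalisation); the point is simply that the adversarial example is computed on one model and then tested on a different one the attacker never touched.

```python
# A minimal sketch of a transfer attack: craft the perturbation on a model you
# own (the "surrogate") and test it on a model you have no access to (the "target").
import torch
import torchvision.models as models

surrogate = models.resnet18(pretrained=True).eval()     # the $5 model the attacker owns
target    = models.densenet121(pretrained=True).eval()  # the model actually deployed

def fgsm(model, x, label, eps=0.03):
    """Fast Gradient Sign Method: one gradient step that increases the loss,
    clipped to an eps-sized l-infinity ball around the original input."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Placeholder input and label (281 is ImageNet's "tabby cat" class).
x, y = torch.rand(1, 3, 224, 224), torch.tensor([281])
x_adv = fgsm(surrogate, x, y)

# The interesting question is how often the *target* model is fooled as well.
print("target, clean image:      ", target(x).argmax(1).item())
print("target, adversarial image:", target(x_adv).argmax(1).item())
```

If x_adv fools the target even though it was never queried while crafting the attack, that is transferability at work.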
As of now, we do not really have good answers to any of these problems. In the meantime, ML-controlled systems are entering the human realm.
In this Three Paper Thursday I want to talk about three papers from the field of adversarial ML that make these problems much easier to understand.
——
First, there is a common belief that a computer’s performance at recognition tasks is worse than a human’s. This argument is commonly brought up in the context of automated driving.
Geirhos et al. conducted a very interesting study that compared human performance on object recognition with that of DNNs solving the same task [1]. They analysed the patterns behind the errors and found that DNNs outperform humans on clean images, yet struggle once perturbations are applied. In particular, converting images to greyscale, reducing contrast and adding white noise all have a large impact on the neural networks’ performance. Human performance also degrades, but at a slower rate. It should be noted here that the neural networks were not trained to deal with such images, while humans have had millions of years to evolve a visual system that works in all sorts of lighting conditions, such as in forests and at night.
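The perturbations in question are easy to reproduce at a rough level. The sketch below assumes you already have a pretrained classifier (`model`) and a data loader (`loader`) yielding image/label batches; the perturbation strengths are illustrative guesses, not the parameters used in the paper.

```python
# A rough sketch of the kind of robustness sweep done in [1]: measure accuracy on
# clean images, then again on greyscale, low-contrast and white-noise versions.
import torch
import torchvision.transforms.functional as TF

def accuracy(model, loader, perturb=lambda x: x):
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(perturb(x)).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total

perturbations = {
    "clean":        lambda x: x,
    "greyscale":    lambda x: TF.rgb_to_grayscale(x, num_output_channels=3),
    "low contrast": lambda x: TF.adjust_contrast(x, contrast_factor=0.3),
    "white noise":  lambda x: (x + 0.1 * torch.randn_like(x)).clamp(0, 1),
}

for name, perturb in perturbations.items():
    print(f"{name}: {accuracy(model, loader, perturb):.3f}")
```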
Right now, the performance of off-the-shelf DNNs does appear to be somewhat worse when sensors degrade or when the models come under attack. Yet it doesn’t have to stay that way: unlike humans, computer systems can scale effectively and can collect far more information from their environment.
——
Our second paper concerns how to run complex DNNs on simpler, hardware-constrained devices. More often than not, commonly used models cannot fit on the devices people want to run them on. There is now a whole field of neural network compression that looks into ways of reducing the computational and memory footprints of common DNNs. This raises two questions: (a) how hard is it to trick compressed models, and (b) do the same transferability properties apply?
Two years ago we ran a study looking into exactly this question [2]. We found that the same transferability properties apply to commonly used compression techniques, and that attacks in fact transfer both ways: compressed models can be used to attack the uncompressed models from which they were derived, and once we have an attack on a full model, we can derive attacks on its compressed versions. Ultimately this shows that compression should not be assumed to provide any useful diversification between networks, so additional defence mechanisms are required to protect DNNs.
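As a sketch of what “both ways” means in practice, the snippet below prunes a copy of a full model to stand in for compression (the paper covers several compression pipelines, not just pruning), reuses the fgsm helper from the transfer sketch above, and checks the attack in each direction. Here `full_model`, `x` and `y` are placeholders you would supply.

```python
# A minimal sketch of the two-way transfer experiment in [2], with magnitude
# pruning standing in for the compression pipeline.
import copy
import torch
import torch.nn.utils.prune as prune

compressed = copy.deepcopy(full_model)
for module in compressed.modules():
    if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.9)  # drop 90% of weights

# Attack the compressed model, test on the full one...
x_adv = fgsm(compressed, x, y)
print("full model fooled:      ", full_model(x_adv).argmax(1).item() != y.item())

# ...and vice versa: attack the full model, test on the compressed one.
x_adv = fgsm(full_model, x, y)
print("compressed model fooled:", compressed(x_adv).argmax(1).item() != y.item())
```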
——
Third, I want to talk about attacks against certifiable robustness. The adversarial ML community thinks of robustness as a DNN’s ability to keep its prediction consistent despite the efforts of a particular attacker. By now, multiple ways have been developed to construct models that are certifiably robust against certain lp-norm attackers; i.e. for any perturbation of an input that stays within an lp ball of radius ε, the classifier’s prediction does not change.
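In symbols (using the standard formulation rather than any one paper’s notation, with x the input, f_i(x) the classifier’s score for class i, δ the perturbation and ε the certified radius):

```latex
\forall\, \delta \ \text{with}\ \|\delta\|_p \le \varepsilon:
\quad \arg\max_i f_i(x+\delta) \;=\; \arg\max_i f_i(x)
```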
Jacobsen et al. have looked into constructing invariance-based adversarial examples against those certified models [3]. Instead of perturbing the input to change the behaviour of the model, they change the input in a semantically meaningful way while leaving the model’s behaviour intact. In other words, while other adversarial techniques exploit the oversensitivity of DNNs, it is also possible to exploit their undersensitivity. One example in their paper is that a tennis ball in an image might be replaced by a strawberry while staying within the certified lp ball, so the network’s decision is guaranteed not to change.
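This is not the construction used in [3], which is considerably more sophisticated, but a toy version of the undersensitivity problem is easy to write down: take an image of something else and move it as close to the original as the lp ball allows. A model certified at that radius is then guaranteed to keep its original prediction, whatever a human would now say the image shows.

```python
# A toy illustration of undersensitivity (not the method of [3]): project an
# image of a *different* class into the l-infinity ball around the original.
import torch

def invariance_example(x, x_other, eps):
    """Return the point inside the eps-ball (l-infinity) around x that is
    closest to x_other. A model certified at radius eps must give it the same
    label as x, even if its content now looks more like x_other."""
    delta = (x_other - x).clamp(-eps, eps)
    return x + delta
```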
Jacobsen and his colleagues show the existence of invariance-based adversarial examples for a range of different defence techniques and on different natural and unnatural datasets. They evaluate the attacks against human annotators and show a wide discrepancy between the models’ predictions and the humans’. Their findings suggest that certifiable robustness claims should be interpreted more carefully, as both undersensitivity and oversensitivity are a problem for deep neural networks.
[1] Geirhos, Robert, et al. “Comparing deep neural networks against humans: object recognition when the signal gets weaker.” (2017).
[2] Zhao, Yiren, et al. “To compress or not to compress: Understanding the interactions between adversarial attacks and neural network compression.” (2018).
[3] Jacobsen, Jörn-Henrik, et al. “Exploiting excessive invariance caused by norm-bounded adversarial robustness.” arXiv preprint arXiv:1903.10484 (2019).