Recent advancements in Machine Learning (ML) have taught us two main lessons: a large proportion of the things that humans do can actually be automated, and a substantial part of this automation can be done with minimal human supervision. One no longer needs to select features for models to use; in many cases people are moving away from selecting the models themselves and are instead performing a Neural Architecture Search. This means a non-stop search across billions of dimensions, continually improving different properties of deep neural networks (DNNs).
However, progress in automation has brought a spectre to the feast. Automated systems seem to be very vulnerable to adversarial attacks. Not only is this vulnerability hard to get rid of; worse, we often can’t even define what it means to be vulnerable in the first place.
Furthermore, finding adversarial attacks on ML systems is really easy, even if you do not have any access to the models. There are only so many things that make a cat a cat, and all the different models that deal with cats will be looking at the same set of features. This has an important implication: an attack that tricks one model dealing with cats often transfers to other models. Transferability is a terrible property for security because it makes adversarial ML attacks cheap and scalable. If there is a camera in the bank running a similar ML model to the camera you can get in Costco for $5, then the cost of developing an attack is $5.
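To make the transferability claim concrete, here is a minimal sketch on toy data; it is not taken from any of the papers discussed. Two logistic regression models are trained on separate samples of the same two-class problem, an attack is crafted against model A only, and the result is evaluated on model B. The function names (`make_data`, `train_logreg`, `fgsm`), the Gaussian toy data, and the deliberately large `eps` are all assumptions chosen for illustration, and the fast gradient sign method stands in for whatever attack one prefers.

```python
import numpy as np

DIM = 20  # toy feature dimension (assumption)

def make_data(rng, n):
    """Two Gaussian classes whose means differ along the same shared features."""
    y = rng.integers(0, 2, size=n)
    mu = np.full(DIM, 0.5)
    x = rng.normal(size=(n, DIM)) + np.where(y[:, None] == 1, mu, -mu)
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(x, y, steps=500, lr=0.1):
    """Plain gradient descent on the logistic loss."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(model, x, y):
    w, b = model
    return float(((x @ w + b > 0).astype(int) == y).mean())

def fgsm(model, x, y, eps):
    """Fast gradient sign method: step each input in the direction that raises the loss."""
    w, b = model
    grad = (sigmoid(x @ w + b) - y)[:, None] * w[None, :]  # d(loss)/d(x)
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
model_a = train_logreg(*make_data(rng, 1000))  # the attacker's own model
model_b = train_logreg(*make_data(rng, 1000))  # the victim's model, never queried
x_test, y_test = make_data(rng, 1000)

x_adv = fgsm(model_a, x_test, y_test, eps=0.6)  # crafted using model A only
clean_acc_b = accuracy(model_b, x_test, y_test)
adv_acc_b = accuracy(model_b, x_adv, y_test)
print(f"model B accuracy: clean={clean_acc_b:.2f}, adversarial={adv_acc_b:.2f}")
```

In this toy setting both models end up relying on the same features, so inputs perturbed against model A also degrade model B, even though the attacker never touched it. Real DNNs are far messier, but the sketch shows the mechanism.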
As of now, we do not really have good solutions to any of these problems. In the meantime, ML-controlled systems are entering the human realm.
In this Three Paper Thursday I want to talk about works from the field of adversarial ML that make the field much more understandable.