Robust Ensemble Method

Title

Ensemble Methods for Robust Image Classification

Members

Rob Bagott and Eric Ewing

Motivation

The goal of this project is to improve the robustness of learned models. Adversarial attacks hinder the deployment of deep learning methods because small, deliberately crafted input perturbations can cause incorrect predictions. In the standard non-adversarial setting, ensembles of models (even weak predictors) can outperform a single classifier. Our goal is to achieve a similar gain for robustness rather than accuracy. We will work on image classification tasks and evaluate with standard measures of adversarial robustness alongside natural accuracy.

Related work:

[1] shows that naive ensembles of weak defenses are not strong: in their experiments, combining the predictions of models that were not meant to be ensembled did not improve robustness. However, the ensembling was implemented naively, and the models were trained without knowledge that they would be used in an ensemble. Their result, that ensembles do not trivially add robustness, therefore does not necessarily hold in general, and there is still hope in combining models.

[2] shows that ensembles can be trained against adversarial examples in such a way that attacks do not transfer between ensemble members. This is achieved with a method they call diversity training, which adds a regularizer that penalizes similar loss gradients between models in the ensemble. Combining this method with different weighting strategies could prove more effective.

[3] shows how attacks can be generated against multiple classifiers in an ensemble through fictitious play between an agent that develops adversarial examples and an agent that randomizes its predictions over the outputs of multiple classifiers. We intend to use a similar approach when attacking and training our ensembles.

[1] Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning, pages 1310–1320. PMLR, 2019.

[2] Sanjay Kariyappa and Moinuddin K. Qureshi. Improving adversarial robustness of ensembles with diversity training.

[3] Juan C. Perdomo and Yaron Singer. Robust attacks against multiple classifiers.

Data

We will use MNIST or CIFAR-10, the standard datasets for robust ML. Robust ML is concerned with robustness to adversaries rather than with how hard the model was to train in the first place, so these small benchmarks suffice.

Models

The primary goal is to find an ensembling method, either a weight for each model or a more expressive attention-like mechanism, that combines the predictions of several CNN models. We may also add ensemble-aware training to train new models that fill in the gaps of the current ensemble. This might be done by progressively training new models whose loss gradients are aligned differently from those of the existing members.
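
As a rough illustration of the simpler option, one weight per model, the sketch below combines per-model class probabilities using softmax-normalized weights. It is only a minimal sketch under stated assumptions: the function names `softmax` and `ensemble_predict`, the softmax parameterization of the weights, and the probability vectors are all hypothetical, and the proposal does not commit to this particular form.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(member_probs, weight_logits):
    """Combine per-model class probabilities into one distribution.

    member_probs: one probability vector per model (stand-ins for CNN outputs).
    weight_logits: one unconstrained score per model (an assumed
    parameterization); softmax keeps the weights positive and summing to 1.
    """
    weights = softmax(weight_logits)
    num_classes = len(member_probs[0])
    combined = [0.0] * num_classes
    for w, probs in zip(weights, member_probs):
        for c in range(num_classes):
            combined[c] += w * probs[c]
    return combined


def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up outputs of three models on one image with three classes.
member_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]

# Equal weight logits reduce to plain averaging of the three models.
uniform = ensemble_predict(member_probs, [0.0, 0.0, 0.0])

# A large logit for the second model lets it dominate the vote.
skewed = ensemble_predict(member_probs, [0.0, 5.0, 0.0])
```

With equal weights the two models that favor class 0 win the vote; raising the second model's weight flips the decision to class 1. In a real system the weight logits would be fit to data rather than set by hand, and an attention-like mechanism would make them depend on the input.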

Metrics

Robust ML has a natural set of metrics: natural accuracy on clean inputs and robust accuracy under attack. We will test against the standard set of white-box and black-box attacks and compare to existing algorithms. As a baseline expectation, combining models should do better than the individual models, and adding a new model should not make the ensemble worse.
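
To make the two headline numbers concrete, the sketch below computes natural accuracy and robust accuracy under the fast gradient sign method (FGSM), a common white-box attack. Everything in it is an illustrative assumption: a tiny linear classifier stands in for the ensemble, the data points are made up, and the pixel-range clipping used for real images is omitted.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(w, b, x):
    """Binary linear classifier with labels in {-1, +1}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def fgsm(w, x, y, eps):
    """One FGSM step of size eps in the L-infinity norm.

    For the logistic loss on a linear model, the gradient with respect to x
    is -y * sigmoid(-y * score) * w, so its sign is -y * sign(w).
    """
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def accuracy(w, b, xs, ys):
    """Fraction of inputs classified correctly."""
    correct = sum(predict(w, b, x) == y for x, y in zip(xs, ys))
    return correct / len(ys)


def robust_accuracy(w, b, xs, ys, eps):
    """Accuracy after each input has been perturbed by FGSM."""
    adversarial = [fgsm(w, x, y, eps) for x, y in zip(xs, ys)]
    return accuracy(w, b, adversarial, ys)


# Hypothetical model and data: two points far from the decision boundary
# and two points close to it.
w, b = [1.0, -1.0], 0.0
xs = [[2.0, 0.0], [0.3, 0.0], [0.0, 2.0], [0.0, 0.2]]
ys = [1, 1, -1, -1]

natural = accuracy(w, b, xs, ys)
robust = robust_accuracy(w, b, xs, ys, eps=0.5)
```

On this toy data every point is classified correctly without an attack, while the two points near the decision boundary are flipped at eps = 0.5. The gap between the two numbers is what the ensemble is meant to narrow.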

Ethics

Robustness is central to trust in deep learning systems. If adversarial attacks are effective against learned models, those models cannot be deployed reliably without risk of incorrect predictions or exploitation. As long as our models are not robust, they cannot be used for safety-critical or high-consequence operations. Beyond protection from adversaries, robustness measures also indicate how well our models tolerate natural noise. Improvements in this area may allow neural networks to behave in ways we perceive as more human or natural.