Adverserial Attacks on Neural Networks

Tyler Forman posted an update — Apr 23, 2023 11:18 PM EDT

Title Adversarial Attacks on Neural Networks

Who Tyler Forman Luke Choi John Zhou Anthony Kang

Introduction The objective of Goodfellow et al. (2015) was to demonstrate the ease of creating adversarial examples that can cause a neural network to output an incorrect result with high confidence. The researchers explain that the weakness of deep learning models stems from their linear nature, using quantitative data on the generalizability of these weaknesses as evidence. We chose this paper because it presents an interesting problem in the field of deep learning with potential consequences including the ability to trick a facial recognition system using specially designed “adversarial glasses.” The neural networks we will be implementing and evaluating will be solving a classification problem.

When the neural networks are of high dimension it is very easy for a perturbation to greatly change the classification as even a small change can move the data from the small cluster that most of the data resides in the high dimensional space.

Related Work Szegedy et al. (2014) demonstrates that the output of a neural network depends less on the structures in the network, but rather has more to do with the output space of the vector operations performed by the model. From this observation, they show that the input-output mapping created by a model is often discontinuous, meaning small perturbations in the input can result in a drastically different output. Furthermore, these adversarial examples are generally consistent between models trained with different datasets. This means that adversarial examples to neural networks are intrinsic to the way in which these models learn rather than being caused simply due to random chance.

Data MNIST ImageNet

Methodology Use generic classifiers, simple CNN with softmax and hidden layers Have a “noise function” that we “increase” to find the limits of the model Show that our perturbations generalize

Metrics Change in accuracy, minimum level of perturbation Generalization of perturbations across different models

Base: Non generalized perturbations

Target: Generalized perturbations

Stretch: Defense against perturbation that is not just training with perturbed data (ex if the data moves outside the general “cluster” then lose confidence)

Ethics Important to have more accurate image interpretation but also raises moral issues - facial recognition will be improved. ImageNet likely has bias because its annotations are crowdsourced, so the annotations will reflect the bias of the annotators.

Log in or sign up for Devpost to join the conversation.