Adverserial Attacks on Neural Networks

Luke Choi started this project — May 12, 2023 05:40 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.

Tyler Forman posted an update — May 08, 2023 06:13 AM EDT

Introduction Goodfellow et al. (2015) found that neural networks are vulnerable to misclassifying adversarial examples, which are inputs with small, intentional perturbations that cause the model to output an incorrect answer with high confidence. They also found that these adversarial attacks can be reused on other models.They argued that the primary cause of this vulnerability is neural networks' linear nature, which is supported by new quantitative results. Challenges The main challenges have come from implementing and quantifying our noise. This is the most complicated part of our project. We have also run into difficulties dealing with the tradeoff between making a good model for CIFAR-10 and making one that we can run within a reasonable timeframe that allows for testing and adjustments. Insights Our model is performing about as we have expected. We have been able to make a custom CIFAR-10 CNN that is performing reasonably well (70%) and we have confidence that given more time to train and adjust it we can get higher. The baseline for human performance on this dataset is 94% so anything above it would be “superhuman.” Our MNIST model is performing well, but we did not expect to have trouble as it is an “easy” dataset to have a model work well with. Our noise function is performing well. We have gotten decent results with MNIST, with accuracy dropping down to 40%. We are still mostly able to make out what the numbers should be, and we expect this to fare better with CIFAR-10 as since MNIST is largely black and white it is hard to make imperceptible perturbations.

Plan We will train our CIFAR-10 model more extensively and upgrade it by adding more layers and taking advantage of increased training time with more dropout layers. We will try to also use another CIFAR-10 model to show that we can generalize our noise, if time is permitting.

Log in or sign up for Devpost to join the conversation.

Tyler Forman posted an update — Apr 23, 2023 11:18 PM EDT

Title Adversarial Attacks on Neural Networks

Who Tyler Forman Luke Choi John Zhou Anthony Kang

Introduction The objective of Goodfellow et al. (2015) was to demonstrate the ease of creating adversarial examples that can cause a neural network to output an incorrect result with high confidence. The researchers explain that the weakness of deep learning models stems from their linear nature, using quantitative data on the generalizability of these weaknesses as evidence. We chose this paper because it presents an interesting problem in the field of deep learning with potential consequences including the ability to trick a facial recognition system using specially designed “adversarial glasses.” The neural networks we will be implementing and evaluating will be solving a classification problem.

When the neural networks are of high dimension it is very easy for a perturbation to greatly change the classification as even a small change can move the data from the small cluster that most of the data resides in the high dimensional space.

Related Work Szegedy et al. (2014) demonstrates that the output of a neural network depends less on the structures in the network, but rather has more to do with the output space of the vector operations performed by the model. From this observation, they show that the input-output mapping created by a model is often discontinuous, meaning small perturbations in the input can result in a drastically different output. Furthermore, these adversarial examples are generally consistent between models trained with different datasets. This means that adversarial examples to neural networks are intrinsic to the way in which these models learn rather than being caused simply due to random chance.

Data MNIST ImageNet

Methodology Use generic classifiers, simple CNN with softmax and hidden layers Have a “noise function” that we “increase” to find the limits of the model Show that our perturbations generalize

Metrics Change in accuracy, minimum level of perturbation Generalization of perturbations across different models

Base: Non generalized perturbations

Target: Generalized perturbations

Stretch: Defense against perturbation that is not just training with perturbed data (ex if the data moves outside the general “cluster” then lose confidence)

Ethics Important to have more accurate image interpretation but also raises moral issues - facial recognition will be improved. ImageNet likely has bias because its annotations are crowdsourced, so the annotations will reflect the bias of the annotators.

Log in or sign up for Devpost to join the conversation.

Adverserial Attacks on Neural Networks

Built With

Updates