Adversarial Attacks on Neural Networks

Tyler Forman posted an update — May 08, 2023 06:13 AM EDT

Introduction Goodfellow et al. (2015) found that neural networks are vulnerable to adversarial examples: inputs with small, intentional perturbations that cause the model to output an incorrect answer with high confidence. They also found that the same adversarial examples are often misclassified by other models. They argued that the primary cause of this vulnerability is the linear nature of neural networks, an explanation they support with new quantitative results.

Challenges The main challenges have come from implementing and quantifying our noise, which is the most complicated part of our project. We have also had difficulty with the tradeoff between building an accurate CIFAR-10 model and building one that runs quickly enough to allow for testing and adjustments.

Insights Our models are performing about as we expected. Our custom CIFAR-10 CNN performs reasonably well (70% accuracy), and we are confident that with more time to train and adjust it we can get higher. The baseline for human performance on this dataset is 94%, so anything above that would be “superhuman.” Our MNIST model is performing well, which we expected, since MNIST is an “easy” dataset for a model to do well on. Our noise function is also performing well. We have gotten decent results on MNIST, with accuracy dropping to 40% while we are still mostly able to make out what the digits should be. We expect the noise to fare better on CIFAR-10, because MNIST images are largely black and white, which makes it hard to create imperceptible perturbations.
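This update does not say how our noise is generated, so the following is only a minimal sketch of one common approach: the fast gradient sign method (FGSM) from Goodfellow et al. (2015), which moves every pixel by a fixed step eps in the direction that increases the loss. To keep it dependency-free it uses a toy logistic-regression classifier rather than our CNNs; the names fgsm_logistic and predict, the weights, and the eps value are all hypothetical and chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def predict(x, w, b):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm_logistic(x, y, w, b, eps):
    """Return an FGSM-perturbed copy of x.

    x: pixel values in [0, 1]; y: true label (0 or 1);
    w, b: weights and bias of the toy model (assumed, not our trained CNN).
    """
    p = predict(x, w, b)
    # For cross-entropy loss, the gradient with respect to the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    # Step each pixel by eps along the sign of the gradient, then clip to [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Illustrative values only.
w, b = [2.0, -3.0, 1.0, -1.0], 0.0
x, y = [0.7, 0.4, 0.5, 0.5], 1
x_adv = fgsm_logistic(x, y, w, b, eps=0.1)
print("clean probability of class 1:      ", predict(x, w, b))
print("adversarial probability of class 1:", predict(x_adv, w, b))
```

No pixel moves by more than eps, yet the logit shifts by eps times the sum of the absolute weights (when no pixel is clipped). That is the linearity argument in miniature: many small per-pixel changes add up to a large change in the output.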

Plan We will train our CIFAR-10 model more extensively and upgrade it by adding more layers, using more dropout layers to take advantage of the increased training time. If time permits, we will also try our noise against a second CIFAR-10 model to show that it generalizes.
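One way the generalization check could be measured, sketched under assumptions: craft the perturbed images against the first model, then compare a second model's accuracy on the clean and perturbed versions of the same images. The helper names accuracy and transfer_report are hypothetical, and a model here is any function that maps an input to a predicted label.

```python
def accuracy(predict_label, examples):
    """Fraction of (x, y) pairs that predict_label classifies correctly."""
    correct = sum(1 for x, y in examples if predict_label(x) == y)
    return correct / len(examples)

def transfer_report(second_model, clean, adversarial):
    """Compare a second model's accuracy on clean and perturbed inputs.

    clean, adversarial: lists of (x, y) pairs; the adversarial inputs are
    assumed to have been crafted against a different model.
    """
    clean_acc = accuracy(second_model, clean)
    adv_acc = accuracy(second_model, adversarial)
    return {"clean": clean_acc, "adversarial": adv_acc, "drop": clean_acc - adv_acc}
```

A large drop on the second model would suggest the noise transfers; a drop near zero would suggest it is specific to the model it was crafted against.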
