
Check-in 2

Introduction

In our project, we tackle image classification using the Residual Network (ResNet), first introduced in a 2015 paper by a Microsoft Research team. The paper addresses the degradation problem in deep convolutional networks: as depth increases, accuracy saturates and then degrades rapidly. The researchers' solution is residual learning, which uses "skip connections" (identity mappings) that add the output of an earlier layer to a layer further ahead. We will apply residual learning to our self-designed convolutional network. We chose this paper because we were fascinated by the CNN assignment on two-class image classification, and we want to design a model that recognizes a broader pool of objects and performs better. Accomplishing both goals calls for a deeper network with residual learning.
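To illustrate the core idea (a toy sketch, not the paper's exact convolutional block), a residual connection simply adds the block's input back to its transformed output, y = F(x) + x, so the block can fall back to an identity mapping:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy fully-connected residual block: y = ReLU(F(x) + x).

    F(x) = W2 @ ReLU(W1 @ x); the skip connection adds x back in, so
    if F learns to output zeros the block reduces to the identity map.
    """
    fx = w2 @ relu(w1 @ x)
    return relu(fx + x)

# With zero weights, F(x) = 0 and the block passes x through unchanged.
x = np.array([1.0, 2.0, 3.0])
w = np.zeros((3, 3))
assert np.allclose(residual_block(x, w, w), x)
```

This identity fallback is why adding residual blocks should not make a deeper network worse than its shallower counterpart.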

Challenges

We have implemented a plain neural network with 18 convolution layers. One challenge we face is how to train on a large dataset with such a deep network while using the GPU efficiently. In lecture, the instructor briefly covered using tf.data.Dataset to fetch images, split them, and train in batches. In the particular setting of the CIFAR dataset, however, we also need to fetch the corresponding labels. tf.data.Dataset has nondeterministic behavior, so we do not yet know how to guarantee a one-to-one correspondence between images and their labels. If we make tf.data.Dataset deterministic, we cannot shuffle the dataset between epochs, and so we lose the benefit of shuffling's randomness. In short, although we could load all the data into memory as we did in our assignment, we are now working out how to fetch and train on one batch of data at a time.
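As a sanity check on the pairing logic (a NumPy sketch, independent of TensorFlow), shuffling a single index array and applying it to both arrays keeps every image aligned with its label while still reshuffling each epoch:

```python
import numpy as np

def batches(images, labels, batch_size, rng):
    """Yield (image_batch, label_batch) pairs in a fresh random order.

    One shared permutation indexes both arrays, so the one-to-one
    correspondence between images and labels is preserved.
    """
    idx = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        sel = idx[start:start + batch_size]
        yield images[sel], labels[sel]

# Toy data: "image" i is filled with the value of its label i.
images = np.arange(8).reshape(8, 1, 1, 1) * np.ones((8, 2, 2, 3))
labels = np.arange(8)

rng = np.random.default_rng(0)
for x, y in batches(images, labels, batch_size=4, rng=rng):
    # Every image still matches its label after shuffling.
    assert np.all(x[:, 0, 0, 0] == y)
```

The same principle applies in TensorFlow: building a dataset from the (images, labels) pair before shuffling shuffles both together, so the pairing survives.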

Insights

We implemented a plain 16-layer CNN, an architecture also known as VGG-16. The model stacks convolutions with small 3x3 filters, each stage followed by max pooling that downsamples the feature maps by a factor of 2. Plain-16 will serve as the baseline for our residual network: we will add residual learning on top of it and measure the expected boost in performance. Although our code currently raises no errors or dimensionality issues, it does not work properly: the final accuracy is only 10%, which is chance level for a 10-class dataset and tells us there are bugs to be fixed.
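As a quick check of the downsampling arithmetic (a sketch assuming 'same'-padded 3x3 convolutions and 2x2 max pooling with stride 2), the spatial size halves at each pooling stage:

```python
def spatial_sizes(input_size, num_stages):
    """Track feature-map side length through conv + pool stages.

    A 'same'-padded 3x3 convolution keeps the spatial size; a 2x2 max
    pool with stride 2 halves it (integer division).
    """
    sizes = [input_size]
    for _ in range(num_stages):
        size = sizes[-1]         # 3x3 conv, padding='same': unchanged
        sizes.append(size // 2)  # 2x2 max pool, stride 2: halved
    return sizes

# CIFAR images are 32x32; five pooling stages shrink them to 1x1.
print(spatial_sizes(32, 5))  # [32, 16, 8, 4, 2, 1]
```

Tracking these sizes is a cheap way to catch dimensionality mistakes before they surface as mysterious accuracy problems.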

Plan

Although we are experiencing some difficulties implementing deep CNNs, we believe we are on track with our project. For now, we should dedicate more time to our Plain-16 network to get it working with reasonable accuracy. The next step is to add residual learning to Plain-16 and consider ways to boost the model's performance. We are also considering minor changes to the number of layers in our network.
