Reflection
Introduction
In this project, we tackle a classic deep learning problem: image classification. Deeper and larger network architectures keep being proposed and keep achieving better classification performance, but the growth in parameters and layers has worsened problems such as vanishing gradients and high computation cost. Inspired by the human spinal cord, H M Dipu Kabir et al. designed a new linear layer as a substitute for the fully connected layers of CNNs. With this layer, a CNN can achieve state-of-the-art performance with fewer parameters, which in turn reduces the computation overhead. In this project, we reimplement the network architecture described in the paper, SpinalNet: Deep Neural Network with Gradual Input, and test our implementation on datasets such as ImageNet and Google Quick Draw.
We chose this paper for two reasons. First, the authors address an emerging and significant problem in deep learning: high computation overhead. People tend to build deeper networks with more parameters to achieve better performance, which consumes large amounts of energy and hardware resources. This paper, however, demonstrates that with a carefully designed architecture and a reasonable number of parameters, performance can still exceed the state of the art in most cases. Second, the design of this network is inspired by how the human nervous system works, which we think is an interesting topic and a promising direction for designing neural networks.
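To make the "fewer parameters" claim concrete, the sketch below counts the weights and biases of a conventional fully connected head and of a spinal-style head. The spinal layout used here is an assumption based on our reading of the paper: the feature vector is split into two halves, each small layer receives one half plus the previous layer's output, and the output layer sees all layer outputs concatenated. The function names and the sizes in the usage note are hypothetical and are not taken from our experiments.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_head_params(n_features, hidden, n_classes):
    """Conventional head: features -> hidden -> hidden -> classes."""
    return (dense_params(n_features, hidden)
            + dense_params(hidden, hidden)
            + dense_params(hidden, n_classes))


def spinal_head_params(n_features, width, n_classes, n_layers=4):
    """Spinal-style head (assumed layout, n_features taken to be even)."""
    half = n_features // 2
    # The first layer only sees half of the input features.
    total = dense_params(half, width)
    # Every later layer sees half of the input plus the previous layer's output.
    for _ in range(n_layers - 1):
        total += dense_params(half + width, width)
    # The output layer sees the concatenated outputs of all spinal layers.
    total += dense_params(n_layers * width, n_classes)
    return total


if __name__ == "__main__":
    print("fully connected head:", fc_head_params(512, 512, 24))
    print("spinal-style head:   ", spinal_head_params(512, 128, 24))
```

With 512 input features, 24 classes, a hidden size of 512 for the conventional head, and a layer width of 128 for the spinal head (all illustrative values), the spinal-style head has roughly a third of the parameters.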
Challenges
After we finished building the network based on the architecture provided in the paper, we tested it on the MNIST dataset. Although we used the same hyperparameters as the paper, including the learning rate and the CNN's kernel size and stride length, the network's loss did not decrease. We think this problem is caused by the different ways PyTorch and TensorFlow initialize layer parameters. After tuning the learning rate and trying different scales and distributions for the parameter initialization, we were finally able to train SpinalNet on MNIST. Once we had verified the architecture, we switched the dataset to Google Quick Draw. However, this dataset was too large to train on in a short period of time. To get a preliminary result, we picked 24 classes from the dataset and processed the data again. This allowed us to finish the training in an hour.
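The sketch below illustrates how far apart the two frameworks' defaults can be. It compares the default weight range of a PyTorch `nn.Linear` layer, which is uniform in ±1/sqrt(fan_in), with the default of a Keras `Dense` layer, which uses `glorot_uniform` with limit sqrt(6 / (fan_in + fan_out)). The layer sizes are hypothetical, and the comparison only shows that the defaults differ; it does not prove that initialization was the cause of our training problem.

```python
import math


def pytorch_linear_bound(fan_in):
    """Default weight range of torch.nn.Linear: U(-b, b) with b = 1/sqrt(fan_in)."""
    return 1.0 / math.sqrt(fan_in)


def keras_dense_bound(fan_in, fan_out):
    """Default weight range of Keras Dense (glorot_uniform): U(-b, b)
    with b = sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


if __name__ == "__main__":
    # Hypothetical layer shapes, chosen only for illustration.
    for fan_in, fan_out in [(256, 128), (384, 128), (512, 24)]:
        pt = pytorch_linear_bound(fan_in)
        ks = keras_dense_bound(fan_in, fan_out)
        print(f"{fan_in:4d} -> {fan_out:3d}: PyTorch ±{pt:.4f}, "
              f"Keras ±{ks:.4f}, ratio {ks / pt:.2f}")
```

The biases also differ by default: PyTorch draws them from the same ±1/sqrt(fan_in) range, while Keras starts them at zero. When porting a model between the frameworks, setting the initializers explicitly removes this source of variation.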
Insights
We have completed the implementation of the VGG and SpinalVGG models and run both on the Google Quick Draw dataset. SpinalVGG currently reaches an accuracy of 89.94%, which is only slightly higher than the model without SpinalNet. Although the results are on the right track, we expected SpinalNet to bring a larger improvement in accuracy. The authors of the paper indicate that models with spinal layers show more of an advantage as they are trained for more epochs. Therefore, we plan to train for more epochs and see how the results change.
Plan
We are on track with our project: we have finished the implementation of VGG and SpinalVGG and run brief experiments to make sure they work as expected. In the remaining time this semester, we will spend more time figuring out the transfer learning approach. The authors substituted the final feed-forward layer of pre-trained models from torchvision (available for PyTorch), and we found that this is not as simple with the Keras pre-trained models for TensorFlow. We are not currently considering any changes to our plan, as we feel we are on the right track toward our end goal.