Title

Traffic Sign CNN Classifier

Who

Sebastian Martino

Code

https://github.com/SebastianMartino/Traffic-Sign-CNN-Classifier

Introduction

The problem I am trying to solve with this project is the classification of road signs using CNNs. The research paper I am reimplementing can be found here. The objectives of that paper included both the simultaneous detection and classification of traffic signs; however, this project limits its scope to classification only.

Related Work

There are a number of other works beyond the research paper this project is based on that have attempted to solve this problem. One such example can be found in this article here. It walks through building a CNN for the detection and classification of traffic signs with TensorFlow and Keras on the GTSRB dataset from Kaggle. I also intend to use TensorFlow and Keras for this project, so this related work may serve as a useful reference for my own implementation; however, I intend to use a different dataset, as described in the next section.

Other related works: Building a Road Sign Classifier; Traffic Sign Classification

Data

I intend to use the Chinese Traffic Sign Database, which contains 6,164 traffic sign images across 58 sign categories, split into 4,170 labeled training images and 1,994 testing images. I may switch to other datasets as I work on this project, for example a US traffic sign dataset, but for now this dataset makes sense to use due to its free access and pre-labeled images.

Update: Pivoted to the German Traffic Sign Recognition Benchmark (GTSRB) dataset which consists of 30,000+ labeled training images and 10,000+ testing images across 43 classes of road signs.
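Whatever the dataset, the images have to be scaled and the integer class labels one-hot encoded before training. A minimal NumPy sketch of that step (the `preprocess` name, the 32x32 size, and the toy batch are my own illustrative assumptions, not a prescribed pipeline):

```python
import numpy as np

def preprocess(images, labels, num_classes=43):
    """Scale uint8 pixel values to [0, 1] and one-hot encode integer labels."""
    x = images.astype(np.float32) / 255.0
    y = np.eye(num_classes, dtype=np.float32)[labels]
    return x, y

# e.g. a toy batch of four 32x32 RGB images with integer class labels
imgs = np.zeros((4, 32, 32, 3), dtype=np.uint8)
lbls = np.array([0, 5, 42, 7])
x, y = preprocess(imgs, lbls)
```

In practice the varying GTSRB image sizes also need resizing to one fixed shape before this step.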

Methodology

The research paper I am reimplementing describes a network of six convolutional layers that branches after the sixth layer into three separate output streams: a bounding-box layer, a pixel layer, and a label layer. I intend to take a similar but simpler approach: I will likely reduce the number of convolutional layers, and I should not need the branching output streams since I am attempting classification only rather than simultaneous detection and classification.

Update: I diverged from the architecture outlined in the research paper; I was able to achieve very high accuracy following the modified LeNet architecture described in this blogpost (with some minor tweaks of my own), which consists of:

1st Convolution layer, 32 filters, kernel size of 1x1, relu activation

2nd Convolution layer, 32 filters, kernel size of 5x5, relu activation

Max pooling, pool size of 2x2

3rd Convolution layer, 32 filters, kernel size of 5x5, relu activation

Max pooling, pool size of 2x2

Flatten layer

1st Fully connected layer, output size of hidden_dim1 (1024), relu activation

Dropout layer, dropout rate of 0.3

2nd Fully connected layer, output size of hidden_dim2 (512), relu activation

Dropout layer, dropout rate of 0.3

3rd and final Fully connected layer, output size of num_classes (43), softmax activation
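The layer stack above can be expressed as a Keras Sequential model roughly like this (a sketch, not the project's exact code; the 32x32 RGB input shape and the `build_model` name and defaults are my own assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes=43, hidden_dim1=1024, hidden_dim2=512, dropout=0.3):
    """Modified-LeNet classifier following the layer list above."""
    return models.Sequential([
        layers.Input(shape=(32, 32, 3)),            # assumed input size
        layers.Conv2D(32, (1, 1), activation="relu"),
        layers.Conv2D(32, (5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(32, (5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(hidden_dim1, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(hidden_dim2, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
```

With 32x32 inputs, the two 5x5 valid convolutions and two 2x2 poolings leave a 5x5x32 feature map (800 values) going into the flatten layer.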

Metrics

In the research paper I am reimplementing, the authors achieved 88% accuracy for simultaneous detection and classification. My goals for this project are a baseline classification accuracy of 65%, a target accuracy of 70%, and a stretch goal accuracy of 85%.
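The accuracy figures here are top-1 classification accuracy: the fraction of test images whose highest-probability predicted class matches the label. A small sketch of how that is computed from softmax outputs (the function name and toy data are illustrative):

```python
import numpy as np

def top1_accuracy(probs, labels):
    """Fraction of samples whose argmax prediction matches the true label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# toy softmax outputs for three samples over two classes
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
acc = top1_accuracy(probs, labels)  # two of three predictions are correct
```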

Ethics

Why is Deep Learning a good approach to this problem?: With the advent of convolutional neural networks, we have made huge strides in deep learning with image classification. The problem I am approaching in this project, which is purely classification of traffic signs, is therefore a perfect candidate for a deep learning solution with CNNs.

Who are the major stakeholders in this problem? What are the consequences of mistakes made by your algorithm?: An obvious stakeholder for this problem is the autonomous vehicle industry. For a self-driving car to operate on the roads, it will certainly need to recognize and interpret road signs in order to follow traffic rules and operate safely alongside other vehicles. Because this algorithm is partially responsible for the lawful and safe operation of these autonomous vehicles, mistakes made by it may have deadly consequences. Imagine if this model, used in a self-driving car, misinterprets a speed limit sign in a residential neighborhood to read 65 mph instead of 25. This would not only put the vehicle's passengers at risk by driving at unsafe speeds but also endanger other vehicles and pedestrians in the area. For any practical deep learning model to be put to use in an actual self-driving car, it is critical that the classifier's accuracy be nearly 100% and that other safeguards and redundancies reduce the chance of these dangerous mistakes.

Division of Labor

This is a solo project, so I am responsible for every part.

Project Check-in 2 Reflection

https://docs.google.com/document/d/1aEHG3LahXC7Ku-fQdRLr3ng8m6oCknI65Bklm-qSuJA/edit?usp=sharing

Final Project Reflection

SEE UPDATE BELOW

Updates


Introduction:

The problem I am trying to solve with this project is the classification of road signs by leveraging the power of CNNs, something with particularly useful applications in the field of autonomous vehicles. The research paper I am reimplementing can be found here. The objectives of that paper included both the simultaneous detection and classification of traffic signs; however, this project limits its scope to classification only.

Methodology:

My final implementation followed the modified LeNet architecture described in this blogpost (with some minor tweaks of my own), which consists of:

1st Convolution layer, 32 filters, kernel size of 1x1, relu activation

2nd Convolution layer, 32 filters, kernel size of 5x5, relu activation

Max pooling, pool size of 2x2

3rd Convolution layer, 32 filters, kernel size of 5x5, relu activation

Max pooling, pool size of 2x2

Flatten layer

1st Fully connected layer, output size of hidden_dim1 (1024), relu activation

Dropout layer, dropout rate of 0.6

2nd Fully connected layer, output size of hidden_dim2 (512), relu activation

Dropout layer, dropout rate of 0.6

3rd and final Fully connected layer, output size of num_classes (43), softmax activation

Results

My model was able to achieve ~95% accuracy on the testing data after 10 epochs; as mentioned in the research paper, others have been able to achieve over 99% accuracy using the same German Traffic Sign dataset.

Challenges

The only major roadblocks I encountered while working on this project were related to finding and working with road sign datasets. Initially I wanted to work with the same dataset used in the original research paper, the Chinese Traffic Sign Database, but found it a bit too cumbersome to preprocess. The research paper noted similar projects using the German Traffic Sign Recognition Benchmark dataset that achieved very high accuracy in both detection and classification (nearly 100%), and I was also able to find other papers and blogposts using this dataset, so after my second checkpoint I decided to pivot to it instead. As mentioned in my second reflection, I had planned to eventually find a U.S. road sign dataset to use; however, I had no luck finding a free dataset that wouldn't require significantly more preprocessing and manual labeling.

Reflection

I believe my model met and exceeded my initial expectations. I think I was a bit too conservative with my base, target, and stretch goals (65%, 70%, and 85% respectively), as I based my expectations on the ~88% accuracy the research paper achieved on a different, more complex dataset while also performing simultaneous detection and classification. The research paper mentions projects using the German dataset I eventually transitioned to that achieved more than 99% accuracy.

I was initially concerned that my model was overfitting given such high accuracy, but I took steps to avoid this, adding multiple dropout layers and reducing the learning rate, and I also saw that the training and testing accuracy were reasonably correlated (i.e., I didn't have high training accuracy and low testing accuracy; both grew at similar rates with each epoch). This, along with the fact that others have achieved nearly 100% accuracy using the same data, made me think that the 95% accuracy I achieved was not unreasonable.

My approach did not change significantly over time, aside from the previously mentioned pivot in dataset selection. One change I did make was to follow a different architecture from the one described in the original research paper, instead following the modified LeNet architecture described in this blogpost. Given that others using this same dataset achieved near-perfect classification accuracy, with more time I would modify my model further to push my accuracy closer to 99%; however, I am more than happy with the 95% accuracy my implementation achieved.

Working on this project, I gained a deeper understanding of CNNs and their practical applications in image identification and recognition. I also discovered a number of very useful TensorFlow APIs that greatly streamlined my implementation, namely Sequential, compile, and fit, as well as TensorBoard for visualizing loss and accuracy.
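As a concrete illustration of those APIs, here is a minimal sketch of the compile/fit/TensorBoard workflow (the tiny placeholder model, learning rate, and log directory are illustrative assumptions, not the project's actual settings):

```python
import tensorflow as tf

# Tiny placeholder model; the real project used the modified LeNet above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(43, activation="softmax"),
])

# compile() wires up the optimizer, loss, and tracked metrics.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# The TensorBoard callback logs loss/accuracy curves for visualization.
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs")

# fit() would then run training, e.g.:
# model.fit(x_train, y_train, epochs=10,
#           validation_data=(x_test, y_test), callbacks=[tensorboard_cb])
```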
