Title: The Detection & Classification of Traffic Signs using Convolutional Neural Networks
Who: Sahil Bansal (sbansa12), Shreyas Raman (ssunda11), Emily Zhang (ezhang52)
Final Writeup: https://docs.google.com/document/d/14BhSDrJvWzFaDc4MqUGe8yBz8A7wVeeDw5J0qN1dND0/edit?usp=sharing
Introduction:
The paper we are implementing detects and classifies traffic signs using multi-layered CNNs, particularly in noisy images (e.g. under different lighting or weather conditions). It extends the traditional application of CNNs in image processing by jointly detecting and classifying several traffic signs (based on shape, color and structure). The model stands out by accepting input images in which the target object occupies a small (approx. 80x80 pixel) region of a large (2000x2000 pixel) image. Whilst previous object-detection models (e.g. those using Support Vector Machines) have been effective at detecting targets that occupy a large portion of the image, the paper reports that its multi-layer CNN far outperforms them on this task.
We selected this paper because it lets us expand our understanding of image classification with CNNs through a branched 3-stream architecture for pixel classification, bounding boxes (detection) and labels (classification). Object detection, as opposed to classification alone, goes beyond the content covered in the course, so we were interested in exploring how it can be achieved with a CNN architecture and how it affects model performance compared to the standard multi-layer CNNs covered in lecture. Image detection and recognition can be used in several real-life scenarios, e.g. autonomous vehicles recognizing traffic signs while driving, or smart glasses and cameras that auto-focus using object detection.
Related Work:
We were not aware of any specific examples of traffic sign detection and classification algorithms prior to starting this project. Traffic sign detection is a widely covered subject in the field of autonomous vehicles. This article (https://phys.org/news/2019-05-traffic-recognition-influential-decade.html) touches upon traffic sign recognition in this field.
The article mentions that traffic signs are much simpler to categorize than other, more complex objects, as traffic signs all have a simple, relatively standard set of colors, shapes, and symbols. Another interesting point is that an autonomous vehicle must often rely on “real-time feeds” of what the camera can see. The article essentially describes how traffic sign detection is becoming less of a convenience and more of a necessity as autonomous vehicles become more popular.
Public Implementations:
- https://cg.cs.tsinghua.edu.cn/traffic-sign/ (Source Code of Paper, Written in Caffe)
- https://lijiancheng0614.github.io/2019/04/16/2019_04_16_TT100K/#architecture (Code of Model in Paper)
- https://github.com/asyncbridge/tsinghua-tencent-100k (Code of Model, Uses Caffe)
- https://github.com/JunshengFu/traffic-sign-recognition (Traffic Sign Recognition Model, Not the Paper’s Model)
- https://github.com/jacobssy/Traffic_Sign_detection (Traffic Sign Detection Model, Not the Paper’s Model)
- https://github.com/vamsiramakrishnan/TrafficSignRecognition (Traffic Sign Recognition Model, Not the Paper’s Model)
Data: We have found the following datasets that can be used:
https://www.kaggle.com/valentynsichkar/traffic-signs-preprocessed
A little less than 87000 examples in the training dataset, 43 classes. Has 9 pickle files for the train, validation, and testing datasets. Files 0-3 have RGB images, files 4-9 are greyscale. All preprocessing has been done.

https://www.kaggle.com/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign
The German Traffic Sign Recognition Benchmark is a standardised dataset that has over 50000 images with 43 classes. It has been preprocessed to include classID, shapeID, colorID, and sign ID.

https://sid.erda.dk/public/archives/ff17dc924eba88d5d01a807357d6614c/published-archive.html
German Traffic Sign Detection Benchmark with 600 training images, 300 testing images, and ground truth for both the train and test sets (ground truth for the test set is placed in a separate file).

https://cg.cs.tsinghua.edu.cn/traffic-sign/
Original dataset for the paper. Researchers annotated the images by hand, recording the bounding box, boundary vertices, and class label for each sign. Has 100000 images.

We may need to weed out classes with too few instances. For example, the paper states that classes with fewer than 100 instances were removed, while classes with fewer than 1000 instances were augmented (by randomly rotating the “standard template” for a class) to give them more than 1000. We may also need to add random images without street signs as additional noise. These datasets have already been preprocessed, so we would just need to read in the data and perhaps augment it through rotations, size changes, removing some classes, or similar.
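The class filtering described above could be planned with a small helper before any images are touched. This is only a sketch under our own assumptions: the function name `plan_class_balancing` is hypothetical, the thresholds are the ones quoted from the paper, and we treat "augment" as "generate enough extra samples to reach the target count".

```python
from collections import Counter

MIN_INSTANCES = 100      # classes with fewer instances are dropped
TARGET_INSTANCES = 1000  # classes with fewer instances are augmented up to this


def plan_class_balancing(labels, min_instances=MIN_INSTANCES,
                         target=TARGET_INSTANCES):
    """Return (kept, extra_needed) for a flat list of class labels.

    kept         : {class: current count} for classes that survive filtering
    extra_needed : {class: number of augmented samples still required}
    """
    counts = Counter(labels)
    kept = {c: n for c, n in counts.items() if n >= min_instances}
    extra_needed = {c: max(0, target - n) for c, n in kept.items()}
    return kept, extra_needed


if __name__ == "__main__":
    demo = ["stop"] * 1200 + ["yield"] * 250 + ["rare"] * 40
    print(plan_class_balancing(demo))
```

The actual augmentation step (rotations, rescaling) would then generate `extra_needed[c]` additional images for each kept class.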
Methodology:
The backbone architecture of our model is a multi-layered Convolutional Neural Network (CNN). The network for object detection (i.e. bounding box generation and object localization for multiple signs) will likely be the hardest part to implement. We are considering following the paper’s use of Selective Search, Edge Boxes and BING; however, the paper does not describe its implementation of these methods in detail, so we are also looking into adapting the paper’s model by implementing a Mask R-CNN or Fast R-CNN network as an initial working component for traffic sign detection.
Within the overarching CNN pipeline, the paper recommends splitting into three branches after the 6th layer: a pixel layer (the probability of a 4x4 pixel region containing the target object), a bounding-box layer (the distance between a 4x4 pixel region and the four sides of the target object’s predicted bounding box), and a label layer (which outputs a classification vector, similar to a logit vector, with the probability of belonging to a specific subclass of traffic signs). The paper’s description of the label layer is also slightly unclear: the layer seems to output a single probability vector, even though an image can contain more than one target (traffic sign). How the model handles images with multiple signs is yet to be determined.
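To make the pixel and bounding-box outputs more concrete, here is a minimal sketch of how they might be decoded into candidate boxes. It reflects our reading of the paper, not its actual code: we assume the distances are measured in pixels from the centre of each 4x4 cell, that a cell counts as a detection when its probability reaches a threshold of 0.5, and the name `decode_boxes` is our own.

```python
CELL = 4  # each output cell covers a 4x4 pixel region of the input image


def decode_boxes(pixel_prob, bbox_dist, threshold=0.5):
    """Turn per-cell outputs into (x_min, y_min, x_max, y_max, prob) tuples.

    pixel_prob[r][c] : probability that cell (r, c) contains a sign
    bbox_dist[r][c]  : (left, top, right, bottom) distances in pixels from
                       the cell centre to the sides of the predicted box
                       (an assumption about the paper's encoding)
    """
    boxes = []
    for r, row in enumerate(pixel_prob):
        for c, prob in enumerate(row):
            if prob < threshold:
                continue  # cell is treated as background
            cx = c * CELL + CELL / 2  # cell centre in image coordinates
            cy = r * CELL + CELL / 2
            left, top, right, bottom = bbox_dist[r][c]
            boxes.append((cx - left, cy - top, cx + right, cy + bottom, prob))
    return boxes


if __name__ == "__main__":
    probs = [[0.1, 0.9], [0.2, 0.3]]
    dists = [[(0, 0, 0, 0), (2, 1, 30, 35)], [(0, 0, 0, 0), (0, 0, 0, 0)]]
    print(decode_boxes(probs, dists))
```

Under this reading, every cell votes independently, so an image with several signs would simply yield several groups of candidate boxes, which would then need to be merged (for example with non-maximum suppression).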
The general layout of the model will be 8 convolution layers: layers 1, 2 and 5 will have pooling and stride; layers 1 and 2 will have an additional local response normalization (LRN) layer; layers 6 and 7 will have additional dropout. We will fork after the 6th layer, so that layer 7 consists of 3 separate convolution layers with dropout, and layer 8 consists of 3 parallel branches connected to those in layer 7: the bounding box, pixel, and label layers.
We will split the samples into training and testing sets in a 2:1 ratio. Hinge Loss Stochastic Gradient Descent (HLSGD) is suggested for training the CNN.
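As a rough illustration of these two ingredients, the sketch below shows a seeded 2:1 split and one common multiclass hinge loss (the sum-over-wrong-classes form with a margin of 1). Both function names are our own, and the exact loss variant used in the paper is an assumption on our part.

```python
import random


def split_two_to_one(samples, seed=0):
    """Shuffle the samples and split them into train/test at a 2:1 ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def multiclass_hinge_loss(scores, true_class, margin=1.0):
    """Sum of margin violations over all incorrect classes.

    scores     : list of raw class scores for one example
    true_class : index of the correct class
    """
    true_score = scores[true_class]
    return sum(
        max(0.0, margin + s - true_score)
        for j, s in enumerate(scores)
        if j != true_class
    )


if __name__ == "__main__":
    train, test = split_two_to_one(range(9))
    print(len(train), len(test))
    print(multiclass_hinge_loss([2.0, 1.5, -1.0], true_class=0))
```

In the real model, the loss would be averaged over a mini-batch and minimized with stochastic gradient descent.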
Metrics:
The paper’s model has an accuracy of 84% and a recall of 94% at a Jaccard similarity coefficient of 0.5. The paper also follows the Microsoft COCO benchmark in dividing the traffic-sign images according to the pixel size of the target in the image: small objects (area < 32x32 pixels), medium objects (area between 32x32 and 96x96 pixels) and large objects (area > 96x96 pixels), thereby differentiating the model’s performance on targets of different sizes.
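For reference, the Jaccard similarity coefficient between a predicted and a ground-truth box is the area of their intersection divided by the area of their union. A minimal sketch follows, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples and that a detection counts as correct when the coefficient is at least 0.5; the helper names are our own.

```python
def jaccard(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """True if the predicted box overlaps the ground truth enough to count."""
    return jaccard(pred_box, true_box) >= threshold


if __name__ == "__main__":
    print(jaccard((0, 0, 10, 10), (5, 0, 15, 10)))
    print(is_match((0, 0, 10, 10), (1, 1, 11, 11)))
```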
Our base goal and metric will be the overall accuracy of the model. We want to detect and classify street signs with an accuracy of 65-75% (with perhaps 80-85% on the German dataset).
If we run into problems training on such a large dataset, we may have to narrow the goal to classifying street signs only and work with a simpler dataset that just includes traffic signs with labels. If we cannot get training or detection working with the original dataset, we may instead try to incorporate the German Detection Dataset.
Our target goal is to be able to detect and classify street signs. Time permitting, we aim to detect and classify medium to large objects (as defined by Microsoft COCO) and to report our accuracies separately for these differently sized targets.
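Reporting accuracy per size category could look roughly like the sketch below. It uses the COCO area thresholds of 32x32 and 96x96 pixels; the function names, the (width, height, correct) result format, and the choice to place an area exactly on a boundary in the larger bucket are our own assumptions.

```python
SMALL_MAX_AREA = 32 * 32   # below this: "small" in the COCO convention
MEDIUM_MAX_AREA = 96 * 96  # below this (and not small): "medium"


def size_bucket(width, height):
    """Classify a target by pixel area into small / medium / large."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def accuracy_by_bucket(results):
    """results: iterable of (width, height, correct) for each target sign.

    Returns {bucket: fraction of targets in that bucket that were correct}.
    """
    totals, hits = {}, {}
    for width, height, correct in results:
        bucket = size_bucket(width, height)
        totals[bucket] = totals.get(bucket, 0) + 1
        hits[bucket] = hits.get(bucket, 0) + (1 if correct else 0)
    return {bucket: hits[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    demo = [(20, 20, True), (30, 30, False), (50, 50, True), (100, 100, True)]
    print(accuracy_by_bucket(demo))
```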
If we achieve our target goal, we will work on improving accuracy on smaller objects. Another stretch goal, if we reach our desired accuracy, is to evaluate our model with other metrics, such as the Jaccard similarity coefficient.
Ethics:
What broader societal issues are relevant to your chosen problem space?
Traffic sign detection can be useful for people who struggle with driving (perhaps because of vision problems or another disability). On the whole, autonomous vehicles can be used by people who are unable to drive or who need assistance when driving, and traffic sign detection is an integral part of such vehicles. Traffic sign detection may also be used in other applications. For example, if a driver often misses traffic signs (e.g. because the driver is elderly), an application could announce the presence of a traffic sign ahead with an audio cue. This contributes to overall road safety, as a traffic sign detection algorithm would reduce human error (like missing a sign or driving past a stop sign). An example would be a semi-autonomous car stopping in front of a traffic sign that a human driver might drive past.
Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?
The algorithm would mainly be used by automobile companies; when looking up traffic sign detection or classification, several of the results that came up were from car companies like Nissan. These companies use such algorithms for autonomous and semi-autonomous vehicles, which means that the algorithm plays an important role in the decision making of these vehicles. Any mistakes in the algorithm could have terrible consequences, as they could result in injury or death if the autonomous vehicle performs the wrong action in response to a wrongly classified street sign.
It is important not to assume that the algorithm is always accurate, because understanding the flaws in our algorithm will allow us to know the limitations that an application of the model has.
Division of labor:
-Emily: Writing/Poster Creation and Helping With Coding
-Shreyas: Dealing with Bounding Boxes and Visualizer
-Sahil: Coding Model
Built With
- python
- tensorflow
