
Traffic-sign recognition (TSR) is a technology that enables a vehicle to recognize the traffic signs placed along the road, such as “Turn right ahead”, “Speed limit” or “Stop”, and it can be implemented using convolutional neural networks (CNNs). This is important because a prompt response to real-time traffic events can help prevent road accidents. This article explains the steps taken to design a deep learning model that performs this task.

First, the key Python libraries are imported.

The dataset used is the German Traffic Sign Recognition Benchmark (GTSRB). We use around 34,800 images for training, 4,410 images for validation and 12,630 images for testing.
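
The splits are often distributed as pickled Python dictionaries. A minimal loader, assuming each split file (the names train.p, valid.p and test.p below are hypothetical) holds a dict with a 'features' array of images and a 'labels' array of integer class IDs:

```python
import pickle

import numpy as np


def load_split(path):
    """Load one dataset split from a pickle file.

    Assumes (hypothetically) a dict with a 'features' array of shape
    (N, height, width, 3) and a 'labels' array of shape (N,).
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    X = np.asarray(data["features"])
    y = np.asarray(data["labels"])
    if len(X) != len(y):
        raise ValueError(f"{path}: {len(X)} images but {len(y)} labels")
    return X, y


# Hypothetical file names; adjust to wherever the splits are stored.
# X_train, y_train = load_split("train.p")
# X_valid, y_valid = load_split("valid.p")
# X_test, y_test = load_split("test.p")
```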

Sample images from the dataset can then be displayed together with their labels.
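
A sketch of such a display with matplotlib, assuming the images are held in a NumPy array X of shape (N, height, width, channels) with integer labels y (hypothetical names). It also handles single-channel images, so the same function can be reused for the grayscale images later on:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def show_samples(X, y, n_cols=5, n_rows=3, seed=0):
    """Plot a grid of randomly chosen images with their class labels as titles."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_rows * n_cols, replace=False)
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(2 * n_cols, 2 * n_rows))
    for ax, i in zip(axes.ravel(), idx):
        img = X[i]
        # Single-channel images of shape (H, W, 1) are shown as (H, W) in gray.
        if img.ndim == 3 and img.shape[-1] == 1:
            ax.imshow(img[..., 0], cmap="gray")
        else:
            ax.imshow(img)
        ax.set_title(f"label {y[i]}", fontsize=8)
        ax.axis("off")
    fig.tight_layout()
    return fig
```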

The data is shuffled so that the neural network cannot learn the order of the images. Each RGB image is then converted to grayscale by averaging its three color channels (summing the R, G and B values and dividing by 3), and the grayscale images are normalized.
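
These three steps can be sketched in NumPy as follows. Shuffling with a single NumPy permutation and the (x - 128) / 128 normalization are assumptions for illustration; any scaling of the pixels to a small range centred on zero serves the same purpose:

```python
import numpy as np


def shuffle_split(X, y, seed=0):
    """Shuffle images and labels with the same random permutation."""
    perm = np.random.default_rng(seed).permutation(len(X))
    return X[perm], y[perm]


def to_gray(X):
    """Average the R, G and B channels: (N, H, W, 3) -> (N, H, W, 1)."""
    return np.sum(X / 3.0, axis=3, keepdims=True)


def normalize(X_gray):
    """Map pixel values from [0, 255] to roughly [-1, 1] (assumed scheme)."""
    return (X_gray - 128.0) / 128.0
```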

The grayscale, normalized images can then be visualized.

Convolutions are applied to extract features from the images. As input, a CNN takes tensors of shape (image_height, image_width, color_channels); here, color_channels is 1 because the color images have been converted to grayscale.
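
To show what a single convolution does to the spatial shape, here is a naive NumPy sketch of a "valid" (unpadded) 2D convolution over one channel. Like a Keras Conv2D layer, it actually computes a cross-correlation; the 32×32 input and 5×5 kernel sizes used below are assumptions for illustration:

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution (cross-correlation) for a single channel.

    image: (H, W); kernel: (k, k). Output shape: (H - k + 1, W - k + 1).
    """
    H, W = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # Weighted sum of the k x k patch under the kernel.
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out
```

Sliding a 5×5 kernel over a 32×32 grayscale image yields a 28×28 feature map; a Conv2D layer learns many such kernels, one per output channel.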

The convolutional base is defined using a common pattern: a stack of Conv2D and pooling layers (here, average pooling).

ReLU is used in the hidden layers because it can make training deep neural networks faster than traditional activation functions such as sigmoid or tanh. For positive inputs, the derivative of ReLU is the constant 1, so gradients pass through active units without shrinking, and both ReLU and its derivative are cheap to compute during training.
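
A small NumPy comparison of the two gradients illustrates the point: the sigmoid derivative never exceeds 0.25 and becomes tiny for large inputs, while the ReLU derivative stays at 1 for every positive input:

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # 1 for x > 0, 0 otherwise (the value at exactly 0 is a convention).
    return (x > 0).astype(float)


def sigmoid_grad(x):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)  # peaks at 0.25 when x = 0
```

Because backpropagation multiplies such derivatives layer by layer, factors of at most 0.25 shrink the gradient quickly, whereas factors of 1 leave it unchanged.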

Average pooling calculates the average of each patch of the feature map, so each square of the feature map is downsampled to the average value in that square. Dropout, which randomly sets a fraction of a layer's activations to zero during training, is used to reduce overfitting.
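
Both operations are short in NumPy. This is a sketch rather than the Keras implementation: the pooling assumes non-overlapping 2×2 windows over an (H, W, C) map whose sides divide evenly, and the dropout is the common "inverted" variant, which rescales the surviving activations during training so nothing needs to change at inference time:

```python
import numpy as np


def avg_pool2d(x, size=2):
    """Non-overlapping average pooling on an (H, W, C) feature map.

    H and W are assumed to be divisible by `size`.
    """
    H, W, C = x.shape
    # Split each spatial axis into (blocks, size) and average inside each block.
    return x.reshape(H // size, size, W // size, size, C).mean(axis=(1, 3))


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, scale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)
```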

The output tensor of the convolutional base, of shape (10, 10, 16), is downsampled by a further pooling layer to shape (5, 5, 16). This tensor is flattened first, because Dense layers take 1D vectors as input, and is then fed into Dense layers that perform the classification.
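
These shapes can be checked with a little arithmetic. The sketch below assumes a LeNet-style stack on a 32×32×1 input: two 5×5 convolutions (6 and 16 filters, no padding), each followed by 2×2 average pooling. These layer sizes are assumptions, chosen because they reproduce the (10, 10, 16) and (5, 5, 16) shapes quoted above:

```python
def valid_out(size, window, stride=1):
    """Spatial size after a window applied with no padding (Conv2D or pooling)."""
    return (size - window) // stride + 1


def trace_shapes(height=32, width=32, channels=1):
    # Hypothetical LeNet-style stack: (name, window, stride, filters or None).
    layers = [
        ("Conv2D 6@5x5", 5, 1, 6),
        ("AveragePooling2D 2x2", 2, 2, None),
        ("Conv2D 16@5x5", 5, 1, 16),
        ("AveragePooling2D 2x2", 2, 2, None),
    ]
    shapes = [("Input", (height, width, channels))]
    for name, window, stride, filters in layers:
        height = valid_out(height, window, stride)
        width = valid_out(width, window, stride)
        if filters is not None:  # convolution sets the channel count; pooling keeps it
            channels = filters
        shapes.append((name, (height, width, channels)))
    shapes.append(("Flatten", (height * width * channels,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:<22}{shape}")
```

Under these assumptions, the flattened vector has 5 × 5 × 16 = 400 entries, which is the input size of the first Dense layer.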

The dataset has 43 classes, so the final layer is a Dense layer with 43 outputs and a softmax activation, which assigns a probability to each class in this multi-class problem. These probabilities add up to 1.0.
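
Softmax exponentiates each output score and divides by the sum of the exponentials, so the outputs are positive and sum to 1. A NumPy sketch; subtracting the row maximum first is a standard trick that avoids overflow without changing the result:

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)  # avoids overflow in exp
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)
```

The class with the largest score keeps the largest probability, so taking the argmax of the softmax output gives the predicted class.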

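The CNN is compiled with CNN.compile using the Adam optimizer, a popular default that tends to perform well across a wide range of problems, and loss='sparse_categorical_crossentropy'. This loss suits the task because the classes are mutually exclusive, meaning each sample belongs to exactly one class, and it takes the labels as integer class indices rather than one-hot vectors.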
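
The loss itself is simple: for each sample, take the negative log of the probability the model assigned to the true class, then average over the batch. The NumPy sketch below (function names are illustrative, not the Keras API) also shows that it matches the one-hot "categorical" version; the only difference is the label format:

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability of the true class).

    labels: integer class indices, shape (N,); probs: softmax outputs, shape (N, K).
    """
    p_true = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Same loss with one-hot encoded targets of shape (N, K)."""
    return float(np.mean(-np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=1)))
```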

CNN.evaluate is used to assess the performance of the CNN on the test set; here, the test accuracy is around 80%.

The training and validation loss, and the training and validation accuracy, can then be plotted against the number of training epochs.
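
A matplotlib sketch for these plots, assuming the dict returned by Keras's fit as history.history, with keys such as 'loss', 'val_loss', 'accuracy' and 'val_accuracy'. The exact metric key names depend on how the model was compiled and on the Keras version:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt


def plot_history(history, metric):
    """Plot training and validation curves of one metric from a Keras-style history dict."""
    fig, ax = plt.subplots()
    epochs = range(1, len(history[metric]) + 1)
    ax.plot(epochs, history[metric], "o-", label=f"training {metric}")
    ax.plot(epochs, history["val_" + metric], "s--", label=f"validation {metric}")
    ax.set_title(f"Training and validation {metric}")
    ax.set_xlabel("epoch")
    ax.set_ylabel(metric)
    ax.legend()
    return fig
```

For example, with history = CNN.fit(...) called with validation data, plot_history(history.history, "loss") and plot_history(history.history, "accuracy") would draw the two figures. A validation curve that rises while the training curve keeps falling (for the loss) is the usual sign of overfitting.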

Plotting a Confusion Matrix


Here, a confusion matrix is used to evaluate the quality of the classifier's output on a dataset. In this plot, the diagonal elements represent the number of points for which the predicted label equals the true label, while the off-diagonal elements are those mislabeled by the classifier. The higher the diagonal values of the confusion matrix, the better, since they indicate many correct predictions. True labels are along the y-axis, while predicted labels are on the x-axis.
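
A NumPy sketch of how the matrix is built from integer labels, with rows indexing the true class and columns the predicted class (the same layout as scikit-learn's confusion_matrix). The overall accuracy is the diagonal sum divided by the total number of samples:

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add handles repeated (i, j) pairs
    return cm


def accuracy_from_cm(cm):
    """Correctly classified samples (the diagonal) divided by all samples."""
    return np.trace(cm) / cm.sum()
```

With a trained Keras model, the predicted labels would come from something like y_pred = np.argmax(CNN.predict(X_test_gray_norm), axis=1), where X_test_gray_norm is a hypothetical name for the preprocessed test images.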

Displaying the Output Images with Classes:

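Displaying test images together with their predicted classes shows that the model can classify an input image of a traffic sign; with a test accuracy of around 80%, however, roughly one in five test images is still misclassified.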


Citation: J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460, 2011.

Coursera Project Network

