Vehicle Detection Project
The Goal
To write a software pipeline to identify vehicles in a video from a front-facing camera on a car.
In my implementation, I used a Deep Learning approach to image recognition. Specifically, I leveraged Convolutional Neural Networks (CNNs), which are highly effective at recognizing images.
However, the task at hand is not just to detect the presence of a vehicle but also to locate it. It turns out CNNs are suitable for this type of problem as well. A lecture in the CS231n course is dedicated specifically to localization, and the principle I employed in my solution essentially reflects the idea of region proposals discussed in that lecture and implemented in architectures such as Faster R-CNN.
The main idea is that, since this is a binary classification problem (vehicle/non-vehicle), we can construct the model so that its input size matches that of a small training sample (e.g., 64x64x3) and its top layer is a 1x1 convolutional layer with a single feature, whose output is used as the classification probability.
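To make the role of the top layer concrete, here is a minimal pure-Python sketch (no deep learning framework assumed) of a single-filter 1x1 convolution followed by a sigmoid. The function name `conv1x1_sigmoid`, the weights, and the feature values are hypothetical, and the sigmoid is an assumed choice of output activation; in a real model the weights are learned and the head sits on top of framework convolution layers.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def conv1x1_sigmoid(feature_map, weights, bias):
    """Single-filter 1x1 convolution followed by a sigmoid (assumed activation).

    `feature_map` is an H x W grid (list of rows) of channel vectors.
    A 1x1 kernel with one output filter reduces to a dot product between
    each position's channel vector and `weights`, plus `bias`, so the
    result is an H x W grid of probabilities.
    """
    return [
        [sigmoid(sum(w * c for w, c in zip(weights, channels)) + bias)
         for channels in row]
        for row in feature_map
    ]

# Hypothetical 1x1x3 features, as the (assumed) layers below the head would
# produce at the 64x64x3 training size: the output is a single probability.
print(conv1x1_sigmoid([[[0.5, -1.0, 2.0]]], weights=[1.0, 1.0, 1.0], bias=-1.5))
# [[0.5]]
```

Applied to a larger H x W x C feature map, the same function, with the same weights, returns an H x W grid of probabilities instead of a single value.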
Having trained this type of model, we can expand the input’s width and height arbitrarily, provided the model contains no fully connected layers, since convolutional layers do not fix the input size. The output layer’s dimensions then grow from 1x1 to a map whose aspect ratio approximately matches that of the new, larger input.
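The growth of the output map can be checked with the standard output-size formulas for "valid" and "same" padding. The sketch below propagates one spatial dimension through a hypothetical fully convolutional stack (the layer list `LAYERS` and the 260x1280 input are illustrative assumptions, not the actual architecture or input used in this project) that reduces a 64x64 input to 1x1.

```python
def output_size(size, layers):
    """Propagate one spatial dimension through a stack of layers.

    Each layer is (kernel, stride, padding) with padding in {"valid", "same"}:
      same:  ceil(size / stride)
      valid: floor((size - kernel) / stride) + 1
    """
    for kernel, stride, padding in layers:
        if padding == "same":
            size = -(-size // stride)  # ceil(size / stride)
        else:
            size = (size - kernel) // stride + 1
        if size < 1:
            raise ValueError("input too small for this layer stack")
    return size

# Hypothetical fully convolutional stack: (kernel, stride, padding)
LAYERS = [
    (3, 1, "same"),   # conv 3x3
    (3, 1, "same"),   # conv 3x3
    (8, 8, "valid"),  # max-pool 8x8, stride 8
    (8, 1, "valid"),  # conv 8x8: collapses an 8x8 map to 1x1
    (1, 1, "valid"),  # single-filter 1x1 conv (probability output)
]

for h, w in [(64, 64), (260, 1280)]:
    print((h, w), "->", (output_size(h, LAYERS), output_size(w, LAYERS)))
# (64, 64) -> (1, 1)
# (260, 1280) -> (25, 153)
```

The 25x153 map has an aspect ratio of about 0.16 versus 0.20 for the input, which is the "approximately matching" behavior described above. In this particular hypothetical stack, the strides multiply to 8, so adjacent output cells correspond to 64x64 input windows shifted by 8 pixels, i.e. overlapping rather than disjoint squares.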
Essentially, this is equivalent to:
Cutting the new, large input image into squares of the model’s initial input size (e.g., 64x64)
Detecting the subject in each of those squares
Stitching the resulting 1x1 detections into a map, preserving the order of the corresponding squares in the source input, so that the map’s aspect ratio approximately matches that of the new, large input image.
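The three steps above can be sketched as an explicit loop over squares. This minimal pure-Python version assumes a grayscale image stored as a list of rows and non-overlapping squares; `detection_map` and `toy_classifier` are hypothetical stand-ins. The fully convolutional model produces such a map in a single forward pass, sharing computation between squares, instead of running the classifier once per square.

```python
def detection_map(image, tile, classify):
    """Cut `image` (a 2D list of pixels) into non-overlapping tile x tile
    squares, classify each square, and stitch the scores into a 2D map
    that keeps the squares' row/column order.
    Incomplete squares at the right/bottom edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    heat = []
    for top in range(0, rows - tile + 1, tile):
        map_row = []
        for left in range(0, cols - tile + 1, tile):
            square = [r[left:left + tile] for r in image[top:top + tile]]
            map_row.append(classify(square))
        heat.append(map_row)
    return heat

def toy_classifier(square):
    """Hypothetical stand-in for the trained model: the fraction of bright
    pixels in the square (a real model would output P(vehicle))."""
    pixels = [p for row in square for p in row]
    return sum(p > 0.5 for p in pixels) / len(pixels)

# An 8x12 toy "image" with a bright blob in the bottom-right corner;
# the tile size 4 plays the role of the 64x64 training input.
image = [[1.0 if (r >= 4 and c >= 8) else 0.0 for c in range(12)]
         for r in range(8)]
print(detection_map(image, 4, toy_classifier))
# [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

The 8x12 input yields a 2x3 map, so the map preserves both the squares' order and the input's aspect ratio.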
Data
For training, I used the datasets provided by Udacity: the KITTI-extracted part of the vehicles set and a corresponding number of randomly sampled non-vehicle images.
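Balancing the classes this way amounts to drawing, without replacement, as many non-vehicle samples as there are vehicle samples. A minimal sketch using Python's `random` module follows; the function name `balanced_dataset`, the fixed seed, and the file names are hypothetical placeholders rather than the project's actual loading code.

```python
import random

def balanced_dataset(vehicles, non_vehicles, seed=0):
    """Pair the vehicle samples with an equally sized random subset of
    non-vehicle samples and return shuffled (sample, label) pairs,
    with label 1 for vehicle and 0 for non-vehicle."""
    if len(non_vehicles) < len(vehicles):
        raise ValueError("not enough non-vehicle samples to balance the set")
    rng = random.Random(seed)
    negatives = rng.sample(non_vehicles, len(vehicles))  # without replacement
    data = [(x, 1) for x in vehicles] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data

# Hypothetical file names standing in for the Udacity image paths.
vehicles = [f"vehicle_{i}.png" for i in range(5)]
non_vehicles = [f"non_vehicle_{i}.png" for i in range(20)]
data = balanced_dataset(vehicles, non_vehicles)
print(sum(label for _, label in data), len(data))
# 5 10
```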
The final model had difficulty detecting the white Lexus in the Project video, so I augmented the dataset with about 200 samples of it. Additionally, I applied the same random image augmentation technique as in Project 2 (Traffic Signs Classification), yielding about 1500 vehicle images from the Project video. In total, about 7500 vehicle images were used for training, validation, and testing.