[Source Code] Vehicle Detection Based on Convolutional Neural Networks

Latest recommended article published on 2022-08-20 15:41:00

梅花香——苦寒来

Original link: https://mp.weixin.qq.com/s?__biz=MzUxMTk0OTA3Nw==&mid=2247516650&idx=6&sn=45f4cb46cf6d8301808dbc8388228295&chksm=f9692190ce1ea886cbd457db2bfbb6736926797a6245722fbe1b0d707a9888916a50509e8218&token=910055140&lang=zh_CN#rd

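This blog post describes how the author uses a convolutional neural network (CNN), a deep learning technique, to detect vehicles from a car's front-facing camera. By building a model that can localize vehicles and applying a region-proposal approach similar to Faster R-CNN, the small training-sample input is expanded to an arbitrary size so that vehicles can be detected in large images. The author trained on the datasets provided by Udacity and augmented the data to address the model's difficulty recognizing a particular car model. In total, about 7500 vehicle images were used for training and testing.

Summary generated automatically by CSDN.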

Vehicle Detection Project
The Goal
To write a software pipeline to identify vehicles in a video from a front-facing camera on a car.
In my implementation, I used a Deep Learning approach to image recognition. Specifically, I leveraged the extraordinary power of Convolutional Neural Networks (CNNs) to recognize images.

However, the task at hand is not just to detect the presence of a vehicle, but to point to its location. It turns out that CNNs are suitable for this type of problem as well. A lecture in the CS231n course is dedicated specifically to localization, and the principle I employed in my solution basically reflects the idea of a region proposal discussed in that lecture and implemented in architectures such as Faster R-CNN.

The main idea is that, since this is a binary classification problem (vehicle/non-vehicle), we can construct the model so that its input has the size of a small training sample (e.g., 64x64x3) and its top layer is a single-feature convolutional layer with a 1x1 output, whose value is used as the classification probability.

Once a model of this type is trained, the width and height of its input can be expanded arbitrarily, which transforms the output layer's dimensions from 1x1 into a map whose aspect ratio approximately matches that of the new, larger input.
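To make the size relationship concrete, here is a minimal sketch of the arithmetic. It assumes the network reduces a 64x64 window to a single output and has an effective stride of 8 pixels between neighboring outputs. The stride value, the 1280x720 frame size, and the function name `output_map_size` are illustrative assumptions, not values taken from the project.

```python
def output_map_size(height, width, window=64, stride=8):
    """Size of the detection map produced by a fully convolutional classifier.

    Assumes the model maps a `window` x `window` input to a 1x1 output and
    that neighboring outputs are `stride` input pixels apart (both values
    are assumptions made for illustration).
    """
    if height < window or width < window:
        raise ValueError("input must be at least as large as the training window")
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    return rows, cols


print(output_map_size(64, 64))     # (1, 1): the original training-sample size
print(output_map_size(720, 1280))  # (83, 153): an assumed 1280x720 video frame
```

Under these assumptions the map is 153 wide by 83 high, a ratio of about 1.84, which is close to but not exactly the 16:9 (about 1.78) ratio of the frame. That is why the match is only approximate.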

Essentially, this is equivalent to:

Cutting the new large input image into squares of the model's initial input size (e.g., 64x64)

Detecting the subject in each of those squares

Stitching the resulting 1x1 detections, in the same order as the corresponding squares in the source image, into a map whose aspect ratio approximately matches that of the new large input image.
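The three steps above can be sketched in plain Python. This is only an illustration of the equivalence, not the project's code: `detection_map` is a hypothetical helper, and `mean_brightness` is a stand-in scorer used in place of the trained CNN so the example runs without a model.

```python
def detection_map(image, classify, window=64, stride=64):
    """Cut `image` (a list of pixel rows) into window x window squares,
    score each square with `classify`, and stitch the scores into a 2D map
    that preserves the spatial order of the squares."""
    height, width = len(image), len(image[0])
    score_map = []
    for top in range(0, height - window + 1, stride):
        row_scores = []
        for left in range(0, width - window + 1, stride):
            # Step 1: cut out one square of the model's input size
            patch = [row[left:left + window] for row in image[top:top + window]]
            # Step 2: detect the subject in that square
            row_scores.append(classify(patch))
        # Step 3: stitch the scores row by row, keeping the source order
        score_map.append(row_scores)
    return score_map


def mean_brightness(patch):
    """Hypothetical stand-in for the CNN: returns a score in [0, 1]."""
    total = sum(sum(row) for row in patch)
    return total / (len(patch) * len(patch[0]) * 255.0)


# Toy 128x192 grayscale frame: dark except for one bright 64x64 block
frame = [[0] * 192 for _ in range(128)]
for y in range(64, 128):
    for x in range(64, 128):
        frame[y][x] = 255

heatmap = detection_map(frame, mean_brightness)
for scores in heatmap:
    print(scores)
```

The toy frame is 192 wide by 128 high and the resulting map is 3 by 2, so the aspect ratio is preserved, and only the cell that covers the bright block gets a high score. In the convolutional version the squares typically overlap, because the spacing between them is set by the network's pooling layers rather than by the window size.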

Data
For training, I used the datasets provided by Udacity: the KITTI-extracted part of the vehicle images and a corresponding number of randomly sampled non-vehicle images.
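A minimal sketch of that balancing step, assuming the samples are available as lists of file paths. The function name `sample_matching_negatives`, the file names, and the list sizes are hypothetical and chosen only for illustration.

```python
import random


def sample_matching_negatives(vehicle_paths, non_vehicle_paths, seed=0):
    """Randomly pick as many non-vehicle samples as there are vehicle samples."""
    if len(non_vehicle_paths) < len(vehicle_paths):
        raise ValueError("not enough non-vehicle samples to match the vehicles")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    # random.sample draws without replacement, so no image is picked twice
    return rng.sample(non_vehicle_paths, len(vehicle_paths))


# Hypothetical file lists; the real dataset layout may differ
vehicles = [f"vehicles/{i:04d}.png" for i in range(500)]
non_vehicles = [f"non-vehicles/{i:04d}.png" for i in range(900)]

negatives = sample_matching_negatives(vehicles, non_vehicles)
print(len(vehicles), len(negatives))  # 500 500
```

Keeping the two classes the same size means a classifier cannot reach a good score simply by favoring the more frequent class.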

The final model had difficulty detecting the white Lexus in the project video, so I augmented the dataset with about 200 samples of it. Additionally, I used the same random image augmentation technique as in Project 2 (Traffic Signs Classification), yielding about 1500 images of vehicles from the project video. The total number of vehicle images used for training, validation, and testing was about 7500.
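The post does not say which transforms the augmentation applied, so the following is only a sketch of the general idea: turning roughly 200 base samples into roughly 1500 by generating random variants. The choice of transforms (a small horizontal shift and a brightness change), their ranges, and the names `augment` and `expand_dataset` are all assumptions. Images are modeled as lists of grayscale pixel rows to keep the example dependency-free.

```python
import random


def augment(image, rng, max_shift=4, brightness_range=(0.6, 1.4)):
    """Return a randomly shifted and brightness-scaled copy of a grayscale image.

    Both transforms are assumed for illustration; the original project may
    have used different ones.
    """
    shift = rng.randint(-max_shift, max_shift)
    gain = rng.uniform(*brightness_range)
    width = len(image[0])
    result = []
    for row in image:
        # Horizontal shift; pixels pushed past the edge are replaced by the edge value
        shifted = [row[min(max(x - shift, 0), width - 1)] for x in range(width)]
        # Brightness scaling, clipped to the valid 8-bit range
        result.append([min(255, int(round(p * gain))) for p in shifted])
    return result


def expand_dataset(images, target_count, seed=0):
    """Generate `target_count` random variants drawn from the base images."""
    rng = random.Random(seed)
    return [augment(rng.choice(images), rng) for _ in range(target_count)]


# 200 tiny placeholder images standing in for the extra vehicle samples
base = [[[100] * 8 for _ in range(8)] for _ in range(200)]
expanded = expand_dataset(base, 1500)
print(len(expanded))  # 1500
```

Whether the roughly 1500 images in the post include the unmodified originals is not stated; this sketch returns only the generated variants.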
