吴恩达深度学习学习笔记——C4W3——目标检测——作业——自动驾驶之车辆检测

最新推荐文章于 2024-04-02 10:41:44 发布

预见未来to50

最新推荐文章于 2024-04-02 10:41:44 发布

阅读量896

点赞数

分类专栏：机器学习、深度学习（ML/DL)

本文链接：https://blog.csdn.net/hpdlzu80100/article/details/113783045

版权

机器学习、深度学习（ML/DL) 专栏收录该内容

123 篇文章 11 订阅

订阅专栏

这里主要梳理一下作业的主要内容和思路，完整作业文件可参考:

https://github.com/pandenghuang/Andrew-Ng-Deep-Learning-notes/tree/master/assignments/C4W3/Assignment/Car_detection_for_Autonomous_Driving

作业完整截图，参考本文结尾：作业完整截图。

Autonomous driving - Car detection（自动驾驶之车辆检测）

Welcome to your week 3 programming assignment. You will learn about object detection using the very powerful YOLO model. Many of the ideas in this notebook are described in the two YOLO papers: Redmon et al., 2016 (https://arxiv.org/abs/1506.02640) and Redmon and Farhadi, 2016 (https://arxiv.org/abs/1612.08242).

You will learn to:

Use object detection on a car detection dataset
Deal with bounding boxes

...

1 - Problem Statement（问题陈述）

You are working on a self-driving car. As a critical component of this project, you'd like to first build a car detection system. To collect data, you've mounted a camera to the hood (meaning the front) of the car, which takes pictures of the road ahead every few seconds while you drive around.

You've gathered all these images into a folder and have labelled them by drawing bounding boxes around every car you found. Here's an example of what your bounding boxes look like.

If you have 80 classes that you want YOLO to recognize, you can represent the class label 𝑐c either as an integer from 1 to 80, or as an 80-dimensional vector (with 80 numbers) one component of which is 1 and the rest of which are 0. The video lectures had used the latter representation; in this notebook, we will use both representations, depending on which is more convenient for a particular step.

In this exercise, you will learn how YOLO works, then apply it to car detection. Because the YOLO model is very computationally expensive to train, we will load pre-trained weights for you to use.

2 - YOLO（YOLO算法）

YOLO ("you only look once") is a popular algoritm because it achieves high accuracy while also being able to run in real-time. This algorithm "only looks once" at the image in the sense that it requires only one forward propagation pass through the network to make predictions. After non-max suppression, it then outputs recognized objects together with the bounding boxes.

2.1 - Model details（模型详述）

First things to know:

The input is a batch of images of shape (m, 608, 608, 3)
The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (𝑝𝑐,𝑏𝑥,𝑏𝑦,𝑏ℎ,𝑏𝑤,𝑐) as explained above. If you expand 𝑐 into an 80-dimensional vector, each bounding box is then represented by 85 numbers.

We will use 5 anchor boxes. So you can think of the YOLO architecture as the following: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85).

Lets look in greater detail at what this encoding represents.

Figure 2: Encoding architecture for YOLO

If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting that object.

...

2.2 - Filtering with a threshold on class scores（筛选目标类）

You are going to apply a first filter by thresholding. You would like to get rid of any box for which the class "score" is less than a chosen threshold.

The model gives you a total of 19x19x5x85 numbers, with each box described by 85 numbers. It'll be convenient to rearrange the (19,19,5,85) (or (19,19,425)) dimensional tensor into the following variables:

...

2.3 - Non-max suppression（非极大值抑制）

Even after filtering by thresholding over the classes scores, you still end up a lot of overlapping boxes. A second filter for selecting the right boxes is called non-maximum suppression (NMS).

...

2.4 Wrapping up the filtering（使用滤波器）

It's time to implement a function taking the output of the deep CNN (the 19x19x5x85 dimensional encoding) and filtering through all the boxes using the functions you've just implemented.

...

Summary for YOLO:（YOLO算法小结）
- Input image (608, 608, 3)
- The input image goes through a CNN, resulting in a (19,19,5,85) dimensional output.
- After flattening the last two dimensions, the output is a volume of shape (19, 19, 425):

Each cell in a 19x19 grid over the input image gives 425 numbers.
425 = 5 x 85 because each cell contains predictions for 5 boxes, corresponding to 5 anchor boxes, as seen in lecture.
85 = 5 + 80 where 5 is because (𝑝𝑐,𝑏𝑥,𝑏𝑦,𝑏ℎ,𝑏𝑤) has 5 numbers, and and 80 is the number of classes we'd like to detect

- You then select only few boxes based on:

Score-thresholding: throw away boxes that have detected a class with a score less than the threshold
Non-max suppression: Compute the Intersection over Union and avoid selecting overlapping boxes

- This gives you YOLO's final output.

...

3 - Test YOLO pretrained model on images（使用图片测试YOLO预训练模型）

In this part, you are going to use a pretrained model and test it on the car detection dataset. As usual, you start by creating a session to start your graph. Run the following cell.

...

3.1 - Defining classes, anchors and image shape.（定义类、锚框和图像尺寸）

Recall that we are trying to detect 80 classes, and are using 5 anchor boxes. We have gathered the information about the 80 classes and 5 boxes in two files "coco_classes.txt" and "yolo_anchors.txt". Let's load these quantities into the model by running the next cell.

The car detection dataset has 720x1280 images, which we've pre-processed into 608x608 images.

...

3.2 - Loading a pretrained model（加载预训练模型）

Training a YOLO model takes a very long time and requires a fairly large dataset of labelled bounding boxes for a large range of target classes. You are going to load an existing pretrained Keras YOLO model stored in "yolo.h5". (These weights come from the official YOLO website, and were converted using a function written by Allan Zelener. References are at the end of this notebook. Technically, these are the parameters from the "YOLOv2" model, but we will more simply refer to it as "YOLO" in this notebook.) Run the cell below to load the model from this file.

...

3.3 - Convert output of the model to usable bounding box tensors（将模型输出转换为边界框张量）

The output of yolo_model is a (m, 19, 19, 5, 85) tensor that needs to pass through non-trivial processing and conversion. The following cell does that for you.

...

3.4 - Filtering boxes（筛选边界框）

yolo_outputs gave you all the predicted boxes of yolo_model in the correct format.

...

3.5 - Run the graph on an image（在图像上画出边界框）

Let the fun begin. You have created a (sess) graph that can be summarized as follows:

yolo_model.input is given to yolo_model. The model is used to compute the output yolo_model.output
yolo_model.output is processed by yolo_head. It gives you yolo_outputs
yolo_outputs goes through a filtering function, yolo_eval. It outputs your predictions: scores, boxes, classes

...

What you should remember: - YOLO is a state-of-the-art object detection model that is fast and accurate
- It runs an input image through a CNN which outputs a 19x19x5x85 dimensional volume.
- The encoding can be seen as a grid where each of the 19x19 cells contains information about 5 boxes.
- You filter through all the boxes using non-max suppression. Specifically:

- Score thresholding on the probability of detecting a class to keep only accurate (high probability) boxes
- Intersection over Union (IoU) thresholding to eliminate overlapping boxes

- Because training a YOLO model from randomly initialized weights is non-trivial and requires a large dataset as well as lot of computation, we used previously trained model parameters in this exercise. If you wish, you can also try fine-tuning the YOLO model with your own dataset, though this would be a fairly non-trivial exercise.。

References: The ideas presented in this notebook came primarily from the two YOLO papers. The implementation here also took significant inspiration and used many components from Allan Zelener's github repository. The pretrained weights used in this exercise came from the official YOLO website.

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi - You Only Look Once: Unified, Real-Time Object Detection (2015)
Joseph Redmon, Ali Farhadi - YOLO9000: Better, Faster, Stronger (2016)
Allan Zelener - YAD2K: Yet Another Darknet 2 Keras
The official YOLO website (https://pjreddie.com/darknet/yolo/)

作业完整截图：

预见未来to50

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
吴恩达深度学习学习笔记——C4W3——目标检测——作业——自动驾驶之车辆检测

这里主要梳理一下作业的主要内容和思路，完整作业文件可参考:https://github.com/pandenghuang/Andrew-Ng-Deep-Learning-notes/tree/master/assignments/C4W3/Assignment/Car_detection_for_Autonomous_Driving作业完整截图，参考本文结尾：作业完整截图。Autonomous driving - Car detection（自动驾驶之车辆检测）Welcome to your
复制链接

扫一扫