CNN Week 3: Car detection with YOLO

姑苏落雨心中

Last modified 2022-08-29 20:50:09


First published 2021-08-05 11:14:16

Article link: https://blog.csdn.net/a1137608040/article/details/119378449


Week 3

Problem Statement

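For the car detection system, we are given some data, and the way this data is labeled is shown in the figure below:
[figure]
If you want to recognize m classes, the class label c can be written either as an integer from 1 to m or as an m-dimensional vector. Pre-trained weights are used later on.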

YOLO

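YOLO is a popular algorithm. It needs only one forward propagation to make predictions; after NMS, it outputs the recognized objects together with their bounding boxes.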

Model Details

Points to note:

The input is a batch of images of shape (m, 608, 608, 3), where m is the batch size.
The output is a list of bounding boxes along with the recognized classes. Each bounding box is represented by 6 numbers (pc,bx,by,bh,bw,c) as explained above. If you expand 𝑐 into an 80-dimensional vector, each bounding box is then represented by 85 numbers.
In other words, the input has shape (m, 608, 608, 3) and every output box is (pc,bx,by,bh,bw,c), where the length of c depends on its encoding: one number as a class index, or 80 numbers as a vector.

Anchor Boxes

With anchor boxes, the encoding has the shape (m, nH, nW, anchors, classes).
The YOLO architecture is: IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19, 5, 85)

Encoding
In other words, a (608, 608, 3) image passes through the CNN and comes out as (19, 19, 5, 85): the image is split into a 19x19 grid, and each grid cell holds a (5, 85) matrix.

Class score
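For each anchor box in each cell, we compute the element-wise product below and extract the probability that the box contains a certain class:
scores = p_c × (c_1, c_2, ..., c_80)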
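As a rough illustration of the product described above, the following NumPy sketch multiplies a confidence array by a class-probability array. The random arrays are hypothetical stand-ins for real model outputs, and NumPy is used here only because it broadcasts in the same way as the TensorFlow code later in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the model outputs of a single image.
box_confidence = rng.random((19, 19, 5, 1))    # p_c for every anchor box
box_class_probs = rng.random((19, 19, 5, 80))  # c_1..c_80 for every anchor box

# Broadcasting stretches the trailing axis of size 1 to 80, so each class
# probability is multiplied by the confidence of its own box.
box_scores = box_confidence * box_class_probs

print(box_scores.shape)  # (19, 19, 5, 80)
```

Each of the 1805 boxes therefore ends up with 80 scores, one per class.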
Visualizing classes

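For each of the 19x19 grid cells, find the maximum of the probability scores, taking the max over both the 5 anchor boxes and the different classes.
Color each grid cell according to the object that the cell considers most likely.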
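The two steps above can be sketched in NumPy as follows. This is only an illustration under the assumption that the scores are stored as a (19, 19, 5, 80) array; the random values and the names cell_class and cell_score are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
box_scores = rng.random((19, 19, 5, 80))  # hypothetical p_c * c_i scores

# Max over the 5 anchor boxes gives the best score of every class per cell;
# argmax over the classes then gives the most likely class of the cell.
cell_class = box_scores.max(axis=2).argmax(axis=-1)

# Best score of each cell, taken over anchors and classes at once.
cell_score = box_scores.max(axis=(2, 3))

print(cell_class.shape, cell_score.shape)  # (19, 19) (19, 19)
```

A (19, 19) array of class indices like cell_class is what would be mapped to colors to draw the grid.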

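The visualization above is not a core part of how the YOLO algorithm makes predictions. Another way to visualize YOLO's output is to plot the bounding boxes that it outputs.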

Each cell outputs 5 anchor boxes, so the model predicts 19x19x5 = 1805 boxes in total.
The low-probability boxes therefore have to be discarded.

Filtering with a Threshold on Class Scores

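Next we filter by threshold: boxes whose score is below a preset value are removed. The model outputs 19 × 19 × 5 × 85 numbers in total, and each box is described by 85 numbers (80 class probabilities, plus 5 for the confidence and the box position parameters).
box_confidence: tensor of shape (19,19,5,1), the confidence p_c of each box
boxes: tensor of shape (19,19,5,4), the position coordinates of each anchor box
box_class_probs: tensor of shape (19,19,5,80), the class probabilities of the object detected by each anchor box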
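To make the three shapes concrete, here is a small NumPy sketch that slices one (19, 19, 5, 85) encoding into the three tensors. The layout of the last axis (p_c first, then the 4 box numbers, then the 80 classes) is an assumption taken from the (pc,bx,by,bh,bw,c) description earlier in this post; the actual model head may order the values differently.

```python
import numpy as np

encoding = np.zeros((19, 19, 5, 85), dtype=np.float32)  # placeholder values

# Assumed layout of the last axis: [p_c, b_x, b_y, b_h, b_w, c_1 ... c_80].
box_confidence = encoding[..., 0:1]
boxes = encoding[..., 1:5]
box_class_probs = encoding[..., 5:]

print(box_confidence.shape, boxes.shape, box_class_probs.shape)
print(encoding.size)  # 19 * 19 * 5 * 85 numbers in total
```

Slicing with 0:1 rather than 0 keeps the trailing axis of size 1, which is what lets the confidence broadcast against the 80 class probabilities.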

Exercise 1 - yolo_filter_boxes

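First compute the class scores: multiplying a (19,19,5,1) tensor by a (19,19,5,80) tensor gives a (19,19,5,80) tensor, because broadcasting stretches the axis of size 1 to match the larger one.
For each anchor box we need to find:

the index of the class with the highest score
the highest score itself
Then create a mask based on the threshold. The mask is a Boolean array in which True means the box is kept.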

import tensorflow as tf

def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold = .6):
    """Filters YOLO boxes by thresholding on object and class confidence.

    Arguments:
        boxes -- tensor of shape (19, 19, 5, 4), the box coordinates
        box_confidence -- tensor of shape (19, 19, 5, 1), the confidence p_c of each box
        box_class_probs -- tensor of shape (19, 19, 5, 80), the class probabilities of every anchor box in each cell
        threshold -- real value, if [ highest class probability score < threshold],
                     then get rid of the corresponding box

    Returns:
        scores -- tensor of shape (None,), containing the class probability score for selected boxes
        boxes -- tensor of shape (None, 4), containing (b_x, b_y, b_h, b_w) coordinates of selected boxes
        classes -- tensor of shape (None,), containing the index of the class detected by the selected boxes

    Note: "None" is here because you don't know the exact number of selected boxes, as it depends on the threshold. 
    For example, the actual output size of scores would be (10,) if there are 10 boxes.
    """
    
    ### START CODE HERE
    # Step 1: Compute box scores; the result has shape (19, 19, 5, 80)
    ##(≈ 1 line)
    box_scores = box_confidence * box_class_probs

    # Step 2: Find the box_classes using the max box_scores, keep track of the corresponding score
    # i.e. pick the highest of the p_c * c_i scores and remember which class it belongs to
    ##(≈ 2 lines)
    box_classes = tf.math.argmax(box_scores,axis=-1)   # index of the highest-scoring class
    box_class_scores = tf.math.reduce_max(box_scores,axis=-1)  # the highest score itself
    
    # Step 3: Create a filtering mask based on "box_class_scores" by using "threshold". The mask should have the 
    # same dimension as box_class_scores, and be True for the boxes you want to keep (with probability >= threshold)
    ## (≈ 1 line) boxes whose score is too low are marked False
    filtering_mask = (box_class_scores>=threshold)
    
    # Step 4: Apply the mask to box_class_scores, boxes and box_classes
    ## (≈ 3 lines)
    scores = tf.boolean_mask(box_class_scores,filtering_mask)
    boxes = tf.boolean_mask(boxes,filtering_mask)
    classes = tf.boolean_mask(box_classes,filtering_mask)
    ### END CODE HERE
    
    # Only the boxes with a sufficiently high score remain
    return scores, boxes, classes
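To see why the output shapes contain None, the same four steps can be traced by hand on a tiny example. The sketch below uses NumPy instead of TensorFlow and two made-up boxes with three classes each; all of the numbers are hypothetical.

```python
import numpy as np

# Two hypothetical boxes with three classes each.
box_confidence = np.array([[0.9], [0.4]])            # shape (2, 1)
box_class_probs = np.array([[0.1, 0.8, 0.1],
                            [0.5, 0.3, 0.2]])        # shape (2, 3)
boxes = np.array([[0.1, 0.1, 0.4, 0.4],
                  [0.5, 0.5, 0.9, 0.9]])             # shape (2, 4)
threshold = 0.6

box_scores = box_confidence * box_class_probs        # shape (2, 3)
box_classes = box_scores.argmax(axis=-1)             # best class of each box
box_class_scores = box_scores.max(axis=-1)           # score of that class
mask = box_class_scores >= threshold                 # True = keep the box

# Boolean indexing plays the role of tf.boolean_mask here.
print(box_class_scores[mask], boxes[mask], box_classes[mask])
```

The first box scores about 0.72 for class 1 and is kept, while the second box peaks at about 0.2 and is dropped, so only one box survives. With a different threshold the number of surviving boxes changes, which is why it cannot be known in advance.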

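Two important functions are used here: tf.math.argmax() and tf.math.reduce_max().
tf.math.argmax(): returns the index of the maximum value of a tensor along a given axis, that is, the position of the largest element.
[Screenshots: an example call and its result, then the result after modifying the array elements]
With axis=1 on a 2-D tensor, the result is the column index of the maximum in each row; with axis=0, it is the row index of the maximum in each column.

tf.math.reduce_max(): returns the maximum value itself along a given axis.
[Screenshot: an example call and its result]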
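Since the screenshots are not available, here is a small stand-in example. It uses NumPy's np.argmax and np.max, which follow the same axis convention as tf.math.argmax and tf.math.reduce_max; the array a is made up for illustration.

```python
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])

print(np.argmax(a, axis=1))  # column index of the max in each row
print(np.argmax(a, axis=0))  # row index of the max in each column
print(np.max(a, axis=1))     # the max value of each row
```

In the first row the largest value 9 sits at column 1, and in the second row the largest value 8 sits at column 2, so the first call returns the indices 1 and 2 while the third call returns the values 9 and 8.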

Non-max Suppression

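Although thresholding filters out the low-scoring boxes, the result is still not accurate enough, because several overlapping boxes can remain for the same object. NMS keeps only the best of them.
NMS relies on a very important function called intersection over union (IoU).
Here a box is defined by its upper-left and lower-right corners.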

import numpy as np

def iou(box1, box2):
    """Implement the intersection over union (IoU) between box1 and box2
    Arguments:
    box1 -- first box, list object with coordinates (box1_x1, box1_y1, box1_x2, box1_y2)
    box2 -- second box, list object with coordinates (box2_x1, box2_y1, box2_x2, box2_y2)
    """
    # (x1, y1) is the upper-left corner and (x2, y2) is the lower-right corner
    (box1_x1, box1_y1, box1_x2, box1_y2) = box1
    (box2_x1, box2_y1, box2_x2, box2_y2) = box2

    ### START CODE HERE
    # Calculate the (yi1, xi1, yi2, xi2) coordinates of the intersection of box1 and box2. Calculate its Area.
    ##(≈ 7 lines)
    # largest x1 of the two boxes: left edge of the intersection
    xi1 = np.max([box1_x1,box2_x1])
    # largest y1 of the two boxes: top edge of the intersection
    yi1 = np.max([box1_y1,box2_y1])
    # smallest x2 and y2 of the two boxes: right and bottom edges of the intersection
    xi2 = np.min([box1_x2,box2_x2])
    yi2 = np.min([box1_y2,box2_y2])
    # width of the intersection (0 if the boxes do not overlap)
    inter_width = max((xi2-xi1),0)
    # height of the intersection (0 if the boxes do not overlap)
    inter_height = max((yi2-yi1),0)
    # area of the intersection
    inter_area = inter_width * inter_height
    
    # Calculate the Union area by using Formula: Union(A,B) = A + B - Inter(A,B)
    ## (≈ 3 lines)
    box1_area = max(box1_x2 - box1_x1,0) * max(box1_y2 - box1_y1,0)   # A: area of the first box
    box2_area = max(box2_x2 - box2_x1,0) * max(box2_y2 - box2_y1,0)   # B: area of the second box
    # denominator of the IoU
    union_area = box1_area + box2_area - inter_area
    
    # compute the IoU
    iou = inter_area / union_area
    ### END CODE HERE
    
    return iou
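A worked example may help to check the arithmetic. The sketch below is a compact pure-Python variant of the same computation; the name iou_sketch and the guard against a zero union area are additions made for illustration and are not part of the assignment code.

```python
def iou_sketch(box1, box2):
    """IoU of two boxes given as (x1, y1, x2, y2) corner tuples."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    # Guard added for illustration: two empty boxes would give union == 0.
    return inter / union if union > 0 else 0.0


print(iou_sketch((2, 1, 4, 3), (1, 2, 3, 4)))  # intersection 1, union 7
print(iou_sketch((1, 1, 2, 2), (3, 3, 4, 4)))  # disjoint boxes
```

In the first call both boxes have area 4 and share a 1x1 square, so the union is 4 + 4 - 1 = 7 and the IoU is 1/7. In the second call the boxes do not touch, so the IoU is 0.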

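Something to note when implementing the code above:
Comparing max() with np.max() and printing the results shows that the two functions give the same values here. When every call in the code is changed to max(), the tests still pass.

Looking at the documentation of max() and np.max() shows that they are not the same function. np.max() takes a single array-like as its first argument (its second positional argument is the axis), which is why the values are wrapped in a list [] when calling np.max().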
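The difference can be seen in a few lines. This is a minimal sketch with made-up values; np.maximum is mentioned in addition to the two functions discussed above because it is the NumPy function that compares two inputs element by element.

```python
import numpy as np

a, b = 3.0, 5.0

print(max(a, b))          # built-in: compares its arguments
print(np.max([a, b]))     # NumPy: reduces one array-like to its maximum
print(np.maximum(a, b))   # NumPy: element-wise maximum of two inputs

# With arrays, only np.maximum compares element by element.
print(np.maximum(np.array([1, 6]), np.array([4, 2])))
```

For two scalars all three calls agree, which matches the observation that replacing np.max() with max() in iou() does not change the result.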

YOLO Non-max Suppression

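The steps to implement NMS are:

Select the box that has the highest score.
Compute its overlap with all other boxes, and remove the boxes whose IoU with it exceeds iou_threshold.
Go back to step 1 and iterate until there are no more boxes with a lower score than the currently selected box.
This removes all boxes that have a large overlap with the selected boxes; only the highest-scoring boxes remain.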
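The steps above can be sketched in plain Python as follows. This is an illustrative greedy loop, not the TensorFlow implementation used below; the helper names iou_xyxy and nms_sketch and the three example boxes are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_sketch(boxes, scores, max_boxes=10, iou_threshold=0.5):
    """Return the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order and len(keep) < max_boxes:
        best = order.pop(0)  # step 1: highest remaining score
        keep.append(best)
        # step 2: drop every remaining box that overlaps it too much
        order = [i for i in order
                 if iou_xyxy(boxes[best], boxes[i]) <= iou_threshold]
    return keep  # step 3 is the while loop itself


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms_sketch(boxes, scores))
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box is far away and survives. The kept indices are therefore 0 and 2.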

def yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    """
    Applies Non-max suppression (NMS) to set of boxes, so that each detected object keeps only one box
    
    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes(), the score of each box
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes() that have been scaled to the image size (see later), the box coordinates
    classes -- tensor of shape (None,), output of yolo_filter_boxes(), the index of the highest-scoring class of each box
    max_boxes -- integer, maximum number of predicted boxes you'd like
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None,), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    
    Note: The "None" dimension of the output tensors is at most max_boxes.
    """
    
    max_boxes_tensor = tf.Variable(max_boxes, dtype='int32')     # tensor to be used in tf.image.non_max_suppression()

    ### START CODE HERE
    # Use tf.image.non_max_suppression() to get the list of indices corresponding to boxes you keep
    ##(≈ 1 line)
    nms_indices = tf.image.non_max_suppression(boxes,scores,max_boxes_tensor,iou_threshold)
    
    # Use tf.gather() to select only nms_indices from scores, boxes and classes
    ##(≈ 3 lines)
    scores = tf.gather(scores,nms_indices)
    boxes = tf.gather(boxes,nms_indices)
    classes = tf.gather(classes,nms_indices)
    ### END CODE HERE
    return scores, boxes, classes

The code above implements NMS to keep the final boxes. It uses tf.image.non_max_suppression() and tf.gather().
tf.image.non_max_suppression(): greedily selects a subset of bounding boxes in descending order of scores.

tf.image.non_max_suppression(
    boxes,
    scores,
    max_output_size,
    iou_threshold=0.5,
    score_threshold=float('-inf'),
    name=None
)

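Printing the result in the code:
[Screenshot of the printed output]

In the end we get the data for 10 boxes.

tf.gather(): selects from its first argument the elements at the positions listed in its second argument. Here nms_indices holds the positions of the kept boxes, and tf.gather() extracts the matching scores, boxes and classes.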
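The selection can be illustrated with NumPy's np.take, which along axis 0 behaves like tf.gather with its default axis. The arrays and the index list below are made up for the example.

```python
import numpy as np

scores = np.array([0.9, 0.2, 0.7, 0.5])
boxes = np.array([[0, 0, 1, 1],
                  [1, 1, 2, 2],
                  [2, 2, 3, 3],
                  [3, 3, 4, 4]])
nms_indices = np.array([0, 2])  # hypothetical indices returned by NMS

kept_scores = np.take(scores, nms_indices, axis=0)  # entries 0 and 2
kept_boxes = np.take(boxes, nms_indices, axis=0)    # rows 0 and 2

print(kept_scores)
print(kept_boxes)
```

Because the same index list is applied to scores, boxes and classes, the three outputs stay aligned: entry i of each output describes the same box.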

Wrapping Up the Filtering

Exercise 4 - yolo_eval

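Score thresholding and NMS together make the final boxes more accurate.
boxes = yolo_boxes_to_corners(box_xy, box_wh) converts the YOLO box coordinates (x, y, w, h) into corner coordinates, to fit the input of yolo_filter_boxes().
boxes = scale_boxes(boxes, image_shape) rescales the boxes so that they can be drawn on the original image, whose size may differ from the 608x608 model input.
Below is the yolo_boxes_to_corners(box_xy, box_wh) function: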

def yolo_boxes_to_corners(box_xy, box_wh):
    """Convert YOLO box predictions to bounding box corners."""
    box_mins = box_xy - (box_wh / 2.)
    box_maxes = box_xy + (box_wh / 2.)

    return tf.keras.backend.concatenate([
        box_mins[..., 1:2],  # y_min
        box_mins[..., 0:1],  # x_min
        box_maxes[..., 1:2],  # y_max
        box_maxes[..., 0:1]  # x_max
    ])
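The conversion, followed by the rescaling step, can be traced on a single box in NumPy. The box values are made up, and the scaling line is an assumption about what scale_boxes does (multiplying the relative corners by height, width, height, width); its source is not shown in this post.

```python
import numpy as np

box_xy = np.array([[0.5, 0.5]])  # box centre (x, y), relative to the image
box_wh = np.array([[0.2, 0.4]])  # box size (w, h), relative to the image

box_mins = box_xy - box_wh / 2.0
box_maxes = box_xy + box_wh / 2.0
# Same column order as yolo_boxes_to_corners: y_min, x_min, y_max, x_max.
corners = np.concatenate([box_mins[..., 1:2], box_mins[..., 0:1],
                          box_maxes[..., 1:2], box_maxes[..., 0:1]], axis=-1)

# Assumed behaviour of scale_boxes: multiply by (height, width, height, width).
height, width = 720.0, 1280.0
scaled = corners * np.array([height, width, height, width])

print(corners)  # approximately [[0.3 0.4 0.7 0.6]]
print(scaled)   # approximately [[216. 512. 504. 768.]]
```

Note that the corners come out with y before x, so the four numbers read (top, left, bottom, right) rather than (x1, y1, x2, y2).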

YOLO detection

def yolo_eval(yolo_outputs, image_shape = (720, 1280), max_boxes=10, score_threshold=.6, iou_threshold=.5):
    """
    Converts the output of YOLO encoding (a lot of boxes) to your predicted boxes along with their scores, box coordinates and classes.
    
    Arguments:
    yolo_outputs -- output of the encoding model (for image_shape of (608, 608, 3)), contains 4 tensors:
    
                    box_xy: tensor of shape (None, 19, 19, 5, 2)
                    box_wh: tensor of shape (None, 19, 19, 5, 2)
                    box_confidence: tensor of shape (None, 19, 19, 5, 1)
                    box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,) containing the input shape, in this notebook we use (608., 608.) (has to be float32 dtype)
    max_boxes -- integer, maximum number of predicted boxes you'd like
    score_threshold -- real value, if [ highest class probability score < threshold], then get rid of the corresponding box
    iou_threshold -- real value, "intersection over union" threshold used for NMS filtering
    
    Returns:
    scores -- tensor of shape (None, ), predicted score for each box
    boxes -- tensor of shape (None, 4), predicted box coordinates
    classes -- tensor of shape (None,), predicted class for each box
    """
    
    ### START CODE HERE
    # Retrieve outputs of the YOLO model (≈1 line)
    box_xy, box_wh, box_confidence, box_class_probs = yolo_outputs
    
    # Convert boxes to be ready for filtering functions (convert boxes box_xy and box_wh to corner coordinates) 
    # i.e. from centre point and size to corners
    boxes = yolo_boxes_to_corners(box_xy, box_wh)
    
    # Signatures of the two functions implemented earlier, for reference:
    # yolo_non_max_suppression(scores, boxes, classes, max_boxes = 10, iou_threshold = 0.5):
    # yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold = .6)
    # Use one of the functions you've implemented to perform Score-filtering with a threshold of score_threshold (≈1 line)
    # This step is the score filter, not NMS
    scores, boxes, classes = yolo_filter_boxes(boxes,box_confidence,box_class_probs,score_threshold)
    
    
    # Scale boxes back to original image shape.
    boxes = scale_boxes(boxes, image_shape)
    
    # Use one of the functions you've implemented to perform Non-max suppression with 
    # maximum number of boxes set to max_boxes and a threshold of iou_threshold (≈1 line)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)
    ### END CODE HERE
    
    return scores, boxes, classes

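I was stuck on this code for a long time: at the score-filtering step, where yolo_filter_boxes should be called, I had written the NMS filter instead, and I could not find the mistake until a classmate pointed it out. I wrote the code first and analyzed it afterwards, so many parts were not very clear to me while I was coding.
Load the downloaded pre-trained weights as shown below and use them for prediction: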

from tensorflow.keras.models import load_model
yolo_model = load_model("model_data/", compile=False)

Prediction

def predict(image_file):
    """
    Runs the graph to predict boxes for "image_file". Prints and plots the predictions.
    
    Arguments:
    image_file -- name of an image stored in the "images" folder.
    
    Returns:
    out_scores -- tensor of shape (None, ), scores of the predicted boxes
    out_boxes -- tensor of shape (None, 4), coordinates of the predicted boxes
    out_classes -- tensor of shape (None, ), class index of the predicted boxes
    
    Note: "None" actually represents the number of predicted boxes, it varies between 0 and max_boxes. 
    """

    # Preprocess your image
    image, image_data = preprocess_image("images/" + image_file, model_image_size = (608, 608))
    
    yolo_model_outputs = yolo_model(image_data)
    yolo_outputs = yolo_head(yolo_model_outputs, anchors, len(class_names))
    
    ## Filter the YOLO outputs with yolo_eval to get out_scores, out_boxes, out_classes
    out_scores, out_boxes, out_classes = yolo_eval(yolo_outputs, [image.size[1],  image.size[0]], 10, 0.3, 0.5)

    # Print predictions info
    print('Found {} boxes for {}'.format(len(out_boxes), "images/" + image_file))
    # Generate colors for drawing bounding boxes.
    colors = get_colors_for_classes(len(class_names))
    # Draw bounding boxes on the image file
    #draw_boxes2(image, out_scores, out_boxes, out_classes, class_names, colors, image_shape)
    draw_boxes(image, out_boxes, out_classes, class_names, out_scores)
    # Save the predicted bounding box on the image
    image.save(os.path.join("out", image_file), quality=100)
    # Display the results in the notebook
    output_image = Image.open(os.path.join("out", image_file))
    imshow(output_image)

    return out_scores, out_boxes, out_classes
