YOLO V3 -- Study Notes


DIAJEY


Category: YOLO; Tags: deep learning, computer vision

Article link: https://blog.csdn.net/DIAJEY/article/details/115557236


Reference articles:
YOLO series: YOLO v3 [in-depth analysis] -- 木盏
Object detection: YOLOV3 (with a detailed explanation of the TensorFlow code)

Reference video tutorial:
Object detection basics: the YOLO model series (theory and code reproduction) -- PULSE_

Improvements (compared with V2)

1. Classification backbone: darknet-19 --> darknet-53

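As used in YOLO v3, darknet-53 contains no pooling layers and no fully connected layers. In the forward pass, feature maps are downsampled by convolutions with stride 2 instead. Each such convolution halves the side length of the feature map, and after five of them the output is 1/32 of the input size (2^5 = 32).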
[Figure: darknet-53 network structure]
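The 1/32 factor can be checked with the usual convolution output-size formula, out = floor((n + 2·pad − kernel) / stride) + 1. The sketch below assumes 3×3 kernels with padding 1 and stride 2 for every downsampling layer, as in the Darknet config, and a 416×416 input. The function names are illustrative only.

```python
def conv_out_size(n, kernel=3, stride=2, pad=1):
    """Output side length of a convolution (floor division, as in most frameworks)."""
    return (n + 2 * pad - kernel) // stride + 1

def downsample_chain(n, num_stride2=5):
    """Side lengths after each stride-2, 3x3, pad-1 convolution."""
    sizes = [n]
    for _ in range(num_stride2):
        sizes.append(conv_out_size(sizes[-1]))
    return sizes

sizes = downsample_chain(416)
print(sizes)              # [416, 208, 104, 52, 26, 13]
print(416 // sizes[-1])   # 32
```

The 52, 26 and 13 entries are the feature-map sizes that the three YOLO v3 outputs are built from.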

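Darknet-53 is somewhat slower than darknet-19 but noticeably more accurate. This shows that YOLO v3 aims to raise classification accuracy as far as possible while staying real-time (FPS > 36).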

2. Borrowing the pyramid structure of FPN

As mentioned earlier, V2 still struggles with small objects, so V3 changes its strategy for extracting multi-scale features.
[Figure: FPN-style upsampling and multi-scale feature fusion]
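The upsampling approach in the figure above extracts feature maps at several scales, and features from different scales are then merged with concat. Concat (tensor concatenation) stacks two tensors along the channel axis, so the channel count grows to the sum of both inputs. Add, by contrast, sums two equally shaped tensors element by element, and the shape stays the same. Concatenating keeps both sets of features intact instead of mixing them into a single sum.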
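The following NumPy sketch shows the shape difference between concat and add. It assumes NHWC arrays and nearest-neighbour 2× upsampling. The channel counts (256 upsampled + 512 from the backbone = 768) are chosen to mirror the 26×26 branch of the standard YOLO v3 config. The arrays are random placeholders, not real features.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an NHWC array (repeat rows, then columns)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
deep = rng.random((1, 13, 13, 256), dtype=np.float32)     # coarse map from a deeper layer
shallow = rng.random((1, 26, 26, 512), dtype=np.float32)  # finer map from an earlier layer
other = rng.random((1, 26, 26, 256), dtype=np.float32)    # a map with exactly the same shape

up = upsample2x(deep)                             # (1, 26, 26, 256)
merged = np.concatenate([up, shallow], axis=-1)   # concat: channels stack, (1, 26, 26, 768)
summed = up + other                               # add: needs equal shapes, (1, 26, 26, 256)
print(up.shape, merged.shape, summed.shape)
```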

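This produces three output scales: 13×13, 26×26 and 52×52 for a 416×416 input. Each cell of the 13×13 map covers the largest patch of the input (32×32 pixels) and has the largest receptive field, so that map is used for large objects. Likewise, the 26×26 map detects medium-sized objects and the 52×52 map detects small ones, so objects of all sizes are detected together.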
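Here is a minimal sketch of how grid cells map onto the input, assuming a 416×416 input and strides 32, 16 and 8. It finds the cell that contains a given object center on each scale. `grid_sizes` and `responsible_cell` are hypothetical helpers, not part of the original code.

```python
def grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Grid side length of each YOLOv3 output for a square input."""
    return [input_size // s for s in strides]

def responsible_cell(cx, cy, stride):
    """(row, col) of the grid cell that contains an object center given in pixels."""
    return int(cy // stride), int(cx // stride)

print(grid_sizes())  # [13, 26, 52]
for s in (32, 16, 8):
    # The same center falls into a cell covering s x s pixels on each scale
    print(s, responsible_cell(200, 150, s))
```

The center (200, 150) lands in cell (4, 6) on the 13×13 grid. On the 52×52 grid it lands in cell (18, 25), where each cell covers only 8×8 pixels.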

YOLO V3 Architecture

[Figure: YOLO v3 network structure]
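DBL: convolution + batch normalization + Leaky ReLU. This is the basic, and smallest, building block of YOLO v3. Leaky ReLU is a variant of ReLU that outputs a small multiple of the input for negative inputs (0.1·x in Darknet) instead of 0, so more information is preserved.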

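resn: n is a number giving how many res_units the res_block contains. Starting with v3, YOLO borrows ResNet's residual structure: each res_unit adds its input to the output of its DBLs through a shortcut connection. This structure makes it possible to train a much deeper network.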
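The sketch below shows DBL and res_unit in plain NumPy, at inference time, on NHWC arrays. It is a simplification, not the Darknet implementation. Convolutions are reduced to 1×1 (a matmul over channels); a real res_unit uses a 1×1 conv that halves the channels followed by a 3×3 conv that restores them. Batch norm uses stored statistics, and the Leaky ReLU slope is 0.1. `make_params` is a hypothetical helper that creates random weights.

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: negative inputs are scaled by alpha instead of being zeroed."""
    return np.where(x > 0, x, alpha * x)

def dbl_1x1(x, w, gamma, beta, mean, var, eps=1e-5):
    """Simplified DBL: 1x1 conv (no bias) -> BatchNorm (inference) -> Leaky ReLU."""
    y = x @ w                                           # 1x1 conv == matmul over the channel axis
    y = gamma * (y - mean) / np.sqrt(var + eps) + beta  # batch norm with stored statistics
    return leaky_relu(y)

def res_unit(x, params1, params2):
    """Residual unit: DBL (C -> C/2), DBL (C/2 -> C), then add the input back."""
    return x + dbl_1x1(dbl_1x1(x, *params1), *params2)

def make_params(c_in, c_out, rng):
    """Hypothetical random weights: (w, gamma, beta, mean, var)."""
    return (rng.standard_normal((c_in, c_out)) * 0.1, np.ones(c_out), np.zeros(c_out),
            np.zeros(c_out), np.ones(c_out))

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 52, 52, 64))
out = res_unit(x, make_params(64, 32, rng), make_params(32, 64, rng))
print(out.shape)  # (1, 52, 52, 64): the shortcut needs input and output shapes to match
```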

Output

[Figure: the three YOLO v3 outputs y1, y2, y3]
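The figure shows that v3 outputs feature maps at three scales: y1, y2 and y3. With this FCN-like (fully convolutional) structure, objects of different sizes are detected at different scales. The finer a map is (the more grid cells it has), the smaller the objects it detects.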

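· The depth is 255 because, on each scale, every grid cell predicts 3 boxes. Each box has five parameters (x, y, w, h, confidence) plus probabilities for the 80 COCO classes: 3 × (5 + 80) = 255.

So in v3 each grid cell predicts three boxes and can recognize 80 classes.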
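A short NumPy sketch of how the 255 channels split into three 85-value box records. It assumes the channels are grouped anchor by anchor, which is the same layout `_detection_layer` below relies on when it reshapes to `[batch, num_anchors * dim, 5 + num_classes]`.

```python
import numpy as np

num_anchors, num_classes = 3, 80
depth = num_anchors * (5 + num_classes)
print(depth)  # 255

# A raw 13x13 output for a batch of 1, viewed as one 85-value record per (cell, anchor)
raw = np.zeros((1, 13, 13, depth), dtype=np.float32)
per_box = raw.reshape(1, 13, 13, num_anchors, 5 + num_classes)
box_params = per_box[..., :4]    # tx, ty, tw, th
objectness = per_box[..., 4]     # one confidence score per box
class_logits = per_box[..., 5:]  # 80 class scores per box
print(per_box.shape, objectness.shape, class_logits.shape)
```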

BBOX Prediction

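Starting with v2, boxes are predicted with the anchor mechanism. Each grid cell has a set of anchor boxes (priors), and the network predicts each box's position relative to its anchor and cell rather than as absolute coordinates.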
[Figure: bounding box prediction relative to anchor priors]

YOLO v2 predicts (tx, ty, tw, th, to) for each box and then converts them into actual positions with the formulas in the figure.
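For reference, here are the decoding formulas from the YOLO v2 paper, which v3 reuses. (c_x, c_y) is the offset of the cell's top-left corner in grid units, (p_w, p_h) is the anchor (prior) size, and σ is the sigmoid function:

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y,\\
b_w &= p_w \, e^{t_w}, \qquad\;\; b_h = p_h \, e^{t_h},\\
\Pr(\text{object}) \cdot \operatorname{IoU}(b, \text{object}) &= \sigma(t_o)
\end{aligned}
```

For example, t_x = 0 puts the center at c_x + 0.5, the middle of the cell. `_detection_layer` in the code below evaluates these formulas on the grid and then multiplies by the stride to get input-pixel coordinates.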

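V3's improvement is that the objectness score of each box is predicted with logistic regression (a sigmoid on to). This score estimates how likely the region covered by the anchor is to contain an object. During training, each ground-truth object is assigned to the single anchor that overlaps it best. Predictions that are not the best match but still overlap a ground truth by more than a threshold (0.5) are ignored and receive no objectness loss. At inference, boxes with a low objectness score are discarded before NMS, so only the most likely anchors reach the final prediction step, which removes unnecessary anchors and saves computation.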
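The "single best anchor per ground truth" rule can be sketched in NumPy by comparing only widths and heights, with all boxes sharing one center. The sketch assumes the nine COCO anchors listed in the code below. `wh_iou` and `best_anchor` are hypothetical helper names.

```python
import numpy as np

# The nine COCO anchors used in this post, (w, h) in pixels, from small to large
ANCHORS = np.array([(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                    (59, 119), (116, 90), (156, 198), (373, 326)], dtype=np.float64)

def wh_iou(box_wh, anchors):
    """IoU between one (w, h) and each anchor, with all boxes aligned at the same center."""
    inter = np.minimum(box_wh[0], anchors[:, 0]) * np.minimum(box_wh[1], anchors[:, 1])
    union = box_wh[0] * box_wh[1] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def best_anchor(box_wh, anchors=ANCHORS):
    """Index of the single anchor made responsible for a ground-truth box."""
    return int(np.argmax(wh_iou(np.asarray(box_wh, dtype=np.float64), anchors)))

print(best_anchor((100, 80)))  # 6, i.e. the (116, 90) anchor used by the 13x13 output
```

A 100×80 object is matched to anchor 6 (IoU ≈ 0.77). Anchors 6, 7 and 8 belong to the 13×13 output, so that scale is responsible for this object.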

Code Walkthrough

First, a quick recap of the overall V3 network structure:
[Figure: overall YOLO v3 network structure]
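· yolo_block: a module made of several DBLs. It produces two outputs, inputs and route. route is upsampled and combined with the features of the next scale, while inputs goes to the detection layer, which computes the bounding boxes.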

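· detect_layer: the detection layer. The anchor sizes are obtained beforehand by running k-means clustering on the ground-truth boxes of the dataset. In YOLO v3 this gives 9 common sizes, 3 per scale (v2 used 5). The layer then combines these anchors with the network output through the formulas above to get the actual positions and sizes of the boxes.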
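The YOLO v2 paper clusters box sizes with the distance d = 1 − IoU instead of Euclidean distance, so large boxes do not dominate. Below is a minimal NumPy k-means sketch under that distance. It assumes `boxes` is an (N, 2) array of ground-truth (w, h) values in pixels. The synthetic data and the function names are hypothetical; real anchors come from the dataset's labels.

```python
import numpy as np

def wh_iou_matrix(boxes, centroids):
    """IoU between every (w, h) box and every centroid, all aligned at a common center."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """k-means on (w, h) with distance d = 1 - IoU; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Smallest distance 1 - IoU == largest IoU
        assign = np.argmax(wh_iou_matrix(boxes, centroids), axis=1)
        # Move each centroid to the mean size of its cluster (keep it if the cluster is empty)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]

# Hypothetical ground-truth sizes (w, h) in pixels, scattered around three typical sizes
rng = np.random.default_rng(1)
boxes = np.abs(np.concatenate([rng.normal(m, 0.1 * m, size=(200, 2)) for m in (15.0, 60.0, 200.0)]))
anchors = kmeans_anchors(boxes, k=3)
print(np.round(anchors, 1))
```

With k = 9 on COCO labels, this kind of procedure produces anchors like `_ANCHORS` below. The sorted result is split three per scale, from small to large.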

import tensorflow as tf  # TF1.x-style graph API
import tf_slim as slim   # on TF1.x: from tensorflow.contrib import slim

# Anchor boxes (w, h in pixels), obtained by clustering the COCO dataset
_ANCHORS = [(10., 13.), (16., 30.), (33., 23.), (30., 61.), (62., 45.), (59., 119.), (116., 90.), (156., 198.), (373., 326.)]

def _detection_layer(inputs, num_classes, anchors, img_size, data_format):
    """Decode the feature map `inputs` into boxes using the given anchors.

    Assumes NHWC layout (`data_format` is accepted but not used here).
    img_size: (H, W) of the network input.
    Returns [batch, H*W*num_anchors, 5 + num_classes] holding
    (center_x, center_y, w, h, confidence, class scores) in input pixels.
    """
    num_anchors = len(anchors)  # number of anchors at this scale
    # 1x1 conv producing num_anchors * (5 + num_classes) channels on the H x W grid
    predictions = slim.conv2d(inputs, num_anchors * (5 + num_classes), 1, stride=1,
                              normalizer_fn=None, activation_fn=None,
                              biases_initializer=tf.zeros_initializer())
    # [batch, H, W, C], C = num_anchors * (5 + num_classes); for the three scales:
    # [1, 13, 13, 3*(5+c)], [1, 26, 26, 3*(5+c)], [1, 52, 52, 3*(5+c)]
    shape = predictions.get_shape().as_list()
    grid_size = shape[1:3]                 # (H, W) taken from NHWC
    dim = grid_size[0] * grid_size[1]      # number of grid cells, H * W
    bbox_attrs = 5 + num_classes

    # Flatten H and W so that each row holds one (cell, anchor) pair:
    # [batch, num_anchors * dim, bbox_attrs]; the attributes are then split as 2, 2, 1, num_classes
    predictions = tf.reshape(predictions, [-1, num_anchors * dim, bbox_attrs])
    stride = (img_size[1] // grid_size[1], img_size[0] // grid_size[0])  # (stride_x, stride_y), e.g. 416 / 13 = 32
    anchors = [(a[0] / stride[0], a[1] / stride[1]) for a in anchors]    # anchors in grid units
    box_centers, box_sizes, confidence, classes = tf.split(predictions, [2, 2, 1, num_classes], axis=-1)

    # Decode each attribute; centers and sizes are mapped back to input-image pixels
    box_centers = tf.nn.sigmoid(box_centers)  # offset inside the cell, in (0, 1)
    confidence = tf.nn.sigmoid(confidence)    # objectness score

    grid_x = tf.range(grid_size[1], dtype=tf.float32)  # column indices 0 .. W-1
    grid_y = tf.range(grid_size[0], dtype=tf.float32)  # row indices 0 .. H-1
    a, b = tf.meshgrid(grid_x, grid_y)  # both [H, W]: a[i, j] = j (x index), b[i, j] = i (y index)

    x_offset = tf.reshape(a, (-1, 1))
    y_offset = tf.reshape(b, (-1, 1))
    x_y_offset = tf.concat([x_offset, y_offset], axis=-1)  # [dim, 2]
    # Repeat each cell's (x, y) once per anchor -> [1, dim * num_anchors, 2]
    x_y_offset = tf.reshape(tf.tile(x_y_offset, [1, num_anchors]), [1, -1, 2])

    # sigmoid offset + cell index = center in grid units (0.1 + 4 = 4.1: 0.1 into cell 4)
    box_centers = box_centers + x_y_offset
    box_centers = box_centers * stride  # grid units -> input pixels

    # b_wh = p_wh * exp(t_wh); anchors tiled to match the (cell, anchor) row order
    anchors = tf.tile(anchors, [dim, 1])
    box_sizes = tf.exp(box_sizes) * anchors  # width and height in grid units
    box_sizes = box_sizes * stride           # grid units -> input pixels

    detections = tf.concat([box_centers, box_sizes, confidence], axis=-1)
    classes = tf.nn.sigmoid(classes)  # independent per-class probabilities
    predictions = tf.concat([detections, classes], axis=-1)  # regroup the decoded attributes
    return predictions

Loading pretrained weights

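import numpy as np
import tensorflow as tf  # TF1.x API (tf.compat.v1.assign under TF2)

# Load Darknet weights (e.g. yolov3.weights) into the model's variables
def load_weights(var_list, weights_file):
    """Return assign ops that copy the Darknet weights into `var_list`.

    `var_list` must be in network order: each conv kernel is followed either by
    its BatchNorm variables or by its bias.
    """
    with open(weights_file, "rb") as fp:
        # Header: major, minor, revision (int32) + images seen (int64) = 5 int32 words
        _ = np.fromfile(fp, dtype=np.int32, count=5)
        weights = np.fromfile(fp, dtype=np.float32)

    ptr = 0
    i = 0
    assign_ops = []
    while i < len(var_list) - 1:
        var1 = var_list[i]
        var2 = var_list[i + 1]
        # A conv kernel starts a new layer
        if 'Conv' in var1.name.split('/')[-2]:
            # If BatchNorm parameters follow, load them first
            if 'BatchNorm' in var2.name.split('/')[-2]:
                gamma, beta, mean, var = var_list[i + 1:i + 5]
                # Darknet stores beta (bias), gamma (scale), rolling mean, rolling variance
                batch_norm_vars = [beta, gamma, mean, var]
                for bn_var in batch_norm_vars:
                    shape = bn_var.shape.as_list()
                    num_params = np.prod(shape)
                    var_weights = weights[ptr:ptr + num_params].reshape(shape)
                    ptr += num_params
                    assign_ops.append(tf.assign(bn_var, var_weights, validate_shape=True))
                i += 4  # four BatchNorm variables consumed
            elif 'Conv' in var2.name.split('/')[-2]:
                # No BatchNorm: the conv layer has a plain bias
                bias = var2
                bias_shape = bias.shape.as_list()
                bias_params = np.prod(bias_shape)
                bias_weights = weights[ptr:ptr + bias_params].reshape(bias_shape)
                ptr += bias_params
                assign_ops.append(tf.assign(bias, bias_weights, validate_shape=True))
                i += 1  # bias consumed

            # Conv kernel: Darknet stores [out, in, h, w], TF expects [h, w, in, out]
            shape = var1.shape.as_list()
            num_params = np.prod(shape)
            var_weights = weights[ptr:ptr + num_params].reshape((shape[3], shape[2], shape[0], shape[1]))
            var_weights = np.transpose(var_weights, (2, 3, 1, 0))
            ptr += num_params
            assign_ops.append(tf.assign(var1, var_weights, validate_shape=True))
            i += 1
        else:
            i += 1  # skip variables that do not start a conv layer

    return assign_ops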
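Two details of `load_weights` can be checked in isolation with NumPy: the 20-byte header (skipped as 5 int32 words) and the reordering of conv kernels from Darknet's [out, in, h, w] to TensorFlow's [h, w, in, out]. The sketch below builds a hypothetical in-memory blob (version 0.2.0, one 3×3 kernel with 4 inputs and 2 outputs) instead of reading a real file. The helper names are illustrative.

```python
import numpy as np

def split_darknet_blob(blob):
    """Split raw .weights bytes into the 5-word header and the float32 payload."""
    header = np.frombuffer(blob[:20], dtype=np.int32)  # major, minor, revision, seen (int64 = 2 words)
    weights = np.frombuffer(blob[20:], dtype=np.float32)
    return header, weights

def darknet_to_tf_kernel(flat, tf_shape):
    """Reorder a flat Darknet kernel [out, in, h, w] into TF's [h, w, in, out]."""
    kh, kw, c_in, c_out = tf_shape
    return np.transpose(flat.reshape((c_out, c_in, kh, kw)), (2, 3, 1, 0))

# Hypothetical blob: header (0, 2, 0, seen=0) followed by 2*4*3*3 kernel values 0..71
blob = (np.array([0, 2, 0], dtype=np.int32).tobytes() + np.array([0], dtype=np.int64).tobytes()
        + np.arange(2 * 4 * 3 * 3, dtype=np.float32).tobytes())
header, weights = split_darknet_blob(blob)
kernel = darknet_to_tf_kernel(weights, (3, 3, 4, 2))
print(header.size, kernel.shape)  # 5 (3, 3, 4, 2)
```

For example, `kernel[1, 2, 3, 1]` is Darknet's element (out=1, in=3, h=1, w=2), at flat index ((1·4 + 3)·3 + 1)·3 + 2 = 68.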

Removing duplicate boxes with NMS

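V3 outputs feature maps at three scales and assigns three anchors to each scale, so for a 416×416 input it produces (13x13x3) + (26x26x3) + (52x52x3) = 10647 predictions. NMS is therefore needed to remove the duplicates.
The principle is the same as in V1: take the box with the highest confidence, compute its overlap (IoU) with each remaining box, discard the boxes whose IoU exceeds the threshold, and repeat with the highest-scoring box that remains. In the code below this is done separately for each class.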
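The box count follows directly from the grid sizes. A small sketch, assuming a square input, strides 32/16/8 and 3 anchors per scale, also shows how the count grows with the input size:

```python
def num_predictions(input_size=416, strides=(32, 16, 8), anchors_per_scale=3):
    """Total number of boxes YOLOv3 outputs before NMS, for a square input."""
    return sum((input_size // s) ** 2 * anchors_per_scale for s in strides)

print(num_predictions(416))  # 10647 = 13*13*3 + 26*26*3 + 52*52*3
print(num_predictions(608))  # 22743 = 19*19*3 + 38*38*3 + 76*76*3
```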

import numpy as np

# Remove duplicate detections with NMS
def non_max_suppression(predictions_with_boxes, confidence_threshold, iou_threshold=0.4):
    """predictions_with_boxes: [batch, num_boxes, 5 + num_classes] array whose first four
    attributes are box corners (x0, y0, x1, y1), followed by objectness and class scores.
    Returns {class_id: [(box, score), ...]}; results from all images in the batch go into one dict."""
    result = {}
    for image_pred in predictions_with_boxes:
        # Keep only boxes whose objectness exceeds the threshold
        image_pred = image_pred[image_pred[:, 4] > confidence_threshold]

        bbox_attrs = image_pred[:, :5]                   # x0, y0, x1, y1, objectness
        classes = np.argmax(image_pred[:, 5:], axis=-1)  # most likely class of each box

        for cls in np.unique(classes):
            # Boxes of this class, sorted by score from high to low
            cls_boxes = bbox_attrs[classes == cls]
            cls_boxes = cls_boxes[cls_boxes[:, -1].argsort()[::-1]]
            cls_scores = cls_boxes[:, -1]
            cls_boxes = cls_boxes[:, :-1]

            while len(cls_boxes) > 0:
                # Keep the highest-scoring box ...
                box, score = cls_boxes[0], cls_scores[0]
                result.setdefault(cls, []).append((box, score))
                cls_boxes, cls_scores = cls_boxes[1:], cls_scores[1:]
                # ... and drop the remaining boxes that overlap it too much
                ious = np.array([_iou(box, x) for x in cls_boxes])
                keep = ious < iou_threshold
                cls_boxes, cls_scores = cls_boxes[keep], cls_scores[keep]

    return result

# Compute the IoU of two boxes given as corners [x0, y0, x1, y1] (top-left, bottom-right)
def _iou(box1, box2):

    b1_x0, b1_y0, b1_x1, b1_y1 = box1
    b2_x0, b2_y0, b2_x1, b2_y1 = box2

    # Intersection rectangle
    int_x0 = max(b1_x0, b2_x0)
    int_y0 = max(b1_y0, b2_y0)
    int_x1 = min(b1_x1, b2_x1)
    int_y1 = min(b1_y1, b2_y1)

    # Clamp at 0 so that non-overlapping boxes get zero intersection
    int_area = max(int_x1 - int_x0, 0) * max(int_y1 - int_y0, 0)

    b1_area = (b1_x1 - b1_x0) * (b1_y1 - b1_y0)
    b2_area = (b2_x1 - b2_x0) * (b2_y1 - b2_y0)

    # Add 1e-05 to the denominator to avoid division by zero
    iou = int_area / (b1_area + b2_area - int_area + 1e-05)  # intersection over union
    return iou
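`_detection_layer` returns boxes as (center_x, center_y, w, h), while `_iou` and `non_max_suppression` expect corners (x0, y0, x1, y1). The detections therefore need a conversion step before NMS. A minimal NumPy sketch follows; `center_to_corners` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def center_to_corners(predictions):
    """Convert [..., (cx, cy, w, h, rest...)] into [..., (x0, y0, x1, y1, rest...)].
    Attributes after the first four (objectness, class scores) are passed through."""
    cx, cy, w, h = (predictions[..., i] for i in range(4))
    corners = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=-1)
    return np.concatenate([corners, predictions[..., 4:]], axis=-1)

# One image, one box centered at (50, 50), 20 wide and 10 tall, objectness 0.9, two classes
pred = np.array([[[50.0, 50.0, 20.0, 10.0, 0.9, 0.2, 0.8]]])
print(center_to_corners(pred)[0, 0, :4])  # [40. 45. 60. 55.]
```

The converted array can then be passed on, e.g. `non_max_suppression(center_to_corners(detections), 0.5)`.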

Drawing the detections on the image

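import numpy as np
from PIL import ImageDraw

# Draw the detections on the image
def draw_boxes(boxes, img, cls_names, detection_size):
    """boxes: output of non_max_suppression; img: PIL image;
    detection_size: (w, h) of the network input."""
    draw = ImageDraw.Draw(img)

    for cls, bboxs in boxes.items():
        color = tuple(int(c) for c in np.random.randint(0, 256, 3))  # one random color per class
        for box, score in bboxs:
            # Map the box from network-input coordinates back to the original image size
            box = convert_to_original_size(box, np.array(detection_size), np.array(img.size))
            draw.rectangle(box, outline=color)
            label = '{} {:.2f}%'.format(cls_names[cls], score * 100)
            draw.text(box[:2], label, fill=color)
            print(label, box[:2])

def convert_to_original_size(box, size, original_size):
    """Scale corner coordinates by original_size / size (both (w, h)).
    Assumes the image was resized directly to `size`, without letterbox padding."""
    ratio = original_size / size
    box = box.reshape(2, 2) * ratio
    return list(box.reshape(-1))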

Loss function

import tensorflow as tf

def _create_mesh_xy(batch_size, grid_h, grid_w, n_box):
    # Grid of cell indices: mesh_xy[b, i, j, a] = (j, i), shape [batch, H, W, n_box, 2]
    mesh_x = tf.cast(tf.reshape(tf.tile(tf.range(grid_w), [grid_h]), (1, grid_h, grid_w, 1, 1)), tf.float32)
    mesh_y = tf.transpose(mesh_x, (0, 2, 1, 3, 4))  # assumes a square grid (grid_h == grid_w)
    mesh_xy = tf.tile(tf.concat([mesh_x, mesh_y], -1), [batch_size, 1, 1, n_box, 1])
    return mesh_xy

def adjust_pred_tensor(y_pred):
    # Add the cell offsets to the xy predictions, apply sigmoid to the objectness, then regroup
    grid_offset = _create_mesh_xy(*y_pred.shape[:4])
    pred_xy = grid_offset + tf.sigmoid(y_pred[..., :2])  # center on this scale's grid: sigmoid(t_xy) + c_xy
    pred_wh = y_pred[..., 2:4]                           # raw size predictions t_wh
    pred_conf = tf.sigmoid(y_pred[..., 4])               # objectness score (confidence)
    pred_classes = y_pred[..., 5:]                       # raw class logits
    # Regroup
    preds = tf.concat([pred_xy, pred_wh, tf.expand_dims(pred_conf, axis=-1), pred_classes], axis=-1)
    return preds

# Anchor grid: every cell holds the n_box anchors of this scale, shape [batch, H, W, n_box, 2]
def _create_mesh_anchor(anchors, batch_size, grid_h, grid_w, n_box):
    # anchors: flat list [w0, h0, w1, h1, w2, h2] for this scale
    mesh_anchor = tf.tile(anchors, [batch_size * grid_h * grid_w])
    mesh_anchor = tf.reshape(mesh_anchor, [batch_size, grid_h, grid_w, n_box, 2])  # 2 values (w, h) per anchor
    mesh_anchor = tf.cast(mesh_anchor, tf.float32)
    return mesh_anchor

def conf_delta_tensor(y_true, y_pred, anchors, ignore_thresh):
    # anchors must be in the same units as the xy values (grid units) for the IoU to be meaningful
    pred_box_xy, pred_box_wh, pred_box_conf = y_pred[..., :2], y_pred[..., 2:4], y_pred[..., 4]
    # Anchor grid; y_pred.shape is e.g. (2, 13, 13, 3, 15)
    anchor_grid = _create_mesh_anchor(anchors, *y_pred.shape[:4])
    true_wh = y_true[..., 2:4]
    true_wh = anchor_grid * tf.exp(true_wh)
    true_wh = true_wh * tf.expand_dims(y_true[..., 4], 4)  # decoded (w, h); zero where there is no object
    anchors_ = tf.constant(anchors, dtype='float', shape=[1, 1, 1, y_pred.shape[3], 2])  # y_pred.shape[3]: anchors per cell
    true_xy = y_true[..., 0:2]  # box center
    true_wh_half = true_wh / 2.
    true_mins = true_xy - true_wh_half   # top-left corner
    true_maxes = true_xy + true_wh_half  # bottom-right corner

    pred_xy = pred_box_xy
    pred_wh = tf.exp(pred_box_wh) * anchors_

    pred_wh_half = pred_wh / 2.
    pred_mins = pred_xy - pred_wh_half   # top-left corner
    pred_maxes = pred_xy + pred_wh_half  # bottom-right corner

    intersect_mins = tf.maximum(pred_mins, true_mins)
    intersect_maxes = tf.minimum(pred_maxes, true_maxes)

    # Intersection area
    intersect_wh = tf.maximum(intersect_maxes - intersect_mins, 0.)
    intersect_areas = intersect_wh[..., 0] * intersect_wh[..., 1]

    true_areas = true_wh[..., 0] * true_wh[..., 1]
    pred_areas = pred_wh[..., 0] * pred_wh[..., 1]
    # Union area
    union_areas = pred_areas + true_areas - intersect_areas
    # IoU with the ground truth stored in the same (cell, anchor) slot; a simplification of
    # taking the best IoU over all ground-truth boxes in the image
    best_ious = tf.truediv(intersect_areas, union_areas)
    # Predictions whose IoU is below the threshold contribute a no-object (negative) loss
    conf_delta = pred_box_conf * tf.cast(best_ious < ignore_thresh, tf.float32)
    return conf_delta

def wh_scale_tensor(true_box_wh, anchors, image_size):
    # image_size is [h, w]; reverse it to [w, h] to match the (w, h) anchors
    image_size_ = tf.reshape(tf.cast(image_size, tf.float32)[::-1], [1, 1, 1, 1, 2])
    anchors_ = tf.constant(anchors, dtype='float', shape=[1, 1, 1, 3, 2])

    # Box width and height as fractions of the image size
    wh_scale = tf.exp(true_box_wh) * anchors_ / image_size_
    # Weight = 2 - (box area / image area): small boxes get a weight near 2, large ones near 1
    wh_scale = tf.expand_dims(2 - wh_scale[..., 0] * wh_scale[..., 1], axis=4)
    return wh_scale

# Box loss: (pred - true) multiplied by the scale weights, then squared and summed
def loss_coord_tensor(object_mask, pred_box, true_box, wh_scale, xywh_scale):
    box_delta = object_mask * (pred_box - true_box) * wh_scale * xywh_scale
    loss_box = tf.reduce_sum(tf.square(box_delta), list(range(1, 5)))  # sum over H, W, anchor, (x, y, w, h); axis 0 is the batch
    return loss_box

def loss_conf_tensor(object_mask, pred_box_conf, true_box_conf, obj_scale, noobj_scale, conf_delta):
    object_mask_ = tf.squeeze(object_mask, axis=-1)
    # Object slots: (pred - true) weighted by obj_scale; other slots: conf_delta weighted by noobj_scale
    conf_delta = object_mask_ * (pred_box_conf - true_box_conf) * obj_scale + (1 - object_mask_) * conf_delta * noobj_scale
    loss_conf = tf.reduce_sum(tf.square(conf_delta), list(range(1, 4)))  # sum over H, W, anchor; axis 0 is the batch
    return loss_conf


def loss_class_tensor(object_mask, pred_box_class, true_box_class, class_scale):
    # Binary cross-entropy per class (independent logistic classifiers, as in the YOLOv3 paper
    # and matching the sigmoid in _detection_layer); labels must be float like the logits
    true_box_class_ = tf.cast(true_box_class, tf.float32)
    class_bce = tf.nn.sigmoid_cross_entropy_with_logits(labels=true_box_class_, logits=pred_box_class)
    class_delta = object_mask * tf.reduce_sum(class_bce, axis=-1, keepdims=True) * class_scale

    loss_class = tf.reduce_sum(class_delta, list(range(1, 5)))
    return loss_class

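# Loss hyperparameters
ignore_thresh = 0.5  # IoU below which a no-object prediction is penalized
grid_scale = 1
obj_scale = 5
noobj_scale = 1
xywh_scale = 1
class_scale = 1

def lossCalculator(y_true, y_pred, anchors, image_size):  # image_size: Python list [h, w]
    """Loss for one output scale. As implied by the comparisons below, y_true is
    [batch, H, W, 3, 5 + num_classes] with (x, y) in grid units, (t_w, t_h) = log(size / anchor),
    objectness and one-hot classes; anchors are a flat list [w0, h0, ...] in pixels."""
    y_pred = tf.reshape(y_pred, y_true.shape)  # e.g. (2, 13, 13, 3, 15)

    object_mask = tf.expand_dims(y_true[..., 4], 4)  # e.g. (2, 13, 13, 3, 1)
    preds = adjust_pred_tensor(y_pred)  # decode xy and objectness, then regroup

    # conf_delta_tensor compares centers in grid units, so express the pixel anchors in grid units
    grid_h, grid_w = int(y_true.shape[1]), int(y_true.shape[2])
    strides = [image_size[1] / grid_w, image_size[0] / grid_h]  # (stride_x, stride_y)
    anchors_grid = [v / strides[k % 2] for k, v in enumerate(anchors)]
    conf_delta = conf_delta_tensor(y_true, preds, anchors_grid, ignore_thresh)
    wh_scale = wh_scale_tensor(y_true[..., 2:4], anchors, image_size)

    loss_box = loss_coord_tensor(object_mask, preds[..., :4], y_true[..., :4], wh_scale, xywh_scale)
    loss_conf = loss_conf_tensor(object_mask, preds[..., 4], y_true[..., 4], obj_scale, noobj_scale, conf_delta)
    loss_class = loss_class_tensor(object_mask, preds[..., 5:], y_true[..., 5:], class_scale)
    loss = loss_box + loss_conf + loss_class
    return loss * grid_scale

def loss_fn(list_y_trues, list_y_preds, anchors, image_size):
    # anchors: flat list of 18 numbers (9 anchors x (w, h)), ordered from small to large.
    # Outputs are ordered 13x13, 26x26, 52x52, so the largest anchors go to the coarsest grid.
    inputanchors = [anchors[12:], anchors[6:12], anchors[:6]]
    losses = [lossCalculator(list_y_trues[i], list_y_preds[i], inputanchors[i], image_size)
              for i in range(len(list_y_trues))]
    return tf.sqrt(tf.reduce_sum(losses))  # sum the losses of the three scales, then take the square root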
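The loss code never shows how `y_true` is built. The layout can be inferred from the comparisons in `lossCalculator`: (x, y) in grid units, (t_w, t_h) = log(size / anchor), objectness 1 and a one-hot class. The NumPy sketch below writes one ground-truth box into such an array under those inferred assumptions. `encode_box` and its parameters are hypothetical, and the batch dimension is left out (add it with `y[None]`).

```python
import numpy as np

def encode_box(cx, cy, w, h, class_id, anchors, anchor_idx, grid, input_size=416, num_classes=80):
    """Write one ground-truth box into a y_true array of shape [grid, grid, 3, 5 + num_classes].
    cx, cy, w, h are in input pixels; anchors are this scale's (w, h) in pixels."""
    stride = input_size / grid
    y_true = np.zeros((grid, grid, len(anchors), 5 + num_classes), dtype=np.float32)
    gx, gy = cx / stride, cy / stride  # center in grid units
    col, row = int(gx), int(gy)        # the responsible cell
    aw, ah = anchors[anchor_idx]
    y_true[row, col, anchor_idx, 0:2] = (gx, gy)
    y_true[row, col, anchor_idx, 2:4] = (np.log(w / aw), np.log(h / ah))
    y_true[row, col, anchor_idx, 4] = 1.0               # objectness target
    y_true[row, col, anchor_idx, 5 + class_id] = 1.0    # one-hot class target
    return y_true

anchors_13 = [(116, 90), (156, 198), (373, 326)]  # anchors of the 13x13 output
y = encode_box(200, 150, 100, 80, class_id=0, anchors=anchors_13, anchor_idx=0, grid=13)
print(np.argwhere(y[..., 4] == 1))  # [[4 6 0]]: row 4, column 6, anchor 0
```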

Summary

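Compared with YOLO V2, YOLO V3 is slightly slower but more accurate: it keeps real-time speed while clearly improving detection accuracy. Its structure is also clear, so it is quick to get started with.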
