Understanding YOLO in Depth, Part 1

Latest recommended article published on 2021-11-09 22:11:03

Always066

Category column: YOLO Code Walkthrough (YOLO代码通读)

Article link: https://blog.csdn.net/weixin_43869493/article/details/106367017

Understanding YOLO Loss in Depth, Part 1

- 1.target
- 2.compose target

This article is a code read-through and explanation of the YOLO loss.
The reference code is a PyTorch implementation of YOLO v3; see the linked repository.

1.target

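First, we need to decide what kind of target the dataset returns on each iteration.
This raises a tricky problem: every image contains a different number of objects, so how can the targets be stored with a uniform shape and fed to the network for the forward pass?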

    def __getitem__(self, index):

        # ---------
        #  Image
        # ---------

        # Get the path of one training image
        img_path = self.img_files[index % len(self.img_files)].rstrip()

        # Extract image as PyTorch tensor
        img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))

        # Handle images with less than three channels:
        # expand a single-channel (grayscale) image to three channels
        if len(img.shape) != 3:
            img = img.unsqueeze(0)
            img = img.expand((3, *img.shape[1:]))

        _, h, w = img.shape
        h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)
        # Pad to square resolution
        img, pad = pad_to_square(img, 0)
        _, padded_h, padded_w = img.shape

        # ---------
        #  Label
        # ---------

        label_path = self.label_files[index % len(self.img_files)].rstrip()

        targets = None
        if os.path.exists(label_path):
            boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
            # Extract coordinates for unpadded + unscaled image
            x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
            y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
            x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
            y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
            # Adjust for added padding
            x1 += pad[0]
            y1 += pad[2]
            x2 += pad[1]
            y2 += pad[3]
            # Returns (x, y, w, h)
            boxes[:, 1] = ((x1 + x2) / 2) / padded_w
            boxes[:, 2] = ((y1 + y2) / 2) / padded_h
            boxes[:, 3] *= w_factor / padded_w
            boxes[:, 4] *= h_factor / padded_h

            targets = torch.zeros((len(boxes), 6))
            targets[:, 1:] = boxes

        # Apply augmentations
        if self.augment:
            if np.random.random() < 0.5:
                img, targets = horisontal_flip(img, targets)

        # Shape summary: img is a single image padded to a square
        #                targets is [num_boxes, 6] (column 0 is reserved for the sample index)
        return img_path, img, targets

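Reading through the code above, we can see the author's approach: all boxes are concatenated together, and each box row gets one extra slot at index 0 that stores which image the box belongs to. A single dataset item knows nothing about batches, however, so that part of the processing is done in the `collate_fn` used by the DataLoader.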

    def collate_fn(self, batch):
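        # Resize the images to a common size and tag each box with its sample index
        paths, imgs, targets = list(zip(*batch))
        # Add sample index to targets: column 0 was left empty in __getitem__ and
        # now stores the position of the image in the batch. This runs before the
        # empty placeholders are dropped so the index stays aligned with imgs.
        for i, boxes in enumerate(targets):
            if boxes is not None:
                boxes[:, 0] = i
        # Remove empty placeholder targets (images without a label file)
        targets = [boxes for boxes in targets if boxes is not None]
        # Concatenate all boxes of the batch into one [num_boxes, 6] tensor;
        # the first value of each row tells which image the box belongs to
        targets = torch.cat(targets, 0)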
        # Selects new image size every tenth batch
        if self.multiscale and self.batch_count % 10 == 0:  # pick a new image size within the allowed range
            self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
        # Resize images to input shape
        imgs = torch.stack([resize(img, self.img_size) for img in imgs])
        self.batch_count += 1
        return paths, imgs, targets
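
To make the shapes concrete, here is a minimal sketch of the concatenation step on toy data. The helper name `collate_targets` and the sample values are made up for illustration; the sketch assumes that images without labels arrive as `None` and that each box should carry the position of its image in the batch.

```python
import torch

def collate_targets(per_image_targets):
    """Concatenate per-image [n_i, 6] targets into one [sum(n_i), 6] tensor.

    Column 0 receives the position of the image inside the batch.
    Images without labels are passed as None and contribute no rows.
    (Hypothetical helper, not part of the referenced repository.)
    """
    tagged = []
    for i, boxes in enumerate(per_image_targets):
        if boxes is None:
            continue
        boxes = boxes.clone()   # do not modify the caller's tensor
        boxes[:, 0] = i         # tag every box with its image index
        tagged.append(boxes)
    if not tagged:
        return torch.zeros((0, 6))
    return torch.cat(tagged, 0)

# Three images with 2, 0 and 1 boxes; columns: (sample index, class, x, y, w, h)
img0 = torch.tensor([[0, 1, 0.5, 0.5, 0.2, 0.2],
                     [0, 3, 0.3, 0.7, 0.1, 0.4]])
img1 = None
img2 = torch.tensor([[0, 2, 0.8, 0.2, 0.3, 0.3]])

batch_targets = collate_targets([img0, img1, img2])
print(batch_targets.shape)   # torch.Size([3, 6])
print(batch_targets[:, 0])   # tensor([0., 0., 2.])
```

The number of rows varies from batch to batch, but every row has the same width, which is what lets a variable number of objects travel through a single tensor.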

2.compose target

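According to many CSDN blog posts, YOLO is an anchor-based object detection algorithm, so the target must include an anchor dimension, along with x, y, w, h, conf and class_conf for each detection box. YOLO can also be seen as an upgrade of the sliding-window approach, so the grid (the number of cells) has to be accounted for as well. From this we can conclude that the YOLO target has the following shape:
[batch_size, anchors, grid, grid, (x + y + w + h + conf + num_classes)]

So the size of the last dimension is 5 + num_classes.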
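
As an illustration of this layout, the sketch below reshapes a made-up raw feature map into the five-dimensional form. The sizes (80 classes, 3 anchors, a 13x13 grid) and the assumption that the channel axis packs `anchors * (5 + num_classes)` values are chosen for the example and are not taken from the text above.

```python
import torch

num_classes = 80      # assumption: a COCO-sized label set
num_anchors = 3
grid_size = 13
batch_size = 2

# Hypothetical raw output of the convolution that feeds one YOLO layer
raw = torch.randn(batch_size, num_anchors * (5 + num_classes), grid_size, grid_size)

# Split the channel axis into (anchors, 5 + num_classes), then move the
# per-box attributes to the last axis
prediction = (
    raw.view(batch_size, num_anchors, 5 + num_classes, grid_size, grid_size)
    .permute(0, 1, 3, 4, 2)
    .contiguous()
)
print(prediction.shape)   # torch.Size([2, 3, 13, 13, 85])

# Slicing the last axis recovers the individual components
box_xywh = prediction[..., 0:4]   # x, y, w, h
conf = prediction[..., 4]         # objectness confidence
pred_cls = prediction[..., 5:]    # one score per class
print(box_xywh.shape, conf.shape, pred_cls.shape)
```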

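In the source code referenced above, the function that builds the targets lives in the utils file and is called from the model. The call below handles the output of a single YOLO layer, so there should be three such YOLO layers, and each YOLO layer is assigned 3 anchors.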

iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

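First, let's confirm the shapes of these input variables:

target: [num_boxes, 6], where num_boxes is the total number of boxes in the batch and the 6 values are (image id of the box, class of the box, x, y, w, h)
pred_boxes: [N, anchors, grid, grid, 4]
pred_cls: [N, anchors, grid, grid, num_classes]
anchors: [3, 2]; a scale factor computed from the image size and the grid size is applied to the anchors so that they are expressed in grid units
ignore_thres is a threshold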
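
The anchor rescaling can be sketched with plain arithmetic. The input size of 416, the 13x13 grid and the three anchor sizes (the largest-scale anchors of the standard YOLOv3 configuration) are assumptions picked for the example.

```python
img_size = 416
grid_size = 13
stride = img_size / grid_size   # pixels covered by one grid cell: 32.0

# Assumption: anchors in pixels for the coarsest detection scale
anchors = [(116, 90), (156, 198), (373, 326)]

# Dividing by the stride expresses each anchor in grid units, the same
# units as target[:, 2:6] * nG inside build_targets
scaled_anchors = [(a_w / stride, a_h / stride) for a_w, a_h in anchors]
print(scaled_anchors)
# [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```

Bringing anchors and ground-truth boxes into the same unit system is what makes the width/height IoU comparison inside `build_targets` meaningful.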

Jumping into the function, let's look at which operations it actually performs:

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    # pred_boxes: [N, anchors, grid, grid, 4] (x, y, w, h)
    # pred_cls:   [N, anchors, grid, grid, num_classes]
    # target:     [num_boxes, 6] (img_id, cls, x, y, w, h)
    # anchors:    [3, 2]
    # ignore_thres: scalar threshold
    # Pick the tensor constructors that match the device of pred_boxes
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0) #num_samples
    nA = pred_boxes.size(1) #anchors
    nC = pred_cls.size(-1)  #num_classes
    nG = pred_boxes.size(2) #grid

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)      # 1 marks positions that contain an object
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)    # 1 marks positions that contain no object
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])  # IoU of every box with every anchor: [3, num_boxes]
    best_ious, best_n = ious.max(0)     # best IoU of each box and the index of the anchor that achieves it
    # Separate target values
    b, target_labels = target[:, :2].long().t()
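    gx, gy = gxy.t()    # x, y of all boxes (grid units)
    gw, gh = gwh.t()    # w, h of all boxes (grid units)
    gi, gj = gxy.long().t()     # x, y rounded down: column and row of the cell containing each box centre
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1     # position of each box: its image, best-IoU anchor and cell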
    noobj_mask[b, best_n, gj, gi] = 0

    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):  # [3, num_boxes] => [num_boxes, 3]: one row of anchor IoUs per box
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
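
The anchor-matching step above depends on `bbox_wh_iou`, which is not shown in this article. Below is a minimal sketch of what such a function could look like, inferred from how it is called: it compares widths and heights only, as if the anchor and the boxes shared the same centre. The repository's actual implementation may differ, and the anchor and box values are made up.

```python
import torch

def bbox_wh_iou(wh1, wh2):
    """IoU between one anchor wh1 (shape [2]) and n boxes wh2 (shape [n, 2]),
    assuming all of them are centred on the same point."""
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area

# Hypothetical anchors and ground-truth sizes, all in grid units
anchors = torch.tensor([[3.625, 2.8125], [4.875, 6.1875], [11.65625, 10.1875]])
gwh = torch.tensor([[4.0, 6.0], [10.0, 9.0]])

ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])  # [3, 2]
best_ious, best_n = ious.max(0)   # best anchor for each of the two boxes
print(best_n)   # tensor([1, 2])
```

The small box is matched to the medium anchor and the large box to the largest anchor, so each ground-truth box ends up assigned to the anchor whose shape resembles it most.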

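This completes the composition of the target. The function returns the IoU score of each sample, the class mask, the mask of positions where an object exists, the mask of positions where no object exists, the x, y, w, h targets of each grid cell, the one-hot encoded class information, and the box confidence.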
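
A worked example of the coordinate encoding may help. The sketch below applies the same formulas as `build_targets` to a single made-up box using plain Python floats; the grid size, the matched anchor and the box values are assumptions for illustration.

```python
import math

grid_size = 13
anchor_w, anchor_h = 4.875, 6.1875    # matched anchor, in grid units (assumed)

# Ground-truth box normalised to [0, 1], as stored in the targets tensor
x, y, w, h = 0.52, 0.31, 0.30, 0.45

gx, gy = x * grid_size, y * grid_size   # centre in grid units
gw, gh = w * grid_size, h * grid_size   # size in grid units
gi, gj = int(gx), int(gy)               # cell responsible for the box
print(gi, gj)   # 6 4

# Targets: offset of the centre inside its cell, and log-ratio to the anchor
tx = gx - math.floor(gx)
ty = gy - math.floor(gy)
tw = math.log(gw / anchor_w + 1e-16)
th = math.log(gh / anchor_h + 1e-16)

# Decoding inverts the encoding: add the cell index back and
# rescale the anchor by the exponential of the log-ratio
decoded_x, decoded_y = gi + tx, gj + ty
decoded_w, decoded_h = anchor_w * math.exp(tw), anchor_h * math.exp(th)
```

The offsets tx and ty always fall in [0, 1), while tw and th are unbounded and equal zero when the box has exactly the anchor's size.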
