YOLACT Notes

w55100

Article link: https://blog.csdn.net/w55100/article/details/104714564

I took the parameters from yolact_basic_config and built a stripped-down version:

https://github.com/w55100/YOLACT

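1. backbone: the base model is ResNet-101;

2. fpn: a feature pyramid network, the classic structure; fpn_channels=256;

3. protonet: generates the proto-masks (mask prototypes); mask_dim=32;

4. prediction_module: a slightly modified version of the DSSD prediction module;

5. detect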
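To make the protonet / prediction_module split concrete, here is a minimal numpy sketch of how YOLACT combines them: the prediction head outputs mask_dim coefficients per detection, and each instance mask is a sigmoid of a linear combination of the prototypes. The function name assemble_masks and the (H, W, k) layout are assumptions for illustration; the real code works on torch tensors and also crops each mask to its predicted box, which is left out here.

```python
import numpy as np


def assemble_masks(prototypes, coeffs):
    """Hypothetical helper illustrating YOLACT-style mask assembly.

    prototypes: (H, W, k) protonet output, k = mask_dim (32 in this config).
    coeffs:     (n, k) mask coefficients, one row per detected instance.
    Returns (H, W, n) soft instance masks in [0, 1].
    """
    logits = prototypes @ coeffs.T          # linear combination of the k prototypes
    return 1.0 / (1.0 + np.exp(-logits))   # sigmoid squashes each pixel to [0, 1]


rng = np.random.default_rng(0)
masks = assemble_masks(rng.standard_normal((4, 5, 32)), rng.standard_normal((3, 32)))
```

Because the assembly is a single matrix product, producing n masks costs one matmul regardless of how many instances there are, which is a large part of why YOLACT is fast.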

#Gotta vent: I was right in the middle of reading this paper when YOLOv4 came out... is breaking SOTA records really that much fun!

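The author copied the data-augmentation (Augmentation) classes from the open-source SSD code.

These classes are implemented with opencv-python + numpy.

Nowadays most of this basic functionality is available in torchvision.transforms.

But SSD is a 2015 paper, from a time when torch hadn't really caught on yet, so just get used to the old-timer code.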

① Utility functions
1. jaccard_numpy: computes the intersection over union (IoU), in numpy.

2. PrepareMasks: crops the ground-truth masks using gt_boxes.
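A minimal numpy sketch of the kind of computation a helper like jaccard_numpy performs: the IoU between many boxes and a single box, all in (x1, y1, x2, y2). The name iou_one_to_many and the many-vs-one signature are my own assumptions, not a copy of the repo code.

```python
import numpy as np


def iou_one_to_many(boxes, box):
    """Hypothetical helper: IoU of every row of boxes ([N, 4]) with one box ([4]).
    All boxes are (x1, y1, x2, y2)."""
    # Intersection rectangle; negative widths/heights (no overlap) are clipped to 0
    max_xy = np.minimum(boxes[:, 2:], box[2:])
    min_xy = np.maximum(boxes[:, :2], box[:2])
    wh = np.clip(max_xy - min_xy, 0, None)
    inter = wh[:, 0] * wh[:, 1]
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    # IoU = intersection / union, union = |A| + |B| - |A ∩ B|
    return inter / (area_boxes + area_box - inter)
```

A many-vs-one form is convenient for augmentation: a random crop window can be scored against all gt boxes at once.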

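② Basic transforms

1. ConvertFromInts: converts an integer (np.int) array to np.float32.

2. ToCV2Image: converts a tensor to a cv2 image.

3. ToTensor: converts a cv2 image to a tensor.

4. ToAbsoluteCoords: converts percentage (x1, y1, x2, y2) coordinates to absolute pixel coordinates.

5. ToPercentCoords: converts absolute (x1, y1, x2, y2) coordinates to percentage coordinates.

6. ConvertColor: converts between color spaces, using opencv-python.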
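These SSD-style transforms all share one calling convention, taking and returning an (image, masks, boxes, labels) tuple, which is what lets them be chained. Below is a minimal sketch of that pattern with simplified re-implementations of three of the transforms above plus a Compose helper; these are illustrative versions under that assumed convention, not the repo's code.

```python
import numpy as np


class Compose:
    """Chain transforms that all take and return (image, masks, boxes, labels)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, masks=None, boxes=None, labels=None):
        for t in self.transforms:
            image, masks, boxes, labels = t(image, masks, boxes, labels)
        return image, masks, boxes, labels


class ConvertFromInts:
    def __call__(self, image, masks=None, boxes=None, labels=None):
        return image.astype(np.float32), masks, boxes, labels


class ToAbsoluteCoords:
    """Percent (x1, y1, x2, y2) -> pixel coordinates."""

    def __call__(self, image, masks=None, boxes=None, labels=None):
        height, width = image.shape[:2]
        boxes = np.array(boxes, dtype=np.float32)  # copy, so the caller's array is untouched
        boxes[:, [0, 2]] *= width
        boxes[:, [1, 3]] *= height
        return image, masks, boxes, labels


class ToPercentCoords:
    """Pixel (x1, y1, x2, y2) -> percent coordinates."""

    def __call__(self, image, masks=None, boxes=None, labels=None):
        height, width = image.shape[:2]
        boxes = np.array(boxes, dtype=np.float32)
        boxes[:, [0, 2]] /= width
        boxes[:, [1, 3]] /= height
        return image, masks, boxes, labels
```

Copying the boxes inside each transform costs a little memory but avoids the surprise of an augmentation silently mutating the dataset's annotations in place.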

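③ Advanced image processing

--- Shape transforms

1. Expand: multiplies the original image's height and width by ratio, places the original image at a random position inside the enlarged canvas, fills the remaining pixels with the mean value, and shifts the bboxes accordingly.

2. RandomSampleCrop: random crop, as the name says.

3. Resize: as the name says.

4. Pad: as the name says.

5. RandomMirror: random horizontal mirroring, the counterpart of RandomHorizontalFlip in torchvision.transforms.

6. RandomFlip: random vertical flip.

7. RandomRot90: random 90-degree rotation, as the name says.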
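A minimal numpy sketch of the box bookkeeping behind Expand and RandomMirror, assuming an (H, W, C) float image and absolute (x1, y1, x2, y2) boxes. The paste offset is drawn uniformly at random as in the SSD-style Expand; the function names, the rng argument and the mean value are illustrative assumptions, and mask handling (which YOLACT's versions also do) is left out.

```python
import numpy as np


def expand(image, boxes, ratio, mean, rng):
    """Hypothetical Expand: paste image onto a canvas ratio times larger, filled with mean."""
    height, width, depth = image.shape
    # Random top-left offset chosen so the whole image still fits on the canvas
    left = int(rng.uniform(0, width * ratio - width))
    top = int(rng.uniform(0, height * ratio - height))
    canvas = np.empty((int(height * ratio), int(width * ratio), depth), dtype=image.dtype)
    canvas[:, :] = mean
    canvas[top:top + height, left:left + width] = image
    # Every box moves by the same (left, top) offset
    boxes = boxes.copy()
    boxes[:, :2] += (left, top)
    boxes[:, 2:] += (left, top)
    return canvas, boxes


def mirror(image, boxes):
    """Hypothetical horizontal flip: x1' = W - x2, x2' = W - x1."""
    width = image.shape[1]
    boxes = boxes.copy()
    boxes[:, 0::2] = width - boxes[:, 2::-2]  # columns (x2, x1) -> new (x1, x2)
    return image[:, ::-1], boxes
```

Note that the mirrored x1 comes from the old x2: flipping reverses left and right, so swapping the pair is what keeps x1 < x2.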

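--- Color transforms in RGB space

1. RandomContrast: random contrast.

2. RandomBrightness: random brightness.

3. RandomLightingNoise: random channel swapping; with probability 0.5 it shuffles the channels, each time drawing one of the 6 possible channel permutations.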
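As described, RandomLightingNoise is a coin flip followed by one of the 3! = 6 channel orderings. A minimal numpy sketch under that reading; the function name, the rng argument and the p parameter are my own.

```python
import itertools

import numpy as np

# All 6 orderings of 3 channels; the identity (0, 1, 2) is one of them
PERMS = list(itertools.permutations(range(3)))


def random_lighting_noise(image, rng, p=0.5):
    """Hypothetical sketch: with probability p, reorder the channels of an (H, W, 3) image."""
    if rng.random() < p:
        perm = PERMS[rng.integers(len(PERMS))]
        image = image[:, :, list(perm)]  # fancy indexing returns a reordered copy
    return image
```

Since the identity is among the 6 permutations, the effective probability of actually changing the image is 0.5 × 5/6.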

--- HSV-space transforms (use together with ConvertColor)

1. RandomHue: random hue.

2. RandomSaturation: random saturation.
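A minimal numpy sketch of the two HSV jitters, assuming a float image already converted to HSV by ConvertColor, with hue in degrees [0, 360) (OpenCV's cvtColor uses that range for float32 input). The ranges delta=18 and [0.5, 1.5] are assumed defaults for illustration, and the function names are hypothetical.

```python
import numpy as np


def random_saturation(hsv, rng, lower=0.5, upper=1.5):
    """Hypothetical sketch: scale the S channel by a random factor."""
    hsv = hsv.copy()
    hsv[:, :, 1] *= rng.uniform(lower, upper)
    return hsv


def random_hue(hsv, rng, delta=18.0):
    """Hypothetical sketch: shift the H channel and wrap it around the color wheel."""
    hsv = hsv.copy()
    hsv[:, :, 0] += rng.uniform(-delta, delta)
    hsv[:, :, 0] %= 360.0  # hue is an angle, so 365 degrees becomes 5 degrees
    return hsv
```

Hue is shifted additively and wrapped because it is an angle, while saturation is scaled multiplicatively because it is a magnitude; that is also why both need the image in HSV first.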

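④ High-level image processing

1. PhotometricDistort: chains several of the basic color modules above together.

2. BackboneTransform: likewise combines some of the basic modules above.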

Easter eggs

In the FPN network:

What is this "backward compatability" supposed to mean???

```python
# For backward compatability, the conv layers are stored in reverse but the input and output is
# given in the correct order. Thus, use j=-i-1 for the input and output and i for the conv layers.
j = len(convouts)
for lat_layer in self.lat_layers:
    j -= 1
```
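My reading of this snippet: lat_layers is stored in reverse (deepest backbone level first), while convouts and the outputs stay in shallow-to-deep order, so the countdown index j lines each lateral layer up with the right feature map. The numpy sketch below mimics that top-down pass. The 1x1 lateral convolutions are replaced by a channel matmul, upsampling is nearest-neighbour (the real module uses F.interpolate), the names fpn_top_down and upsample_nearest are hypothetical, and the 3x3 prediction convs and extra downsampled levels of a full FPN are omitted.

```python
import numpy as np


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, H, W) map to (C, h, w)."""
    _, H, W = x.shape
    rows = np.arange(h) * H // h
    cols = np.arange(w) * W // w
    return x[:, rows][:, :, cols]


def fpn_top_down(convouts, lat_weights):
    """convouts: backbone maps, shallow -> deep, each (C_k, H_k, W_k).
    lat_weights: 1x1 lateral 'conv' weights stored in REVERSE (deepest first),
    each (fpn_channels, C_k)."""
    out = [None] * len(convouts)
    x = None
    j = len(convouts)
    for lat_w in lat_weights:          # stored deepest level first...
        j -= 1                         # ...so j walks the inputs/outputs from the end
        _, h, w = convouts[j].shape
        lateral = np.einsum("oc,chw->ohw", lat_w, convouts[j])  # 1x1 conv as a matmul
        # Deepest level: just the lateral; otherwise add the upsampled coarser result
        x = lateral if x is None else upsample_nearest(x, h, w) + lateral
        out[j] = x
    return out
```

Storing the layers in reverse lets the loop iterate over them in their natural top-down order, while j keeps the outputs in the same order as the backbone inputs.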

In the Prediction Module:

Hahaha, this must be the YOLACT author poking fun at the DSSD author's code!

what is this ugly lambda????????

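Found another spot where the SSD author gets roasted:

/utils/augmentation.py

The whole thing is commented out, leaving just one line:

why would you do this

Picture the "how dare you" face and voice-over (just kidding).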

dont shuffle this!

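III. Multibox_Loss

For the loss function, the author heavily modified SSD's Multibox_loss.

In the match function, SSD's two-way matching is replaced by a greedy scheme that makes full use of every gt_box: each gt_box is forced onto its own distinct prior, so no ground truth gets dropped because another gt claimed the same best prior.

Because Python passes object references, match writes its results directly into the target tensors (loc_t, conf_t, idx_t) in place and does not need to return anything.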


```python
# In the original YOLACT repo, point_form, jaccard and encode live in
# layers/box_utils.py next to match; adjust this import to your project layout.
from layers.box_utils import point_form, jaccard, encode


def match(pos_thresh, neg_thresh, truths, priors, labels, crowd_boxes, loc_t, conf_t, idx_t, idx, loc_data):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then write the matched targets for
    both the confidence and location preds into loc_t, conf_t and idx_t.

    Note: this works differently from the match function in SSD; the YOLACT
    author has heavily modified it. It handles a single image.

    Args:
        pos_thresh: (float) IoU > pos_thresh ==> positive.
        neg_thresh: (float) IoU < neg_thresh ==> negative.
        truths: (tensor) Ground truth boxes in point form (x1, y1, x2, y2), Shape: [num_obj, 4].
            (The upstream docstring says [num_obj, num_priors], which is a typo.)
        priors: (tensor) Prior boxes from priorbox layers in center form (cx, cy, w, h), Shape: [n_priors, 4].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        crowd_boxes: (tensor) All the crowd box annotations or None if there are none.
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds. Note: -1 means neutral.
        idx_t: (tensor) Tensor to be filled w/ the index of the matched gt box for each prior.
        idx: (int) current batch index.
        loc_data: (tensor) The predicted bbox regression coordinates for this batch.
            Unused here: it is only needed when cfg.use_prediction_matching is True.
    Returns:
        None. The results are written in place into loc_t[idx], conf_t[idx] and idx_t[idx].
    """
    # cfg.use_prediction_matching defaults to False, so that branch is dropped.
    # priors come in as (cx, cy, w, h); point_form converts them to (x1, y1, x2, y2).
    decoded_priors = point_form(priors)  # [num_priors, 4]
    # cfg.use_change_matching defaults to False, so plain jaccard (IoU) is used.
    overlaps = jaccard(truths, decoded_priors)  # Size [num_objects, num_priors]

    # For each prior, find the gt_box with the highest IoU.
    # Size [num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0)

    # "Don't waste a single gt"... how thrifty and eco-friendly.
    # We want to ensure that each gt gets used at least once so that we don't
    # waste any training data. In order to do that, find the max overlap anchor
    # with each gt, and force that anchor to use that gt.
    # Loops only num_obj times; each iteration sets one row (and one column) to -1.
    for _ in range(overlaps.size(0)):
        # Pick the gt_box with the highest overlap overall; on ties, the first one wins.
        # Find j, the gt with the highest overlap with a prior
        # In effect, this will loop through overlaps.size(0) in a "smart" order,
        # always choosing the highest overlap first.
        best_prior_overlap, best_prior_idx = overlaps.max(1)
        j = best_prior_overlap.max(0)[1]

        # Find i, the highest overlap anchor with this gt
        i = best_prior_idx[j]

        # Set all other overlaps with i to be -1 so that no other gt uses it
        overlaps[:, i] = -1  # whole column -> -1, so this prior is never picked again
        # Set all other overlaps with j to be -1 so that this loop never uses j again
        overlaps[j, :] = -1  # whole row -> -1, so this gt_box is never picked again: every iteration picks a different gt

        # Overwrite i's score to be 2 so it doesn't get thresholded ever
        best_truth_overlap[i] = 2
        # Set the gt to be used for i to be j, overwriting whatever was there
        best_truth_idx[i] = j

    # After the loop, since num_obj < num_priors, many priors end up sharing the same gt_box.

    # Look up the matched gt_box for every prior.
    matches = truths[best_truth_idx]  # Shape: [num_priors, 4], one gt_box per prior
    conf = labels[best_truth_idx] + 1  # Shape: [num_priors]
    # Why +1?????? Because class 0 is reserved for background (assigned just below),
    # so the gt class labels are shifted up by one.

    conf[best_truth_overlap < pos_thresh] = -1  # label as neutral
    conf[best_truth_overlap < neg_thresh] = 0  # label as background

    # Deal with crowd annotations for COCO
    crowd_iou_threshold = 0.7  # Default in yolact1.0; non-positive priors whose IoU with a crowd box exceeds it become neutral.
    if crowd_boxes is not None and crowd_iou_threshold < 1:
        # Size [num_priors, num_crowds]
        crowd_overlaps = jaccard(decoded_priors, crowd_boxes, iscrowd=True)
        # Size [num_priors]
        best_crowd_overlap, best_crowd_idx = crowd_overlaps.max(1)
        # Set non-positives with crowd iou of over the threshold to be neutral.
        conf[(conf <= 0) & (best_crowd_overlap > crowd_iou_threshold)] = -1

    # Note: encode takes the center-form (cx, cy, w, h) priors, not decoded_priors.
    # It returns the [num_priors, 4] encoded regression offsets.
    loc = encode(matches, priors, use_yolo_regressors=False)
    loc_t[idx] = loc  # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior
    idx_t[idx] = best_truth_idx  # [num_priors] indices for lookup
```
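To see what "making full use of every gt_box" buys, here is a numpy toy comparing SSD-style forced matching with YOLACT's greedy loop on a hand-made IoU matrix where both gts share the same best prior. The SSD variant follows my understanding of SSD's match (each gt's best prior is forced to overlap 2, then a loop where later gts overwrite earlier ones); the function names and pos_thresh=0.5 are assumptions for illustration.

```python
import numpy as np


def ssd_force_match(overlaps):
    """Hypothetical SSD-style forcing on a [num_obj, num_priors] IoU matrix."""
    best_truth_overlap = overlaps.max(0)
    best_truth_idx = overlaps.argmax(0)
    best_prior_idx = overlaps.argmax(1)      # best prior for each gt
    best_truth_overlap[best_prior_idx] = 2   # force those priors to be positive
    for j, i in enumerate(best_prior_idx):   # later gts overwrite earlier ones
        best_truth_idx[i] = j
    return best_truth_overlap, best_truth_idx


def yolact_force_match(overlaps):
    """Hypothetical numpy port of the greedy loop in match above."""
    overlaps = overlaps.copy()
    best_truth_overlap = overlaps.max(0)
    best_truth_idx = overlaps.argmax(0)
    for _ in range(overlaps.shape[0]):
        j = overlaps.max(1).argmax()         # gt with the highest remaining IoU
        i = overlaps[j].argmax()             # its best remaining prior
        overlaps[:, i] = -1                  # prior i is taken
        overlaps[j, :] = -1                  # gt j is done
        best_truth_overlap[i] = 2
        best_truth_idx[i] = j
    return best_truth_overlap, best_truth_idx


def matched_gts(best_truth_overlap, best_truth_idx, pos_thresh=0.5):
    """Sorted gt indices that own at least one positive prior."""
    return sorted(set(best_truth_idx[best_truth_overlap >= pos_thresh].tolist()))


iou = np.array([[0.60, 0.30, 0.10],    # gt 0
                [0.55, 0.10, 0.20]])   # gt 1; both prefer prior 0
```

With the SSD-style forcing, gt 1 overwrites gt 0 on prior 0 and gt 0 is left without any positive prior. The greedy loop gives prior 0 to gt 0 (the higher IoU) and then forces gt 1 onto its best remaining prior (index 2), so both gts end up with a positive anchor.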
