SSD Code Walkthrough (3): MultiboxLoss

SSD code walkthrough series: (1) Prior Box

SSD code walkthrough series: (2) Data Augmentation

SSD code walkthrough series: (3) MultiboxLoss

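The code in this part is again based on a PyTorch version, not the official Caffe implementation. In my opinion this is the hardest part of SSD to understand.

Before explaining MultiboxLoss, we need to be clear about how data flows through the SSD detection head, that is, what the head outputs.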

        for (x, l, c) in zip(pyramid_fea, self.loc, self.conf):
            loc.append(l(x).permute(0, 2, 3, 1).contiguous())
            conf.append(c(x).permute(0, 2, 3, 1).contiguous())

        loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
        conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
        if test:
            output = (
                loc.view(loc.size(0), -1, 4),  # loc preds
                self.softmax(conf.view(-1, self.num_classes)),  # conf preds
            )
        else:
            output = (
                loc.view(loc.size(0), -1, 4),
                conf.view(conf.size(0), -1, self.num_classes),
            )

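The listing above is the head code. Taking SSD300 on VOC as an example: for conv4_3, the loc branch outputs a feature map F of shape batch * (3*4) * 38 * 38. To make the data uniform for the later steps (see my earlier note), F is first permuted to batch * 38 * 38 * (3*4) and then reshaped to batch * (38 * 38 * (3*4)). The loc outputs of all 6 pyramid levels are then concatenated and reshaped again, giving an output of shape batch * (38 * 38 * 3 + 19 * 19 * 6 + ……) * 4, i.e. a tensor of shape batch * num_priors * 4. The conf branch is handled the same way, except that the last dimension is num_cls.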
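
The shape bookkeeping can be checked with plain arithmetic. The sketch below is only an illustration: `levels`, `num_priors` and `head_output_shapes` are hypothetical names, and only the two pyramid levels spelled out in the text (38 * 38 with 3 priors, 19 * 19 with 6 priors) are included, since the remaining levels are elided there.

```python
# (feature_map_size, priors_per_location) for the two levels named in the text.
# The remaining pyramid levels are elided in the text, so they are left out here.
levels = [(38, 3), (19, 6)]


def num_priors(levels):
    """Number of priors after flattening and concatenating the given levels."""
    return sum(size * size * k for size, k in levels)


def head_output_shapes(batch, levels, num_classes):
    """Shapes of the (loc, conf) tensors returned by the head in training mode."""
    n = num_priors(levels)
    return (batch, n, 4), (batch, n, num_classes)


conv4_3_priors = num_priors(levels[:1])           # 38 * 38 * 3
loc_values_per_image = conv4_3_priors * 4         # length after view(batch, -1)
loc_shape, conf_shape = head_output_shapes(2, levels, 21)
print(conv4_3_priors, loc_values_per_image, loc_shape, conf_shape)
```

With all six levels in the list, the second dimension would be the full num_priors of the network.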

The MultiboxLoss code:

class MultiBoxLoss(nn.Module):
    """SSD Weighted Loss Function
    Compute Targets:
        1) Produce Confidence Target Indices by matching  ground truth boxes
           with (default) 'priorboxes' that have jaccard index > threshold parameter
           (default threshold: 0.5).
        2) Produce localization target by 'encoding' variance into offsets of ground
           truth boxes and their matched  'priorboxes'.
        3) Hard negative mining to filter the excessive number of negative examples
           that comes with using a large number of default bounding boxes.
           (default negative:positive ratio 3:1)
    Objective Loss:
        L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
        weighted by α which is set to 1 by cross val.
        Args:
            c: class confidences,
            l: predicted boxes,
            g: ground truth boxes
            N: number of matched default boxes
        See: https://arxiv.org/pdf/1512.02325.pdf for more details.
    """


    def __init__(self, num_classes,overlap_thresh,prior_for_matching,bkg_label,neg_mining,neg_pos,neg_overlap,encode_target):
        super(MultiBoxLoss, self).__init__()
        self.num_classes = num_classes
        self.threshold = overlap_thresh # 0.5
        self.background_label = bkg_label # 0
        self.encode_target = encode_target 
        self.use_prior_for_matching  = prior_for_matching
        self.do_neg_mining = neg_mining
        self.negpos_ratio = neg_pos # 3
        self.neg_overlap = neg_overlap # 0.5
        self.variance = [0.1,0.2]

    def forward(self, predictions, priors, targets):
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds and conf preds
            from SSD net.
                conf shape: torch.size(batch_size,num_priors,num_classes)
                loc shape: torch.size(batch_size,num_priors,4)
            priors (tensor): Prior boxes, shape: torch.size(num_priors,4)
            targets (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size,num_objs,5] (last idx is the label).
        """
        # loc_data.shape = [batch, num_priors, 4]
        # conf_data.shape = [batch, num_priors, num_class]
        loc_data, conf_data = predictions 
        priors = priors # shape = [num_priors, 4]. Note: priors store center point and width/height, not top-left and bottom-right corner coordinates
        num = loc_data.size(0) 
        num_priors = (priors.size(0))
        num_classes = self.num_classes

        # match priors (default boxes) and ground truth boxes
        # These two tensors store the classification and regression targets of all priors
        loc_t = torch.Tensor(num, num_priors, 4)
        conf_t = torch.LongTensor(num, num_priors)
        for idx in range(num): # process the images in the batch one at a time
            truths = targets[idx][:,:-1].data
            labels = targets[idx][:,-1].data
            defaults = priors.data
            # matching step; see the comments in the next code listing
            match(self.threshold,truths,defaults,self.variance,labels,loc_t,conf_t,idx) 
        if GPU:
            loc_t = loc_t.cuda()
            conf_t = conf_t.cuda()
        # wrap targets
        loc_t = Variable(loc_t, requires_grad=False)
        conf_t = Variable(conf_t,requires_grad=False)

        pos = conf_t > 0 # shape = [batch, num_priors], all positive priors

        # Localization Loss (Smooth L1)
        # Shape: [batch,num_priors,4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data[pos_idx].view(-1,4)
        loc_t = loc_t[pos_idx].view(-1,4)
        loss_l = F.smooth_l1_loss(loc_p, loc_t, size_average=False) # regression: only positive priors contribute gradients

        # Compute max conf across batch for hard negative mining
        # loss_c is computed for every prior of the whole batch at once; the mining below ranks the priors of each image (sort along dim 1)
        batch_conf = conf_data.view(-1,self.num_classes)
        loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1,1))

        # Hard Negative Mining
        # Zero out loss_c of the positive priors, then sort the negative priors for hard negative mining
        loss_c[pos.view(-1,1)] = 0 # filter out pos boxes for now
        loss_c = loss_c.view(num, -1)
        _,loss_idx = loss_c.sort(1, descending=True)
        _,idx_rank = loss_idx.sort(1) # for each image, idx_rank holds the rank of each prior's loss (0 = largest loss)
        num_pos = pos.long().sum(1,keepdim=True)
        num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1) # cap 3x the number of positive priors so it cannot exceed the total number of priors
        neg = idx_rank < num_neg.expand_as(idx_rank) # negative priors selected for backpropagation

        # Confidence Loss Including Positive and Negative Examples
        # Loss produced by the positive priors together with the selected negative priors
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes)
        targets_weighted = conf_t[(pos+neg).gt(0)]
        loss_c = F.cross_entropy(conf_p, targets_weighted, size_average=False)

        # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N

        N = max(num_pos.data.sum().float(), 1) # total number of positive (matched) priors in the batch, floored at 1
        loss_l/=N
        loss_c/=N
        return loss_l,loss_c
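
The double `sort` in the hard negative mining step is the least obvious part of `forward`. The first sort gives the order of prior indices by descending loss; sorting that index permutation again turns it into the rank of every prior. Below is a minimal pure-Python sketch of the same idea for a single image; `rank_of_each` and `select_negatives` are hypothetical helper names, not part of the original code.

```python
def rank_of_each(losses):
    """Rank of every element under a descending sort (0 = largest loss)."""
    n = len(losses)
    # First sort: order[r] is the index of the element with rank r.
    order = sorted(range(n), key=lambda i: losses[i], reverse=True)
    # Second sort: argsort of the permutation, i.e. its inverse.
    return sorted(range(n), key=lambda i: order[i])


def select_negatives(losses, is_pos, negpos_ratio=3):
    """Boolean mask of the negatives kept for the classification loss."""
    # Positives get loss 0 so they sink to the bottom of the ranking.
    masked = [0.0 if p else l for l, p in zip(losses, is_pos)]
    num_pos = sum(is_pos)
    num_neg = min(negpos_ratio * num_pos, len(losses) - 1)
    return [r < num_neg for r in rank_of_each(masked)]


losses = [0.2, 2.5, 0.1, 1.7, 0.9, 3.0, 0.4, 0.05]
is_pos = [False, False, False, False, False, True, False, False]
neg = select_negatives(losses, is_pos)
print(neg)  # priors 1, 3 and 4 are the three hardest negatives
```

With one positive prior and a ratio of 3, exactly the three negatives with the largest loss are kept.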

Code for matching priors to GT:

def point_form(boxes):
    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes.
    """
    return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                     boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax


def center_size(boxes):
    """ Convert point_form boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point_form boxes
    Return:
        boxes: (tensor) Converted cx, cy, w, h form of boxes.
    """
    return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
                      boxes[:, 2:] - boxes[:, :2]), 1)  # w, h


def intersect(box_a, box_b):
    """ We resize both tensors to [A,B,2] without new malloc:
    [A,2] -> [A,1,2] -> [A,B,2]
    [B,2] -> [1,B,2] -> [A,B,2]
    Then we compute the area of intersect between box_a and box_b.
    Args:
      box_a: (tensor) bounding boxes, Shape: [A,4].
      box_b: (tensor) bounding boxes, Shape: [B,4].
    Return:
      (tensor) intersection area, Shape: [A,B].
    """
    A = box_a.size(0)
    B = box_b.size(0)
    max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
                       box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
    min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2),
                       box_b[:, :2].unsqueeze(0).expand(A, B, 2))
    inter = torch.clamp((max_xy - min_xy), min=0)
    return inter[:, :, 0] * inter[:, :, 1]


def jaccard(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
    is simply the intersection over union of two boxes.  Here we operate on
    ground truth boxes and default boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    Args:
        box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4]
        box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4]
    Return:
        jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)]
    """
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 2]-box_a[:, 0]) *
              (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
    area_b = ((box_b[:, 2]-box_b[:, 0]) *
              (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
    union = area_a + area_b - inter
    return inter / union  # [A,B]

def matrix_iou(a,b):
    """
    return iou of a and b, numpy version for data augmentation
    """
    lt = np.maximum(a[:, np.newaxis, :2], b[:, :2])
    rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:])

    area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return area_i / (area_a[:, np.newaxis] + area_b - area_i)


def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then return the matched indices
    corresponding to both confidence and location preds.
    Args:
        threshold: (float) The overlap threshold used when matching boxes.
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (list[float]) Variances used when encoding the offsets,
            e.g. [0.1, 0.2].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        idx: (int) current batch index
    Return:
        None. loc_t[idx] and conf_t[idx] are filled in place with
        1) the encoded location targets and 2) the class labels.
    """
    # jaccard index
    overlaps = jaccard(
        truths,
        point_form(priors) # convert priors to (x1,y1,x2,y2) form
    ) # shape = [num_gt, num_priors]: for each GT, its IoU with every prior
    # (Bipartite Matching)
    # [num_objects,1] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
    best_truth_idx.squeeze_(0)
    best_truth_overlap.squeeze_(0)
    best_prior_idx.squeeze_(1)
    best_prior_overlap.squeeze_(1)
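    # A prior that is the best match of some GT is forced to be positive; its overlap is set to 2 so it is never filtered out (any value of at least 0.5 would do)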
    best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
    # TODO refactor: index  best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    # For these forced-positive priors, write the index of the GT that claimed them into best_truth_idx
    for j in range(best_prior_idx.size(0)):
        best_truth_idx[best_prior_idx[j]] = j
    matches = truths[best_truth_idx]          # Shape: [num_priors,4]
    conf = labels[best_truth_idx]          # Shape: [num_priors]
    # Besides the priors claimed by a GT, priors whose IoU is at least the threshold (0.5) are also positive
    conf[best_truth_overlap < threshold] = 0  # label as background
    # Encode the offsets, i.e. the values each prior has to regress
    loc = encode(matches, priors, variances)
    loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf  # [num_priors] top class label for each prior

def encode(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """

    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]


def encode_multi(matched, priors, offsets, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """

    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2] - offsets[:,:2]
    # encode variance
    #g_cxcy /= (variances[0] * priors[:, 2:])
    g_cxcy.div_(variances[0] * offsets[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]
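
The matching rule in `match` can be traced by hand on a toy example. The sketch below is a pure-Python re-statement of the same two rules (best prior per GT is forced positive, any prior with IoU of at least the threshold is positive), not the original tensor code; `iou` and `match_labels` are hypothetical names, and all boxes are already in (xmin, ymin, xmax, ymax) form.

```python
def iou(a, b):
    """IoU of two boxes in (xmin, ymin, xmax, ymax) form."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_labels(truths, labels, priors, threshold=0.5):
    """Class label assigned to each prior (0 = background)."""
    overlaps = [[iou(t, p) for p in priors] for t in truths]
    # Best ground truth for each prior.
    best_truth_idx, best_truth_overlap = [], []
    for j in range(len(priors)):
        i_best = max(range(len(truths)), key=lambda i: overlaps[i][j])
        best_truth_idx.append(i_best)
        best_truth_overlap.append(overlaps[i_best][j])
    # Best prior for each ground truth is forced positive.
    for i, row in enumerate(overlaps):
        j_best = max(range(len(priors)), key=lambda j: row[j])
        best_truth_idx[j_best] = i
        best_truth_overlap[j_best] = 2.0
    return [labels[best_truth_idx[j]] if best_truth_overlap[j] >= threshold else 0
            for j in range(len(priors))]


truths = [(0, 0, 4, 4), (6, 6, 7, 7)]
labels = [1, 2]
priors = [(0, 0, 4, 3), (1, 1, 5, 5), (5, 5, 8, 8), (8, 8, 9, 9)]
assigned = match_labels(truths, labels, priors)
print(assigned)
```

The third prior overlaps the small GT with an IoU of only about 0.11, yet it is labelled positive because it is that GT's best prior; this is exactly the case the `index_fill_` line covers.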

The overall flow of MultiboxLoss is as follows:

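1. Run match on every image in the batch. For each GT, compute its IoU with all priors; the prior with the highest IoU for a GT is directly treated as positive. In addition, to increase the number of positive priors, any prior whose IoU with a GT is at least 0.5 is also treated as positive, and all others are negative (background). The offsets are then encoded. This step outputs the regression offsets and the class label for every prior;

2. Regression: compute the smooth_l1 loss directly over all positive priors;

3. Classification: first compute loss_c for all priors across the batch, then run hard negative mining to pick negatives so that the positive-to-negative ratio is 1:3, and finally compute the cross-entropy loss over all positive priors and the selected negative priors.

4. Divide the regression loss and the classification loss each by N, the total number of positive priors in the batch (floored at 1). The code returns the two losses separately; their sum is the objective L from the docstring.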
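
The per-prior score used for mining in step 3, `log_sum_exp(conf) - conf[target]`, is just the softmax cross-entropy of that prior. The sketch below checks this numerically in pure Python; the source does not show `log_sum_exp`, so the numerically stable form used here is an assumption, and `per_prior_conf_loss` and `softmax_cross_entropy` are hypothetical names.

```python
import math


def log_sum_exp(xs):
    """Stable log(sum(exp(x))) (assumed form; the original helper is not shown)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def per_prior_conf_loss(logits, target):
    """The quantity ranked during hard negative mining."""
    return log_sum_exp(logits) - logits[target]


def softmax_cross_entropy(logits, target):
    """Textbook cross-entropy: -log(softmax(logits)[target])."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[target] / sum(exps))


logits = [2.0, 0.5, -1.0]          # scores for (background, class 1, class 2)
mining_score = per_prior_conf_loss(logits, 0)
cross_entropy = softmax_cross_entropy(logits, 0)
print(mining_score, cross_entropy)
```

For a negative prior the target is class 0, so a large score means the network is confidently predicting some object where there is only background, which is what makes it a hard negative.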

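Overall, the matching part is the more complicated one. Whether a prior is positive or negative is decided by IoU, which directly yields the classification target (the value the network output has to fit). The regression target additionally needs the encode step, in which SSD divides by a variance. Since the variances are below 1, this scales the residuals up, which changes the regression gradient only for inliers (the quadratic region of Smooth L1); for outliers the gradient is unchanged, fixed at 1. See the issue for reference.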
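
A minimal sketch of the encode step for a single box, together with its inverse and the Smooth L1 gradient, may make the role of the variance easier to see. It follows the formulas in `encode` above; `encode_one`, `decode_one` and `smooth_l1_grad` are hypothetical names, and the decode function is my own inverse of the encoding, not code from the original post.

```python
import math


def encode_one(matched, prior, variances=(0.1, 0.2)):
    """matched: (xmin, ymin, xmax, ymax); prior: (cx, cy, w, h)."""
    g_cx = ((matched[0] + matched[2]) / 2 - prior[0]) / (variances[0] * prior[2])
    g_cy = ((matched[1] + matched[3]) / 2 - prior[1]) / (variances[0] * prior[3])
    g_w = math.log((matched[2] - matched[0]) / prior[2]) / variances[1]
    g_h = math.log((matched[3] - matched[1]) / prior[3]) / variances[1]
    return (g_cx, g_cy, g_w, g_h)


def decode_one(loc, prior, variances=(0.1, 0.2)):
    """Inverse of encode_one; returns a box in point form."""
    cx = prior[0] + loc[0] * variances[0] * prior[2]
    cy = prior[1] + loc[1] * variances[0] * prior[3]
    w = prior[2] * math.exp(loc[2] * variances[1])
    h = prior[3] * math.exp(loc[3] * variances[1])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def smooth_l1_grad(x):
    """Derivative of Smooth L1 (threshold 1) with respect to the residual x."""
    return x if abs(x) < 1 else math.copysign(1.0, x)


prior = (0.5, 0.5, 0.2, 0.2)
gt = (0.42, 0.40, 0.66, 0.60)
target = encode_one(gt, prior)
recovered = decode_one(target, prior)
print(target, recovered)
```

In this example the center offset is 0.04, i.e. 0.2 prior widths; dividing by the variance 0.1 turns it into a target of 2.0, which already lies in the linear region of Smooth L1 where the gradient magnitude is 1.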

 

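About fc6 of VGG16 becoming a convolution with dilation 6 and fc7 becoming a 1*1 convolution in SSD: the paper says these changes are for speed. With the original VGG16, downsampling with s=2 at POOL5, detecting on the 19*19 branch and then appending two fully connected layers would be slow. If the fully connected layers were simply turned into convolution layers, the receptive field would shrink considerably, because an fc layer sees its whole input, so the results would surely be worse. Hence the 3*3 convolution with dilation 6, which covers the same extent as a 13*13 convolution; the original fully connected layer only acted on a 10*10 map, so with this setting the receptive field is a match. As for removing the stride of 2 at pool5, I think there are two reasons: first, a 13*13 convolution does not work well on a 10*10 map; second, keeping a higher resolution is good for detection.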
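
The 13*13 figure follows from the usual formula for the extent of a dilated kernel, k + (k - 1) * (d - 1). The sketch below checks it, along with the spatial size after such a layer; the padding of 6 is an assumption on my part (it is what keeps a 19*19 map at 19*19), and both function names are hypothetical.

```python
def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def conv_out_size(n, k, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension."""
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1


fc6_extent = effective_kernel(3, 6)
# padding=6 is assumed here, not stated in the text above.
fc6_out = conv_out_size(19, 3, padding=6, dilation=6)
print(fc6_extent, fc6_out)
```

The dilated layer has only 3 * 3 weights per channel pair, so it reaches the extent of a 13*13 kernel at a fraction of the cost.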
