Understanding FairMOT and Its Implementation in the MindSpore Framework

Latest recommended article published on 2023-08-17 17:08:27

qq_42878061

Article tags: deep learning

Article link: https://blog.csdn.net/qq_42878061/article/details/129015485


1. Introduction to FairMOT

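FairMOT is a multi-object tracking (MOT) baseline proposed by Huazhong University of Science and Technology and Microsoft Research Asia. After analyzing the problems of existing one-shot tracking algorithms, the authors make three points:
(1) Anchors are not well suited to Re-ID, so an anchor-free algorithm should be used.
(2) Features from multiple layers should be fused.
(3) For one-shot methods, low-dimensional Re-ID feature vectors work better.
On the MOT15, MOT16, MOT17, and MOT20 datasets, the FairMOT architecture reached the state of the art at the time while running at 30 fps.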
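Multi-object tracking is a long-standing goal in computer vision. The aim is to estimate the trajectories of multiple objects in a video, and solving the task well benefits many applications such as action recognition, sports video analysis, elderly care, and human-computer interaction. Most existing SOTA methods follow a two-step approach:
(1) Detect the objects with an object detection algorithm.
(2) Pass the detections through a Re-ID model and link each one to an existing trajectory according to a metric defined on the features.
Advances in object detection and Re-ID in recent years have clearly improved two-step tracking, but a two-step method does not share feature maps between the detector and the Re-ID model, so it is slow and hard to run at video rate. As two-step methods matured, more researchers turned to one-shot algorithms that detect objects and learn Re-ID features at the same time. Sharing the feature map between detection and Re-ID greatly reduces inference time, but accuracy drops well below that of two-step methods. The authors therefore analyzed one-shot methods and identified the three factors listed above.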
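SOTA trackers are usually two-step algorithms that treat detection and Re-ID as two separate tasks:
(1) A detection algorithm first locates the objects (predicted boxes).
(2) Each predicted object is cropped, resized, and fed to an identity feature extractor to obtain Re-ID features, and the boxes are then linked into multiple trajectories.
The standard way to link boxes into trajectories is to compute a cost matrix from the Re-ID features and the IoU of the boxes, and then to use a Kalman filter and the Hungarian algorithm to perform the association. A small number of works use more complex association strategies, such as group models and RNNs.
The advantage of the two-step approach is that a suitable model can be chosen for each task, and cropping and resizing the predicted boxes before the Re-ID feature extractor helps handle changes in object scale. Tracking quality is good, but the approach is slow and hard to run at video rate.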
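
The association step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the equal weighting `w_reid=0.5`, the cosine distance, and the helper names are hypothetical, the Kalman filter is omitted, and the Hungarian algorithm is replaced by an exhaustive search over permutations, which finds the same optimum but is only practical for tiny inputs.

```python
from itertools import permutations

import numpy as np


def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def cost_matrix(track_feats, det_feats, track_boxes, det_boxes, w_reid=0.5):
    """Blend cosine Re-ID distance and (1 - IoU) into one cost matrix."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    reid_cost = 1.0 - t @ d.T
    iou_cost = np.array([[1.0 - iou(tb, db) for db in det_boxes]
                         for tb in track_boxes])
    # w_reid is a hypothetical weighting, not a value from the paper.
    return w_reid * reid_cost + (1.0 - w_reid) * iou_cost


def best_assignment(cost):
    """Exhaustive minimum-cost matching for a small square cost matrix.

    Stand-in for the Hungarian algorithm: same optimum, factorial runtime.
    """
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(enumerate(best))


# Two tracks and two detections whose order is swapped.
track_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
det_feats = np.array([[0.0, 1.0], [1.0, 0.0]])
track_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(21, 20, 31, 30), (0, 1, 10, 11)]

cost = cost_matrix(track_feats, det_feats, track_boxes, det_boxes)
matches = best_assignment(cost)  # list of (track index, detection index)
```

In a real tracker the exhaustive search would be replaced by a proper Hungarian solver, and matches whose cost exceeds a threshold would be rejected.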
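The core idea of one-shot methods is to perform object detection and identity embedding (Re-ID features) in a single network, reducing inference time by sharing most of the computation.
(1) Track R-CNN adds a Re-ID head that regresses a box and a Re-ID feature for each region proposal.
(2) JDE is built on the YOLOv3 framework and achieves video-rate inference.
However, the tracking accuracy of one-shot methods is usually lower than that of two-step methods. The paper finds that this is because the learned Re-ID features are not optimal, which leads to a large number of ID switches.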

2. Details

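The FairMOT framework is as follows. The input image is first fed to an encoder-decoder network to extract a high-resolution feature map (stride = 4). Two simple parallel heads are then added to predict bounding boxes and Re-ID features respectively. Finally, the features at the predicted object centers are extracted and used to link bounding boxes over time.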

[Figure: overview of the FairMOT framework]

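FairMOT uses an anchor-free detection method that estimates object centers on a high-resolution feature map. Removing anchors alleviates the ambiguity problem, and the high-resolution feature map helps the Re-ID features align better with the object centers.
A parallel branch is added to estimate pixel-wise Re-ID features, which are used to predict object identities. Specifically, the network learns low-dimensional Re-ID features, which both reduce computation time and make feature matching more robust. In this step, FairMOT enhances the ResNet-34 backbone with Deep Layer Aggregation (DLA) so that features from multiple layers are fused and objects of different scales can be handled.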

Backbone network

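ResNet-34 is used as the backbone to strike a good balance between accuracy and speed. To accommodate objects of different scales, a variant of Deep Layer Aggregation (DLA) is applied to the backbone.
Unlike the original DLA, it has more skip connections between low-level and high-level features, similar to a Feature Pyramid Network (FPN). In addition, all convolution layers in the upsampling modules are replaced by deformable convolutions, so that the receptive field can adapt dynamically to object scale and pose. These modifications also help alleviate the alignment problem.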

Object detection branch

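FairMOT treats object detection as a center-based bounding box regression task on the high-resolution feature map. Three parallel regression heads are attached to the backbone to estimate the heatmap, the object center offset, and the bounding box size respectively. Each head applies a 3×3 convolution (with 256 channels) to the backbone output feature map, followed by a 1×1 convolution layer that produces the final target.
(1) Heatmap Head: this head estimates the locations of object centers. A heatmap-based representation is used, with heatmap size 1×H×W. The response decays exponentially with the distance between a heatmap location and the object center.
(2) Center Offset Head: this head localizes objects more precisely. How accurately the Re-ID features are aligned with the object centers is critical to performance.
(3) Box Size Head: this head estimates the height and width of the target bounding box at each location. It is not directly related to the Re-ID features, but localization accuracy affects the evaluation of detection performance.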
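
To make the three targets concrete, the sketch below builds them for one ground-truth box on a stride-4 feature map. It is a minimal illustration, not the article's training code: the function names, the fixed `sigma=2.0`, and the element-wise maximum used to merge overlapping Gaussians are assumptions chosen for readability.

```python
import numpy as np

STRIDE = 4  # feature-map stride stated in the text


def center_targets(box, stride=STRIDE):
    """Map an image-space box (x1, y1, x2, y2) to feature-map targets.

    Returns the integer center cell, the sub-cell offset lost by rounding
    down, and the box size, all in feature-map units.
    """
    x1, y1, x2, y2 = [v / stride for v in box]
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    cx_int, cy_int = int(cx), int(cy)
    offset = (cx - cx_int, cy - cy_int)
    size = (x2 - x1, y2 - y1)
    return (cx_int, cy_int), offset, size


def render_heatmap(centers, height, width, sigma=2.0):
    """Render a Gaussian peak at each center; response decays with distance."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width))
    for cx, cy in centers:
        gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Assumption: overlapping objects are merged by element-wise maximum.
        heatmap = np.maximum(heatmap, gauss)
    return heatmap


center, offset, size = center_targets((10, 20, 50, 100))
heatmap = render_heatmap([center], height=32, width=32)
```

The offset target is what the Center Offset Head regresses: it recovers the fractional part of the center that is lost when the box is mapped onto the coarser grid.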

Identity Embedding Branch

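The goal of the identity embedding branch is to generate features that can distinguish different objects. Ideally, the distance between different objects should be larger than the distance between instances of the same object. To achieve this, FairMOT applies a convolution layer with 128 kernels on top of the backbone features to extract an identity embedding at every location.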
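
Since the embedding map holds one 128-dimensional vector per location, the identity feature of a detected object is simply the vector at its center cell. The following NumPy sketch shows that lookup; the function name is hypothetical, and the L2 normalization is an assumption added so that dot products behave like cosine similarities.

```python
import numpy as np


def gather_embeddings(id_map, centers):
    """Pick the embedding at each center from a (C, H, W) feature map.

    centers is a list of integer (x, y) cells; the result has shape (N, C).
    """
    xs = np.array([c[0] for c in centers])
    ys = np.array([c[1] for c in centers])
    feats = id_map[:, ys, xs].T  # (C, N) -> (N, C)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    # Assumption: embeddings are L2-normalised before matching.
    return feats / np.maximum(norms, 1e-12)


rng = np.random.default_rng(0)
id_map = rng.normal(size=(128, 8, 8))  # stand-in for the branch output
embeddings = gather_embeddings(id_map, [(1, 2), (5, 6)])
```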

Loss functions

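(1) Heatmap loss: FairMOT maps each object center onto the heatmap with a Gaussian distribution, and then uses a variant of focal loss to measure the difference between the predicted heatmap and the ground-truth heatmap:
$L_{heatmap}=-\frac{1}{N}\sum_{xy}{\left\{\begin{array}{ll} \left(1-\hat{M}_{xy}\right)^{\alpha}\log\left(\hat{M}_{xy}\right), & if\,\,M_{xy}=1\\ \left(1-M_{xy}\right)^{\beta}\left(\hat{M}_{xy}\right)^{\alpha}\log\left(1-\hat{M}_{xy}\right), & otherwise\\ \end{array}\right.}$
$\hat{M}_{xy}$ is the predicted heatmap, $M_{xy}$ is the ground-truth heatmap, $N$ is the total number of objects in the image, and $\alpha$ and $\beta$ are the focal loss hyperparameters.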
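
A NumPy sketch of this heatmap focal loss may help when reading the formula. It follows the usual CenterNet-style form, in which the negative term is down-weighted by the ground-truth value near a center. The defaults `alpha=2` and `beta=4` and the clipping constant `eps` are assumptions; the article does not state the values it uses.

```python
import numpy as np


def heatmap_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss between predicted and ground-truth heatmaps."""
    pred = np.clip(pred, eps, 1.0 - eps)  # keep the logarithms finite
    pos = target == 1.0                   # exact object centers
    num_pos = max(int(pos.sum()), 1)      # N, guarded against empty images
    pos_term = ((1.0 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_term = ((1.0 - target) ** beta * pred ** alpha
                * np.log(1.0 - pred))[~pos].sum()
    return -(pos_term + neg_term) / num_pos


target = np.zeros((4, 4))
target[1, 1] = 1.0
good_loss = heatmap_focal_loss(target.copy(), target)  # near-perfect prediction
bad_loss = heatmap_focal_loss(1.0 - target, target)    # inverted prediction
```

A near-perfect prediction gives a loss close to zero, while the inverted prediction is heavily penalized at both the center and the background cells.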
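(2) Offset and Size loss: FairMOT implements the offset and size losses with two L1 losses:
$L_{box}=\sum_{i=1}^N{\left\|o^i-\hat{o}^i\right\|_1+\left\|s^i-\hat{s}^i\right\|_1}$
Here $N$ is the total number of objects in the image, $s^i$ is the box size of object $i$, $o^i$ is its center offset, and $\hat{s}^i$ and $\hat{o}^i$ are the corresponding predictions.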
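
As a quick check of the arithmetic, the sketch below evaluates the box loss as the sum of the two L1 terms described in the text. The function name and the sample numbers are hypothetical.

```python
import numpy as np


def box_l1_loss(offset_gt, offset_pred, size_gt, size_pred):
    """Sum over objects of the L1 offset error plus the L1 size error."""
    offset_term = np.abs(offset_gt - offset_pred).sum(axis=1)
    size_term = np.abs(size_gt - size_pred).sum(axis=1)
    return float((offset_term + size_term).sum())


# One object: offset error 0.25, size error 1 + 2 = 3.
box_loss = box_l1_loss(np.array([[0.5, 0.0]]), np.array([[0.25, 0.0]]),
                       np.array([[10.0, 20.0]]), np.array([[9.0, 22.0]]))
```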
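(3) Identity Embedding Loss: the embedding in FairMOT is also learned through classification, with each object ID assigned its own class. The classification uses a softmax loss:
$L_{identity}=-\sum_{i=1}^N{\sum_{k=1}^K{L^i\left(k\right)\log\left(p\left(k\right)\right)}}$
Here $N$ is the total number of objects in the image, $K$ is the number of classes, $L^i(k)$ is the one-hot ground-truth class label of object $i$, and $p(k)$ is the predicted class distribution. In other words, every object in the image is classified, where classification means recognizing which specific object it is: all instances with the same identity are treated as one class.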
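
The identity loss is an ordinary softmax cross-entropy over identity classes, which the following sketch evaluates in NumPy. The logits are assumed to come from a classifier applied to each object's embedding; that classifier and the function name are not part of the article.

```python
import numpy as np


def identity_loss(logits, labels):
    """Summed softmax cross-entropy for N objects over K identity classes.

    logits has shape (N, K); labels holds the N integer identity indices.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].sum())


# Uniform logits over 4 identities: each object contributes log(4).
id_loss = identity_loss(np.zeros((2, 4)), np.array([0, 3]))
```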

3. MindSpore implementation

from typing import Tuple, Union
import numpy as np
from mindspore import nn
from mindspore import ops

from src.utils.check_param import Rel, Validator
from src.utils.class_factory import ClassFactory, ModuleType
from src.models.layers import DeformConv2d, FairMOTMultiHead

class BasicBlock(nn.Cell):
    """
    Residual block used in DLA.
    """

    def __init__(self, cin, cout, stride=1, dilation=1):
        super(BasicBlock, self).__init__()
        # First 3x3 conv + BN + ReLU; may downsample through `stride`.
        self.conv_bn_act = nn.Conv2dBnAct(cin, cout, kernel_size=3, stride=stride,
                                          pad_mode='pad', padding=dilation, has_bias=False,
                                          dilation=dilation, has_bn=True, momentum=0.9,
                                          activation='relu', after_fake=False)
        # Second 3x3 conv + BN; the activation is applied after the residual add.
        self.conv_bn = nn.Conv2dBnAct(cout, cout, kernel_size=3, stride=1, pad_mode='same',
                                      has_bias=False, dilation=dilation, has_bn=True,
                                      momentum=0.9, activation=None)
        self.relu = ops.ReLU()

    def construct(self, x, residual=None):
        if residual is None:
            residual = x
        out = self.conv_bn_act(x)
        out = self.conv_bn(out)
        out += residual
        out = self.relu(out)
        return out

class Root(nn.Cell):
    """
    Root node of an HDA (hierarchical deep aggregation) tree.
    """

    def __init__(self, in_channels, out_channels, kernel_size, residual):
        super(Root, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 1, stride=1, has_bias=False,
                              pad_mode='pad', padding=(kernel_size - 1) // 2)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = ops.ReLU()
        self.residual = residual
        self.cat = ops.Concat(axis=1)

    def construct(self, x):
        # x is a tuple of feature maps; concatenate them along the channel axis.
        children = x
        x = self.conv(self.cat(x))
        x = self.bn(x)
        if self.residual:
            x += children[0]
        x = self.relu(x)
        return x

class Tree(nn.Cell):
    """
    Build a deep layer aggregation tree.
    """

    def __init__(self, levels, block, in_channels, out_channels, stride=1, level_root=False,
                 root_dim=0, root_kernel_size=1, dilation=1, root_residual=False):
        super(Tree, self).__init__()
        self.levels = levels
        if root_dim == 0:
            root_dim = 2 * out_channels
        if level_root:
            root_dim += in_channels
        if self.levels == 1:
            # Leaf level: two residual blocks.
            self.tree1 = block(in_channels, out_channels, stride, dilation=dilation)
            self.tree2 = block(out_channels, out_channels, 1, dilation=dilation)
        else:
            # Recursive level: two sub-trees.
            self.tree1 = Tree(levels - 1, block, in_channels, out_channels, stride, root_dim=0,
                              root_kernel_size=root_kernel_size, dilation=dilation,
                              root_residual=root_residual)
            self.tree2 = Tree(levels - 1, block, out_channels, out_channels,
                              root_dim=root_dim + out_channels,
                              root_kernel_size=root_kernel_size, dilation=dilation,
                              root_residual=root_residual)
        if self.levels == 1:
            self.root = Root(root_dim, out_channels, root_kernel_size, root_residual)
        self.level_root = level_root
        self.root_dim = root_dim
        self.downsample = None
        self.project = None
        if stride > 1:
            self.downsample = nn.MaxPool2d(stride, stride=stride)
        if in_channels != out_channels:
            # 1x1 projection so the residual matches the output channel count.
            self.project = nn.Conv2dBnAct(in_channels, out_channels, kernel_size=1, stride=1,
                                          pad_mode='same', has_bias=False, has_bn=True,
                                          momentum=0.9, activation=None, after_fake=False)

    def construct(self, x, residual=None, children=None):
        children = () if children is None else children
        bottom = self.downsample(x) if self.downsample else x
        residual = self.project(bottom) if self.project else bottom
        if self.level_root:
            children += (bottom,)
        x1 = self.tree1(x, residual)
        if self.levels == 1:
            x2 = self.tree2(x1)
            ida_node = (x2, x1) + children
            x = self.root(ida_node)
        else:
            children += (x1,)
            x = self.tree2(x1, children=children)
        return x

class DLA34(nn.Cell):
    """
    Build the downsampling DLA network.
    """

    def __init__(self, levels, channels, block=None, residual_root=False):
        super(DLA34, self).__init__()
        self.channels = channels
        self.base_layer = nn.Conv2dBnAct(3, channels[0], kernel_size=7, stride=1,
                                         pad_mode='same', has_bias=False, has_bn=True,
                                         momentum=0.9, activation='relu', after_fake=False)
        self.level0 = self._make_conv_level(channels[0], channels[0], levels[0])
        self.level1 = self._make_conv_level(channels[0], channels[1], levels[1], stride=2)
        self.level2 = Tree(levels[2], block, channels[1], channels[2], 2,
                           level_root=False, root_residual=residual_root)
        self.level3 = Tree(levels[3], block, channels[2], channels[3], 2,
                           level_root=True, root_residual=residual_root)
        self.level4 = Tree(levels[4], block, channels[3], channels[4], 2,
                           level_root=True, root_residual=residual_root)
        self.level5 = Tree(levels[5], block, channels[4], channels[5], 2,
                           level_root=True, root_residual=residual_root)
        self.dla_fn = [self.level0, self.level1, self.level2,
                       self.level3, self.level4, self.level5]

    def _make_conv_level(self, cin, cout, convs, stride=1, dilation=1):
        modules = []
        for i in range(convs):
            # Only the first conv of a level applies the stride.
            modules.append(nn.Conv2dBnAct(cin, cout, kernel_size=3,
                                          stride=stride if i == 0 else 1,
                                          pad_mode='pad', padding=dilation, has_bias=False,
                                          dilation=dilation, has_bn=True, momentum=0.9,
                                          activation='relu', after_fake=False))
            cin = cout
        return nn.SequentialCell(modules)

    def construct(self, x):
        # Collect the output of every level for later aggregation.
        y = []
        x = self.base_layer(x)
        for i in range(len(self.channels)):
            x = self.dla_fn[i](x)
            y.append(x)
        return y

class DlaDeformConv(nn.Cell):
    """
    Deformable convolution v2 followed by BatchNorm and ReLU.
    """

    def __init__(self, cin, cout):
        super(DlaDeformConv, self).__init__()
        self.actf = nn.SequentialCell([
            nn.BatchNorm2d(cout),
            nn.ReLU()])
        self.conv = DeformConv2d(cin, cout, kernel_size=3, stride=1, has_bias=True)

    def construct(self, x):
        x = self.conv(x)
        x = self.actf(x)
        return x

class IDAUp(nn.Cell):
    """IDA upsampling."""

    def __init__(self, o, channels, up_f):
        super(IDAUp, self).__init__()
        proj_list = []
        up_list = []
        node_list = []
        for i in range(1, len(channels)):
            c = channels[i]
            f = int(up_f[i])
            proj = DlaDeformConv(c, o)
            node = DlaDeformConv(o, o)
            # Depthwise transposed conv that upsamples by a factor of f.
            up = nn.Conv2dTranspose(o, o, f * 2, stride=f, pad_mode='pad',
                                    padding=f // 2, group=o)
            proj_list.append(proj)
            up_list.append(up)
            node_list.append(node)
        self.proj = nn.CellList(proj_list)
        self.up = nn.CellList(up_list)
        self.node = nn.CellList(node_list)

    def construct(self, layers, startp, endp):
        for i in range(startp + 1, endp):
            upsample = self.up[i - startp - 1]
            project = self.proj[i - startp - 1]
            layers[i] = upsample(project(layers[i]))
            node = self.node[i - startp - 1]
            # Fuse the upsampled map with the previous (higher-resolution) one.
            layers[i] = node(layers[i] + layers[i - 1])
        return layers

class DLAUp(nn.Cell):
    """DLA upsampling."""

    def __init__(self, startp, channels, scales, in_channels=None):
        super(DLAUp, self).__init__()
        self.startp = startp
        channels = list(channels)
        if in_channels is None:
            in_channels = list(channels)
        scales = np.array(scales, dtype=int)
        self.ida = []
        for i in range(len(channels) - 1):
            j = -i - 2
            self.ida.append(IDAUp(channels[j], in_channels[j:],
                                  scales[j:] // scales[j]))
            scales[j + 1:] = scales[j]
            in_channels[j + 1:] = [channels[j] for _ in channels[j + 1:]]
        self.ida_nfs = nn.CellList(self.ida)

    def construct(self, layers):
        out = [layers[-1]]  # start with the stride-32 feature map
        for i in range(len(layers) - self.startp - 1):
            ida = self.ida_nfs[i]
            layers = ida(layers, len(layers) - i - 2, len(layers))
            out.append(layers[-1])
        # Return the outputs in reverse order (highest resolution first).
        a = []
        i = len(out)
        while i > 0:
            a.append(out[i - 1])
            i -= 1
        return a

@ClassFactory.register(ModuleType.MODEL)
class DLASegConv(nn.Cell):
    """
    DLA backbone network.
    """

    def __init__(self,
                 down_ratio: int,
                 last_level: int,
                 out_channel: int = 0,
                 stage_levels: Tuple[int, ...] = (1, 1, 1, 2, 2, 1),
                 stage_channels: Tuple[int, ...] = (16, 32, 64, 128, 256, 512)):
        super(DLASegConv, self).__init__()
        Validator.check('down_ratio', down_ratio, 'given_ratio', [2, 4, 8, 16], rel=Rel.IN)
        # The output stride is down_ratio, so aggregation starts at level log2(down_ratio).
        self.first_level = int(np.log2(down_ratio))
        self.last_level = last_level
        self.base = DLA34(stage_levels, stage_channels, block=BasicBlock)
        channels = stage_channels
        scales = [2 ** i for i in range(len(channels[self.first_level:]))]
        self.dla_up = DLAUp(self.first_level, channels[self.first_level:], scales)
        if out_channel == 0:
            out_channel = channels[self.first_level]
        self.ida_up = IDAUp(out_channel, channels[self.first_level:self.last_level],
                            [2 ** i for i in range(self.last_level - self.first_level)])

    def construct(self, image):
        x = self.base(image)
        x = self.dla_up(x)
        y = []
        for i in range(self.last_level - self.first_level):
            y.append(x[i])
        y = self.ida_up(y, 0, len(y))
        return y[-1]

@ClassFactory.register(ModuleType.MODEL)
class FairmotDla34(nn.Cell):
    """
    FairMOT network: DLA-34 backbone followed by the multi-task heads.
    """

    def __init__(self,
                 down_ratio: int = 4,
                 last_level: int = 5,
                 head_channel: int = 256,
                 head_conv2_ksize: Union[int, Tuple[int]] = 1,
                 hm: int = 1,
                 wh: int = 4,
                 feature_id: int = 128,
                 reg: int = 2):
        super().__init__()
        backbone_output_channel = 64
        self.backbone = DLASegConv(down_ratio=down_ratio,
                                   last_level=last_level)
        # One head per task: heatmap, box size, Re-ID embedding, center offset.
        self.head = FairMOTMultiHead(heads={'hm': hm, 'wh': wh,
                                            'feature_id': feature_id, 'reg': reg},
                                     in_channel=backbone_output_channel,
                                     head_conv=head_channel,
                                     kernel_size=head_conv2_ksize)

    def construct(self, x):
        x = self.backbone(x)
        x = self.head(x)
        return x
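
The shape arithmetic behind the network above can be checked without MindSpore. The sketch below is plain Python under stated assumptions: the 608×1088 input size is a hypothetical example, each head is assumed to emit a map with the channel count passed to `FairMOTMultiHead`, and the output-size formula is the standard one for a transposed convolution with explicit padding, applied to the `kernel=2f, stride=f, padding=f//2` settings used in `IDAUp`.

```python
def head_output_shapes(height, width, down_ratio=4, heads=None):
    """Expected (channels, H, W) of each head for a given input size."""
    if heads is None:
        # Channel counts taken from the FairmotDla34 defaults.
        heads = {'hm': 1, 'wh': 4, 'feature_id': 128, 'reg': 2}
    h, w = height // down_ratio, width // down_ratio
    return {name: (ch, h, w) for name, ch in heads.items()}


def transposed_conv_out(size, f):
    """Output length of a transposed conv with kernel 2f, stride f, padding f//2."""
    return (size - 1) * f - 2 * (f // 2) + 2 * f


shapes = head_output_shapes(608, 1088)  # hypothetical input resolution
upsampled = [transposed_conv_out(19, f) for f in (2, 4, 8)]
```

For the even factors used in `IDAUp` (2, 4, 8), the transposed convolution enlarges the map by exactly a factor of `f`, which is what lets the upsampled maps be added to the higher-resolution ones.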


4. References

Paper: https://arxiv.org/pdf/2004.01888v2.pdf

Blogs:
https://blog.csdn.net/weixin_42398658/article/details/110873083
https://blog.csdn.net/qq_41204464/article/details/122893061
