【ReID】Hierarchical and Efficient Learning for Person Re-Identification

Hot off arXiv: Hierarchical and Efficient Learning for Person Re-Identification. The paper proposes a hierarchical, efficient network that combines global and local features with a recovery feature, jointly supervised by multiple losses. It also improves Random Erasing by replacing the rectangular erased region with a random polygon (Random Polygon Erasing, RPE), and proposes a new metric called Efficiency Score (ES) to evaluate model efficiency.

A quick look at the paper:

Pain points

1) Most models focus on embedding complex modules to boost accuracy while ignoring network efficiency. The proposed Hierarchical and Efficient Network (HENet) provides hierarchical, multi-branch feature extraction while keeping the network efficient.

2) Different loss functions are designed with different objectives; the paper combines multiple complementary losses to improve model performance.

3) Random Erasing (RE) was proposed to handle object occlusion and improve model robustness. The paper argues that for irregular occluders (e.g., backpacks and bicycles), RE's rectangular regions are too simplistic, so it designs Random Polygon Erasing (RPE) to simulate irregular occlusion.

4) The paper proposes the Efficiency Score (ES) metric to measure a network's efficiency in practical applications (see the sketch below for the kind of quantities such a metric trades off).
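
The paper gives ES its own precise definition; purely as a hypothetical illustration of the trade-off such a metric has to capture, the sketch below measures parameter count and average inference latency and combines them with accuracy. The combination formula here is my own assumption, not the paper's.

import time

import torch
import torchvision


def efficiency_score(model, acc, img_size=(3, 256, 128), runs=50):
    """Hypothetical efficiency score: accuracy divided by a cost term.
    NOT the paper's ES definition, only an illustration of trading off
    performance against model size and inference speed."""
    model.eval()
    params_m = sum(p.numel() for p in model.parameters()) / 1e6  # parameters, in millions
    x = torch.randn(1, *img_size)
    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        start = time.time()
        for _ in range(runs):
            model(x)
    latency_ms = (time.time() - start) / runs * 1000  # average latency per image (ms)
    return acc / (params_m * latency_ms)  # assumed combination, illustration only


# e.g. score a plain ResNet-50 backbone that reached 94% rank-1
print(efficiency_score(torchvision.models.resnet50(), acc=0.94))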

Model

The HENet architecture is shown in the figure below:

The network has three main branches. The G1 branch learns the global feature, while the P4 branch splits the feature map into four horizontal stripes. The G1 feature passes through a Conv1x1 and is supervised by a triplet loss and a cross-entropy loss (a minimal sketch of this head follows below).
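
Here is a minimal sketch of what such a global head could look like, assuming a ResNet-50-style backbone output; the channel sizes, the class name GlobalHead, and the head layout are my assumptions, not the authors' implementation:

import torch
import torch.nn as nn

class GlobalHead(nn.Module):
    """Sketch of a G1-style global head (names and dimensions assumed)."""
    def __init__(self, in_ch=2048, feat_ch=256, num_ids=751):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)         # global average pooling
        self.reduce = nn.Conv2d(in_ch, feat_ch, 1)  # the Conv1x1 step described above
        self.classifier = nn.Linear(feat_ch, num_ids)

    def forward(self, fmap):
        feat = self.reduce(self.pool(fmap)).flatten(1)  # (B, feat_ch), feeds the triplet loss
        logits = self.classifier(feat)                  # feeds the cross-entropy loss
        return feat, logits

feat, logits = GlobalHead()(torch.randn(8, 2048, 16, 8))
ce = nn.CrossEntropyLoss()(logits, torch.randint(0, 751, (8,)))
# a triplet loss (e.g. nn.TripletMarginLoss) would be applied to `feat`
# using mined anchor/positive/negative samples from the batch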

Before slicing, the P4 branch also derives a global feature: one path is supervised by a triplet loss, another passes through an FC layer for a cross-entropy (CE) loss. The four local features obtained after slicing are supervised by the Online Instance Matching (OIM) loss [1]. The authors argue that using the CE loss alone can cause large gradient fluctuations in the classifier matrix, while the parameter-free OIM loss exploits additional unlabeled data during training to compensate for this. (A sketch of the stripe partitioning follows below.)
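
A sketch of the stripe partitioning step under the same assumptions (the pooling choice and per-stripe reduction layers are mine; the OIM loss itself, which maintains a lookup table of identity features [1], is omitted for brevity):

import torch
import torch.nn as nn

class StripeHead(nn.Module):
    """Sketch of a P4-style branch: split the feature map into 4
    horizontal stripes and embed each one (dimensions assumed)."""
    def __init__(self, in_ch=2048, feat_ch=256, parts=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((parts, 1))  # one cell per stripe
        self.reduce = nn.ModuleList(
            [nn.Conv2d(in_ch, feat_ch, 1) for _ in range(parts)])

    def forward(self, fmap):
        stripes = self.pool(fmap)                    # (B, C, 4, 1)
        feats = [r(stripes[:, :, i:i + 1]).flatten(1)  # one feature per stripe
                 for i, r in enumerate(self.reduce)]
        return feats                                 # each would feed the OIM loss [1]

feats = StripeHead()(torch.randn(8, 2048, 16, 8))
print([f.shape for f in feats])  # 4 features of shape (8, 256)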

The R branch aims to recover the original image from the feature map: pooling and a Conv1x1 yield a recovery feature, from which a decoder reconstructs a low-resolution image, supervised by a pixel-wise reconstruction loss, i.e., an MSE loss computed against the original image. The authors argue that the CE-loss branches learn local parts of the image, whereas the R branch learns the image as a whole; it can be viewed as a kind of adversarial training that forces the network to ignore the background and focus on the human body. (A minimal sketch follows below.)
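
A minimal sketch of such a recovery branch, assuming a simple two-stage upsampling decoder (the authors' decoder design may well differ):

import torch
import torch.nn as nn
import torch.nn.functional as F

class RecoveryBranch(nn.Module):
    """Sketch of an R-style branch: compress the feature map, then
    decode it back toward a low-resolution image (architecture assumed)."""
    def __init__(self, in_ch=2048, mid_ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, 1)  # the pooling/Conv1x1 step
        self.decoder = nn.Sequential(              # 2x upsample twice: 16x8 -> 64x32
            nn.Upsample(scale_factor=2), nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(mid_ch, 3, 3, padding=1),
        )

    def forward(self, fmap, img):
        recon = self.decoder(self.reduce(fmap))             # low-res reconstruction
        target = F.interpolate(img, size=recon.shape[-2:])  # downsample the original image
        return F.mse_loss(recon, target)                    # pixel-wise reconstruction loss

loss = RecoveryBranch()(torch.randn(2, 2048, 16, 8), torch.randn(2, 3, 256, 128))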

This R (recovery) branch reminds me of EANet:

https://juejin.im/post/5e81a03c6fb9a03c42378752

The network gains an extra branch that plays no role at inference time, serving purely as an additional constraint. EANet's chosen constraint is semantic segmentation, whereas HENet's is the more direct one of regenerating the original image.

The proposed Random Polygon Erasing is illustrated in the figure below:

Its pseudocode is given below:

Source code: https://github.com/zhangzjn/HENet

My annotated version of the source code:

import math
import random

import numpy as np
from PIL import Image, ImageDraw
from torchvision import transforms


class RandomPolygonErasing(object):
    """ Randomly selects a polygon region in an image and erases its pixels.
            by zhangzjn, 2019.1.1
            See https://arxiv.org/pdf/2005.08812.pdf
        Args:
             probability: The probability that the Random Polygon Erasing operation will be performed.
             pt_num: The number of vertices that make up the random polygon.
             sl: Minimum proportion of erased area against input image.
             sh: Maximum proportion of erased area against input image.
             r: Minimum aspect ratio of erased area.
             mean: Erasing value.
        """

    def __init__(self, probability=0.5, pt_num=20, sl=0.02, sh=0.45, r=0.35, mean=[0.4914, 0.4822, 0.4465]):
        self.probability = probability
        self.mean = mean
        assert pt_num >= 3, 'pt_num less than 3 ...'
        self.pt_num = pt_num  # number of reference points
        self.sl = sl  # minimum proportion of the erased area
        self.sh = sh  # maximum proportion of the erased area
        self.r = r  # minimum aspect ratio of the erased region

    def __call__(self, img):
        if random.uniform(0, 1) > self.probability:
            return img

        def generate_pt_list():  # sample the polygon's reference points
            while True:
                area = img.size()[1] * img.size()[2]  # area of the input image (a C x H x W tensor)
                target_area = random.uniform(self.sl, self.sh) * area  # random target erasing area
                aspect_ratio = random.uniform(self.r, 1 / self.r)  # random aspect ratio of the erased region
                r_w = int(round(math.sqrt(target_area / aspect_ratio)))  # width of the erasing region
                r_h = int(round(math.sqrt(target_area * aspect_ratio)))  # height of the erasing region
                pt_list = []  # list holding the reference points

                if r_w >= img.size()[2] or r_h >= img.size()[1]:
                    continue  # region does not fit inside the image; resample

                # sample the center reference point (a tuple)
                center_pt = (
                    random.randint(r_w // 2, img.size()[2] - r_w // 2),
                    random.randint(r_h // 2, img.size()[1] - r_h // 2),
                )
                pt_list.append(center_pt)

                w_min = max(center_pt[0] - r_w // 2, 0)  # lower bound of the sampling range along width
                w_max = min(center_pt[0] + r_w // 2, img.size()[2])  # upper bound along width
                h_min = max(center_pt[1] - r_h // 2, 0)  # lower bound along height
                h_max = min(center_pt[1] + r_h // 2, img.size()[1])  # upper bound along height
                for _ in range(self.pt_num - 1):  # sample the remaining pt_num - 1 reference points (19 by default)
                    x = random.randint(w_min, w_max)  # x coordinate of the reference point
                    y = random.randint(h_min, h_max)  # y coordinate of the reference point
                    pt_list.append((x, y))
                return pt_list  # 1 center point plus pt_num - 1 sampled points (pt_num in total)

        # all-zero mask with the same H x W as the image; float32 so that PIL
        # accepts the array as a mode-'F' image and ToTensor does not rescale it
        mask = Image.fromarray(np.zeros((img.shape[1], img.shape[2]), dtype=np.float32))
        draw = ImageDraw.Draw(mask)  # drawing handle for the mask
        pts = generate_pt_list()  # sampled reference points: [(x1, y1), (x2, y2), ...]

        # For simplicity, rasterize the polygon as a union of triangles:
        # the outer loop fixes the edge (pts[i], pts[i + 1]), the inner loop
        # sweeps every point pts[j], and draw.polygon fills each resulting
        # triangle with 1. Together these triangles form the polygon mask.
        for i in range(self.pt_num - 2):
            for j in range(self.pt_num):
                draw.polygon([pts[i], pts[i + 1], pts[j]], fill=1)
        mask = transforms.ToTensor()(mask).squeeze(0)  # (H, W); 1 inside the erased region
        mask_neg = 1 - mask  # inverted mask: 0 inside the erased region, 1 outside
        for cnt in range(3):  # apply the mask per channel: clear the region, then fill it with the mean value
            img[cnt] = img[cnt] * mask_neg + mask * self.mean[cnt]
        return img
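
Since the transform indexes the image as a channel-first tensor, it belongs after ToTensor in the pipeline. A usage sketch (the input size and normalization statistics are typical ReID choices, not taken from the paper):

from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((256, 128)),        # common ReID input resolution
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),                # RPE below expects a C x H x W tensor
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    RandomPolygonErasing(probability=0.5, pt_num=20),
])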

Experiments

Comparison of the reported results against historical SOTA methods:

Results on the three benchmarks are as follows:

Ablation study of the different branches:

Ablation and side-by-side comparison of the different erasing methods, where the constant K denotes the number of selected vertices:

RPE generally outperforms the original RE.

Ablation of the different component branches and losses:

Writing

"
(2.Related Work, 2.1 Deep Person ReID最后一句) We employ stripe-based idea to design
our model, which is easy to follow and has strong feature
extraction ability for practical application.
"

Haha, refreshingly honest.

Issues

The experiments don't include MSMT17, and there's no comparison with Circle Loss from a few months ago; as things stand, Circle Loss is probably the true SOTA.

The architecture diagram in Fig. 2 is well drawn, but the relationships between components are not complete or clear enough.

A writing nitpick: the paper never states that "Cross Entropy loss" is later abbreviated as "CE loss", which confused me for a while.

At the time of reading there was no source code or contact information, which was frustrating (the repository linked above appeared later).

References

[1] Xiao, T., Li, S., Wang, B., Lin, L., Wang, X., 2017. Joint detection and identification feature learning for person search, in: CVPR, IEEE, pp. 3376–3385.

