The Difference Between Person Re-identification (Re-ID) and Tracking

Summarized from: https://www.zhihu.com/question/68584669/answer/265070848

Author: 陈狄

Link: https://www.zhihu.com/question/68584669/answer/326110383
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, please contact the author for authorization; for non-commercial reproduction, please credit the source.

From a task perspective, the main differences between the two are as follows:

  • Person re-identification: given a cropped pedestrian image (the probe image, i.e., an image whose content is mostly just this one person), find the images showing the same identity as the probe among a large set of cropped images (the gallery images). These images are typically non-contiguous frames captured by different cameras. (A minimal retrieval sketch follows the figure below.)
  • Pedestrian tracking: given a cropped pedestrian image (the probe image), locate the probe within a panoramic video (a panorama track, where the person occupies only a small part of the field of view). This panoramic video consists of contiguous frames captured by a single camera.
<img src="https://pic4.zhimg.com/v2-f0d33f0901e6fcdda2c91c42891af7d7_b.jpg" data-size="normal" data-rawwidth="718" data-rawheight="388" class="origin_image zh-lightbox-thumb" width="718" data-original="https://pic4.zhimg.com/v2-f0d33f0901e6fcdda2c91c42891af7d7_r.jpg"> 行人再识别(左)与行人跟踪(右)

In the video-surveillance setting, the ultimate goal is Multi-target Multi-camera Tracking (MTMC Tracking). Person re-identification and pedestrian tracking are both subtasks on the way to that goal.

<img src="https://pic1.zhimg.com/v2-bc7f0942b1393224d2e19221a85b1fe0_b.jpg" data-size="normal" data-rawwidth="617" data-rawheight="456" class="origin_image zh-lightbox-thumb" width="617" data-original="https://pic1.zhimg.com/v2-bc7f0942b1393224d2e19221a85b1fe0_r.jpg"> 从行人再识别 (Re-ID) 到跨时段跨摄像头跟踪 (MTMC Tracking)

I drew a quick diagram, shown above. Re-ID sits in the third quadrant: it handles static images, and specifically patches that have already been cropped.

In real deployments, however, cameras capture panoramic frames, so pedestrian detection is needed first to locate each person in the full frame and crop out the patches that contain them. This yields the new task in the second quadrant: Person Search.
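The composition of detection and Re-ID can be sketched directly. This is again an illustrative assumption, reusing the hypothetical `embed()` from above and adding an equally hypothetical `detect()`; neither name comes from the original post.

```python
def detect(frame):
    """Hypothetical pedestrian detector: returns (x, y, w, h) boxes
    for each person found in a panoramic frame."""
    raise NotImplementedError

def person_search(probe_image, frames, threshold=0.7):
    """Person search = detection + Re-ID: detect people in each full
    frame, crop them, and match the crops against the probe."""
    q = embed(probe_image)
    for t, frame in enumerate(frames):
        for (x, y, w, h) in detect(frame):
            crop = frame[y:y + h, x:x + w]     # cut out the pedestrian patch
            score = float(embed(crop) @ q)     # cosine similarity
            if score > threshold:
                yield t, (x, y, w, h), score
```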

Person search still operates on static images, yet in practice cameras usually capture dynamic video. If the temporal information is exploited as well, by adding tracking, in particular tracking-by-detection, we can roughly reach the final first-quadrant goal, MTMC Tracking.
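The post does not name a particular tracker; as one common instantiation of tracking-by-detection, here is a greedy frame-to-frame association sketch in the spirit of IoU-based trackers (no Kalman filter or appearance model, just box overlap), reusing the `detect()` stand-in above.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def track_by_detection(frames, iou_threshold=0.3):
    """Greedy IoU association: link each new detection to the best
    overlapping track from the previous frame, else start a new track."""
    tracks = []          # each track is a list of (frame_index, box)
    for t, frame in enumerate(frames):
        active = [tr for tr in tracks if tr[-1][0] == t - 1]
        for box in detect(frame):
            best = max(active, key=lambda tr: iou(tr[-1][1], box), default=None)
            if best is not None and iou(best[-1][1], box) >= iou_threshold:
                best.append((t, box))
                active.remove(best)          # one detection per track
            else:
                tracks.append([(t, box)])    # start a new track
    return tracks
```

In a full MTMC system, the single-camera tracklets produced this way would then be linked across cameras using Re-ID features.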

Conversely, starting from Re-ID and exploiting temporal information directly, without pedestrian detection, gives the fourth-quadrant task: Video-based Re-ID, sometimes called Multi-shot Re-ID. Extending this task to panoramic video likewise reaches the final goal.
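A simple baseline for exploiting the extra frames is temporal average pooling of per-frame features. This sketch reuses the hypothetical `embed()` from above and is only one of many aggregation choices (RNNs and attention are common alternatives).

```python
def embed_tracklet(frames):
    """Aggregate per-frame features into a single tracklet descriptor
    by temporal average pooling, then re-normalize."""
    feats = np.stack([embed(f) for f in frames])  # shape (T, d)
    pooled = feats.mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)
```

Matching tracklets then reduces to the same nearest-neighbor ranking as in the image-based sketch.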

At present, the bulk of the work concentrates on third-quadrant Re-ID; by comparison there is far less work on Person Search and Video-based Re-ID, and work that tackles MTMC Tracking directly is rarer still. Colleagues, let's work on it together!~


