An Introduction to the Basics of Person Re-ID

0. As is customary, terminology first.

Person Re-ID means person re-identification: given pedestrian detection boxes from the same camera or from different cameras, we extract features from the cropped images and match them to decide whether two (or more) images show the same person. Since the task is to pick out the real Sun Wukong, we obviously need a photo of the real Sun Wukong first - that is the ground truth, called the query image in Re-ID. The images whose identity is to be verified are called gallery images. That was the informal version; here is the official one:

gallery set — the reference image set, i.e. the agreed-upon standard person library, used at test time;

query set — the set of probe images to be looked up, used at test time.

A bit of a mouthful; feel free to skip it.
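
In code, the "feature extraction + matching" step described above boils down to ranking the gallery features by similarity to the query feature. Below is a minimal sketch of that ranking step only, using random stand-in vectors instead of a real model (the array names and dimensions are my own placeholders):

```python
import numpy as np

# Toy sketch of the match step: one query embedding vs. N gallery embeddings.
rng = np.random.default_rng(0)
query_feat = rng.standard_normal(128)           # stand-in for a real CNN embedding
gallery_feats = rng.standard_normal((10, 128))  # 10 stand-in gallery embeddings

# Cosine similarity = dot product of L2-normalised vectors.
q = query_feat / np.linalg.norm(query_feat)
g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
scores = g @ q

ranking = np.argsort(-scores)  # gallery indices, best match first
print(ranking)
```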

1. Datasets

It's alchemy, after all - no data, no training.

Market-1501, CUHK03, DukeMTMC-reID and MSMT17 are the mainstream datasets for benchmarking state-of-the-art Re-ID methods. A quick overview of each:

1.1  Market-1501

Arguably the earliest, and certainly the dataset most widely used in papers.

The official description:

  • The dataset was collected on the Tsinghua University campus. Images come from 6 different cameras: 5 high-resolution (1280x1080 HD, 25 fps) and 1 low-resolution (720x576 SD, 25 fps);
  • The dataset contains 1,501 identities and 36,036 images in total. The training set has 751 IDs with 12,936 images and the test set has 750 IDs with 19,732 images, so on average each ID has 17.2 training images and 26.3 test images;
  • Each identity is captured by at most six and at least two cameras;
  • Because the capture system consists of multiple cameras in an open environment, the samples cover a wide range of attributes, viewpoints and backgrounds;
  • The images in Market-1501 were detected and cropped automatically by a detector, so they contain some detection errors, which is close to how the data looks in real use.
  • ref: http://www.liangzheng.org/Project/project_reid.html

--market1501 directory structure

  • "bounding_box_train" – 751 IDs, 12,936 images; the training set;
  • "bounding_box_test" – 750 IDs, 19,732 images; the test set, i.e. the so-called gallery (reference) set;
  • "query" – 750 IDs, 3,368 images in total; the probe images. For each of the 750 test IDs, one image is randomly selected per camera as a query, so each ID has at most 6 query images. Note that they do not overlap with the test images. If you build your own dataset along the same lines, you can build the test set first and then cut the query images out of it as needed (see the sketch after this list);
  • "gt_query" – bla bla... personally I don't think it's of much use
  • "gt_bbox" – bla bla... personally I don't think it's of much use

--naming convention

0001_c1s1_001051_00.jpg, where:

0001 is the identity number, c1 means camera 1, s1 means the first video sequence, 001051 is the frame number, and 00 means the bounding box was hand-annotated (01 would mean the box came from the DPM detector).
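
If you want to consume these fields programmatically, a small parsing helper is handy. The sketch below is my own (the function name and return format are not from the dataset toolkit); it simply mirrors the naming rule described above:

```python
import re

def parse_market1501(filename: str) -> dict:
    """Split a Market-1501 style filename into its fields (illustrative helper)."""
    m = re.match(r"^(-?\d+)_c(\d+)s(\d+)_(\d+)_(\d+)", filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    pid, cam, seq, frame, bbox_flag = m.groups()
    return {
        "person_id": int(pid),              # -1 marks junk boxes, 0000 distractors
        "camera_id": int(cam),
        "sequence_id": int(seq),
        "frame": int(frame),
        "hand_labeled": bbox_flag == "00",  # per the rule above: 00 hand-drawn, 01 DPM
    }

print(parse_market1501("0001_c1s1_001051_00.jpg"))
# {'person_id': 1, 'camera_id': 1, 'sequence_id': 1, 'frame': 1051, 'hand_labeled': True}
```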

I'll leave the rest out; knowing too many details isn't necessarily helpful. Oh, one more thing: Market's train and test sets contain a few dirty or mislabeled images. If you use it in a real project you should clean them out; for a paper it doesn't matter much, although cleaning might bump your numbers a little. I've cleaned it once myself - message me if you need it and I'll send you a copy; because of copyright concerns I won't post a link. = =*

1.2  DukeMTMC-reID —— https://github.com/layumi/DukeMTMC-reID_evaluation

Nothing particularly new here: compared with the Tsinghua dataset it just has more camera coverage and is somewhat larger.

--directory structure

Basically the same as Market.

  • "bounding_box_test" – the test set: 702 identities, 17,661 images in total (randomly sampled; 702 IDs plus 408 distractor IDs)
  • "bounding_box_train" – the training set: 702 identities, 16,522 images in total (randomly sampled)
  • "query" – for each of the 702 test identities, one image is randomly selected per camera as a query; 2,228 images in total

--naming convention

0001_c2_f0046182.jpg - compared with Market it simply drops the s (sequence) field; otherwise it is much the same.
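
For completeness, the same kind of parsing works for Duke-style names; a minimal sketch (again my own helper, not official code):

```python
import re

# DukeMTMC-reID names drop the "s" segment: <id>_c<camera>_f<frame>.jpg
m = re.match(r"^(\d+)_c(\d+)_f(\d+)", "0001_c2_f0046182.jpg")
pid, cam, frame = (int(x) for x in m.groups())
print(pid, cam, frame)  # 1 2 46182
```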

1.3  CUHK03 —— https://drive.google.com/file/d/0B7TOZKXmIjU3OUhfd3BPaVRHZVE/view

A dataset from the Chinese University of Hong Kong. Much the same as the others.

1.4  MSMT17

Much the same.

2. Evaluation metrics

Now for the important part. A deep learning blog doesn't feel professional without at least a little math. When I first studied this topic I went through the blogs of several big names in the field, and I was still somewhat lost afterwards.

Let's start with the two most common evaluation metrics in Re-ID: mAP and rank-1.

I'll explain both informally here; for the detailed math, see something like this: https://blog.csdn.net/u013698770/article/details/60776102

Rank-1 first: out of the big pile of Sun Wukong photos, sorted by similarity to my query, it is the probability that the very first one is the real Wukong. More generally, rank-k is the probability that a correct match shows up somewhere within the top k results.

mAP stands for mean average precision. For a single query, take the precision at every position in the ranked list where a correct match appears and average those values; that gives the query's AP. Averaging AP over all queries gives the mAP.
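
To make the arithmetic concrete, here is a toy sketch of how rank-k (the CMC curve) and AP are typically computed for a single query. It is a simplified illustration: real evaluation code (e.g. the official Market-1501 script) additionally removes gallery images that share both the query's ID and camera, which is skipped here.

```python
def single_query_metrics(ranked_match_flags, max_rank=5):
    """ranked_match_flags: the gallery sorted by similarity to the query,
    with 1 where the gallery image has the same identity and 0 otherwise."""
    # CMC: entry k (0-indexed) is 1 if a correct match appears within the top k+1.
    first_hit = ranked_match_flags.index(1)          # position of the first correct match
    cmc = [1 if k >= first_hit else 0 for k in range(max_rank)]

    # AP: average the precision taken at each position holding a correct match.
    hits, precisions = 0, []
    for i, flag in enumerate(ranked_match_flags, start=1):
        if flag:
            hits += 1
            precisions.append(hits / i)
    ap = sum(precisions) / len(precisions)
    return cmc, ap

# Example: the correct matches sit at ranks 1 and 4 in a gallery of 5.
cmc, ap = single_query_metrics([1, 0, 0, 1, 0])
print(cmc)  # [1, 1, 1, 1, 1]  -> this query scores 1 for rank-1 through rank-5
print(ap)   # 0.75, i.e. (1/1 + 2/4) / 2
# mAP is the mean of AP over all queries; the reported rank-k accuracy is the
# mean of the k-th CMC entry over all queries.
```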

I came across a figure that illustrates this very nicely and is worth looking up.

3. Closing remarks

A few final words on Re-ID: industrial deployment is only just beginning, and many practical problems remain to be solved - occlusion, cross-domain gaps, and results that depend too heavily on clothing features, to name a few. I hope that people entering this field will do more work aimed at real-world scenarios; Re-ID still has a long way to go. Good luck to us all.
