[Paper Study] Multi-Person Pose Estimation for PoseTrack with Enhanced Part Affinity Fields

Paper: https://posetrack.net/workshops/iccv2017/pdfs/ML_Lab.pdf

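On first reading, this paper is both stunning and admirable: it runs to only 4 pages including references, yet it took first place in the ICCV 2017 Challenge 1: Single-Frame Person Pose Estimation.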

P.S. First, a quick look at what the paper says.

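My biggest takeaway from this paper is that human pose estimation and semantic segmentation are closely related. Some methods [8, 9] can handle both tasks and achieve state-of-the-art performance on the COCO keypoints benchmark ([8] Newell A, Huang Z, Deng J. Associative Embedding: End-to-End Learning for Joint Detection and Grouping. 2016; [9] He K, Gkioxari G, Dollár P, et al. Mask R-CNN. 2017). Pose estimation networks may therefore benefit from algorithms used in semantic segmentation.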

1. INTRODUCTION

Human pose estimation attracts increasing attentions, not only from researchers, but also from many corporations. One of the main applications is to understand human activity and interactions, which is mentioned frequently in existing literature. But now it comes to some specific scenarios. For example, self-driving car companies use it to understand the pedestrian’s action and intention. Elder care robot can detect the fall down events by analyzing user’s body pose. Some companies have already developed a prototype or demo using human pose estimation.

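Translation: Human pose estimation is drawing growing attention from companies as well as researchers. One of its main applications, mentioned often in the literature, is understanding human activity and interaction. It is now reaching specific scenarios too. For example, self-driving car companies use it to understand pedestrians' actions and intentions, and elder-care robots can detect falls by analyzing the user's body pose. Some companies have already built prototypes or demos based on human pose estimation.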

In order to apply this technology to self-driving car or care robot, we need to address some challenging problems, such as human pose estimation for multi-person, due to multi-person interaction and occlusion. PoseTrack [1] dataset provides numerous images clipped from videos. In the images, multi-person interact with each other. This benchmark presents the common scene in daily life, and would act as a persuasive index for algorithm robustness.

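Translation: To apply this technology to self-driving cars or care robots, some hard problems must be solved first, such as multi-person pose estimation, which is difficult because of interaction and occlusion among people. The PoseTrack [1] dataset provides a large number of images clipped from videos, in which multiple people interact with each other. The benchmark reflects common scenes in daily life and serves as a convincing indicator of algorithm robustness.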

In this work, we present an improved approach based on Cao’s [2] framework, which is the champion of COCO 2016 keypoints challenge [3], and discuss some potential weakness of this method. First, to enjoy the benefits of more training data, we pre-train the model on COCO dataset. Second, we extend the original Part Affinity Fields (PAFs) mechanism to redundant PAFs, which reveals an essential defect of PAFs. Third, by rethinking the network structure

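Translation: In this work, we present an improved approach based on the framework of Cao et al. [2] (Cao Z, Simon T, Wei S E, et al. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. 2016), the champion of the COCO 2016 keypoints challenge [3], and discuss some potential weaknesses of that method. First, to benefit from more training data, we pre-train the model on the COCO dataset. Second, we extend the original Part Affinity Fields (PAFs) mechanism to redundant PAFs, which reveals an essential defect of PAFs. Third, by rethinking the network structure itself, we find some minor modifications that lead to significant improvement. The submission was produced with these three modifications.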
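The excerpt names PAFs without describing how they are used. As background, in the framework of Cao et al. [2] a PAF is a 2D vector field for one limb type, and a candidate pair of joints is scored by how well the field aligns with the straight segment between them. The following is a minimal sketch of that scoring idea only, assuming the field is stored as two plain 2D lists. The function name `paf_score`, its parameters, and the sampling scheme are hypothetical choices for illustration; this is not the authors' code and it does not cover the redundant PAFs introduced in this paper.

```python
import math

def paf_score(paf_x, paf_y, joint_a, joint_b, num_samples=10):
    """Average alignment between a PAF and the segment joint_a -> joint_b.

    paf_x, paf_y: 2D lists indexed [row][col] with the field's x and y components.
    joint_a, joint_b: (x, y) pixel coordinates of two candidate joints.
    All names here are illustrative assumptions, not an existing API.
    """
    ax, ay = joint_a
    bx, by = joint_b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0  # coincident joints define no direction
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(num_samples):
        # Sample evenly along the segment, endpoints included.
        t = i / (num_samples - 1) if num_samples > 1 else 0.5
        x = int(round(ax + t * dx))
        y = int(round(ay + t * dy))
        # Dot product of the field vector with the limb direction.
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples

# Toy 5x5 field: row 2 points in +x everywhere, all other cells are zero.
paf_x = [[1.0 if row == 2 else 0.0 for _ in range(5)] for row in range(5)]
paf_y = [[0.0] * 5 for _ in range(5)]

along = paf_score(paf_x, paf_y, (0, 2), (4, 2))     # segment follows the field
across = paf_score(paf_x, paf_y, (2, 0), (2, 4))    # segment is perpendicular
reversed_ = paf_score(paf_x, paf_y, (4, 2), (0, 2)) # segment opposes the field
```

In this toy field the horizontal candidate scores highest, the perpendicular one scores zero, and the reversed one scores negative, which is the property a bottom-up method relies on when deciding which detected joints belong to the same person.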

Additionally, inspired by semantic segmentation, we design some experiments that exploit semantic segmentation framework, such as Deeplab [4] and SDN [5]. We also have tried DenseNet [6] and the hole algorithm. But limited to the deadline, we have not used this framework as the final version. Thus, the submission result is unrelated to these experiments. Although we have not obtained an outperforming result, some conclusions might be helpful for future work of multi-person pose estimation.

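Translation: In addition, inspired by semantic segmentation, we designed experiments that use semantic segmentation frameworks such as DeepLab [4] (Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99):1-1) and SDN [5] (Fu J, Liu J, Wang Y, et al. Stacked Deconvolutional Network for Semantic Segmentation. 2017). We also tried DenseNet [6] (Huang G, Liu Z, Weinberger K Q, et al. Densely Connected Convolutional Networks. 2016) and the hole algorithm. Because of the deadline, we did not use this framework in the final version, so the submitted result is unrelated to these experiments. Although we did not obtain a better result, some of the conclusions may help future work on multi-person pose estimation.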

2. RELATED WORK 

We briefly review the two categories of multi-person pose estimation approaches. Then the advantages and disadvantages of them are discussed.

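Translation: We briefly review the two categories of multi-person pose estimation approaches, then discuss their advantages and disadvantages.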

Most of the multi-person pose estimation approaches can be categorized into top-down approach and bottom-up approach. Top-down approach is the most common method, which uses person detector and performs single person estimation for each individual. Some methods [1, 7] concatenate detector and person estimation in sequence, and others [8, 9] predict person bounding box and joints simultaneously, in a unified network. Bottom-up approach [2, 10] first predicts individual body jo
