Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation (2017)

qq_43452156

Category: human pose estimation

Article link: https://blog.csdn.net/qq_43452156/article/details/104389118

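This paper proposes Adversarial PoseNet, a structure-aware convolutional network that takes the geometric constraints among human body joints into account during training. By introducing generative adversarial networks (GANs), comprising a multi-task generator network, a pose discriminator, and a confidence discriminator, the model aims to implicitly learn priors about human body structure. The method estimates human pose better under occlusion and in cluttered backgrounds.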

Abstract

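1. Proposes a novel structure-aware convolutional network that implicitly takes prior knowledge (the geometric constraints among human body joints) into account while the deep network is trained.
2. The pose and occlusion heatmaps predicted by the generator are fed to the discriminator, which predicts how similar they are to real poses.
3. A Conditional Generative Adversarial Networks (GANs) training strategy.
Goal: If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors.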
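
For orientation, the generic conditional-GAN objective can be written as below. This is the textbook form, not necessarily the exact loss used in the paper; here $D$ stands for any discriminator that sees the image $x$ together with either real heatmaps $y$ or generated heatmaps $G(x)$.

$\min_G \max_D \; \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right]$

When $D$ can no longer separate $G(x)$ from real $y$, the second term stops penalizing $G$, which matches the goal stated above.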

Introduction

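1. Briefly introduces the importance and applications of human pose estimation and summarizes the main research difficulties.
2. Reviews the use of DCNNs [30, 29, 31, 6, 33, 19, 4] in this field, along with their ideas, strengths, and weaknesses. Weakness: it is hard to regress accurate heatmaps when body parts are occluded or when the background resembles body parts.
3. Describes the strengths of human vision and explains why prior knowledge of human body structure cannot easily be built into DCNNs [30].
4. Proposes an idea for fixing the implausible human poses that DCNNs produce: take prior knowledge of the body-joint structure into account, i.e., learn the real body joints distribution from a large amount of training data.
5. Learning the priors directly is difficult, so they are learned implicitly. Idea: we have a 'discriminator' which can tell whether the predicted pose is geometrically reasonable. Criterion: If the DCNN regressor is able to 'deceive' the 'discriminator' that its predictions are all reasonable, the network would have successfully learned the priors of the human body structure. Inspired by GANs [23, 38, 26, 11, 8]: design the 'discriminator' as the discriminator network while the regression network functions as a generative network. Training strategy: training the generator in the adversarial manner against the discriminator exactly meets our intention.
6. Concretely: the discriminator should be fed informative input in order to classify, and the generator should be able to model complex features. Hence a multi-task learning network $G$ is designed that outputs pose heatmaps and occlusion heatmaps simultaneously, and a pose discriminator ($P$) then judges whether the body configuration is plausible.
7. Preliminary results show that correct locations correspond to highly confident heatmaps. A second discriminator, the confidence discriminator $C$, is therefore designed to assess the confidence of the pose heatmaps. The generator is asked to "fool" both the pose and confidence discriminators by training $G$ and { $P, C$ } in the generative adversarial manner. Thus, the human body structure is implied in the $P$ net by guiding $G$ to the direction that is close to ground-truth heatmaps and satisfies joint-connectivity constraints of the human body. The learned $G$ net is expected to be more robust to occlusions and cluttered backgrounds where the precise description for different body parts are required. The purpose of adding the discriminators is to further improve network performance and refine the predictions.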
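
The idea of asking $G$ to "fool" both $P$ and $C$ can be sketched as a loss computation. This is a minimal illustration under stated assumptions, not the paper's exact formulation: each discriminator is assumed to return a single probability that its input is "real", and the names `bce`, `generator_loss`, `discriminator_loss`, `alpha`, and `beta` are hypothetical.

```python
import math


def bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def generator_loss(regression_loss, p_score, c_score, alpha=1.0, beta=1.0):
    """Heatmap regression loss plus the cost of failing to fool P and C.

    p_score and c_score are the probabilities that P and C assign to
    "real" for the generated heatmaps (assumed scalar here).
    alpha and beta are hypothetical weighting factors.
    """
    return regression_loss + alpha * bce(p_score, 1) + beta * bce(c_score, 1)


def discriminator_loss(score_on_real, score_on_generated):
    """A discriminator wants "real" on ground truth and "fake" on generated maps."""
    return bce(score_on_real, 1) + bce(score_on_generated, 0)


# The generator's loss drops as the discriminators are fooled more often.
fooled = generator_loss(0.5, p_score=0.9, c_score=0.9)
caught = generator_loss(0.5, p_score=0.1, c_score=0.1)
```

In this sketch the regression term keeps $G$ close to the ground-truth heatmaps, while the two adversarial terms push it toward outputs that $P$ and $C$ accept as plausible.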

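Main Contributions

1. Presents the main contributions and their results.
2. Two experiments: (1) using the GAN directly; (2) designing a stacked multi-task network.
3. Evaluates the experimental results.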

Related Work

Our work is closely related to work using heatmap based DCNN methods for human pose estimation and Generative Adversarial Networks.

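1. Overview of Human Pose Estimation:

This work is mainly related to methods that perform human pose estimation by regressing heatmaps from images [34, 19, 29, 33, 6, 22, 14, 30]. The $G$ network is a fully convolutional multi-task network with a "convolution-deconvolution" structure, so the features of the two tasks are concatenated to form the features of the second stacked network.

2. Generative Adversarial Network:

Human pose estimation can be viewed as a translation from an RGB image to multi-channel heatmaps, which the $G$ network can perform. The $P$ network, however, must not only distinguish real samples from fake ones but also account for geometric constraints. (This is why a training strategy different from that of a conventional GAN is adopted.)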

2 The Proposed Adversarial PoseNet

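The network consists of three parts: the pose generator network $G$, the pose discriminator network $P$, and the confidence discriminator $C$. The $G$ network is a bottom-up and top-down network whose input is an RGB image and whose output is 32 heatmaps. Sixteen of these heatmaps are the pose predictions for the 16 body keypoints; the other half are the corresponding occlusion predictions. The values in each heatmap are confidence scores in the range of [0,1] where a Gaussian blur is done around the ground truth position. When the $G$ network is used alone and updates itself only through forward and backward propagation, it may produce low-confidence and incorrect pose estimates. The $C$ and $P$ networks are therefore added to improve prediction accuracy.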
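
The heatmap format described above can be sketched in plain Python. This is an illustrative sketch under assumptions: the 64x64 resolution, the `sigma` value, and the channel order (pose maps first, occlusion maps second) are not stated in the text, and the helper names are hypothetical.

```python
import math


def gaussian_heatmap(height, width, center_xy, sigma=1.0):
    """Build a height x width map with values in [0, 1], peaking at center_xy."""
    cx, cy = center_xy
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def split_generator_output(heatmaps, num_joints=16):
    """Split 2 * num_joints maps into (pose maps, occlusion maps).

    Assumes the pose maps come first; the real channel order may differ.
    """
    if len(heatmaps) != 2 * num_joints:
        raise ValueError("expected %d heatmaps" % (2 * num_joints))
    return heatmaps[:num_joints], heatmaps[num_joints:]


# One ground-truth style map for a joint at x=20, y=30 (assumed resolution).
target = gaussian_heatmap(64, 64, (20, 30), sigma=1.5)

# A stand-in for the 32-channel generator output, reusing the same map.
fake_output = [target] * 32
pose_maps, occlusion_maps = split_generator_output(fake_output)
```

The peak of each map sits at the annotated joint position and decays smoothly, which is what lets a heatmap value be read as a confidence score.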

2.1 Multi-Task Generative Network

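1. Whether a body part is occluded is important for inferring the geometric information of a human pose. A multi-task generative network is therefore proposed to combine pose estimation and occlusion prediction effectively.
2. Goal of the multi-task generative network:
Learn a function $g$ that maps an image $x$ to the corresponding pose heatmaps $y$ and occlusion heatmaps $z$, i.e.,
$g(x)=\{\hat y,\hat z\}$
where $\hat y$ and $\hat z$ are the predicted heatmaps.
3. As noted in [33], large contextual information is important for localizing body parts, i.e., the contextual region (receptive field) of a neuron should be large. An "encoder-decoder" structure is used to achieve this.
4. Local evidence is useful for identifying features such as faces and hands, whereas human pose estimation requires a coherent understanding of the whole body image. To capture this information at each scale, skip connections are added between the mirrored layers of the encoder and decoder. Inspired by [19], our network is also stacked to provide the network with a mechanism for re-evaluation of initial estimates and features across the entire image. In each module of the G net, a residual block [12] is used for the convolution operator. Given the original image $x$, the basic block of the stacked multi-task generator network is expressed as: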
$\begin{cases} \{Y_n, Z_n, X\} = g_n(Y_{n-1}, Z_{n-1}, X), & \text{if } n \ge 2 \\ \{Y_n, Z_n, X\} = g_n(X), & \text{if } n = 1 \end{cases}$
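
The recurrence above can be read as a simple loop over stacked modules. The sketch below is a toy illustration, not the actual network: `run_stacked_generator`, `first_module`, and `refine_module` are hypothetical names, and scalars stand in for feature maps and heatmaps.

```python
def run_stacked_generator(modules, features):
    """Apply g_1 ... g_N in order and return every (Y_n, Z_n) pair.

    modules[0] takes only the features X; each later module takes the
    previous pose and occlusion outputs together with X.
    """
    outputs = []
    pose, occlusion = None, None
    for n, module in enumerate(modules, start=1):
        if n == 1:
            pose, occlusion = module(features)                   # g_1(X)
        else:
            pose, occlusion = module(pose, occlusion, features)  # g_n(Y_{n-1}, Z_{n-1}, X)
        outputs.append((pose, occlusion))
    return outputs


# Toy modules on scalars, purely to make the data flow visible.
def first_module(x):
    return 0.5 * x, 0.25 * x


def refine_module(prev_pose, prev_occlusion, x):
    return 0.5 * (prev_pose + x), 0.5 * (prev_occlusion + x)


stages = run_stacked_generator([first_module, refine_module], 1.0)
```

Returning the output of every stage, rather than only the last one, makes it possible to supervise each stack separately, a common choice in stacked architectures; whether the paper does so is not stated in this section.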