Abstract
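1. Proposes a novel structure-aware convolutional network that implicitly takes prior knowledge (the geometric constraints among human body joints) into account while the deep network is trained;
2. The pose and occlusion heatmaps predicted by the generator are fed to the discriminator, which predicts how similar they are to real poses;
3. Uses a conditional Generative Adversarial Network (GAN) training strategy.
Goal: If the pose generator (G) generates results that the discriminator fails to distinguish from real ones, the network successfully learns the priors.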
Introduction
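1. Briefly introduces the importance and applications of human pose estimation and summarizes the main research difficulties;
2. Reviews DCNN-based methods [30,29,31,6,33,19,4] in this field, with their ideas, strengths, and weaknesses. Weakness: it is hard to regress accurate heatmaps when body parts are occluded or when the background resembles body parts.
3. Describes the advantages of human vision and explains why prior knowledge of human body structure cannot easily be built into DCNNs [30].
4. Proposes an idea for preventing DCNNs from producing implausible human poses: take the structural priors of the body joints into account by learning the real body joints distribution from a large amount of training data.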
5. Learning the priors directly is difficult, so they are learned implicitly. Idea: we have a 'discriminator' which can tell whether the predicted pose is geometrically reasonable. Criterion: if the DCNN regressor is able to 'deceive' the 'discriminator' that its predictions are all reasonable, the network would have successfully learned the priors of the human body structure. Inspired by GANs [23,38,26,11,8]: design the 'discriminator' as the discriminator network while the regression network functions as a generative network. Training strategy: training the generator in the adversarial manner against the discriminator exactly meets our intention.
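6. How the idea is realized: the discriminator should be fed informative input so that it can classify, and the generator should be able to model complex features. A multi-task learning network $G$ is therefore designed to output pose heatmaps and occlusion heatmaps simultaneously, after which a pose discriminator ($P$) judges whether the body configuration is plausible.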
7. Preliminary results show that correct locations correspond to highly confident heatmaps. A second discriminator is therefore designed to assess the confidence of the pose heatmaps. The generator is asked to "fool" both the pose and confidence discriminators by training $G$ and $\{P, C\}$ in the generative adversarial manner. Thus, the human body structure is implied in the $P$ net by guiding $G$ to the direction that is close to ground-truth heatmaps and satisfies joint-connectivity constraints of the human body. The learned $G$ net is expected to be more robust to occlusions and cluttered backgrounds, where precise descriptions of different body parts are required. The purpose of adding the discriminators is to further improve network performance and refine the predictions.
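The "fool both discriminators" idea in item 7 can be sketched with the standard GAN cross-entropy objective. This is only an illustration under assumptions: these notes do not give the paper's exact loss, and the names `bce`, `discriminator_loss`, and `generator_adversarial_loss` are hypothetical. Each discriminator is assumed to output a single score in (0, 1), where 1 means "looks real".

```python
import math

def bce(score, target, eps=1e-12):
    """Binary cross-entropy for one discriminator score in (0, 1)."""
    score = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(score) + (1.0 - target) * math.log(1.0 - score))

def discriminator_loss(score_real, score_fake):
    """A discriminator wants real heatmaps scored 1 and generated ones scored 0."""
    return bce(score_real, 1.0) + bce(score_fake, 0.0)

def generator_adversarial_loss(p_score_fake, c_score_fake):
    """The generator wants both P and C to score its heatmaps as real (1)."""
    return bce(p_score_fake, 1.0) + bce(c_score_fake, 1.0)

# The generator's loss drops as the discriminators become more convinced.
loss_fooled = generator_adversarial_loss(0.9, 0.9)
loss_caught = generator_adversarial_loss(0.1, 0.1)
```

In a full training loop this adversarial term would typically be added to an ordinary heatmap regression loss; how the two are weighted is not stated in these notes.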
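Main Contributions
1. Presents the main contributions and their results
2. Two experiments: (1) using the GAN directly; (2) designing a cascaded multi-task network
3. Evaluates the experimental results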
Related Work
Our work is closely related to work using heatmap based DCNN methods for human pose estimation and Generative Adversarial Networks.
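1. Overview of Human Pose Estimation:
This work is mainly related to human pose estimation by regressing heatmaps from images [34,19,29,33,6,22,14,30]. The $G$ network is a fully convolutional multi-task network with a "conv-deconv" structure, and the features of the two tasks are concatenated to form the input features of the second stacked network.
2. Generative Adversarial Network:
Human pose estimation can be viewed as a translation from an RGB image to multi-channel heatmaps, which the $G$ network can perform. The $P$ network must not only distinguish real samples from fake ones but also incorporate geometric constraints. (This is why the training strategy differs from that of a conventional GAN.)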
2 The Proposed Adversarial PoseNet
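The network consists of three parts: the pose generator network $G$, the pose discriminator network $P$, and the confidence discriminator $C$. The $G$ network is a bottom-up and top-down network whose input is an RGB image and whose output is 32 heatmaps. Sixteen of these heatmaps are the pose predictions for the 16 body keypoints, and the other half are the corresponding occlusion predictions. The values in each heatmap are confidence scores in the range of [0,1], where a Gaussian blur is done around the ground truth position. When the $G$ network is used alone and updates itself only through forward and backward propagation, it may produce low-confidence and incorrect pose estimates. The $C$ and $P$ networks are therefore added to improve prediction accuracy.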
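As an illustration of the heatmap targets described above, the following minimal sketch builds one heatmap whose values lie in [0, 1] and peak at the ground-truth joint position. The function name `gaussian_heatmap`, the grid size, and the `sigma` value are assumptions made for this example; the notes do not state the resolution or the blur width used in the paper.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Return a height x width grid of confidence scores in [0, 1].

    The score is 1.0 at the ground-truth joint position (cx, cy) and
    decays as an unnormalized Gaussian with standard deviation sigma.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

# One 8x8 target heatmap for a joint located at column 3, row 5.
heatmap = gaussian_heatmap(height=8, width=8, cx=3, cy=5, sigma=1.5)
peak = max(max(row) for row in heatmap)
```

A real pipeline would usually build these targets with an array library rather than nested lists; plain Python is used here only to keep the sketch dependency-free.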
2.1 Multi-Task Generative Network
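1. Whether a body part is occluded is important geometric information for inferring the human pose. A multi-task generative network is therefore proposed to combine pose estimation and occlusion prediction effectively.
2. Goal of the multi-task generative network:
Learn a function $g$ that maps an image $x$ to the corresponding pose heatmaps $y$ and occlusion heatmaps $z$, i.e.
$$g(x)=\{\hat y,\hat z\}$$
where $\hat y$ and $\hat z$ are the predicted heatmaps.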
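To make the two-output mapping concrete, here is a small sketch that splits a generator output of 32 heatmaps into 16 pose heatmaps and 16 occlusion heatmaps. The channel ordering (pose first, occlusion second) and the helper name `split_generator_output` are assumptions for illustration; the notes only state that half of the heatmaps belong to each task.

```python
NUM_JOINTS = 16  # the notes mention 16 body keypoints

def split_generator_output(heatmaps, num_joints=NUM_JOINTS):
    """Split 2 * num_joints heatmaps into (y_hat, z_hat).

    Assumes the pose heatmaps come first and the occlusion heatmaps
    second; this ordering is an illustrative convention only.
    """
    if len(heatmaps) != 2 * num_joints:
        raise ValueError(
            f"expected {2 * num_joints} heatmaps, got {len(heatmaps)}"
        )
    return heatmaps[:num_joints], heatmaps[num_joints:]

# Dummy generator output: 32 heatmaps of size 4x4 filled with zeros.
dummy_output = [[[0.0] * 4 for _ in range(4)] for _ in range(32)]
y_hat, z_hat = split_generator_output(dummy_output)
```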
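3. As shown in [33], a large amount of contextual information is important for localizing body parts; that is, the contextual region of a neuron (its receptive field) should be large. An "encoder-decoder" structure is used to achieve this.
4. Local evidence is useful for identifying features such as faces and hands, whereas human pose estimation requires a coherent understanding of the whole body image. To capture this information at each scale, skip connections are added between the mirrored layers of the encoder and decoder. Inspired by [19], our network is also stacked to provide the network with a mechanism for re-evaluation of initial estimates and features across the entire image. In each module of the $G$ net, a residual block [12] is used for the convolution operator. Given the original image $x$, the basic block of the stacked multi-task generator network is expressed as: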
$$
\begin{cases}
\{Y_n,Z_n,X\}=g_n(Y_{n-1},Z_{n-1},X), & \text{if } n\ge 2\\
\{Y_n,Z_n,X\}=g_n(X), & \text{if } n=1
\end{cases}
$$
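The stacking recurrence can be sketched as a simple loop: the first stack sees only $X$, and every later stack also receives the previous pose and occlusion outputs. This is a hedged sketch, not the paper's implementation. The function `run_stacked_generator` and the toy stage functions are hypothetical stand-ins for the real networks $g_n$, and $X$ is passed through unchanged.

```python
def run_stacked_generator(stages, x):
    """Apply g_1, ..., g_N following the stacking recurrence.

    stages[0] is called as g_1(x); every later stage is called as
    g_n(y_prev, z_prev, x). Each stage returns a pair (y_n, z_n).
    Returns the list of (y_n, z_n) pairs produced by all stacks.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    y, z = stages[0](x)
    outputs = [(y, z)]
    for g_n in stages[1:]:
        y, z = g_n(y, z, x)
        outputs.append((y, z))
    return outputs

# Toy stand-ins operating on numbers instead of feature maps.
def first_stage(x):
    return x + 1, x + 2

def later_stage(y_prev, z_prev, x):
    return y_prev + x, z_prev + x

outputs = run_stacked_generator([first_stage, later_stage, later_stage], x=1)
```

Keeping the outputs of every stack, rather than only the last one, leaves room for supervising intermediate predictions, which is common in stacked designs.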