READING NOTE: Convolutional Pose Machines

TITLE: Convolutional Pose Machines

AUTHOR: Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh

ASSOCIATION: CMU

FROM: arXiv:1602.00134

CONTRIBUTIONS

  1. learning implicit spatial models via a sequential composition of convolutional architectures
  2. a systematic approach to designing and training such an architecture to learn both image features and image-dependent spatial models for structured prediction tasks, without the need for any graphical model style inference.

METHOD

The following figure compares the traditional Pose Machine with the Convolutional Pose Machine (CPM).

Pose Machines

A pose machine consists of a sequence of multi-class predictors, $g_t(\cdot)$, that are trained to predict the location of each part in each level of the hierarchy. In each stage $t \in \{1, \dots, T\}$, the classifiers $g_t$ predict beliefs for assigning a location to each part $Y_p = z, \forall z \in \mathcal{Z}$, where $\mathcal{Z}$ is the set of all locations in an image.
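To make the notion of a belief concrete, here is a minimal Python sketch, assuming a belief map for one part is stored as a dict from grid locations $z \in \mathcal{Z}$ to scores. Normalizing the scores to sum to 1 and reading off the part location by argmax are illustrative choices, and the helper names (`all_locations`, `normalize`, `predict_location`) are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]          # z = (row, col)
BeliefMap = Dict[Location, float]   # b^p(z) for one part p


def all_locations(height: int, width: int) -> List[Location]:
    """Enumerate Z, the set of all locations in an H x W image."""
    return [(r, c) for r in range(height) for c in range(width)]


def normalize(scores: BeliefMap) -> BeliefMap:
    """Rescale non-negative scores so they sum to 1 over Z (illustrative choice)."""
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}


def predict_location(belief: BeliefMap) -> Location:
    """Assign part p to the location z with the highest belief."""
    return max(belief, key=belief.get)


# Usage: a 3 x 4 image where one location stands out for part p.
Z = all_locations(3, 4)
raw_scores = {z: 1.0 for z in Z}
raw_scores[(1, 2)] = 5.0
belief = normalize(raw_scores)
print(predict_location(belief))  # prints (1, 2)
```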

As illustrated in panels (a) and (b) of the figure, the image is first fed to Stage $1$, which predicts a belief map. The belief map is then combined with the image features $x'$ and sent to the next stage. The procedure repeats, and the final result is predicted by the last stage, Stage $T$.
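The stage-by-stage procedure can be sketched as a simple loop. This is a minimal sketch assuming belief maps over a 1-D list of locations; `toy_stage_one`, `toy_refine_stage`, and the context function `psi` (which averages neighbouring beliefs) are hypothetical stand-ins for the trained predictors $g_t$, chosen only to show how each stage consumes image features plus the previous stage's beliefs. For brevity the same feature vector is reused for every stage, whereas a pose machine may use different features for Stage 1 and for later stages.

```python
from typing import Callable, List, Sequence

Belief = List[float]  # one score per location z (1-D for brevity)


def psi(prev: Belief) -> Belief:
    """Hypothetical context feature: mean belief over each location's neighbourhood."""
    n = len(prev)
    out = []
    for i in range(n):
        window = prev[max(0, i - 1): min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def toy_stage_one(x: Sequence[float]) -> Belief:
    """Hypothetical g_1: scores locations from image features alone."""
    return list(x)


def toy_refine_stage(x: Sequence[float], prev: Belief) -> Belief:
    """Hypothetical g_t (t >= 2): mixes image features with context from b_{t-1}."""
    return [0.5 * xi + 0.5 * ci for xi, ci in zip(x, psi(prev))]


def run_pose_machine(
    x: Sequence[float],
    stage_one: Callable[[Sequence[float]], Belief],
    later_stages: List[Callable[[Sequence[float], Belief], Belief]],
) -> List[Belief]:
    """Run stage 1, then feed (features, previous belief) to each later stage."""
    beliefs = [stage_one(x)]
    for g_t in later_stages:
        beliefs.append(g_t(x, beliefs[-1]))
    return beliefs  # beliefs[-1] is the output of the last stage T


features = [0.0, 1.0, 0.0]
stages = run_pose_machine(features, toy_stage_one, [toy_refine_stage] * 2)
final = stages[-1]
print(len(stages), final.index(max(final)))  # prints 3 1
```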

Convolutional Pose Machines

A convolutional neural network naturally forms a sequence of stages if multiple predictors and losses are inserted at its intermediate layers. Panels (c) and (d) of the figure illustrate a convolutional pose machine. The sub-network in (c) plays the role of the first stage. The shared network at the top-left of (d) extracts the image features $x'$, which are combined with the output of Stage $t-1$ and sent to Stage $t$. In addition, the receptive field of the stacked convolutional layers grows with depth, so later stages take more contextual information into account, which helps refine the output.
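The claim that stacking stages enlarges the context can be checked with the standard receptive-field recurrence for stacked convolutions and poolings: each layer with kernel size $k$ adds $(k-1)$ times the current spacing between units in input pixels, and that spacing is multiplied by the layer's stride. The layer lists `BACKBONE` and `STAGE` below are hypothetical, not the paper's exact architecture, and the sketch treats the path through the belief maps as a single chain of layers.

```python
from typing import List, Tuple

Layer = Tuple[int, int]  # (kernel_size, stride)


def receptive_field(layers: List[Layer]) -> int:
    """Receptive field, in input pixels, of one output unit after `layers`."""
    rf, jump = 1, 1  # jump: spacing in input pixels between adjacent units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical configuration, NOT the exact CPM architecture:
# (two 3x3 convs + 2x2 pool) twice, then five 7x7 convs per stage.
BACKBONE: List[Layer] = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
STAGE: List[Layer] = [(7, 1)] * 5


def receptive_field_after_stages(num_stages: int) -> int:
    """Receptive field at the output of stage `num_stages` along the belief path."""
    return receptive_field(BACKBONE + STAGE * num_stages)


for t in range(1, 4):
    print(f"stage {t}: {receptive_field_after_stages(t)} px")
# prints 136, 256 and 376 px for stages 1, 2 and 3
```

In this toy configuration every extra stage adds the same 120 pixels, because all stage layers operate at a spacing of 4 input pixels (each 7x7 layer adds 6 x 4 = 24). That steady growth is the mechanism by which later stages see more context.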

During training, every stage has its own loss function for predicting the parts. These intermediate losses work similarly to the auxiliary classifiers in GoogLeNet and help alleviate the vanishing-gradient problem. The network can be trained end-to-end, and compared with the traditional pose machine, a CPM is much easier to train. A visualization of the network can be found here.
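Intermediate supervision amounts to summing one loss per stage, so every stage receives a gradient directly from its own loss instead of only through the stages that follow it. The sketch below assumes an L2 (sum-of-squares) loss between each stage's predicted belief maps and Gaussian target maps centred on the ground-truth part locations; the note itself only says that each stage has its own loss, so the loss form, `sigma`, and the helper names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # H x W belief map for one part


def gaussian_target(height: int, width: int, center: Tuple[int, int], sigma: float = 1.0) -> Grid:
    """Assumed target belief map: a Gaussian peak at the ground-truth location."""
    cy, cx = center
    return [
        [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2 * sigma ** 2)) for c in range(width)]
        for r in range(height)
    ]


def stage_loss(pred: List[Grid], target: List[Grid]) -> float:
    """Loss of one stage: squared error summed over parts p and locations z."""
    return sum(
        (pv - tv) ** 2
        for p_map, t_map in zip(pred, target)
        for p_row, t_row in zip(p_map, t_map)
        for pv, tv in zip(p_row, t_row)
    )


def total_loss(stage_preds: List[List[Grid]], target: List[Grid]) -> float:
    """Intermediate supervision: add up the losses of all stages t = 1..T."""
    return sum(stage_loss(pred, target) for pred in stage_preds)


# Usage: one part at (1, 1) on a 3 x 3 grid; stage 1 predicts all zeros,
# stage 2 predicts the target exactly, so only stage 1 contributes.
target = [gaussian_target(3, 3, (1, 1))]
zeros = [[[0.0] * 3 for _ in range(3)]]
print(total_loss([zeros, target], target))
```

Because `total_loss` is a plain sum, the gradient with respect to an early stage's output always includes that stage's own loss term directly, which is the sense in which per-stage losses counter vanishing gradients.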
