Deep Learning Models and Code for Pose Estimation

The task of pose estimation aims to map human pixels of an RGB image or video to the 3D surface of the human body. Pose estimation is a multifaceted task that draws on several other problems: object detection, keypoint localization, segmentation, and more.

Applications of pose estimation include problems that require going beyond plain landmark localization, such as graphics, augmented reality (AR), or human-computer interaction (HCI). Pose estimation also involves many aspects of 3D-based object recognition.

In this post, we share several open-source deep learning models and code for pose estimation. If we missed an implementation that you think deserves to be shared, leave it in the comments below.


DensePose

 

GitHub | Dataset | Paper

The inspiration for this post came from Facebook Research, who released their code, models, and dataset for DensePose last week. Facebook shared DensePose-COCO, a large-scale ground-truth dataset for human pose estimation. The dataset consists of image-to-surface correspondences manually annotated on 50K COCO (Common Objects in Context) images. This is an amazingly comprehensive resource for deep learning researchers, and it provides a good source of data for tasks such as pose estimation, part segmentation, and more.
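To give a feel for the annotations, here is a short, hedged sketch that browses them with the standard pycocotools API. The annotation file name and the DensePose-specific fields such as 'dp_U' and 'dp_V' are assumptions based on the dataset description, so verify the exact key names against the released files.

```python
# Sketch: browsing DensePose-COCO annotations with the standard COCO API.
# The file name and the 'dp_U' field are assumptions -- check the released
# annotation files for the exact keys.
from pycocotools.coco import COCO

coco = COCO("densepose_coco_2014_train.json")    # hypothetical local annotation file
ann_ids = coco.getAnnIds(imgIds=coco.getImgIds()[:1])
for ann in coco.loadAnns(ann_ids):
    if "dp_U" in ann:                            # person with dense surface annotations
        print(ann["image_id"], len(ann["dp_U"]), "annotated surface points")
```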

The DensePose paper proposes DensePose-RCNN, a variant of Mask-RCNN, to densely regress part-specific UV coordinates within every human region at multiple frames per second. It is based on DenseReg. The goal of the model is to determine, for each human pixel, which body part it belongs to and its 2D (U, V) coordinates in that part's surface parameterization.

DensePose adopts the architecture of Mask-RCNN with Feature Pyramid Network (FPN) features and ROI-Align pooling. Additionally, the authors introduce a fully convolutional network on top of the ROI-pooled features. For more in-depth technical details, check out the DensePose paper.
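For intuition, here is a minimal sketch in PyTorch of what such a fully convolutional head could look like: it classifies each pixel of an ROI into a body part and regresses per-part U and V coordinates. The layer sizes and the 24-part count are assumptions for illustration, not Facebook's exact configuration.

```python
# Minimal sketch (not Facebook's implementation) of a DensePose-style head:
# per-pixel body-part classification plus per-part (U, V) surface regression.
import torch
import torch.nn as nn

class DenseUVHead(nn.Module):
    def __init__(self, in_channels=256, num_parts=24, hidden=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.part_logits = nn.Conv2d(hidden, num_parts + 1, 1)  # background + body parts
        self.u_coords = nn.Conv2d(hidden, num_parts, 1)         # per-part U map
        self.v_coords = nn.Conv2d(hidden, num_parts, 1)         # per-part V map

    def forward(self, roi_features):                            # (N, C, H, W) ROI-aligned features
        x = self.body(roi_features)
        return self.part_logits(x), self.u_coords(x), self.v_coords(x)

head = DenseUVHead()
parts, u, v = head(torch.randn(2, 256, 14, 14))
print(parts.shape, u.shape, v.shape)
```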

 

OpenPose

 

GitHub | Dataset

OpenPose is a real-time multi-person keypoint detection library for body, face, and hand estimation from the CMU Perceptual Computing Lab.

OpenPose provides both 2D and 3D multi-person keypoint detection, as well as a calibration toolbox for estimating domain-specific parameters. OpenPose accepts a wide variety of inputs: image, video, webcam, IP camera, and more. It also produces output in a wide variety of formats: images with keypoints drawn (PNG, JPG, AVI), keypoints saved in readable formats (JSON, XML, YML), and even an array class. Input and output parameters are also adjustable to suit a wide variety of needs.
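As an example of consuming that output, the short sketch below parses a per-frame JSON keypoint file. The "people" and "pose_keypoints_2d" keys reflect recent OpenPose releases (older versions use "pose_keypoints"), so treat the exact key names as an assumption to check against your installed version.

```python
# Sketch: reading OpenPose's per-frame JSON keypoint output.
import json

def load_openpose_keypoints(path):
    with open(path) as f:
        data = json.load(f)
    poses = []
    for person in data.get("people", []):
        # Key name differs between OpenPose versions; fall back to the older one.
        flat = person.get("pose_keypoints_2d", person.get("pose_keypoints", []))
        # Keypoints are stored as a flat [x0, y0, c0, x1, y1, c1, ...] list.
        poses.append([(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)])
    return poses
```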

OpenPose provides a C++ API and works on both CPU and GPU, including versions compatible with AMD graphics cards.

 

Realtime Multi-Person Pose Estimation

 

GitHub | Paper

This implementation is closely related to OpenPose, and it provides models for the approach in a wide variety of frameworks. The authors present a bottom-up approach for real-time multi-person pose estimation that does not rely on any person detector.

This approach uses a nonparametric representation, which the authors call Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. For more technical details about the implementation and theory, refer to the paper.
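To make the idea concrete, the following simplified sketch (not the authors' code) shows one way a PAF can score a candidate connection between two detected joints: sample the field along the segment joining them and average its alignment with the segment direction.

```python
# Simplified sketch of PAF-based scoring for a candidate limb between two joints.
import numpy as np

def paf_connection_score(paf, joint_a, joint_b, num_samples=10):
    """paf: (H, W, 2) vector field for one limb type; joint_a/joint_b: (x, y) pixel coords."""
    joint_a, joint_b = np.asarray(joint_a, float), np.asarray(joint_b, float)
    limb_vec = joint_b - joint_a
    norm = np.linalg.norm(limb_vec)
    if norm < 1e-6:
        return 0.0
    limb_dir = limb_vec / norm
    # Sample points evenly along the segment between the two joints.
    ts = np.linspace(0.0, 1.0, num_samples)
    points = joint_a[None, :] + ts[:, None] * limb_vec[None, :]
    xs = np.clip(points[:, 0].round().astype(int), 0, paf.shape[1] - 1)
    ys = np.clip(points[:, 1].round().astype(int), 0, paf.shape[0] - 1)
    vectors = paf[ys, xs]                      # (num_samples, 2) field values at sampled points
    return float(np.mean(vectors @ limb_dir))  # average alignment with the limb direction

# Higher scores mean the field supports connecting these two joints into one person.
```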

One of the best features of this approach is that it has been implemented in many different frameworks, so code and models are readily available for your framework of choice.

 

AlphaPose


GitHub | Paper

AlphaPose is an accurate multi-person pose estimator, and its authors describe it as the first open-source system to reach its level of accuracy on standard benchmarks. AlphaPose performs both pose estimation and pose tracking on images, videos, or lists of images. It produces a variety of outputs, including rendered keypoint visualizations in PNG, JPEG, and AVI formats, as well as keypoint output in JSON format, making it a great tool for more application-focused uses.

At present, there is both a TensorFlow implementation and a PyTorch implementation.

AlphaPose uses a regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. There are three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). For more technical details, refer to the paper.
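As a rough illustration of the non-maximum-suppression idea, the sketch below greedily keeps the highest-scoring pose and drops near-duplicate candidates based on keypoint distance. The distance function and threshold are placeholders for illustration, not AlphaPose's parametric formulation.

```python
# Simplified sketch of pose-level NMS: keep the best pose, drop near-duplicates.
import numpy as np

def pose_distance(pose_a, pose_b):
    """pose_*: (K, 2) arrays of keypoint coordinates; mean per-joint Euclidean distance."""
    return float(np.mean(np.linalg.norm(pose_a - pose_b, axis=1)))

def pose_nms(poses, scores, dist_threshold=20.0):
    order = np.argsort(scores)[::-1]           # candidates sorted by confidence
    kept = []
    for idx in order:
        if all(pose_distance(poses[idx], poses[k]) > dist_threshold for k in kept):
            kept.append(idx)
    return kept                                # indices of the poses to keep
```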

 

Human Body Pose Estimation

 

Website | GitHub | Dataset | ArtTrack Paper | DeeperCut Paper

This code repository provides a TensorFlow implementation of the Human Body Pose Estimation algorithm presented in the ArtTrack and DeeperCut papers. The trained models make use of the MPII Human Pose dataset, a rich collection of images for evaluating articulated human pose estimation.

This project considers the task of articulated pose estimation of multiple people in real-world images. The approach jointly solves the tasks of detection and pose estimation, unlike previous approaches that first detect people and subsequently estimate their body pose. CNN-based part detectors and an integer linear program (ILP) are used in the implementation to detect body-part candidates and group them into individuals. For more technical details, refer to the ArtTrack and DeeperCut papers.
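As a toy illustration of the grouping step, the sketch below sets up a tiny correlation-clustering ILP with the PuLP solver: binary variables decide which part detections belong to the same person, subject to transitivity constraints. The detections and pairwise scores are made up, and the formulation is far simpler than the one in the papers.

```python
# Toy ILP, in the spirit of grouping body-part detections into persons.
import pulp

detections = ["head_0", "neck_0", "head_1", "neck_1"]
# Hypothetical pairwise scores: positive -> likely same person, negative -> likely different.
pair_score = {
    ("head_0", "neck_0"): 2.0, ("head_0", "neck_1"): -1.5,
    ("head_1", "neck_0"): -1.2, ("head_1", "neck_1"): 1.8,
    ("head_0", "head_1"): -3.0, ("neck_0", "neck_1"): -3.0,
}

prob = pulp.LpProblem("pose_grouping", pulp.LpMaximize)
# y[(d, d')] = 1 if detections d and d' are assigned to the same person.
y = {p: pulp.LpVariable(f"y_{p[0]}_{p[1]}", cat="Binary") for p in pair_score}
prob += pulp.lpSum(pair_score[p] * y[p] for p in pair_score)   # maximize agreement

def same(a, b):
    return y.get((a, b)) if (a, b) in y else y.get((b, a))

# Transitivity: if a~b and b~c then a~c, for every triple of distinct detections.
for a in detections:
    for b in detections:
        for c in detections:
            if len({a, b, c}) == 3:
                prob += same(a, b) + same(b, c) - same(a, c) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({p: int(y[p].value()) for p in pair_score})              # 1 = same person
```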

 

DeepPose

 

Paper

DeepPose is a relatively early paper from 2014 that proposes a method for human pose estimation based on Deep Neural Networks (DNNs), formulating pose estimation as a DNN-based regression problem towards body joint coordinates. It reasons about pose in a holistic fashion and has a simple yet powerful formulation.
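To make the regression formulation concrete, here is a minimal sketch: a CNN maps an image to 2K joint coordinates and is trained with an L2 loss against the ground truth. The tiny backbone below is illustrative, not the AlexNet-style network of the original paper.

```python
# Minimal sketch of DNN-based joint-coordinate regression in the spirit of DeepPose.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointRegressor(nn.Module):
    def __init__(self, num_joints=14):
        super().__init__()
        self.num_joints = num_joints
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regressor = nn.Linear(64 * 4 * 4, num_joints * 2)   # (x, y) per joint

    def forward(self, images):                                   # (N, 3, H, W)
        x = self.features(images).flatten(1)
        return self.regressor(x).view(-1, self.num_joints, 2)

model = JointRegressor()
pred = model(torch.randn(8, 3, 224, 224))
loss = F.mse_loss(pred, torch.rand(8, 14, 2))                    # L2 regression loss
```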

DeepPose does not appear to have an official implementation available online. However, there have been several community efforts to replicate its results.

DeepPose is interesting because it was among the first applications of deep learning to human pose estimation; it achieved state-of-the-art results at the time of publication and provided a baseline for many of the more recent implementations.


Pose estimation is an increasingly popular problem within the computer vision community. With the recent release of new pose estimation datasets such as DensePose-COCO by Facebook Research, there now exist more resources for work in this area. In my opinion, there are many directions in which you can take pose estimation, and the release of these resources is sure to spur new interest in the field. Hopefully, we'll see many new and innovative ideas and implementations soon.

Did we miss your favorite model or implementation for pose estimation? Post it in the comments below, and we'll update the post accordingly!
