Real-Time Human Pose Recognition in Parts from Single Depth Images (Abstract + Overview)

Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton      Andrew Fitzgibbon       Mat Cook       Toby Sharp      Mark Finocchio      Richard Moore      Alex Kipman      Andrew Blake

Microsoft Research Cambridge & Xbox Incubation

Abstract (translated)

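We propose a new method that quickly and accurately predicts the 3D positions of body joints from a single depth image, without using temporal information. We take an object recognition approach, designing an intermediate body-part representation that maps the difficult pose estimation problem onto a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariantly to pose, body shape, clothing, and so on (translator's note: with a very large training set, the classifier can recognize body parts reliably across different poses, body shapes, and clothing). Finally, we reproject the classification result and find local modes to generate confidence-scored 3D proposals for several body joints.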

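The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets and investigates the effect of several training parameters. Compared with related work, we achieve state-of-the-art accuracy and demonstrate improved generalization over exact whole-skeleton nearest-neighbor matching.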

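Overview (translated)
Human body tracking has applications in areas such as human-computer interaction, telepresence, and health care. It has advanced considerably in recent years thanks to improvements in real-time depth cameras. Kinect makes it possible on consumer hardware at interactive frame rates.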

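Other systems achieve high speeds by tracking from frame to frame, but they struggle to re-initialize quickly and are therefore not robust. The approach presented in the paper uses per-frame initialization and recovery to avoid the obstacles those systems face.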

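The per-frame algorithm performs pose recognition in parts: it divides the human body into parts, performs a per-pixel classification task, and then finds candidate 3D positions for the skeletal joints. Figure 1 below shows the three main stages of the algorithm, from the input depth image to the final 3D joint proposals.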


Abstract
We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.
The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
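
As a rough illustration of the last step described in the abstract, the sketch below finds a local mode among 3D points that are assumed to have already been reprojected from the depth image and weighted by a classifier. The abstract only says "finding local modes"; the choice of mean shift with a Gaussian kernel, the bandwidth value, the toy points and weights, and the function name mean_shift_mode are all assumptions made for this example, not details taken from the text above.

```python
import math

def mean_shift_mode(points, weights, start, bandwidth=0.1, max_iters=100, tol=1e-9):
    """Climb from `start` to a nearby local mode of a weighted Gaussian kernel density.

    points  : list of (x, y, z) tuples, assumed to be reprojected pixels
    weights : per-point weights (hypothetical classifier probabilities)
    Returns (mode, confidence), where confidence is the kernel-weighted
    sum of the weights evaluated at the mode.
    """
    x = tuple(start)
    for _ in range(max_iters):
        total = 0.0
        acc = [0.0, 0.0, 0.0]
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, x))
            k = w * math.exp(-d2 / (2.0 * bandwidth ** 2))
            total += k
            for i in range(3):
                acc[i] += k * p[i]
        if total == 0.0:  # start is too far from every point to feel any pull
            break
        new_x = tuple(a / total for a in acc)
        moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_x, x)))
        x = new_x
        if moved < tol:
            break
    confidence = sum(
        w * math.exp(-sum((a - b) ** 2 for a, b in zip(p, x)) / (2.0 * bandwidth ** 2))
        for p, w in zip(points, weights)
    )
    return x, confidence

# Toy data: four points clustered near (0, 0, 2) plus one low-weight outlier.
toy_points = [(0.0, 0.0, 2.0), (0.02, 0.01, 2.01), (-0.01, 0.02, 1.99),
              (0.01, -0.02, 2.0), (1.0, 1.0, 2.0)]
toy_weights = [0.9, 0.8, 0.9, 0.7, 0.2]

mode, confidence = mean_shift_mode(toy_points, toy_weights, start=toy_points[0])
print(mode, confidence)
```

Starting the climb from different points can yield different modes, each with its own score. In this toy data, a start at the outlier stays at the outlier and returns a much lower confidence than a start inside the cluster, which is one way a proposal list with confidence scores could arise.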

Introduction
Human body tracking has applications in areas such as human-computer interaction, telepresence, and health care, and it has advanced considerably in recent years thanks to improvements in real-time depth cameras. Kinect made this possible on consumer hardware, running at interactive rates.
Other systems achieve high speeds by tracking from frame to frame, but they struggle to re-initialize quickly and are therefore not robust. The approach presented in the paper uses per-frame initialization and recovery to avoid the obstacles those systems face.
This per-frame algorithm performs pose recognition in parts: it divides the human body into parts, performs a per-pixel classification task, and then detects candidate 3D positions for the skeletal joints. Figure 1 below shows the three main stages of the algorithm, from the depth image as input to the final 3D joint proposals.
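
The three stages can be sketched end to end on a toy input. This is only a structural illustration under heavy assumptions: classify_pixel is a hypothetical rule that labels pixels by image row, standing in for the trained classifier; the pinhole intrinsics fx, fy, cx, cy are made-up values; and the last stage takes a plain per-part centroid as the simplest possible proposal, which is not the mode finding mentioned in the abstract.

```python
from collections import defaultdict

BACKGROUND = 0.0  # assumption: a depth of 0 means "no reading" at that pixel

def classify_pixel(depth_image, row, col):
    """Hypothetical stand-in for the trained classifier: label by image row."""
    return "head" if row < len(depth_image) // 2 else "torso"

def label_pixels(depth_image):
    """Stages 1 and 2: depth image in, body-part label per foreground pixel out."""
    labels = {}
    for r, line in enumerate(depth_image):
        for c, z in enumerate(line):
            if z > BACKGROUND:
                labels[(r, c)] = classify_pixel(depth_image, r, c)
    return labels

def reproject(row, col, z, fx, fy, cx, cy):
    """Pinhole back-projection of a pixel with depth z to camera-space (X, Y, Z)."""
    return ((col - cx) * z / fx, (row - cy) * z / fy, z)

def propose_joints(depth_image, labels, fx, fy, cx, cy):
    """Stage 3 (simplified): one 3D proposal per part, here the centroid of its pixels."""
    groups = defaultdict(list)
    for (r, c), part in labels.items():
        groups[part].append(reproject(r, c, depth_image[r][c], fx, fy, cx, cy))
    proposals = {}
    for part, pts in groups.items():
        n = len(pts)
        proposals[part] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return proposals

# Toy 4x4 depth image in metres: a narrow "head" above a wider "torso".
toy_depth = [
    [0.0, 2.0, 2.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]

toy_labels = label_pixels(toy_depth)
toy_proposals = propose_joints(toy_depth, toy_labels, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(toy_proposals)
```

Because each frame is processed from scratch, nothing in this sketch depends on a previous frame, which matches the per-frame initialization and recovery idea described above.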
