Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition
(2019 CVPR)
Maosen Li, Siheng Chen, Xu Chen, Ya Zhang, Yanfeng Wang, and Qi Tian
Notes
Contributions
- We propose the A-link inference module (AIM) to infer actional links, which capture action-specific latent dependencies. The actional links are combined with structural links to form generalized skeleton graphs.
- We propose the actional-structural graph convolution network (AS-GCN) to extract useful spatial and temporal information based on the multiple graphs.
- We introduce an additional future pose prediction head to predict future poses, which also improves the recognition performance by capturing more detailed action patterns.
- The AS-GCN outperforms several state-of-the-art methods on two large-scale datasets; as a side product, AS-GCN is also able to precisely predict future poses.
Method
Actional Links (A-links)
To capture richer dependencies, we introduce an encoder-decoder structure, called A-link inference module, to capture action-specific latent dependencies, i.e. actional links, directly from actions.
1. Encoder. The encoder estimates the states of the A-links given the 3D joint positions across time; each candidate link is assigned a distribution over C A-link types. It produces the A-links by iteratively propagating information between joints and links to learn link features.
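The joint-to-link propagation above can be sketched as follows: a minimal NumPy toy in which each joint pair's link feature is the concatenation of its endpoint features, projected and softmaxed into a distribution over link types. The projection `W_edge` and all shapes are hypothetical stand-ins, not the paper's learned parameters.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def infer_alinks(joint_feats, n_types, rng):
    """Toy AIM-encoder step: for every joint pair (i, j), build a link
    feature from the two endpoint features and softmax it into a
    distribution over n_types A-link types (random toy weights)."""
    V, d = joint_feats.shape
    W_edge = 0.1 * rng.standard_normal((2 * d, n_types))  # hypothetical projection
    links = np.zeros((V, V, n_types))
    for i in range(V):
        for j in range(V):
            h = np.concatenate([joint_feats[i], joint_feats[j]])
            links[i, j] = softmax(h @ W_edge)
    return links  # links[i, j] is a distribution over A-link types

rng = np.random.default_rng(0)
A = infer_alinks(rng.standard_normal((25, 8)), n_types=3, rng=rng)
print(A.shape)  # (25, 25, 3)
```

In the paper this joint-to-link-to-joint propagation is repeated for several rounds; one round is enough to show the data flow.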
2. Decoder. The decoder predicts the future 3D joint positions conditioned on the A-links inferred by the encoder and on the previous poses; this reconstruction objective pushes the encoder to infer links that are informative about the action.
3. AGC. Given the input features Xin, the actional graph convolution (AGC) aggregates joint features over the inferred A-links, where W is the trainable weight capturing feature importance. Note that we use the AIM to warm up the A-links in the pretraining process; during the training of action recognition and pose prediction, the A-links are further optimized by forward-passing only the encoder of the AIM.
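A minimal sketch of this aggregation, assuming each of the C link types contributes a soft adjacency and its own feature transform whose responses are summed (toy shapes, not the paper's exact formulation):

```python
import numpy as np

def actional_graph_conv(X, A_links, W):
    """AGC sketch: each A-link type c has a soft adjacency A_links[c]
    and its own feature transform W[c]; responses are summed over types.
    X: (V, d_in), A_links: (C, V, V), W: (C, d_in, d_out)."""
    return sum(A_links[c] @ X @ W[c] for c in range(A_links.shape[0]))

rng = np.random.default_rng(1)
V, d_in, d_out, C = 25, 8, 16, 3
A = rng.random((C, V, V))
A /= A.sum(axis=-1, keepdims=True)  # row-normalize each soft adjacency
X_out = actional_graph_conv(rng.standard_normal((V, d_in)), A,
                            rng.standard_normal((C, d_in, d_out)))
print(X_out.shape)  # (25, 16)
```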
Structural Links (S-links)
With an L-order polynomial of the skeleton adjacency, we define the structural graph convolution (SGC), which can directly reach L-hop neighbors to increase the receptive field; here M and W are the trainable weights that capture edge weights and feature importance, respectively.
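The polynomial form can be sketched as summing one masked adjacency power per order, each with its own feature transform; `M` and `W` below are random toy stand-ins for the trainable weights:

```python
import numpy as np

def structural_graph_conv(X, A_hat, M, W):
    """SGC sketch: an L-order polynomial of the normalized skeleton
    adjacency A_hat, so one layer reaches up to L-hop neighbors.
    M[l]: (V, V) edge-importance mask; W[l]: (d_in, d_out) transform."""
    V = A_hat.shape[0]
    out = np.zeros((V, W.shape[-1]))
    A_pow = np.eye(V)  # order-0 term: the joint itself
    for l in range(len(W)):  # orders 0 .. L
        out += (M[l] * A_pow) @ X @ W[l]
        A_pow = A_pow @ A_hat  # next power of the adjacency
    return out

rng = np.random.default_rng(2)
V, d_in, d_out, L = 25, 8, 16, 2
A_hat = rng.random((V, V))
A_hat /= A_hat.sum(axis=-1, keepdims=True)  # toy row-normalization
out = structural_graph_conv(rng.standard_normal((V, d_in)), A_hat,
                            np.ones((L + 1, V, V)),
                            rng.standard_normal((L + 1, d_in, d_out)))
print(out.shape)  # (25, 16)
```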
Actional-Structural Graph Convolution Block
AS-GCN (Backbone network)
Multitasking of AS-GCN
1. Action recognition head. To classify actions, we construct a recognition head following the backbone network. We apply global average pooling over the joint and temporal dimensions of the feature maps output by the backbone network to obtain a feature vector, which is fed into a softmax classifier to produce the predicted class label. The loss function for action recognition is the standard cross-entropy loss.
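The pooling-classifier-loss pipeline above can be sketched in a few lines; `W_cls` is a hypothetical toy classifier weight, and the feature-map shape `(channels, frames, joints)` is an assumption:

```python
import numpy as np

def recognition_head(feat, W_cls, label):
    """Recognition-head sketch: global average pooling over the temporal
    and joint dimensions, a linear softmax classifier (toy weights
    W_cls), and the cross-entropy loss. feat: (channels, frames, joints)."""
    v = feat.mean(axis=(1, 2))        # pooled feature vector, (channels,)
    logits = v @ W_cls                # (n_classes,)
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax probabilities
    return p, -np.log(p[label])      # predicted distribution, CE loss

rng = np.random.default_rng(3)
p, loss = recognition_head(rng.standard_normal((64, 30, 25)),
                           rng.standard_normal((64, 10)), label=4)
print(p.sum())  # 1.0 (softmax normalizes the scores)
```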
2. Future pose prediction head. To predict future poses, we construct a prediction module following the backbone network. We use several AS-GCN blocks to decode the high-level feature maps extracted from the historical data and obtain the predicted future 3D joint positions.
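As a shape-level sketch of this head, a per-joint linear decoder can stand in for the AS-GCN decoding blocks: it maps each joint's pooled backbone feature to `t_future` future 3D positions. `W_dec` and all shapes are hypothetical.

```python
import numpy as np

def prediction_head(feat, W_dec, t_future):
    """Prediction-head sketch: pool backbone features over time, then map
    each joint's feature to t_future future 3D positions with a shared
    linear decoder W_dec (a toy stand-in for the AS-GCN decoding blocks).
    feat: (channels, frames, joints) -> output: (3, t_future, joints)."""
    c, _, v = feat.shape
    pooled = feat.mean(axis=1)       # (channels, joints)
    out = W_dec.T @ pooled           # (3 * t_future, joints)
    return out.reshape(3, t_future, v)

rng = np.random.default_rng(4)
pred = prediction_head(rng.standard_normal((64, 30, 25)),
                       rng.standard_normal((64, 3 * 10)), t_future=10)
print(pred.shape)  # (3, 10, 25)
```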
When we train the recognition head and the future prediction head together, recognition performance improves.
Results