Paper Reading: An end-to-end spatio-temporal attention model for human action recognition from skeleton data

Latest recommended article published on 2024-01-05 09:30:00

小吴同学真棒


Article link: https://blog.csdn.net/qq_36627158/article/details/116208940



Main Contributions

The authors propose a framework that uses attention mechanisms to learn spatio-temporal features from skeleton data for action recognition.

The framework consists of three parts: a main LSTM network, a spatial attention subnetwork, and a temporal attention subnetwork.

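In the spatial attention subnetwork, an LSTM learns the relationship between the joints of the current frame and those of previous frames and produces an attention map over the joints of the current input frame. This automatically identifies which joints in the current frame matter most for recognizing the action.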

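In the temporal attention subnetwork, an LSTM learns the relationship between the current frame and previous frames and produces an attention weight for the current input frame. This automatically learns which frames contribute most to recognizing the action.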

In addition, the authors train the network with an alternating joint-training strategy and design a regularized loss function to prevent overfitting.
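
To make the idea of a regularized objective concrete, here is a minimal NumPy sketch. It assumes, as an illustration rather than the paper's exact formula, a cross-entropy term plus two attention penalties: one that is smallest when the time-averaged spatial attention is spread evenly over the K joints, and one L2-style penalty that keeps the temporal weights $\beta_t$ bounded. The function name `regularized_loss` and the weights `lam1`, `lam2` are hypothetical.

```python
import numpy as np

def regularized_loss(probs, label, alpha, beta, lam1=0.001, lam2=0.0001):
    """Cross-entropy plus two illustrative attention penalties (assumed form).

    probs: (C,)   predicted class probabilities for one sequence
    label: int    index of the ground-truth class
    alpha: (T, K) spatial attention weights; each row sums to 1
    beta:  (T,)   temporal attention weights (non-negative)
    """
    # Standard cross-entropy for the ground-truth class.
    ce = -np.log(probs[label] + 1e-12)
    # Since each row of alpha sums to 1, this sum is smallest when the
    # time-averaged attention is uniform over joints, i.e. it discourages
    # the model from attending to only a few joints.
    spatial_pen = np.sum((1.0 - alpha.mean(axis=0)) ** 2)
    # Keep temporal weights small, since a ReLU gate is unbounded above.
    temporal_pen = np.mean(beta ** 2)
    return ce + lam1 * spatial_pen + lam2 * temporal_pen
```

With uniform spatial attention over 5 joints the spatial penalty is 5 × 0.8² = 3.2, while putting all attention on one joint raises it to 4.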

Proposed Method

Spatial Attention

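At each time step t, the input is the set of K joints $x_t = (x_{t,1}, \ldots, x_{t,K})^T$, where $x_{t,k} \in \mathbb{R}^3$ holds the 3D coordinates of the k-th joint.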

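The scores indicating the importance of the K joints, $s_t = (s_{t,1}, \ldots, s_{t,K})^T$, are obtained jointly as

$$s_t = U_s \tanh\left(W_{xs} x_t + W_{hs} h^s_{t-1} + b_s\right) + b_{us},$$

where $h^s_{t-1}$ is the hidden state of the spatial attention LSTM at the previous time step.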
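
A minimal NumPy sketch of the score computation, assuming the form $s_t = U_s \tanh(W_{xs} x_t + W_{hs} h^s_{t-1} + b_s) + b_{us}$. The spatial attention LSTM that produces $h^s_{t-1}$ is not implemented here; the function name `joint_scores`, the flattened input layout, and the parameter shapes are assumptions for illustration.

```python
import numpy as np

def joint_scores(x_t, h_prev, W_xs, W_hs, b_s, U_s, b_us):
    """s_t = U_s tanh(W_xs x_t + W_hs h_prev + b_s) + b_us.

    x_t:    (3K,)   flattened 3D coordinates of the K joints at time t
    h_prev: (H,)    previous hidden state of the spatial-attention LSTM
    W_xs:   (D, 3K), W_hs: (D, H), b_s: (D,)
    U_s:    (K, D),  b_us: (K,)
    returns (K,) unnormalized importance scores, one per joint
    """
    hidden = np.tanh(W_xs @ x_t + W_hs @ h_prev + b_s)
    return U_s @ hidden + b_us
```

Because the scores for all K joints come out of one shared projection, they are computed jointly rather than independently per joint.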

For the k-th joint, the activation that serves as the joint-selection gate is a softmax over the K scores:

$$\alpha_{t,k} = \frac{\exp(s_{t,k})}{\sum_{i=1}^{K} \exp(s_{t,i})}.$$
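
The joint-selection gate is a softmax over the K joint scores. A minimal, numerically stable NumPy sketch (the function name `joint_gate` is hypothetical):

```python
import numpy as np

def joint_gate(s_t):
    """alpha_{t,k} = exp(s_{t,k}) / sum_i exp(s_{t,i}), computed stably.

    s_t: (K,) joint scores at time t
    returns (K,) non-negative weights that sum to 1
    """
    # Subtracting the max does not change the softmax but avoids overflow.
    e = np.exp(s_t - np.max(s_t))
    return e / e.sum()
```

For example, `joint_gate(np.array([0.0, 0.0]))` returns `[0.5, 0.5]`: equal scores give equal importance to every joint.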

Instead of assigning equal importance to all the joints in $x_t$, the input to the main LSTM network is modulated to

$$x'_t = (x'_{t,1}, \ldots, x'_{t,K})^T, \quad x'_{t,k} = \alpha_{t,k} \cdot x_{t,k}.$$
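
The modulation simply scales each joint's 3D coordinates by that joint's gate value $\alpha_{t,k}$. A minimal NumPy sketch, assuming the joints are stored as a (K, 3) array; the function name `modulate_joints` is hypothetical.

```python
import numpy as np

def modulate_joints(x_t, alpha_t):
    """x'_{t,k} = alpha_{t,k} * x_{t,k} for every joint k.

    x_t:     (K, 3) joint coordinates at time t
    alpha_t: (K,)   joint-selection gate values
    returns  (K, 3) modulated input x'_t for the main LSTM
    """
    # Broadcast each joint's scalar weight across its x, y, z coordinates.
    return alpha_t[:, None] * x_t
```

For example, with `x_t = [[1, 2, 3], [4, 5, 6]]` and `alpha_t = [0.25, 0.75]`, the result is `[[0.25, 0.5, 0.75], [3.0, 3.75, 4.5]]`.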

Temporal Attention

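The activation that serves as the frame-selection gate can be computed as

$$\beta_t = \mathrm{ReLU}\left(\tilde{w}_x x_t + \tilde{w}_h \tilde{h}_{t-1} + \tilde{b}\right),$$

where $\tilde{h}_{t-1}$ is the hidden state of the temporal attention LSTM at the previous time step.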
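
A minimal NumPy sketch of the frame-selection gate, assuming a single scalar output per frame of the form $\beta_t = \mathrm{ReLU}(\tilde{w}_x x_t + \tilde{w}_h \tilde{h}_{t-1} + \tilde{b})$. The temporal attention LSTM itself is not implemented; the function name `frame_gate` and the vector shapes are assumptions.

```python
import numpy as np

def frame_gate(x_t, h_prev, w_x, w_h, b):
    """beta_t = ReLU(w_x . x_t + w_h . h_prev + b), a non-negative scalar.

    x_t:    (3K,) flattened joint coordinates at time t
    h_prev: (H,)  previous hidden state of the temporal-attention LSTM
    w_x:    (3K,), w_h: (H,), b: scalar bias
    """
    pre_activation = float(w_x @ x_t + w_h @ h_prev + b)
    return max(0.0, pre_activation)
```

Unlike the softmax used for joints, a ReLU does not normalize across frames, so the frame weights are not forced to sum to 1 and several frames can receive large weights at once.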

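For sequence-level classification, the class scores are the sum of the main LSTM outputs $z_t$ weighted by the temporal attention values $\beta_t$ over all time steps:

$$o = \sum_{t=1}^{T} \beta_t \cdot z_t,$$

where $o = (o_1, \ldots, o_C)^T$ for $C$ classes. The probability that a sequence $X$ belongs to class $C_i$ is then

$$p(C_i \mid X) = \frac{e^{o_i}}{\sum_{j=1}^{C} e^{o_j}}.$$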
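
A minimal NumPy sketch of the sequence-level prediction: weight each time step's class scores $z_t$ by $\beta_t$, sum over time, and apply a softmax. It assumes the per-step outputs are already class scores of length C; the function name `classify_sequence` is hypothetical.

```python
import numpy as np

def classify_sequence(z, beta):
    """o = sum_t beta_t * z_t, then p(C_i | X) = softmax(o)_i.

    z:    (T, C) main-LSTM class scores at each time step
    beta: (T,)   temporal attention weights
    returns (C,) class probabilities
    """
    # Weighted sum over time: frames with larger beta contribute more.
    o = (beta[:, None] * z).sum(axis=0)
    # Numerically stable softmax over the C classes.
    e = np.exp(o - o.max())
    return e / e.sum()
```

For example, with `z = [[1, 0], [0, 1]]`, setting `beta = [1, 0]` makes the first frame decide the prediction (class 0), while `beta = [0, 1]` lets the second frame decide it (class 1).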