[Paper Reading] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

This post introduces ST-GCN, a spatio-temporal graph convolutional network for action recognition based on human skeleton keypoints. Compared with a conventional GCN, ST-GCN better captures the spatial relations between joints and the temporal information across frames, addressing the problem that plain averaging cannot model changes in relative joint positions. By defining different spatial and temporal partitioning strategies, ST-GCN achieves strong performance on action recognition tasks.


Contents

1. Overview

2. Differences between a traditional GCN and the paper's ST-GCN

3. Implementation details of ST-GCN


1. Overview

Traditional skeleton modeling methods usually rely on hand-crafted parts or traversal rules, which leads to limited expressive power and poor generalization. The key limitation of earlier methods is that they cannot accurately capture the spatial relations between joints, yet these relations are essential for understanding human actions.

This work aims to develop a principled and effective way to model dynamic skeletons and apply it to action recognition. It proposes ST-GCN, a spatio-temporal graph convolutional network, for human action recognition based on skeleton keypoints. By learning spatial and temporal patterns automatically from data, it overcomes the limitations of previous methods.

1. Spatially: the keypoints of the skeleton serve as the input carrying spatial relations (the difficulty is that the neighborhood size differs from joint to joint, which motivates a graph-based CNN model);

2. Temporally: video data is used, exploiting the temporal relations between frames (a sketch of the resulting input layout follows this list).
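As a rough illustration of what such a model consumes, the sketch below builds a dummy skeleton-sequence tensor in the (N, C, T, V, M) layout commonly used in ST-GCN implementations; every concrete size here is an assumption chosen for illustration, not a value fixed by the paper.

```python
import torch

# Assumed layout: N samples, C channels per joint (x, y, confidence),
# T frames, V joints, M persons. All sizes below are illustrative.
N, C, T, V, M = 8, 3, 300, 18, 2
x = torch.randn(N, C, T, V, M)  # one skeleton sequence per person per sample
print(x.shape)                  # torch.Size([8, 3, 300, 18, 2])
```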

How the spatio-temporal graph is constructed:

1. Within each frame, build the spatial graph following the natural connectivity of the human skeleton;

2. Connect the same keypoint across adjacent frames to form the temporal edges;

3. The keypoints of all input frames form the node set, and all edges from steps 1 and 2 form the edge set; together these constitute the desired spatio-temporal graph (see the construction sketch after this list).
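The minimal sketch below turns steps 1–3 into code on a made-up 5-joint toy skeleton; the topology and sizes are illustrative assumptions, not the paper's 18-joint OpenPose layout.

```python
# Toy skeleton: joint 1 is a "torso" hub connected to joints 0, 2, 3, 4.
skeleton_edges = [(0, 1), (1, 2), (1, 3), (1, 4)]  # intra-frame bone connections
V, T = 5, 3  # joints per frame, number of frames

edges = []
for t in range(T):
    offset = t * V  # node ids of frame t start at this offset
    # Step 1: spatial edges inside frame t (natural skeleton connectivity)
    edges += [(i + offset, j + offset) for i, j in skeleton_edges]
    # Step 2: temporal edges linking each joint to itself in frame t + 1
    if t + 1 < T:
        edges += [(v + offset, v + offset + V) for v in range(V)]

# Step 3: the node set is all T * V joints; `edges` is the edge set.
print(len(edges))  # 4 spatial * 3 frames + 5 temporal * 2 gaps = 22
```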

### Skeleton-Based Action Recognition Research and Techniques

In skeleton-based action recognition, researchers have developed various methods to interpret human actions from skeletal data. These approaches use deep learning models that capture the spatio-temporal features inherent in sequences of joint positions over time.

One prominent line of work uses recurrent neural networks (RNNs), particularly long short-term memory (LSTM) units or gated recurrent units (GRUs). Such architectures handle sequential information well because they maintain a form of memory across timesteps[^1], which makes them suitable for modeling the temporal dependencies present in motion-capture datasets.

Convolutional neural networks (CNNs) also play an essential role when applied to graphs that represent skeletons as nodes connected by edges denoting the limb segments between joints. Graph convolutional networks (GCNs) extend traditional CNN operations to non-Euclidean domains such as point clouds or the meshes formed around articulated bodies in motion[^2].

Furthermore, some studies integrate RNN variants with GCN layers in hybrid frameworks designed for this task. These combined structures aim to exploit local appearance cues alongside the global structural patterns exhibited by whole-body pose configurations captured frame by frame, e.g. by Microsoft Kinect devices or other depth cameras that can track multiple people performing diverse indoor activities under varying lighting, without wearable markers attached to the participants.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv


class ST_GCN(torch.nn.Module):
    def __init__(self, num_features, hidden_channels, class_num):
        super(ST_GCN, self).__init__()
        # One graph convolution over the spatio-temporal skeleton graph,
        # followed by a linear classifier on the node embeddings.
        self.conv1 = GCNConv(num_features, hidden_channels)
        self.fc1 = Linear(hidden_channels, class_num)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index)              # aggregate features from neighboring joints
        h = F.relu(h)
        h = F.dropout(h, training=self.training)   # regularize during training only
        z = self.fc1(h)                            # project node embeddings to class scores
        return F.log_softmax(z, dim=1)
```
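A hypothetical way to exercise the module above is sketched here; the node count, random edges, and hyperparameters are placeholders chosen only to show the expected tensor shapes, not values from the paper.

```python
import torch

# Illustrative sizes only: 15 nodes (e.g. 5 joints x 3 frames), 3 input
# features per node, 10 action classes.
model = ST_GCN(num_features=3, hidden_channels=64, class_num=10)
x = torch.randn(15, 3)                      # node feature matrix
edge_index = torch.randint(0, 15, (2, 22))  # random edges, just to run the forward pass
out = model(x, edge_index)
print(out.shape)                            # torch.Size([15, 10])
```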