[Paper Notes] Group-Skeleton-Based Human Action Recognition in Complex Events

GS-GCN is a new approach to action recognition in complex events that models the latent action relationships between different people. Multi-person skeleton features are extracted with MS-G3D, and an MLP embeds inter-person distance values into the features to strengthen the representation of action relationships. The network is trained with focal loss to address class imbalance and improve recognition accuracy.



GS-GCN is a new GCN-based algorithm for action recognition in complex events, targeting large-scale human-centric video analysis in complex-event challenges. Unlike conventional methods that only consider the behavior of a single person, the proposed method models the latent action relationships between different people. Multiple MS-G3D networks extract skeleton features from several people simultaneously. Since people close to each other can have stronger action relationships, an MLP embeds distance values into the extracted features. After a feature-fusion step, the network is trained with focal loss to classify the different actions. This is the first work to combine group skeleton data with GCNs for action recognition.

Existing skeleton-based methods ignore the latent action relationships between different people, even though one person's action is often influenced by another's.

  1. Group-skeleton-based feature extraction: MS-G3D extracts multi-person skeleton features. Besides the conventional keypoint coordinates, keypoint velocity values are also fed into the network for better performance.
  2. A multilayer perceptron (MLP) embeds the distance values between the reference person and the other people into the extracted features.
  3. All features are fed into another MS-G3D for feature fusion and classification.
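Step 2 above can be sketched as a small module. This is a minimal illustration only: the MLP sizes, the additive way the distance scalar is injected, and the feature dimensions are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DistanceEmbedding(nn.Module):
    """Embed a person's distance to the reference person into their features.

    Hypothetical sketch: a small MLP maps the scalar distance to a vector
    that is added to the per-person skeleton feature.
    """

    def __init__(self, feat_dim=256, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, person_feat, distance):
        # person_feat: (N, feat_dim); distance: (N, 1) distances to person k = 0
        return person_feat + self.mlp(distance)

emb = DistanceEmbedding()
feat = torch.randn(4, 256)   # skeleton features of 4 people
dist = torch.rand(4, 1)      # their distances to the reference person
out = emb(feat, dist)
print(out.shape)             # torch.Size([4, 256])
```

The output keeps the original feature shape, so the distance-aware features can be stacked and passed to the fusion MS-G3D unchanged.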

To mitigate the class-imbalance problem, the network is trained with focal loss.
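Focal loss down-weights well-classified examples so that hard or rare action classes contribute more to the gradient. A minimal scalar sketch (the gamma and alpha values are the common defaults from the focal-loss paper, not necessarily the ones used here):

```python
import math

def focal_loss(p, gamma=2.0, alpha=0.25):
    """Focal loss for the true class with predicted probability p.

    FL(p) = -alpha * (1 - p)^gamma * log(p). With gamma = 0 and
    alpha = 1 it reduces to ordinary cross-entropy.
    """
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# A confident correct prediction is down-weighted far more than a poor one:
print(focal_loss(0.9))   # near zero
print(focal_loss(0.1))   # much larger
```

The `(1 - p)^gamma` factor is what suppresses the loss of easy examples, which would otherwise dominate training on an imbalanced action distribution.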

Spatial features extracted from RGB images may miss details present in other frames; directly using RGB frames can also introduce interference from varying backgrounds and person appearances, which is noise for action classification.

When people move a lot, optical-flow sequences are easily affected by occlusion.

Skeleton-based: GCNs can effectively handle the irregular skeleton keypoints and extract features in the spatial-temporal domain, but existing methods do not consider the latent action relationships between different people in a video.


First, the people in the video are detected and their poses estimated. The keypoint positions and velocity values are then fed into MS-G3D networks to extract features; since people at closer distances should have stronger action relationships, the distance values between the reference person and the other people are embedded into the extracted features. Another MS-G3D fuses all the features. Finally, fully connected layers output the classification result.
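The data flow above can be summarized in a short runnable sketch. Every function here is a toy stand-in for the paper's components (pose estimation, MS-G3D, the distance MLP, the classifier), kept deliberately trivial; none of this is the real API.

```python
# Toy stand-ins for the GS-GCN pipeline stages; all hypothetical.

def ms_g3d_extract(coords, velocities):
    # Stand-in "feature": mean keypoint coordinate and mean velocity.
    return (sum(coords) / len(coords), sum(velocities) / len(velocities))

def embed_distance(feat, dist):
    # Stand-in for the MLP distance embedding: append the distance value.
    return feat + (dist,)

def ms_g3d_fuse(feats):
    # Stand-in fusion: concatenate all per-person features.
    return [x for f in feats for x in f]

def classify(fused):
    # Stand-in classifier: index of the largest fused value.
    return max(range(len(fused)), key=fused.__getitem__)

def recognize_group_actions(people):
    """people: list of (coords, velocities, distance-to-reference) tuples."""
    feats = [ms_g3d_extract(c, v) for c, v, _ in people]
    feats = [embed_distance(f, d) for f, (_, _, d) in zip(feats, people)]
    return classify(ms_g3d_fuse(feats))

people = [([0.1, 0.2], [0.0, 0.1], 0.0),   # reference person (k = 0)
          ([0.4, 0.5], [0.2, 0.3], 1.5)]   # a nearby second person
print(recognize_group_actions(people))     # 5
```

The point of the sketch is the ordering: per-person extraction, then distance embedding relative to person k = 0, then fusion, then classification.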

  1. Action Recognition Using Group-Skeleton Data

    The velocity of the $i$-th keypoint of the $k$-th person is defined as $v_i^k(t) = p_i^k(t) - p_i^k(t-d)$, where $p_i^k$ is the coordinate of the $i$-th keypoint of the $k$-th person, $t$ is the frame index, and $d$ is the frame interval used to compute the velocity. The reference person is $k = 0$. Because some keypoints may move back to their original positions over a longer interval, $d$ is set to 3. All valid $p_i^k(t)$ …
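The velocity definition above translates directly into code. A minimal pure-Python version, with one keypoint's trajectory stored as a list of (x, y) tuples indexed by frame (this data layout is an assumption for illustration):

```python
def keypoint_velocity(traj, t, d=3):
    """v_i^k(t) = p_i^k(t) - p_i^k(t - d) for one keypoint trajectory.

    traj: list of (x, y) coordinates indexed by frame.
    d: frame interval for the velocity (d = 3 in the paper).
    """
    x_t, y_t = traj[t]
    x_p, y_p = traj[t - d]
    return (x_t - x_p, y_t - y_p)

# A keypoint moving one unit right per frame:
traj = [(t, 0.0) for t in range(10)]
print(keypoint_velocity(traj, 5))   # (3, 0.0)
```

With d = 3 the velocity stays nonzero for short oscillating motions that would cancel out over a single-frame difference only at exactly the wrong interval.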

### Skeleton-Based Action Recognition Research and Techniques

In the field of skeleton-based action recognition, researchers have developed various methods to interpret human actions from skeletal data. These approaches leverage deep learning models that capture the spatial-temporal features inherent in sequences of joint positions over time.

One prominent technique uses recurrent neural networks (RNNs), particularly long short-term memory (LSTM) units or gated recurrent units (GRUs). Such architectures handle sequential information well because they maintain a form of memory across timesteps[^1], which makes them suitable for modeling the temporal dependencies present in motion-capture datasets.

Convolutional methods also play an essential role when applied to graphs that represent skeletons as nodes connected by edges denoting the limb segments between joints. Graph Convolutional Networks (GCNs) extend traditional CNN operations to non-Euclidean domains such as point clouds or meshes formed around articulated bodies in motion[^2].

Some studies further integrate RNN variants with GCN layers into hybrid frameworks designed for this task. These combined structures aim to exploit local appearance cues alongside the global structural patterns of whole pose configurations captured frame by frame, for example with Microsoft Kinect devices or other depth cameras that can track multiple people indoors under varying lighting without wearable markers.
```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GCNConv

class ST_GCN(torch.nn.Module):
    def __init__(self, num_features, hidden_channels, class_num):
        super(ST_GCN, self).__init__()
        self.conv1 = GCNConv(num_features, hidden_channels)
        self.fc1 = Linear(hidden_channels, class_num)

    def forward(self, x, edge_index):
        # One graph convolution over the skeleton graph
        h = self.conv1(x, edge_index)
        h = F.relu(h)
        h = F.dropout(h, training=self.training)
        # Per-node class scores
        z = self.fc1(h)
        return F.log_softmax(z, dim=1)
```