行为识别论文笔记(一）

最新推荐文章于 2024-06-19 13:16:41 发布

狒狒非废

最新推荐文章于 2024-06-19 13:16:41 发布

阅读量1k

点赞数

文章标签： action recognition AI paper

版权声明：本文为博主原创文章，遵循 CC 4.0 BY-SA 版权协议，转载请附上原文出处链接和本声明。

本文链接：https://blog.csdn.net/qq_34145804/article/details/83031529

版权

一、Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection（IEEE2018)

关键词：LSTM\spatio-temporal\attention model\action recognition&detection

针对问题：将RGB video-based中曾用过的attention model移植到skeleton-based data中

相关工作：

RGB-based methods:易失3D信息

skeleton-based methods:

Kinect，易失表面信息

actionlet线性组合、加入成对关节距离（低程度的非线性）、连接矩阵中加入稀疏约束、adaboost选取关键帧

论文工作：

action recognition:

空间attention model:joint-selection gates----分配joints重要性----作用于输入端

时间attention model:frame-selection gates----分配frame的attention----作用于输出端

以上两种gates模仿了LSTM中的sigmond函数构成的门

时空域特征结合：1 空间和时间和主网络三个网络结合

2 损失函数的正则化中体现了时空模型的两个参数

action detection:

与滑动窗口不同，选择action proposal based on temporal model 9个candidates选一

IOU

(NMS)非极大值抑制，candidates融合

去噪。

待读文献：

Real-time human pose recognition in parts from single depth images

Multi-region two-stream R-CNN for action detection

Neural machine translation by jointly learning to align and translate

二、Temporal Segment Networks: Towards Good Practices for Deep Action Recognition(

关键词：two-stream\long-range temporal structure\input modality

针对问题：

Two-stream 网络针对single-frame or short snippets难以学习时间跨度长的视频

数据量少，如何提取更多视频信息

论文工作：

数据输入形式

RGB image就是video中的某一帧；RGB difference是相邻两帧的差，可以用来表达动作信息；optical flow field和warped optical flow field是视频的光流信息。分 spatial和temporal两路输入。提取各种信息，解决数据量少的问题。

sparse temporal sampling：Figure1的最左边是一个Video，用V表示，将V分成K份（文中K采用3），用(S1,S2,…,Sk)表示。

Tk是一个snippet，是从(S1,S2,…,Sk)中对应的视频片段Sk中随机采样出来的结果，每个snippet包含一帧图像和两个optical flow特征图。

F(Tk;W)是该snippet属于每个类的得分，g文中采用的是均值函数，H是softmax函数，g和H之间对两个网络的结果进行融合（spatial：temporal=1:1.5；temporal stream 1 for optical flow and 0.5 for warped）。

K个spatial convnet的参数是共享的，K个temporal convnet的参数也是共享的。

cross-modality pre-training：光流输入归一化到0`255(模拟RGB格式），Imagenet预训练卷积网络，微调第一层。三通道权重取平均值。

partial BN:微调第一层BN

三、RPAN：An end-to-end recurrent pose-attention network for action recognition（2017CVPR）

关键词：end-to-end\pose attention\RNN

针对问题：video-level category RNN无法学习复杂的动作结构

论文工作：

特征生成：TSN网络作为CNN提取特征（basenet)

姿态注意：Pose-Attention Mechanism

第t个视频帧在不同空间位置上的特征向量

每个关节点在每个空间点(K1xK2的大小)上的打分

归一化

13个关节合并到5个part

关节特征合并到part特征

最后Max,Mean or Concat将part特征结合成整体的分类

RNN:LSTM

Loss Function: action分类和pose估计联合损失函数，端到端

M：Heat maps for all the joints(heatmaps是难点）

参考资料：

https://blog.csdn.net/u014380165/article/details/79029309

https://blog.csdn.net/neu_chenguangq/article/details/79164830

关注

0
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
行为识别论文笔记(一）

一、Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection（IEEE2018)关键词：LSTM\spatio-temporal\attention model\action recognition&amp;detection针对问题：将RGB video-based中曾用过的...
复制链接

扫一扫

狒狒非废 CSDN认证博客专家 CSDN认证企业博客

码龄9年

3: 原创

76万+: 周排名

67万+: 总排名

4218: 访问

: 等级

57: 积分

4: 粉丝

1: 获赞

4: 评论

11: 收藏

私信

关注

热门文章

最新评论

Faster-RCNN的Keras实现
Miku_01_: 大佬想问一下from keras_frcnn.RoiPoolingConv import RoiPoolingConv这一句报红是什么原因啊？说是没有keras_frcnn这个库。
Faster-RCNN的Keras实现
云飞扬ali: 博主你好，请问为什么我在运行test_frcnn.py没有方框生成。
Faster-RCNN的Keras实现
zrx281731: 博主你好，如果改model_all.save_weights()函数为保存为模型加结构，save（）函数是改在原来model_all？还是model_classifier或者model_rpn？？，麻烦了
Faster-RCNN的Keras实现
sun_ching: 您好，请问classifier_min_overlap到底指的是什么呢

最新文章

目录

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。