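ST-GCN Action Recognition Explained
This project uses the FSD-10 figure skating dataset and follows the baseline of the PaddlePaddle-based figure skater skeleton-keypoint action recognition competition. Note, however, that I borrow the competition dataset only so that I can explain ST-GCN alongside working code.
1. What makes this project different:
The focus is on explaining ST-GCN together with its code. The ST-GCN implementation has been extracted in full from PaddleVideo, which removes the tight coupling between ST-GCN and PaddleVideo and makes the key code easier to follow. You no longer have to download PaddleVideo just to study ST-GCN.
The project can be run with one click, so you can try it out and understand it better.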
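2. Overall structure
ST-GCN is a classic skeleton-based action recognition model proposed at AAAI 2018. It offered a novel approach to human action recognition from skeleton keypoints and achieved substantial performance gains on standard action recognition datasets. The overall framework of the algorithm is shown in the figure below:
The spatial temporal graph convolutional network (ST-GCN) combines a graph convolutional network (GCN) with a temporal convolutional network (TCN) and extends them to a spatial-temporal graph model, giving a general representation of skeleton keypoint sequences for action recognition. The model represents the human skeleton as a graph in which every node corresponds to one joint of the body. The graph has two kinds of edges: spatial edges, which follow the natural connections between joints, and temporal edges, which connect the same joint across consecutive time steps. Multiple layers of spatial-temporal graph convolution are built on top of this graph, allowing information to be integrated along both the spatial and the temporal dimension.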
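The ST-GCN network can be roughly divided into three parts. First, the network takes a five-dimensional tensor $\left(N,C,T,V;M\right)$ as input. N is the number of video samples; C is the joint feature vector, nominally $\left(x,y,acc\right)$ (note: in practice the C we feed in is just the x and y coordinates, without the confidence score); T is the number of key frames extracted from the video; V is the number of joints, 25 in this project; and M is the number of people in one video (in practice M is 1). The input is then normalized with Batch Normalization. Second, ST-GCN units are stacked: they introduce the ATT attention model and alternate between the GCN graph convolution and the TCN temporal convolution to transform the spatial and temporal dimensions. Along the way the joint feature dimension is increased and the key-frame dimension is reduced. Finally, an average pooling layer and a fully connected layer, followed by a SoftMax layer, classify the features.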
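To make the tensor layout concrete, here is a minimal NumPy sketch of how a batch shaped (N, C, T, V, M) can be rearranged into the (N*M, C, T, V) layout consumed by the ST-GCN blocks. It mirrors the transpose/reshape order used in the backbone's forward pass, but it is not the Paddle code itself, and the batch size of 4 and the 350 frames are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes (assumptions): 4 clips, (x, y) channels, 350 frames, 25 joints, 1 skater.
N, C, T, V, M = 4, 2, 350, 25, 1
x = np.random.randn(N, C, T, V, M).astype("float32")

# Step 1: flatten (joint, channel) per person so a 1-D batch norm can run along time.
x_bn = x.transpose(0, 4, 3, 1, 2).reshape(N * M, V * C, T)

# Step 2: go back to the (person, channel, time, joint) layout used by the blocks.
x_blocks = (x_bn.reshape(N, M, V, C, T)
                .transpose(0, 1, 3, 4, 2)
                .reshape(N * M, C, T, V))

print(x_bn.shape)      # (4, 50, 350)
print(x_blocks.shape)  # (4, 2, 350, 25)
```

Because M is 1 here, `x_blocks[n]` is simply the n-th clip with the person axis dropped.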
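3. ATT (attention) part
3.1 Description
During motion, different parts of the body are not equally important. Leg movements, for example, may matter more than neck movements: from the legs alone we can even tell running, walking and jumping apart, while the neck may carry little useful information. ST-GCN therefore weights the different body parts (every st-gcn unit has its own trainable weight parameters).
3.2 Core code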
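'''
If the attention part is enabled, every st_gcn_block gets its own importance weight for self.A.
self.A is a matrix that records the skeleton keypoint connectivity.
'''
# In __init__: one learnable mask per block, initialised to ones
if edge_importance_weighting:
    self.edge_importance = nn.ParameterList([
        self.create_parameter(
            shape=self.A.shape,
            default_initializer=nn.initializer.Constant(1))
        for i in self.st_gcn_networks
    ])
# In forward: each block sees A multiplied element-wise by its own mask
for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
    x, _ = gcn(x, paddle.multiply(self.A, importance))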
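The effect of the edge importance weights can be illustrated with a small NumPy sketch. It is only an analogy for the Paddle code above: `A` stands in for the (3, V, V) adjacency tensor and `importance` for one block's learnable mask. The 3-joint graph and the up-weighted value are made up for illustration.

```python
import numpy as np

# Toy stand-in for self.A: 3 subsets over a 3-joint chain (values are illustrative).
A = np.zeros((3, 3, 3))
A[0] = np.eye(3)               # self links
A[1, 0, 1] = A[1, 1, 2] = 0.5  # some neighbour links
A[2, 1, 0] = A[2, 2, 1] = 0.5

# Each block owns a mask with the same shape as A, initialised to ones.
importance = np.ones_like(A)
assert np.array_equal(A * importance, A)  # at initialisation the graph is unchanged

# After training a mask entry may differ from 1; here we pretend one edge was up-weighted.
importance[1, 0, 1] = 2.0
weighted = A * importance
print(weighted[1, 0, 1])  # 1.0
```

Since the mask is applied by element-wise multiplication, entries of `A` that are zero stay zero: the weights can rescale existing edges but cannot create new ones.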
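4. GCN part
A graph convolutional network (GCN) uses spectral graph theory to perform convolution on a spatial topology graph and extract the spatial features of the graph. Concretely, the human skeleton keypoints and their connections are treated as a graph, and the eigenvalues and eigenvectors of the graph's adjacency matrix, degree matrix and Laplacian matrix are used to study its properties.
In the original paper, the authors state that they use the GCN architecture from "Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In ICLR 2017", whose graph convolution formula is: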
$$f_{out}=\Lambda^{-\frac{1}{2}}\left(A+I\right)\Lambda^{-\frac{1}{2}}f_{in}W$$
where $f_{out}$ is the output, $f_{in}$ is the input, $A$ is the adjacency matrix, $I$ is the identity matrix, $\Lambda^{ii}=\sum_{j}\left(A^{ij}+I^{ij}\right)$ is the diagonal degree matrix, and $W$ is the weight matrix to be learned.
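In the original paper, the authors split the graph into three subgraphs $\left(A_1,A_2,A_3\right)$ according to the type of motion, expressing centripetal motion, centrifugal motion and static features respectively. (After reading this you may be a little confused: how does the skeleton connectivity get split into three subgraphs, and how is that step carried out? I understand the confusion, so later on I visualize A step by step with explanations. Hold on to that question and keep reading.)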
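In practice, the simplest graph convolution already works very well, so the implementation uses the $D^{-1}A$ graph convolution kernel (I call this step normalization) in place of $\Lambda^{-\frac{1}{2}}\left(A+I\right)\Lambda^{-\frac{1}{2}}$, where D is the degree matrix. (Rest assured, every step of the code is commented and explained block by block; this step is implemented by def normalize_digraph(A).)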
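As a small worked example of this normalization, the NumPy sketch below applies it to a 3-joint chain with self links. The graph is an illustrative assumption, not the FSD-10 skeleton. Note that `normalize_digraph` in this project multiplies in the order A·D⁻¹, so it is each column that sums to 1; for a symmetric adjacency matrix this is the transpose of D⁻¹A.

```python
import numpy as np

# 3-joint chain 0-1-2 with self links (an illustrative graph).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)

degree = A.sum(axis=0)          # column sums: [2, 3, 2]
D_inv = np.diag(1.0 / degree)   # D^{-1}
A_norm = A @ D_inv              # same product order as normalize_digraph

print(np.round(A_norm, 2))
print(A_norm.sum(axis=0))       # every column sums to 1 (up to rounding)
```

Each column of `A_norm` can be read as an average over a joint's neighbours (including itself), which keeps feature magnitudes comparable between joints with few and many connections.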
def zero(x):
    return 0

def iden(x):
    return x

def einsum(x, A):
    """paddle.einsum will be implemented in release/2.2.
    self.kernel_size = 3
    x.shape = (n, self.kernel_size, (kc // self.kernel_size) == out_channels, t, v)
    """
    x = x.transpose((0, 2, 3, 1, 4))
    n, c, t, k, v = x.shape
    k2, v2, w = A.shape
    assert (k == k2 and v == v2), "Args of einsum not match!"
    x = x.reshape((n, c, t, k * v))
    A = A.reshape((k * v, w))
    y = paddle.matmul(x, A)
    return y

class ConvTemporalGraphical(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,  # kernel_size is 3, one slice per subgraph
                 t_kernel_size=1,
                 t_stride=1,
                 t_padding=0,
                 t_dilation=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv2D(in_channels,
                              out_channels * kernel_size,
                              kernel_size=(t_kernel_size, 1),
                              padding=(t_padding, 0),
                              stride=(t_stride, 1),
                              dilation=(t_dilation, 1))

    def forward(self, x, A):
        '''
        A.shape = [3, 25, 25]; the A passed in here has already been normalized
        '''
        assert A.shape[0] == self.kernel_size
        x = self.conv(x)
        n, kc, t, v = x.shape  # [n, out_channels * kernel_size, t, v]
        x = x.reshape((n, self.kernel_size, kc // self.kernel_size, t, v))
        x = einsum(x, A)
        return x, A

class st_gcn_block(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,  # [9, 3]
                 stride=1,
                 dropout=0,
                 residual=True):
        super(st_gcn_block, self).__init__()
        assert len(kernel_size) == 2
        assert kernel_size[0] % 2 == 1
        padding = ((kernel_size[0] - 1) // 2, 0)
        self.gcn = ConvTemporalGraphical(in_channels, out_channels,
                                         kernel_size[1])
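5. TCN part
Within an ST-GCN unit, the GCN learns local features of spatially adjacent joints, while the temporal convolutional network (TCN) learns local features of how joints change over time. The convolution kernel first convolves one joint over all of its frames and then moves on to the next joint, which yields the temporal features of the stacked skeleton graphs. In this project the TCN is implemented with a $9\times1$ convolution kernel. To keep the overall amount of features unchanged, the stride is 2 whenever the joint feature dimension (C) doubles, and 1 otherwise.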
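The trade-off between channels and frames can be traced with a few lines of plain Python. This sketch uses the standard convolution output-length formula (dilation 1) together with the (out_channels, stride) pairs of the ten blocks defined in the backbone later in this article; the helper names `conv_out_len` and `trace_shapes` are hypothetical and exist only for this illustration.

```python
def conv_out_len(t, kernel=9, stride=1):
    """Output length of a 1-D convolution padded with (kernel - 1) // 2 on each side."""
    padding = (kernel - 1) // 2
    return (t + 2 * padding - kernel) // stride + 1

# (out_channels, stride) of the ten st_gcn_block layers in the backbone.
blocks = [(64, 1)] * 4 + [(128, 2), (128, 1), (128, 1), (256, 2), (256, 1), (256, 1)]

def trace_shapes(t):
    """Return the (channels, frames) pair after every block."""
    shapes = []
    for channels, stride in blocks:
        t = conv_out_len(t, stride=stride)
        shapes.append((channels, t))
    return shapes

print(trace_shapes(350)[-1])   # (256, 88)
print(trace_shapes(1000)[-1])  # (256, 250)
```

With stride 1 the padding of 4 keeps the number of frames unchanged, so only the two blocks that double the channels (64 to 128 and 128 to 256) shorten the sequence.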
'''
TCN implementation; kernel_size[0] is 9.
The input tensor of the TCN has shape [bs * people, channels, frames, joints].
The frame dimension corresponds to time and the joint dimension to space,
so the TCN kernel_size is (9, 1) and only acts along time.
'''
self.tcn = nn.Sequential(
    nn.BatchNorm2D(out_channels),
    nn.ReLU(),
    nn.Conv2D(
        out_channels,
        out_channels,
        (kernel_size[0], 1),
        (stride, 1),
        padding,
    ),
    nn.BatchNorm2D(out_channels),
    nn.Dropout(dropout),
)
# Imports
import paddle
import paddle.nn as nn
import paddle.nn.functional as F
import numpy as np
from weight_init import weight_init_
6. Model architecture (Backbone part)
def zero(x):
    return 0

def iden(x):
    return x

def einsum(x, A):
    """paddle.einsum will be implemented in release/2.2.
    self.kernel_size = 3
    x.shape = (n, self.kernel_size, (kc // self.kernel_size) == out_channels, t, v)
    """
    x = x.transpose((0, 2, 3, 1, 4))
    n, c, t, k, v = x.shape
    k2, v2, w = A.shape
    assert (k == k2 and v == v2), "Args of einsum not match!"
    x = x.reshape((n, c, t, k * v))
    A = A.reshape((k * v, w))
    y = paddle.matmul(x, A)
    return y
def get_hop_distance(num_node, edge, max_hop=1):
    A = np.zeros((num_node, num_node))
    for i, j in edge:
        A[j, i] = 1
        A[i, j] = 1
    # compute hop steps
    hop_dis = np.zeros((num_node, num_node)) + np.inf
    transfer_mat = [np.linalg.matrix_power(A, d) for d in range(max_hop + 1)]
    arrive_mat = (np.stack(transfer_mat) > 0)
    for d in range(max_hop, -1, -1):
        hop_dis[arrive_mat[d]] = d
    return hop_dis

def normalize_digraph(A):
    Dl = np.sum(A, 0)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i]**(-1)
    AD = np.dot(A, Dn)
    return AD
class Graph():
    '''
    Records the skeleton keypoint connectivity of the dataset. For any other skeleton-based
    action recognition dataset with a different keypoint annotation, this is the part that changes.
    '''
    def __init__(self,
                 layout='openpose',
                 strategy='uniform',
                 max_hop=1,
                 dilation=1):
        self.max_hop = max_hop
        self.dilation = dilation
        self.get_edge(layout)
        self.hop_dis = get_hop_distance(self.num_node,
                                        self.edge,
                                        max_hop=max_hop)
        self.get_adjacency(strategy)

    def __str__(self):
        # __str__ must return a string, not the ndarray itself
        return str(self.A)

    def get_edge(self, layout):
        # edge is a list of [child, parent] pairs
        if layout == 'fsd10':
            self.num_node = 25
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_link = [(1, 8), (0, 1), (15, 0), (17, 15), (16, 0),
                             (18, 16), (5, 1), (6, 5), (7, 6), (2, 1), (3, 2),
                             (4, 3), (9, 8), (10, 9), (11, 10), (24, 11),
                             (22, 11), (23, 22), (12, 8), (13, 12), (14, 13),
                             (21, 14), (19, 14), (20, 19)]
            self.edge = self_link + neighbor_link
            self.center = 8
        elif layout == 'ntu-rgb+d':
            self.num_node = 25
            self_link = [(i, i) for i in range(self.num_node)]
            neighbor_1base = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5),
                              (7, 6), (8, 7), (9, 21), (10, 9), (11, 10),
                              (12, 11), (13, 1), (14, 13), (15, 14), (16, 15),
                              (17, 1), (18, 17), (19, 18), (20, 19), (22, 23),
                              (23, 8), (24, 25), (25, 12)]
            neighbor_link = [(i - 1, j - 1) for (i, j) in neighbor_1base]
            self.edge = self_link + neighbor_link
            self.center = 21 - 1
        else:
            raise ValueError("Do Not Exist This Layout.")

    def get_adjacency(self, strategy):
        valid_hop = range(0, self.max_hop + 1, self.dilation)
        adjacency = np.zeros((self.num_node, self.num_node))
        for hop in valid_hop:
            adjacency[self.hop_dis == hop] = 1
        normalize_adjacency = normalize_digraph(adjacency)
        if strategy == 'spatial':
            A = []
            for hop in valid_hop:
                a_root = np.zeros((self.num_node, self.num_node))
                a_close = np.zeros((self.num_node, self.num_node))
                a_further = np.zeros((self.num_node, self.num_node))
                for i in range(self.num_node):
                    for j in range(self.num_node):
                        if self.hop_dis[j, i] == hop:
                            if self.hop_dis[j, self.center] == self.hop_dis[
                                    i, self.center]:
                                a_root[j, i] = normalize_adjacency[j, i]
                            elif self.hop_dis[j, self.center] > self.hop_dis[
                                    i, self.center]:
                                a_close[j, i] = normalize_adjacency[j, i]
                            else:
                                a_further[j, i] = normalize_adjacency[j, i]
                if hop == 0:
                    A.append(a_root)
                else:
                    A.append(a_root + a_close)
                    A.append(a_further)
            A = np.stack(A)
            self.A = A
        else:
            raise ValueError("Do Not Exist This Strategy")
class ConvTemporalGraphical(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 t_kernel_size=1,
                 t_stride=1,
                 t_padding=0,
                 t_dilation=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv2D(in_channels,
                              out_channels * kernel_size,
                              kernel_size=(t_kernel_size, 1),
                              padding=(t_padding, 0),
                              stride=(t_stride, 1),
                              dilation=(t_dilation, 1))

    def forward(self, x, A):
        assert A.shape[0] == self.kernel_size
        x = self.conv(x)
        n, kc, t, v = x.shape  # [n, out_channels * kernel_size, t, v]
        x = x.reshape((n, self.kernel_size, kc // self.kernel_size, t, v))
        x = einsum(x, A)
        return x, A

class st_gcn_block(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,  # [9, 3]
                 stride=1,
                 dropout=0,
                 residual=True):
        super(st_gcn_block, self).__init__()
        assert len(kernel_size) == 2
        assert kernel_size[0] % 2 == 1
        padding = ((kernel_size[0] - 1) // 2, 0)
        self.gcn = ConvTemporalGraphical(in_channels, out_channels,
                                         kernel_size[1])
        self.tcn = nn.Sequential(
            nn.BatchNorm2D(out_channels),
            nn.ReLU(),
            nn.Conv2D(
                out_channels,
                out_channels,
                (kernel_size[0], 1),
                (stride, 1),
                padding,
            ),
            nn.BatchNorm2D(out_channels),
            nn.Dropout(dropout),
        )
        if not residual:
            self.residual = zero
        elif (in_channels == out_channels) and (stride == 1):
            self.residual = iden
        else:
            self.residual = nn.Sequential(
                nn.Conv2D(in_channels,
                          out_channels,
                          kernel_size=1,
                          stride=(stride, 1)),
                nn.BatchNorm2D(out_channels),
            )
        self.relu = nn.ReLU()

    def forward(self, x, A):
        res = self.residual(x)
        x, A = self.gcn(x, A)
        x = self.tcn(x) + res
        return self.relu(x), A
class STGCN_BackBone(nn.Layer):
    """
    ST-GCN model from:
    `"Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition" <https://arxiv.org/abs/1801.07455>`_
    Args:
        in_channels: int, channels of vertex coordinate. 2 for (x,y), 3 for (x,y,z). Default 2.
        edge_importance_weighting: bool, whether to use edge attention. Default True.
        data_bn: bool, whether to use data BatchNorm. Default True.
    """
    def __init__(self,
                 in_channels=2,
                 edge_importance_weighting=True,
                 data_bn=True,
                 layout='fsd10',
                 strategy='spatial',
                 **kwargs):
        super(STGCN_BackBone, self).__init__()
        self.data_bn = data_bn
        # load graph
        self.graph = Graph(
            layout=layout,
            strategy=strategy,
        )
        A = paddle.to_tensor(self.graph.A, dtype='float32')
        # print("A.shape",A[0][0],A[1][0],A[2][0]) #[3,25,25]
        self.register_buffer('A', A)
        # build networks
        spatial_kernel_size = A.shape[0]
        temporal_kernel_size = 9
        kernel_size = (temporal_kernel_size, spatial_kernel_size)  # [9, 3]
        self.data_bn = nn.BatchNorm1D(in_channels *
                                      A.shape[1]) if self.data_bn else iden
        kwargs0 = {k: v for k, v in kwargs.items() if k != 'dropout'}
        self.st_gcn_networks = nn.LayerList((
            st_gcn_block(in_channels,
                         64,
                         kernel_size,
                         1,
                         residual=False,
                         **kwargs0),
            st_gcn_block(64, 64, kernel_size, 1, **kwargs),
            st_gcn_block(64, 64, kernel_size, 1, **kwargs),
            st_gcn_block(64, 64, kernel_size, 1, **kwargs),
            st_gcn_block(64, 128, kernel_size, 2, **kwargs),
            st_gcn_block(128, 128, kernel_size, 1, **kwargs),
            st_gcn_block(128, 128, kernel_size, 1, **kwargs),
            st_gcn_block(128, 256, kernel_size, 2, **kwargs),
            st_gcn_block(256, 256, kernel_size, 1, **kwargs),
            st_gcn_block(256, 256, kernel_size, 1, **kwargs),
        ))
        # initialize parameters for edge importance weighting
        if edge_importance_weighting:
            self.edge_importance = nn.ParameterList([
                self.create_parameter(
                    shape=self.A.shape,
                    default_initializer=nn.initializer.Constant(1))
                for i in self.st_gcn_networks
            ])
        else:
            self.edge_importance = [1] * len(self.st_gcn_networks)
        self.pool = nn.AdaptiveAvgPool2D(output_size=(1, 1))

    def init_weights(self):
        """Initiate the parameters.
        """
        for layer in self.sublayers():
            if isinstance(layer, nn.Conv2D):
                weight_init_(layer, 'Normal', mean=0.0, std=0.02)
            elif isinstance(layer, nn.BatchNorm2D):
                weight_init_(layer, 'Normal', mean=1.0, std=0.02)
            elif isinstance(layer, nn.BatchNorm1D):
                weight_init_(layer, 'Normal', mean=1.0, std=0.02)

    def forward(self, x):
        # data normalization
        # print("X input.shape",x.shape) #[16, 2, 350, 25, 1]
        N, C, T, V, M = x.shape  # [bs, XY, frames, joints, people]
        x = x.transpose((0, 4, 3, 1, 2))  # N, M, V, C, T
        x = x.reshape((N * M, V * C, T))  # [bs * people, joints * XY, frames]
        if self.data_bn:
            x.stop_gradient = False
        x = self.data_bn(x)
        x = x.reshape((N, M, V, C, T))  # [bs, people, joints, XY, frames]
        x = x.transpose((0, 1, 3, 4, 2))  # N, M, C, T, V [bs, people, XY, frames, joints]
        x = x.reshape((N * M, C, T, V))  # [bs * people, XY, frames, joints]
        # forward
        for gcn, importance in zip(self.st_gcn_networks, self.edge_importance):
            x, _ = gcn(x, paddle.multiply(self.A, importance))
        x = self.pool(x)  # NM,C,T,V --> NM,C,1,1
        C = x.shape[1]
        x = paddle.reshape(x, (N, M, C, 1, 1)).mean(axis=1)  # N,C,1,1
        return x
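7. Step-by-step visual explanation of A (the code block below is the visualization code)
Dimension symbol | Size | Meaning | Notes |
---|---|---|---|
N | number of samples | N samples | none |
C | 3 | x coordinate, y coordinate and confidence of each joint | every x and y is scaled to the range -1 to 1 |
T | 1000 | duration of the action, 1000 frames in total | some actions are actually longer than 1000 frames, in which case 1000 frames are sampled from them; see AutoPadding in the code |
V | 25 | 25 joints | see the skeleton example figure below for the meaning of each joint |
M | 1 | number of athletes, 1 here | none |
Skeleton example figure (note that the keypoint at index 8 is the body center):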

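1. Step one: visualizing hop_dis
hop_dis records the connectivity of the skeleton keypoints: the diagonal is 0 (each joint connected to itself), a joint connected to an adjacent joint is marked 1, and everything else is inf. You can read it by column or by row, because the matrix is symmetric.
2. Step two: visualizing normalize_digraph
This is the adjacency matrix plus the diagonal (self-link) matrix, normalized as A*(D^-1). Column n lists the neighbouring joints connected to joint n (including itself).
3. Step three: visualizing A
We need to split the output of normalize_digraph into three subsets. How? The first subset is the diagonal part, where each joint is paired with itself, (i, i). The second and third subsets take the pairs (j, i) of a joint and a different joint and split them by comparing their distances to the center joint (center = 8): pairs with hop_dis[j,center]>=hop_dis[i,center] go into one subset and pairs with hop_dis[j,center]<hop_dis[i,center] into the other.
Subset 1
Subset 2
Subset 3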
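Before running the three steps on the 25-joint skeleton, it can help to see them on a graph small enough to inspect by hand. The NumPy sketch below applies the same rules to a 4-joint chain 0-1-2-3 with joint 1 as the center. The chain, the choice of center and the vectorized `np.where` formulation are assumptions made for illustration; the project code uses explicit loops instead.

```python
import numpy as np

num_node, center = 4, 1
edges = [(0, 1), (1, 2), (2, 3)] + [(i, i) for i in range(num_node)]

# Step 1: hop distances with max_hop = 1 (0 on the diagonal, 1 for neighbours, inf otherwise).
adj = np.zeros((num_node, num_node))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1
hop_dis = np.full((num_node, num_node), np.inf)
hop_dis[adj > 0] = 1
np.fill_diagonal(hop_dis, 0)

# Step 2: column-normalise the 0/1 adjacency, i.e. A @ D^-1.
norm_adj = adj / adj.sum(axis=0, keepdims=True)

# Step 3: split by hop and by distance to the center joint.
d = hop_dis[:, center]                 # distance of every joint to the center
neighbour = hop_dis == 1
subset1 = np.where(hop_dis == 0, norm_adj, 0.0)
subset2 = np.where(neighbour & (d[:, None] >= d[None, :]), norm_adj, 0.0)
subset3 = np.where(neighbour & (d[:, None] < d[None, :]), norm_adj, 0.0)
A = np.stack([subset1, subset2, subset3])

print(A.shape)                                # (3, 4, 4)
print(np.allclose(A.sum(axis=0), norm_adj))   # True
```

Because hop_dis is capped at max_hop = 1, every joint that is not the center or one of its direct neighbours has distance inf to the center, and two inf distances compare as equal. Edges far from the center therefore all land in the second subset.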
neighbor_link = [(1, 8), (0, 1), (15, 0), (17, 15), (16, 0),
                 (18, 16), (5, 1), (6, 5), (7, 6), (2, 1), (3, 2),
                 (4, 3), (9, 8), (10, 9), (11, 10), (24, 11),
                 (22, 11), (23, 22), (12, 8), (13, 12), (14, 13),
                 (21, 14), (19, 14), (20, 19)]
self_link = [(i, i) for i in range(25)]
hop_dis = get_hop_distance(25,neighbor_link+self_link)
def float2str(x):
    return str(float(x))
# Inspect hop_dis
with open("hop_dis.txt","w") as f:
    for i in hop_dis:
        f.write(" ".join(list(map(float2str,list(i))))+"\n")
#---------------------------------------------------------------------------------------------------------------------------------------------------
def normalize_digraph(A):
    '''
    Computes A*(D^-1). Column n lists the neighbouring joints connected to joint n (including itself).
    '''
    Dl = np.sum(A, 0)
    num_node = A.shape[0]
    Dn = np.zeros((num_node, num_node))
    for i in range(num_node):
        if Dl[i] > 0:
            Dn[i, i] = Dl[i]**(-1)
    AD = np.dot(A, Dn)
    return AD
adjacency = np.zeros((25,25))
for hop in range(2):
    adjacency[hop_dis == hop] = 1
normalize_adjacency = normalize_digraph(adjacency)
def float2str(x):
    return '{:.2f}'.format(float(x))
# Inspect normalize_adjacency
with open("normalize_adjacency.txt","w") as f:
    for i in normalize_adjacency:
        f.write(" ".join(list(map(float2str,list(i))))+"\n")
#---------------------------------------------------------------------------------------------------------------------------------------------------
center = 8
A = []
for hop in range(2):
    a_root = np.zeros((25, 25))
    a_close = np.zeros((25, 25))
    a_further = np.zeros((25, 25))
    for i in range(25):
        for j in range(25):
            if hop_dis[j, i] == hop:
                if hop_dis[j, center] == hop_dis[
                        i, center]:
                    a_root[j, i] = normalize_adjacency[j, i]
                elif hop_dis[j, center] > hop_dis[
                        i, center]:
                    a_close[j, i] = normalize_adjacency[j, i]
                else:
                    a_further[j, i] = normalize_adjacency[j, i]
    if hop == 0:
        A.append(a_root)
    else:
        A.append(a_root + a_close)
        A.append(a_further)
A = np.stack(A)
print("All the information in normalize_adjacency is contained in A, just split into three subsets",(A.sum(axis=0) == normalize_adjacency).all())
# Inspect A
with open("A.txt","w") as f:
    for j in A:
        for i in j:
            f.write(" ".join(list(map(float2str,list(i))))+"\n")
        f.write("\n\n\n")
All the information in normalize_adjacency is contained in A, just split into three subsets True
import paddle
x = paddle.randn([16,2,350,25,1])
STGCN_BackBone()(x).shape
[16, 256, 1, 1]
8. Model architecture (Head part)
class STGCNHead(nn.Layer):
    """
    Head for ST-GCN model.
    Args:
        in_channels: int, input feature channels. Default: 256.
        num_classes: int, number classes. Default: 10.
    """
    def __init__(self, in_channels=256, num_classes=10, **kwargs):
        super().__init__()
        self.fcn = nn.Conv2D(in_channels=in_channels,
                             out_channels=num_classes,
                             kernel_size=1)

    def init_weights(self):
        """Initiate the parameters.
        """
        for layer in self.sublayers():
            if isinstance(layer, nn.Conv2D):
                weight_init_(layer, 'Normal', std=0.02)

    def forward(self, x):
        """Define how the head is going to run.
        """
        x = self.fcn(x)
        x = paddle.reshape_(x, (x.shape[0], -1))  # N,C,1,1 --> N,C
        return x
9. Model architecture (framework part)
class STGCN_framework(nn.Layer):
    def __init__(self,num_classes = 30):
        super().__init__()
        self.backbone = STGCN_BackBone()
        self.head = STGCNHead(num_classes = num_classes)

    def forward(self,data):
        feature = self.backbone(data)
        cls_score = self.head(feature)
        return cls_score
x = paddle.randn([16,2,350,25,1])
STGCN_framework()(x).shape
[16, 30]
10. Dataset preprocessing
'''
Inspect the dataset
'''
import numpy as np
np.load("/home/aistudio/data/data104925/train_data.npy")[0].shape
(3, 2500, 25, 1)
import random  # needed by random.randint in the random_pad branch
import numpy as np
class AutoPadding(object):
    """
    Handles varying frame counts. window_size is the number of frames you want in the end:
    if a clip has fewer frames than window_size, the existing frames are padded with zeros;
    if it has more, frames are sampled from the existing ones.
    Sample or Padding frame skeleton feature.
    Args:
        window_size: int, temporal size of skeleton feature.
        random_pad: bool, whether do random padding when frame length < window size. Default: False.
    """
    def __init__(self, window_size, random_pad=False):
        self.window_size = window_size
        self.random_pad = random_pad

    def get_frame_num(self, data):
        # Index of the last frame that is not all zeros, plus one
        C, T, V, M = data.shape
        for i in range(T - 1, -1, -1):
            tmp = np.sum(data[:, i, :, :])
            if tmp > 0:
                T = i + 1
                break
        return T

    def __call__(self, results):
        # data = results['data']
        data = results
        C, T, V, M = data.shape
        T = self.get_frame_num(data)
        if T == self.window_size:
            data_pad = data[:, :self.window_size, :, :]
        elif T < self.window_size:
            begin = random.randint(0, self.window_size -
                                   T) if self.random_pad else 0
            data_pad = np.zeros((C, self.window_size, V, M))
            data_pad[:, begin:begin + T, :, :] = data[:, :T, :, :]
        else:
            if self.random_pad:
                index = np.random.choice(T, self.window_size,
                                         replace=False).astype('int64')
            else:
                index = np.linspace(0, T-1, self.window_size).astype("int64")
            data_pad = data[:, index, :, :]
        # results['data'] = data_pad
        # return results
        return data_pad
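The two non-trivial branches of AutoPadding can be checked on tiny inputs. The sketch below reproduces the uniform sampling used for clips that are too long and the zero padding used for clips that are too short (with random_pad off, so the valid frames start at index 0). The helper name `uniform_indices` and the toy sizes are assumptions for illustration.

```python
import numpy as np

def uniform_indices(num_frames, window_size):
    """Evenly spaced frame indices, as in the non-random branch for long clips."""
    return np.linspace(0, num_frames - 1, window_size).astype("int64")

# Long clip: keep 4 of 10 frames, always including the first and the last one.
print(uniform_indices(10, 4).tolist())  # [0, 3, 6, 9]

# Short clip: copy the 3 valid frames into a zero-filled window of 5 frames.
clip = np.ones((2, 3, 25, 1))           # C=2, T=3, V=25, M=1
padded = np.zeros((2, 5, 25, 1))
padded[:, :3] = clip
print(padded[0, :, 0, 0].tolist())      # [1.0, 1.0, 1.0, 0.0, 0.0]
```

Uniform sampling keeps the temporal order and covers the whole clip, whereas the random_pad variant draws indices with `np.random.choice`, which does not return them sorted.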
label = np.load("/home/aistudio/data/data104925/train_label.npy")
print(label.shape)
# Dump all labels to inspect their distribution
with open("preds.txt","w") as f:
    for i in label:
        f.write(str(int(i))+"\n")
(2922,)
# Take one sample out of every 7 as the validation set
train_index = []
valid_index= []
for i in range(2922):
    if i%7 !=1:
        train_index.append(i)
    else:
        valid_index.append(i)
train_index =np.array(train_index).astype("int64")
valid_index = np.array(valid_index).astype("int64")
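As a quick sanity check on the split sizes, the rule above (index modulo 7 equal to 1 goes to validation) can be counted directly in plain Python. The total of 2922 comes from the label shape printed earlier.

```python
total = 2922  # number of labelled samples, from label.shape above
valid = [i for i in range(total) if i % 7 == 1]
train = [i for i in range(total) if i % 7 != 1]
print(len(train), len(valid))  # 2504 418
```

So about one seventh of the data (418 samples) is held out, and the remaining 2504 samples are used for training.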
import paddle
import numpy as np
import paddle.nn.functional as F
from visualdl import LogWriter
from tqdm import tqdm
log_writer = LogWriter("./log/gnet")
class Dataset(paddle.io.Dataset):
    def __init__(self,is_train = True):
        data = np.load("/home/aistudio/data/data104925/train_data.npy").astype("float32") #[2922, 3, 2500, 25, 1]
        label = np.load("/home/aistudio/data/data104925/train_label.npy")
        self.autopad = AutoPadding(window_size= 1000)
        self.train_data = data[train_index,:,:,:,:]
        self.valid_data = data[valid_index,:,:,:,:]
        self.train_label = label[train_index]
        self.valid_label = label[valid_index]
        self.is_train = is_train
        if self.is_train == True:
            self.size = len(self.train_data)
        else:
            self.size = len(self.valid_data)

    def __getitem__(self, index):
        if self.is_train == True:
            one_row = self.train_data[index]
            one_label = self.train_label[index]
        else:
            one_row = self.valid_data[index]
            one_label = self.valid_label[index]
        one_row = one_row[:2, :, :, :]  # drop the confidence channel
        one_row = self.autopad(one_row).astype("float32")
        return one_row,one_label

    def __len__(self):
        return self.size
print(len(Dataset()))
2504
for i in Dataset():
    print(i[0].dtype,i[1])
    break
float32 27
BATCH_SIZE =64
train_dataset = Dataset()
data_loader = paddle.io.DataLoader(train_dataset,batch_size=BATCH_SIZE,shuffle =True,drop_last=True)
for data in data_loader:
    print(data[0].shape,data[1].shape)
    break
[64, 2, 1000, 25, 1] [64]
11. Testing accuracy
Mine is 0.52.
def valid_accurary(valid_loader,classifer_net):
    with paddle.set_grad_enabled(False):
        acc_all = 0
        num = 0
        for one in valid_loader:
            img_data,cls=one
            # print()
            out = classifer_net(img_data)
            # print(out.shape)
            # out = nn.Softmax()(out)
            # out = paddle.multinomial(out, num_samples=1, replacement=False, name=None)
            acc = paddle.metric.accuracy(out,cls.unsqueeze(1))
            acc_all+=acc.numpy()[0]
            num+=1
            # if out[0] == cls:
            #     right +=1
            # print("right",right)
        # Mean of the per-batch accuracies
        return acc_all/num
## STGCN_fsd.pdparams is the parameter file I downloaded from the official GitHub repository
#https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md
# stgcn = STGCN_framework()
# stgcn.set_state_dict(paddle.load("STGCN_fsd.pdparams"))
# valid_dataset = Dataset(is_train=False)
# valid_loader = paddle.io.DataLoader(valid_dataset,batch_size=64,shuffle =True,drop_last=True)
# valid_accurary(valid_loader,stgcn)# 0.09
'''
Below is the accuracy on the validation set of the model I trained
'''
stgcn = STGCN_framework()
stgcn.set_state_dict(paddle.load("model/Gmodel_state0.47607655502392343.pdparams"))
valid_dataset = Dataset(is_train=False)
valid_loader = paddle.io.DataLoader(valid_dataset,batch_size=64,shuffle =True,drop_last=True)
valid_accurary(valid_loader,stgcn)# 0.52
0.5208333333333334
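One detail worth keeping in mind when reading this number: the validation loader above is built with drop_last=True and a batch size of 64, and the split rule (index modulo 7 equal to 1, out of 2922) yields 418 validation samples. The plain-Python sketch below shows how many samples such a loader actually visits and how a sample-weighted accuracy could be computed if the last, shorter batch were kept. The helper names and the example counts passed to `weighted_accuracy` are hypothetical.

```python
def samples_kept(num_samples, batch_size, drop_last):
    """Number of samples a loader yields per pass."""
    full_batches = num_samples // batch_size
    return full_batches * batch_size if drop_last else num_samples

def weighted_accuracy(batch_stats):
    """Sample-weighted accuracy from (num_correct, batch_len) pairs."""
    correct = sum(c for c, _ in batch_stats)
    seen = sum(n for _, n in batch_stats)
    return correct / seen

print(samples_kept(418, 64, drop_last=True))    # 384
print(samples_kept(418, 64, drop_last=False))   # 418
print(weighted_accuracy([(40, 64), (20, 34)]))  # 60 / 98, about 0.612
```

With drop_last=True, 34 of the 418 validation samples are skipped on each pass, and which ones are skipped changes with shuffling. The reported 0.5208333333333334 equals 200/384, which is consistent with 384 evaluated samples. Averaging per-batch accuracies is exact only when all batches have the same size, which is why the sketch weights by batch length.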
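12. Training it yourself
The top-1 accuracy reported for the figure skating training is computed from the model outputs and the labels at each training step. When I evaluated the parameter file provided on the official site on my own validation set, the accuracy was below 0.1, which I do not fully understand.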
import paddle.nn as nn
stgcn = STGCN_framework()
stgcn.set_state_dict(paddle.load("model/Gmodel_state0.47607655502392343.pdparams"))
crossEntropyLoss =nn.CrossEntropyLoss()
scheduler_G = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate =0.05, T_max =60, eta_min=0, last_epoch=- 1, verbose=False)
optimizer = paddle.optimizer.Momentum(learning_rate=scheduler_G, momentum=0.9, parameters=stgcn.parameters(), use_nesterov=False, weight_decay=1e-4, grad_clip=None, name=None)
import os
epoches =100
i = 0
v_acc_max = 0
for epoch in range(epoches):
    print("epoch",epoch)
    for data in tqdm(data_loader):
        one_data,cls=data
        out = stgcn(one_data)
        optimizer.clear_grad()
        loss = crossEntropyLoss(out,cls)
        loss.backward()
        optimizer.step()
        log_writer.add_scalar(tag='train/loss', step=i, value=loss.numpy()[0])
        if i%100 == 3:
            print("loss",loss.numpy()[0],v_acc_max)
        i+=1
        # break
    if epoch%2 == 0:
        stgcn.eval()
        v_acc = valid_accurary(valid_loader,stgcn)
        stgcn.train()
        print("epoch loss",loss.numpy()[0],v_acc)
        log_writer.add_scalar(tag='train/v_acc', step=i, value=v_acc)
        if v_acc > v_acc_max:
            v_acc_max = v_acc
            save_param_path_model = os.path.join("model", 'Gmodel_state'+str(v_acc_max)+'.pdparams')
            paddle.save(stgcn.state_dict(), save_param_path_model)
    scheduler_G.step()
    # break
This project is only a repost. Original project: https://aistudio.baidu.com/aistudio/projectdetail/4224807