PyTorch GRU Network Forward Pass / Python Implementation (Runnable)
1. Background

For a trained neural network model, inference only requires the forward-pass computation; the backward pass is irrelevant. For hybrid models such as the RNNoise noise-suppression algorithm, shipping the algorithm often means running the network on low-power devices, which usually requires C. This post is a personal note: it walks through implementing the forward pass of a GRU network in Python, as a stepping stone toward a C implementation (future work).
2. GRU Network Details in PyTorch

This post does not rederive the GRU itself; below is the computation PyTorch performs for a GRU layer:
$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)} + b_{hn})) \\
h_t &= (1 - z_t) * n_t + z_t * h_{(t-1)}
\end{aligned}
$$
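As a sanity check on these formulas, a single time step can be transcribed directly into vectorized PyTorch operations. This is a minimal sketch; the function and argument names are my own, chosen to mirror the equations, with each $W$ of shape (hidden_size, input_size) or (hidden_size, hidden_size) and each $b$ of shape (hidden_size,):

```python
import torch

def gru_step(x_t, h_prev, W_ir, W_iz, W_in, W_hr, W_hz, W_hn,
             b_ir, b_iz, b_in, b_hr, b_hz, b_hn):
    """One GRU time step, transcribed directly from the equations above."""
    r_t = torch.sigmoid(W_ir @ x_t + b_ir + W_hr @ h_prev + b_hr)  # reset gate
    z_t = torch.sigmoid(W_iz @ x_t + b_iz + W_hz @ h_prev + b_hz)  # update gate
    n_t = torch.tanh(W_in @ x_t + b_in + r_t * (W_hn @ h_prev + b_hn))  # candidate
    return (1 - z_t) * n_t + z_t * h_prev  # new hidden state h_t
```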
In the equations above, $\sigma$ is the sigmoid function, $x_t$ is the input at the current time step, and $h_{(t-1)}$ is the hidden state from the previous time step. Note that there are six weight matrices $W$ and six bias vectors $b$ in total. The PyTorch documentation describes the corresponding parameters as follows:
- ~GRU.weight_ih_l[k] – the learnable input-hidden weights of the $k^{th}$ layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
- ~GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the $k^{th}$ layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
- ~GRU.bias_ih_l[k] – the learnable input-hidden bias of the $k^{th}$ layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
- ~GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the $k^{th}$ layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
Considering a single layer only, each GRU layer's parameters consist of four parts: weight_ih, weight_hh, bias_ih, and bias_hh. Take weight_ih and bias_ih as an example: weight_ih is a matrix of shape (3*hidden_size, input_size), formed by stacking the three (hidden_size, input_size) matrices W_ir, W_iz, W_in along dim 0. bias_ih has shape (3*hidden_size, 1), formed by stacking the three (hidden_size, 1) vectors b_ir, b_iz, b_in along dim 0. The input $x_t$ has shape (input_size, 1), so $W_{ir}x_t + b_{ir}$ is a (hidden_size, 1) matrix. By the same reasoning, $W_{hr}h_{(t-1)} + b_{hr}$ is also (hidden_size, 1), and the computations of $z_t$ and $n_t$ follow the identical pattern. The stacking order can be verified directly, as sketched below.
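A short sketch of that split: `torch.chunk` breaks the stacked tensors back into the per-gate blocks (the names W_ir, b_ir, etc. here are my own, chosen to match the equations):

```python
import torch
from torch import nn

gru = nn.GRU(input_size=10, hidden_size=5, batch_first=True)

# weight_ih_l0 has shape (3*hidden_size, input_size) = (15, 10)
W_ir, W_iz, W_in = torch.chunk(gru.weight_ih_l0, 3, dim=0)  # each (5, 10)
W_hr, W_hz, W_hn = torch.chunk(gru.weight_hh_l0, 3, dim=0)  # each (5, 5)
b_ir, b_iz, b_in = torch.chunk(gru.bias_ih_l0, 3, dim=0)    # each (5,)
b_hr, b_hz, b_hn = torch.chunk(gru.bias_hh_l0, 3, dim=0)    # each (5,)

print(W_ir.shape, W_hr.shape, b_ir.shape)
# torch.Size([5, 10]) torch.Size([5, 5]) torch.Size([5])
```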
3. Code Walkthrough
- Extracting the network parameters

The code below builds a single-layer GRU with input_size=10 and hidden_size=5 (batch_first=True), with randomly initialized parameters, and prints those parameters:
```python
import torch
from torch import nn
import torch.nn.functional as F

class GRUtest(nn.Module):
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':   # the activation is never used below
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)
        return gru_out, gru_state

if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    # save the whole module so the comparison script below can load it
    torch.save(net1, './nn_test.pkl')
    for name, parameters in net1.named_parameters():
        print(name)
        print(parameters)
```
Running it prints:

```
gru.weight_ih_l0 Parameter containing:
tensor([[-0.2723,  0.3715,  0.2461,  0.1564, -0.3429,  0.3451,  0.1402,  0.3094,
         -0.1759,  0.0948],
        ...
        [-0.2211, -0.3684,  0.1786, -0.0130, -0.0834, -0.0744, -0.3496,  0.1268,
          0.0111, -0.3086]], requires_grad=True)
gru.weight_hh_l0 Parameter containing:
tensor([[ 0.1683, -0.0090, -0.4325,  0.2406,  0.2392],
        ...
        [ 0.1703,  0.3895,  0.1127, -0.1311,  0.1465],
        [-0.0391, -0.3496, -0.1727,  0.2034,  0.0147]], requires_grad=True)
gru.bias_ih_l0 Parameter containing:
tensor([ 0.1650, -0.2618,  0.4228, -0.1866,  0.0954, -0.2185, -0.2157,  0.2003,
        -0.1248, -0.2836, -0.1828,  0.3261,  0.2692,  0.2722, -0.3817],
       requires_grad=True)
gru.bias_hh_l0 Parameter containing:
tensor([ 0.2106,  0.1117, -0.3007,  0.0141,  0.0894, -0.2416, -0.1887,  0.3648,
        -0.0361, -0.0047, -0.2830, -0.2674,  0.4117,  0.1664, -0.0708],
       requires_grad=True)
```
The output is exactly the four tensors described above, corresponding to weight_ih, weight_hh, bias_ih, and bias_hh, with shapes (15, 10), (15, 5), (15,), and (15,) respectively (3*hidden_size = 15).
- Forward-pass Python code

To validate the computation, we first save the randomly initialized GRU model from the previous step and record its printed parameters, then compare two approaches: loading the model with PyTorch's own load function, versus a hand-written forward pass built from those printed parameters. One point worth noting: a GRU has no output gate. For a given GRU layer, when $x_t$ enters the network, the hidden state $h_{(t-1)}$ is updated to $h_t$, and $h_t$ is that layer's output at time $t$; concatenating the $h$ of every time step yields the layer's full output sequence. A quick check of this fact is sketched below, followed by the full comparison code.
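For a single-layer, unidirectional GRU, the last time step of `gru_out` should equal the final hidden state `gru_state`. A minimal sketch reusing the GRUtest class from the previous script (the input shape here is my own choice):

```python
import torch

net = GRUtest(10, 5, 'tanh')   # class defined in the previous script
x = torch.randn(1, 7, 10)      # (batch, seq_len, input_size) since batch_first=True
gru_out, gru_state = net(x)    # gru_out: (1, 7, 5); gru_state: (1, 1, 5)
print(torch.allclose(gru_out[:, -1, :], gru_state[0]))  # True
```

The comparison code: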
```python
import torch
from torch import nn
import numpy as np
import torch.nn.functional as F

# parameters copied from the printed output of the saved model (nn_test.pkl)
weight_ih = torch.tensor([[ 0.3162,  0.0833,  0.1223,  0.4317, -0.2017,  0.1417, -0.1990,  0.3196,  0.3572, -0.4123],
                          [ 0.3818,  0.2136,  0.1949,  0.1841,  0.3718, -0.0590, -0.3782, -0.1283, -0.3150,  0.0296],
                          [-0.0835, -0.2399, -0.0407,  0.4237, -0.0353,  0.0142, -0.0697,  0.0703,  0.3985,  0.2735],
                          [ 0.1587,  0.0972,  0.1054,  0.1728, -0.0578, -0.4156, -0.2766,  0.3817,  0.0267, -0.3623],
                          [ 0.0705,  0.3695, -0.4226, -0.3011, -0.1781,  0.0180, -0.1043, -0.0491, -0.4360,  0.2094],
                          [ 0.3925,  0.2734, -0.3167, -0.3605,  0.1857,  0.0100,  0.1833, -0.4370, -0.0267,  0.3154],
                          [ 0.2075,  0.0163,  0.0879, -0.0423, -0.2459, -0.1690, -0.2723,  0.3715,  0.2461,  0.1564],
                          [-0.3429,  0.3451,  0.1402,  0.3094, -0.1759,  0.0948,  0.4367,  0.3008,  0.3587, -0.0939],
                          [ 0.3407, -0.3503,  0.0387, -0.2518, -0.1043, -0.1145,  0.0335,  0.4070,  0.2214, -0.0019],
                          [ 0.3175, -0.2292,  0.2305, -0.0415, -0.0778,  0.0524, -0.3426,  0.0517,  0.1504,  0.3823],
                          [-0.1392,  0.1610,  0.4470, -0.1918,  0.4251, -0.2220,  0.1971,  0.1752,  0.1249,  0.3537],
                          [-0.1807,  0.1175,  0.0025, -0.3364, -0.1086, -0.2987,  0.1977,  0.0402,  0.0438, -0.1357],
                          [ 0.0022, -0.1391,  0.1285,  0.4343,  0.0677, -0.1981, -0.2732,  0.0342, -0.3318, -0.3361],
                          [-0.2911, -0.1519,  0.0331,  0.3080,  0.1732,  0.3426, -0.2808,  0.0377, -0.3975,  0.2565],
                          [ 0.0932,  0.4326, -0.3181,  0.3586,  0.3775,  0.3616,  0.0638,  0.4066,  0.2987,  0.3337]])
weight_hh = torch.tensor([[-0.0291, -0.3432, -0.0056,  0.0839, -0.3046],
                          [-0.2565, -0.4288, -0.1568,  0.3896,  0.0765],
                          [-0.0273,  0.0180,  0.2789, -0.3949, -0.3451],
                          [-0.1487, -0.2574,  0.2307,  0.3160, -0.4339],
                          [-0.3795, -0.4355,  0.1687,  0.3599, -0.3467],
                          [-0.2070,  0.1423, -0.2920,  0.3799,  0.1043],
                          [-0.1245,  0.0290,  0.1394, -0.1581, -0.3465],
                          [ 0.0030,  0.0081,  0.0090, -0.0653,  0.2871],
                          [-0.1248, -0.0433,  0.1839, -0.2815,  0.1197],
                          [-0.0989,  0.2145, -0.2426,  0.0165,  0.0438],
                          [-0.3598, -0.3252,  0.1715, -0.1302,  0.2656],
                          [-0.4418, -0.2211, -0.3684,  0.1786, -0.0130],
                          [-0.0834, -0.0744, -0.3496,  0.1268,  0.0111],
                          [-0.3086,  0.1683, -0.0090, -0.4325,  0.2406],
                          [ 0.2392, -0.0843, -0.3088,  0.0180,  0.3375]])
bias_ih = torch.tensor([ 0.4094, -0.3376, -0.2020,  0.3482,  0.2186,  0.2768, -0.2226,  0.3853,
                        -0.3676, -0.0215,  0.0093,  0.0751, -0.3375,  0.4103,  0.4395])
bias_hh = torch.tensor([-0.3088,  0.0165, -0.2382,  0.4288,  0.2494,  0.2634,  0.1443, -0.0445,
                         0.2518,  0.0076, -0.1631,  0.2309,  0.1403, -0.1159, -0.1226])

class GRUtest(nn.Module):  # the PyTorch GRU
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)
        return gru_out, gru_state

class GRULayer:  # hand-written GRU layer: holds the flattened parameters
    def __init__(self, input_size, hidden_size, act):
        self.bias_ih = bias_ih.reshape(-1)
        self.bias_hh = bias_hh.reshape(-1)
        self.weight_ih = weight_ih.reshape(-1)
        self.weight_hh = weight_hh.reshape(-1)
        self.nb_input = input_size
        self.nb_neurons = hidden_size
        self.activation = act

def compute_gru(gru, state, input):
    M = gru.nb_input
    N = gru.nb_neurons
    r = torch.zeros(N)
    z = torch.zeros(N)
    n = torch.zeros(N)
    h_new = torch.zeros(N)
    # reset gate: rows [0, N) of the stacked weights
    for i in range(N):
        sum = gru.bias_ih[0*N + i] + gru.bias_hh[0*N + i]
        for j in range(M):
            sum += input[j] * gru.weight_ih[0*M*N + i*M + j]
        for j in range(N):
            sum += state[j] * gru.weight_hh[0*N*N + i*N + j]
        r[i] = torch.sigmoid(sum)
    # update gate: rows [N, 2N)
    for i in range(N):
        sum = gru.bias_ih[1*N + i] + gru.bias_hh[1*N + i]
        for j in range(M):
            sum += input[j] * gru.weight_ih[1*M*N + i*M + j]
        for j in range(N):
            sum += state[j] * gru.weight_hh[1*N*N + i*N + j]
        z[i] = torch.sigmoid(sum)
    # candidate state: rows [2N, 3N); note bias_hh enters inside the r-gated term
    for i in range(N):
        sum = gru.bias_ih[2*N + i]
        tmp = 0
        for j in range(M):
            sum += input[j] * gru.weight_ih[2*M*N + i*M + j]
        for j in range(N):
            tmp += state[j] * gru.weight_hh[2*N*N + i*N + j]
        sum += r[i] * (tmp + gru.bias_hh[2*N + i])
        n[i] = torch.tanh(sum)
    # interpolate between the candidate and the previous state, update in place
    for i in range(N):
        h_new[i] = (1 - z[i]) * n[i] + z[i] * state[i]
        state[i] = h_new[i]

b = torch.randn((1, 5, 10))

if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    model_ckpt1 = torch.load('./nn_test.pkl')   # adjust the path as needed
    net1.load_state_dict(model_ckpt1.state_dict())
    gru = GRULayer(insize, hsize, 'tanh')  # hand-written GRU layer holding the parameters
    out = torch.zeros((5, 5))   # stores the results
    state = torch.zeros(5)      # hidden-state variable, initialized to zero
    for i in range(5):
        input = b[0][i]
        compute_gru(gru, state, input)
        out[i] = state
    print("Hand-written forward pass:")
    print(out)
    print("PyTorch forward pass:")
    torch_out, _ = net1(b)
    print(torch_out)
```
The results:

```
Hand-written forward pass:
tensor([[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
        [-0.2521, -0.4217,  0.1996,  0.4948,  0.2553],
        [-0.1471,  0.2741,  0.0375, -0.1926, -0.1080],
        [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
        [-0.6323,  0.1059,  0.0936,  0.1193, -0.2436]])
PyTorch forward pass:
tensor([[[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
         [-0.2522, -0.4217,  0.1996,  0.4948,  0.2553],
         [-0.1471,  0.2741,  0.0375, -0.1926, -0.1079],
         [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
         [-0.6323,  0.1059,  0.0937,  0.1193, -0.2436]]],
       grad_fn=<TransposeBackward1>)
```
As can be seen, the results match almost exactly; the occasional difference in the last printed digit is floating-point rounding, since the scalar loops accumulate the sums in a different order than PyTorch's vectorized kernels.
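Rather than eyeballing the tensors, the agreement could also be checked programmatically with `torch.allclose` under a small tolerance (a sketch; `out` and `torch_out` are the variables from the script above):

```python
# differences of ~1e-4 are expected from summing in a different order
print(torch.allclose(out, torch_out[0], atol=1e-3))  # True
```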
4. Extending to C

The C version follows directly from the Python code above, since compute_gru already uses only scalar loops and flat arrays. My C code is still being cleaned up and will be added in a future update.
5. Additional Notes

The model nn_test.pkl is available on Baidu Netdisk: https://pan.baidu.com/s/1wu-i_1X1YuDJygcxPsKi2w (extraction code: razn)