PyTorch GRU Forward Pass: A Runnable Python Implementation

I. Background

For a trained neural network model, only the forward-pass computation is needed at inference time; backpropagation is irrelevant. For hybrid models such as the RNNoise noise-suppression algorithm, putting the algorithm into production often means running the network on low-power devices, which usually calls for C. This post is a personal note: it describes how to implement a GRU forward pass in plain Python, as a stepping stone toward a C implementation (future work).

II. The GRU in PyTorch

The inner workings of the GRU are not covered here; below is the computation PyTorch performs for a GRU:
$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{t-1} + b_{hn})) \\
h_t &= (1 - z_t) * n_t + z_t * h_{t-1}
\end{aligned}
$$
In the equations above, $\sigma$ is the sigmoid function, $x_t$ is the input at the current time step, and $h_{t-1}$ is the hidden state from the previous time step. As the equations show, there are six weight matrices $W$ and six bias vectors $b$ in total. The PyTorch documentation describes the parameters as follows:

  • ~GRU.weight_ih_l[k] – the learnable input-hidden weights of the $k^{th}$ layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
  • ~GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the $k^{th}$ layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
  • ~GRU.bias_ih_l[k] – the learnable input-hidden bias of the $k^{th}$ layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
  • ~GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the $k^{th}$ layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)

Considering only a single layer, each GRU layer's parameters consist of four tensors: weight_ih, weight_hh, bias_ih, and bias_hh. Take weight_ih and bias_ih as an example: weight_ih is a (3*hidden_size, input_size) matrix formed by stacking the three (hidden_size, input_size) matrices W_ir, W_iz, W_in along dim 0. bias_ih is a (3*hidden_size, 1) column vector formed by stacking the three (hidden_size, 1) vectors b_ir, b_iz, b_in along dim 0. The input $x_t$ has shape (input_size, 1), so $W_{ir}x_t+b_{ir}$ ends up as a (hidden_size, 1) matrix. By the same reasoning, $W_{hr}h_{t-1}+b_{hr}$ is also a (hidden_size, 1) matrix, and the computations for $z_t$ and $n_t$ work in exactly the same way.
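
To make these shapes concrete, here is a minimal standalone sketch (the variable names are mine, not from the scripts below) that slices a freshly initialized GRU's parameters into the six weight matrices and six bias vectors, runs one step of the equations above, and checks the result against PyTorch:

    import torch
    from torch import nn

    input_size, hidden_size = 10, 5
    gru = nn.GRU(input_size, hidden_size, batch_first=True)

    # weight_ih_l0 stacks W_ir, W_iz, W_in along dim 0: shape (3*hidden_size, input_size)
    W_ir, W_iz, W_in = gru.weight_ih_l0.chunk(3, dim=0)
    W_hr, W_hz, W_hn = gru.weight_hh_l0.chunk(3, dim=0)
    b_ir, b_iz, b_in = gru.bias_ih_l0.chunk(3, dim=0)
    b_hr, b_hz, b_hn = gru.bias_hh_l0.chunk(3, dim=0)

    x_t = torch.randn(input_size)        # input at time t, shape (input_size,)
    h_prev = torch.zeros(hidden_size)    # previous hidden state h_{t-1}, initialized to zero

    # one GRU step, written exactly as the equations above
    r_t = torch.sigmoid(W_ir @ x_t + b_ir + W_hr @ h_prev + b_hr)
    z_t = torch.sigmoid(W_iz @ x_t + b_iz + W_hz @ h_prev + b_hz)
    n_t = torch.tanh(W_in @ x_t + b_in + r_t * (W_hn @ h_prev + b_hn))
    h_t = (1 - z_t) * n_t + z_t * h_prev

    # compare against PyTorch's own forward pass on a length-1 sequence
    ref, _ = gru(x_t.view(1, 1, -1))
    print(torch.allclose(h_t, ref.view(-1), atol=1e-6))   # expected: True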

III. Code Demonstration

  1. Extracting the network parameters

    The code below builds a single-layer GRU with input_size=10 and hidden_size=5 (batch_first=True), with randomly initialized parameters, and prints the network parameters:

    import torch
    from torch import nn
    import torch.nn.functional as F
    
    
    class GRUtest(nn.Module):
        def __init__(self, input, hidden, act):
            super().__init__()
            self.gru = nn.GRU(input, hidden, batch_first=True)
            if act == 'sigmoid':             # activation function; not actually used later in this demo
                self.act = nn.Sigmoid()
            elif act == 'tanh':
                self.act = nn.Tanh()
            elif act == 'relu':
                self.act = nn.ReLU()
    
        def forward(self, x):  
            self.gru.flatten_parameters()
            gru_out, gru_state = self.gru(x)   
            return gru_out, gru_state
        
    if __name__ == '__main__':
        insize = 10
        hsize = 5
        net1 = GRUtest(insize, hsize, 'tanh')
        for name, parameters in net1.named_parameters():
            print(name)
            print(parameters)
    

    The output is as follows:

    gru.weight_ih_l0
    Parameter containing:
    tensor([[-0.2723,  0.3715,  0.2461,  0.1564, -0.3429,  0.3451,  0.1402,  0.3094,
             -0.1759,  0.0948],
           ...
            [-0.2211, -0.3684,  0.1786, -0.0130, -0.0834, -0.0744, -0.3496,  0.1268,
              0.0111, -0.3086]], requires_grad=True)
    gru.weight_hh_l0
    Parameter containing:
    tensor([[ 0.1683, -0.0090, -0.4325,  0.2406,  0.2392],
            ...
            [ 0.1703,  0.3895,  0.1127, -0.1311,  0.1465],
            [-0.0391, -0.3496, -0.1727,  0.2034,  0.0147]], requires_grad=True)
    gru.bias_ih_l0
    Parameter containing:
    tensor([ 0.1650, -0.2618,  0.4228, -0.1866,  0.0954, -0.2185, -0.2157,  0.2003,
            -0.1248, -0.2836, -0.1828,  0.3261,  0.2692,  0.2722, -0.3817],
           requires_grad=True)
    gru.bias_hh_l0
    Parameter containing:
    tensor([ 0.2106,  0.1117, -0.3007,  0.0141,  0.0894, -0.2416, -0.1887,  0.3648,
            -0.0361, -0.0047, -0.2830, -0.2674,  0.4117,  0.1664, -0.0708],
           requires_grad=True)
    

    As expected, the output is exactly the four tensors described above, corresponding to weight_ih, weight_hh, bias_ih, and bias_hh.
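
    (Side note: the verification script in the next subsection loads a model file named nn_test.pkl. One way such a file could be produced from the script above is sketched below; this assumes the whole module is saved, since the loading code later calls .state_dict() on the loaded object.)

    # Hypothetical save step: persist the randomly initialized module so the
    # verification script can reload it with torch.load('./nn_test.pkl').
    torch.save(net1, './nn_test.pkl')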

  2. Python code for the forward pass

    To verify the computation, we first print and save the parameters of a randomly initialized GRU. We then compute the forward pass in two ways: by loading the model with PyTorch's torch.load and running it, and by writing our own forward function using the printed parameters, and compare the results. One thing to note: the GRU has no output gate. For a given GRU layer, when $x_t$ enters the network, the hidden state $h_{t-1}$ is updated to $h_t$ through the computation above; $h_t$ is that layer's output, and concatenating the $h$ from every time step gives the GRU's full output. The code is as follows:

    import torch
    from torch import nn
    import numpy as np
    import torch.nn.functional as F
    
    
    weight_ih = torch.tensor([[ 0.3162,  0.0833,  0.1223,  0.4317, -0.2017,  0.1417, -0.1990,  0.3196,
              0.3572, -0.4123],
            [ 0.3818,  0.2136,  0.1949,  0.1841,  0.3718, -0.0590, -0.3782, -0.1283,
             -0.3150,  0.0296],
            [-0.0835, -0.2399, -0.0407,  0.4237, -0.0353,  0.0142, -0.0697,  0.0703,
              0.3985,  0.2735],
            [ 0.1587,  0.0972,  0.1054,  0.1728, -0.0578, -0.4156, -0.2766,  0.3817,
              0.0267, -0.3623],
            [ 0.0705,  0.3695, -0.4226, -0.3011, -0.1781,  0.0180, -0.1043, -0.0491,
             -0.4360,  0.2094],
            [ 0.3925,  0.2734, -0.3167, -0.3605,  0.1857,  0.0100,  0.1833, -0.4370,
             -0.0267,  0.3154],
            [ 0.2075,  0.0163,  0.0879, -0.0423, -0.2459, -0.1690, -0.2723,  0.3715,
              0.2461,  0.1564],
            [-0.3429,  0.3451,  0.1402,  0.3094, -0.1759,  0.0948,  0.4367,  0.3008,
              0.3587, -0.0939],
            [ 0.3407, -0.3503,  0.0387, -0.2518, -0.1043, -0.1145,  0.0335,  0.4070,
              0.2214, -0.0019],
            [ 0.3175, -0.2292,  0.2305, -0.0415, -0.0778,  0.0524, -0.3426,  0.0517,
              0.1504,  0.3823],
            [-0.1392,  0.1610,  0.4470, -0.1918,  0.4251, -0.2220,  0.1971,  0.1752,
              0.1249,  0.3537],
            [-0.1807,  0.1175,  0.0025, -0.3364, -0.1086, -0.2987,  0.1977,  0.0402,
              0.0438, -0.1357],
            [ 0.0022, -0.1391,  0.1285,  0.4343,  0.0677, -0.1981, -0.2732,  0.0342,
             -0.3318, -0.3361],
            [-0.2911, -0.1519,  0.0331,  0.3080,  0.1732,  0.3426, -0.2808,  0.0377,
             -0.3975,  0.2565],
            [ 0.0932,  0.4326, -0.3181,  0.3586,  0.3775,  0.3616,  0.0638,  0.4066,
              0.2987,  0.3337]])
    weight_hh = torch.tensor([[-0.0291, -0.3432, -0.0056,  0.0839, -0.3046],
            [-0.2565, -0.4288, -0.1568,  0.3896,  0.0765],
            [-0.0273,  0.0180,  0.2789, -0.3949, -0.3451],
            [-0.1487, -0.2574,  0.2307,  0.3160, -0.4339],
            [-0.3795, -0.4355,  0.1687,  0.3599, -0.3467],
            [-0.2070,  0.1423, -0.2920,  0.3799,  0.1043],
            [-0.1245,  0.0290,  0.1394, -0.1581, -0.3465],
            [ 0.0030,  0.0081,  0.0090, -0.0653,  0.2871],
            [-0.1248, -0.0433,  0.1839, -0.2815,  0.1197],
            [-0.0989,  0.2145, -0.2426,  0.0165,  0.0438],
            [-0.3598, -0.3252,  0.1715, -0.1302,  0.2656],
            [-0.4418, -0.2211, -0.3684,  0.1786, -0.0130],
            [-0.0834, -0.0744, -0.3496,  0.1268,  0.0111],
            [-0.3086,  0.1683, -0.0090, -0.4325,  0.2406],
            [ 0.2392, -0.0843, -0.3088,  0.0180,  0.3375]])
    bias_ih = torch.tensor([ 0.4094, -0.3376, -0.2020,  0.3482,  0.2186,  0.2768, -0.2226,  0.3853,
            -0.3676, -0.0215,  0.0093,  0.0751, -0.3375,  0.4103,  0.4395])
    bias_hh = torch.tensor([-0.3088,  0.0165, -0.2382,  0.4288,  0.2494,  0.2634,  0.1443, -0.0445,
             0.2518,  0.0076, -0.1631,  0.2309,  0.1403, -0.1159, -0.1226])
    
    class GRUtest(nn.Module):     # the PyTorch GRU wrapper
        def __init__(self, input, hidden, act):
            super().__init__()
            self.gru = nn.GRU(input, hidden, batch_first=True)
            if act == 'sigmoid':
                self.act = nn.Sigmoid()
            elif act == 'tanh':
                self.act = nn.Tanh()
            elif act == 'relu':
                self.act = nn.ReLU()
    
        def forward(self, x):  
            self.gru.flatten_parameters()
            gru_out, gru_state = self.gru(x)   
            return gru_out, gru_state
    
    class GRULayer:               # our own GRU "layer": just holds the flattened parameters copied above
        def __init__(self, input_size, hidden_size, act):
            self.bias_ih = bias_ih.reshape(-1)
            self.bias_hh = bias_hh.reshape(-1)
            self.weight_ih = weight_ih.reshape(-1)
            self.weight_hh = weight_hh.reshape(-1)
            self.nb_input = input_size
            self.nb_neurons = hidden_size
            self.activation = act
    
    
    def compute_gru(gru, state, input):
        # One GRU time step: reads `input` (x_t) and `state` (h_{t-1}),
        # and updates `state` in place to the new hidden state h_t.
        M = gru.nb_input
        N = gru.nb_neurons
        r = torch.zeros(N)
        z = torch.zeros(N)
        n = torch.zeros(N)
        h_new = torch.zeros(N)

        # reset gate: r[i] = sigmoid(b_ir[i] + b_hr[i] + W_ir[i,:]·input + W_hr[i,:]·state)
        for i in range(N):
            sum = gru.bias_ih[0*N + i] + gru.bias_hh[0*N + i]
            for j in range(M):
                sum += input[j] * gru.weight_ih[0*M*N + i*M + j]
            for j in range(N):
                sum += state[j] * gru.weight_hh[0*N*N + i*N + j]
            r[i] = torch.sigmoid(sum)

        # update gate: z[i] = sigmoid(b_iz[i] + b_hz[i] + W_iz[i,:]·input + W_hz[i,:]·state)
        for i in range(N):
            sum = gru.bias_ih[1*N + i] + gru.bias_hh[1*N + i]
            for j in range(M):
                sum += input[j] * gru.weight_ih[1*M*N + i*M + j]
            for j in range(N):
                sum += state[j] * gru.weight_hh[1*N*N + i*N + j]
            z[i] = torch.sigmoid(sum)

        # candidate state: n[i] = tanh(b_in[i] + W_in[i,:]·input + r[i]*(W_hn[i,:]·state + b_hn[i]))
        for i in range(N):
            sum = 0
            sum += gru.bias_ih[2*N + i]
            tmp = 0
            for j in range(M):
                sum += input[j] * gru.weight_ih[2*M*N + i*M + j]
            for j in range(N):
                tmp += state[j] * gru.weight_hh[2*N*N + i*N + j]
            sum += r[i] * (tmp + gru.bias_hh[2*N + i])
            n[i] = torch.tanh(sum)

        # new hidden state: h[i] = (1 - z[i]) * n[i] + z[i] * state[i]
        for i in range(N):
            h_new[i] = (1 - z[i]) * n[i] + z[i] * state[i]
            state[i] = h_new[i]
    
    b = torch.randn((1, 5, 10))   # random test input: (batch=1, seq_len=5, input_size=10)
       
    
    if __name__ == '__main__':
        insize = 10
        hsize = 5
        net1 = GRUtest(insize, hsize, 'tanh')
        model_ckpt1 = torch.load('./nn_test.pkl')    # adjust the path as needed
        net1.load_state_dict(model_ckpt1.state_dict())
        gru = GRULayer(insize, hsize, 'tanh')      # our own GRU layer, holding the copied parameters
        out = torch.zeros((5, 5))    # buffer for the results
        state = torch.zeros(5)       # hidden state, initialized to zero
        for i in range(5):
            input = b[0][i]
            compute_gru(gru, state, input)
            out[i] = state
        print("自己实现前向计算结果:")
        print(out)
        print("pytorch实现前向计算结果:")
        torch_out, _ = net1(b)
        print(torch_out)
    

    The results are:

    Forward pass from our own implementation:
    tensor([[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
            [-0.2521, -0.4217,  0.1996,  0.4948,  0.2553],
            [-0.1471,  0.2741,  0.0375, -0.1926, -0.1080],
            [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
            [-0.6323,  0.1059,  0.0936,  0.1193, -0.2436]])
    Forward pass from PyTorch:
    tensor([[[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
             [-0.2522, -0.4217,  0.1996,  0.4948,  0.2553],
             [-0.1471,  0.2741,  0.0375, -0.1926, -0.1079],
             [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
             [-0.6323,  0.1059,  0.0937,  0.1193, -0.2436]]],
           grad_fn=<TransposeBackward1>)
    

    As you can see, the two results agree to within floating-point rounding: only the last printed digit differs in a couple of entries.
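
    To check the agreement programmatically rather than by eye, the two tensors can also be compared directly; a small follow-up sketch (out and torch_out are the variables from the script above):

    # torch_out has shape (1, 5, 5); drop the batch dimension before comparing
    print((torch_out.squeeze(0) - out).abs().max())               # maximum absolute difference
    print(torch.allclose(torch_out.squeeze(0), out, atol=1e-3))   # expected: True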

IV. Extending to C

A C implementation follows fairly directly from the Python code above. I am still cleaning up my C code; it will be added in a later update.

V. Additional Notes

The model file nn_test.pkl is available on Baidu Netdisk: https://pan.baidu.com/s/1wu-i_1X1YuDJygcxPsKi2w (extraction code: razn).
