《动手学》Pytorch第二次打卡

最新推荐文章于 2023-09-12 21:41:25 发布

格拉迪沃

最新推荐文章于 2023-09-12 21:41:25 发布

阅读量812

点赞数

分类专栏： pytorch 文章标签：人工智能自然语言处理

本文链接：https://blog.csdn.net/qq_32796253/article/details/104377664

版权

概述

由于本人主要是做NLP的，学习这门课的主要目目的是学习torch，因此主要在torch实现功能上做研读，关于理论就不再详细做笔记了。如果笔记成为讲义，不如之间看jupyter。

LSTM与GRU

1.1 GRU

RNN存在的问题：梯度较容易出现衰减或爆炸（BPTT）
⻔控循环神经⽹络：捕捉时间序列中时间步距离较⼤的依赖关系
RNN:

Image Name

$H_{t} = ϕ(X_{t}W_{xh} + H_{t-1}W_{hh} + b_{h})$
GRU:

Image Name

$R_{t} = σ(X_tW_{xr} + H_{t−1}W_{hr} + b_r)\\ Z_{t} = σ(X_tW_{xz} + H_{t−1}W_{hz} + b_z)\\ \widetilde{H}_t = tanh(X_tW_{xh} + (R_t ⊙H_{t−1})W_{hh} + b_h)\\ H_t = Z_t⊙H_{t−1} + (1−Z_t)⊙\widetilde{H}_t$
• 重置⻔有助于捕捉时间序列⾥短期的依赖关系；
• 更新⻔有助于捕捉时间序列⾥⻓期的依赖关系。
参数初始化demo

num_inputs, num_hiddens, num_outputs = vocab_size, 256, vocab_size
print('will use', device)

def get_params():  
    def _one(shape):
        ts = torch.tensor(np.random.normal(0, 0.01, size=shape), device=device, dtype=torch.float32) #正态分布
        return torch.nn.Parameter(ts, requires_grad=True)
    def _three():
        return (_one((num_inputs, num_hiddens)),
                _one((num_hiddens, num_hiddens)),
                torch.nn.Parameter(torch.zeros(num_hiddens, device=device, dtype=torch.float32), requires_grad=True))
     
    W_xz, W_hz, b_z = _three()  # 更新门参数
    W_xr, W_hr, b_r = _three()  # 重置门参数
    W_xh, W_hh, b_h = _three()  # 候选隐藏状态参数
    
    # 输出层参数
    W_hq = _one((num_hiddens, num_outputs))
    b_q = torch.nn.Parameter(torch.zeros(num_outputs, device=device, dtype=torch.float32), requires_grad=True)
    return nn.ParameterList([W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh