LSTM Formulas and Principles + Comparison of LSTM Parameters in Keras and PyTorch + Parameter Count of Each LSTM Layer

universe_1207

Last modified 2022-06-05 18:44:43


Categories: 代码小能手 (Code Whiz), 机器学习 (Machine Learning). Tags: keras, pytorch, lstm

First published 2022-06-03 16:09:39

Article link: https://blog.csdn.net/universe_1207/article/details/123993367


Keras

Input shape: (samples, timesteps, input_dim)

concat = torch.cat((x1,x3),1)#(none,50,2,244)
input_dim = 2*244
timesteps = 50
# each sentence has 50 words, and each word is represented by a 488-dim vector
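
The shape bookkeeping above can be sketched in plain Python. `to_lstm_input_shape` is a hypothetical helper introduced here for illustration only (it is not a Keras or PyTorch function); it assumes every axis after `timesteps` is a feature axis that gets flattened into `input_dim`.

```python
from math import prod


def to_lstm_input_shape(shape):
    """Collapse all trailing feature axes into a single input_dim axis.

    Hypothetical helper: (samples, timesteps, *features) -> (samples, timesteps, input_dim).
    """
    samples, timesteps, *features = shape
    return (samples, timesteps, prod(features))


# (None, 50, 2, 244): 50 timesteps, each carrying a 2 x 244 feature block
lstm_shape = to_lstm_input_shape((None, 50, 2, 244))
samples, timesteps, input_dim = lstm_shape
print(timesteps, input_dim)
```

With the shapes from the text this gives `timesteps = 50` and `input_dim = 2 * 244 = 488`.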


Reference:

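LSTM uses a gating mechanism: three gates (input gate, forget gate, output gate) and two states (the cell state, which carries long-term memory, and the hidden state, which carries short-term memory). The cell state is its core.
Forget gate: $f_{t}=\sigma\left(W_{f} \cdot\left[h_{t-1}, x_{t}\right]+b_{f}\right)$
Input gate: $i_{t}=\sigma\left(W_{i} \cdot\left[h_{t-1}, x_{t}\right]+b_{i}\right)$, candidate state $\tilde{C}_{t}=\tanh \left(W_{C} \cdot\left[h_{t-1}, x_{t}\right]+b_{C}\right)$, and the cell state at time t (long-term): $C_{t}=f_{t} \cdot C_{t-1}+i_{t} \cdot \tilde{C}_{t}$
Output gate: $o_{t}=\sigma\left(W_{o} \cdot\left[h_{t-1}, x_{t}\right]+b_{o}\right)$, hidden state (short-term): $h_{t}=o_{t} \cdot \tanh \left(C_{t}\right)$
Here $f_t,i_t,C_t\cdots$ each have dimension hidden_size (the number of cells) $\times 1$, and the $\cdot$ between two such vectors denotes element-wise multiplication.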
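
To make the gate equations concrete, here is a minimal pure-Python sketch of a single LSTM time step in the concatenated form above. The helper names (`sigmoid`, `affine`, `lstm_step`, `init_params`), the small sizes, and the random initialization are all assumptions made for illustration; this is not how Keras or PyTorch implement the layer internally.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step; every gate sees the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t  # list concatenation, length hidden_size + input_dim
    f = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]          # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]          # input gate
    c_tilde = [math.tanh(a) for a in affine(params["W_C"], z, params["b_C"])]  # candidate state
    o = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]          # output gate
    # element-wise: C_t = f_t * C_{t-1} + i_t * C~_t
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, c_tilde)]
    # element-wise: h_t = o_t * tanh(C_t)
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t


def init_params(input_dim, hidden_size, seed=0):
    """Small random weights of shape hidden_size x (hidden_size + input_dim), zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_dim)]
            for _ in range(hidden_size)
        ]
        params[f"b_{gate}"] = [0.0] * hidden_size
    return params


hidden_size, input_dim = 3, 4
params = init_params(input_dim, hidden_size)
h, c = [0.0] * hidden_size, [0.0] * hidden_size
for x_t in ([1.0, 0.0, -1.0, 0.5], [0.2, 0.3, 0.1, -0.4]):  # a 2-step toy sequence
    h, c = lstm_step(x_t, h, c, params)
print(len(h), len(c))
```

Both `h` and `c` keep length `hidden_size` at every step, which matches the dimension remark above.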
https://www.wangt.cc/2021/12/lstm%E6%80%BB%E7%BB%93%E7%AC%94%E8%AE%B0/
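Note that the formulas above all concatenate $h_{t-1}$ and $x_t$, whereas in the unconcatenated form each of the two is multiplied by its own weight matrix.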
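
The unconcatenated form can be sketched as follows. The split weight names $W_{hf}, W_{xf}$ (and likewise for the other gates) are notation assumed here for illustration, not taken from the original. With the block matrix $W_f=\left[W_{hf}, W_{xf}\right]$, both forms compute the same value:

$$
\begin{aligned}
f_{t} &= \sigma\left(W_{hf} h_{t-1} + W_{xf} x_{t} + b_{f}\right) \\
i_{t} &= \sigma\left(W_{hi} h_{t-1} + W_{xi} x_{t} + b_{i}\right) \\
\tilde{C}_{t} &= \tanh\left(W_{hC} h_{t-1} + W_{xC} x_{t} + b_{C}\right) \\
o_{t} &= \sigma\left(W_{ho} h_{t-1} + W_{xo} x_{t} + b_{o}\right)
\end{aligned}
$$

Here each $W_{h\ast}$ has shape hidden_size $\times$ hidden_size and each $W_{x\ast}$ has shape hidden_size $\times$ input_dim.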
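Why bring this up? Because different deep learning frameworks do not learn the same set of parameters, as the later section on parameter counting explains.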
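
As a rough preview of that difference, the sketch below counts the trainable parameters of one LSTM layer under each framework's default layout. It assumes a single unidirectional layer with biases enabled and no projection: Keras keeps one bias vector per gate, while PyTorch keeps two (`bias_ih` and `bias_hh`). The function names and the choice of 64 units are hypothetical, used only for this example.

```python
def keras_lstm_params(input_dim, units):
    """kernel (input_dim x 4*units) + recurrent kernel (units x 4*units) + one bias (4*units)."""
    return 4 * (input_dim * units + units * units + units)


def pytorch_lstm_params(input_size, hidden_size):
    """weight_ih (4h x d) + weight_hh (4h x h) + bias_ih (4h) + bias_hh (4h)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


input_dim, units = 488, 64  # 488 comes from the shape example; 64 is an arbitrary choice
keras_count = keras_lstm_params(input_dim, units)
pytorch_count = pytorch_lstm_params(input_dim, units)
print(keras_count, pytorch_count, pytorch_count - keras_count)
```

Under these assumptions the two counts differ by exactly `4 * units`, which is the extra bias vector PyTorch stores.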