Lecture 11: Review GRU & LSTM
The original video also touches on some other MT topics, which I omit here.
GRU
idea: Perhaps we could use shortcut connections to keep the model from suffering vanishing gradients -> adaptive shortcut connections ($u_t$).
$$
\begin{aligned}
f(h_{t-1}, x_t) &= u_t \odot \hat h_t + (1 - u_t) \odot h_{t-1} \\
\hat h_t &= \tanh(W[x_t] + U h_{t-1} + b) \\
u_t &= \sigma(W_u[x_t] + U_u h_{t-1} + b_u)
\end{aligned}
$$
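As a concrete illustration, here is a minimal NumPy sketch of one step of this update-gate cell. It is not from the lecture: the dimensions, the random initialization, and the `step` helper are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(x_t, h_prev, W, U, b, W_u, U_u, b_u):
    """One recurrent step: gate between the candidate and the old state."""
    h_hat = np.tanh(W @ x_t + U @ h_prev + b)      # candidate state
    u_t = sigmoid(W_u @ x_t + U_u @ h_prev + b_u)  # update gate in (0, 1)
    return u_t * h_hat + (1.0 - u_t) * h_prev      # adaptive shortcut

# Illustrative sizes: 4-dim input, 3-dim hidden state.
d_x, d_h = 4, 3
rng = np.random.default_rng(0)
params = [rng.normal(size=s) * 0.1 for s in
          [(d_h, d_x), (d_h, d_h), (d_h,), (d_h, d_x), (d_h, d_h), (d_h,)]]
h = step(rng.normal(size=d_x), np.zeros(d_h), *params)
```

When $u_t$ is near 0 the cell simply copies $h_{t-1}$ forward, so gradients can flow through that path without shrinking.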
idea: Prune unnecessary connections adaptively with a reset gate ($r_t$).
$$
\begin{aligned}
\hat h_t &= \tanh(W[x_t] + U(r_t \odot h_{t-1}) + b) \\
r_t &= \sigma(W_r[x_t] + U_r h_{t-1} + b_r) \\
u_t &= \sigma(W_u[x_t] + U_u h_{t-1} + b_u)
\end{aligned}
$$
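These three equations together form the full GRU, which standard libraries already implement. As a hedged usage sketch (the sizes here are illustrative, and PyTorch's parameterization differs in minor details such as where the reset gate multiplies the bias), `nn.GRUCell` runs one such step:

```python
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=4, hidden_size=3)  # update + reset gated cell
x = torch.randn(2, 4)   # a batch of 2 input vectors
h = torch.zeros(2, 3)   # initial hidden state
h = cell(x, h)          # one recurrent step, shape (2, 3)
```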
Some tricks for training RNNs
- Use LSTM or GRU cells
- Initialize recurrent matrices to be orthogonal
- Initialize other matrices with a sensible (small) scale
- Initialize the forget gate bias to 1: default to remembering
- Use adaptive optimizers: Adam, Adadelta
- Clip the gradient norm
- Apply dropout vertically (between stacked layers, not along the time dimension)
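A hedged PyTorch sketch applying several of these tricks to an LSTM; the layer sizes, learning rate, and clipping threshold are illustrative assumptions, and the forget-gate slice relies on PyTorch's `(i, f, g, o)` gate ordering in the bias vectors.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2,
               dropout=0.3)  # dropout applied vertically, between layers

for name, p in lstm.named_parameters():
    if "weight_hh" in name:        # recurrent matrices: orthogonal init
        nn.init.orthogonal_(p)
    elif "weight_ih" in name:      # other matrices: a sensible scale
        nn.init.xavier_uniform_(p)
    elif "bias" in name:           # bias layout is [b_i | b_f | b_g | b_o]
        nn.init.zeros_(p)
        hidden = lstm.hidden_size
        p.data[hidden:2 * hidden].fill_(1.0)  # forget gate bias = 1

optimizer = torch.optim.Adam(lstm.parameters(), lr=1e-3)

# Inside the training loop, clip the gradient norm before stepping:
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm=5.0)
#   optimizer.step()
```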
Ensembling: train several models independently and average their predictions.
MT evaluation
- Manual evaluation
- Testing in an application that uses MT as one sub-component
- Automatic metrics
  - WER (Word Error Rate)
  - BLEU (Bilingual Evaluation Understudy): modified n-gram precision against reference translations, with a brevity penalty for short outputs
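For concreteness, here is a pure-Python sketch of both automatic metrics. It is a simplification of the real definitions (single reference, unsmoothed BLEU), and the function names are mine.

```python
import math
from collections import Counter

def wer(ref, hyp):
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

def bleu(ref, hyp, max_n=4):
    """Simplified sentence BLEU: geometric mean of modified n-gram
    precisions times a brevity penalty (single reference, no smoothing)."""
    r, h = ref.split(), hyp.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h_ngrams = Counter(tuple(h[i:i + n]) for i in range(len(h) - n + 1))
        r_ngrams = Counter(tuple(r[i:i + n]) for i in range(len(r) - n + 1))
        overlap = sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
        if overlap == 0:
            return 0.0
        log_prec += math.log(overlap / sum(h_ngrams.values())) / max_n
    bp = min(1.0, math.exp(1 - len(r) / max(len(h), 1)))  # brevity penalty
    return bp * math.exp(log_prec)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # ~0.167
print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```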