Gated Recurrent Units (GRU)

  • update gate
    u_t = \sigma(W_{ux} x_t + W_{uh} h_{t-1})

    The update gate u_t controls how much of the past state should matter now. If u_t is close to 1, the unit can copy its state through many time steps, which reduces the vanishing-gradient problem.
  • reset gate
    r_t = \sigma(W_{rx} x_t + W_{rh} h_{t-1})

    If the reset gate is close to 0, the previous hidden state is ignored, which allows the model to drop information that is irrelevant in the future.
  • new memory content
    \tilde{h}_t = \tanh(W_{\tilde{h}x} x_t + r_t \odot W_{\tilde{h}h} h_{t-1})
  • final memory at time step t combines the previous hidden state with the new memory content (a minimal code sketch follows this list):
    h_t = u_t \odot h_{t-1} + (1 - u_t) \odot \tilde{h}_t
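
The four equations map directly onto a few lines of array code. Below is a minimal NumPy sketch of a single GRU step, assuming bias-free weight matrices named after the subscripts above (W_ux, W_uh, W_rx, W_rh, W_hx, W_hh are illustrative names, not from any library); production implementations such as torch.nn.GRU add bias terms and fuse the matrix products for speed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step implementing the four equations above.
    Weight names mirror the subscripts in the list; bias terms
    are omitted, matching the equations as written."""
    u_t = sigmoid(p["W_ux"] @ x_t + p["W_uh"] @ h_prev)              # update gate
    r_t = sigmoid(p["W_rx"] @ x_t + p["W_rh"] @ h_prev)              # reset gate
    h_tilde = np.tanh(p["W_hx"] @ x_t + r_t * (p["W_hh"] @ h_prev))  # new memory content
    return u_t * h_prev + (1.0 - u_t) * h_tilde                      # final memory h_t

# Toy usage: input dimension d=4, hidden size n=3, small random weights.
rng = np.random.default_rng(0)
d, n = 4, 3
p = {name: rng.normal(scale=0.1, size=(n, d if name.endswith("x") else n))
     for name in ["W_ux", "W_uh", "W_rx", "W_rh", "W_hx", "W_hh"]}
h = np.zeros(n)
for x_t in rng.normal(size=(6, d)):  # unroll over a short input sequence
    h = gru_step(x_t, h, p)
print(h)  # final hidden state
```

Note that all the gating arithmetic is elementwise: when a component of u_t is near 1, the last line reduces to h_t ≈ h_{t-1} for that unit, which is exactly the copy-through behavior the update-gate bullet describes.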