Vanilla Recurrent Neural Network (RNN)
Input: $x_t$
Recurrent layer: $h_t = f(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$
Output: $y_t = g(W_o h_t + b_o)$
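To make the two equations concrete, here is a minimal NumPy sketch of one forward step. The function name `rnn_step`, the dimensions, and the choices $f = \tanh$ and $g = $ identity are illustrative assumptions; the text above leaves both activations abstract.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_o, b_o):
    """One vanilla-RNN step, assuming f = tanh and g = identity."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # recurrent layer
    y_t = W_o @ h_t + b_o                            # output layer
    return h_t, y_t

# Usage with illustrative sizes: input dim 4, hidden dim 3, output dim 2.
rng = np.random.default_rng(0)
D, H, K = 4, 3, 2
W_xh, W_hh, b_h = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)
W_o, b_o = rng.normal(size=(K, H)), np.zeros(K)
h = np.zeros(H)
for x in rng.normal(size=(5, D)):  # unroll over a length-5 sequence
    h, y = rnn_step(x, h, W_xh, W_hh, b_h, W_o, b_o)
```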
Long Short-Term Memory (LSTM)
Three gating signals:
Input gate: $i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$
Forget gate: $f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$
Output gate: $o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$
Cell state:
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$
Hidden state:
$h_t = o_t \odot \tanh(c_t)$
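A minimal NumPy sketch of one LSTM step that mirrors the five equations above. The function name `lstm_step` and the weight dictionary `p` are illustrative assumptions; $\odot$ becomes NumPy's elementwise `*`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds weights/biases named as in the equations."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])  # input gate
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])  # forget gate
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])  # output gate
    # Cell state: forget part of the old memory, write a gated candidate.
    c_t = f_t * c_prev + i_t * np.tanh(
        p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    h_t = o_t * np.tanh(c_t)  # hidden state passed to the next step
    return h_t, c_t
```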
Gated Recurrent Unit (GRU)
Two gating signals:
Reset gate: $r_t = \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r)$
Update gate: $z_t = \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z)$
Intermediate state $n_t$:
$n_t = \tanh(W_{xn} x_t + b_{xn} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))$
Hidden state:
$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$
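The same style of NumPy sketch for one GRU step (`gru_step` and the dictionary `p` are again illustrative). Note that the reset gate scales only the recurrent term inside the tanh, matching the two-bias form of the equation above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds weights/biases named as in the equations."""
    r_t = sigmoid(p["W_xr"] @ x_t + p["W_hr"] @ h_prev + p["b_r"])  # reset gate
    z_t = sigmoid(p["W_xz"] @ x_t + p["W_hz"] @ h_prev + p["b_z"])  # update gate
    # Intermediate state: the reset gate damps the recurrent contribution.
    n_t = np.tanh(p["W_xn"] @ x_t + p["b_xn"]
                  + r_t * (p["W_hn"] @ h_prev + p["b_hn"]))
    h_t = (1.0 - z_t) * n_t + z_t * h_prev  # blend candidate with old state
    return h_t
```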
Compared with the LSTM, the GRU has one fewer gate and therefore fewer parameters, yet its performance is close to that of the LSTM.