Deriving the Backpropagation Formulas for RNN and LSTM

This post derives the backpropagation formulas for RNNs and LSTMs, to better understand the computations these networks perform.



I. RNN

1. RNN forward propagation

$$\widetilde h_{t}=Wx_{t}+Uh_{t-1}+b$$
$$h_{t}=\tanh(\widetilde h_{t})$$
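
The two forward equations can be sketched in plain Python as follows. This is only a minimal illustration: the helper names `matvec` and `rnn_forward`, the toy sizes, and all parameter values are assumptions made for the example, not part of the original derivation.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def rnn_forward(xs, h0, W, U, b):
    """Apply h~_t = W x_t + U h_{t-1} + b and h_t = tanh(h~_t) over a sequence."""
    h = h0
    hs = []
    for x in xs:
        Wx = matvec(W, x)
        Uh = matvec(U, h)
        pre = [wx + uh + bk for wx, uh, bk in zip(Wx, Uh, b)]  # h~_t
        h = [math.tanh(z) for z in pre]                        # h_t
        hs.append(h)
    return hs

# Hypothetical toy problem: 2 inputs, 2 hidden units, 3 time steps.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.0], [-0.4, 0.6]]
b = [0.0, 0.1]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hs = rnn_forward(xs, [0.0, 0.0], W, U, b)
print(hs[-1])
```

Each entry of `hs` is the hidden state $h_{t}$ for one time step; the first one depends only on $Wx_{1}+b$ because $h_{0}$ is zero here.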

2. RNN gradient computation

Let $\tanh'(x)$ denote the derivative of $\tanh(x)$. The gradients of the RNN's trainable parameters are then:

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}W}=\tanh'(\widetilde h_{t})\left[x_{t}x^{T}_{t}+U^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W}\right]$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}U}=\tanh'(\widetilde h_{t})\left[h_{t-1}h^{T}_{t-1}+U^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U}\right]$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}b}=\tanh'(\widetilde h_{t})\left[1+U^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b}\right]$$

where the initial values are:
$$\frac{\mathrm{d}h_{1}}{\mathrm{d}W}=\tanh'(\widetilde h_{1})\left[x_{1}x^{T}_{1}\right]$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}U}=\tanh'(\widetilde h_{1})\left[h_{0}h^{T}_{0}\right]$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}b}=\tanh'(\widetilde h_{1})$$
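
A quick way to sanity-check a recursion like this is to compare it with a finite-difference estimate. The sketch below does so only for the bias recursion and only in the scalar case (every quantity is a single number, so no matrix-shape convention is involved). The function names and the numeric values are assumptions chosen for illustration.

```python
import math

def rnn_h(xs, h0, W, U, b):
    """Final hidden state of a scalar RNN."""
    h = h0
    for x in xs:
        h = math.tanh(W * x + U * h + b)
    return h

def rnn_dh_db(xs, h0, W, U, b):
    """Recursion dh_t/db = tanh'(h~_t) * (1 + U * dh_{t-1}/db), starting from dh_0/db = 0."""
    h, dh = h0, 0.0
    for x in xs:
        h = math.tanh(W * x + U * h + b)
        dh = (1.0 - h * h) * (1.0 + U * dh)  # tanh'(z) = 1 - tanh(z)^2
    return dh

# Hypothetical toy values.
xs = [0.5, -1.0, 0.3, 0.8]
W, U, b, h0 = 0.7, -0.4, 0.1, 0.0
eps = 1e-6
analytic = rnn_dh_db(xs, h0, W, U, b)
numeric = (rnn_h(xs, h0, W, U, b + eps) - rnn_h(xs, h0, W, U, b - eps)) / (2 * eps)
print(analytic, numeric)
```

The two printed numbers should agree to several decimal places; a large gap would point to a mistake in the recursion.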


II. LSTM

1. LSTM forward propagation

To keep the derivatives easy to write, the LSTM is expressed in the following form:
$$\widetilde i_{t}=W^{i}x_{t}+U^{i}h_{t-1}+b^{i}$$
$$\widetilde f_{t}=W^{f}x_{t}+U^{f}h_{t-1}+b^{f}$$
$$\widetilde o_{t}=W^{o}x_{t}+U^{o}h_{t-1}+b^{o}$$
$$\widetilde g_{t}=W^{c}x_{t}+U^{c}h_{t-1}+b^{c}$$
$$i_{t}=\sigma(\widetilde i_{t})$$
$$f_{t}=\sigma(\widetilde f_{t})$$
$$o_{t}=\sigma(\widetilde o_{t})$$
$$g_{t}=\tanh(\widetilde g_{t})$$
$$c_{t}=f_{t}\otimes c_{t-1}+i_{t}\otimes g_{t}$$
$$h_{t}=o_{t}\otimes \tanh(c_{t})$$
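
As a minimal sketch, one LSTM time step can be written in plain Python as below. It assumes the candidate $g_{t}$ uses $\tanh$, which is what the $\tanh'(\widetilde g_{t})$ terms in the gradient formulas imply. The names `affine`, `lstm_step`, and `params`, and the all-zero toy parameters, are hypothetical choices for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-rows matrices and list vectors."""
    return [
        sum(w * xj for w, xj in zip(w_row, x))
        + sum(u * hj for u, hj in zip(u_row, h))
        + bk
        for w_row, u_row, bk in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps 'i', 'f', 'o', 'c' to (W, U, b) triples."""
    pre = {k: affine(W, x, U, h_prev, b) for k, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]      # input gate
    f = [sigmoid(z) for z in pre["f"]]      # forget gate
    o = [sigmoid(z) for z in pre["o"]]      # output gate
    g = [math.tanh(z) for z in pre["c"]]    # candidate (assumed tanh)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy check with all-zero parameters: every gate is 0.5 and g is 0,
# so c_1 = 0.5 * c_0 and h_1 = 0.5 * tanh(c_1).
zeros_m = [[0.0, 0.0], [0.0, 0.0]]
zeros_v = [0.0, 0.0]
params = {k: (zeros_m, zeros_m, zeros_v) for k in "ifoc"}
h1, c1 = lstm_step([1.0, -1.0], [0.0, 0.0], [2.0, -2.0], params)
print(h1, c1)
```

The element-wise products written with $\otimes$ in the equations appear here as per-element multiplications inside the list comprehensions.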

2. LSTM backpropagation

Let $\tanh'(x)$ denote the derivative of $\tanh(x)$ and $\mathrm{sigm}'(x)$ the derivative of $\sigma(x)$. The gradients of the LSTM's trainable parameters are then:


Input gate:
$$\frac{\mathrm{d}h_{t}}{\mathrm{d}W^{i}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{i}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{i}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}U^{i}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{i}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{i}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}b^{i}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{i}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{i}}$$


Forget gate:
$$\frac{\mathrm{d}h_{t}}{\mathrm{d}W^{f}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{f}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{f}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}U^{f}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{f}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{f}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}b^{f}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{f}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{f}}$$


Candidate memory cell:
$$\frac{\mathrm{d}h_{t}}{\mathrm{d}W^{c}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{c}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{c}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}U^{c}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{c}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{c}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}b^{c}}=\left[\mathrm{sigm}'(\widetilde o_{t})(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{c}}\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{c}}$$


Output gate:
$$\frac{\mathrm{d}h_{t}}{\mathrm{d}W^{o}}=\left[\mathrm{sigm}'(\widetilde o_{t})\left(x_{t}x^{T}_{t}+(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{o}}\right)\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{o}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}U^{o}}=\left[\mathrm{sigm}'(\widetilde o_{t})\left(h_{t-1}h^{T}_{t-1}+(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{o}}\right)\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{o}}$$

$$\frac{\mathrm{d}h_{t}}{\mathrm{d}b^{o}}=\left[\mathrm{sigm}'(\widetilde o_{t})\left(1+(U^{o})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{o}}\right)\right]\tanh(c_{t})+o_{t}\tanh'(c_{t})\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{o}}$$


All of the formulas above contain $\mathrm{d}c_{t}$, so $\mathrm{d}c_{t}$ must be computed as well:


Input gate:
$$\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{i}}=\mathrm{sigm}'(\widetilde i_{t})\left[x_{t}x^{T}_{t}+(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{i}}\right]g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{i}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{i}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}W^{i}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{i}}=\mathrm{sigm}'(\widetilde i_{t})\left[h_{t-1}h^{T}_{t-1}+(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{i}}\right]g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{i}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{i}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}U^{i}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{i}}=\mathrm{sigm}'(\widetilde i_{t})\left[1+(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{i}}\right]g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{i}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{i}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}b^{i}}$$


Forget gate:
$$\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{f}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{f}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{f}}+\mathrm{sigm}'(\widetilde f_{t})\left[x_{t}x^{T}_{t}+(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{f}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}W^{f}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{f}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{f}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{f}}+\mathrm{sigm}'(\widetilde f_{t})\left[h_{t-1}h^{T}_{t-1}+(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{f}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}U^{f}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{f}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{f}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{f}}+\mathrm{sigm}'(\widetilde f_{t})\left[1+(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{f}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}b^{f}}$$


Output gate:
$$\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{o}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{o}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{o}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{o}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}W^{o}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{o}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{o}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{o}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{o}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}U^{o}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{o}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{o}}g_{t}+i_{t}\tanh'(\widetilde g_{t})(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{o}}+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{o}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}b^{o}}$$


Candidate memory cell:
$$\frac{\mathrm{d}c_{t}}{\mathrm{d}W^{c}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{c}}g_{t}+i_{t}\tanh'(\widetilde g_{t})\left[x_{t}x^{T}_{t}+(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{c}}\right]+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}W^{c}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}W^{c}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}U^{c}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{c}}g_{t}+i_{t}\tanh'(\widetilde g_{t})\left[h_{t-1}h^{T}_{t-1}+(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{c}}\right]+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}U^{c}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}U^{c}}$$

$$\frac{\mathrm{d}c_{t}}{\mathrm{d}b^{c}}=\mathrm{sigm}'(\widetilde i_{t})(U^{i})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{c}}g_{t}+i_{t}\tanh'(\widetilde g_{t})\left[1+(U^{c})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{c}}\right]+\mathrm{sigm}'(\widetilde f_{t})\left[(U^{f})^{T}\frac{\mathrm{d}h_{t-1}}{\mathrm{d}b^{c}}\right]c_{t-1}+f_{t}\frac{\mathrm{d}c_{t-1}}{\mathrm{d}b^{c}}$$
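
The coupled recursions for $\mathrm{d}h_{t}$ and $\mathrm{d}c_{t}$ can be checked numerically. The sketch below covers only the input-gate bias $b^{i}$ in the scalar case (all weights and states are single numbers) and assumes a $\tanh$ candidate, matching the $\tanh'(\widetilde g_{t})$ terms. The function names, the `p` parameter dictionary, and the toy values are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_scalar(xs, p, h0=0.0, c0=0.0):
    """Scalar LSTM forward pass that also carries dh_t/db^i and dc_t/db^i.

    p maps 'i', 'f', 'o', 'c' to (W, U, b) triples of floats.
    """
    h, c = h0, c0
    dh, dc = 0.0, 0.0  # dh_0/db^i = dc_0/db^i = 0
    for x in xs:
        zi, zf, zo, zg = (p[k][0] * x + p[k][1] * h + p[k][2] for k in "ifoc")
        i, f, o, g = sigmoid(zi), sigmoid(zf), sigmoid(zo), math.tanh(zg)
        c_new = f * c + i * g
        # dc_t/db^i: input-gate term, candidate term, forget-gate term, carry term.
        dc_new = (i * (1 - i) * (1.0 + p["i"][1] * dh) * g
                  + i * (1 - g * g) * p["c"][1] * dh
                  + f * (1 - f) * p["f"][1] * dh * c
                  + f * dc)
        th = math.tanh(c_new)
        # dh_t/db^i: output-gate term plus the path through c_t.
        dh = o * (1 - o) * p["o"][1] * dh * th + o * (1 - th * th) * dc_new
        h, c, dc = o * th, c_new, dc_new
    return h, c, dh, dc

def shift_bi(p, delta):
    """Copy of p with b^i shifted by delta (used for finite differences)."""
    q = dict(p)
    W, U, b = p["i"]
    q["i"] = (W, U, b + delta)
    return q

# Hypothetical toy values.
p = {"i": (0.5, -0.3, 0.1), "f": (0.4, 0.2, 0.3),
     "o": (-0.6, 0.5, 0.0), "c": (0.9, -0.7, -0.2)}
xs = [0.5, -1.0, 0.3, 0.8]
h, c, dh, dc = lstm_scalar(xs, p)
eps = 1e-6
hp, cp, _, _ = lstm_scalar(xs, shift_bi(p, eps))
hm, cm, _, _ = lstm_scalar(xs, shift_bi(p, -eps))
num_dh, num_dc = (hp - hm) / (2 * eps), (cp - cm) / (2 * eps)
print(dh, num_dh)
print(dc, num_dc)
```

Each printed pair should match closely. The same pattern could be repeated for the other biases by moving the constant `1.0` into the bracket of the gate whose bias is being differentiated.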


Finally, the initial values, obtained by setting $t=1$ in the recursions above with $\mathrm{d}h_{0}=0$ and $\mathrm{d}c_{0}=0$.
Input gate:

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{i}}=\mathrm{sigm}'(\widetilde i_{1})\left[x_{1}x^{T}_{1}\right]g_{1}$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{i}}=\mathrm{sigm}'(\widetilde i_{1})\left[h_{0}h^{T}_{0}\right]g_{1}$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{i}}=\mathrm{sigm}'(\widetilde i_{1})\,g_{1}$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}W^{i}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{i}}$$
$$\frac{\mathrm{d}h_{1}}{\mathrm{d}U^{i}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{i}}$$
$$\frac{\mathrm{d}h_{1}}{\mathrm{d}b^{i}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{i}}$$


Forget gate:

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{f}}=\mathrm{sigm}'(\widetilde f_{1})\left[x_{1}x^{T}_{1}\right]c_{0}$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{f}}=\mathrm{sigm}'(\widetilde f_{1})\left[h_{0}h^{T}_{0}\right]c_{0}$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{f}}=\mathrm{sigm}'(\widetilde f_{1})\,c_{0}$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}W^{f}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{f}}$$
$$\frac{\mathrm{d}h_{1}}{\mathrm{d}U^{f}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{f}}$$
$$\frac{\mathrm{d}h_{1}}{\mathrm{d}b^{f}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{f}}$$


Output gate:

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{o}}=0$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{o}}=0$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{o}}=0$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}W^{o}}=\mathrm{sigm}'(\widetilde o_{1})\left[x_{1}x^{T}_{1}\right]\tanh(c_{1})$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}U^{o}}=\mathrm{sigm}'(\widetilde o_{1})\left[h_{0}h^{T}_{0}\right]\tanh(c_{1})$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}b^{o}}=\mathrm{sigm}'(\widetilde o_{1})\tanh(c_{1})$$


Candidate memory cell:

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{c}}=i_{1}\tanh'(\widetilde g_{1})\left[x_{1}x^{T}_{1}\right]$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{c}}=i_{1}\tanh'(\widetilde g_{1})\left[h_{0}h^{T}_{0}\right]$$

$$\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{c}}=i_{1}\tanh'(\widetilde g_{1})$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}W^{c}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}W^{c}}$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}U^{c}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}U^{c}}$$

$$\frac{\mathrm{d}h_{1}}{\mathrm{d}b^{c}}=o_{1}\tanh'(c_{1})\frac{\mathrm{d}c_{1}}{\mathrm{d}b^{c}}$$
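
As a small worked check of the base case, the sketch below evaluates the candidate-cell bias formulas at $t=1$ for a scalar LSTM and compares them with finite differences. It assumes a $\tanh$ candidate and treats $h_{0}$ and $c_{0}$ as fixed constants. The helper names and all numeric values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def first_step(x1, h0, c0, p):
    """Scalar LSTM at t = 1; p maps 'i', 'f', 'o', 'c' to (W, U, b) triples."""
    zi, zf, zo, zg = (p[k][0] * x1 + p[k][1] * h0 + p[k][2] for k in "ifoc")
    i, f, o, g = sigmoid(zi), sigmoid(zf), sigmoid(zo), math.tanh(zg)
    c1 = f * c0 + i * g
    h1 = o * math.tanh(c1)
    return h1, c1, i, o, g

def shift_bc(p, delta):
    """Copy of p with b^c shifted by delta (used for finite differences)."""
    q = dict(p)
    W, U, b = p["c"]
    q["c"] = (W, U, b + delta)
    return q

# Hypothetical toy values; h0 and c0 are constants that do not depend on b^c.
p = {"i": (0.5, -0.3, 0.1), "f": (0.4, 0.2, 0.3),
     "o": (-0.6, 0.5, 0.0), "c": (0.9, -0.7, -0.2)}
x1, h0, c0 = 0.5, 0.2, -0.4

h1, c1, i1, o1, g1 = first_step(x1, h0, c0, p)
dc1 = i1 * (1.0 - g1 * g1)                      # dc_1/db^c = i_1 tanh'(g~_1)
dh1 = o1 * (1.0 - math.tanh(c1) ** 2) * dc1     # dh_1/db^c = o_1 tanh'(c_1) dc_1/db^c

eps = 1e-6
hp, cp = first_step(x1, h0, c0, shift_bc(p, eps))[:2]
hm, cm = first_step(x1, h0, c0, shift_bc(p, -eps))[:2]
num_dc1 = (cp - cm) / (2 * eps)
num_dh1 = (hp - hm) / (2 * eps)
print(dc1, num_dc1)
print(dh1, num_dh1)
```

Because $h_{0}$ and $c_{0}$ do not depend on the parameters, every term involving $\mathrm{d}h_{0}$ or $\mathrm{d}c_{0}$ drops out, which is why the base case is so much shorter than the general recursion.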


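Summary

That completes the derivation of the backpropagation derivative formulas; every one of them follows from the chain rule. If you spot a mistake, corrections are welcome.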
