Computing the Gradient with Respect to the Hidden State in an RNN

u014540876

Category: Machine Learning Algorithms. Tags: deep learning, neural networks

Article link: https://blog.csdn.net/u014540876/article/details/106139509

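While reading *Dive into Deep Learning* (《动手学深度学习》), I came across its simplified derivation of the gradients of an RNN. For the gradient with respect to the hidden state, the authors only remark briefly that one should "expand the recurrence above" and then state the result directly. Below I write out the intermediate steps in detail.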
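For context, the recurrence that the derivation starts from appears to rest on the book's simplified RNN without an activation function. The formulas below are recalled from that setup and rewritten in this post's notation (the input weight $W_{hx}$ and input $x_t$ do not appear elsewhere in this post), so treat them as assumed background rather than a quotation:

$h_t = W_{hx} \cdot x_t + W_{hh} \cdot h_{t-1}, \qquad O_t = W_{qh} \cdot h_t$

Under this assumption, $h_t$ affects the loss $L$ along two paths, through $O_t$ and through $h_{t+1}$, and for $t < T$ the chain rule adds one term per path. That is where the two terms of the recurrence come from.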

$\frac{\partial L}{\partial h_t} = W^{\top}_{hh} \cdot \frac{\partial L}{\partial h_{t+1}} + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_t}$
$= W^{\top}_{hh} \cdot(W^{\top}_{hh} \cdot \frac{\partial L}{\partial h_{t+2}} + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+1}}) + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_t}$
$= (W^{\top}_{hh})^2\cdot \frac{\partial L}{\partial h_{t+2}} + W^{\top}_{hh} \cdot W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+1}} + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_t}$
$= W^{\top}_{hh} \cdot(W^{\top}_{hh} \cdot (W^{\top}_{hh} \cdot \frac{\partial L}{\partial h_{t+3}} + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+2}}) + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+1}}) + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_t}$
$=(W^{\top}_{hh})^3\cdot \frac{\partial L}{\partial h_{t+3}} +(W^{\top}_{hh})^2\cdot W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+2}} + W^{\top}_{hh} \cdot W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{t+1}} + W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_t}$
$=\cdots\cdots$
$=(W^{\top}_{hh})^{T-t}\cdot \frac{\partial L}{\partial h_T} + \sum_{i=t+1}^{T}\textbf{[}(W^{\top}_{hh})^{T-i}\cdot W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{T+t-i}}\textbf{]}$
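At the final time step $T$ there is no later hidden state, so only the output term remains:
$\frac{\partial L}{\partial h_T}=W^{\top}_{qh}\cdot \frac{\partial L}{\partial O_T}$
Substituting this into the expression above, the leading term becomes the $i=t$ term of the sum, which gives: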
$\frac{\partial L}{\partial h_t}=\sum_{i=t}^{T}\textbf{[}(W^{\top}_{hh})^{T-i}\cdot W^{\top}_{qh} \cdot \frac{\partial L}{\partial O_{T+t-i}}\textbf{]}$
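
As a sanity check, the closed form can be compared numerically against the backward recurrence. The sketch below is only illustrative: the function names, the sizes, and the random stand-ins for $\frac{\partial L}{\partial O_t}$ are assumptions made for this check and are not taken from the book.

```python
import numpy as np


def grads_by_recursion(W_hh, W_qh, dL_dO):
    """Backward recurrence: dL/dh_T = W_qh^T dL/dO_T, and for t < T
    dL/dh_t = W_hh^T dL/dh_{t+1} + W_qh^T dL/dO_t."""
    T = len(dL_dO)                      # dL_dO[t - 1] holds dL/dO_t for t = 1..T
    dL_dh = [None] * T
    dL_dh[T - 1] = W_qh.T @ dL_dO[T - 1]
    for t in range(T - 1, 0, -1):       # 1-based t = T-1, ..., 1
        dL_dh[t - 1] = W_hh.T @ dL_dh[t] + W_qh.T @ dL_dO[t - 1]
    return dL_dh


def grads_by_closed_form(W_hh, W_qh, dL_dO):
    """Closed form: dL/dh_t = sum_{i=t}^{T} (W_hh^T)^(T-i) W_qh^T dL/dO_{T+t-i}."""
    T = len(dL_dO)
    out = []
    for t in range(1, T + 1):
        total = np.zeros(W_hh.shape[0])
        for i in range(t, T + 1):
            power = np.linalg.matrix_power(W_hh.T, T - i)
            total += power @ W_qh.T @ dL_dO[T + t - i - 1]
        out.append(total)
    return out


# Hypothetical sizes and random values, used only to exercise the formulas.
rng = np.random.default_rng(0)
h, q, T = 4, 3, 6                       # hidden size, output size, sequence length
W_hh = rng.normal(size=(h, h))
W_qh = rng.normal(size=(q, h))
dL_dO = [rng.normal(size=q) for _ in range(T)]

rec = grads_by_recursion(W_hh, W_qh, dL_dO)
closed = grads_by_closed_form(W_hh, W_qh, dL_dO)
print(all(np.allclose(a, b) for a, b in zip(rec, closed)))
```

If the derivation is right, the two lists agree at every time step up to floating-point error. The factor $(W^{\top}_{hh})^{T-i}$ is also why this gradient can vanish or blow up for long sequences: it raises the same matrix to a power that grows with the distance between time steps.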