Reference blog:
https://www.cnblogs.com/milaohu/p/7337330.html?utm_source=itdadao&utm_medium=referral
Consider the function

$$y = f(AB) = f(C)$$

where
$A$ has shape $(n, m)$
$B$ has shape $(m, k)$
$C$ has shape $(n, k)$
$y$ is a scalar
We want to find $\frac{\partial y}{\partial A}$ and $\frac{\partial y}{\partial B}$.
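The derivative of a scalar with respect to a matrix can be viewed as the derivative with respect to each element of the matrix. Arranging those results in the same layout as the matrix gives a gradient matrix with the same shape.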
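As a minimal sketch of this element-by-element view, the snippet below estimates the gradient of a scalar function with central differences, one matrix entry at a time, and stores each result at the matching position. The helper `numerical_gradient` and the example function `sum_of_squares` are hypothetical names chosen for illustration; they do not come from the referenced blog. Plain Python lists are used to keep the example dependency-free.

```python
def numerical_gradient(f, X, eps=1e-6):
    """Central-difference estimate of d f / d X[p][q] for every entry of X."""
    rows, cols = len(X), len(X[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for p in range(rows):
        for q in range(cols):
            original = X[p][q]
            X[p][q] = original + eps
            f_plus = f(X)
            X[p][q] = original - eps
            f_minus = f(X)
            X[p][q] = original  # restore the entry before moving on
            grad[p][q] = (f_plus - f_minus) / (2 * eps)
    return grad


def sum_of_squares(X):
    # Assumed example function: y = sum of squared entries, so dy/dX = 2X.
    return sum(x * x for row in X for x in row)


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
grad = numerical_gradient(sum_of_squares, X)

# The gradient has the same shape as X, and each entry is close to 2 * X[p][q].
for row in grad:
    print(row)
```

The result has the same shape as `X`, which is the "same layout" convention the rest of the derivation relies on.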
First, find the derivative of $y$ with respect to each element of $A$. By the chain rule:

$$\frac{\partial y}{\partial A_{p,q}}=\sum_{i,j} \frac{\partial y}{\partial C_{i,j}}\frac{\partial C_{i,j}}{\partial A_{p,q}}$$
By the definition of matrix multiplication:

$$C_{i,j}=\sum_h A_{i,h}B_{h,j}$$
So when $i \neq p$, $C_{i,j}$ does not depend on $A_{p,q}$. When $i = p$, the only term of the sum that contains $A_{p,q}$ is the one with $h = q$, whose coefficient is $B_{q,j}$. Therefore:

$$\frac{\partial C_{i,j}}{\partial A_{p,q}}=\begin{cases} B_{q,j} & i=p \\ 0 & i\ne p\end{cases}$$
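The case formula can be checked numerically. The following sketch perturbs a single entry $A_{p,q}$ and looks at how every $C_{i,j}$ changes. The matrices, the chosen indices `p, q`, and the helper `matmul` are illustrative assumptions, not part of the original post.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[h] * Y[h][j] for h in range(inner)) for j in range(cols)]
            for row in X]


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # shape (n, m) = (2, 3)
B = [[1.0, 0.5], [-2.0, 3.0], [0.0, 4.0]]    # shape (m, k) = (3, 2)
p, q = 1, 2                                   # entry of A to perturb (0-based)
eps = 1e-6

C = matmul(A, B)
A_shift = [row[:] for row in A]
A_shift[p][q] += eps
C_shift = matmul(A_shift, B)

n, k = len(C), len(C[0])
# Forward-difference estimate of dC[i][j] / dA[p][q] for every (i, j).
dC_dApq = [[(C_shift[i][j] - C[i][j]) / eps for j in range(k)] for i in range(n)]
# Prediction from the case formula: B[q][j] on row p, zero on every other row.
expected = [[B[q][j] if i == p else 0.0 for j in range(k)] for i in range(n)]

print(dC_dApq)
print(expected)
```

Only row `p` of `C` reacts to the perturbation, which is why the double sum over $i, j$ in the chain rule collapses to a single sum over $j$.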
Substituting this into the chain-rule expression above gives:

$$\begin{aligned}
\frac{\partial y}{\partial A_{p,q}}
&=\sum_{i,j} \frac{\partial y}{\partial C_{i,j}}\frac{\partial C_{i,j}}{\partial A_{p,q}}
=\sum_{j} \frac{\partial y}{\partial C_{p,j}}\frac{\partial C_{p,j}}{\partial A_{p,q}} \\
&=\sum_{j} \frac{\partial y}{\partial C_{p,j}}B_{q,j}
=\sum_{j} \frac{\partial y}{\partial C_{p,j}}(B^T)_{j,q}
\end{aligned}$$
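Here $\sum_{j} \frac{\partial y}{\partial C_{p,j}}(B^T)_{j,q}$ is a sum of $k$ scalar products, one for each value of $j$. It can be read as the product of a $k$-dimensional row vector (row $p$ of $\frac{\partial y}{\partial C}$) and a $k$-dimensional column vector (column $q$ of $B^T$). Extending the derivative of $y$ with respect to the single element $A_{p,q}$ to the whole matrix $A$ therefore gives:

$$\frac{\partial y}{\partial A}=\frac{\partial y}{\partial C}B^T$$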
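The sketch below compares this closed form with a finite-difference estimate. It assumes a concrete choice $f(C) = \frac{1}{2}\sum_{i,j} C_{i,j}^2$, for which $\frac{\partial y}{\partial C} = C$; that choice, the sample matrices, and the helper names are assumptions made only for the check.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[h] * Y[h][j] for h in range(inner)) for j in range(cols)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def f(C):
    # Assumed example: y = 0.5 * sum of squared entries of C, so dy/dC = C.
    return 0.5 * sum(c * c for row in C for c in row)


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # shape (n, m) = (2, 3)
B = [[1.0, 0.5], [-2.0, 3.0], [0.0, 4.0]]    # shape (m, k) = (3, 2)

C = matmul(A, B)
dy_dC = C                                     # holds only for this particular f
dy_dA = matmul(dy_dC, transpose(B))           # (n, k) times (k, m) gives (n, m)

# Central-difference estimate of dy/dA, one entry at a time.
eps = 1e-6
dy_dA_numeric = []
for p in range(len(A)):
    row = []
    for q in range(len(A[0])):
        A_plus = [r[:] for r in A]
        A_minus = [r[:] for r in A]
        A_plus[p][q] += eps
        A_minus[p][q] -= eps
        row.append((f(matmul(A_plus, B)) - f(matmul(A_minus, B))) / (2 * eps))
    dy_dA_numeric.append(row)

max_error = max(abs(dy_dA[p][q] - dy_dA_numeric[p][q])
                for p in range(len(A)) for q in range(len(A[0])))
print(max_error)
```

The shapes also serve as a quick sanity check: $\frac{\partial y}{\partial C}$ is $(n, k)$ and $B^T$ is $(k, m)$, so the product is $(n, m)$, matching $A$.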
By the same argument:

$$\frac{\partial y}{\partial B}=A^T\frac{\partial y}{\partial C}$$
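For a quick illustration of the second formula, the sketch below assumes the simplest possible scalar function, $f(C) = \sum_{i,j} C_{i,j}$, so that $\frac{\partial y}{\partial C}$ is a matrix of ones. The function choice, matrices, and helper names are hypothetical and serve only as an example.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[h] * Y[h][j] for h in range(inner)) for j in range(cols)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]       # shape (n, m) = (2, 3)
B = [[1.0, 0.5], [-2.0, 3.0], [0.0, 4.0]]    # shape (m, k) = (3, 2)
n, k = len(A), len(B[0])

# Assumed example: y = sum of all entries of C = AB, so dy/dC is all ones.
dy_dC = [[1.0] * k for _ in range(n)]
dy_dB = matmul(transpose(A), dy_dC)           # (m, n) times (n, k) gives (m, k)

# With dy/dC all ones, entry (h, j) of dy/dB is the sum of column h of A.
column_sums_of_A = [sum(A[i][h] for i in range(n)) for h in range(len(B))]

print(dy_dB)
print(column_sums_of_A)
```

Here every column of `dy_dB` equals the column sums of `A`, which matches differentiating $y = \sum_{i,j}\sum_h A_{i,h}B_{h,j}$ with respect to $B_{h,j}$ by hand.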
Derivation of the Derivative Formulas for Matrix Multiplication