Conclusion 1. If $L=ABCDEF$ with $L$ a scalar (so $A$ is a row vector and $F$ a column vector), then
$$\frac{\partial L}{\partial A}=(BCDEF)^T,\quad \frac{\partial L}{\partial B}=A^T(CDEF)^T$$
$$\frac{\partial L}{\partial C}=(AB)^T(DEF)^T,\quad \frac{\partial L}{\partial D}=(ABC)^T(EF)^T$$
$$\frac{\partial L}{\partial E}=(ABCD)^TF^T,\quad \frac{\partial L}{\partial F}=(ABCDE)^T$$
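The pattern is fairly easy to see: the partial derivative of $L$ with respect to any one factor on the right-hand side equals the transpose of the product of everything to its left, multiplied by the transpose of the product of everything to its right.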
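The rule can be checked numerically. The following is a minimal sketch, not part of the original derivation: it uses only the Python standard library, picks hypothetical shapes so that $L$ is $1\times 1$, and compares the formula against central finite differences. The helper names (`matmul`, `chain`, `numeric_grad`, `formula_grad`) are made up for illustration.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def chain(mats):
    """Left-to-right product of a non-empty list of matrices."""
    out = mats[0]
    for M in mats[1:]:
        out = matmul(out, M)
    return out

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def numeric_grad(mats, k, eps=1e-6):
    """Central-difference gradient of the scalar chain(mats) w.r.t. mats[k]."""
    M = mats[k]
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            plus = chain(mats)[0][0]
            M[i][j] = old - eps
            minus = chain(mats)[0][0]
            M[i][j] = old  # restore the entry
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

def formula_grad(mats, k):
    """Conclusion 1: (product of left factors)^T times (product of right factors)^T."""
    left, right = mats[:k], mats[k + 1:]
    if not left:   # leftmost factor: only the right part remains
        return transpose(chain(right))
    if not right:  # rightmost factor: only the left part remains
        return transpose(chain(left))
    return matmul(transpose(chain(left)), transpose(chain(right)))

rng = random.Random(0)
# Hypothetical shapes: A is 1x2 and F is 3x1, so L = ABCDEF is 1x1.
shapes = [(1, 2), (2, 3), (3, 2), (2, 4), (4, 3), (3, 1)]
mats = [rand_matrix(r, c, rng) for r, c in shapes]

max_err = 0.0
for k in range(len(mats)):
    num, ana = numeric_grad(mats, k), formula_grad(mats, k)
    for row_n, row_a in zip(num, ana):
        for x, y in zip(row_n, row_a):
            max_err = max(max_err, abs(x - y))
print("largest gap between formula and finite differences:", max_err)
```

Because $L$ is linear in each individual entry, the finite-difference estimate should agree with the formula up to floating-point rounding.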
Conclusion 2. If $O_{p\times n}=V_{p\times m}H_{m\times n}$ and $Loss$ is a scalar, then
$$\frac{\partial Loss}{\partial H}=\frac{\partial O}{\partial H}\frac{\partial Loss}{\partial O}$$
$$\frac{\partial Loss}{\partial V}=\frac{\partial Loss}{\partial O}\frac{\partial O}{\partial V}$$
Proof:
Since $Loss \in \mathbb{R}$, let $Loss = A_{1\times p}O_{p\times n}B_{n\times 1}$. We are also given $O_{p\times n}=V_{p\times m}H_{m\times n}$,
$$\therefore Loss=A_{1\times p}V_{p\times m}H_{m\times n}B_{n\times 1}$$
By Conclusion 1,
$$\frac{\partial Loss}{\partial H}=(AV)^TB^T=V^TA^TB^T=V^T(A^TB^T)$$
Identifying the two factors (the second follows from Conclusion 1 applied to $Loss=AOB$):
$$\frac{\partial O}{\partial H}=V^T,\quad \frac{\partial Loss}{\partial O}=A^TB^T$$
$$\therefore \frac{\partial Loss}{\partial H}=\frac{\partial O}{\partial H}\frac{\partial Loss}{\partial O}$$
Similarly, one can show that
$$\frac{\partial Loss}{\partial V}=\frac{\partial Loss}{\partial O}\frac{\partial O}{\partial V}$$
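As a sanity check on Conclusion 2, here is a small sketch under the same setup as the proof, $Loss = AVHB$. It assumes hypothetical sizes $p=2$, $m=3$, $n=4$ and reads $\frac{\partial O}{\partial V}=H^T$ by analogy with $\frac{\partial O}{\partial H}=V^T$ (the source does not write this factor out explicitly). Both products are compared with central finite differences; all helper names are illustrative.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(1)
p, m, n = 2, 3, 4  # hypothetical sizes, chosen only for this example
A = rand_matrix(1, p, rng)
V = rand_matrix(p, m, rng)
H = rand_matrix(m, n, rng)
B = rand_matrix(n, 1, rng)

def loss():
    """Loss = A V H B, read from the current contents of V and H."""
    return matmul(matmul(matmul(A, V), H), B)[0][0]

def numeric_grad(M, eps=1e-6):
    """Central-difference gradient of loss() w.r.t. the matrix M (V or H)."""
    grad = [[0.0] * len(M[0]) for _ in M]
    for i in range(len(M)):
        for j in range(len(M[0])):
            old = M[i][j]
            M[i][j] = old + eps
            plus = loss()
            M[i][j] = old - eps
            minus = loss()
            M[i][j] = old  # restore the entry
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

def max_gap(P, Q):
    """Largest absolute entrywise difference between two equally shaped matrices."""
    return max(abs(x - y) for rp, rq in zip(P, Q) for x, y in zip(rp, rq))

dLoss_dO = matmul(transpose(A), transpose(B))  # A^T B^T, shape p x n
grad_H = matmul(transpose(V), dLoss_dO)        # V^T (A^T B^T), shape m x n
grad_V = matmul(dLoss_dO, transpose(H))        # (A^T B^T) H^T, shape p x m

gap_H = max_gap(grad_H, numeric_grad(H))
gap_V = max_gap(grad_V, numeric_grad(V))
print("gap for dLoss/dH:", gap_H)
print("gap for dLoss/dV:", gap_V)
```

Note the order of the factors: the $H$ gradient multiplies $\frac{\partial Loss}{\partial O}$ on the left by $V^T$, while the $V$ gradient multiplies it on the right by $H^T$, which is what makes the shapes ($m\times n$ and $p\times m$) come out right.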
Conclusion 3.
$$\left(\frac{\partial C}{\partial A^T}\right)^T=\frac{\partial C}{\partial A}$$
Conclusion 4. If $C=A^TB$, then it follows easily from Conclusion 3 that
$$\frac{\partial C}{\partial A} = B$$
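The step "follows from Conclusion 3" can be spelled out as below. This is a sketch that assumes $A$ and $B$ are column vectors of equal length, so that $C$ is a scalar; the source does not state the shapes.

```latex
\begin{aligned}
C &= A^T B
  &&\Rightarrow\quad \frac{\partial C}{\partial A^T} = B^T
  &&\text{(Conclusion 1, with $A^T$ as the leftmost factor)} \\
\frac{\partial C}{\partial A}
  &= \left(\frac{\partial C}{\partial A^T}\right)^T
   = \left(B^T\right)^T = B
  &&
  &&\text{(Conclusion 3)}
\end{aligned}
```

The same result can be read off componentwise: $C=\sum_i a_i b_i$, so $\partial C/\partial a_i = b_i$.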
Conclusion 5. If $y=w^TXw$, then
$$\frac{\partial y}{\partial w}=(X+X^T)w$$
In particular, if $X$ is a real symmetric matrix, then $X=X^T$, so
$$\frac{\partial y}{\partial w}=2Xw$$
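A small worked instance may help here. The sketch below uses a hypothetical non-symmetric $2\times 2$ matrix $X$ and $w=(1,1)^T$, for which $y=w_1^2+5w_1w_2+4w_2^2$, and compares $(X+X^T)w$ with central finite differences. It then repeats the computation for a hypothetical symmetric matrix $S$ to illustrate the $2Xw$ special case. The values and function names are chosen only for illustration.

```python
def quad(X, w):
    """y = w^T X w for a square matrix X (list of rows) and a vector w (list)."""
    n = len(w)
    return sum(w[i] * X[i][j] * w[j] for i in range(n) for j in range(n))

def formula_grad(X, w):
    """Conclusion 5: dy/dw = (X + X^T) w."""
    n = len(w)
    return [sum((X[i][j] + X[j][i]) * w[j] for j in range(n)) for i in range(n)]

def numeric_grad(X, w, eps=1e-5):
    """Central-difference estimate of dy/dw, one coordinate at a time."""
    grad = []
    for i in range(len(w)):
        up = w[:i] + [w[i] + eps] + w[i + 1:]
        down = w[:i] + [w[i] - eps] + w[i + 1:]
        grad.append((quad(X, up) - quad(X, down)) / (2 * eps))
    return grad

X = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical, deliberately not symmetric
w = [1.0, 1.0]

analytic = formula_grad(X, w)
numeric = numeric_grad(X, w)
print("formula:", analytic)
print("finite differences:", numeric)

# Symmetric special case: (S + S^T) w reduces to 2 S w.
S = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric matrix
two_S_w = [2 * sum(S[i][j] * w[j] for j in range(2)) for i in range(2)]
print("symmetric case:", formula_grad(S, w), "vs 2Sw:", two_S_w)
```

For the non-symmetric $X$ above, using $2Xw$ instead of $(X+X^T)w$ would give $(6, 14)^T$ rather than $(7, 13)^T$, so the symmetric shortcut only applies when $X=X^T$.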