1. Total differential
- When $X$ is a matrix, $dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$
- When $X$ is a vector, $dy = \left(\frac{\partial y}{\partial X}\right)^T dX = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$
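As a quick sanity check of the matrix form, the sketch below compares the actual change in a scalar function with the first-order prediction $\mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$. The test function $y = \sum_{ij} X_{ij}^2$ (whose gradient is $2X$), the matrix shape, and the perturbation size are assumptions chosen for illustration; they do not come from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

def y(X):
    # Illustrative scalar function of a matrix; its gradient is 2 * X.
    return np.sum(X ** 2)

X = rng.standard_normal((3, 4))
grad = 2 * X                              # dy/dX, same shape as X
dX = 1e-6 * rng.standard_normal((3, 4))   # small perturbation of X

actual_change = y(X + dX) - y(X)
predicted_change = np.trace(grad.T @ dX)  # tr((dy/dX)^T dX)

# The two differ only by second-order terms in dX (plus rounding).
first_order_error = abs(actual_change - predicted_change)
print(actual_change, predicted_change, first_order_error)
```

Note that `np.trace(grad.T @ dX)` equals `np.sum(grad * dX)`, the entry-wise inner product of the gradient and the perturbation, which is why the trace form works for matrices of any shape.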
2. Making use of the trace (tr)
(1) If $a$ is a scalar, then $a = \mathrm{tr}(a)$
(2) For square matrices $A$ and $B$, $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ (this also holds for rectangular matrices whenever both $AB$ and $BA$ are defined)
(3) $\mathrm{tr}(A) = \mathrm{tr}(A^T)$
(4) $\mathrm{tr}(A+B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
(5) Differential of a transpose: $d(X^T) = (dX)^T$
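Identities (2) to (4) are easy to confirm numerically. The snippet below is a minimal sketch using random matrices; the shapes and seed are arbitrary assumptions. Identity (2) is checked with rectangular matrices, since it only requires that both products be defined.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))   # rectangular: A @ B is 3x3, B @ A is 5x5
B = rng.standard_normal((5, 3))
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

# Identity (2): tr(AB) = tr(BA)
cyclic_ok = np.isclose(np.trace(A @ B), np.trace(B @ A))
# Identity (3): tr(S) = tr(S^T)
transpose_ok = np.isclose(np.trace(S), np.trace(S.T))
# Identity (4): tr(S + T) = tr(S) + tr(T)
linear_ok = np.isclose(np.trace(S + T), np.trace(S) + np.trace(T))

print(cyclic_ok, transpose_ok, linear_ok)
```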
These identities are used in the derivations below.
3. Matrix derivative examples
The following examples use the identities above to compute matrix derivatives:
Example 1:
Let $X = (x_1, \dots, x_n)^T$ be a vector and $A$ a matrix that does not depend on $X$. Given $y = X^T A X$, find $\frac{\partial y}{\partial X}$.
Total differential:
$$dy = (dX^T)AX + X^TA(dX)$$
By identity (1), since $dy$ is a scalar:
$$dy = \mathrm{tr}\left((dX^T)AX + X^TA(dX)\right)$$
By identity (4):
$$dy = \mathrm{tr}\left((dX^T)AX\right) + \mathrm{tr}\left(X^TA(dX)\right)$$
By identities (5) and (3):
$$dy = \mathrm{tr}\left((dX)^TAX\right) + \mathrm{tr}\left(X^TA(dX)\right) = \mathrm{tr}\left(X^TA^TdX\right) + \mathrm{tr}\left(X^TA(dX)\right)$$
By identity (4) again:
$$dy = \mathrm{tr}\left(X^TA^T(dX) + X^TA(dX)\right) = \mathrm{tr}\left(X^T(A^T+A)\,dX\right)$$
$$\because dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial X}\right)^T dX\right)$$
$$\Rightarrow \left(\frac{\partial y}{\partial X}\right)^T = X^T(A^T+A)$$
$$\Rightarrow \frac{\partial y}{\partial X} = \left(X^T(A^T+A)\right)^T = (A+A^T)X$$
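The result $\frac{\partial y}{\partial X} = (A+A^T)X$ can be checked against a finite-difference gradient. The sketch below assumes a random $5 \times 5$ matrix and a central-difference step of `1e-6`; both are illustrative choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))   # not symmetric in general
x = rng.standard_normal(n)

def y(v):
    # Quadratic form v^T A v
    return v @ A @ v

analytic = (A + A.T) @ x          # gradient from the derivation

eps = 1e-6
numeric = np.zeros(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    # Central difference along the i-th coordinate
    numeric[i] = (y(x + e) - y(x - e)) / (2 * eps)

print(np.max(np.abs(numeric - analytic)))
```

When $A$ is symmetric the gradient reduces to $2AX$, the familiar special case.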
Example 2:
Given $y = \mathrm{tr}(AB)$, where $B$ does not depend on $A$, find $\frac{\partial y}{\partial A}$.
Total differential:
$$dy = \mathrm{tr}[(dA)B]$$
By identity (2):
$$dy = \mathrm{tr}[B\,dA]$$
$$\because dy = \mathrm{tr}\left(\left(\frac{\partial y}{\partial A}\right)^T dA\right) \quad \text{(the total differential formula with } X = A\text{)}$$
$$\Rightarrow \left(\frac{\partial y}{\partial A}\right)^T = B$$
$$\Rightarrow \frac{\partial y}{\partial A} = B^T$$
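A finite-difference check of $\frac{\partial}{\partial A}\mathrm{tr}(AB) = B^T$ follows the same pattern. This is a sketch under assumed shapes ($A$ is $3 \times 4$, $B$ is $4 \times 3$) and an assumed step size; note that the gradient has the shape of $A$, which is why the transpose of $B$ appears.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

def y(M):
    # y = tr(M B), with B held fixed
    return np.trace(M @ B)

analytic = B.T                    # gradient from the derivation, shape (3, 4)

eps = 1e-6
numeric = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        E = np.zeros_like(A)
        E[i, j] = eps
        # Central difference with respect to entry A[i, j]
        numeric[i, j] = (y(A + E) - y(A - E)) / (2 * eps)

print(np.max(np.abs(numeric - analytic)))
```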