Matrix Calculus
- Matrix Calculus
- Derivative of Vector with Respect to Vector
- Derivative of a Scalar with Respect to Vector
- Derivative of Vector with Respect to Scalar
- Chain rule for Vectors
- Derivative of Scalar with Respect to Matrix
- Product rules for matrix-functions
- Derivatives of Matrices, Vectors and Scalar Forms
- Derivatives of Traces
Derivative of Vector with Respect to Vector
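$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

$$
\frac{\partial y}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
$$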
Derivative of a Scalar with Respect to Vector
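$$
\frac{\partial y}{\partial x} =
\begin{bmatrix}
\frac{\partial y}{\partial x_1} \\
\frac{\partial y}{\partial x_2} \\
\vdots \\
\frac{\partial y}{\partial x_n}
\end{bmatrix}
$$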
Derivative of Vector with Respect to Scalar
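$$
\frac{\partial y}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x}
\end{bmatrix}
$$

EXAMPLE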
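1.
$y = Ax$, where $A$ is a square matrix of order $n$ and $a_i$ denotes its $i$-th column

$$
y = Ax =
\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \sum_{i=1}^{n} a_i x_i
$$

$$
\frac{\partial y}{\partial x_j}
= \frac{\partial \sum_{i=1}^{n} a_i x_i}{\partial x_j}
= \frac{\partial (a_j x_j)}{\partial x_j}
= a_j^T,
\qquad
\frac{\partial y}{\partial x} = A^T
$$

2.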
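$y = x^T A$, where $a_j$ denotes the $j$-th column of $A$ and $a^j$ its $j$-th row

$$
y = x^T A =
\begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} x_i a_{i1} & \sum_{i=1}^{n} x_i a_{i2} & \cdots & \sum_{i=1}^{n} x_i a_{in} \end{bmatrix}
$$

$$
\frac{\partial y}{\partial x_j}
= \left( \frac{\partial y^T}{\partial x_j} \right)^T
= \left( \frac{\partial \begin{bmatrix} \sum_{i=1}^{n} x_i a_{i1} & \sum_{i=1}^{n} x_i a_{i2} & \cdots & \sum_{i=1}^{n} x_i a_{in} \end{bmatrix}^T}{\partial x_j} \right)^T
= \left( (a^j)^T \right)^T
= a^j
$$

$$
\frac{\partial y}{\partial x} = A
$$

3.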
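$y = x^T x$

$$
y = x^T x =
\begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \sum_{i=1}^{n} x_i x_i
$$

$$
\frac{\partial y}{\partial x_j} = 2 x_j,
\qquad
\frac{\partial y}{\partial x} = 2x
$$

4.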
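$y = x^T A x$, where $A$ is a square matrix of order $n$, $a_k$ denotes its $k$-th column and $a^k$ its $k$-th row

$$
y = x^T A x =
\begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix} x
= \sum_{j=1}^{n} (x^T a_j)\, x_j
= \sum_{j=1}^{n} \left( \sum_{i=1}^{n} x_i a_{ij} \right) x_j
$$

$$
\frac{\partial y}{\partial x_k}
= \frac{\partial \sum_{j=1}^{n} \left( \sum_{i=1}^{n} x_i a_{ij} \right) x_j}{\partial x_k}
= \sum_{j=1}^{n} \left( \frac{\partial \left( \sum_{i=1}^{n} x_i a_{ij} \right)}{\partial x_k}\, x_j + \left( \sum_{i=1}^{n} x_i a_{ij} \right) \frac{\partial x_j}{\partial x_k} \right)
= \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} x_i a_{ik}
= a_k^T x + a^k x
$$

Here $\sum_{i} x_i a_{ik} = a_k^T x$ and $\sum_{j} a_{kj} x_j = a^k x$. Stacking the $n$ components gives

$$
\frac{\partial y}{\partial x} =
\begin{bmatrix}
a_1^T x + a^1 x \\
a_2^T x + a^2 x \\
\vdots \\
a_n^T x + a^n x
\end{bmatrix}
= A^T x + A x
$$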
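The four results above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `jacobian_denominator`, the random seed, and the size `n = 4` are illustrative choices and not part of the original notes. The helper approximates derivatives by central differences and arranges them in the layout used above, with row $i$ holding the derivative with respect to $x_i$.

```python
import numpy as np


def jacobian_denominator(f, x, eps=1e-6):
    """Central-difference derivative of f at x in the layout used above.

    Row i holds d f / d x_i, so entry (i, j) approximates d f_j / d x_i.
    A scalar-valued f yields a single column.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        f_plus = np.atleast_1d(f(x + step))
        f_minus = np.atleast_1d(f(x - step))
        rows.append((f_plus - f_minus) / (2 * eps))
    return np.stack(rows)


rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
n = 4
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

d_Ax = jacobian_denominator(lambda v: A @ v, x)              # compare with A^T
d_xTA = jacobian_denominator(lambda v: v @ A, x)             # compare with A
d_xTx = jacobian_denominator(lambda v: v @ v, x)[:, 0]       # compare with 2x
d_xTAx = jacobian_denominator(lambda v: v @ A @ v, x)[:, 0]  # compare with A^T x + A x

checks = {
    "d(Ax)/dx    vs A^T": np.allclose(d_Ax, A.T, atol=1e-6),
    "d(x^T A)/dx vs A": np.allclose(d_xTA, A, atol=1e-6),
    "d(x^T x)/dx vs 2x": np.allclose(d_xTx, 2 * x, atol=1e-6),
    "d(x^T A x)/dx vs A^T x + A x": np.allclose(d_xTAx, A.T @ x + A @ x, atol=1e-6),
}
for name, ok in checks.items():
    print(name, ok)
```

Because a 1-D NumPy array has no row or column orientation, `v @ A` stands in for $x^T A$ here; only the arrangement of the partial derivatives follows the layout defined above.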
Chain rule for Vectors
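$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix}, \quad
z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix}
$$

$$
\left( \frac{\partial z}{\partial x} \right)^T =
\begin{bmatrix}
\frac{\partial z_1}{\partial x_1} & \frac{\partial z_1}{\partial x_2} & \cdots & \frac{\partial z_1}{\partial x_n} \\
\frac{\partial z_2}{\partial x_1} & \frac{\partial z_2}{\partial x_2} & \cdots & \frac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial z_m}{\partial x_1} & \frac{\partial z_m}{\partial x_2} & \cdots & \frac{\partial z_m}{\partial x_n}
\end{bmatrix},
\qquad
\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j}
$$

$$
\left( \frac{\partial z}{\partial x} \right)^T =
\begin{bmatrix}
\sum_{q=1}^{r} \frac{\partial z_1}{\partial y_q} \frac{\partial y_q}{\partial x_1} & \sum_{q=1}^{r} \frac{\partial z_1}{\partial y_q} \frac{\partial y_q}{\partial x_2} & \cdots & \sum_{q=1}^{r} \frac{\partial z_1}{\partial y_q} \frac{\partial y_q}{\partial x_n} \\
\sum_{q=1}^{r} \frac{\partial z_2}{\partial y_q} \frac{\partial y_q}{\partial x_1} & \sum_{q=1}^{r} \frac{\partial z_2}{\partial y_q} \frac{\partial y_q}{\partial x_2} & \cdots & \sum_{q=1}^{r} \frac{\partial z_2}{\partial y_q} \frac{\partial y_q}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{q=1}^{r} \frac{\partial z_m}{\partial y_q} \frac{\partial y_q}{\partial x_1} & \sum_{q=1}^{r} \frac{\partial z_m}{\partial y_q} \frac{\partial y_q}{\partial x_2} & \cdots & \sum_{q=1}^{r} \frac{\partial z_m}{\partial y_q} \frac{\partial y_q}{\partial x_n}
\end{bmatrix}
$$

$$
=
\begin{bmatrix}
\frac{\partial z_1}{\partial y_1} & \frac{\partial z_1}{\partial y_2} & \cdots & \frac{\partial z_1}{\partial y_r} \\
\frac{\partial z_2}{\partial y_1} & \frac{\partial z_2}{\partial y_2} & \cdots & \frac{\partial z_2}{\partial y_r} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial z_m}{\partial y_1} & \frac{\partial z_m}{\partial y_2} & \cdots & \frac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_r}{\partial x_1} & \frac{\partial y_r}{\partial x_2} & \cdots & \frac{\partial y_r}{\partial x_n}
\end{bmatrix}
= \left( \frac{\partial z}{\partial y} \right)^T \left( \frac{\partial y}{\partial x} \right)^T
= \left( \frac{\partial y}{\partial x} \frac{\partial z}{\partial y} \right)^T
$$

$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x} \frac{\partial z}{\partial y}
$$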
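The ordering of the factors in this chain rule (inner derivative on the left) is easy to get wrong, so a numerical check may help. This is a rough sketch assuming NumPy; the functions `y_of_x` and `z_of_y`, the matrices `B` and `C`, and the helper `jacobian_denominator` are hypothetical choices made only for illustration.

```python
import numpy as np


def jacobian_denominator(f, x, eps=1e-6):
    """Central-difference derivative with entry (i, j) ~ d f_j / d x_i."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        rows.append((np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * eps))
    return np.stack(rows)


rng = np.random.default_rng(1)  # arbitrary seed
n, r, m = 3, 4, 2               # dimensions of x, y, z
B = rng.normal(size=(r, n))
C = rng.normal(size=(m, r))


def y_of_x(x):
    return np.tanh(B @ x)       # y in R^r


def z_of_y(y):
    return C @ (y ** 2)         # z in R^m


x0 = rng.normal(size=n)
y0 = y_of_x(x0)

dy_dx = jacobian_denominator(y_of_x, x0)                        # shape (n, r)
dz_dy = jacobian_denominator(z_of_y, y0)                        # shape (r, m)
dz_dx = jacobian_denominator(lambda x: z_of_y(y_of_x(x)), x0)   # shape (n, m)

# Chain rule in this layout: dz/dx = (dy/dx) (dz/dy)
chain_ok = np.allclose(dz_dx, dy_dx @ dz_dy, atol=1e-6)
print("shapes:", dy_dx.shape, dz_dy.shape, dz_dx.shape)
print("dz/dx matches (dy/dx)(dz/dy):", chain_ok)
```

The shapes alone already rule out the reversed product: `dz_dy @ dy_dx` would multiply an $r \times m$ matrix by an $n \times r$ one, which is not defined for these dimensions.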
Derivative of Scalar with Respect to Matrix
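$$
X =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{bmatrix},
\qquad
\frac{\partial y}{\partial X} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \cdots & \frac{\partial y}{\partial x_{mn}}
\end{bmatrix}
= \left[ \frac{\partial y}{\partial x_{ij}} \right]
$$

EXAMPLE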
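1.
$$
y = \operatorname{tr}(X) = \sum_{i=1}^{n} x_{ii},
\qquad
\frac{\partial y}{\partial X} = I
$$

2.
$$
\frac{\partial |Y|}{\partial X}:
\qquad
\frac{\partial |Y|}{\partial x_{rs}}
= \sum_{i} \sum_{j} \frac{\partial |Y|}{\partial y_{ij}} \frac{\partial y_{ij}}{\partial x_{rs}}
= \sum_{i} \sum_{j} Y_{ij} \frac{\partial y_{ij}}{\partial x_{rs}}
$$

where $Y_{ij}$ is the cofactor of the element $y_{ij}$ in $|Y|$

3.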
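$$
U \in \mathbb{R}^{n \times k}, \quad V \in \mathbb{R}^{m \times k}, \quad Y \in \mathbb{R}^{n \times m}
$$

$$
J(U, V) = \left\| U V^T - Y \right\|_F^2 + \frac{\lambda}{2} \left( \|U\|_F^2 + \|V\|_F^2 \right)
= \sum_{i,j} \left( \sum_{a} U_{ia} V_{ja} - Y_{ij} \right)^2 + \frac{\lambda}{2} \left( \sum_{i,a} U_{ia}^2 + \sum_{j,a} V_{ja}^2 \right)
$$

Writing $U_i$, $V_j$ and $Y_i$ for the $i$-th row of $U$, the $j$-th row of $V$ and the $i$-th row of $Y$, $V_{\cdot a}$ for the $a$-th column of $V$, and using $b$ for the inner summation index to keep it distinct from the free index $a$:

$$
\frac{\partial J}{\partial U_{ia}}
= 2 \sum_{j} \left( \sum_{b} U_{ib} V_{jb} - Y_{ij} \right) V_{ja} + \lambda U_{ia}
= 2 \sum_{j} \left( U_i V_j^T - Y_{ij} \right) V_{ja} + \lambda U_{ia}
= 2 \left( U_i V^T - Y_i \right) V_{\cdot a} + \lambda U_{ia}
$$

$$
\frac{\partial J}{\partial U_i} = 2 \left( U_i V^T - Y_i \right) V + \lambda U_i
$$

$$
\frac{\partial J}{\partial U} = 2 \left( U V^T - Y \right) V + \lambda U
$$

$$
\frac{\partial J}{\partial V} = 2 \left( U V^T - Y \right)^T U + \lambda V
$$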
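The three examples above can be compared against finite differences. The sketch below assumes NumPy; the helper `grad_wrt_matrix`, the sizes, the seed and the value of `lam` are illustrative and not from the original. For the determinant it covers only the special case $Y = X$, where the formula reduces to the cofactor matrix, computed here as $|X| \, (X^{-1})^T$ for an invertible $X$.

```python
import numpy as np


def grad_wrt_matrix(f, M, eps=1e-6):
    """Central-difference gradient of a scalar f, shaped like M."""
    M = np.asarray(M, dtype=float)
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G


def J(U, V, Y, lam):
    """Regularised squared Frobenius loss from example 3."""
    return np.sum((U @ V.T - Y) ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))


rng = np.random.default_rng(2)  # arbitrary seed

# Examples 1 and 2 (with Y = X): trace and determinant of a square matrix
X = rng.normal(size=(3, 3))
trace_ok = np.allclose(grad_wrt_matrix(np.trace, X), np.eye(3), atol=1e-6)
cofactors = np.linalg.det(X) * np.linalg.inv(X).T
det_ok = np.allclose(grad_wrt_matrix(np.linalg.det, X), cofactors, atol=1e-5)

# Example 3: gradients of J with respect to U and V
n, m, k, lam = 5, 4, 3, 0.1
U = rng.normal(size=(n, k))
V = rng.normal(size=(m, k))
Y = rng.normal(size=(n, m))

grad_U = 2 * (U @ V.T - Y) @ V + lam * U
grad_V = 2 * (U @ V.T - Y).T @ U + lam * V

grad_U_ok = np.allclose(grad_wrt_matrix(lambda M: J(M, V, Y, lam), U), grad_U, atol=1e-5)
grad_V_ok = np.allclose(grad_wrt_matrix(lambda M: J(U, M, Y, lam), V), grad_V, atol=1e-5)

print("trace:", trace_ok, "det:", det_ok, "dJ/dU:", grad_U_ok, "dJ/dV:", grad_V_ok)
```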
Product rules for matrix-functions
$$
\nabla_X \left( f(X)^T g(X) \right) = \nabla_X \left( f(X) \right) g(X) + \nabla_X \left( g(X) \right) f(X)
$$
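As a sanity check of the product rule, the sketch below takes the argument to be a vector $x$ and $f$, $g$ to be vector-valued, so that $\nabla_x f$ is the $n \times m$ derivative in the layout used earlier. That restriction, the particular `f` and `g`, and the helper `jacobian_denominator` are assumptions made for illustration; NumPy is assumed to be available.

```python
import numpy as np


def jacobian_denominator(f, x, eps=1e-6):
    """Central-difference derivative with entry (i, j) ~ d f_j / d x_i."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        rows.append((np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * eps))
    return np.stack(rows)


rng = np.random.default_rng(3)  # arbitrary seed
n, m = 4, 3
P = rng.normal(size=(m, n))
Q = rng.normal(size=(m, n))


def f(x):
    return np.sin(P @ x)        # f: R^n -> R^m


def g(x):
    return (Q @ x) ** 2         # g: R^n -> R^m


x0 = rng.normal(size=n)

# Left-hand side: gradient of the scalar f(x)^T g(x)
lhs = jacobian_denominator(lambda x: f(x) @ g(x), x0)[:, 0]

# Right-hand side: (grad f) g + (grad g) f
rhs = jacobian_denominator(f, x0) @ g(x0) + jacobian_denominator(g, x0) @ f(x0)

product_rule_ok = np.allclose(lhs, rhs, atol=1e-6)
print("product rule holds numerically:", product_rule_ok)
```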
Derivatives of Matrices, Vectors and Scalar Forms
Derivatives of Traces
Basics
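$$
\operatorname{Tr}(A) = \sum_{i} A_{ii}
$$

$$
\operatorname{Tr}(AB) = \operatorname{Tr}(BA)
$$

$$
\operatorname{Tr}(A + B) = \operatorname{Tr}(A) + \operatorname{Tr}(B)
$$

$$
\operatorname{Tr}(ABC) = \operatorname{Tr}(BCA) = \operatorname{Tr}(CAB)
$$

$$
a^T a = \operatorname{Tr}(a a^T)
$$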
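These identities can be spot-checked on random matrices. A small sketch, assuming NumPy; the size 3 and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C = rng.normal(size=(3, 3))
a = rng.normal(size=3)

identities = {
    "Tr(A) = sum_i A_ii": np.isclose(np.trace(A), sum(A[i, i] for i in range(3))),
    "Tr(AB) = Tr(BA)": np.isclose(np.trace(A @ B), np.trace(B @ A)),
    "Tr(A+B) = Tr(A) + Tr(B)": np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)),
    "Tr(ABC) = Tr(BCA)": np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A)),
    "Tr(ABC) = Tr(CAB)": np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)),
    "a^T a = Tr(a a^T)": np.isclose(a @ a, np.trace(np.outer(a, a))),
}
for name, ok in identities.items():
    print(name, bool(ok))
```

The cyclic identity covers only cyclic shifts: $\operatorname{Tr}(ABC)$ and $\operatorname{Tr}(ACB)$ generally differ.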
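Memory aid: when a transposed matrix (vector) is differentiated with respect to the original matrix (vector), the coefficients on its left and right swap order and are not transposed.
When the original matrix (vector) is differentiated with respect to the original matrix (vector), the coefficients on its left and right are transposed and keep their order.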
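One way to read this rule is through the scalar forms $a^T X b$ and $a^T X^T b$: the first gives $\partial (a^T X b) / \partial X = a\, b^T$ (coefficients transposed, order kept), the second gives $\partial (a^T X^T b) / \partial X = b\, a^T$ (order swapped, no transpose). The sketch below checks both numerically. It assumes NumPy, and the choice of these two forms as the illustration, along with the helper `grad_wrt_matrix` and the sizes, is an assumption rather than something stated in the original.

```python
import numpy as np


def grad_wrt_matrix(f, M, eps=1e-6):
    """Central-difference gradient of a scalar f, shaped like M."""
    M = np.asarray(M, dtype=float)
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G


rng = np.random.default_rng(5)  # arbitrary seed
m, n = 3, 4
X = rng.normal(size=(m, n))
u = rng.normal(size=m)          # vector of length m
w = rng.normal(size=n)          # vector of length n

# Original matrix: d(u^T X w)/dX = u w^T (transpose the coefficients, keep the order)
plain = grad_wrt_matrix(lambda M: u @ M @ w, X)
plain_ok = np.allclose(plain, np.outer(u, w), atol=1e-6)

# Transposed matrix: d(w^T X^T u)/dX = u w^T (swap the coefficients, no transpose)
transposed = grad_wrt_matrix(lambda M: w @ M.T @ u, X)
transposed_ok = np.allclose(transposed, np.outer(u, w), atol=1e-6)

print("original form:", plain_ok, "transposed form:", transposed_ok)
```

Both forms give the same gradient here because $w^T X^T u$ is the transpose of the scalar $u^T X w$; the rule simply describes how to read the answer off each way of writing it.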