Differentiation with respect to vectors comes up constantly in machine learning and optimization, so this note collects the relevant formulas.
Derivatives of parameter-dependent matrix functions
- $\frac{\mathrm{d}}{\mathrm{d}t}e^{\bm{A}t}=\bm{A}e^{\bm{A}t}=e^{\bm{A}t}\bm{A}$;
- $\frac{\mathrm{d}}{\mathrm{d}t}\cos\bm{A}t=-\bm{A}(\sin\bm{A}t)=-(\sin\bm{A}t)\bm{A}$;
- $\frac{\mathrm{d}}{\mathrm{d}t}\sin\bm{A}t=\bm{A}(\cos\bm{A}t)=(\cos\bm{A}t)\bm{A}$.
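The first identity is easy to sanity-check numerically: a minimal sketch, comparing a central finite difference of $e^{\bm At}$ against $\bm Ae^{\bm At}$, where the matrix $\bm A$, the point $t$, and the step $h$ are arbitrary illustrative choices.

```python
# Minimal numerical check of d/dt e^{At} = A e^{At}: compare a central
# finite difference of expm(A t) against the closed form.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # arbitrary test matrix
t, h = 0.7, 1e-6                 # arbitrary point and step size

numeric = (expm(A * (t + h)) - expm(A * (t - h))) / (2 * h)
analytic = A @ expm(A * t)  # equals expm(A t) @ A too, since A commutes with e^{At}
print(np.max(np.abs(numeric - analytic)))  # prints a small finite-difference error
```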
Derivatives of functions with respect to a vector
Differentiation rules:
- Linearity: $\nabla_{\bm x}(af(\bm x)+bg(\bm x))=a\nabla_{\bm x}f(\bm x)+b\nabla_{\bm x}g(\bm x),\quad a,b\in\mathbb R$;
- Product rule: $\nabla_{\bm x}(f(\bm x)g(\bm x))=\nabla_{\bm x}f(\bm x)\,g(\bm x)+f(\bm x)\nabla_{\bm x}g(\bm x)$;
- Chain rule: $\nabla_{\bm x}\big(f(\bm y(\bm x))\big)=\nabla_{\bm x}[\bm y(\bm x)]^\top\nabla_{\bm y}f(\bm y)$.
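A quick numerical check of the chain rule in this layout: a minimal sketch, assuming the concrete choices $\bm y(\bm x)=\bm B\bm x$ and $f(\bm y)=\bm y^\top\bm y$, so the formula predicts $\nabla_{\bm x}f=\bm B^\top(2\bm B\bm x)$. $\bm B$ and $\bm x$ are random illustrative values.

```python
# Minimal chain-rule check with y(x) = B x and f(y) = y^T y,
# so the rule predicts grad_x f = Jacobian^T grad_y f = B^T (2 B x).
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))  # arbitrary linear map y = B x
x = rng.standard_normal(3)
h = 1e-6

fox = lambda v: float((B @ v) @ (B @ v))  # f(y(x)) as a function of x

# central finite-difference gradient, one coordinate at a time
numeric = np.array([(fox(x + h * e) - fox(x - h * e)) / (2 * h) for e in np.eye(3)])
analytic = B.T @ (2 * B @ x)
print(np.max(np.abs(numeric - analytic)))  # prints a small finite-difference error
```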
Common formulas
Here $\bm A$ and $\bm b$ are a constant matrix and a constant vector, both independent of $\bm x$.
| $f(\bm x)$ | $\nabla_{\bm x} f(\bm x)$ |
| --- | --- |
| $a\bm x$ | $a\bm I$ |
| $\bm b^\top\bm x$ or $\bm x^\top\bm b$ | $\bm b$ |
| $\bm x^\top\bm x$ or $\lVert\bm x\rVert_2^2$ | $2\bm x$ |
| $e^{-\frac{1}{2}\bm x^\top\bm A\bm x}$ | $-\frac{1}{2}e^{-\frac{1}{2}\bm x^\top\bm A\bm x}(\bm A+\bm A^\top)\bm x$ |
| $\bm b^\top\bm A\bm x$ | $\bm A^\top\bm b$ |
| $\bm x^\top\bm A\bm b$ | $\bm A\bm b$ |
| $\bm x^\top\bm A\bm x$ | $(\bm A+\bm A^\top)\bm x$ |
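The quadratic-form and exponential rows are the easiest to get wrong (the minus sign and the $\bm A+\bm A^\top$ symmetrization), so here is a minimal finite-difference spot check of those two entries; $\bm A$ and $\bm x$ are arbitrary random values.

```python
# Finite-difference spot check of two table rows, with random A and x:
#   grad of x^T A x            -> (A + A^T) x
#   grad of exp(-x^T A x / 2)  -> -1/2 exp(-x^T A x / 2) (A + A^T) x
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))  # deliberately not symmetric
x = rng.standard_normal(n)
h = 1e-6

def num_grad(f, x):
    """Central-difference gradient of a scalar function f at x."""
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

quad = lambda v: float(v @ A @ v)
print(np.max(np.abs(num_grad(quad, x) - (A + A.T) @ x)))   # small error

gauss = lambda v: float(np.exp(-0.5 * v @ A @ v))
analytic = -0.5 * np.exp(-0.5 * x @ A @ x) * ((A + A.T) @ x)
print(np.max(np.abs(num_grad(gauss, x) - analytic)))       # small error
```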