# Matrix Differentiation

$\pmb x = [x_1, \ldots, x_m]^T \in \mathbb{R}^m$ is a real vector variable.

$\pmb X = [\pmb x_1, \ldots, \pmb x_n] \in \mathbb{R}^{m \times n}$ is a real matrix variable with columns $\pmb x_j \in \mathbb{R}^m$.

$f(\pmb x) \in \mathbb{R}$ is a real scalar function of the variable $\pmb x \in \mathbb{R}^m$, written $f: \mathbb{R}^m \to \mathbb{R}$.

$f(\pmb X) \in \mathbb{R}$ is a real scalar function of the variable $\pmb X \in \mathbb{R}^{m \times n}$, written $f: \mathbb{R}^{m \times n} \to \mathbb{R}$.

$\pmb f(\pmb x) \in \mathbb{R}^p$ is a $p$-dimensional real column-vector function of the variable $\pmb x \in \mathbb{R}^m$, written $\pmb f: \mathbb{R}^m \to \mathbb{R}^p$.

$\pmb f(\pmb X) \in \mathbb{R}^p$ is a $p$-dimensional real column-vector function of the variable $\pmb X \in \mathbb{R}^{m \times n}$, written $\pmb f: \mathbb{R}^{m \times n} \to \mathbb{R}^p$.

$\pmb F(\pmb x) \in \mathbb{R}^{p \times q}$ is a $p \times q$ real matrix function of the variable $\pmb x \in \mathbb{R}^m$, written $\pmb F: \mathbb{R}^m \to \mathbb{R}^{p \times q}$.

$\pmb F(\pmb X) \in \mathbb{R}^{p \times q}$ is a $p \times q$ real matrix function of the variable $\pmb X \in \mathbb{R}^{m \times n}$, written $\pmb F: \mathbb{R}^{m \times n} \to \mathbb{R}^{p \times q}$.

### Jacobian matrix

$$D_{\pmb x} \stackrel{\text{def}}{=} \left[\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_m}\right]$$

$$D_{\pmb x} f(\pmb x) = \frac{\partial f(\pmb x)}{\partial \pmb x^T} = \left[\frac{\partial f(\pmb x)}{\partial x_1}, \ldots, \frac{\partial f(\pmb x)}{\partial x_m}\right]$$

$$D_{\pmb X} f(\pmb X) = \frac{\partial f(\pmb X)}{\partial \pmb X^T}$$

$$D_{\operatorname{vec}\pmb X} f(\pmb X) = \frac{\partial f(\pmb X)}{\partial (\operatorname{vec}\pmb X)^T} = \left[\frac{\partial f(\pmb X)}{\partial x_{11}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{m1}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{1n}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{mn}}\right]$$
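The column-by-column ordering in the row vector above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original notes: the helper names `vec`, `unvec`, and `jacobian_row`, the constant matrix `C`, and the test function are all hypothetical, and the derivative is approximated by central differences rather than computed exactly.

```python
def vec(X):
    """Stack the columns of X (given as a list of rows) into one flat list."""
    m, n = len(X), len(X[0])
    return [X[i][j] for j in range(n) for i in range(m)]


def unvec(v, m, n):
    """Inverse of vec: rebuild an m x n matrix from a column-major flat list."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]


def jacobian_row(f, X, h=1e-6):
    """Central-difference estimate of D_{vec X} f(X), a list of length m*n."""
    m, n = len(X), len(X[0])
    v = vec(X)
    row = []
    for k in range(m * n):
        up = v[:]
        dn = v[:]
        up[k] += h
        dn[k] -= h
        row.append((f(unvec(up, m, n)) - f(unvec(dn, m, n))) / (2 * h))
    return row


# Hypothetical test function f(X) = sum_ij c_ij * x_ij, so df/dx_ij = c_ij.
C = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]


def f(X):
    return sum(C[i][j] * X[i][j]
               for i in range(len(C)) for j in range(len(C[0])))


X0 = [[0.5, -1.0, 2.0],
      [1.5, 0.0, -0.5]]
row = jacobian_row(f, X0)

print(vec(C))  # column-major order: [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print([round(r, 6) for r in row])
```

For this linear test function the estimated row should agree with `vec(C)` up to rounding error, which matches the ordering $x_{11}, \ldots, x_{m1}, \ldots, x_{1n}, \ldots, x_{mn}$.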

When $\pmb F(\pmb X)$ is a $p \times q$ real matrix function, its Jacobian matrix is defined as the $pq \times mn$ matrix

$$D_{\pmb X} \pmb F(\pmb X) \stackrel{\text{def}}{=} \frac{\partial \operatorname{vec}(\pmb F(\pmb X))}{\partial (\operatorname{vec}\pmb X)^T}$$
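To make the shape of this Jacobian concrete, here is a small sketch in plain Python under stated assumptions: the example $\pmb F(\pmb X) = \pmb X^T$ and the helpers `vec`, `unvec`, and `jacobian` are hypothetical illustrations, and the entries are estimated by central differences.

```python
def vec(M):
    """Stack the columns of M (a list of rows) into one flat list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]


def unvec(v, rows, cols):
    """Inverse of vec for a rows x cols matrix."""
    return [[v[j * rows + i] for j in range(cols)] for i in range(rows)]


def jacobian(F, X, h=1e-6):
    """Estimate d vec(F(X)) / d (vec X)^T, a (p*q) x (m*n) matrix."""
    m, n = len(X), len(X[0])
    v = vec(X)
    columns = []
    for k in range(m * n):
        up = v[:]
        dn = v[:]
        up[k] += h
        dn[k] -= h
        f_up = vec(F(unvec(up, m, n)))
        f_dn = vec(F(unvec(dn, m, n)))
        columns.append([(a - b) / (2 * h) for a, b in zip(f_up, f_dn)])
    # columns[k] holds the k-th column; transpose so rows follow vec(F).
    return [list(r) for r in zip(*columns)]


def F(X):
    """Hypothetical example: F(X) = X^T."""
    return [list(col) for col in zip(*X)]


X0 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]          # m x n = 2 x 3, so F(X0) is 3 x 2
J = jacobian(F, X0)
J_rounded = [[round(e) for e in row] for row in J]

print(len(J), len(J[0]))  # 6 6
for row in J_rounded:
    print(row)
```

Because transposition only reorders the entries of $\operatorname{vec}\pmb X$, the estimated Jacobian in this example is a permutation matrix of zeros and ones.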

### Gradient matrix

$$\nabla_{\pmb x} \stackrel{\text{def}}{=} \left[\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_m}\right]^T$$

$$\nabla_{\pmb x} f(\pmb x) = \left[\frac{\partial f(\pmb x)}{\partial x_1}, \ldots, \frac{\partial f(\pmb x)}{\partial x_m}\right]^T$$

$$\nabla_{\operatorname{vec}\pmb X} f(\pmb X) = \frac{\partial f(\pmb X)}{\partial \operatorname{vec}\pmb X} = \left[\frac{\partial f(\pmb X)}{\partial x_{11}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{m1}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{1n}}, \ldots, \frac{\partial f(\pmb X)}{\partial x_{mn}}\right]^T$$

$$\nabla_{\pmb X} f(\pmb X) = \frac{\partial f(\pmb X)}{\partial \pmb X}$$

$$\nabla_{\pmb X} \pmb F(\pmb X) \stackrel{\text{def}}{=} \frac{\partial (\operatorname{vec}\pmb F(\pmb X))^T}{\partial \operatorname{vec}\pmb X} = \left(D_{\pmb X} \pmb F(\pmb X)\right)^T$$
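As a small worked example (an illustration added here, with $\pmb C \in \mathbb{R}^{m \times n}$ a hypothetical constant matrix), take $f(\pmb X) = \sum_{i,j} c_{ij} x_{ij}$. Since $\partial f / \partial x_{ij} = c_{ij}$, and assuming the usual convention that $\partial f / \partial \pmb X^T$ is the transpose of $\partial f / \partial \pmb X$, the Jacobian and gradient forms defined above come out as:

$$
\begin{aligned}
\nabla_{\pmb X} f(\pmb X) &= \pmb C \in \mathbb{R}^{m \times n}, & D_{\pmb X} f(\pmb X) &= \pmb C^T \in \mathbb{R}^{n \times m}, \\
\nabla_{\operatorname{vec}\pmb X} f(\pmb X) &= \operatorname{vec}\pmb C, & D_{\operatorname{vec}\pmb X} f(\pmb X) &= (\operatorname{vec}\pmb C)^T .
\end{aligned}
$$

In each pair the gradient is the transpose of the corresponding Jacobian.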

### Scalar function $f(\pmb x)$ and the Jacobian matrix

$$df(\pmb x) = \frac{\partial f(\pmb x)}{\partial x_1} dx_1 + \cdots + \frac{\partial f(\pmb x)}{\partial x_m} dx_m = \frac{\partial f(\pmb x)}{\partial \pmb x^T} d\pmb x$$

Let $\pmb A = \cfrac{\partial f(\pmb x)}{\partial \pmb x^T}$. Then the following equivalence holds:

$$df(\pmb x) = \operatorname{tr}(\pmb A \, d\pmb x) \iff D_{\pmb x} f(\pmb x) = \frac{\partial f(\pmb x)}{\partial \pmb x^T} = \pmb A$$
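The equivalence can be exercised on a simple case. The sketch below is not from the original notes and assumes the hypothetical example $f(\pmb x) = \pmb x^T \pmb x$, for which $df = 2\pmb x^T d\pmb x$ and hence $\pmb A = 2\pmb x^T$. It compares that candidate with a central-difference estimate and with the actual change in $f$ for a small step.

```python
def f(x):
    """Hypothetical example: f(x) = x^T x."""
    return sum(xi * xi for xi in x)


def jacobian(f, x, h=1e-6):
    """Central-difference estimate of the row vector D_x f(x)."""
    row = []
    for k in range(len(x)):
        up = list(x)
        dn = list(x)
        up[k] += h
        dn[k] -= h
        row.append((f(up) - f(dn)) / (2 * h))
    return row


x = [1.0, -2.0, 0.5]
A = [2 * xi for xi in x]       # candidate Jacobian read off from df = 2 x^T dx
numeric = jacobian(f, x)

dx = [1e-4, -2e-4, 3e-4]       # a small perturbation
df_linear = sum(a * d for a, d in zip(A, dx))   # tr(A dx) is just A dx here
df_actual = f([xi + di for xi, di in zip(x, dx)]) - f(x)

print(A)
print([round(v, 6) for v in numeric])
print(df_linear, df_actual)
```

The two printed differentials should differ only by the second-order term $d\pmb x^T d\pmb x$, which is what identifies $\pmb A$ as the Jacobian.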

### Scalar function $f(\pmb X)$ and the Jacobian matrix

Writing $\pmb x_j$ for the $j$-th column of $\pmb X$, the differential of $f(\pmb X)$ is

$$
\begin{aligned}
df(\pmb X) &= \frac{\partial f(\pmb X)}{\partial \pmb x_1^T} d\pmb x_1 + \cdots + \frac{\partial f(\pmb X)}{\partial \pmb x_n^T} d\pmb x_n \\
&= \frac{\partial f(\pmb X)}{\partial (\operatorname{vec}\pmb X)^T} \, d(\operatorname{vec}\pmb X) \\
&= D_{\operatorname{vec}\pmb X} f(\pmb X) \, d(\operatorname{vec}\pmb X)
\end{aligned}
$$

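Let $\pmb A = D_{\pmb X} f(\pmb X) = \cfrac{\partial f(\pmb X)}{\partial \pmb X^T} \in \mathbb{R}^{n \times m}$, so that $D_{\operatorname{vec}\pmb X} f(\pmb X) = (\operatorname{vec}(\pmb A^T))^T$. Then

$$df(\pmb X) = \left(\operatorname{vec}(\pmb A^T)\right)^T d(\operatorname{vec}\pmb X)$$

and, by the identity $\operatorname{tr}(\pmb B^T \pmb C) = (\operatorname{vec}\pmb B)^T \operatorname{vec}\pmb C$ with $\pmb B = \pmb A^T$ and $\pmb C = d\pmb X$,

$$df(\pmb X) = \operatorname{tr}(\pmb A \, d\pmb X)$$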
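The step from the vec form to the trace form rests on $\operatorname{tr}(\pmb A \, d\pmb X) = (\operatorname{vec}(\pmb A^T))^T \operatorname{vec}(d\pmb X)$. A quick numerical check in plain Python, with arbitrary made-up values for `A` and `dX` and hypothetical helper names, might look like this:

```python
def vec(M):
    """Stack the columns of M (a list of rows) into one flat list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def trace_of_product(A, B):
    """tr(A B) for A (n x m) and B (m x n), without forming the product."""
    return sum(A[i][k] * B[k][i]
               for i in range(len(A)) for k in range(len(B)))


A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]                # n x m = 3 x 2
dX = [[0.1, -0.2, 0.3],
      [0.4, 0.5, -0.6]]         # m x n = 2 x 3

lhs = trace_of_product(A, dX)                                   # tr(A dX)
rhs = sum(a * d for a, d in zip(vec(transpose(A)), vec(dX)))    # vec(A^T)^T vec(dX)
print(lhs, rhs)
```

Both expressions sum the same products $a_{ji}\,dx_{ij}$, just in a different order, so they agree up to floating-point rounding.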

The Jacobian matrix can equivalently be identified from the following relations:

$$df(\pmb x) = \operatorname{tr}(\pmb A \, d\pmb x) \iff D_{\pmb x} f(\pmb x) = \pmb A$$

$$df(\pmb X) = \operatorname{tr}(\pmb A \, d\pmb X) \iff D_{\pmb X} f(\pmb X) = \pmb A$$

$$
\begin{aligned}
d\operatorname{tr}(\pmb X^T \pmb X) &= \operatorname{tr}\left(d(\pmb X^T \pmb X)\right) \\
&= \operatorname{tr}\left((d\pmb X)^T \pmb X + \pmb X^T d\pmb X\right) \\
&= \operatorname{tr}\left((d\pmb X)^T \pmb X\right) + \operatorname{tr}\left(\pmb X^T d\pmb X\right) \\
&= \operatorname{tr}\left(\pmb X^T d\pmb X\right) + \operatorname{tr}\left(\pmb X^T d\pmb X\right) \\
&= \operatorname{tr}\left(2\pmb X^T d\pmb X\right)
\end{aligned}
$$

$$\frac{\partial \operatorname{tr}(\pmb X^T \pmb X)}{\partial \pmb X} = (2\pmb X^T)^T = 2\pmb X$$
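This result is easy to sanity-check with finite differences. The sketch below is an added illustration in plain Python with hypothetical helper names. It uses the fact that $\operatorname{tr}(\pmb X^T \pmb X)$ equals the sum of squared entries of $\pmb X$ and compares a central-difference gradient with $2\pmb X$.

```python
def f(X):
    """tr(X^T X), computed as the sum of squared entries of X."""
    return sum(x * x for row in X for x in row)


def numeric_gradient(f, X, h=1e-6):
    """Central-difference estimate of df/dX, with the same shape as X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            up = [row[:] for row in X]
            dn = [row[:] for row in X]
            up[i][j] += h
            dn[i][j] -= h
            G[i][j] = (f(up) - f(dn)) / (2 * h)
    return G


X0 = [[1.0, -2.0],
      [0.5, 3.0],
      [0.0, 4.0]]
G = numeric_gradient(f, X0)
expected = [[2 * x for x in row] for row in X0]   # the claimed gradient 2X

max_error = max(abs(g - e)
                for g_row, e_row in zip(G, expected)
                for g, e in zip(g_row, e_row))
print(max_error)
```

A `max_error` near machine precision (relative to the step size `h`) is consistent with the gradient $2\pmb X$ derived above.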

The Matrix Cookbook.