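# title: Casual Notes on Matrix Derivatives (闲话矩阵求导). The original file could not render its LaTeX formulas properly, so I extracted them one by one and kept the formulas as they were. The original article is 《闲话矩阵求导》.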

## 1 Layout

$$\mathbf{y}=\begin{bmatrix}y_{1}\\ y_{2}\\ \vdots\\ y_{m} \end{bmatrix}$$


$$\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}\frac{\partial y_{1}}{\partial x}\\ \frac{\partial y_{2}}{\partial x}\\ \vdots\\ \frac{\partial y_{m}}{\partial x} \end{bmatrix}$$


$$\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}\frac{\partial y_{1}}{\partial x} & \frac{\partial y_{2}}{\partial x} & \cdots & \frac{\partial y_{m}}{\partial x}\end{bmatrix}$$
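As a small worked illustration of the two arrangements above, take the hypothetical function $\mathbf{y}=(x^{2},\sin x,e^{x})^{\mathrm{T}}$, chosen only for this note and not taken from the original article. Under the usual naming, the column form is the numerator layout and the row form is the denominator layout:

$$\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}2x\\ \cos x\\ e^{x} \end{bmatrix}\ \text{(numerator layout)},\qquad\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}2x & \cos x & e^{x}\end{bmatrix}\ \text{(denominator layout)}$$

The entries are identical; only the shape of the result differs.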


## 2 Basic differentiation rules (definitions)

$$\frac{\partial y}{\partial\mathbf{x}}=\begin{bmatrix}\frac{\partial y}{\partial x_{1}}\\ \frac{\partial y}{\partial x_{2}}\\ \vdots\\ \frac{\partial y}{\partial x_{m}} \end{bmatrix}$$


$$\mathbf{x}=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{bmatrix}$$


$$\mathbf{y}=\begin{bmatrix}y_{1}\\ y_{2}\\ \vdots\\ y_{m} \end{bmatrix}$$


$$\frac{\partial\mathbf{y}}{\partial\mathbf{x}}=\begin{bmatrix}\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}}\\ \frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{2}} & \cdots & \frac{\partial y_{m}}{\partial x_{2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{1}}{\partial x_{n}} & \frac{\partial y_{2}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} \end{bmatrix}$$


### Derivative of a scalar with respect to a matrix

$$\frac{\partial y}{\partial\mathbf{X}}=\begin{bmatrix}\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}$$


### Derivative of a matrix with respect to a scalar

$$\frac{\partial\mathbf{Y}}{\partial x}=\begin{bmatrix}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{21}}{\partial x} & \cdots & \frac{\partial y_{m1}}{\partial x}\\ \frac{\partial y_{12}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{m2}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{1n}}{\partial x} & \frac{\partial y_{2n}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}$$


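• Vector with respect to a scalar
• Scalar with respect to a vector
• Vector with respect to a vector
• Matrix with respect to a scalar
• Scalar with respect to a matrix
I have listed each of these definitions above. Now it is time to look at something more complicated.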

## 3 Dimensional analysis

Since $(\mathbf{Ax})_{i}=a_{i1}x_{1}+a_{i2}x_{2}+\cdots+a_{in}x_{n}$, the vector-by-vector differentiation rule gives

$$\frac{\partial(\mathbf{Ax})}{\partial\mathbf{x}}=\begin{bmatrix}\frac{\partial(\mathbf{Ax})_{1}}{\partial x_{1}} & \frac{\partial(\mathbf{Ax})_{2}}{\partial x_{1}} & \cdots & \frac{\partial(\mathbf{Ax})_{m}}{\partial x_{1}}\\ \frac{\partial(\mathbf{Ax})_{1}}{\partial x_{2}} & \frac{\partial(\mathbf{Ax})_{2}}{\partial x_{2}} & \cdots & \frac{\partial(\mathbf{Ax})_{m}}{\partial x_{2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial(\mathbf{Ax})_{1}}{\partial x_{n}} & \frac{\partial(\mathbf{Ax})_{2}}{\partial x_{n}} & \cdots & \frac{\partial(\mathbf{Ax})_{m}}{\partial x_{n}} \end{bmatrix}=\begin{bmatrix}a_{11} & a_{21} & \cdots & a_{m1}\\ a_{12} & a_{22} & \cdots & a_{m2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}=\mathbf{A}^{\mathrm{T}}$$
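The result above can be sanity-checked numerically. The following is a minimal sketch in plain Python, assuming a small example matrix `A` and point `x` made up for this note; the helper names `matvec` and `denominator_layout_jacobian` are hypothetical. It builds the derivative entry by entry with central differences, placing $\partial y_{j}/\partial x_{i}$ in row $i$, column $j$ as in the vector-by-vector definition, and compares it with $\mathbf{A}^{\mathrm{T}}$.

```python
# Finite-difference check that d(Ax)/dx equals A^T in denominator layout.
# A, x, and the helper names are illustrative assumptions, not from the article.

def matvec(A, x):
    """Return the product A x for a list-of-rows matrix A and a list x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def denominator_layout_jacobian(f, x, h=1e-6):
    """Approximate D with D[i][j] = d f(x)_j / d x_i by central differences."""
    D = []
    for i in range(len(x)):
        x_plus = x[:i] + [x[i] + h] + x[i + 1:]
        x_minus = x[:i] + [x[i] - h] + x[i + 1:]
        f_plus, f_minus = f(x_plus), f(x_minus)
        D.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return D

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # m = 2 rows, n = 3 columns
x = [0.5, -1.0, 2.0]           # n = 3 entries

D = denominator_layout_jacobian(lambda v: matvec(A, v), x)
A_T = [list(col) for col in zip(*A)]   # transpose of A, shape n x m

# Largest entrywise gap between the numerical derivative and A^T.
max_err = max(abs(D[i][j] - A_T[i][j]) for i in range(3) for j in range(2))
print(len(D), len(D[0]), max_err < 1e-6)
```

The derivative has $n$ rows and $m$ columns, which is the shape of $\mathbf{A}^{\mathrm{T}}$ rather than of $\mathbf{A}$; that shape mismatch is exactly what the dimensional analysis in this section exploits.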


1.

$$\frac{\partial\mathbf{Au}}{\partial\mathbf{x}}=\frac{\partial\mathbf{u}}{\partial\mathbf{x}}\mathbf{A}^{\mathrm{T}}$$


Here $a$ is a scalar and $\mathbf{u}$ is a vector, both depending on $\mathbf{x}$:

$$\frac{\partial a\mathbf{u}}{\partial\mathbf{x}}=a\frac{\partial\mathbf{u}}{\partial\mathbf{x}}+\frac{\partial a}{\partial\mathbf{x}}\mathbf{u}^{\mathrm{T}}$$
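A quick worked instance may help here. Assume, purely for illustration, $a=\mathbf{x}^{\mathrm{T}}\mathbf{x}$ and $\mathbf{u}=\mathbf{x}$ with $\mathbf{x}\in\mathbb{R}^{n\times1}$. Using $\frac{\partial\mathbf{x}}{\partial\mathbf{x}}=\mathbf{I}$ and $\frac{\partial\mathbf{x}^{\mathrm{T}}\mathbf{x}}{\partial\mathbf{x}}=2\mathbf{x}$, the rule gives

$$\frac{\partial(\mathbf{x}^{\mathrm{T}}\mathbf{x})\mathbf{x}}{\partial\mathbf{x}}=(\mathbf{x}^{\mathrm{T}}\mathbf{x})\mathbf{I}+2\mathbf{x}\mathbf{x}^{\mathrm{T}}$$

Both terms are $n\times n$, as the left-hand side requires. Entry $(i,j)$ is $\frac{\partial}{\partial x_{i}}\big(\sum_{k}x_{k}^{2}\big)x_{j}=2x_{i}x_{j}+(\mathbf{x}^{\mathrm{T}}\mathbf{x})\delta_{ij}$, which agrees with the matrix expression. Writing the second term as $\mathbf{u}\frac{\partial a}{\partial\mathbf{x}}$ instead would not even have compatible dimensions in general.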


$$\frac{\partial\mathbf{x}^{\mathrm{T}}\mathbf{Ay}}{\partial\mathbf{x}},\mathbf{x}\in\mathbb{R}^{m\times1},\mathbf{y}\in\mathbb{R}^{n\times1}$$


$$\frac{\partial\text{(}\mathbf{x}^{\mathrm{T}}\mathbf{A)y}}{\partial\mathbf{x}}$$


$$\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\in\mathbb{R}^{m\times n}$$


and the other is

$$\frac{\partial\mathbf{x}^{\mathrm{T}}\mathbf{A}}{\partial\mathbf{x}}=\mathbf{A}\in\mathbb{R}^{m\times n}$$


Again by analyzing the dimensions, we obtain

$$\frac{\partial\text{(}\mathbf{x}^{\mathrm{T}}\mathbf{A)y}}{\partial\mathbf{x}}=\frac{\partial\mathbf{y}}{\partial\mathbf{x}}\mathbf{A}^{\mathrm{T}}\mathbf{x}+\mathbf{Ay}$$


$$\frac{\partial\mathbf{x}^{\mathrm{T}}\mathbf{Ax}}{\partial\mathbf{x}}=(\mathbf{A}^{\mathrm{T}}+\mathbf{A})\mathbf{x}$$
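The quadratic-form gradient above is easy to verify numerically. This is a minimal sketch in plain Python, assuming a deliberately non-symmetric example matrix `A` and a point `x` invented for this note; the helper names `quadratic_form` and `numerical_gradient` are hypothetical.

```python
# Finite-difference check of d(x^T A x)/dx = (A^T + A) x.
# A, x, and the helper names are illustrative assumptions.

def quadratic_form(A, x):
    """Return the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_gradient(f, x, h=1e-6):
    """Approximate the vector with entries d f / d x_i by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus = x[:i] + [x[i] + h] + x[i + 1:]
        x_minus = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]          # non-symmetric on purpose
x = [1.0, -2.0, 0.5]

numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)

# Closed form: entry i of (A^T + A) x is sum_j (A[j][i] + A[i][j]) * x[j].
closed_form = [sum((A[j][i] + A[i][j]) * x[j] for j in range(3))
               for i in range(3)]

max_err = max(abs(n_i - c_i) for n_i, c_i in zip(numeric, closed_form))
print(max_err < 1e-6)
```

Using a non-symmetric `A` matters: for symmetric `A` the result collapses to $2\mathbf{Ax}$, and the check would not distinguish $(\mathbf{A}^{\mathrm{T}}+\mathbf{A})\mathbf{x}$ from that special case.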


$$\frac{\partial\mathbf{a}^{\mathrm{T}}\mathbf{x}\mathbf{x}^{\mathrm{T}}\mathbf{b}}{\partial\mathbf{x}},\mathbf{a},\mathbf{b},\mathbf{x}\in\mathbb{R}^{m\times1}$$

$$\frac{\partial\mathbf{a}^{\mathrm{T}}\mathbf{x}\mathbf{x}^{\mathrm{T}}\mathbf{b}}{\partial\mathbf{x}}=\frac{\partial(\mathbf{a}^{\mathrm{T}}\mathbf{x})(\mathbf{x}^{\mathrm{T}}\mathbf{b})}{\partial\mathbf{x}}$$


$$\frac{\partial(\mathbf{a}^{\mbox{T}}\mathbf{x)}}{\partial\mathbf{x}}=\mathbf{a},\frac{\partial(\mathbf{x}^{\mbox{T}}\mathbf{b)}}{\partial\mathbf{x}}=\mathbf{b}$$


$$\frac{\partial\mathbf{a}^{\mbox{T}}\mathbf{xx}^{\mbox{T}}\mathbf{b}}{\partial\mathbf{x}}=\frac{\partial(\mathbf{a}^{\mbox{T}}\mathbf{x)(x}^{\mbox{T}}\mathbf{b)}}{\partial\mathbf{x}}=\mathbf{a}\mathbf{x}^{\mbox{T}}\mathbf{b}+\mathbf{ba}^{\mbox{T}}\mathbf{x}=(\mathbf{ab}^{\mathrm{T}}+\mathbf{ba}^{\mathrm{T}})\mathbf{x}$$
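For readers who prefer to confirm this without the layout machinery, the same result can be checked component by component. Both $\mathbf{a}^{\mathrm{T}}\mathbf{x}$ and $\mathbf{x}^{\mathrm{T}}\mathbf{b}$ are scalars, so the ordinary product rule applies to each $x_{i}$:

$$\frac{\partial}{\partial x_{i}}\Big(\sum_{j}a_{j}x_{j}\Big)\Big(\sum_{k}x_{k}b_{k}\Big)=a_{i}(\mathbf{x}^{\mathrm{T}}\mathbf{b})+(\mathbf{a}^{\mathrm{T}}\mathbf{x})b_{i}$$

Stacking these for $i=1,\dots,m$ gives $\mathbf{a}\mathbf{x}^{\mathrm{T}}\mathbf{b}+\mathbf{b}\mathbf{a}^{\mathrm{T}}\mathbf{x}$, and since $\mathbf{x}^{\mathrm{T}}\mathbf{b}=\mathbf{b}^{\mathrm{T}}\mathbf{x}$ is a scalar, this equals $(\mathbf{ab}^{\mathrm{T}}+\mathbf{ba}^{\mathrm{T}})\mathbf{x}$.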


## 4 Derivative of a scalar with respect to a matrix (differential form)

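• The product rule holds
• The trace and the differential commute
By now you have probably forgotten the numerator layout. No matter: transposing all of the earlier results gives the corresponding results in numerator layout.
Next, note that when we talk about differentials, they are only meaningful in numerator layout.
(Warning: differentials exist only in numerator layout, not in denominator layout.)
First, we point out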


$$\mathrm{d}\mathbf{Y}=\mathrm{tr}(\mathbf{A}\mathrm{d}\mathbf{X})$$


$$\frac{\partial\mathbf{Y}}{\partial\mathbf{X}}=\mathbf{A}$$


$$\frac{\partial\mathbf{Y}}{\partial\mathbf{X}}=\mathbf{A}^{\mbox{T}}$$


$$\mathrm{d}\mathbf{Y}=\mathrm{tr}(\mathbf{A}\mathrm{d}\mathbf{X})$$


$$\frac{\partial\mathbf{Y}}{\partial\mathbf{X}}=\mathbf{A}^{\mbox{T}}$$


$$\mathrm{d}\mathrm{tr}(\mathbf{AX})=\mathrm{tr}(\mathrm{d}(\mathbf{AX}))=\mathrm{tr}(\mathbf{A}\mathrm{d}\mathbf{X})$$
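Read in denominator layout, $\mathrm{d}\,\mathrm{tr}(\mathbf{AX})=\mathrm{tr}(\mathbf{A}\mathrm{d}\mathbf{X})$ says that $\frac{\partial\mathrm{tr}(\mathbf{AX})}{\partial\mathbf{X}}=\mathbf{A}^{\mathrm{T}}$. The sketch below checks this entry by entry in plain Python. The matrices `A` and `X` are made-up examples, and the helper names `trace_of_product` and `numerical_matrix_gradient` are hypothetical.

```python
# Finite-difference check that d tr(AX)/dX equals A^T (same shape as X).
# A, X, and the helper names are illustrative assumptions.

def trace_of_product(A, X):
    """Return tr(A X) for A of shape q x p and X of shape p x q."""
    q, p = len(A), len(X)
    return sum(A[i][k] * X[k][i] for i in range(q) for k in range(p))

def numerical_matrix_gradient(f, X, h=1e-6):
    """Approximate G with G[i][j] = d f / d X[i][j] by central differences."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]    # copy so X itself is untouched
            X_minus = [r[:] for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        G.append(row)
    return G

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # q = 2, p = 3
X = [[0.5, -1.0],
     [2.0, 0.0],
     [1.5, 3.0]]               # p = 3, q = 2

G = numerical_matrix_gradient(lambda M: trace_of_product(A, M), X)
A_T = [list(col) for col in zip(*A)]   # shape p x q, same as X

max_err = max(abs(G[i][j] - A_T[i][j]) for i in range(3) for j in range(2))
print(max_err < 1e-6)
```

The gradient `G` has the same shape as `X`, matching the scalar-by-matrix definition in section 2, which is why the transpose of $\mathbf{A}$ appears.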


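• The trace of a matrix equals the trace of its transpose (transpose property)
• The trace of a matrix product is unchanged under a cyclic permutation of its factors (cyclic property)
Consider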

$$\begin{aligned} \mathrm{d}\,\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{AX}) & =\mathrm{tr}(\mathrm{d}(\mathbf{X}^{\mathrm{T}}\mathbf{AX}))\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X}+\mathrm{d}(\mathbf{X}^{\mathrm{T}}\mathbf{A})\mathbf{X})\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X}+\mathrm{d}(\mathbf{A}^{\mathrm{T}}\mathbf{X})^{\mathrm{T}}\mathbf{X})\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X})+\mathrm{tr}(\mathrm{d}(\mathbf{A}^{\mathrm{T}}\mathbf{X})^{\mathrm{T}}\mathbf{X})\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X})+\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathrm{d}(\mathbf{A}^{\mathrm{T}}\mathbf{X}))\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X})+\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathrm{d}\mathbf{X})\\ & =\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{A}\mathrm{d}\mathbf{X}+\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}}\mathrm{d}\mathbf{X})\\ & =\mathrm{tr}((\mathbf{X}^{\mathrm{T}}\mathbf{A}+\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}})\mathrm{d}\mathbf{X})\end{aligned}$$


$$\frac{\partial\mathrm{tr}(\mathbf{X}^{\mathrm{T}}\mathbf{AX})}{\partial\mathbf{X}}=(\mathbf{X}^{\mathrm{T}}\mathbf{A}+\mathbf{X}^{\mathrm{T}}\mathbf{A}^{\mathrm{T}})^{\mathrm{T}}=(\mathbf{A}+\mathbf{A}^{\mathrm{T}})\mathbf{X}$$
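As with the earlier results, this final formula can be checked numerically. The following is a minimal sketch in plain Python, assuming a non-symmetric square matrix `A` and a rectangular `X` invented for this note; the helper names `trace_quadratic` and `numerical_matrix_gradient` are hypothetical.

```python
# Finite-difference check of d tr(X^T A X)/dX = (A + A^T) X.
# A, X, and the helper names are illustrative assumptions.

def trace_quadratic(A, X):
    """Return tr(X^T A X) for A of shape p x p and X of shape p x q."""
    p, q = len(X), len(X[0])
    return sum(X[i][k] * A[i][j] * X[j][k]
               for k in range(q) for i in range(p) for j in range(p))

def numerical_matrix_gradient(f, X, h=1e-6):
    """Approximate G with G[i][j] = d f / d X[i][j] by central differences."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            X_plus = [r[:] for r in X]    # copy so X itself is untouched
            X_minus = [r[:] for r in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            row.append((f(X_plus) - f(X_minus)) / (2 * h))
        G.append(row)
    return G

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]          # p = 3, non-symmetric on purpose
X = [[0.5, -1.0],
     [2.0, 0.0],
     [1.5, 3.0]]               # p = 3, q = 2

G = numerical_matrix_gradient(lambda M: trace_quadratic(A, M), X)

# Closed form: entry (i, k) of (A + A^T) X is sum_j (A[i][j] + A[j][i]) * X[j][k].
closed_form = [[sum((A[i][j] + A[j][i]) * X[j][k] for j in range(3))
                for k in range(2)] for i in range(3)]

max_err = max(abs(G[i][k] - closed_form[i][k])
              for i in range(3) for k in range(2))
print(max_err < 1e-6)
```

When `X` has a single column, this reduces to the vector result $(\mathbf{A}^{\mathrm{T}}+\mathbf{A})\mathbf{x}$ from section 3, which is a useful consistency check between the two approaches.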