[Ten-Minute Guide] The Art of Matrix Differentiation

What the Dimensions of a Derivative Really Mean

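The essence of differentiation: $\frac{dA}{dB}$ means differentiating every element of $A$ with respect to every element of $B$.
Consider the following cases:
1. $A$ and $B$ are both $1 \times 1$ matrices: the derivative is a $1 \times 1$ matrix.
2. $A$ is a $1 \times p$ matrix and $B$ is a $1 \times n$ matrix: differentiating each element of $A$ with respect to each element of $B$ yields $p \times n$ elements in total.
3. $A$ is a $q \times p$ matrix and $B$ is an $m \times n$ matrix: differentiating each element of $A$ with respect to each element of $B$ yields $q \times p \times m \times n$ elements in total.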
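To make case 3 concrete, here is a minimal Python sketch (standard library only) that estimates every partial derivative $\partial A_{ab}/\partial B_{ij}$ with central finite differences and stores them in a $q \times p \times m \times n$ nested list. The helper names `all_partials` and `count_entries`, and the example function `gram` ($A = BB^\top$), are hypothetical and chosen only for illustration.

```python
# Sketch: A = func(B) maps an m x n matrix B to a q x p matrix A (nested lists).
# dA/dB collects the partial derivative of every A[a][b] w.r.t. every B[i][j],
# estimated here with central finite differences.

def all_partials(func, B, h=1e-6):
    """Return D with D[a][b][i][j] = dA[a][b] / dB[i][j] (shape q x p x m x n)."""
    m, n = len(B), len(B[0])
    A0 = func(B)
    q, p = len(A0), len(A0[0])
    D = [[[[0.0] * n for _ in range(m)] for _ in range(p)] for _ in range(q)]
    for i in range(m):
        for j in range(n):
            B_plus = [row[:] for row in B]
            B_minus = [row[:] for row in B]
            B_plus[i][j] += h
            B_minus[i][j] -= h
            A_plus, A_minus = func(B_plus), func(B_minus)
            for a in range(q):
                for b in range(p):
                    D[a][b][i][j] = (A_plus[a][b] - A_minus[a][b]) / (2 * h)
    return D


def count_entries(D):
    """Count the scalar partial derivatives stored in the 4-level nested list."""
    return sum(len(d_abi) for d_a in D for d_ab in d_a for d_abi in d_ab)


def gram(B):
    """Hypothetical example function A = B B^T: a 2x3 B gives a 2x2 A."""
    return [[sum(r[k] * s[k] for k in range(len(r))) for s in B] for r in B]


B = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
D = all_partials(gram, B)
print(count_entries(D))  # q*p*m*n = 2*2*2*3 = 24
print(D[0][0][0][0])     # dA[0][0]/dB[0][0] = 2*B[0][0], about 2.0
```

With a $2 \times 3$ input, $A = BB^\top$ is $2 \times 2$, so there are $2 \times 2 \times 2 \times 3 = 24$ partial derivatives, as the counting rule predicts.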

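The Y-X Stretching Trick for Vector Derivatives

Core of the trick:
1. Scalars stay as they are; vectors get stretched.
2. Stretch the numerator horizontally and the denominator vertically (for $\frac{dy}{dx}$, $y$ is stretched horizontally and $x$ vertically).

As the previous section showed, differentiation produces a large number of elements, so we need a systematic way to write them out.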

Example 1
$f(x)$ is a scalar function and $x$ is an $n \times 1$ column vector $(x_1, x_2, \dots, x_n)^\top$; we want $\frac{df(x)}{dx}$.
The scalar stays as it is, so $f(x)$ is not stretched.
Stretching $x$ vertically turns the result into a vector:
$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
$$
Stretching vertically means stacking the derivatives with respect to each component $x_i$ into a column vector.
The final result is an $n \times 1$ column vector.
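A minimal Python sketch of Example 1 (standard library only): it estimates each $\partial f/\partial x_i$ by central differences and stacks the results into an $n \times 1$ nested list, i.e. $x$ is stretched vertically while the scalar $f$ is not. The helper `gradient_column` and the test function $f(x) = x_1^2 + 3x_2$ are illustrative assumptions, not from the original.

```python
# Sketch: scalar f, n x 1 vector x. Stretching x vertically gives one row
# per component x_i, so the result is an n x 1 column (list of 1-element lists).

def gradient_column(f, x, h=1e-6):
    """Return [[df/dx_1], ..., [df/dx_n]] using central differences."""
    col = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        col.append([(f(xp) - f(xm)) / (2 * h)])
    return col


def f(x):
    """Illustrative scalar function f(x) = x_1^2 + 3 x_2."""
    return x[0] ** 2 + 3 * x[1]


g = gradient_column(f, [2.0, 5.0])
print(g)  # approximately [[4.0], [3.0]]; the analytic gradient is (2 x_1, 3)
```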

Example 2
$f(x)$ is an $n \times 1$ column-vector function and $x$ is a scalar.
By the same rule, the numerator $f(x)$ is stretched horizontally and the scalar $x$ is not stretched, giving:
$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f_1(x)}{\partial x} & \frac{\partial f_2(x)}{\partial x} & \cdots & \frac{\partial f_n(x)}{\partial x}
\end{bmatrix}
$$
The final result is a $1 \times n$ row vector.
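The same idea for Example 2, as a minimal Python sketch: with a scalar $x$ only the numerator is stretched, horizontally, so the result is a single row. The helper `derivative_row` and the function $f(x) = (x^2, \sin x, 3x)$ are hypothetical choices for illustration.

```python
import math

# Sketch: n x 1 vector function f, scalar x. Stretching f horizontally gives
# a single row [df_1/dx, ..., df_n/dx], i.e. a 1 x n row vector.

def derivative_row(f, x, h=1e-6):
    """Return [[df_1/dx, ..., df_n/dx]] using central differences."""
    fp, fm = f(x + h), f(x - h)
    return [[(a - b) / (2 * h) for a, b in zip(fp, fm)]]


def f(x):
    """Illustrative vector function f(x) = (x^2, sin x, 3x)."""
    return [x ** 2, math.sin(x), 3 * x]


r = derivative_row(f, 0.0)
print(r)  # approximately [[0.0, 1.0, 3.0]], i.e. (2x, cos x, 3) at x = 0
```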

Example 3
Both are vectors:
$$
f(x)=\begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}, \quad
x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$
First stretch $x$ vertically into a column, where each entry is the derivative of $f(x)$ with respect to a single scalar $x_i$; then stretch $f(x)$ horizontally.

Step 1:
$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
$$
Step 2 gives:
$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_1} & \cdots & \frac{\partial f_n(x)}{\partial x_1} \\
\frac{\partial f_1(x)}{\partial x_2} & \frac{\partial f_2(x)}{\partial x_2} & \cdots & \frac{\partial f_n(x)}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_1(x)}{\partial x_n} & \frac{\partial f_2(x)}{\partial x_n} & \cdots & \frac{\partial f_n(x)}{\partial x_n}
\end{bmatrix}
$$
This gives $n \times n$ elements, consistent with the principle from the first section; row $i$, column $j$ holds $\frac{\partial f_j(x)}{\partial x_i}$.
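A minimal Python sketch of Example 3, following the text's convention: $x$ indexes the rows and $f$ indexes the columns, so `J[i][j]` is $\partial f_j/\partial x_i$. This arrangement is often called the denominator layout and is the transpose of the Jacobian as it is commonly written elsewhere. The helper `layout_jacobian` and the test function $f(x) = (x_1 x_2,\ x_1 + 5x_2)$ are illustrative assumptions.

```python
# Sketch: vector f (n outputs) and vector x (n inputs).
# Row i comes from stretching x vertically, column j from stretching f
# horizontally, so J[i][j] = d f_j / d x_i (central differences).

def layout_jacobian(f, x, h=1e-6):
    """Return J with J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        rows.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return rows


def f(x):
    """Illustrative f(x) = (x_1 * x_2, x_1 + 5 x_2)."""
    return [x[0] * x[1], x[0] + 5 * x[1]]


J = layout_jacobian(f, [2.0, 3.0])
print(J)  # approximately [[3.0, 1.0], [2.0, 5.0]]
```

Row 1 is $(\partial f_1/\partial x_1, \partial f_2/\partial x_1) = (x_2, 1) = (3, 1)$ and row 2 is $(\partial f_1/\partial x_2, \partial f_2/\partial x_2) = (x_1, 5) = (2, 5)$.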

Deriving Matrix Derivative Formulas

The simplest $f(x)$:
$$
f(x)=A^\top x, \quad
A=\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}_{n \times 1}, \quad
x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$
Here $f(x)$ is a scalar, $f(x)=\sum_{i=1}^{n} a_i x_i$, so by the rule from the second section:

$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
=\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}=A
$$

Since $f(x)=A^\top x=x^\top A$, this gives the frequently used identity $\frac{d(A^\top x)}{dx}=\frac{d(x^\top A)}{dx}=A$.
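A quick numerical sanity check of $\frac{d(A^\top x)}{dx}=A$, as a minimal Python sketch: the finite-difference gradient of $f(x)=A^\top x$ should reproduce the coefficient vector. The values of `A` and the evaluation point `x0` are arbitrary illustrative numbers; `gradient` is a hypothetical helper.

```python
# Sketch: check d(A^T x)/dx = A numerically, with A and x both n x 1 vectors.

def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function, returned as a flat list."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


A = [1.5, -2.0, 4.0]    # illustrative coefficients a_1, a_2, a_3
x0 = [0.3, 0.7, -1.2]   # arbitrary evaluation point


def f(x):
    """f(x) = A^T x = sum_i a_i x_i."""
    return sum(a_i * x_i for a_i, x_i in zip(A, x))


g = gradient(f, x0)
print(g)  # approximately [1.5, -2.0, 4.0], i.e. A itself
```

Because $f$ is linear, the gradient does not depend on `x0`; any evaluation point gives the same result.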

Example 2
$$
x=\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
A=\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
$$

Let $f(x)=x^\top A x$ and find $\frac{df(x)}{dx}$.

First stretch $x$ vertically:
$$
\frac{df(x)}{dx}=\begin{bmatrix}
\frac{\partial f(x)}{\partial x_1} \\
\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
$$
Expanding the original expression:
$$
f(x)=\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j
$$
This looks complicated, but it is just linear algebra. Only the terms with $i=k$ or $j=k$ depend on $x_k$, so $\frac{\partial f(x)}{\partial x_k}=\sum_{j=1}^{n} a_{kj} x_j+\sum_{i=1}^{n} a_{ik} x_i$. Substituting this for $k=1,\dots,n$ gives:
$$
\begin{bmatrix}
\sum_{j=1}^{n} a_{1j} x_j+\sum_{i=1}^{n} a_{i1} x_i \\
\vdots \\
\sum_{j=1}^{n} a_{nj} x_j+\sum_{i=1}^{n} a_{in} x_i
\end{bmatrix}
$$
which is exactly:
$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+
\begin{bmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=Ax+A^\top x=(A+A^\top)x
$$
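A minimal Python sketch that checks $\frac{d(x^\top A x)}{dx}=Ax+A^\top x$ numerically. A non-symmetric $A$ is used on purpose, since for symmetric $A$ the two terms coincide and the check would not distinguish $Ax+A^\top x$ from $2Ax$. The matrix, the point `x0`, and the helper names are illustrative assumptions.

```python
# Sketch: compare a finite-difference gradient of f(x) = x^T A x
# with the closed form A x + A^T x, using a non-symmetric A.

def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def quad(A, x):
    """x^T A x = sum_i sum_j a_ij x_i x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1.0, 2.0],
     [0.0, 3.0]]          # illustrative non-symmetric matrix
x0 = [1.0, -1.0]          # arbitrary evaluation point

numeric = gradient(lambda x: quad(A, x), x0)
formula = [p + q for p, q in zip(matvec(A, x0), matvec(transpose(A), x0))]
print(numeric)  # approximately [0.0, -4.0]
print(formula)  # [0.0, -4.0]
```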

Application: Least-Squares Derivative

The author's handwritten derivation:
[Figure: handwritten least-squares derivation]
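The handwritten derivation survives only as an image, so the exact notation is not recoverable. As a hedged sketch, assuming the usual objective $f(x)=(Ax-b)^\top(Ax-b)$: expanding gives $x^\top A^\top A x-2b^\top A x+b^\top b$. Applying $\frac{d(x^\top Mx)}{dx}=(M+M^\top)x$ with the symmetric $M=A^\top A$, and $\frac{d(b^\top A x)}{dx}=A^\top b$, yields $\frac{df}{dx}=2A^\top A x-2A^\top b=2A^\top(Ax-b)$. Setting this to zero gives the normal equations $A^\top A x=A^\top b$. The Python below checks the gradient numerically; `A`, `b`, `x0` and the helper names are illustrative assumptions.

```python
# Sketch (assumed objective): f(x) = ||A x - b||^2 = (A x - b)^T (A x - b).
# Compare a finite-difference gradient with the closed form 2 A^T (A x - b).

def gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def transpose(M):
    return [list(col) for col in zip(*M)]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # illustrative 3 x 2 matrix
b = [1.0, 0.0, 2.0]                         # illustrative targets
x0 = [0.5, -0.25]                           # arbitrary evaluation point


def objective(x):
    """Sum of squared residuals of A x - b."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return sum(ri * ri for ri in r)


numeric = gradient(objective, x0)
residual = [ri - bi for ri, bi in zip(matvec(A, x0), b)]
closed_form = [2 * v for v in matvec(transpose(A), residual)]
print(numeric)      # approximately [-9.0, -12.0]
print(closed_form)  # [-9.0, -12.0]
```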

Application 2: Derivatives Involving Matrices

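[Figure]
Correction: in the figure above, the (1,1) entry of the last vector should be 2xq.
[Figure]
Correction: in the figure above, the (1,1,1) entry of the last matrix should be 2xq.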

Appendix

Common formulas
$$
\frac{d(V^\top U)}{dx}=\frac{\partial U}{\partial x}V+\frac{\partial V}{\partial x}U
$$

$$
\frac{d(U+V)}{dx}=\frac{dU}{dx}+\frac{dV}{dx}
$$
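A minimal Python sketch that checks the product rule $\frac{d(V^\top U)}{dx}=\frac{\partial U}{\partial x}V+\frac{\partial V}{\partial x}U$ numerically. Here $\frac{\partial U}{\partial x}$ uses the layout from the stretching trick (entry $[i][k]=\partial U_k/\partial x_i$), so $\left(\frac{\partial U}{\partial x}V\right)_i=\sum_k \frac{\partial U_k}{\partial x_i}V_k$. The functions `U`, `V`, the point `x0`, and the helper names are hypothetical examples.

```python
# Sketch: verify d(V^T U)/dx = (dU/dx) V + (dV/dx) U by finite differences,
# where dU/dx[i][k] = dU_k / dx_i (x stretched vertically, U horizontally).

def layout_jacobian(F, x, h=1e-6):
    """Return J with J[i][k] = dF_k / dx_i (central differences)."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        rows.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    return rows


def U(x):
    """Illustrative vector function R^2 -> R^3."""
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


def V(x):
    """Illustrative vector function R^2 -> R^3."""
    return [x[0] ** 2, 3.0 * x[1], 1.0]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


x0 = [1.0, 2.0]
# Left side: gradient of the scalar V^T U (each row of the jacobian has one entry).
lhs = [row[0] for row in layout_jacobian(lambda x: [dot(V(x), U(x))], x0)]
# Right side: (dU/dx) V + (dV/dx) U, row by row.
JU, JV = layout_jacobian(U, x0), layout_jacobian(V, x0)
rhs = [dot(JU[i], V(x0)) + dot(JV[i], U(x0)) for i in range(len(x0))]
print(lhs)  # approximately [12.0, 20.0]
print(rhs)  # approximately [12.0, 20.0]
```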

The Matrix Cookbook
