Preface
Study notes on matrix differentiation.
I. Derivative of a scalar with respect to a vector
The derivative of a scalar with respect to a vector is obtained by taking the partial derivative of the scalar with respect to each element of the vector, then assembling the results into a vector of the same shape. That is:

$$\frac{\partial y}{\partial \vec{x}} = \left(\frac{\partial y}{\partial x_1},\frac{\partial y}{\partial x_2},\dots,\frac{\partial y}{\partial x_n}\right)^T$$
where $y$ is a scalar and $x = (x_1, x_2, \dots, x_n)^T$ is an $n$-dimensional vector.
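This element-wise definition can be checked numerically with central differences. A minimal sketch in Python (assuming NumPy; the sample function $y = \sum_i x_i^2$ is my own illustration, not from the original):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate each partial derivative of scalar f with central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

# Example scalar function: y = sum(x_i^2), whose gradient is 2x.
f = lambda v: np.sum(v**2)
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(f, x))  # ≈ [2. 4. 6.]
```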
II. Examples
1. $y = w^T x$
This weighted-sum form is common in signal processing.
Expanding the product gives:

$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n$$
Applying the definition above:

$$\frac{\partial y}{\partial \vec{x}} = \left(\frac{\partial (w_1x_1 + w_2x_2 + \dots + w_nx_n)}{\partial x_1},\frac{\partial (w_1x_1 + w_2x_2 + \dots + w_nx_n)}{\partial x_2},\dots,\frac{\partial (w_1x_1 + w_2x_2 + \dots + w_nx_n)}{\partial x_n}\right)^T$$
Clearly,

$$\frac{\partial y}{\partial \vec{x}} = (w_1,w_2,\dots,w_n)^T=\vec{w}$$

This gives us the first differentiation rule.
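This rule is easy to verify numerically. A small sketch (assuming NumPy; the random vectors are arbitrary stand-ins of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
w = rng.standard_normal(n)
x = rng.standard_normal(n)

# y = w^T x; the claimed gradient with respect to x is w itself.
y = lambda v: w @ v

# Central-difference approximation of each partial derivative.
eps = 1e-6
grad = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

print(np.allclose(grad, w))  # True
```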
2. $y = x^T w$
Similarly,
$$y = w_1x_1 + w_2x_2 + \dots + w_nx_n$$
which is the same as the previous case, so the result is identical:

$$\frac{\partial y}{\partial \vec{x}} = (w_1,w_2,\dots,w_n)^T=\vec{w}$$
3. $y = x^T A_{n\times n}\, x$
This quadratic form is also common. Expanding it gives:
$$\begin{aligned} y=\ &a_{11}x_1^2+a_{12}x_1x_2+\dots+a_{1n}x_1x_n \ + \\ & a_{21}x_2x_1+a_{22}x_2^2+\dots+a_{2n}x_2x_n \ +\\ & \vdots \\ & a_{n1}x_nx_1 + a_{n2}x_nx_2 + \dots + a_{nn}x_n^2 \end{aligned}$$
$$\frac{\partial y}{\partial x_1} = 2a_{11}x_1 +(a_{12}+a_{21})x_2+\dots+(a_{1n}+a_{n1})x_n$$
$$\frac{\partial y}{\partial x_2} = (a_{12}+a_{21})x_1 +2a_{22}x_2+\dots+(a_{2n}+a_{n2})x_n$$
$$\frac{\partial y}{\partial x_n} = (a_{1n}+a_{n1})x_1 +(a_{2n}+a_{n2})x_2+\dots+2a_{nn}x_n$$
Therefore,

$$\frac{\partial y}{\partial \vec{x}} = \begin{pmatrix} 2a_{11} & a_{12} + a_{21} & \dots &a_{1n}+a_{n1}\\ a_{12}+a_{21}&2 a_{22} & \dots & a_{2n}+a_{n2}\\ \vdots& \vdots& & \vdots \\ a_{1n}+a_{n1}& a_{2n}+a_{n2} & \dots & 2a_{nn} \end{pmatrix}\vec{x}$$
In fact,

$$\begin{pmatrix} 2a_{11} & a_{12} + a_{21} & \dots &a_{1n}+a_{n1}\\ a_{12}+a_{21}& 2a_{22} & \dots & a_{2n}+a_{n2}\\ \vdots& \vdots& & \vdots \\ a_{1n}+a_{n1}& a_{2n}+a_{n2} & \dots & 2a_{nn} \end{pmatrix}=\begin{pmatrix} a_{11}+a_{11} & a_{12} + a_{21} & \dots &a_{1n}+a_{n1}\\ a_{12}+a_{21}& a_{22}+a_{22} & \dots & a_{2n}+a_{n2}\\ \vdots& \vdots& & \vdots \\ a_{1n}+a_{n1}& a_{2n}+a_{n2} & \dots & a_{nn}+a_{nn} \end{pmatrix}=A^T + A$$
Hence,

$$\frac{\partial y}{\partial \vec{x}} = (A^T+A)\vec{x}$$
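The closed form $(A^T+A)\vec{x}$ can be compared against finite differences for a general (not necessarily symmetric) matrix. A sketch assuming NumPy, with a random stand-in for $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))   # general square matrix, not symmetric
x = rng.standard_normal(n)

y = lambda v: v @ A @ v           # quadratic form x^T A x

# Central-difference gradient vs the closed form (A^T + A) x.
eps = 1e-6
grad = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

print(np.allclose(grad, (A.T + A) @ x))  # True
```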
Now let us revisit a derivative from the previous post:

$$L(w) = w^T\tilde{R}w+\lambda[w^Ta(\theta_d)-1]$$
where $L$ is a scalar, $w=(w_1,w_2,\dots,w_n)^T$, and $\tilde{R}$ is a real symmetric matrix.
We want to find

$$\frac{\partial L(w)}{\partial w}$$
Split it into two parts. Since $\tilde{R}$ is symmetric ($\tilde{R}^T = \tilde{R}$),

$$\frac{\partial (w^T\tilde{R}w)}{\partial w}=(\tilde{R}^T+\tilde{R})w=2\tilde{R}w$$
$$\frac{\partial (w^Ta(\theta_d)-1)}{\partial w}=a(\theta_d)$$
So the final result is

$$\frac{\partial L(w)}{\partial w}=2\tilde{R}w+\lambda a(\theta_d)$$
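This final expression can also be checked numerically. A sketch assuming NumPy; the symmetric matrix, the vector standing in for $a(\theta_d)$, and the value of $\lambda$ are random placeholders of mine, since the real ones come from the previous post:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
R = B + B.T                        # real symmetric stand-in for R-tilde
a = rng.standard_normal(n)         # stand-in for a(theta_d)
lam = 0.7                          # stand-in for lambda
w = rng.standard_normal(n)

L = lambda v: v @ R @ v + lam * (v @ a - 1)

# Central-difference gradient vs the closed form 2 R w + lambda a.
eps = 1e-6
grad = np.array([(L(w + eps * e) - L(w - eps * e)) / (2 * eps)
                 for e in np.eye(n)])

print(np.allclose(grad, 2 * R @ w + lam * a))  # True
```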
Summary
This post covered several common derivatives of a scalar with respect to a vector. In practice, such derivatives appear frequently in digital signal processing and deep learning. More to follow when time permits.