1 Problem
Given $\mathbf{x} \in \mathbb{R}^{n \times 1}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, and $\mathbf{f}(\mathbf{x})=\sqrt{(\mathbf{A} \mathbf{x}) \odot (\mathbf{A} \mathbf{x})}$, where $\sqrt{(\cdot)}$ denotes the Hadamard root (element-wise square root), i.e. the square root taken entry by entry, find $\frac{\partial \mathbf{f}}{\partial \mathbf{x}}$.
2 Solution
2.1 Use the Hadamard product to eliminate the square root first
Let $\mathbf{b} = \mathbf{A} \mathbf{x}$, so that $d\mathbf{b} = d(\mathbf{A} \mathbf{x}) = \mathbf{A}\, d\mathbf{x}$.
2.2 Matrix-by-matrix derivatives usually start by vectorizing the matrices
$$
\begin{aligned}
\mathbf{f} \odot \mathbf{f} &= (\mathbf{A} \mathbf{x}) \odot (\mathbf{A} \mathbf{x}) \\
&= \mathbf{b} \odot \mathbf{b}
\end{aligned}
$$
By the differential rule for the Hadamard product:
$$
d(\mathbf{X} \odot \mathbf{Y})=\mathbf{X} \odot d \mathbf{Y}+d \mathbf{X} \odot \mathbf{Y}
$$
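The product rule for the Hadamard product can be sanity-checked numerically. The following is a minimal sketch in plain Python; the helper `hadamard` and the sample vectors are illustrative assumptions, not part of the original derivation.

```python
# Finite-difference check of d(X ⊙ Y) = X ⊙ dY + dX ⊙ Y on small vectors.
# The helper name and the sample values are illustrative assumptions.

def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length lists."""
    return [a * b for a, b in zip(u, v)]

X = [1.0, -2.0, 3.0]
Y = [0.5, 4.0, -1.5]
dX = [1e-6, -2e-6, 3e-6]   # small perturbation of X
dY = [-1e-6, 1e-6, 2e-6]   # small perturbation of Y

# Exact change of the product under the perturbation.
X_new = [a + da for a, da in zip(X, dX)]
Y_new = [b + db for b, db in zip(Y, dY)]
exact = [p1 - p0 for p1, p0 in zip(hadamard(X_new, Y_new), hadamard(X, Y))]

# First-order prediction from the product rule.
predicted = [s + t for s, t in zip(hadamard(X, dY), hadamard(dX, Y))]

# The two differ only by the second-order term dX ⊙ dY (plus rounding).
max_error = max(abs(e - p) for e, p in zip(exact, predicted))
print(max_error)
```

The residual is of the order of the product of the two perturbations, which is what one expects when a differential drops second-order terms.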
Hence:
$$
\begin{aligned}
d(\mathbf{f} \odot \mathbf{f}) &=\mathbf{f} \odot d \mathbf{f}+d \mathbf{f} \odot \mathbf{f} \\
&=\mathbf{f} \odot d \mathbf{f}+\mathbf{f} \odot d \mathbf{f} \\
&= 2\mathbf{f} \odot d \mathbf{f}
\end{aligned}
$$
The same rule gives $d(\mathbf{b} \odot \mathbf{b}) = 2\mathbf{b} \odot d\mathbf{b}$, and since $\mathbf{f} \odot \mathbf{f} = \mathbf{b} \odot \mathbf{b}$, it follows that $\mathbf{f} \odot d\mathbf{f} = \mathbf{b} \odot d\mathbf{b}$. Applying the property $\operatorname{vec}(\mathbf{A} \odot \mathbf{X})=\operatorname{diag}(\mathbf{A}) \operatorname{vec}(\mathbf{X})$ to both sides:
$$
\operatorname{diag}(\mathbf{f}) \operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{b}) \operatorname{vec}(d\mathbf{b})
$$
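The vectorization property used in the last step can be checked on a small case. This is a sketch under stated assumptions: `vec`, `diag`, and `matvec` are hypothetical helpers written for illustration, and `A` and `X` here are the generic matrices of the property, not the $\mathbf{A}$ of the problem statement.

```python
# Check vec(A ⊙ X) = diag(A) vec(X) on 2x2 matrices (illustrative values).

def vec(M):
    """Stack the columns of a matrix (given as a list of rows) into one list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

def diag(v):
    """Square matrix with the entries of v on the diagonal, zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1.0, 2.0], [3.0, 4.0]]
X = [[5.0, 6.0], [7.0, 8.0]]

# Element-wise product of the two matrices.
hadamard = [[a * x for a, x in zip(ra, rx)] for ra, rx in zip(A, X)]

lhs = vec(hadamard)                 # vec(A ⊙ X)
rhs = matvec(diag(vec(A)), vec(X))  # diag(A) vec(X), with diag built from vec(A)
print(lhs, rhs)
```

Both sides list the entry-wise products in column-major order, which is why a Hadamard product can be rewritten as an ordinary matrix-vector product.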
Here $\operatorname{diag}(\mathbf{f})$ is an $n \times n$ diagonal matrix whose diagonal entries are the elements of $\mathbf{f}$ after column-wise vectorization; $\operatorname{diag}(\mathbf{b})$ is defined in the same way.
Provided every entry of $\mathbf{f}$ is nonzero, $\operatorname{diag}(\mathbf{f})$ is invertible, so:
$$
\operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b}) \operatorname{vec}(d\mathbf{b})
$$
$$
\mathbf{b} \in \mathbb{R}^{n \times 1} \implies \operatorname{vec}(d \mathbf{b}) = d \mathbf{b}
$$
$$
\therefore \operatorname{diag}(\mathbf{f})\, d \mathbf{f} = \operatorname{diag}(\mathbf{b}) \mathbf{A}\, d \mathbf{x}
$$
$$
\operatorname{vec}(d\mathbf{f}) = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b}) \mathbf{A}\, d \mathbf{x}
$$
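As a concrete illustration of the last relation, consider a small worked case. The particular $\mathbf{A}$ and $\mathbf{x}$ below are chosen only for illustration and do not come from the original problem.

$$
\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & -4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\implies
\mathbf{b} = \mathbf{A}\mathbf{x} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \quad
\mathbf{f} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}
$$

$$
\operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})
= \begin{bmatrix} 1/3 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\implies
d\mathbf{f} = \begin{bmatrix} 1 & 2 \\ -3 & 4 \end{bmatrix} d\mathbf{x}
$$

In this case the factor $\operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b})$ simply carries the signs of the entries of $\mathbf{b}$, which is consistent with $\mathbf{f}$ being the element-wise absolute value of $\mathbf{A}\mathbf{x}$.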
If the matrix-by-matrix derivative uses the denominator layout, then:
$$
\operatorname{vec}(d \mathbf{f})=\left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)^{T} \operatorname{vec}(d \mathbf{x})
$$
If it uses the numerator layout, then:
$$
\operatorname{vec}(d \mathbf{f})=\left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right) \operatorname{vec}(d \mathbf{x})
$$
So for this problem, in the denominator layout:
$$
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \left(\operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b}) \mathbf{A}\right)^{T}
$$
In the numerator layout:
$$
\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \operatorname{diag}(\mathbf{f})^{-1} \operatorname{diag}(\mathbf{b}) \mathbf{A}
$$
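The closed-form result can be compared against a finite-difference Jacobian. The sketch below uses plain Python; the function names, the sample $\mathbf{A}$ and $\mathbf{x}$, and the step size `h` are assumptions made for illustration, and the check is only meaningful where every entry of $\mathbf{A}\mathbf{x}$ is nonzero.

```python
import math

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def f(A, x):
    """f(x) = sqrt((Ax) ⊙ (Ax)), evaluated element-wise."""
    return [math.sqrt(b * b) for b in matvec(A, x)]

def jacobian_numerator(A, x):
    """diag(f)^{-1} diag(b) A; assumes every entry of b = Ax is nonzero."""
    b = matvec(A, x)
    fx = f(A, x)
    return [[(b_i / f_i) * a for a in row] for b_i, f_i, row in zip(b, fx, A)]

def jacobian_finite_diff(A, x, h=1e-6):
    """Central differences; entry [i][j] approximates d f_i / d x_j."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = f(A, xp), f(A, xm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

# Illustrative inputs (not from the original problem).
A = [[1.0, 2.0], [3.0, -4.0]]
x = [1.0, 1.0]

J_num = jacobian_numerator(A, x)            # numerator layout
J_den = [list(col) for col in zip(*J_num)]  # denominator layout is the transpose
J_fd = jacobian_finite_diff(A, x)

max_error = max(abs(J_num[i][j] - J_fd[i][j])
                for i in range(2) for j in range(2))
print(J_num, J_den, max_error)
```

The finite-difference matrix is arranged with one row per output component, so it is compared against the numerator-layout result; the denominator-layout result is obtained by transposing.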