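Note: This math series consists of my personal study notes. My level is limited and errors are unavoidable, so corrections from readers are welcome.
The main part of the proof comes from the link below.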
https://www.planetmath.org/RayleighRitzTheorem
First, a few basic concepts.
Complex Plane
Consider a complex number of the form $a+bi$. It represents a point in the complex plane, where the $x$-axis represents the real part and the $y$-axis represents the imaginary part, so $a+bi$ is the point with coordinates $(a,b)$. The complex number $a+bi$ can also be viewed as a vector in the complex plane that starts at the origin $(0,0)$ and ends at $(a,b)$. Adding or subtracting complex numbers is then the same as adding or subtracting these vectors.
Complex conjugate: the conjugate $z^*$ of a complex number $z=a+bi$ is defined as $z^* = a - bi$.
Two useful identities
$$
z^*_1 \times z^*_2 = (z_1 \times z_2)^* \tag{1}
$$
$$
z^*_1 + z^*_2 = (z_1 + z_2)^* \tag{2}
$$
For example, let $z_1 = 3 + 2i$ and $z_2 = 1 - i$. Then
$$
\begin{aligned}
z^*_1 \times z^*_2 & = (3-2i) \times (1 + i) = 5 + i \\
z_1 \times z_2 & = (3+2i) \times (1-i) = 5 - i \\
z^*_1 + z^*_2 & = (3-2i) + (1+i) = 4 - i \\
z_1 + z_2 & = (3+2i) + (1-i) = 4 + i
\end{aligned}
$$
so $z^*_1 \times z^*_2 = 5 + i = (z_1 \times z_2)^*$ and $z^*_1 + z^*_2 = 4 - i = (z_1 + z_2)^*$, as (1) and (2) state.
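As a quick sanity check, identities (1) and (2) can be verified on the example above with a minimal Python sketch that uses only the built-in `complex` type. Python writes the imaginary unit as `j`.

```python
# Identities (1) and (2) checked on z1 = 3 + 2i, z2 = 1 - i.
# Python writes the imaginary unit as j; .conjugate() returns z*.
z1 = 3 + 2j
z2 = 1 - 1j

prod_of_conj = z1.conjugate() * z2.conjugate()  # z1* x z2*
conj_of_prod = (z1 * z2).conjugate()            # (z1 x z2)*
sum_of_conj = z1.conjugate() + z2.conjugate()   # z1* + z2*
conj_of_sum = (z1 + z2).conjugate()             # (z1 + z2)*

print(prod_of_conj, conj_of_prod)  # (5+1j) (5+1j)
print(sum_of_conj, conj_of_sum)    # (4-1j) (4-1j)
```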
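Conjugates of eigenvalues and eigenvectors
If $\bf A$ is a real matrix and
$$
{\bf Ax} = \lambda {\bf x}
$$
then
$$
{\bf A}{\bf x}^* = \lambda^* {\bf x}^*
$$
To see this, take the complex conjugate of both sides of ${\bf Ax} = \lambda {\bf x}$ and apply (1) and (2) entry by entry: $({\bf Ax})^* = {\bf A}^*{\bf x}^* = {\bf A}{\bf x}^*$, because a real matrix satisfies ${\bf A}^* = {\bf A}$. Hence, if $\lambda$ is an eigenvalue of a real matrix with eigenvector $\bf x$, then $\lambda^*$ is also an eigenvalue, with eigenvector ${\bf x}^*$.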
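This conjugate-pair property can be checked numerically. The sketch below assumes NumPy is available. It uses a 90-degree rotation matrix as a hypothetical example, chosen because the matrix is real but has the complex eigenvalues $\pm i$.

```python
import numpy as np

# Hypothetical example: a real 90-degree rotation matrix with eigenvalues +i and -i.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigvals, eigvecs = np.linalg.eig(A)
lam, x = eigvals[0], eigvecs[:, 0]      # one eigenpair: A x = lam x

# Conjugating both sides gives another eigenpair, because conj(A) = A for real A.
print(np.allclose(A @ x, lam * x))                         # True
print(np.allclose(A @ x.conj(), np.conj(lam) * x.conj()))  # True
print(np.isclose(np.conj(eigvals[0]), eigvals[1]))         # True: lam* is the other eigenvalue
```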
The sum and the product of a complex number and its conjugate are both real:
$$
z + z^* \in {\Bbb R}, \qquad z \times z^* \in {\Bbb R}
$$
Indeed, for $z = a+bi$ we have $z + z^* = 2a$ and $z \times z^* = a^2 + b^2$.
Some useful formulas
$$
\begin{aligned}
|a+bi|^2 & = a^2 + b^2 \\[2ex]
(a+bi)(a-bi) & = a^2 + b^2 \\[2ex]
\frac{1}{a+bi} & = \frac{1}{a+bi} \frac{a - bi}{a - bi} = \frac{a-bi}{a^2 + b^2}
\end{aligned}
$$
On the unit circle, i.e. when $a^2+b^2 = 1$, this gives $(a+bi)^{-1} = a - bi$, that is, $1/z = z^*$.
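The following is a small sketch of the reciprocal formula and of $1/z = z^*$ on the unit circle. `reciprocal` is a hypothetical helper written only for this illustration, and the angle 0.7 is arbitrary.

```python
import math

def reciprocal(a, b):
    """Hypothetical helper: 1/(a+bi) computed as (a - bi) / (a^2 + b^2)."""
    denom = a * a + b * b
    return complex(a / denom, -b / denom)

z = 3 + 4j
print(reciprocal(z.real, z.imag), 1 / z)   # both approximately 0.12-0.16j

# On the unit circle (a^2 + b^2 = 1) the reciprocal equals the conjugate.
w = complex(math.cos(0.7), math.sin(0.7))
print(abs(reciprocal(w.real, w.imag) - w.conjugate()) < 1e-12)   # True
```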
Absolute value of a complex number
$$
|z| = |a+bi| = \sqrt{a^2 + b^2}
$$
$|z|$ is also commonly written as $r$. When $a^2+b^2 = 1$, $r = 1$ is the radius of the unit circle.
Denote the angle between $z$ and the $x$-axis by $\theta$. After squaring, the angle between $z^2$ and the $x$-axis becomes $2\theta$.
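The angle-doubling claim can be illustrated with Python's standard `cmath.polar`, which returns $(r, \theta)$ with $\theta$ in $(-\pi, \pi]$. Because of that range, a doubled angle should in general be compared modulo $2\pi$. The value $z = 1 + i$ is an arbitrary example.

```python
import cmath

z = 1 + 1j                         # arbitrary example at 45 degrees
r, theta = cmath.polar(z)          # r = |z| = sqrt(1^2 + 1^2), theta in (-pi, pi]
r2, theta2 = cmath.polar(z * z)    # z^2 = 2i

print(r, theta)     # sqrt(2) and pi/4
print(r2, theta2)   # 2 and pi/2: the modulus is squared and the angle doubles
```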
Exponential form of a complex number
$$
\begin{aligned}
z & = r\cos\theta + ir\sin\theta = re^{i\theta} \\
z^n & = r^n\cos n\theta + ir^n\sin n\theta = r^ne^{in\theta}
\end{aligned}
$$
Let $z' = r'\cos\theta' + ir'\sin\theta'$. Then
$$
\begin{aligned}
z \times z' & = (r\cos\theta + ir\sin\theta) \times (r'\cos\theta' + ir'\sin\theta') \\
& = rr'(\cos(\theta + \theta')+i\sin(\theta + \theta'))
\end{aligned}
$$
In other words, multiplying two complex numbers multiplies their absolute values and adds their angles. Setting $z' = z$ gives the angle $2\theta$ of $z^2$.
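Here is a minimal numerical check of the product rule and of $z^n = r^ne^{in\theta}$. It uses `cmath.rect(r, theta)` to build $re^{i\theta}$, and the moduli and angles are arbitrary example values.

```python
import cmath

r, theta = 2.0, 0.3          # arbitrary example values
rp, thetap = 1.5, 1.1        # r' and theta'
z = cmath.rect(r, theta)     # r e^{i theta} = r cos(theta) + i r sin(theta)
zp = cmath.rect(rp, thetap)

# Product: the moduli multiply and the angles add.
product_ok = abs(z * zp - cmath.rect(r * rp, theta + thetap)) < 1e-12

# z^n = r^n (cos n*theta + i sin n*theta) = r^n e^{i n theta}
n = 5
power_ok = abs(z ** n - r ** n * cmath.exp(1j * n * theta)) < 1e-9

print(product_ok, power_ok)  # True True
```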
Hermitian Matrix
For a real vector $\bf x$, the squared length is $x_1^2 + x_2^2 + \cdots + x_n^2$. For a complex vector $\bf z$, however, the squared length is not $z_1^2 + z_2^2 + \cdots + z_n^2$. Take the vector $(1, i)$: the real-vector definition would give $1^2 + i^2 = 0$. Under that definition a nonzero vector could have squared length $0$, which makes it a poor definition. The squared length could even be a non-real complex number. Therefore, for a complex vector $\bf z$ we define
$$
{\bf z}^{*T}{\bf z} = ||{\bf z}||^2
$$
For $(1, i)$, this gives $1 \cdot 1 + (-i) \cdot i = 2$. We write ${\bf z}^{*T} = {\bf z}^H$. For example, if
$$
{\bf A} = \begin{bmatrix} 1 & i \\ 0 & 1 + i \end{bmatrix}
$$
then
$$
{\bf A}^H = \begin{bmatrix} 1 & 0 \\ -i & 1 - i \end{bmatrix}
$$
That is, ${\bf A}^H$ is obtained by transposing $\bf A$ and then taking the complex conjugate of every entry.
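The conjugate-transpose example and the length of $(1, i)$ can be reproduced in NumPy, which is assumed to be installed. In NumPy, `.conj().T` computes ${\bf A}^H$.

```python
import numpy as np

A = np.array([[1, 1j],
              [0, 1 + 1j]])
A_H = A.conj().T              # conjugate transpose A^H
print(A_H)                    # should equal [[1, 0], [-i, 1 - i]], as computed by hand

z = np.array([1, 1j])
naive = np.sum(z * z)         # 1^2 + i^2 = 0: the real-vector formula fails
length_sq = z.conj() @ z      # z^H z = |1|^2 + |i|^2 = 2
print(naive, length_sq)
```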
For a real vector, ${\bf x}^T{\bf x} = ||{\bf x}||^2$. For a complex vector, ${\bf z}^H{\bf z} = ||{\bf z}||^2$. Since ${\bf x}^T{\bf x}$ is the inner product of $\bf x$ with itself, we define the inner product of the complex vectors $\bf u$ and $\bf v$ as ${\bf u}^H{\bf v}$:
$$
{\bf u}^H{\bf v} = [u^*_1, u^*_2, \cdots, u^*_n] \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u^*_1v_1 + u^*_2v_2 + \cdots + u^*_nv_n
$$
Note that for complex vectors, ${\bf u}^H{\bf v}$ and ${\bf v}^H{\bf u}$ are in general not equal. In fact, ${\bf v}^H{\bf u}$ is the complex conjugate of ${\bf u}^H{\bf v}$.
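The following NumPy sketch illustrates the complex inner product, using arbitrary example vectors. Note that `np.vdot` conjugates its first argument, so `np.vdot(u, v)` computes ${\bf u}^H{\bf v}$.

```python
import numpy as np

u = np.array([1 + 2j, 3j])      # arbitrary example vectors
v = np.array([2 - 1j, 1 + 1j])

uHv = u.conj() @ v              # u^H v = u1* v1 + u2* v2
vHu = v.conj() @ u              # v^H u
print(uHv, vHu)                 # (3-8j) (3+8j): complex conjugates of each other
print(np.isclose(np.vdot(u, v), uHv))   # True: np.vdot conjugates its first argument
```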
Diagonalizing a square matrix
Suppose the $n \times n$ matrix $\bf A$ has $n$ linearly independent eigenvectors ${\bf x}_1, {\bf x}_2, \cdots, {\bf x}_n$. Use these eigenvectors as the columns of the eigenvector matrix $\bf X$. Column by column, ${\bf Ax}_k = \lambda_k{\bf x}_k$ gives ${\bf AX} = {\bf X\Lambda}$. Because $\bf X$ is invertible, ${\bf X}^{-1}{\bf AX}$ is the eigenvalue matrix $\bf \Lambda$:
$$
{\bf X}^{-1}{\bf AX} = {\bf \Lambda} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}
$$
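Diagonalization can be sketched numerically with `np.linalg.eig`, which returns the eigenvalues together with a matrix whose columns are eigenvectors. The matrix is a hypothetical example whose eigenvalues, 5 and 2, are distinct, so its eigenvectors are independent.

```python
import numpy as np

# Hypothetical example: eigenvalues 5 and 2 are distinct, so the eigenvectors are independent.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, X = np.linalg.eig(A)       # columns of X are eigenvectors x_1, x_2
Lam = np.linalg.inv(X) @ A @ X      # X^{-1} A X
print(np.allclose(Lam, np.diag(eigvals)))   # True: the eigenvalue matrix Lambda
```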
Orthonormal basis
We say the column vectors $q_1, q_2, \ldots, q_n$ are orthonormal if
$$
q_i^Tq_j = \begin{cases} 0, & \text{for } i \neq j \\ 1, & \text{for } i = j \end{cases}
$$
A matrix $\bf Q$ whose columns are $q_1, q_2, \ldots, q_n$ has the property
$$
{\bf Q}^T{\bf Q} = {\bf I}
$$
If $\bf Q$ is square, this means ${\bf Q}^T = {\bf Q}^{-1}$.
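For illustration, a square matrix with orthonormal columns can be obtained from the QR factorization of a random matrix using `np.linalg.qr`. The seed and the size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))     # arbitrary random matrix
Q, _ = np.linalg.qr(M)              # Q is 4x4 with orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(4)))      # q_i^T q_j is 1 if i == j, else 0
print(np.allclose(Q.T, np.linalg.inv(Q)))   # Q^T = Q^{-1} because Q is square
```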
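Hermitian matrices
A real symmetric matrix $\bf S$ satisfies ${\bf S}^T = {\bf S}$ and can be written as ${\bf S} = {\bf Q\Lambda Q}^{-1}$. Here the columns of $\bf Q$ are orthonormal eigenvectors, so ${\bf Q}^{-1} = {\bf Q}^T$. For complex matrices, the counterpart of this symmetry is ${\bf S}^H = {\bf S}$. When ${\bf S}^H = {\bf S}$, we call $\bf S$ a Hermitian matrix.
If ${\bf S} = {\bf S}^H$ and $\bf z$ is a real or complex column vector, then ${\bf z}^H{\bf Sz}$ is real. This $1 \times 1$ quantity equals its own conjugate transpose, since $({\bf z}^H{\bf Sz})^H = {\bf z}^H{\bf S}^H{\bf z} = {\bf z}^H{\bf Sz}$.
Every eigenvalue of a Hermitian matrix is real. If ${\bf Sz} = \lambda{\bf z}$ with ${\bf z} \neq {\bf 0}$, then $\lambda = {\bf z}^H{\bf Sz} / {\bf z}^H{\bf z}$ is a ratio of two real numbers.
Eigenvectors of a Hermitian matrix that belong to distinct eigenvalues are orthogonal:
$$
\left. \begin{array}{l} {\bf Sz} = \lambda{\bf z} \\ {\bf Sy} = \beta{\bf y} \\ \lambda \neq \beta \end{array} \right\} \implies {\bf y}^H{\bf z} = 0
$$
The reason is that ${\bf y}^H{\bf Sz} = \lambda\,{\bf y}^H{\bf z}$, and also ${\bf y}^H{\bf Sz} = ({\bf Sy})^H{\bf z} = \beta\,{\bf y}^H{\bf z}$ because $\beta$ is real. Therefore $(\lambda - \beta)\,{\bf y}^H{\bf z} = 0$.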
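These three properties can be spot-checked numerically. The sketch assumes NumPy and builds a Hermitian matrix as ${\bf B} + {\bf B}^H$ from a random complex $\bf B$, purely for illustration. It deliberately uses the general solver `np.linalg.eig`, which does not assume symmetry, so the real eigenvalues and orthogonal eigenvectors are not imposed by the solver itself.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
S = B + B.conj().T                        # S^H = S by construction

z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
quad = z.conj() @ S @ z                   # z^H S z
print(abs(quad.imag) < 1e-10)             # True: real up to rounding

w, V = np.linalg.eig(S)                   # general solver, no Hermitian assumption
print(np.allclose(w.imag, 0))             # True: every eigenvalue is real
# eig returns unit-norm eigenvectors; for distinct eigenvalues they come out orthogonal
print(np.allclose(V.conj().T @ V, np.eye(3)))   # True
```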
Rayleigh theorem
The following is based on https://www.planetmath.org/RayleighRitzTheorem
The Rayleigh quotient is defined as
$$
R({\bf A,x}) = \frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}
$$
where $\bf x$ is a nonzero vector and $\bf A$ is an $n \times n$ Hermitian matrix. The eigenvectors of $\bf A$ are exactly the critical points of $R({\bf A,x})$. At such a critical point, the value of the function is the corresponding eigenvalue. Because $R({\bf A,x})$ does not change when $\bf x$ is multiplied by a nonzero scalar, it attains its maximum and minimum on the (compact) unit sphere, and these extrema are critical points. Hence the maximum of $R({\bf A,x})$ equals the largest eigenvalue of $\bf A$, and the minimum equals the smallest eigenvalue of $\bf A$:
$$
\lambda_{min} \leq \frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}} \leq \lambda_{max}
$$
When $\bf x$ is a unit vector, i.e. ${\bf x}^H{\bf x} = 1$, the Rayleigh quotient becomes
$$
R({\bf A,x}) = {\bf x}^H{\bf Ax}
$$
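The following is a numerical illustration of $\lambda_{min} \leq R \leq \lambda_{max}$. `rayleigh_quotient` is a hypothetical helper, and the Hermitian test matrix is random. `np.linalg.eigh` returns the eigenvalues in ascending order together with matching orthonormal eigenvectors. A tiny tolerance absorbs floating-point rounding.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Hypothetical helper: R(A, x) = x^H A x / x^H x (real for Hermitian A)."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2                  # random Hermitian test matrix
lam, V = np.linalg.eigh(A)                # ascending eigenvalues, orthonormal eigenvectors

samples = [rayleigh_quotient(A, rng.standard_normal(4) + 1j * rng.standard_normal(4))
           for _ in range(1000)]
tol = 1e-12
print(lam[0] - tol <= min(samples) and max(samples) <= lam[-1] + tol)   # True

# The bounds are attained at the eigenvectors of the extreme eigenvalues.
print(np.isclose(rayleigh_quotient(A, V[:, 0]), lam[0]),
      np.isclose(rayleigh_quotient(A, V[:, -1]), lam[-1]))             # True True
```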
Proof
First, by the property of Hermitian matrices above, ${\bf x}^H{\bf Ax}$ is real. ${\bf x}^H{\bf x}$ is clearly real (and positive for ${\bf x} \neq {\bf 0}$), so $R({\bf A,x})$ is real.
Now we look for the critical points $\overline{\bf x}$ of $R({\bf A,x})$. Here the overline marks a critical point, not a complex conjugate. Writing the Rayleigh quotient as $R({\bf x})$ for short, we need to solve
$$
\frac{dR(\overline{\bf x})}{d{\bf x}} = {\bf 0}^T
$$
Let
$$
{\bf x} = {\bf x}^{R} + i{\bf x}^{I}
$$
where ${\bf x}^R$ and ${\bf x}^I$ are the real and imaginary parts of $\bf x$. Then
$$
\frac{dR({\bf x})}{d{\bf x}} = \frac{dR({\bf x})}{d{\bf x}^R} + i\frac{dR({\bf x})}{d{\bf x}^I}
$$
Since $R$ is a real function of the real vectors ${\bf x}^R$ and ${\bf x}^I$, both derivatives on the right are real. The left side is therefore zero exactly when both of them are zero. Hence
$$
\frac{dR(\overline{\bf x})}{d{\bf x}^R} = \frac{dR(\overline{\bf x})}{d{\bf x}^I} = {\bf 0}^T \tag{0}
$$
By the quotient rule,
$$
\begin{aligned}
\frac{dR({\bf x})}{d{\bf x}^R} & = \frac{d}{d{\bf x}^R}\left(\frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}\right) \\[2ex]
& = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R}({\bf x}^H{\bf x}) - {\bf x}^H{\bf Ax} \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^R}}{({\bf x}^H{\bf x})^2} \\[2ex]
& = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R} - R({\bf x}) \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^R}}{{\bf x}^H{\bf x}}
\end{aligned} \tag{1}
$$
By the rules of matrix differentiation, using $d{\bf x}/d{\bf x}^R = d{\bf x}^*/d{\bf x}^R = {\bf I}$,
$$
\begin{aligned}
\frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^R} & = {\bf x}^H{\bf A} \frac{d{\bf x}}{d{\bf x}^R} + {\bf x}^T{\bf A}^T \frac{d{\bf x}^*}{d{\bf x}^R} \\
& = {\bf x}^H{\bf A} + {\bf x}^T{\bf A}^T \\
& = {\bf x}^H{\bf A} + ({\bf x}^H{\bf A}^H)^*
\end{aligned}
$$
Since ${\bf A} = {\bf A}^H$, and $w + w^* = 2w^R$ holds entry by entry for any complex $w$, the expression above becomes
$$
{\bf x}^H{\bf A} + ({\bf x}^H{\bf A})^* = 2({\bf x}^H{\bf A})^R \tag{2}
$$
(Note: for matrix differentiation, see the reference manual http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html )
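Formula (2) can be sanity-checked with central finite differences with respect to each real part $x^R_k$. The sketch assumes NumPy and uses a random Hermitian $\bf A$ for illustration. Running the same check with ${\bf A} = {\bf I}$ gives the derivative of ${\bf x}^H{\bf x}$, namely $2({\bf x}^H)^R$. Because these functions are quadratic, the central difference is exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                  # random Hermitian A (illustrative assumption)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def quad(M, x):
    """x^H M x, real when M is Hermitian."""
    return np.vdot(x, M @ x).real

def grad_real_part(M, x, h=1e-6):
    """Central differences of x^H M x with respect to each real part x^R_k."""
    E = np.eye(len(x))
    return np.array([(quad(M, x + h * E[k]) - quad(M, x - h * E[k])) / (2 * h)
                     for k in range(len(x))])

xH = x.conj()
print(np.allclose(grad_real_part(A, x), 2 * (xH @ A).real, atol=1e-6))     # formula (2)
print(np.allclose(grad_real_part(np.eye(n), x), 2 * xH.real, atol=1e-6))  # A = I case
```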
Similarly, we obtain
$$
\frac{d({\bf x}^H{\bf x})}{d{\bf x}^R} = 2({\bf x}^H)^R \tag{3}
$$
Substituting (2) and (3) into (1) gives
$$
\frac{dR({\bf x})}{d{\bf x}^R} = 2 \frac{({\bf x}^H{\bf A})^R - R({\bf x})({\bf x}^H)^R}{{\bf x}^H{\bf x}}
$$
By (0), we have
$$
{\bf 0}^T = (\overline{\bf x}^H{\bf A})^R - R(\overline{\bf x})(\overline{\bf x}^H)^R
$$
Now transpose. This uses $({\bf x}^H)^T = {\bf x}^*$, ${\bf A}^T\overline{\bf x}^* = ({\bf A}^H\overline{\bf x})^*$, ${\bf A}^H = {\bf A}$, and the fact that a vector and its conjugate have the same real part:
$$
\begin{aligned}
{\bf 0} & = ((\overline{\bf x}^H{\bf A})^R - R(\overline{\bf x})(\overline{\bf x}^H)^R)^T \\
& = ({\bf A}^T\overline{\bf x}^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\
& = (({\bf A}^H\overline{\bf x})^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\
& = (({\bf A}\overline{\bf x})^*)^R - R(\overline{\bf x})(\overline{\bf x}^*)^R \\
& = ({\bf A}\overline{\bf x})^R - R(\overline{\bf x})(\overline{\bf x})^R
\end{aligned}
$$
Since $R(\overline{\bf x})$ is real, it can be moved inside the real part, so
$$
{\bf 0} = ({\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x})^R \tag{I}
$$
Next consider $dR({\bf x})/d{\bf x}^I$. By the quotient rule,
$$
\begin{aligned}
\frac{dR({\bf x})}{d{\bf x}^I} & = \frac{d}{d{\bf x}^I}\left(\frac{{\bf x}^H{\bf Ax}}{{\bf x}^H{\bf x}}\right) \\[2ex]
& = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I}({\bf x}^H{\bf x}) - {\bf x}^H{\bf Ax} \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^I}}{({\bf x}^H{\bf x})^2} \\[2ex]
& = \frac{\cfrac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} - R({\bf x}) \cfrac{d({\bf x}^H{\bf x})}{d{\bf x}^I}}{{\bf x}^H{\bf x}}
\end{aligned} \tag{4}
$$
By the rules of matrix differentiation, now with $d{\bf x}/d{\bf x}^I = i{\bf I}$ and $d{\bf x}^*/d{\bf x}^I = -i{\bf I}$,
$$
\begin{aligned}
\frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} & = {\bf x}^H{\bf A} \frac{d{\bf x}}{d{\bf x}^I} + {\bf x}^T{\bf A}^T \frac{d{\bf x}^*}{d{\bf x}^I} \\
& = i{\bf x}^H{\bf A} - i{\bf x}^T{\bf A}^T \\
& = i{\bf x}^H{\bf A} - i({\bf x}^H{\bf A}^H)^*
\end{aligned}
$$
Since ${\bf A} = {\bf A}^H$, and $w - w^* = 2iw^I$ holds entry by entry for any complex $w$, we have
$$
\frac{d({\bf x}^H{\bf Ax})}{d{\bf x}^I} = i({\bf x}^H{\bf A} - ({\bf x}^H{\bf A})^*) = i(2i({\bf x}^H{\bf A})^I) = -2({\bf x}^H{\bf A})^I \tag{5}
$$
Similarly,
$$
\frac{d({\bf x}^H{\bf x})}{d{\bf x}^I} = i{\bf x}^H - i{\bf x}^T = i({\bf x}^H - ({\bf x}^H)^*) = i(2i({\bf x}^H)^I) = -2({\bf x}^H)^I \tag{6}
$$
Substituting (5) and (6) into (4) gives
$$
\frac{dR({\bf x})}{d{\bf x}^I} = -2 \frac{({\bf x}^H{\bf A})^I - R({\bf x})({\bf x}^H)^I}{{\bf x}^H{\bf x}}
$$
By (0), we have
$$
{\bf 0}^T = (\overline{\bf x}^H{\bf A})^I - R(\overline{\bf x})(\overline{\bf x}^H)^I
$$
Transposing as before, and using the fact that conjugation flips the sign of the imaginary part, gives
$$
\begin{aligned}
{\bf 0} & = ((\overline{\bf x}^H{\bf A})^I - R(\overline{\bf x})(\overline{\bf x}^H)^I)^T \\
& = ({\bf A}^T\overline{\bf x}^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\
& = (({\bf A}^H\overline{\bf x})^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\
& = (({\bf A}\overline{\bf x})^*)^I - R(\overline{\bf x})(\overline{\bf x}^*)^I \\
& = -({\bf A}\overline{\bf x})^I + R(\overline{\bf x})(\overline{\bf x})^I
\end{aligned}
$$
Since $R(\overline{\bf x})$ is real, multiplying by $-1$ gives
$$
{\bf 0} = ({\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x})^I \tag{II}
$$
By (I) and (II), both the real and the imaginary parts vanish, so
$$
{\bf A}\overline{\bf x} - R(\overline{\bf x})\overline{\bf x} = {\bf 0}
$$
that is, ${\bf A}\overline{\bf x} = R(\overline{\bf x})\overline{\bf x}$. Every critical point $\overline{\bf x}$ is an eigenvector of $\bf A$, and the value $R(\overline{\bf x})$ is the corresponding eigenvalue. This is exactly what we set out to prove.
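As a final check, the sketch below compares the two gradient formulas derived above, for $dR/d{\bf x}^R$ and $dR/d{\bf x}^I$, against central finite differences at a random point. It then confirms that both derivatives vanish at an eigenvector computed by `np.linalg.eigh`, where ${\bf A}\overline{\bf x} = R(\overline{\bf x})\overline{\bf x}$ also holds. The random Hermitian matrix and the step size `h` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                   # random Hermitian A (illustrative assumption)

def R(x):
    """Rayleigh quotient x^H A x / x^H x."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

def grad_formulas(x):
    """dR/dx^R and dR/dx^I from the two formulas derived in the proof."""
    xH, xHx = x.conj(), np.vdot(x, x).real
    gR = 2 * ((xH @ A).real - R(x) * xH.real) / xHx
    gI = -2 * ((xH @ A).imag - R(x) * xH.imag) / xHx
    return gR, gI

def grad_numeric(x, h=1e-6):
    """Central differences with respect to x^R_k and x^I_k."""
    E = np.eye(n)
    gR = [(R(x + h * E[k]) - R(x - h * E[k])) / (2 * h) for k in range(n)]
    gI = [(R(x + 1j * h * E[k]) - R(x - 1j * h * E[k])) / (2 * h) for k in range(n)]
    return np.array(gR), np.array(gI)

# 1) The formulas agree with finite differences at a generic point.
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
(fR, fI), (nR, nI) = grad_formulas(y), grad_numeric(y)
print(np.allclose(fR, nR, atol=1e-6), np.allclose(fI, nI, atol=1e-6))   # True True

# 2) At an eigenvector both derivatives vanish and A x = R(x) x.
lam, V = np.linalg.eigh(A)
x_bar = V[:, 1]
gR, gI = grad_formulas(x_bar)
print(np.allclose(gR, 0), np.allclose(gI, 0))                # True True
print(np.allclose(A @ x_bar, R(x_bar) * x_bar))              # True
```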
References
- https://www.planetmath.org/RayleighRitzTheorem
- Gilbert Strang, Introduction to Linear Algebra, Fifth Edition, Tsinghua University Press