I. The Courant-Fischer Theorem
“In linear algebra and functional analysis, the min-max theorem, or variational theorem, or Courant–Fischer–Weyl min-max principle, is a result that gives a variational characterization of eigenvalues of compact Hermitian operators on Hilbert spaces. It can be viewed as the starting point of many results of similar nature.”
——https://en.wikipedia.org/wiki/Min-max_theorem
In other words, the Courant–Fischer–Weyl min-max principle gives a variational characterization of the eigenvalues of a Hermitian matrix. A derivation of the theorem follows.
1. Definition of a Hermitian matrix $\mathbf H$:
A matrix with the following property is called a Hermitian matrix:
$$\mathbf H=\mathbf H^*, \quad \text{where } \mathbf H^*=\bar{\mathbf H}^T\qquad(1)$$
Here $\mathbf H^*$ is the conjugate transpose of $\mathbf H$; that is, $\mathbf H$ equals its own conjugate transpose. If $\mathbf H$ is a real matrix, then $\mathbf H =\mathbf H^*=\mathbf H^T$, so every real symmetric matrix is Hermitian.
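Definition (1) can be checked numerically entry by entry. The following is a minimal sketch using only the Python standard library; the helper names `conjugate_transpose` and `is_hermitian` and the example matrices are illustrative assumptions, not part of the original text.

```python
def conjugate_transpose(m):
    """Return the conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(m), len(m[0])
    # Entry (j, i) of the result is the complex conjugate of entry (i, j) of m.
    return [[complex(m[i][j]).conjugate() for i in range(rows)] for j in range(cols)]


def is_hermitian(m, tol=1e-12):
    """Check H == H^* entrywise, up to a small tolerance."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False  # a Hermitian matrix must be square
    h_star = conjugate_transpose(m)
    return all(abs(m[i][j] - h_star[i][j]) <= tol for i in range(n) for j in range(n))


H = [[2, 1 - 1j], [1 + 1j, 3]]   # complex Hermitian example
S = [[1.0, 4.0], [4.0, -2.0]]    # real symmetric, hence Hermitian
N = [[1, 1j], [1j, 1]]           # symmetric but not Hermitian (N^T = N, N^* != N)

print(is_hermitian(H), is_hermitian(S), is_hermitian(N))  # True True False
```

The matrix `N` shows why the conjugation matters: a complex symmetric matrix is generally not Hermitian.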
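2. Properties of a Hermitian matrix $\mathbf H$
Property 1:
If $\mathbf A$ is an H-matrix (short for Hermitian matrix), then for any vector $\mathbf x \in \mathbb C^n$, the quadratic form $\mathbf x^*\mathbf A\mathbf x$ is a real number. (For complex $\mathbf x$ the conjugate transpose $\mathbf x^*$ is required; with the plain transpose $\mathbf x^T$ the value is in general not real.)
Property 2:
All eigenvalues of an H-matrix are real.
Property 3:
Eigenvectors of an H-matrix that correspond to distinct eigenvalues are mutually orthogonal.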
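The three properties can be illustrated on a small case. The sketch below uses the closed-form eigen-decomposition of a $2\times 2$ Hermitian matrix $\begin{bmatrix} a & b\\ \bar b & d\end{bmatrix}$; the function names, the example matrix, and the test vector are assumptions made for illustration, and the eigenvector formula assumes $b \neq 0$.

```python
import math


def eig_hermitian_2x2(a, b, d):
    """Eigenpairs of [[a, b], [conj(b), d]] with real a, d and b != 0.

    Returns (lam1, lam2, u1, u2) with lam1 >= lam2 and unit-norm eigenvectors.
    """
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)  # real by construction
    lam1, lam2 = mean + radius, mean - radius
    vecs = []
    for lam in (lam1, lam2):
        v = (b, lam - a)  # satisfies (a - lam) * v0 + b * v1 = 0
        norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        vecs.append((v[0] / norm, v[1] / norm))
    return lam1, lam2, vecs[0], vecs[1]


def quad_form(a, b, d, x):
    """Compute x^* A x for A = [[a, b], [conj(b), d]]."""
    ax0 = a * x[0] + b * x[1]
    ax1 = b.conjugate() * x[0] + d * x[1]
    return x[0].conjugate() * ax0 + x[1].conjugate() * ax1


a, b, d = 2.0, 1 - 1j, 3.0
lam1, lam2, u1, u2 = eig_hermitian_2x2(a, b, d)

q = quad_form(a, b, d, (1 + 2j, -3j))                     # Property 1
inner = u1[0].conjugate() * u2[0] + u1[1].conjugate() * u2[1]  # Property 3

print("x^* A x      :", q)           # imaginary part vanishes up to rounding
print("eigenvalues  :", lam1, lam2)  # Property 2: both are real floats
print("u1^* u2      :", inner)       # approximately zero
```

For this example the eigenvalues work out to 4 and 1 (up to rounding), and the inner product of the two eigenvectors is zero to machine precision.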
Definition of the Rayleigh quotient ($R$), for $\mathbf x \neq \mathbf 0$:
$$R = \frac{\mathbf x^*\mathbf A\mathbf x}{\mathbf x^*\mathbf x}\qquad(2)$$
The Courant-Fischer theorem is concerned with the bounds of the Rayleigh quotient.
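Formula (2) translates directly into code. The following is a small sketch in plain Python; the function name `rayleigh_quotient` and the example matrix and vector are illustrative assumptions.

```python
def rayleigh_quotient(A, x):
    """Return R = (x^* A x) / (x^* x) for a square matrix A and a nonzero vector x."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    num = sum(complex(x[i]).conjugate() * Ax[i] for i in range(n))
    den = sum(abs(x[i]) ** 2 for i in range(n))
    if den == 0:
        raise ValueError("x must be nonzero")
    return num / den


A = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian example matrix
x = [1 + 2j, -3j]

R = rayleigh_quotient(A, x)
R_scaled = rayleigh_quotient(A, [2j * xi for xi in x])  # same direction, rescaled

print(R)         # real part is 19/14, imaginary part vanishes up to rounding
print(R_scaled)  # unchanged: R depends only on the direction of x
```

Because numerator and denominator both scale by $\vert c\vert^2$ when $\mathbf x$ is replaced by $c\mathbf x$, the quotient is invariant under nonzero scaling, which is why the bounds can equivalently be studied on unit vectors.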
Proposition 1:
Let $\mathbf A$ be an $n \times n$ H-matrix. Its eigenvalues are all real; denote them by $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_n$. Then:
$$\left\{ \begin{array}{c} \max_{\mathbf x\neq \mathbf 0}\dfrac{\mathbf x^*\mathbf A \mathbf x}{\mathbf x^*\mathbf x}=\lambda_1\\ \ \\ \min_{\mathbf x\neq \mathbf 0}\dfrac{\mathbf x^*\mathbf A \mathbf x}{\mathbf x^*\mathbf x}=\lambda_n \end{array} \right. \qquad(3)$$
That is, the Rayleigh quotient is bounded above by the largest eigenvalue ($\lambda_1$) and below by the smallest eigenvalue ($\lambda_n$), and both bounds are attained.
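Proposition 1 can be sanity-checked by sampling. The sketch below draws random complex vectors and confirms that every sampled Rayleigh quotient lies in $[\lambda_n, \lambda_1]$; it is a numerical illustration under assumed inputs (a $2\times 2$ example matrix, a fixed random seed, closed-form eigenvalues), not a proof.

```python
import math
import random

A = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian example matrix


def rayleigh(A, x):
    """Real part of (x^* A x) / (x^* x); the imaginary part is zero for Hermitian A."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    num = sum(complex(x[i]).conjugate() * Ax[i] for i in range(n))
    den = sum(abs(x[i]) ** 2 for i in range(n))
    return (num / den).real


# Closed-form eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]].
a, b, d = A[0][0], A[0][1], A[1][1]
mean = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
lam_max, lam_min = mean + radius, mean - radius

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = []
for _ in range(10_000):
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
    samples.append(rayleigh(A, x))

tol = 1e-9
assert all(lam_min - tol <= r <= lam_max + tol for r in samples)

# The bounds are attained at the corresponding (unnormalized) eigenvectors.
u_max = [b, lam_max - a]
u_min = [b, lam_min - a]
print("sampled range:", min(samples), max(samples))
print("at u_max     :", rayleigh(A, u_max), "vs lam_max =", lam_max)
print("at u_min     :", rayleigh(A, u_min), "vs lam_min =", lam_min)
```

Random sampling only approaches the extremes; evaluating at the eigenvectors shows that $\lambda_1$ and $\lambda_n$ are actually reached.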
Proof:
Let $\mathbf \Lambda = \mathrm{diag}(\lambda_1,\lambda_2,\cdots,\lambda_n)$, and let the normalized eigenvectors corresponding to these eigenvalues be $\{ \mathbf u_1,\mathbf u_2,\cdots,\mathbf u_n\}$, i.e. $\Vert \mathbf u_i \Vert=1$ and $\mathbf u_i^*\mathbf u_j=0$ for $i \neq j$ (for a repeated eigenvalue, the eigenvectors can be chosen orthonormal within its eigenspace). Let $\mathbf U =[ \mathbf u_1,\mathbf u_2,\cdots,\mathbf u_n]$; then $\mathbf U$ is unitary, $\mathbf U\mathbf U^*=\mathbf I$, and $\mathbf A = \mathbf U \mathbf \Lambda \mathbf U^*$. Substituting into (2) gives:
$$
\begin{aligned}
R&=\frac{\mathbf x^*\mathbf A \mathbf x}{\mathbf x^*\mathbf x}=\frac{\mathbf x^*\mathbf U\mathbf \Lambda \mathbf U^*\mathbf x}{\mathbf x^*\mathbf I\mathbf x}=\frac{\mathbf x^*\mathbf U\mathbf \Lambda \mathbf U^*\mathbf x}{\mathbf x^*\mathbf U\mathbf U^*\mathbf x}\\
&\text{Let } \mathbf z=\mathbf U^*\mathbf x=[z_i],\quad i\in\{1,\dots,n\}\\
R&=\frac{\mathbf z^*\mathbf \Lambda \mathbf z}{\mathbf z^*\mathbf z}=\frac{\lambda_1\vert z_1\vert^2+\cdots + \lambda_n\vert z_n\vert^2}{\vert z_1\vert^2+\cdots+\vert z_n\vert^2}\le \lambda_1\qquad(4)
\end{aligned}
$$
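As a concrete illustration of the change of variables $\mathbf z=\mathbf U^*\mathbf x$, here is a small worked example. The matrix and the vector are chosen for this illustration and do not come from the original text.

$$
\mathbf A=\begin{bmatrix}2&1\\1&2\end{bmatrix},\quad
\lambda_1=3,\ \lambda_2=1,\quad
\mathbf u_1=\tfrac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},\
\mathbf u_2=\tfrac{1}{\sqrt2}\begin{bmatrix}1\\-1\end{bmatrix}
$$

$$
\mathbf x=\begin{bmatrix}1\\0\end{bmatrix}
\ \Rightarrow\
\mathbf z=\mathbf U^*\mathbf x=\tfrac{1}{\sqrt2}\begin{bmatrix}1\\1\end{bmatrix},\quad
R=\frac{3\cdot\tfrac12+1\cdot\tfrac12}{\tfrac12+\tfrac12}=2\le\lambda_1=3
$$

Computing directly gives the same value, $\mathbf x^*\mathbf A\mathbf x/\mathbf x^*\mathbf x = 2/1 = 2$. The quotient is a weighted average of the eigenvalues with nonnegative weights $\vert z_i\vert^2$, so it cannot exceed $\lambda_1$; choosing $\mathbf x=\mathbf u_1$ gives $\mathbf z=(1,0)^T$ and $R=3$, so the bound is attained.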