考虑线性方程组
A
x
=
b
\mathbf{Ax}=\mathbf{b}
Ax=b
最小二乘法只考虑误差来自
b
\mathbf{b}
b,但是实际上误差也有可能来自
A
\mathbf{A}
A
总体最小二乘法(Total least squares, TLS)就考虑了这一点
KKT
T
L
S
TLS
TLS问题
min
E
,
w
,
x
∥
E
∥
F
2
+
∥
w
∥
2
s.t.
(
A
+
E
)
x
=
b
+
w
E
∈
R
m
×
n
,
w
∈
R
m
\begin{array}{ll} \min\limits_{\mathbf{E}, \mathbf{w}, \mathbf{x}} & \|\mathbf{E}\|_{F}^{2}+\|\mathbf{w}\|^{2} \\ \text { s.t. } & (\mathbf{A}+\mathbf{E}) \mathbf{x}=\mathbf{b}+\mathbf{w} \\ & \mathbf{E} \in \mathbb{R}^{m \times n}, \mathbf{w} \in \mathbb{R}^{m} \end{array}
E,w,xmin s.t. ∥E∥F2+∥w∥2(A+E)x=b+wE∈Rm×n,w∈Rm
因为
F
F
F范数是非凸的,所以考虑用KKT条件,先固定
x
\mathbf{x}
x,得到
T
L
S
′
TLS'
TLS′问题
min
E
,
w
∥
E
∥
F
2
+
∥
w
∥
2
s.t.
(
A
+
E
)
x
=
b
+
w
\begin{array}{ll} \min\limits_{\mathbf{E}, \mathbf{w}} & \|\mathbf{E}\|_{F}^{2}+\|\mathbf{w}\|^{2} \\ \text { s.t. } & (\mathbf{A}+\mathbf{E}) \mathbf{x}=\mathbf{b}+\mathbf{w} \end{array}
E,wmin s.t. ∥E∥F2+∥w∥2(A+E)x=b+w
拉格朗日函数
L
(
E
,
w
,
λ
)
=
∥
E
∥
F
2
+
∥
w
∥
2
+
2
λ
T
(
(
A
+
E
)
x
−
b
−
w
)
L\left(\mathbf{E},\mathbf{w},\mathbf{\lambda}\right)= \|\mathbf{E}\|_{F}^{2}+\|\mathbf{w}\|^{2}+2\mathbf{\lambda}^T\left((\mathbf{A}+\mathbf{E}) \mathbf{x}-\mathbf{b}-\mathbf{w}\right)
L(E,w,λ)=∥E∥F2+∥w∥2+2λT((A+E)x−b−w)
KKT条件
{
∇
E
L
=
2
E
+
2
λ
x
T
=
0
∇
w
L
=
2
w
−
2
λ
=
0
(
A
+
E
)
x
=
b
+
w
\begin{cases} \nabla_{\mathbf{E}}L=2\mathbf{E}+2\mathbf{\lambda}\mathbf{x}^T=0\\ \nabla_{\mathbf{w}}L=2\mathbf{w}-2\mathbf{\lambda}=0\\ (\mathbf{A}+\mathbf{E}) \mathbf{x}=\mathbf{b}+\mathbf{w} \end{cases}
⎩
⎨
⎧∇EL=2E+2λxT=0∇wL=2w−2λ=0(A+E)x=b+w
可以得出
λ
=
w
\mathbf{\lambda}=\mathbf{w}
λ=w
于是
E
=
−
λ
x
T
=
−
w
x
T
\mathbf{E}=-\mathbf{\lambda}\mathbf{x}^T=-\mathbf{w}\mathbf{x}^T
E=−λxT=−wxT
代入最后一个条件
w
=
A
x
−
b
∥
x
∥
2
+
1
E
=
−
(
A
x
−
b
)
x
T
∥
x
∥
2
+
1
\mathbf{w}=\frac{A \mathbf{x}-\mathbf{b}}{\|\mathbf{x}\|^{2}+1}\\ \mathbf{E}=-\frac{(\mathbf{A x}-\mathbf{b}) \mathbf{x}^{T}}{\|\mathbf{x}\|^{2}+1}
w=∥x∥2+1Ax−bE=−∥x∥2+1(Ax−b)xT
代回到
T
L
S
TLS
TLS问题中
min
x
∈
R
n
∥
A
x
−
b
∥
2
∥
x
∥
2
+
1
\min _{\mathbf{x} \in \mathbb{R}^{n}} \frac{\|\mathbf{A x}-\mathbf{b}\|^{2}}{\|\mathbf{x}\|^{2}+1}
x∈Rnmin∥x∥2+1∥Ax−b∥2
x \mathbf{x} x是 T L S ′ TLS' TLS′问题的最优解当且仅当 ( x , E , w ) \left(\mathbf{x},\mathbf{E},\mathbf{w}\right) (x,E,w)是 T L S TLS TLS问题的最优解,其中 E = − ( A x − b ) x T ∥ x ∥ 2 + 1 , w = A x − b ∥ x ∥ 2 + 1 \mathbf{E}=-\frac{(\mathbf{A x}-\mathbf{b}) \mathbf{x}^{T}}{\|\mathbf{x}\|^{2}+1},\mathbf{w}=\frac{A \mathbf{x}-\mathbf{b}}{\|\mathbf{x}\|^{2}+1} E=−∥x∥2+1(Ax−b)xT,w=∥x∥2+1Ax−b
虽然现在的问题比之前简单,但是依然是非凸的
问题等价于
min
x
∈
R
n
∥
A
x
−
t
b
∥
2
∥
x
∥
2
+
t
s.t.
t
=
1
\begin{array}{ll} \min \limits_{\mathbf{x} \in \mathbb{R}^{n}} &\frac{\|\mathbf{A x}-t\mathbf{b}\|^{2}}{\|\mathbf{x}\|^{2}+t} \\ \text { s.t. } &t=1 \end{array}
x∈Rnmin s.t. ∥x∥2+t∥Ax−tb∥2t=1
令
y
=
(
x
t
)
\mathbf{y}=\begin{pmatrix}\mathbf{x}\\t\\\end{pmatrix}
y=(xt)
f
∗
=
min
y
∈
R
n
+
1
y
n
+
1
=
1
y
T
B
y
∥
y
∥
2
f^*=\min\limits_{\mathbf{y}\in\mathbb{R}^{n+1}\atop y_{n+1}=1}\frac{\mathbf{y}^T\mathbf{B}\mathbf{y}}{\|\mathbf{y}\|^2}
f∗=yn+1=1y∈Rn+1min∥y∥2yTBy
其中
B
=
(
A
T
A
−
A
T
b
−
b
T
A
∥
b
∥
2
)
\mathbf{B}=\begin{pmatrix} \mathbf{A}^T\mathbf{A}&-\mathbf{A}^T\mathbf{b}\\ -\mathbf{b}^T\mathbf{A}&\|\mathbf{b}\|^2 \end{pmatrix}
B=(ATA−bTA−ATb∥b∥2)
去掉约束
g
∗
=
min
y
∈
R
n
+
1
y
≠
0
y
T
B
y
∥
y
∥
2
g^*=\min\limits_{\mathbf{y}\in\mathbb{R}^{n+1}\atop \mathbf{y}\neq \mathbf{0}}\frac{\mathbf{y}^T\mathbf{B}\mathbf{y}}{\|\mathbf{y}\|^2}
g∗=y=0y∈Rn+1min∥y∥2yTBy
这是瑞利商问题
g
∗
=
λ
m
i
n
(
B
)
g^*=\lambda_{min}\left(\mathbf{B}\right)
g∗=λmin(B)
如果
y
∗
\mathbf{y}^*
y∗是
g
g
g的最优解,且
y
n
+
1
=
≠
0
y_{n+1}=\neq 0
yn+1==0,则
y
~
=
1
y
n
+
1
∗
y
∗
\tilde{\mathbf{y}}=\frac{1}{y_{n+1}^{*}}\mathbf{y}^*
y~=yn+1∗1y∗也是
f
f
f的最优解
证明:
显然
f
∗
≥
g
∗
f^*\ge g^*
f∗≥g∗
设
y
~
\tilde{\mathbf{y}}
y~是
f
f
f的最优解(
y
~
n
+
1
=
1
\tilde{y}_{n+1}=1
y~n+1=1)
y
~
T
B
y
~
∥
y
~
∥
2
=
1
(
y
n
+
1
∗
)
2
(
y
∗
)
T
B
y
∗
1
(
y
n
+
1
∗
)
2
∥
y
∗
∥
2
=
(
y
∗
)
T
B
y
∗
∥
y
∗
∥
2
\frac{\tilde{\mathbf{y}}^T\mathbf{B}\tilde{\mathbf{y}}}{\|\tilde{\mathbf{y}}\|^2}=\frac{\frac{1}{\left(y_{n+1}^{*}\right)^2}\left(\mathbf{y}^*\right)^T\mathbf{B}\mathbf{y}^*}{\frac{1}{\left(y_{n+1}^{*}\right)^2}\|\mathbf{y}^*\|^2}=\frac{\left(\mathbf{y}^*\right)^T\mathbf{B}\mathbf{y}^*}{\|\mathbf{y}^*\|^2}
∥y~∥2y~TBy~=(yn+1∗)21∥y∗∥2(yn+1∗)21(y∗)TBy∗=∥y∗∥2(y∗)TBy∗
y
~
\tilde{\mathbf{y}}
y~是
f
f
f的最优解,也是
g
g
g的最优解
接下来就是怎么找一个 y n + 1 ≠ 0 y_{n+1}\neq 0 yn+1=0的解了
假设
λ
m
i
n
(
B
)
<
λ
m
i
n
(
A
T
A
)
\lambda_{min}\left(\mathbf{B}\right)<\lambda_{min}\left(\mathbf{A}^T\mathbf{A}\right)
λmin(B)<λmin(ATA)
则
T
L
S
′
TLS'
TLS′问题的最优解为
1
y
n
+
1
v
\frac{1}{y_{n+1}}\mathbf{v}
yn+11v,其中
y
=
(
v
y
n
+
1
)
\mathbf{y}=\begin{pmatrix} \mathbf{v}\\ y_{n+1}\\ \end{pmatrix}
y=(vyn+1)是
B
\mathbf{B}
B的最小特征值对应的特征向量
证明:设
y
∗
\mathbf{y}^*
y∗是
g
g
g的最优解
假设
y
n
+
1
∗
=
0
y_{n+1}^*=0
yn+1∗=0,则
λ
m
i
n
(
B
)
=
(
y
∗
)
T
B
y
∗
∥
y
∗
∥
2
=
v
T
A
T
A
v
∥
v
∥
2
≥
λ
(
A
T
A
)
\lambda_{min}\left(\mathbf{B}\right)=\frac{\left(\mathbf{y}^*\right)^T\mathbf{B}\mathbf{y}^*}{\|\mathbf{y}^*\|^2}=\frac{\mathbf{v}^T\mathbf{A}^T\mathbf{A}\mathbf{v}}{\|\mathbf{v}\|^2}\ge\lambda\left(\mathbf{A}^T\mathbf{A}\right)
λmin(B)=∥y∗∥2(y∗)TBy∗=∥v∥2vTATAv≥λ(ATA)
矛盾
引理1
设
A
\mathbf{A}
A是
n
×
n
n\times n
n×n对称矩阵,特征值为
α
1
≥
⋯
≥
α
n
\alpha_1\ge\cdots\ge \alpha_n
α1≥⋯≥αn
设
B
\mathbf{B}
B为
A
\mathbf{A}
A删掉第
k
k
k行和第
k
k
k列后的矩阵,特征值为
β
1
≥
⋯
≥
β
n
−
1
\beta_1\ge \cdots \ge \beta_{n-1}
β1≥⋯≥βn−1
于是
α
1
≥
β
1
≥
⋯
≥
β
n
−
1
≥
α
n
\alpha_1\ge \beta_1\ge \cdots \ge\beta_{n-1}\ge \alpha_n
α1≥β1≥⋯≥βn−1≥αn
证明:似乎要用Courant-Fischer Minmax Theorem,具体没找到
引理2
设
A
^
=
(
A
,
u
)
∈
R
m
×
n
,
m
≥
n
\hat{\mathbf{A}}=\left(\mathbf{A},\mathbf{u}\right)\in\mathbb{R}^{m\times n},\quad m\ge n
A^=(A,u)∈Rm×n,m≥n
则奇异值
σ
^
1
≥
σ
1
≥
σ
^
2
≥
⋯
≥
σ
^
n
−
1
≥
σ
n
−
1
≥
σ
^
n
\hat{\sigma}_1\ge \sigma_1\ge\hat{\sigma}_2\ge\cdots\ge \hat{\sigma}_{n-1}\ge \sigma_{n-1}\ge \hat{\sigma}_n
σ^1≥σ1≥σ^2≥⋯≥σ^n−1≥σn−1≥σ^n
证明:
A
^
T
A
^
=
(
A
T
A
A
T
u
u
T
A
u
T
u
)
\hat{\mathbf{A}}^T\hat{\mathbf{A}}=\begin{pmatrix} \mathbf{A}^T\mathbf{A}&\mathbf{A}^T\mathbf{u}\\ \mathbf{u}^T\mathbf{A}&\mathbf{u}^T\mathbf{u} \end{pmatrix}
A^TA^=(ATAuTAATuuTu)
删掉最后一行最后一列,然后代入引理1,就成立了
SVD
T
L
S
TLS
TLS问题
min
E
,
w
,
x
∥
E
∥
F
2
+
∥
w
∥
2
s.t.
(
A
+
E
)
x
=
b
+
w
E
∈
R
m
×
n
,
w
∈
R
m
\begin{array}{ll} \min\limits_{\mathbf{E}, \mathbf{w}, \mathbf{x}} & \|\mathbf{E}\|_{F}^{2}+\|\mathbf{w}\|^{2} \\ \text { s.t. } & (\mathbf{A}+\mathbf{E}) \mathbf{x}=\mathbf{b}+\mathbf{w} \\ & \mathbf{E} \in \mathbb{R}^{m \times n}, \mathbf{w} \in \mathbb{R}^{m} \end{array}
E,w,xmin s.t. ∥E∥F2+∥w∥2(A+E)x=b+wE∈Rm×n,w∈Rm
等价于
min
E
~
,
x
∥
E
~
∥
F
2
s.t.
(
A
~
+
E
~
)
(
x
−
1
)
=
0
\begin{array}{ll} \min\limits_{\tilde{\mathbf{E}}, \mathbf{x}} & \|\tilde{\mathbf{E}}\|_{F}^{2} \\ \text { s.t. } & \left(\tilde{\mathbf{A}}+\tilde{\mathbf{E}}\right)\begin{pmatrix} \mathbf{x}\\ -1\\ \end{pmatrix}=0 \end{array}
E~,xmin s.t. ∥E~∥F2(A~+E~)(x−1)=0
其中
E
~
=
(
E
,
w
)
,
A
~
=
(
A
,
b
)
∈
R
m
×
(
n
+
1
)
\tilde{\mathbf{E}}=\left(\mathbf{E},\mathbf{w}\right),\tilde{\mathbf{A}}=\left(\mathbf{A},\mathbf{b}\right)\in \mathbb{R}^{m\times\left(n+1\right)}
E~=(E,w),A~=(A,b)∈Rm×(n+1)
当 rank ( A ~ + E ~ ) < n + 1 \operatorname{rank}\left(\tilde{\mathbf{A}}+\tilde{\mathbf{E}}\right)<n+1 rank(A~+E~)<n+1时有解
(1)当 rank ( A ) < n + 1 \operatorname{rank}\left(\mathbf{A}\right)<n+1 rank(A)<n+1
E
~
=
0
\tilde{\mathbf{E}}=\mathbf{0}
E~=0
(2)当
rank
(
A
)
=
n
+
1
\operatorname{rank}\left(\mathbf{A}\right)=n+1
rank(A)=n+1
且
A
~
\tilde{\mathbf{A}}
A~的奇异值
σ
1
(
A
~
)
≥
σ
2
(
A
~
)
≥
⋯
≥
σ
n
(
A
~
)
>
σ
n
+
1
(
A
~
)
>
0
\sigma_1\left(\tilde{\mathbf{A}}\right)\ge \sigma_2\left(\tilde{\mathbf{A}}\right)\ge\cdots\ge \sigma_{n}\left(\tilde{\mathbf{A}}\right)> \sigma_{n+1}\left(\tilde{\mathbf{A}}\right)>0
σ1(A~)≥σ2(A~)≥⋯≥σn(A~)>σn+1(A~)>0
设
A
~
\tilde{\mathbf{A}}
A~的奇异值分解
A
~
=
∑
i
=
1
n
+
1
σ
i
u
i
v
i
T
\tilde{\mathbf{A}}=\sum_{i=1}^{n+1}\sigma_i\mathbf{u}_i\mathbf{v}_i^T
A~=∑i=1n+1σiuiviT
根据Eckart-Young Theorem,令
E
~
=
−
σ
n
+
1
u
n
+
1
v
n
+
1
T
\tilde{\mathbf{E}}=-\sigma_{n+1}\mathbf{u}_{n+1}\mathbf{v}_{n+1}^T
E~=−σn+1un+1vn+1T
(
A
~
+
E
~
)
y
=
0
⇒
y
=
k
v
n
+
1
\left(\tilde{\mathbf{A}}+\tilde{\mathbf{E}}\right)\mathbf{y}=\mathbf{0}\Rightarrow \mathbf{y}=k\mathbf{v}_{n+1}
(A~+E~)y=0⇒y=kvn+1
x
=
−
1
v
n
+
1
,
n
+
1
(
v
1
,
n
+
1
⋮
v
n
,
n
+
1
)
\mathbf{x}=-\frac{1}{v_{n+1,n+1}}\begin{pmatrix} v_{1,n+1}\\ \vdots\\ v_{n,n+1} \end{pmatrix}
x=−vn+1,n+11⎝
⎛v1,n+1⋮vn,n+1⎠
⎞
(3)当
rank
(
A
)
=
n
+
1
\operatorname{rank}\left(\mathbf{A}\right)=n+1
rank(A)=n+1
且
A
~
\tilde{\mathbf{A}}
A~的奇异值
σ
1
(
A
~
)
≥
σ
2
(
A
~
)
≥
⋯
≥
σ
r
(
A
~
)
>
σ
r
+
1
(
A
~
)
=
⋯
=
σ
n
+
1
(
A
~
)
>
0
\sigma_1\left(\tilde{\mathbf{A}}\right)\ge \sigma_2\left(\tilde{\mathbf{A}}\right)\ge\cdots\ge \sigma_{r}\left(\tilde{\mathbf{A}}\right)> \sigma_{r+1}\left(\tilde{\mathbf{A}}\right)=\cdots= \sigma_{n+1}\left(\tilde{\mathbf{A}}\right)>0
σ1(A~)≥σ2(A~)≥⋯≥σr(A~)>σr+1(A~)=⋯=σn+1(A~)>0
参考:
Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB
Ake Bjõrck - Numerical Methods for Least Squares Problems-SIAM
Charles L. Lawson, Richard J. Hanson - Solving Least Squares Problems-Society for Industrial Mathematics (1987)
the late J. H. Wilkinson - The algebraic eigenvalue problem-Clarendon Press_ Oxford University Press (1988)