Abstract: This post summarizes the key concepts related to least squares from Chapter 3 of the textbook.
3.1 The Discrete Least Squares Problem
Problem statement: A task that occurs frequently in scientific investigations is that of finding a straight line that “fits” some set of data points.
$Ax = b$, where $A \in \mathbb{R}^{n \times m}$, $x \in \mathbb{R}^m$, $b \in \mathbb{R}^n$, $n > m$ (overdetermined system).
We want to minimize the residual:
$\|r\|_2 = \|b - Ax\|_2$
3.1.1 Statistical Justification for the 2-Norm in Least Squares
The choice of the 2-norm can be justified on statistical grounds. Suppose the data fail to lie on a straight line because of errors in the measured $y_i$. If the errors are independent and normally distributed with mean zero and variance $\sigma^2$, then the solution of the least squares problem is the maximum likelihood estimator of the true solution. That is, the least squares solution is the maximum likelihood estimate.
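The reasoning behind this claim can be sketched in two lines. The notation $r_i = b_i - (Ax)_i$ for the $i$-th residual and $L(x)$ for the likelihood is an assumption introduced here for illustration, with $b_i$ playing the role of the measured $y_i$:

```latex
\begin{aligned}
L(x) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\!\left(-\frac{r_i^2}{2\sigma^2}\right),
        \qquad r_i = b_i - (Ax)_i, \\
\log L(x) &= -\frac{n}{2}\log(2\pi\sigma^2)
             - \frac{1}{2\sigma^2}\sum_{i=1}^{n} r_i^2
           = -\frac{n}{2}\log(2\pi\sigma^2)
             - \frac{1}{2\sigma^2}\,\|b - Ax\|_2^2 .
\end{aligned}
```

The first term does not depend on $x$, so maximizing $\log L(x)$ is the same as minimizing $\|b - Ax\|_2^2$, which is exactly the least squares problem.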
3.2 ORTHOGONAL MATRICES, ROTATORS, AND REFLECTORS
3.2.1 Orthogonal Matrices
Def: A matrix $Q \in \mathbb{R}^{n \times n}$ is said to be orthogonal if $QQ^T = I$. This equation says that $Q$ has an inverse, and
$Q^{-1} = Q^T$.
Orthogonal transformations preserve lengths and angles.
- (a) $\langle Qx, Qy \rangle = \langle x, y \rangle$
- (b) $\|Qx\|_2 = \|x\|_2$
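There are two kinds of orthogonal transformations: rotators and reflectors.
- A rotator performs a rotation.
- A reflector performs a mirror reflection across a line (a hyperplane in higher dimensions).
- Both are used to align a given column vector with a coordinate axis.
- All matrix computations built upon rotators and reflectors are normwise backward stable.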
3.2.2 Rotators
Theorem 3.2.20
Let $A \in \mathbb{R}^{n \times n}$. Then there exist an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A = QR$. That is, every square matrix has a QR decomposition.
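One way to see why such a decomposition always exists is to build it with rotators: each plane rotation zeroes one entry below the diagonal. The following is a minimal sketch in plain Python, not the textbook's algorithm verbatim; the function names `matmul` and `givens_qr` are hypothetical, and entries below the diagonal of the computed $R$ are only zero up to rounding.

```python
import math


def matmul(X, Y):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def givens_qr(A):
    """Return (Q, R) with A = Q R for a square A, using plane rotators."""
    n = len(A)
    R = [[float(v) for v in row] for row in A]
    # Qt accumulates the product of all rotators, so that Qt * A == R.
    Qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n - 1):           # column being cleared
        for i in range(j + 1, n):    # entry (i, j) to be zeroed
            a, b = R[j][j], R[i][j]
            r = math.hypot(a, b)     # sqrt(a^2 + b^2), computed safely
            if r == 0.0:
                continue             # both entries are zero: nothing to do
            c, s = a / r, b / r
            for M in (R, Qt):        # rotate rows j and i of both matrices
                for k in range(n):
                    top, bot = M[j][k], M[i][k]
                    M[j][k] = c * top + s * bot
                    M[i][k] = -s * top + c * bot
    Q = [list(col) for col in zip(*Qt)]  # Q is the transpose of Qt
    return Q, R
```

Calling `givens_qr` on any square list-of-rows matrix returns factors whose product `matmul(Q, R)` reproduces the input up to rounding error.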
3.2.3 Reflectors
Theorem 3.2.23
Let $u \in \mathbb{R}^n$ with $\|u\|_2 = 1$, and define $P \in \mathbb{R}^{n \times n}$ by $P = uu^T$. Then
- (a) $Pu = u$
- (b) $Pv = 0$ if $\langle u, v \rangle = 0$
- (c) $P^2 = P$
- (d) $P^T = P$
- (e) $P = uu^T$ has rank 1, since its range consists of multiples of $u$
Theorem 3.2.26
Let $u \in \mathbb{R}^n$ with $\|u\|_2 = 1$, and define $Q \in \mathbb{R}^{n \times n}$ by $Q = I - 2uu^T$. Then
- (a) $Qu = -u$
- (b) $Qv = v$ if $\langle u, v \rangle = 0$
- (c) $Q = Q^T$ ($Q$ is symmetric)
- (d) $Q^T = Q^{-1}$ ($Q$ is orthogonal)
- (e) $Q^{-1} = Q$ ($Q$ is an involution)
Matrices $Q = I - 2uu^T$ ($\|u\|_2 = 1$) are called reflectors or Householder transformations.
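The properties above can be checked numerically, and the same construction shows how a reflector aligns a vector with the first coordinate axis. This is a small sketch under stated assumptions: the helper names `householder_unit_vector` and `reflect` are hypothetical, and the sign choice for `alpha` is one common convention for avoiding cancellation, not necessarily the one used in the textbook.

```python
import math


def householder_unit_vector(x):
    """Return a unit vector u such that (I - 2 u u^T) x is a multiple of e_1."""
    norm_x = math.sqrt(sum(v * v for v in x))
    # Give alpha the opposite sign of x[0] so that x[0] - alpha does not cancel.
    alpha = -math.copysign(norm_x, x[0])
    v = list(x)
    v[0] -= alpha                     # v = x - alpha * e_1
    norm_v = math.sqrt(sum(t * t for t in v))
    if norm_v == 0.0:
        raise ValueError("x is the zero vector; no reflector is needed")
    return [t / norm_v for t in v]


def reflect(u, y):
    """Apply Q = I - 2 u u^T to y without forming Q explicitly."""
    dot = sum(ui * yi for ui, yi in zip(u, y))
    return [yi - 2.0 * dot * ui for ui, yi in zip(u, y)]
```

For example, with `x = [3.0, 4.0]` the reflector built from `householder_unit_vector(x)` maps `x` to a vector whose second component vanishes up to rounding and whose length is still 5, in line with property (d).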
How to avoid overflow and underflow
Since squaring doubles the exponents, an overflow can occur if some of the entries are very large. Likewise, underflow can occur if some of the entries are very small. Obviously we must avoid overflows; underflows can also occasionally be dangerous.
The data therefore need to be scaled; a common approach is to divide all entries by the largest absolute value. (Page 199)
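The scaling idea can be illustrated with the 2-norm of a vector. The sketch below assumes a non-empty list of Python floats (IEEE double precision); the function names are hypothetical.

```python
import math


def naive_norm2(x):
    """Square, sum, take the root: overflows or underflows for extreme entries."""
    return math.sqrt(sum(v * v for v in x))


def scaled_norm2(x):
    """Divide by the largest absolute value first, so every square is at most 1."""
    scale = max(abs(v) for v in x)
    if scale == 0.0:
        return 0.0                    # all entries are zero
    total = 0.0
    for v in x:
        t = v / scale                 # |t| <= 1, so t * t cannot overflow
        total += t * t
    return scale * math.sqrt(total)
```

With entries around `1e200` the naive version returns infinity and with entries around `1e-200` it returns zero, while the scaled version returns a finite, nonzero result in both cases.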
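The order of matrix multiplication changes the operation count
The amount of work required to compute $uv^TB$ depends dramatically upon the order in which the operations are performed. Suppose that $u \in \mathbb{R}^n$, $v \in \mathbb{R}^n$, and $B \in \mathbb{R}^{n \times m}$.
- (a) $(uv^T)B$ costs about $2n^2m$ flops
- (b) $u(v^TB)$ costs about $3nm$ flops
- Therefore $Q$ should be stored in the form $Q = I - \gamma uu^T$ (with $\gamma = 2/\|u\|_2^2$) and applied using ordering (b) when computing $QB$.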
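The two orderings can be compared directly. The sketch below is illustrative only; `apply_reflector` uses ordering (b) and never forms an $n \times n$ matrix, while `apply_reflector_explicit` builds $Q$ first, as in ordering (a). Both function names are hypothetical.

```python
def apply_reflector(gamma, u, B):
    """Compute (I - gamma u u^T) B as B - gamma * u * (u^T B)."""
    n, m = len(B), len(B[0])
    # w^T = u^T B: one pass over B, about 2nm flops.
    w = [sum(u[i] * B[i][j] for i in range(n)) for j in range(m)]
    # Rank-one update of B: about 2nm more flops (nm if gamma is folded into u).
    return [[B[i][j] - gamma * u[i] * w[j] for j in range(m)] for i in range(n)]


def apply_reflector_explicit(gamma, u, B):
    """Form Q = I - gamma u u^T explicitly, then multiply: about 2 n^2 m flops."""
    n, m = len(B), len(B[0])
    Q = [[(1.0 if i == k else 0.0) - gamma * u[i] * u[k] for k in range(n)]
         for i in range(n)]
    return [[sum(Q[i][k] * B[k][j] for k in range(n)) for j in range(m)]
            for i in range(n)]
```

Both functions return the same matrix up to rounding; the difference is only in cost and storage, which grows quickly with $n$ for the explicit version.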
Uniqueness of the QR Decomposition
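Theorem 3.2.46 Let $A \in \mathbb{R}^{n \times n}$ be nonsingular. There exist unique $Q, R \in \mathbb{R}^{n \times n}$ such that $Q$ is orthogonal,
$R$ is upper triangular with positive main-diagonal entries, and $A = QR$. That is, when $A$ has full rank, the QR decomposition is unique (provided the diagonal entries of $R$ are all required to be positive).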
3.3 Solution of the Least Squares Problem
Theorem 3.3.12
Let $A \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$, $n > m$. Then the least squares problem for the overdetermined system $Ax = b$ always has a solution. If $\mathrm{rank}(A) < m$, there are infinitely many solutions.
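A tiny rank-deficient example makes the last statement concrete. The data below are made up for illustration: both columns of `A` are identical, so only the sum $x_1 + x_2$ affects $Ax$, and every $x$ with $x_1 + x_2 = 2$ (the mean of `b`) attains the same minimal residual.

```python
import math


def residual_norm(A, b, x):
    """Return ||b - A x||_2 for A stored as a list of rows."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    return math.sqrt(sum(ri * ri for ri in r))


# Hypothetical data: rank(A) = 1 < m = 2.
A = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]

# Three different minimizers, all with x1 + x2 = 2.
minimizers = [[2.0, 0.0], [0.0, 2.0], [5.0, -3.0]]
norms = [residual_norm(A, b, x) for x in minimizers]
```

All entries of `norms` agree, while any `x` whose components do not sum to 2 gives a strictly larger residual.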
3.4 THE GRAM-SCHMIDT PROCESS
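3.4.1 Theorem 3.4.2
Let $Q \in \mathbb{R}^{n \times n}$. Then $Q$ is an orthogonal matrix if and only if its columns (rows) form an orthonormal set.
3.4.2 Gram-Schmidt orthogonalization is equivalent to the QR decomposition: applied to the columns of $A$, it produces the orthonormal columns of $Q$, and the orthogonalization coefficients are the entries of $R$.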
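The correspondence between Gram-Schmidt and QR can be sketched as follows. This is the classical Gram-Schmidt variant, written for clarity rather than numerical robustness, and it assumes the input columns are linearly independent; the function name `gram_schmidt_qr` is hypothetical.

```python
import math


def gram_schmidt_qr(columns):
    """Classical Gram-Schmidt on a list of column vectors.

    Returns (q_cols, R): orthonormal columns q_1..q_m and an upper triangular
    R with positive diagonal such that column k equals sum_j R[j][k] * q_j.
    """
    m = len(columns)
    q_cols = []
    R = [[0.0] * m for _ in range(m)]
    for k, v in enumerate(columns):
        w = [float(t) for t in v]
        for j, q in enumerate(q_cols):
            # r_jk = <v_k, q_j>: component of the original column along q_j.
            R[j][k] = sum(qi * vi for qi, vi in zip(q, v))
            w = [wi - R[j][k] * qi for wi, qi in zip(w, q)]
        R[k][k] = math.sqrt(sum(wi * wi for wi in w))
        if R[k][k] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([wi / R[k][k] for wi in w])
    return q_cols, R
```

The returned `R` is upper triangular with positive diagonal entries, which is the normalization that makes the QR decomposition of a nonsingular matrix unique.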
3.5 GEOMETRIC APPROACH TO THE LEAST SQUARES PROBLEM
3.5.1 Definitions
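- Orthogonal complement
The orthogonal complement of $S$, denoted $S^\perp$, is defined to be the set of vectors in $\mathbb{R}^n$ that are orthogonal to $S$. That is,
$S^\perp = \{x \in \mathbb{R}^n \mid \langle x, y \rangle = 0 \text{ for all } y \in S\}$
- Null space (kernel)
$N(A) = \{x \in \mathbb{R}^m \mid Ax = 0\}$.
That is, the set of vectors $x$ for which $Ax$ equals zero.
- Range
$R(A) = \{Ax \mid x \in \mathbb{R}^m\}$.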
3.5.2 Theorem 3.5.3
Let $S$ be any subspace of $\mathbb{R}^n$. Then for every $x \in \mathbb{R}^n$, there exist unique elements $s \in S$ and $s^\perp \in S^\perp$ for which $x = s + s^\perp$.
3.5.3 Normal Equation
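Corollary 3.5.20:
Let $x \in \mathbb{R}^m$. Then $\|b - Ax\|_2 = \min_{w \in \mathbb{R}^m} \|b - Aw\|_2$ if and only if $b - Ax \in R(A)^\perp$.
That is, $Ax$ should be the orthogonal projection of $b$ onto $R(A)$.
From Corollary 3.5.20 and $R(A)^\perp = N(A^T)$, the normal equations follow easily: $b - Ax \in N(A^T)$ means $A^T(b - Ax) = 0$.
Let $x \in \mathbb{R}^m$. Then $x$ solves the least squares problem for the system $Ax = b$ if and only if
$A^TAx = A^Tb$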
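As a worked instance, the normal equations for a straight-line fit reduce to a $2 \times 2$ system that can be solved in closed form. In this sketch the rows of $A$ are taken to be $(1, t_i)$ and $b$ holds the measurements $y_i$; the function names and the sample data in the usage note are assumptions made for illustration.

```python
def fit_line_normal_equations(t, y):
    """Fit y ~ intercept + slope * t by solving the 2-by-2 normal equations."""
    n = len(t)
    s_t = sum(t)
    s_tt = sum(ti * ti for ti in t)
    s_y = sum(y)
    s_ty = sum(ti * yi for ti, yi in zip(t, y))
    # A^T A = [[n, s_t], [s_t, s_tt]] and A^T b = [s_y, s_ty].
    det = n * s_tt - s_t * s_t
    if det <= 0.0:
        raise ValueError("A^T A is not positive definite (are all t equal?)")
    intercept = (s_tt * s_y - s_t * s_ty) / det
    slope = (n * s_ty - s_t * s_y) / det
    return intercept, slope


def normal_equation_residual(t, y, intercept, slope):
    """Return A^T (b - A x), which should be close to the zero vector."""
    r = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    return [sum(r), sum(ti * ri for ti, ri in zip(t, r))]
```

For the made-up data `t = [0, 1, 2, 3]` and `y = [1.1, 2.9, 5.2, 6.8]`, the fit gives an intercept of about 1.09 and a slope of about 1.94, and `normal_equation_residual` returns values at rounding level, confirming that the residual is orthogonal to both columns of $A$.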
3.5.4 The coefficient matrix of the normal equations is positive semidefinite. If rank(A)=m , then ATA is positive definite.
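The reflectors and rotators used in the QR decomposition are backward stable, whereas the floating-point errors introduced when forming $A^TA$ for the normal equations can cause the computed $A^TA$ to fail to be positive definite, unless $A$ is well conditioned (its condition number is small).
Proof (very simple):
$x^T(A^TA)x = x^TA^TAx = (Ax)^T(Ax)$
Let $u = Ax$; the expression above becomes $u^Tu$. Clearly $u^Tu \ge 0$, with equality only when $u = 0$. Moreover, if $A$ has full rank ($\mathrm{rank}(A) = m$), then $Ax = 0$ only when $x = 0$, so equality holds only for $x = 0$.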
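The loss of definiteness in floating point can be reproduced with a standard small example (often attributed to Läuchli), which is brought in here for illustration and is not taken from the notes. The matrix has full column rank for every nonzero `eps`, yet for `eps = 1e-8` the computed $A^TA$ is exactly singular in IEEE double precision. The function names are hypothetical.

```python
def gram_matrix(A):
    """Return the computed A^T A for A stored as a list of rows."""
    m = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(m)]
            for i in range(m)]


def lauchli(eps):
    """3-by-2 matrix with nearly parallel columns; rank 2 whenever eps != 0."""
    return [[1.0, 1.0], [eps, 0.0], [0.0, eps]]


def det2(G):
    """Determinant of a 2-by-2 matrix; positive for a positive definite G."""
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]
```

In exact arithmetic the diagonal entries of $A^TA$ are $1 + \epsilon^2$, but `1.0 + 1e-16` rounds to `1.0`, so `det2(gram_matrix(lauchli(1e-8)))` evaluates to zero. With a larger value such as `eps = 1e-3` the determinant stays positive, consistent with the remark that the normal equations are safe only when $A$ is well conditioned.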
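3.5 Other Topics
- The Continuous Least Squares Problem
- Updating the QR Decomposition
That is, when a row or a column is successively appended to the matrix $A$, how to compute the current QR decomposition from the previous one.
- Methods for solving least squares problems
  - QR decomposition
  - Normal equations
  - LM (Levenberg-Marquardt)
  - Gradient descent, as used in machine learning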