These are reading notes for *Linear Algebra and Its Applications*.
Least-Squares Problems
- Inconsistent systems arise often in applications. When a solution is demanded and none exists, the best one can do is to find an $\boldsymbol x$ that makes $A\boldsymbol x$ as close as possible to $\boldsymbol b$. Think of $A\boldsymbol x$ as an *approximation* to $\boldsymbol b$. The smaller the distance between $\boldsymbol b$ and $A\boldsymbol x$, given by $\left\|\boldsymbol b - A\boldsymbol x\right\|$, the better the approximation.
- The general least-squares problem is to find an $\boldsymbol x$ that makes $\left\|\boldsymbol b - A\boldsymbol x\right\|$ as small as possible. $\left\|\boldsymbol b - A\boldsymbol x\right\|$ is called the least-squares error of this approximation. The adjective "least-squares" arises from the fact that $\left\|\boldsymbol b - A\boldsymbol x\right\|$ is the square root of a sum of squares.
- Notice that $A\boldsymbol x$ will necessarily be in $\mathrm{Col}\,A$. So we seek an $\boldsymbol x$ that makes $A\boldsymbol x$ the closest point in $\mathrm{Col}\,A$ to $\boldsymbol b$. See Figure 1.
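As a quick numerical illustration of "closest point" (the data below is made up for the demo, not taken from the book's figures), `numpy.linalg.lstsq` returns the $\hat{\boldsymbol x}$ minimizing $\|\boldsymbol b - A\boldsymbol x\|$, and no other choice of $\boldsymbol x$ gives a smaller residual:

```python
import numpy as np

# A small inconsistent system (illustrative data, not from the book).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# np.linalg.lstsq returns the x minimizing ||b - Ax||.
x_hat, residual_ss, rank, sv = np.linalg.lstsq(A, b, rcond=None)

best = np.linalg.norm(b - A @ x_hat)                   # least-squares error
other = np.linalg.norm(b - A @ np.array([1.0, 1.0]))   # any other x does no better
```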
Solution of the General Least-Squares Problem
$$A^TA\boldsymbol x=A^T\boldsymbol b$$
- Apply the Best Approximation Theorem in Section 6.3 to the subspace $\mathrm{Col}\,A$. Let
$$\hat{\boldsymbol b}=\mathrm{proj}_{\mathrm{Col}\,A}\,\boldsymbol b$$
Because $\hat{\boldsymbol b}$ is in $\mathrm{Col}\,A$, the equation $A\boldsymbol x=\hat{\boldsymbol b}$ is consistent, and there is an $\hat{\boldsymbol x}$ in $\mathbb R^n$ such that
$$A\hat{\boldsymbol x}=\hat{\boldsymbol b}\qquad(1)$$
Since $\hat{\boldsymbol b}$ is the closest point in $\mathrm{Col}\,A$ to $\boldsymbol b$, a vector $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x=\boldsymbol b$ if and only if $\hat{\boldsymbol x}$ satisfies (1). See Figure 2. [There are many solutions of (1) if the equation has free variables.]
- Suppose $\hat{\boldsymbol x}$ satisfies $A\hat{\boldsymbol x}=\hat{\boldsymbol b}$. By the Orthogonal Decomposition Theorem in Section 6.3, $\boldsymbol b-\hat{\boldsymbol b}$ is orthogonal to $\mathrm{Col}\,A$, so $\boldsymbol b-A\hat{\boldsymbol x}$ is orthogonal to each column of $A$. If $\boldsymbol a_j$ is any column of $A$, then $\boldsymbol a_j\cdot(\boldsymbol b-A\hat{\boldsymbol x})=0$, and $\boldsymbol a^T_j(\boldsymbol b-A\hat{\boldsymbol x})=0$. Since each $\boldsymbol a^T_j$ is a row of $A^T$,
$$A^T(\boldsymbol b-A\hat{\boldsymbol x})=\boldsymbol 0\qquad(2)$$
(This equation also follows from $(\mathrm{Col}\,A)^\perp=\mathrm{Nul}\,A^T$.) Thus
$$\begin{aligned}A^T\boldsymbol b-A^TA\hat{\boldsymbol x}&=\boldsymbol 0\\ A^TA\hat{\boldsymbol x}&=A^T\boldsymbol b\end{aligned}$$
These calculations show that each least-squares solution of $A\boldsymbol x=\boldsymbol b$ satisfies the equation
$$A^TA\boldsymbol x=A^T\boldsymbol b\qquad(3)$$
This system of equations is called the normal equations for $A\boldsymbol x=\boldsymbol b$. A solution of (3) is often denoted by $\hat{\boldsymbol x}$.
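The normal equations can be solved directly with a linear solver. A minimal sketch (the $A$ and $\boldsymbol b$ below are illustrative, chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

# Solve the normal equations A^T A x = A^T b (illustrative data).
A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# A^T A = [[17, 1], [1, 5]],  A^T b = [19, 11]  ->  x_hat = [1, 2]
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Equation (2): the residual b - A x_hat is orthogonal to every column of A.
residual_check = A.T @ (b - A @ x_hat)
```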
- The next theorem gives useful criteria for determining when there is only one least-squares solution of $A\boldsymbol x=\boldsymbol b$. (Of course, the orthogonal projection $\hat{\boldsymbol b}$ is always unique.)
Formula (4), $\hat{\boldsymbol x}=(A^TA)^{-1}A^T\boldsymbol b$, for $\hat{\boldsymbol x}$ is useful mainly for theoretical purposes and for hand calculations when $A^TA$ is a $2\times 2$ invertible matrix.
PROOF
- If $A\boldsymbol x=\boldsymbol 0$, then $A^TA\boldsymbol x=\boldsymbol 0$, so $\mathrm{Nul}\,A\subseteq\mathrm{Nul}\,A^TA$.
- If $A^TA\boldsymbol x=\boldsymbol 0$, then $\boldsymbol x^TA^TA\boldsymbol x=(A\boldsymbol x)^TA\boldsymbol x=0$, i.e. $\|A\boldsymbol x\|^2=0$, so $A\boldsymbol x=\boldsymbol 0$ and $\mathrm{Nul}\,A^TA\subseteq\mathrm{Nul}\,A$. Therefore
$$\mathrm{Nul}\,A=\mathrm{Nul}\,A^TA,\qquad \mathrm{rank}\,A^TA=n-\dim\mathrm{Nul}\,A^TA=n-\dim\mathrm{Nul}\,A=\mathrm{rank}\,A$$
So when $A$ has $n$ linearly independent columns, $\mathrm{rank}\,A^TA=\mathrm{rank}\,A=n$, which means $A^TA$ is an invertible matrix.
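The proof's conclusion, and formula (4), can be sanity-checked numerically (a random full-column-rank matrix is used here as an illustration, not a proof):

```python
import numpy as np

# rank(A^T A) = rank(A), checked on a random tall matrix whose columns are
# linearly independent almost surely.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

rank_A = np.linalg.matrix_rank(A)
rank_AtA = np.linalg.matrix_rank(A.T @ A)

# Since A^T A is invertible here, formula (4) gives the unique least-squares
# solution, agreeing with a general-purpose least-squares routine.
x_hat = np.linalg.inv(A.T @ A) @ A.T @ b
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```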
Finding the Least-Squares Solution Quickly When the Columns of $A$ Are Orthogonal
- The next example shows how to find a least-squares solution of $A\boldsymbol x=\boldsymbol b$ when the columns of $A$ are orthogonal. Such matrices often appear in linear regression problems.
EXAMPLE 4
- Find a least-squares solution of $A\boldsymbol x=\boldsymbol b$ for the given $A$ and $\boldsymbol b$.
SOLUTION
- Because the columns $\boldsymbol a_1$ and $\boldsymbol a_2$ of $A$ are orthogonal, the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$ is given by
$$\hat{\boldsymbol b}=\frac{\boldsymbol b\cdot\boldsymbol a_1}{\boldsymbol a_1\cdot\boldsymbol a_1}\boldsymbol a_1+\frac{\boldsymbol b\cdot\boldsymbol a_2}{\boldsymbol a_2\cdot\boldsymbol a_2}\boldsymbol a_2\qquad(5)$$
Now that $\hat{\boldsymbol b}$ is known, we can solve $A\hat{\boldsymbol x}=\hat{\boldsymbol b}$. But this is trivial, since we already know what weights to place on the columns of $A$ to produce $\hat{\boldsymbol b}$. It is clear from (5) that
$$\hat{\boldsymbol x}=\begin{pmatrix}(\boldsymbol b\cdot\boldsymbol a_1)/(\boldsymbol a_1\cdot\boldsymbol a_1)\\(\boldsymbol b\cdot\boldsymbol a_2)/(\boldsymbol a_2\cdot\boldsymbol a_2)\end{pmatrix}$$
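The orthogonal-columns shortcut can be sketched as follows (the vectors below are illustrative; the example's actual $A$ and $\boldsymbol b$ appear in the book):

```python
import numpy as np

# When the columns of A are orthogonal, the projection of b onto Col A
# decomposes column by column, so each weight is (a_j . b)/(a_j . a_j).
a1 = np.array([1.0, 1.0, 1.0, 1.0])
a2 = np.array([-3.0, -1.0, 1.0, 3.0])   # orthogonal to a1
A = np.column_stack([a1, a2])
b = np.array([1.0, 2.0, 4.0, 5.0])

x_hat = np.array([a1 @ b / (a1 @ a1),
                  a2 @ b / (a2 @ a2)])

# Same answer as the general least-squares routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```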
- In some cases, the normal equations for a least-squares problem can be *ill-conditioned*; that is, small errors in the calculation of the entries of $A^TA$ can sometimes cause relatively large errors in the solution $\hat{\boldsymbol x}$.
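One reason for this (a standard fact, illustrated below on made-up data) is that forming $A^TA$ squares the 2-norm condition number: $\mathrm{cond}(A^TA)=\mathrm{cond}(A)^2$, so a mildly ill-conditioned $A$ yields a much worse-conditioned $A^TA$:

```python
import numpy as np

# A is only mildly ill-conditioned, but A^T A is far worse:
# cond(A^T A) = cond(A)^2 in the 2-norm.
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)
```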
Using a QR Factorization
- If the columns of $A$ are linearly independent, the least-squares solution can often be computed more reliably through a $QR$ factorization of $A$ (described in Section 6.4).
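A minimal sketch of the QR route (reusing the illustrative $A$, $\boldsymbol b$ from the normal-equations example above; the data is not from the book): with $A=QR$, $Q^TQ=I$ and $R$ upper triangular, the normal equations reduce to $R\boldsymbol x=Q^T\boldsymbol b$, which avoids forming $A^TA$ at all.

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# Reduced QR factorization: Q is 3x2 with orthonormal columns, R is 2x2
# upper triangular. Then A^T A x = A^T b simplifies to R x = Q^T b.
Q, R = np.linalg.qr(A)
x_hat = np.linalg.solve(R, Q.T @ b)
```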