These are reading notes for *Linear Algebra and Its Applications*.
Least-Squares Problems
- Inconsistent systems arise often in applications. When a solution is demanded and none exists, the best one can do is to find an $\boldsymbol x$ that makes $A\boldsymbol x$ as close as possible to $\boldsymbol b$. Think of $A\boldsymbol x$ as an *approximation* to $\boldsymbol b$. The smaller the distance between $\boldsymbol b$ and $A\boldsymbol x$, given by $\left\|\boldsymbol b - A\boldsymbol x\right\|$, the better the approximation.
- The general least-squares problem is to find an $\boldsymbol x$ that makes $\left\|\boldsymbol b - A\boldsymbol x\right\|$ as small as possible. $\left\|\boldsymbol b - A\boldsymbol x\right\|$ is called the least-squares error of this approximation. The adjective "least-squares" arises from the fact that $\left\|\boldsymbol b - A\boldsymbol x\right\|$ is the square root of a sum of squares.
- Notice that $A\boldsymbol x$ will necessarily be in $\mathrm{Col}\,A$. So we seek an $\boldsymbol x$ that makes $A\boldsymbol x$ the closest point in $\mathrm{Col}\,A$ to $\boldsymbol b$. See Figure 1.
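As a quick numerical illustration of "closest point" (the data below is made up for the demo, not taken from the book's figures), `numpy.linalg.lstsq` returns the $\hat{\boldsymbol x}$ minimizing $\|\boldsymbol b - A\boldsymbol x\|$, and no other choice of $\boldsymbol x$ gives a smaller residual:

```python
import numpy as np

# A small inconsistent system (illustrative data, not from the book).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# np.linalg.lstsq returns the x minimizing ||b - Ax||.
x_hat, residual_ss, rank, sv = np.linalg.lstsq(A, b, rcond=None)

best = np.linalg.norm(b - A @ x_hat)                   # least-squares error
other = np.linalg.norm(b - A @ np.array([1.0, 1.0]))   # any other x does no better
```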
Solution of the General Least-Squares Problem
$$A^TA\boldsymbol x=A^T\boldsymbol b$$
- Apply the Best Approximation Theorem in Section 6.3 to the subspace $\mathrm{Col}\,A$. Let
$$\hat{\boldsymbol b}=\mathrm{proj}_{\mathrm{Col}\,A}\,\boldsymbol b$$
Because $\hat{\boldsymbol b}$ is in $\mathrm{Col}\,A$, the equation $A\boldsymbol x=\hat{\boldsymbol b}$ is consistent, and there is an $\hat{\boldsymbol x}$ in $\mathbb R^n$ such that
$$A\hat{\boldsymbol x}=\hat{\boldsymbol b}\qquad(1)$$
Since $\hat{\boldsymbol b}$ is the closest point in $\mathrm{Col}\,A$ to $\boldsymbol b$, a vector $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x=\boldsymbol b$ if and only if $\hat{\boldsymbol x}$ satisfies (1). See Figure 2. [There are many solutions of (1) if the equation has free variables.]
- Suppose $\hat{\boldsymbol x}$ satisfies $A\hat{\boldsymbol x}=\hat{\boldsymbol b}$. By the Orthogonal Decomposition Theorem in Section 6.3, $\boldsymbol b-\hat{\boldsymbol b}$ is orthogonal to $\mathrm{Col}\,A$, so $\boldsymbol b-A\hat{\boldsymbol x}$ is orthogonal to each column of $A$. If $\boldsymbol a_j$ is any column of $A$, then $\boldsymbol a_j\cdot(\boldsymbol b-A\hat{\boldsymbol x})=0$, and $\boldsymbol a^T_j(\boldsymbol b-A\hat{\boldsymbol x})=0$. Since each $\boldsymbol a^T_j$ is a row of $A^T$,
$$A^T(\boldsymbol b-A\hat{\boldsymbol x})=\boldsymbol 0\qquad(2)$$
(This equation also follows from $(\mathrm{Col}\,A)^\perp=\mathrm{Nul}\,A^T$.) Thus
$$\begin{aligned}A^T\boldsymbol b-A^TA\hat{\boldsymbol x}&=\boldsymbol 0\\ A^TA\hat{\boldsymbol x}&=A^T\boldsymbol b\end{aligned}$$
These calculations show that each least-squares solution of $A\boldsymbol x=\boldsymbol b$ satisfies the equation
$$A^TA\boldsymbol x=A^T\boldsymbol b\qquad(3)$$
This system of equations is called the normal equations for $A\boldsymbol x=\boldsymbol b$. A solution of (3) is often denoted by $\hat{\boldsymbol x}$.
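The normal equations can be solved directly with a linear solver. A minimal sketch (the $A$ and $\boldsymbol b$ below are illustrative, chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

# Solve the normal equations A^T A x = A^T b (illustrative data).
A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# A^T A = [[17, 1], [1, 5]],  A^T b = [19, 11]  ->  x_hat = [1, 2]
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Equation (2): the residual b - A x_hat is orthogonal to every column of A.
residual_check = A.T @ (b - A @ x_hat)
```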
- The next theorem gives useful criteria for determining when there is only one least-squares solution of $A\boldsymbol x=\boldsymbol b$. (Of course, the orthogonal projection $\hat{\boldsymbol b}$ is always unique.)
Formula (4), $\hat{\boldsymbol x}=(A^TA)^{-1}A^T\boldsymbol b$, for $\hat{\boldsymbol x}$ is useful mainly for theoretical purposes and for hand calculations when $A^TA$ is a $2\times 2$ invertible matrix.
PROOF
- If $A\boldsymbol x=\boldsymbol 0$, then $A^TA\boldsymbol x=\boldsymbol 0$, so $\mathrm{Nul}\,A\subseteq\mathrm{Nul}\,A^TA$.
- If $A^TA\boldsymbol x=\boldsymbol 0$, then $\boldsymbol x^TA^TA\boldsymbol x=(A\boldsymbol x)^TA\boldsymbol x=0$, i.e. $\|A\boldsymbol x\|^2=0$, so $A\boldsymbol x=\boldsymbol 0$ and $\mathrm{Nul}\,A^TA\subseteq\mathrm{Nul}\,A$. Therefore
$$\mathrm{Nul}\,A=\mathrm{Nul}\,A^TA,\qquad \mathrm{rank}\,A^TA=n-\dim\mathrm{Nul}\,A^TA=n-\dim\mathrm{Nul}\,A=\mathrm{rank}\,A$$
So when $A$ has $n$ linearly independent columns, $\mathrm{rank}\,A^TA=\mathrm{rank}\,A=n$, which means $A^TA$ is an invertible matrix.
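The proof's conclusion, and formula (4), can be sanity-checked numerically (a random full-column-rank matrix is used here as an illustration, not a proof):

```python
import numpy as np

# rank(A^T A) = rank(A), checked on a random tall matrix whose columns are
# linearly independent almost surely.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

rank_A = np.linalg.matrix_rank(A)
rank_AtA = np.linalg.matrix_rank(A.T @ A)

# Since A^T A is invertible here, formula (4) gives the unique least-squares
# solution, agreeing with a general-purpose least-squares routine.
x_hat = np.linalg.inv(A.T @ A) @ A.T @ b
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```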
Finding the Least-Squares Solution Quickly When the Columns of $A$ Are Orthogonal
- The next example shows how to find a least-squares solution of $A\boldsymbol x=\boldsymbol b$ when the columns of $A$ are orthogonal. Such matrices often appear in linear regression problems.
EXAMPLE 4
- Find a least-squares solution of $A\boldsymbol x=\boldsymbol b$ for the given $A$ and $\boldsymbol b$.
SOLUTION
- Because the columns $\boldsymbol a_1$ and $\boldsymbol a_2$ of $A$ are orthogonal, the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$ is given by
$$\hat{\boldsymbol b}=\frac{\boldsymbol b\cdot\boldsymbol a_1}{\boldsymbol a_1\cdot\boldsymbol a_1}\boldsymbol a_1+\frac{\boldsymbol b\cdot\boldsymbol a_2}{\boldsymbol a_2\cdot\boldsymbol a_2}\boldsymbol a_2\qquad(5)$$
Now that $\hat{\boldsymbol b}$ is known, we can solve $A\hat{\boldsymbol x}=\hat{\boldsymbol b}$. But this is trivial, since we already know what weights to place on the columns of $A$ to produce $\hat{\boldsymbol b}$. It is clear from (5) that
$$\hat{\boldsymbol x}=\begin{pmatrix}(\boldsymbol b\cdot\boldsymbol a_1)/(\boldsymbol a_1\cdot\boldsymbol a_1)\\(\boldsymbol b\cdot\boldsymbol a_2)/(\boldsymbol a_2\cdot\boldsymbol a_2)\end{pmatrix}$$
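The orthogonal-columns shortcut can be sketched as follows (the vectors below are illustrative; the example's actual $A$ and $\boldsymbol b$ appear in the book):

```python
import numpy as np

# When the columns of A are orthogonal, the projection of b onto Col A
# decomposes column by column, so each weight is (a_j . b)/(a_j . a_j).
a1 = np.array([1.0, 1.0, 1.0, 1.0])
a2 = np.array([-3.0, -1.0, 1.0, 3.0])   # orthogonal to a1
A = np.column_stack([a1, a2])
b = np.array([1.0, 2.0, 4.0, 5.0])

x_hat = np.array([a1 @ b / (a1 @ a1),
                  a2 @ b / (a2 @ a2)])

# Same answer as the general least-squares routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```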
- In some cases, the normal equations for a least-squares problem can be *ill-conditioned*; that is, small errors in the calculation of the entries of $A^TA$ can sometimes cause relatively large errors in the solution $\hat{\boldsymbol x}$.
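One reason for this (a standard fact, illustrated below on made-up data) is that forming $A^TA$ squares the 2-norm condition number: $\mathrm{cond}(A^TA)=\mathrm{cond}(A)^2$, so a mildly ill-conditioned $A$ yields a much worse-conditioned $A^TA$:

```python
import numpy as np

# A is only mildly ill-conditioned, but A^T A is far worse:
# cond(A^T A) = cond(A)^2 in the 2-norm.
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)
```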
Using a QR Factorization
- If the columns of $A$ are linearly independent, the least-squares solution can often be computed more reliably through a $QR$ factorization of $A$ (described in Section 6.4).
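A minimal sketch of the QR route (reusing the illustrative $A$, $\boldsymbol b$ from the normal-equations example above; the data is not from the book): with $A=QR$, $Q^TQ=I$ and $R$ upper triangular, the normal equations reduce to $R\boldsymbol x=Q^T\boldsymbol b$, which avoids forming $A^TA$ at all.

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# Reduced QR factorization: Q is 3x2 with orthonormal columns, R is 2x2
# upper triangular. Then A^T A x = A^T b simplifies to R x = Q^T b.
Q, R = np.linalg.qr(A)
x_hat = np.linalg.solve(R, Q.T @ b)
```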