Chapter 4 SVD

SVD may be the most important matrix decomposition of all, for both theoretical and computational purposes.

4.1 Introduction

Theorem 4.1.1 (SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $A$ can be expressed as a product

$$A = U\Sigma V^T, \tag{4.1.2}$$

where $U \in \mathbb{R}^{n\times n}$ and $V \in \mathbb{R}^{m\times m}$ are orthogonal, and $\Sigma \in \mathbb{R}^{n\times m}$ is a nonsquare "diagonal" matrix whose diagonal entries satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
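A quick numerical check of the theorem (a sketch using NumPy; the matrix below is an arbitrary example, not from the book):

```python
import numpy as np

# An arbitrary 4x3 example matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# full_matrices=True gives U (n x n), Vt (m x m); s holds the
# singular values sigma_1 >= sigma_2 >= ... >= 0.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the nonsquare "diagonal" Sigma and verify A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)
assert np.all(s[:-1] >= s[1:])  # nonincreasing order
```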

4.1.1 Other Forms of the SVD Theorem

The SVD has a simple geometric interpretation.

Theorem 4.1.3 (Geometric SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $\mathbb{R}^m$ has an orthonormal basis $v_1, \dots, v_m$, $\mathbb{R}^n$ has an orthonormal basis $u_1, \dots, u_n$, and there exist $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ such that, for $i = 1, \dots, r$,

$$A v_i = \sigma_i u_i$$

$$A^T u_i = \sigma_i v_i$$

4.1.2

Theorem 4.1.12 Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Let $\sigma_1, \dots, \sigma_r$ be the singular values of $A$, with associated right and left singular vectors $v_1, \dots, v_r$ and $u_1, \dots, u_r$, respectively. Then

$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^T.$$
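This rank-one expansion can be checked directly (a sketch with NumPy; the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])
U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

# Sum of r rank-one terms sigma_j * u_j * v_j^T recovers A.
A_sum = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(r))
assert np.allclose(A, A_sum)
```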

4.2 SOME BASIC APPLICATIONS OF SINGULAR VALUES

4.2.1 Relationship to Norm and Condition Number

Geometrically, $\|A\|_2$ represents the maximum magnification that can be undergone by any vector $x \in \mathbb{R}^m$ when acted on by $A$.

Theorem 4.2.1 Let $A \in \mathbb{R}^{n\times m}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$. Then $\|A\|_2 = \sigma_1$.

Since $A$ and $A^T$ have the same singular values, we have the following corollary.

Corollary 4.2.2 $\|A\|_2 = \|A^T\|_2$

A result I'm not sure is useful, but which is fun: the Frobenius matrix norm equals the square root of the sum of the squared singular values.

$$\|A\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 \right)^{1/2} = \left( \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2 \right)^{1/2}$$
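Both norm identities are easy to verify numerically (a sketch with NumPy on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

# The 2-norm is the largest singular value (Theorem 4.2.1)...
assert np.isclose(np.linalg.norm(A, 2), s[0])
# ...and the Frobenius norm is the root of the sum of squares.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))
```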

The 2-condition number of a square matrix equals the ratio of the largest singular value to the smallest:

Theorem 4.2.4 Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix with singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$. Then

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.$$

Another expression for the condition number, given in Chapter 2, is

$$\kappa_2(A) = \frac{\mathrm{maxmag}(A)}{\mathrm{minmag}(A)}.$$
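The singular-value formula for the condition number agrees with the library routine (a small check with NumPy; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
s = np.linalg.svd(A, compute_uv=False)

# kappa_2(A) = sigma_1 / sigma_n, which is what np.linalg.cond computes.
assert np.isclose(np.linalg.cond(A, 2), s[0] / s[-1])
```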

Theorem 4.2.9 Let $A \in \mathbb{R}^{n\times m}$ with $n \ge m$. Then $\|A^T A\|_2 = \|A\|_2^2$ and $\kappa_2(A^T A) = \kappa_2(A)^2$.
The proof is simple: just substitute the SVD of $A$.
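A numerical check of both identities (a sketch using NumPy on a random full-rank matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))

# ||A^T A||_2 = ||A||_2^2 and kappa_2(A^T A) = kappa_2(A)^2.
assert np.isclose(np.linalg.norm(A.T @ A, 2), np.linalg.norm(A, 2)**2)
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A)**2)
```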

The pseudoinverse:

$(A^T A)^{-1} A^T$ is called the pseudoinverse of $A$ (this form requires $A$ to have full column rank).

4.2.2 Numerical Rank Determination

Roundoff error and uncertainty can show up in the data, so singular values that should be exactly zero may instead come out as tiny positive numbers. In that case one should work with the numerical rank.

If only roundoff error is considered, the threshold can be set to $\epsilon = 10 u \|A\|$, where $u$ is the unit roundoff error.

MATLAB's rank command computes exactly this numerical rank, and the user can supply a custom threshold.
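The same idea is easy to implement directly (a sketch with NumPy; the rank-2 example matrix and the tolerance are my own choices):

```python
import numpy as np

# A matrix of exact rank 2, contaminated with tiny "noise".
rng = np.random.default_rng(2)
B = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0]) \
  + np.outer([0.0, 1.0, 1.0], [1.0, 1.0, 0.0])
A = B + 1e-12 * rng.standard_normal((3, 3))

s = np.linalg.svd(A, compute_uv=False)
tol = 1e-8 * s[0]                      # user-chosen threshold
numerical_rank = int(np.sum(s > tol))  # count the "large" singular values
assert numerical_rank == 2

# np.linalg.matrix_rank applies the same idea with a default tolerance.
assert np.linalg.matrix_rank(A, tol=tol) == 2
```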

Every rank-deficient matrix has full-rank matrices arbitrarily close to it.

Proof: In the SVD of a rank-deficient matrix $A$, some of the diagonal entries of $\Sigma$ are zero. Replacing the zero singular values with a very small positive number $\epsilon$ yields a new matrix $A_\epsilon$ whose distance from $A$ satisfies $\|A - A_\epsilon\|_2 = \epsilon$.

Theorem 4.2.15
Let $A \in \mathbb{R}^{n\times m}$ with $\mathrm{rank}(A) = r > 0$. Let $A = U\Sigma V^T$ be the SVD of $A$, with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
For $k = 1, \dots, r-1$, define $A_k = U\Sigma_k V^T$, where $\Sigma_k \in \mathbb{R}^{n\times m}$ is the diagonal matrix $\mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0)$. Then $\mathrm{rank}(A_k) = k$, and

$$\sigma_{k+1} = \|A - A_k\|_2 = \min\{\|A - B\|_2 \;:\; \mathrm{rank}(B) \le k\};$$

that is, of all matrices of rank k or less, Ak A k is closest to A.
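The truncated SVD realizes this best rank-$k$ approximation; a quick numerical check (a sketch with NumPy on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)

k = 2
# Truncated SVD: keep the k largest singular values, zero the rest.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k
# The distance to the best rank-k approximation is sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```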

In short, two matrices of different rank live in different worlds: there is a gap between them, and the larger the rank difference, the wider the gap.

Corollary 4.2.16 Suppose $A \in \mathbb{R}^{n\times m}$ has full rank; thus $\mathrm{rank}(A) = r$, where $r = \min(n, m)$. Let $\sigma_1 \ge \cdots \ge \sigma_r$ be the singular values of $A$. Let $B \in \mathbb{R}^{n\times m}$ satisfy $\|A - B\|_2 < \sigma_r$. Then $B$ also has full rank.

Clearly, the gap between $A$ and $B$ is too small for them to belong to different worlds.

Conclusions:

  • If $A$ has full rank, every matrix sufficiently close to $A$ (at distance less than $\sigma_r$) also has full rank.
  • If $A$ is rank deficient, there exist full-rank matrices arbitrarily close to it (they can get arbitrarily close, but never equal).
  • In topological language, the set of matrices of full rank is an open, dense subset of $\mathbb{R}^{n\times m}$. Thus, in a certain sense, almost all matrices have full rank.
  • If a matrix is rank deficient, virtually any small perturbation will turn it into a full-rank matrix. Therefore, in the presence of floating-point errors, it is impossible to calculate the (exact, theoretical) rank of a matrix or even to detect that it is rank deficient.

4.2.3 Orthogonal Decompositions

The QR decomposition with column pivoting gives $AE = QR$, or equivalently $A = QRE^T$, where $E$ is a permutation matrix, a special type of orthogonal matrix.

The SVD gives $A = U\Sigma V^T$. Both are examples of orthogonal decompositions $A = YTZ^T$, where $Y$ and $Z$ are orthogonal and $T$ has a simple form.

The QR decomposition is much cheaper to compute than the SVD. However, the SVD always reveals the numerical rank of the matrix, whereas the QR decomposition may sometimes fail to do so.

4.2.4 Distance to Nearest Singular Matrix

Corollary 4.2.22 Let $A_s$ be the singular matrix that is closest to $A$, in the sense that $\|A - A_s\|_2$ is as small as possible. Then $\|A - A_s\|_2 = \sigma_n$, and

$$\frac{\|A - A_s\|_2}{\|A\|_2} = \frac{1}{\kappa_2(A)}.$$

This follows easily from the results above; the only change is from an $n \times m$ matrix to an $n \times n$ square matrix. Note: only a square matrix can be called singular; a non-square matrix can only be called rank deficient.

4.3 THE SVD AND THE LEAST SQUARES PROBLEM

When the coefficient matrix $A$ is rank deficient, the least squares problem has no unique solution. But if we add the requirement that $\|x\|_2$ be minimized, the solution becomes unique.

The derivation for solving the least squares problem via the SVD goes as follows:

$$\|b - Ax\|_2 = \|U^T(b - Ax)\|_2 = \|U^T b - \Sigma(V^T x)\|_2.$$

Letting $c = U^T b$ and $y = V^T x$, we have

$$\|b - Ax\|_2^2 = \|c - \Sigma y\|_2^2 = \sum_{i=1}^{r} |c_i - \sigma_i y_i|^2 + \sum_{i=r+1}^{n} |c_i|^2.$$
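The derivation translates directly into an algorithm: choose $y_i = c_i / \sigma_i$ for $i \le r$ (and $y_i = 0$ for $i > r$, which also gives the minimum-norm solution), then set $x = Vy$. A sketch with NumPy, compared against the library solver:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = np.sum(s > 1e-12)        # numerical rank

c = U.T @ b                  # c = U^T b
y = np.zeros(A.shape[1])
y[:r] = c[:r] / s[:r]        # y_i = c_i / sigma_i kills the first sum
x = Vt.T @ y                 # x = V y

# Agrees with the library least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```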

SVD is an expensive way to solve the least squares problem. Its principal advantage is that it gives a completely reliable means of determining the numerical rank for rank-deficient least squares problems.

4.3.2 The Pseudoinverse

Every $A \in \mathbb{R}^{n\times m}$ has a pseudoinverse. The minimum-norm solution to a least squares problem can be expressed in terms of the pseudoinverse $A^\dagger$ as $x = A^\dagger b$, where

$$A^\dagger = V \Sigma^\dagger U^T.$$

Here the diagonal entries of $\Sigma^\dagger$ are the reciprocals of the nonzero diagonal entries of $\Sigma$ (zero singular values stay zero).
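This construction can be checked against the library implementation (a sketch with NumPy; the random matrix has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Sigma^dagger: reciprocals of the (nonzero) singular values.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))
# For full column rank it matches (A^T A)^{-1} A^T as well.
assert np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T)
```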

Another form of the pseudoinverse, valid when $A$ has full column rank: $(A^T A)^{-1} A^T$.

However, there is seldom any reason to compute the pseudoinverse; it is mainly a theoretical tool.

4.4 SENSITIVITY OF THE LEAST SQUARES PROBLEM

In this section we discuss the sensitivity of the solution of the least squares problem under perturbations of $A$ and $b$.

Following Section 3.5, solving the least squares problem can be split into two steps:
- First we find a $y \in \mathcal{R}(A)$ whose distance from $b$ is minimal:

$$\|b - y\|_2 = \min_{s \in \mathcal{R}(A)} \|b - s\|_2.$$

- Then the least squares solution $x \in \mathbb{R}^m$ is found by solving the equation $Ax = y$ exactly.

That is, if we view $\mathcal{R}(A)$ as a hyperplane, we first project $b$ onto that hyperplane; the norm of the least squares residual then equals the distance from $b$ to the hyperplane.

See page 281 of the book.

4.4.1 The Effect of Perturbation of b

From the two-step solution above:
- If $b$ is nearly perpendicular to $\mathcal{R}(A)$, then even a tiny perturbation of $b$ can produce a large relative change $\|\delta y\|_2 / \|y\|_2$ in the projection, and hence a large error in the final result.
- If the linear system $Ax = y$ is ill conditioned, the final result will also have a large error.

Working through the details gives:

$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{\kappa_2(A)}{\cos\theta} \, \frac{\|\delta b\|_2}{\|b\|_2}$$
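The bound can be checked on any concrete instance (an illustration with NumPy, not a proof; the matrix, $b$, and the size of the perturbation are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
db = 1e-6 * rng.standard_normal(6)   # small perturbation of b

x, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pert, *_ = np.linalg.lstsq(A, b + db, rcond=None)

y = A @ x                                  # projection of b onto R(A)
cos_theta = np.linalg.norm(y) / np.linalg.norm(b)
kappa = np.linalg.cond(A)

lhs = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
rhs = (kappa / cos_theta) * np.linalg.norm(db) / np.linalg.norm(b)
assert lhs <= rhs   # the perturbation bound holds
```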

4.4.2 The Effect of Perturbation of A

Unfortunately, perturbations of $A$ have a more severe effect than perturbations of $b$.

Here is the total error bound for least squares under perturbations of both $A$ and $b$:

$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{2\kappa_2(A)}{\cos\theta}\,\epsilon_b + 2\left(\kappa_2(A)^2 \tan\theta + \kappa_2(A)\right)\epsilon_A$$

Analysis:

  • The first term is the error due to the perturbation of $b$.
  • The second term is the error due to the perturbation of $A$; note that it scales with $\kappa_2(A)^2$.

The presence of $\kappa_2(A)^2$ means that even if $A$ is only mildly ill conditioned, a small perturbation in $A$ can cause a large change in $x$. So we should keep the condition number under control.

One way to do this is feature scaling, as used in machine learning.

4.4.3 Keeping the condition number under control

For example, when fitting a polynomial to a set of data, the choice of basis functions affects the condition number of the coefficient matrix.
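A small demonstration (a sketch with NumPy; the sample points and the quadratic fit are my own choices): for data on $[100, 101]$, the monomial basis $1, t, t^2$ gives a far worse-conditioned matrix than the shifted basis $1, (t-\bar t), (t-\bar t)^2$.

```python
import numpy as np

# 50 sample points on [100, 101].
t = np.linspace(100.0, 101.0, 50)

# Coefficient matrices for a quadratic fit in two different bases.
V_mono = np.vander(t, 3, increasing=True)            # 1, t, t^2
V_shift = np.vander(t - t.mean(), 3, increasing=True)  # shifted basis

# The shifted basis is better conditioned by many orders of magnitude.
assert np.linalg.cond(V_shift) < np.linalg.cond(V_mono) * 1e-3
```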

4.4.4 Accuracy of techniques for solving the least squares problem

  • The QR decomposition computed by the modified Gram-Schmidt method is backward stable.
  • The normal equations are less accurate, because $\kappa_2(A^T A) = \kappa_2(A)^2$.

To emphasize once more: the QR method is superior to the normal equations method when the condition number is bad.
