Chapter 4 SVD

SVD may be the most important matrix decomposition of all, for both theoretical and computational purposes.

4.1 Introduction

Theorem 4.1.1 (SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $A$ can be expressed as a product

$$A = U\Sigma V^T, \tag{4.1.2}$$

where $U \in \mathbb{R}^{n\times n}$ and $V \in \mathbb{R}^{m\times m}$ are orthogonal, and $\Sigma \in \mathbb{R}^{n\times m}$ is a nonsquare "diagonal" matrix whose diagonal entries satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
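A quick numerical check of the theorem (a sketch using NumPy; the matrix below is an arbitrary example, not from the book):

```python
import numpy as np

# An arbitrary 4x3 example matrix.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

# full_matrices=True gives U (n x n), Vt (m x m); s holds the
# singular values sigma_1 >= sigma_2 >= ... >= 0.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the nonsquare "diagonal" Sigma and verify A = U Sigma V^T.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)
assert np.all(s[:-1] >= s[1:])  # nonincreasing order
```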

4.1.1 Other Forms of the SVD Theorem

The SVD has a simple geometric interpretation.

Theorem 4.1.3 (Geometric SVD Theorem) Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Then $\mathbb{R}^m$ has an orthonormal basis $v_1, \dots, v_m$, $\mathbb{R}^n$ has an orthonormal basis $u_1, \dots, u_n$, and there exist $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ such that, for $i = 1, \dots, r$,

$$A v_i = \sigma_i u_i$$

$$A^T u_i = \sigma_i v_i$$

4.1.2

Theorem 4.1.12 Let $A \in \mathbb{R}^{n\times m}$ be a nonzero matrix with rank $r$. Let $\sigma_1, \dots, \sigma_r$ be the singular values of $A$, with associated right and left singular vectors $v_1, \dots, v_r$ and $u_1, \dots, u_r$, respectively. Then

$$A = \sum_{j=1}^{r} \sigma_j u_j v_j^T.$$
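This rank-one expansion can be checked directly (a sketch with NumPy; the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])
U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)

# Sum of r rank-one terms sigma_j * u_j * v_j^T recovers A.
A_sum = sum(s[j] * np.outer(U[:, j], Vt[j, :]) for j in range(r))
assert np.allclose(A, A_sum)
```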

4.2 SOME BASIC APPLICATIONS OF SINGULAR VALUES

4.2.1 Relationship to Norm and Condition Number

Geometrically, $\|A\|_2$ represents the maximum magnification that can be undergone by any vector $x \in \mathbb{R}^m$ when acted on by $A$.

Theorem 4.2.1 Let $A \in \mathbb{R}^{n\times m}$ have singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$. Then $\|A\|_2 = \sigma_1$.

Since $A$ and $A^T$ have the same singular values, we have the following corollary.

Corollary 4.2.2 $\|A\|_2 = \|A^T\|_2$

A result I'm not sure is useful, but which is fun: the Frobenius matrix norm equals the square root of the sum of the squared singular values.

$$\|A\|_F = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 \right)^{1/2} = \left( \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_r^2 \right)^{1/2}$$
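Both norm identities are easy to verify numerically (a sketch with NumPy on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)

# The 2-norm is the largest singular value (Theorem 4.2.1)...
assert np.isclose(np.linalg.norm(A, 2), s[0])
# ...and the Frobenius norm is the root of the sum of squares.
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))
```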

The 2-condition number of a square matrix equals the ratio of the largest singular value to the smallest:

Theorem 4.2.4 Let $A \in \mathbb{R}^{n\times n}$ be a nonsingular matrix with singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$. Then

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}.$$

Another expression for the condition number, given in Chapter 2, is

$$\kappa_2(A) = \frac{\mathrm{maxmag}(A)}{\mathrm{minmag}(A)}.$$
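The singular-value formula for the condition number agrees with the library routine (a small check with NumPy; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
s = np.linalg.svd(A, compute_uv=False)

# kappa_2(A) = sigma_1 / sigma_n, which is what np.linalg.cond computes.
assert np.isclose(np.linalg.cond(A, 2), s[0] / s[-1])
```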

Theorem 4.2.9 Let $A \in \mathbb{R}^{n\times m}$ with $n \ge m$. Then $\|A^T A\|_2 = \|A\|_2^2$ and $\kappa_2(A^T A) = \kappa_2(A)^2$.
The proof is simple: just substitute the SVD of $A$.
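A numerical check of both identities (a sketch using NumPy on a random full-rank matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))

# ||A^T A||_2 = ||A||_2^2 and kappa_2(A^T A) = kappa_2(A)^2.
assert np.isclose(np.linalg.norm(A.T @ A, 2), np.linalg.norm(A, 2)**2)
assert np.isclose(np.linalg.cond(A.T @ A), np.linalg.cond(A)**2)
```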

The pseudoinverse:

$(A^T A)^{-1} A^T$ is called the pseudoinverse of $A$ (this form requires $A$ to have full column rank).

4.2.2 Numerical Rank Determination

Roundoff error and uncertainty can show up in the data, so singular values that should be exactly zero may instead come out as tiny positive numbers. In that case one should work with the numerical rank.

If only roundoff error is considered, the threshold can be set to $\epsilon = 10 u \|A\|$, where $u$ is the unit roundoff error.

MATLAB's rank command computes exactly this numerical rank, and the user can supply a custom threshold.
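The same idea is easy to implement directly (a sketch with NumPy; the rank-2 example matrix and the tolerance are my own choices):

```python
import numpy as np

# A matrix of exact rank 2, contaminated with tiny "noise".
rng = np.random.default_rng(2)
B = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0]) \
  + np.outer([0.0, 1.0, 1.0], [1.0, 1.0, 0.0])
A = B + 1e-12 * rng.standard_normal((3, 3))

s = np.linalg.svd(A, compute_uv=False)
tol = 1e-8 * s[0]                      # user-chosen threshold
numerical_rank = int(np.sum(s > tol))  # count the "large" singular values
assert numerical_rank == 2

# np.linalg.matrix_rank applies the same idea with a default tolerance.
assert np.linalg.matrix_rank(A, tol=tol) == 2
```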

Every rank-deficient matrix has full-rank matrices arbitrarily close to it.

Proof: In the SVD of a rank-deficient matrix $A$, some of the diagonal entries of $\Sigma$ are zero. Replacing the zero singular values with a very small positive number $\epsilon$ yields a new matrix $A_\epsilon$ whose distance from $A$ satisfies $\|A - A_\epsilon\|_2 = \epsilon$.

Theorem 4.2.15
Let $A \in \mathbb{R}^{n\times m}$ with $\mathrm{rank}(A) = r > 0$. Let $A = U\Sigma V^T$ be the SVD of $A$, with singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$.
For $k = 1, \dots, r-1$, define $A_k = U\Sigma_k V^T$, where $\Sigma_k \in \mathbb{R}^{n\times m}$ is the diagonal matrix $\mathrm{diag}(\sigma_1, \dots, \sigma_k, 0, \dots, 0)$. Then $\mathrm{rank}(A_k) = k$, and

$$\sigma_{k+1} = \|A - A_k\|_2 = \min\{\|A - B\|_2 \;:\; \mathrm{rank}(B) \le k\};$$

that is, of all matrices of rank k or less, Ak A k is closest to A.
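The truncated SVD realizes this best rank-$k$ approximation; a quick numerical check (a sketch with NumPy on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)

k = 2
# Truncated SVD: keep the k largest singular values, zero the rest.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k
# The distance to the best rank-k approximation is sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```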

In short, two matrices of different rank live in different worlds: there is a gap between them, and the larger the rank difference, the wider the gap.

Corollary 4.2.16 Suppose $A \in \mathbb{R}^{n\times m}$ has full rank; thus $\mathrm{rank}(A) = r$, where $r = \min(n, m)$. Let $\sigma_1 \ge \cdots \ge \sigma_r$ be the singular values of $A$. Let $B \in \mathbb{R}^{n\times m}$ satisfy $\|A - B\|_2 < \sigma_r$. Then $B$ also has full rank.

Clearly, the gap between $A$ and $B$ is too small for them to belong to different worlds.

Conclusions:

  • If $A$ has full rank, every matrix sufficiently close to $A$ (at distance less than $\sigma_r$) also has full rank.
  • If $A$ is rank deficient, there exist full-rank matrices arbitrarily close to it (they can get arbitrarily close, but never equal).
  • In topological language, the set of matrices of full rank is an open, dense subset of $\mathbb{R}^{n\times m}$. Thus, in a certain sense, almost all matrices have full rank.
  • If a matrix is rank deficient, virtually any small perturbation will turn it into a full-rank matrix. Therefore, in the presence of floating-point errors, it is impossible to calculate the (exact, theoretical) rank of a matrix or even to detect that it is rank deficient.

4.2.3 Orthogonal Decompositions

The QR decomposition with column pivoting gives $AE = QR$, or equivalently $A = QRE^T$, where $E$ is a permutation matrix, a special type of orthogonal matrix.

The SVD gives $A = U\Sigma V^T$. Both are examples of orthogonal decompositions $A = YTZ^T$, where $Y$ and $Z$ are orthogonal and $T$ has a simple form.

The QR decomposition is much cheaper to compute than the SVD. However, the SVD always reveals the numerical rank of the matrix, whereas the QR decomposition may sometimes fail to do so.

4.2.4 Distance to Nearest Singular Matrix

Corollary 4.2.22 Let $A_s$ be the singular matrix that is closest to $A$, in the sense that $\|A - A_s\|_2$ is as small as possible. Then $\|A - A_s\|_2 = \sigma_n$, and

$$\frac{\|A - A_s\|_2}{\|A\|_2} = \frac{1}{\kappa_2(A)}.$$

This follows easily from the results above; the only change is from an $n \times m$ matrix to an $n \times n$ square matrix. Note: only a square matrix can be called singular; a non-square matrix can only be called rank deficient.

4.3 THE SVD AND THE LEAST SQUARES PROBLEM

When the coefficient matrix $A$ is rank deficient, the least squares problem has no unique solution. But if we add the requirement that $\|x\|_2$ be minimized, the solution becomes unique.

The derivation for solving the least squares problem via the SVD goes as follows:

$$\|b - Ax\|_2 = \|U^T(b - Ax)\|_2 = \|U^T b - \Sigma(V^T x)\|_2.$$

Letting $c = U^T b$ and $y = V^T x$, we have

$$\|b - Ax\|_2^2 = \|c - \Sigma y\|_2^2 = \sum_{i=1}^{r} |c_i - \sigma_i y_i|^2 + \sum_{i=r+1}^{n} |c_i|^2.$$
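The derivation translates directly into an algorithm: choose $y_i = c_i / \sigma_i$ for $i \le r$ (and $y_i = 0$ for $i > r$, which also gives the minimum-norm solution), then set $x = Vy$. A sketch with NumPy, compared against the library solver:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = np.sum(s > 1e-12)        # numerical rank

c = U.T @ b                  # c = U^T b
y = np.zeros(A.shape[1])
y[:r] = c[:r] / s[:r]        # y_i = c_i / sigma_i kills the first sum
x = Vt.T @ y                 # x = V y

# Agrees with the library least squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```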

SVD is an expensive way to solve the least squares problem. Its principal advantage is that it gives a completely reliable means of determining the numerical rank for rank-deficient least squares problems.

4.3.2 The Pseudoinverse

Every $A \in \mathbb{R}^{n\times m}$ has a pseudoinverse. The minimum-norm solution to a least squares problem can be expressed in terms of the pseudoinverse $A^\dagger$ as $x = A^\dagger b$, where

$$A^\dagger = V \Sigma^\dagger U^T.$$

Here the diagonal entries of $\Sigma^\dagger$ are the reciprocals of the nonzero diagonal entries of $\Sigma$ (zero singular values stay zero).
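This construction can be checked against the library implementation (a sketch with NumPy; the random matrix has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Sigma^dagger: reciprocals of the (nonzero) singular values.
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))
# For full column rank it matches (A^T A)^{-1} A^T as well.
assert np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T)
```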

Another form of the pseudoinverse, valid when $A$ has full column rank: $(A^T A)^{-1} A^T$.

However, there is seldom any reason to compute the pseudoinverse; it is mainly a theoretical tool.

4.4 SENSITIVITY OF THE LEAST SQUARES PROBLEM

In this section we discuss the sensitivity of the solution of the least squares problem under perturbations of $A$ and $b$.

Following Section 3.5, solving the least squares problem can be split into two steps:
- First we find a $y \in \mathcal{R}(A)$ whose distance from $b$ is minimal:

$$\|b - y\|_2 = \min_{s \in \mathcal{R}(A)} \|b - s\|_2.$$

- Then the least squares solution $x \in \mathbb{R}^m$ is found by solving the equation $Ax = y$ exactly.

That is, if we view $\mathcal{R}(A)$ as a hyperplane, we first project $b$ onto that hyperplane; the norm of the least squares residual then equals the distance from $b$ to the hyperplane.

See page 281 of the book.

4.4.1 The Effect of Perturbation of b

From the two-step solution above:
- If $b$ is nearly perpendicular to $\mathcal{R}(A)$, then even a tiny perturbation of $b$ can produce a large relative change $\|\delta y\|_2 / \|y\|_2$ in the projection, and hence a large error in the final result.
- If the linear system $Ax = y$ is ill conditioned, the final result will also have a large error.

Working through the details gives:

$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{\kappa_2(A)}{\cos\theta} \, \frac{\|\delta b\|_2}{\|b\|_2}$$
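The bound can be checked on any concrete instance (an illustration with NumPy, not a proof; the matrix, $b$, and the size of the perturbation are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
db = 1e-6 * rng.standard_normal(6)   # small perturbation of b

x, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pert, *_ = np.linalg.lstsq(A, b + db, rcond=None)

y = A @ x                                  # projection of b onto R(A)
cos_theta = np.linalg.norm(y) / np.linalg.norm(b)
kappa = np.linalg.cond(A)

lhs = np.linalg.norm(x_pert - x) / np.linalg.norm(x)
rhs = (kappa / cos_theta) * np.linalg.norm(db) / np.linalg.norm(b)
assert lhs <= rhs   # the perturbation bound holds
```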

4.4.2 The Effect of Perturbation of A

Unfortunately, perturbations of $A$ have a more severe effect than perturbations of $b$.

Here is the total error bound for least squares under perturbations of both $A$ and $b$:

$$\frac{\|\delta x\|_2}{\|x\|_2} \le \frac{2\kappa_2(A)}{\cos\theta}\,\epsilon_b + 2\left(\kappa_2(A)^2 \tan\theta + \kappa_2(A)\right)\epsilon_A$$

Analysis:

  • The first term is the error due to the perturbation of $b$.
  • The second term is the error due to the perturbation of $A$; note that it scales with $\kappa_2(A)^2$.

The presence of $\kappa_2(A)^2$ means that even if $A$ is only mildly ill conditioned, a small perturbation in $A$ can cause a large change in $x$. So we should keep the condition number under control.

One way to do this is feature scaling, as used in machine learning.

4.4.3 Keeping the condition number under control

For example, when fitting a polynomial to a set of data, the choice of basis functions affects the condition number of the coefficient matrix.
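A small demonstration (a sketch with NumPy; the sample points and the quadratic fit are my own choices): for data on $[100, 101]$, the monomial basis $1, t, t^2$ gives a far worse-conditioned matrix than the shifted basis $1, (t-\bar t), (t-\bar t)^2$.

```python
import numpy as np

# 50 sample points on [100, 101].
t = np.linspace(100.0, 101.0, 50)

# Coefficient matrices for a quadratic fit in two different bases.
V_mono = np.vander(t, 3, increasing=True)            # 1, t, t^2
V_shift = np.vander(t - t.mean(), 3, increasing=True)  # shifted basis

# The shifted basis is better conditioned by many orders of magnitude.
assert np.linalg.cond(V_shift) < np.linalg.cond(V_mono) * 1e-3
```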

4.4.4 Accuracy of techniques for solving the least squares problem

  • The QR decomposition computed by the modified Gram-Schmidt method is backward stable.
  • The normal equations are less accurate, because $\kappa_2(A^T A) = \kappa_2(A)^2$.

To emphasize once more: the QR method is superior to the normal equations method when the condition number is bad.
