Singular Value Decomposition (SVD)

damaohao88

Introduction

The topic of this article, the singular value decomposition, is one that should be a part of the standard mathematics undergraduate curriculum but all too often slips between the cracks. Besides being rather intuitive, these decompositions are incredibly useful. For instance, Netflix, the online movie rental company, is currently offering a $1 million prize for anyone who can improve the accuracy of its movie recommendation system by 10%. Surprisingly, this seemingly modest problem turns out to be quite challenging, and the groups involved are now using rather sophisticated techniques. At the heart of all of them is the singular value decomposition.

A singular value decomposition provides a convenient way to break a matrix, which perhaps contains some data we are interested in, into simpler, meaningful pieces. In this article, we will offer a geometric explanation of singular value decompositions and look at some of their applications.

The geometry of linear transformations

Let us begin by looking at some simple matrices, namely those with two rows and two columns. Our first example is the diagonal matrix

$$M = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$$

Geometrically, we may think of a matrix like this as taking a point (x, y) in the plane and transforming it into another point using matrix multiplication:

$$\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ y \end{bmatrix}$$

The effect of this transformation is shown below: the plane is stretched horizontally by a factor of 3, while there is no vertical change.
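As a small illustration of the stretch just described, the sketch below applies a diagonal matrix to the corners of the unit square. It assumes the diagonal matrix is [[3, 0], [0, 1]], which matches the stated effect (a factor of 3 horizontally, no vertical change); the helper name `apply` is hypothetical and only used for this example.

```python
# Assumed diagonal matrix, stored as a list of rows.
M = [[3, 0], [0, 1]]

def apply(matrix, point):
    """Multiply a 2x2 matrix (list of rows) by a 2D point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

# Corners of the unit square, listed counterclockwise.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
images = [apply(M, p) for p in corners]

# The square becomes a 3-by-1 rectangle: x is tripled, y is unchanged.
print(images)  # [(0, 0), (3, 0), (3, 1), (0, 1)]
```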

Now let's look at

$$M = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

which produces this effect

It is not so clear how to describe the geometric effect of this transformation simply. However, let's rotate our grid through a 45 degree angle and see what happens.

Ah ha. We see now that this new grid is transformed in the same way that the original grid was transformed by the diagonal matrix: the grid is stretched by a factor of 3 in one direction.

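Explanation: writing R for the rotation by 45 degrees and D for the diagonal matrix with entries 3 and 1, we have M R = R D. The left-hand side applies M to the grid rotated by 45 degrees; the right-hand side stretches the original grid by a factor of 3 along the x-axis and then rotates the result by 45 degrees. Both give the same result.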
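The behaviour on the rotated grid can be checked numerically. The sketch below assumes the symmetric matrix is [[2, 1], [1, 2]] (the matrix itself is an image in the original page, so this is an assumption consistent with the 45 degree rotation and the stretch by 3); `apply` is a hypothetical helper.

```python
import math

# Assumed symmetric example matrix, stored as a list of rows.
M = [[2.0, 1.0], [1.0, 2.0]]

def apply(matrix, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

angle = math.radians(45)
v1 = (math.cos(angle), math.sin(angle))    # rotated horizontal grid direction
v2 = (-math.sin(angle), math.cos(angle))   # rotated vertical grid direction

Mv1 = apply(M, v1)
Mv2 = apply(M, v2)

# M v1 should equal 3 * v1 and M v2 should equal v2, up to rounding.
stretched = all(math.isclose(m, 3 * v, abs_tol=1e-12) for m, v in zip(Mv1, v1))
unchanged = all(math.isclose(m, v, abs_tol=1e-12) for m, v in zip(Mv2, v2))
print(stretched, unchanged)
```

Under that assumption, the rotated grid directions are exactly the directions that M only rescales.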

This is a very special situation that results from the fact that the matrix M is symmetric; that is, the transpose of M, the matrix obtained by flipping the entries about the diagonal, is equal to M. If we have a symmetric 2×2 matrix, it turns out that we may always rotate the grid in the domain so that the matrix acts by stretching and perhaps reflecting in the two directions. In other words, symmetric matrices behave like diagonal matrices.

Said with more mathematical precision, given a symmetric matrix M, we may find a set of orthogonal vectors v_i so that Mv_i is a scalar multiple of v_i; that is,

$$M v_i = \lambda_i v_i$$

where λ_i is a scalar. Geometrically, this means that the vectors v_i are simply stretched and/or reflected when multiplied by M. Because of this property, we call the vectors v_i eigenvectors of M; the scalars λ_i are called eigenvalues. An important fact, which is easily verified, is that eigenvectors of a symmetric matrix corresponding to different eigenvalues are orthogonal.
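The orthogonality claim is stated without proof; one standard way to verify it, using only the symmetry of M and the eigenvalue equation, is sketched here:

```latex
\begin{aligned}
\lambda_i \,(v_i \cdot v_j)
  &= (M v_i) \cdot v_j
   = v_i^T M^T v_j
   = v_i^T M v_j
   = v_i \cdot (M v_j)
   = \lambda_j \,(v_i \cdot v_j) \\
\Longrightarrow \quad (\lambda_i - \lambda_j)\,(v_i \cdot v_j) &= 0
  \quad \Longrightarrow \quad v_i \cdot v_j = 0 \ \text{ whenever } \lambda_i \neq \lambda_j .
\end{aligned}
```

The third equality is the only place where symmetry (M^T = M) is used.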

If we use the eigenvectors of a symmetric matrix to align the grid, the matrix stretches and reflects the grid in the same way that it does the eigenvectors.

The geometric description we gave for this linear transformation is a simple one: the grid is simply stretched in one direction. For more general matrices, we will ask if we can find an orthogonal grid that is transformed into another orthogonal grid. Let's consider a final example using a matrix that is not symmetric:

$$M = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

This matrix produces the geometric effect known as a shear.

It's easy to find one family of eigenvectors along the horizontal axis. However, our figure above shows that these eigenvectors cannot be used to create an orthogonal grid that is transformed into another orthogonal grid. Nonetheless, let's see what happens when we first rotate the grid by 30 degrees.

Notice that the angle at the origin formed by the red parallelogram on the right has increased. Let's next rotate the grid by 60 degrees.

Hmm. It appears that the grid on the right is now almost orthogonal. In fact, by rotating the grid in the domain by an angle of roughly 58.28 degrees, both grids are now orthogonal.

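Explanation: rotate the original axes by 58.28 degrees and then left-multiply by the matrix M. One can verify that the two column vectors of the resulting product are orthogonal.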
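The rotation experiment can be reproduced numerically. The sketch below assumes the shear is [[1, 1], [0, 1]] (an assumption, since the matrix is an image in the original page, but it is consistent with the horizontal eigenvectors and the 58.28 degree angle); `apply` and `image_angle` are hypothetical helper names.

```python
import math

# Assumed shear matrix, stored as a list of rows.
M = [[1.0, 1.0], [0.0, 1.0]]

def apply(matrix, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def image_angle(theta_deg):
    """Angle in degrees between M v1 and M v2, where (v1, v2) is the
    standard basis rotated counterclockwise by theta_deg degrees."""
    t = math.radians(theta_deg)
    v1 = (math.cos(t), math.sin(t))
    v2 = (-math.sin(t), math.cos(t))
    a, b = apply(M, v1), apply(M, v2)
    dot = a[0] * b[0] + a[1] * b[1]
    cos_angle = dot / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(cos_angle))

# The image grid is orthogonal when the printed angle is 90 degrees.
for theta in (0, 30, 60, 58.28):
    print(theta, round(image_angle(theta), 2))
```

The angle between the image vectors starts at 45 degrees, grows as the grid is rotated, overshoots 90 degrees at a 60 degree rotation, and is very close to 90 degrees at 58.28.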

The singular value decomposition

This is the geometric essence of the singular value decomposition for 2×2 matrices: for any 2×2 matrix, we may find an orthogonal grid that is transformed into another orthogonal grid.

We will express this fact using vectors: with an appropriate choice of orthogonal unit vectors v₁ and v₂, the vectors Mv₁ and Mv₂ are orthogonal.

We will use u₁ and u₂ to denote unit vectors in the direction of Mv₁ and Mv₂. The lengths of Mv₁ and Mv₂, denoted by σ₁ and σ₂, describe the amount that the grid is stretched in those particular directions. These numbers are called the singular values of M. (In this case, the singular values are the golden ratio and its reciprocal, but that is not so important here.)

We therefore have

$$M v_1 = \sigma_1 u_1$$

$$M v_2 = \sigma_2 u_2$$
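The golden-ratio remark can be checked directly. This sketch assumes the shear [[1, 1], [0, 1]] as the matrix M; the closed form for the rotation angle comes from setting Mv₁ · Mv₂ = cos 2θ + (sin 2θ)/2 to zero, which is a derivation added here rather than taken from the article.

```python
import math

# Assumed shear matrix, stored as a list of rows.
M = [[1.0, 1.0], [0.0, 1.0]]

def apply(matrix, v):
    """Multiply a 2x2 matrix (list of rows) by a 2D vector."""
    (a, b), (c, d) = matrix
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

# Angle with tan(2 * theta) = -2, i.e. where M v1 and M v2 become orthogonal.
theta = 0.5 * (math.pi - math.atan(2.0))
v1 = (math.cos(theta), math.sin(theta))
v2 = (-math.sin(theta), math.cos(theta))

Mv1, Mv2 = apply(M, v1), apply(M, v2)
sigma1, sigma2 = math.hypot(*Mv1), math.hypot(*Mv2)   # singular values
u1 = (Mv1[0] / sigma1, Mv1[1] / sigma1)               # unit vector along M v1
u2 = (Mv2[0] / sigma2, Mv2[1] / sigma2)               # unit vector along M v2

golden = (1 + math.sqrt(5)) / 2
print(math.degrees(theta), sigma1, sigma2, golden, 1 / golden)
```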

We may now give a simple description of how the matrix M treats a general vector x. Since the vectors v₁ and v₂ are orthogonal unit vectors, we have

$$x = (v_1 \cdot x)\, v_1 + (v_2 \cdot x)\, v_2$$

For example, if v₁ · x = 2 and v₂ · x = 3, then x = 2v₁ + 3v₂, which is exactly x = (v₁ · x) v₁ + (v₂ · x) v₂.
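The expansion of x in an orthonormal basis can be illustrated with a short sketch. The 30 degree rotation and the vector x = (2, 3) are arbitrary choices made for this example, not values from the article.

```python
import math

def dot(a, b):
    """Dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

# An orthonormal pair obtained by rotating the standard basis (arbitrary angle).
t = math.radians(30)
v1 = (math.cos(t), math.sin(t))
v2 = (-math.sin(t), math.cos(t))

x = (2.0, 3.0)                       # arbitrary vector to expand
c1, c2 = dot(v1, x), dot(v2, x)      # coefficients (v1 . x) and (v2 . x)

# Rebuild x from its coefficients along v1 and v2.
rebuilt = (c1 * v1[0] + c2 * v2[0], c1 * v1[1] + c2 * v2[1])
print(rebuilt)
```

The rebuilt vector agrees with x up to rounding, for any choice of rotation angle.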

This means that

$$M x = (v_1 \cdot x)\, M v_1 + (v_2 \cdot x)\, M v_2$$

$$M x = (v_1 \cdot x)\, \sigma_1 u_1 + (v_2 \cdot x)\, \sigma_2 u_2$$

Remember that the dot product may be computed using the vector transpose

$$v \cdot x = v^T x$$

which leads to

$$M x = u_1 \sigma_1 v_1^T x + u_2 \sigma_2 v_2^T x$$

$$M = u_1 \sigma_1 v_1^T + u_2 \sigma_2 v_2^T$$

This is usually expressed by writing

$$M = U \Sigma V^T$$
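The step from the sum of two rank-one terms to the matrix product is plain block multiplication; written out for the 2×2 case, with the u's and v's read as column vectors:

```latex
u_1 \sigma_1 v_1^T + u_2 \sigma_2 v_2^T
  = \begin{bmatrix} u_1 & u_2 \end{bmatrix}
    \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix}
    \begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}
  = U \Sigma V^T
```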

where U is a matrix whose columns are the vectors u₁ and u₂, Σ is a diagonal matrix whose entries are σ₁ and σ₂, and V is a matrix whose columns are v₁ and v₂. The superscript T on the matrix V denotes the matrix transpose of V.

This shows how to decompose the matrix M into the product of three matrices: V describes an orthonormal basis in the domain, U describes an orthonormal basis in the co-domain, and Σ describes how much the vectors in V are stretched to give the vectors in U.

How do we find the singular value decomposition?

The power of the singular value decomposition lies in the fact that we may find it for any matrix. How do we do it? Let's look at our earlier example and add the unit circle in the domain. Its image will be an ellipse whose major and minor axes define the orthogonal grid in the co-domain.

Notice that the major and minor axes are defined by Mv₁ and Mv₂. These vectors are therefore the longest and shortest vectors among all the images of vectors on the unit circle.
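A brute-force scan over the unit circle illustrates this. The sketch assumes the shear [[1, 1], [0, 1]] as the earlier example; the 0.01 degree step and the helper name `image_length` are choices made only for this illustration.

```python
import math

# Assumed shear matrix, stored as a list of rows.
M = [[1.0, 1.0], [0.0, 1.0]]

def image_length(theta_deg):
    """Length |M x| for the unit vector x at angle theta_deg (in degrees)."""
    t = math.radians(theta_deg)
    x, y = math.cos(t), math.sin(t)
    return math.hypot(M[0][0] * x + M[0][1] * y, M[1][0] * x + M[1][1] * y)

# Sample half the circle in 0.01 degree steps; x and -x give the same length.
angles = [k / 100 for k in range(18000)]
longest = max(angles, key=image_length)    # direction of v1
shortest = min(angles, key=image_length)   # direction of v2

print(longest, image_length(longest))
print(shortest, image_length(shortest))
```

The two directions found this way are 90 degrees apart (to within the sampling step), and the longest one sits near the 58.28 degree rotation found earlier.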

In other words, the function |Mx| on the unit circle has a maximum at v₁ and a minimum at v₂. This reduces the problem to a rather standard calculus problem in which we wish to optimize a function over the unit circle. It turns out that the critical points of this function occur at the eigenvectors of the matrix M^TM. Since this matrix is symmetric, eigenvectors corresponding to different eigenvalues will be orthogonal. This gives the family of vectors v_i.
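The article does not spell out why the critical points are eigenvectors of M^TM. One standard argument, sketched here with a Lagrange multiplier (an approach assumed for illustration), optimizes the squared length, which has the same critical points as |Mx|:

```latex
\begin{aligned}
&\text{optimize } f(x) = |Mx|^2 = x^T M^T M x
  \quad \text{subject to} \quad g(x) = x^T x = 1, \\
&\nabla f = \lambda \nabla g
  \;\Longleftrightarrow\; 2\, M^T M x = 2 \lambda x
  \;\Longleftrightarrow\; M^T M x = \lambda x, \\
&\text{and at such a point} \quad |Mx|^2 = x^T (\lambda x) = \lambda ,
  \quad \text{so} \quad |Mx| = \sqrt{\lambda} .
\end{aligned}
```

So each critical point is a unit eigenvector of M^TM, and the corresponding value of |Mx| is the square root of its eigenvalue.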

The singular values are then given by σ_i = |Mv_i|, and the vectors u_i are obtained as unit vectors in the direction of Mv_i. But why are the vectors u_i orthogonal?
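The procedure described so far can be sketched for the 2×2 case. The function name `svd_2x2`, the example matrix, and the closed-form angle for the eigenvectors of a symmetric 2×2 matrix are assumptions made for this illustration; the sketch also assumes both singular values are non-zero.

```python
import math

def svd_2x2(M):
    """Return (U, sigmas, V) for a 2x2 matrix M (list of rows), following the
    article's recipe: the v_i are eigenvectors of M^T M, sigma_i = |M v_i|,
    and u_i is the unit vector along M v_i. U and V are lists of column
    vectors. Assumes both singular values are non-zero."""
    (a, b), (c, d) = M
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Direction of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2 * q, p - r)
    V = [(math.cos(theta), math.sin(theta)),
         (-math.sin(theta), math.cos(theta))]
    U, sigmas = [], []
    for v in V:
        Mv = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        sigma = math.hypot(*Mv)
        sigmas.append(sigma)
        U.append((Mv[0] / sigma, Mv[1] / sigma))
    return U, sigmas, V

M = [[4.0, 0.0], [3.0, -5.0]]   # arbitrary example matrix
U, sigmas, V = svd_2x2(M)

# Rebuild M as the sum of sigma_k * u_k * v_k^T.
rebuilt = [[sum(s * u[i] * v[j] for s, u, v in zip(sigmas, U, V))
            for j in range(2)] for i in range(2)]
print(sigmas)
print(rebuilt)
```

For this example the singular values come out as the square roots of 40 and 10, and the rebuilt matrix matches M up to rounding.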

To explain this, we will assume that σ_i and σ_j are distinct singular values. We have

$$M v_i = \sigma_i u_i$$

$$M v_j = \sigma_j u_j$$

Let's begin by looking at the expression Mv_i · Mv_j and assuming, for convenience, that the singular values are non-zero. On one hand, this expression is zero since the vectors v_i, which are eigenvectors of the symmetric matrix M^TM, are orthogonal to one another:

$$M v_i \cdot M v_j = v_i^T M^T M v_j = v_i \cdot M^T M v_j = \lambda_j\, v_i \cdot v_j = 0$$

On the other hand, we have

$$M v_i \cdot M v_j = \sigma_i \sigma_j\, u_i \cdot u_j = 0$$

Therefore, since σ_i and σ_j are non-zero, u_i and u_j are orthogonal, so we have found an orthogonal set of vectors v_i that is transformed into another orthogonal set u_i. The singular values describe the amount of stretching in the different directions.

In practice, this is not the procedure used to find the singular value decomposition of a matrix since it is not particularly efficient or well-behaved numerically.
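One commonly cited reason for the poor numerical behaviour, not spelled out in the article, is that forming M^TM squares the singular values, so a small singular value can fall below the rounding threshold. The sketch below uses a hypothetical matrix chosen to expose this in double precision; the reference value relies on the identity σ₁σ₂ = |det M| for 2×2 matrices.

```python
import math

eps = 1e-9
M = [[1.0, 1.0], [0.0, eps]]   # hypothetical, nearly singular matrix

# Entries of M^T M = [[p, q], [q, r]]. Here r = 1 + eps**2, and 1 + 1e-18
# rounds to exactly 1.0 in double precision, so the information is lost.
p = M[0][0] ** 2 + M[1][0] ** 2
q = M[0][0] * M[0][1] + M[1][0] * M[1][1]
r = M[0][1] ** 2 + M[1][1] ** 2

# Eigenvalues of the symmetric 2x2 matrix from its trace and determinant.
trace, det = p + r, p * r - q * q
disc = math.sqrt(max(trace * trace - 4 * det, 0.0))
lam_max, lam_min = (trace + disc) / 2, (trace - disc) / 2

sigma_max = math.sqrt(lam_max)
sigma_min_via_gram = math.sqrt(max(lam_min, 0.0))

# Reference: sigma_max * sigma_min = |det M| for a 2x2 matrix.
det_M = M[0][0] * M[1][1] - M[0][1] * M[1][0]
sigma_min_reference = abs(det_M) / sigma_max

print(sigma_min_via_gram, sigma_min_reference)
```

In this example the route through M^TM reports a smallest singular value of exactly zero, while the reference value is about 7.07e-10.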

References:
1. Original English article: http://www.ams.org/samplings/feature-column/fcarc-svd
2. http://www.cnblogs.com/LeftNotEasy/archive/2011/01/19/svd-and-applications.html
