Singular Value Decomposition (SVD) Tutorial - How does SVD work?

When you browse standard web sources like the Wikipedia article on Singular Value Decomposition (SVD), you find many equations but not an intuitive explanation of what it is or how it works. Singular Value Decomposition is a way of factoring matrices into a series of linear approximations that expose the underlying structure of the matrix.

SVD is extraordinarily useful and has many applications such as data analysis, signal processing, pattern recognition, image compression, weather prediction, and Latent Semantic Analysis or LSA (also referred to as Latent Semantic Indexing or LSI).

How does SVD work?

As a simple example, let's look at golf scores. Suppose Phil, Tiger, and Vijay play together for 9 holes and each makes par on every hole. Their scorecard, which can also be viewed as a (hole x player) matrix, might look like this.

Hole   Par   Phil   Tiger   Vijay
  1     4     4      4       4
  2     5     5      5       5
  3     3     3      3       3
  4     4     4      4       4
  5     4     4      4       4
  6     4     4      4       4
  7     4     4      4       4
  8     3     3      3       3
  9     5     5      5       5

Let's look at the problem of trying to predict what score each player will make on a given hole. One idea is to give each hole a HoleDifficulty factor and each player a PlayerAbility factor. The actual score is then predicted by multiplying these two factors together.

PredictedScore = HoleDifficulty * PlayerAbility

For the first attempt, let's make HoleDifficulty the par score for the hole, and let's make PlayerAbility equal to 1 for every player. So on the first hole, which is par 4, we would expect a player of ability 1 to get a score of 4.

PredictedScore = HoleDifficulty * PlayerAbility = 4 * 1 = 4

For our entire scorecard or matrix, all we have to do is multiply the PlayerAbility (assumed to be 1 for all players) by the HoleDifficulty (which ranges from par 3 to par 5), and we can exactly predict every score in our example.

In fact, this is the one dimensional (1-D) SVD factorization of the scorecard. We can represent our scorecard or matrix as the product of two vectors, the HoleDifficulty vector and the PlayerAbility vector. To predict any score, simply multiply the appropriate HoleDifficulty factor by the appropriate PlayerAbility factor. Following normal vector multiplication rules, we can generate the matrix of scores by multiplying the HoleDifficulty vector by the PlayerAbility vector, according to the following equation.


 
    Phil  Tiger  Vijay          HoleDifficulty          PlayerAbility
     4     4      4                   4
     5     5      5                   5              Phil  Tiger  Vijay
     3     3      3                   3               1     1      1
     4     4      4                   4
     4     4      4        =          4          *
     4     4      4                   4
     4     4      4                   4
     3     3      3                   3
     5     5      5                   5
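If you want to see this factorization in code, here is a minimal sketch using NumPy (the variable names are mine, not part of any standard API):

```python
import numpy as np

# Par on each of the 9 holes serves as the HoleDifficulty vector.
hole_difficulty = np.array([4, 5, 3, 4, 4, 4, 4, 3, 5])

# Every player is assumed to have ability 1 (Phil, Tiger, Vijay).
player_ability = np.array([1, 1, 1])

# The outer product multiplies each HoleDifficulty by each
# PlayerAbility, generating the full 9 x 3 matrix of scores.
predicted_scores = np.outer(hole_difficulty, player_ability)
print(predicted_scores)  # each row repeats that hole's par 3 times
```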

Mathematicians like to keep everything orderly, so the convention is that all vectors should be scaled so they have length 1. For example, the PlayerAbility vector is modified so that the sum of the squares of its elements adds to 1, instead of the current 1² + 1² + 1² = 3. To do this, we divide each element by the square root of 3, so that when we square it, it becomes 1/3 and the three elements add to 1. Similarly, we divide each HoleDifficulty element by the square root of 148 (the sum of the squares of its elements). The square root of 3 times the square root of 148 is our scaling factor, 21.07. The complete 1-D SVD factorization (to 2 decimal places) is:
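Here is that rescaling worked through in code (a sketch, continuing with the vectors from above):

```python
import numpy as np

hole_difficulty = np.array([4, 5, 3, 4, 4, 4, 4, 3, 5], dtype=float)
player_ability = np.array([1.0, 1.0, 1.0])

# The length of a vector is the square root of the sum of the
# squares of its elements.
hd_length = np.linalg.norm(hole_difficulty)  # sqrt(148) ~ 12.17
pa_length = np.linalg.norm(player_ability)   # sqrt(3)   ~ 1.73

# Dividing by the lengths leaves two unit vectors, and the two
# lengths multiply into a single scaling factor.
left_vector = hole_difficulty / hd_length    # 0.33, 0.41, 0.25, ...
right_vector = player_ability / pa_length    # 0.58, 0.58, 0.58
scale_factor = hd_length * pa_length         # ~ 21.07
```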

 

    Phil  Tiger  Vijay        HoleDifficulty      ScaleFactor        PlayerAbility
     4     4      4                0.33
     5     5      5                0.41                           Phil  Tiger  Vijay
     3     3      3                0.25                           0.58  0.58   0.58
     4     4      4                0.33
     4     4      4        =       0.33       *      21.07    *
     4     4      4                0.33
     4     4      4                0.33
     3     3      3                0.25
     5     5      5                0.41
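You can confirm all three parts with a library routine such as NumPy's np.linalg.svd (a sketch; note that SVD routines may flip the signs of a matched pair of singular vectors):

```python
import numpy as np

scores = np.outer([4, 5, 3, 4, 4, 4, 4, 3, 5], [1, 1, 1])

U, s, Vt = np.linalg.svd(scores, full_matrices=False)
print(np.round(s, 2))        # [21.07  0.  0.]: one nonzero singular
                             # value, so the matrix has rank 1
print(np.round(U[:, 0], 2))  # HoleDifficulty (up to sign)
print(np.round(Vt[0], 2))    # PlayerAbility (up to sign)
```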

Our HoleDifficulty vector, which starts with 0.33, is called the Left Singular Vector. The ScaleFactor is the Singular Value, and our PlayerAbility vector, which starts with 0.58, is the Right Singular Vector. If we represent these 3 parts exactly and multiply them together, we get the exact original scores. This means our matrix is a rank 1 matrix; in other words, it has a simple and predictable pattern.

More complicated matrices cannot be completely predicted just by using one set of factors as we have done. In that case, we have to introduce a second set of factors to refine our predictions. To do that, we subtract our predicted scores from the actual scores, getting the residual scores. Then we find a second set of HoleDifficulty2 and PlayerAbility2 numbers that best predict the residual scores.

Rather than guessing HoleDifficulty and PlayerAbility factors and subtracting predicted scores, there are powerful algorithms that can calculate SVD factorizations for you. Let's look at the actual scores from the first 9 holes of the 2007 Players Championship, as played by Phil, Tiger, and Vijay.


Hole   Par   Phil   Tiger   Vijay
  1     4     4      4       5
  2     5     4      5       5
  3     3     3      3       2
  4     4     4      5       4
  5     4     4      4       4
  6     4     3      5       4
  7     4     4      4       3
  8     3     2      4       4
  9     5     5      5       5

The 1-D SVD factorization of the scores is shown below. To make this example easier to understand, I have incorporated the ScaleFactor into the PlayerAbility and HoleDifficulty vectors so we can ignore the ScaleFactor for this example.


 
    Phil  Tiger  Vijay        HoleDifficulty        PlayerAbility
    3.95  4.64   4.34              4.34
    4.27  5.02   4.69              4.69          Phil  Tiger  Vijay
    2.42  2.85   2.66              2.66          0.91  1.07   1.00
    3.97  4.67   4.36              4.36
    3.64  4.28   4.00      =       4.00      *
    3.69  4.33   4.05              4.05
    3.33  3.92   3.66              3.66
    3.08  3.63   3.39              3.39
    4.55  5.35   5.00              5.00
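Here is a sketch of how these folded factors can be computed with NumPy; rescaling the right singular vector so that Vijay's ability is exactly 1.00 is a presentation choice that matches the table above:

```python
import numpy as np

scores = np.array([[4, 4, 5], [4, 5, 5], [3, 3, 2],
                   [4, 5, 4], [4, 4, 4], [3, 5, 4],
                   [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

U, s, Vt = np.linalg.svd(scores, full_matrices=False)

# Fold the largest singular value s[0] into the two vectors and
# rescale so the last entry of PlayerAbility (Vijay) equals 1.00.
hole_difficulty = U[:, 0] * s[0] * Vt[0, 2]  # ~ 4.34, 4.69, 2.66, ...
player_ability = Vt[0] / Vt[0, 2]            # ~ 0.91, 1.07, 1.00

# Their outer product gives the 1-D predicted scores shown above.
predicted = np.outer(hole_difficulty, player_ability)
print(np.round(predicted, 2))
```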

Notice that the HoleDifficulty factor is almost the average score on that hole for the 3 players. For example, hole 5, where everyone scored 4, does have a factor of 4.00. However, hole 6, where the average score is also 4, has a factor of 4.05 instead of 4.00. Similarly, the PlayerAbility is almost the percentage of par that the player achieved. For example, Tiger shot 39 with par being 36, and 39/36 = 1.08, which is almost his PlayerAbility factor (for these 9 holes) of 1.07.

Why don't the hole averages and par percentages exactly match the 1-D SVD factors? The answer is that SVD further refines those numbers in a cycle. For example, we can start by assuming HoleDifficulty is the hole average and then ask which PlayerAbility best matches the scores, given those HoleDifficulty numbers. Once we have that answer, we can go back and ask which HoleDifficulty best matches the scores, given those PlayerAbility numbers. We keep iterating this way until we converge on a set of factors that best predicts the scores. SVD shortcuts this process and immediately gives us the factors that we would have converged to if we had carried out the process ourselves.
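The cycle just described can be written down directly. The sketch below is essentially that alternating refinement, where each step is a least-squares fit; real SVD implementations use faster and more robust algorithms, but they converge to the same factors:

```python
import numpy as np

def alternating_refinement(scores, iterations=100):
    """Alternately refit PlayerAbility and HoleDifficulty until the
    rank-1 factors settle down (the cycle described in the text)."""
    # Start with HoleDifficulty = the average score on each hole.
    hole = scores.mean(axis=1)
    for _ in range(iterations):
        # Best PlayerAbility given HoleDifficulty (least squares).
        player = scores.T @ hole / (hole @ hole)
        # Best HoleDifficulty given PlayerAbility (least squares).
        hole = scores @ player / (player @ player)
    return hole, player

scores = np.array([[4, 4, 5], [4, 5, 5], [3, 3, 2],
                   [4, 5, 4], [4, 4, 4], [3, 5, 4],
                   [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

hole, player = alternating_refinement(scores)
print(np.round(player / player[2], 2))  # ~ [0.91 1.07 1.00]
```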

One very useful property of SVD is that it always finds the optimal set of factors, in the sense that they best predict the scores under the standard matrix distance measure (the Frobenius norm). That is, if we use SVD to find the factors of a matrix, those are the best factors that can be found. This optimality property means that we don't have to wonder whether a different set of numbers might predict the scores better.
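For example, under the Frobenius norm the SVD rank-1 factors are guaranteed to fit at least as well as the plausible hole-average and percent-of-par guesses (a sketch of the comparison):

```python
import numpy as np

def frobenius_error(scores, hole, player):
    """Frobenius distance between the scores and a rank-1 prediction."""
    return np.linalg.norm(scores - np.outer(hole, player))

scores = np.array([[4, 4, 5], [4, 5, 5], [3, 3, 2],
                   [4, 5, 4], [4, 4, 4], [3, 5, 4],
                   [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)
par = np.array([4, 5, 3, 4, 4, 4, 4, 3, 5])

# The optimal rank-1 factors from SVD.
U, s, Vt = np.linalg.svd(scores, full_matrices=False)
svd_error = frobenius_error(scores, s[0] * U[:, 0], Vt[0])

# The "obvious" factors: hole averages and percentage of par.
naive_error = frobenius_error(scores, scores.mean(axis=1),
                              scores.sum(axis=0) / par.sum())

print(svd_error <= naive_error)  # True: no rank-1 factors beat SVD's
```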

Now let's look at the difference between the actual scores and our 1-D approximation. A plus difference means that the actual score is higher than the predicted score; a minus difference means the actual score is lower than the prediction. For example, on the first hole Tiger got a 4 and the predicted score was 4.64, so we get 4 - 4.64 = -0.64. In other words, we must add -0.64 to our prediction to get the actual score.

Once these differences have been found, we can do the same thing again and predict the differences using the formula HoleDifficulty2 * PlayerAbility2. Since these factors predict the differences rather than the scores themselves, they are the 2-D factors, and we have put a 2 after their names (e.g. HoleDifficulty2) to show they are the second set of factors.


 
     Phil   Tiger  Vijay       HoleDifficulty2        PlayerAbility2
     0.05  -0.64   0.66            -0.18
    -0.28  -0.02   0.31            -0.38          Phil   Tiger  Vijay
     0.58   0.15  -0.66             0.80          0.82  -0.20  -0.53
     0.03   0.33  -0.36             0.15
     0.36  -0.28   0.00     =       0.35      *
    -0.69   0.67  -0.05            -0.67
     0.67   0.08  -0.66             0.89
    -1.08   0.37   0.61            -1.29
     0.45  -0.35   0.00             0.44
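In code, the residuals and the second set of factors fall out of the same SVD call (a sketch; as before, the singular value is folded into HoleDifficulty2, and a library routine may flip the signs of both vectors together):

```python
import numpy as np

scores = np.array([[4, 4, 5], [4, 5, 5], [3, 3, 2],
                   [4, 5, 4], [4, 4, 4], [3, 5, 4],
                   [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

U, s, Vt = np.linalg.svd(scores, full_matrices=False)

# Residuals: actual scores minus the 1-D (rank-1) predictions.
rank1 = s[0] * np.outer(U[:, 0], Vt[0])
residuals = scores - rank1

# The second singular component is the best rank-1 predictor of
# those residuals.
hole_difficulty2 = s[1] * U[:, 1]  # ~ -0.18, -0.38, 0.80, ...
player_ability2 = Vt[1]            # ~ 0.82, -0.20, -0.53

# What's left after subtracting this prediction is the 3rd component.
print(np.round(residuals - np.outer(hole_difficulty2, player_ability2), 2))
```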

There are some interesting observations we can make about these factors. Notice that hole 8 has the most significant HoleDifficulty2 factor (-1.29). That means it was the hardest hole to predict. Indeed, it was the only hole on which none of the 3 players made par. It was especially hard to predict because it was the most difficult hole relative to par (HoleDifficulty - par = 3.39 - 3 = 0.39), and yet Phil birdied it, making his score more than a stroke below his predicted score (he scored 2 versus his predicted score of 3.08). Other holes that were hard to predict were holes 3 (0.80) and 7 (0.89), because Vijay beat Phil on those holes even though, in general, Phil was playing better.

The full SVD for this example matrix (9 holes by 3 players) has 3 sets of factors. In general, an m x n matrix with m >= n has at most n sets of factors, so our 9 x 3 matrix cannot have more than 3. Here is the full SVD factorization (to two decimal places).

 

    Phil  Tiger  Vijay        HoleDifficulty 1-3           PlayerAbility 1-3
     4     4      5           4.34  -0.18  -0.90
     4     5      5           4.69  -0.38  -0.15          Phil   Tiger  Vijay
     3     3      2           2.66   0.80   0.40           0.91   1.07   1.00
     4     5      4           4.36   0.15   0.47           0.82  -0.20  -0.53
     4     4      4     =     4.00   0.35  -0.29     *    -0.21   0.76  -0.62
     3     5      4           4.05  -0.67   0.68
     4     4      3           3.66   0.89   0.33
     2     4      4           3.39  -1.29   0.14
     5     5      5           5.00   0.44  -0.36

By SVD convention, the HoleDifficulty and PlayerAbility vectors should all have length 1, so the conventional SVD factorization is:

 

    Phil  Tiger  Vijay       HoleDifficulty 1-3        ScaleFactor 1-3          PlayerAbility 1-3
     4     4      5          0.35   0.09  -0.64        21.07   0     0
     4     5      5          0.38   0.19  -0.10         0     2.01   0         Phil   Tiger  Vijay
     3     3      2          0.22  -0.40   0.28         0     0    1.42         0.53   0.62   0.58
     4     5      4          0.36  -0.08   0.33                                -0.82   0.20   0.53
     4     4      4    =     0.33  -0.18  -0.20    *                      *    -0.21   0.76  -0.62
     3     5      4          0.33   0.33   0.48
     4     4      3          0.30  -0.44   0.23
     2     4      4          0.28   0.64   0.10
     5     5      5          0.41  -0.22  -0.25
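This conventional form is exactly what a library SVD routine returns; here is a sketch that recovers all three parts and checks that they multiply back to the original scores:

```python
import numpy as np

scores = np.array([[4, 4, 5], [4, 5, 5], [3, 3, 2],
                   [4, 5, 4], [4, 4, 4], [3, 5, 4],
                   [4, 4, 3], [2, 4, 4], [5, 5, 5]], dtype=float)

# U holds HoleDifficulty 1-3 as unit columns, s holds the three
# ScaleFactors, and Vt holds PlayerAbility 1-3 as unit rows.
U, s, Vt = np.linalg.svd(scores, full_matrices=False)
print(np.round(s, 2))  # [21.07  2.01  1.42]

# Multiplying the three parts together recovers the scores exactly
# (to floating-point precision).
print(np.allclose(U @ np.diag(s) @ Vt, scores))  # True
```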

Latent Semantic Analysis Tutorial

We hope that you have some idea of what SVD is and how it can be used. Next we'll cover how SVD is used in our Latent Semantic Analysis Tutorial. Although the domain is different, the concepts are the same. We are trying to predict patterns of how words occur in documents instead of trying to predict patterns of how players score on golf holes.


