What is an intuitive explanation of the relation between PCA and SVD?

Latest recommended article published on 2024-09-10 12:36:53

weixin_34068198


Tags: Artificial Intelligence


Last asked: 30 Sep, 2014

QUESTION TOPICS

Singular Value Decomposition

Principal Component Analysis

Intuitive Explanations

Statistics (academic discipline)

Machine Learning

Algorithms

Mathematics


3 Answers

Mike Tamir, CSO - GalvanizeU accredited Masters program creating top tier Data Scientists...

5.2k Views • Upvoted by Ricky Kwok, Ph.D. in Applied Math from UC Davis

There is a very direct mathematical relation between SVD (Singular Value Decomposition) and PCA (Principal Component Analysis); see below. For this reason, the two algorithms deliver essentially the same result: a set of "new axes" constructed from linear combinations of the original feature-space axes in which the dataset is plotted. These "new axes" are useful because they systematically break down the variance in the data points (how widely the data points are distributed) according to each direction's contribution to the total variance in the data:

The result of this process is a ranked list of "directions" in the feature space, ordered from most variance to least. The directions along which there is the greatest variance are referred to as the "principal components" (of variation in the data). The common wisdom is that by focusing exclusively on how the data is distributed along these dimensions, one can capture most of the information represented in the original feature space without having to deal with so many dimensions, which can be of great benefit in statistical modeling and Data Science applications (see: When and where do we use SVD?).
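To make the ranked list of directions concrete, here is a minimal NumPy sketch (an illustration, not part of the original answer) on made-up data with three features of deliberately different spread. It centers the data, takes the SVD, and reads off the variance along each new axis; the names `X`, `variances`, and `explained_ratio` are hypothetical.

```python
import numpy as np

# Toy data (an assumption for illustration): 500 samples in 3 dimensions,
# scaled so that the spread differs strongly between directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

# Center each feature, then take the SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the "new axes" (principal directions). The variance of the
# data along axis i is s[i]**2 / (n - 1); NumPy sorts s from largest to smallest.
n = X.shape[0]
variances = s**2 / (n - 1)
explained_ratio = variances / variances.sum()

print(variances)        # variance along each new axis, descending
print(explained_ratio)  # fraction of the total variance captured by each axis
```

Because the singular values come back sorted, the first row of `Vt` is the direction of greatest variance, and truncating to the first few rows is how the dimensionality reduction is usually done.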

What is the Formal Relation between SVD and PCA?
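Let's let the matrix $X$ be the $n \times p$ data matrix ($n$ samples as rows, $p$ features as columns), with each column already mean-centered, and write its SVD as $X = U \Sigma V^\top$, where $U$ and $V$ have orthonormal columns and $\Sigma$ is diagonal with the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$.

And, the sample covariance matrix that PCA diagonalizes is $C = \frac{1}{n-1} X^\top X$.

Note, substituting the SVD gives $C = \frac{1}{n-1} V \Sigma U^\top U \Sigma V^\top = V \frac{\Sigma^2}{n-1} V^\top$,

but since $V$ is orthogonal and $\frac{\Sigma^2}{n-1}$ is diagonal, this is exactly the eigendecomposition of $C$: the right singular vectors (the columns of $V$) are the principal directions, and the eigenvalues of $C$ are $\lambda_i = \sigma_i^2 / (n-1)$.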
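The relation between PCA's covariance eigendecomposition and the SVD of the centered data can be checked numerically. A minimal NumPy sketch, assuming randomly generated data (not from the original answer): both routes should give the same directions up to a sign flip per direction, and eigenvalues equal to σ_i²/(n-1).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 200 samples, 4 features with clearly different spreads.
X = rng.normal(size=(200, 4)) * np.array([4.0, 3.0, 2.0, 1.0])
n = X.shape[0]

# PCA route: eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to descending

# SVD route: singular value decomposition of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of C equal the squared singular values divided by (n - 1).
print(np.allclose(eigvals, s**2 / (n - 1)))   # True

# Eigenvectors match the right singular vectors up to a sign per column.
signs = np.sign(np.sum(eigvecs * Vt.T, axis=0))
print(np.allclose(eigvecs, Vt.T * signs))     # True
```

The sign alignment is needed because both eigenvectors and singular vectors are only defined up to sign; neither routine guarantees a particular choice.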

Written 1 Dec, 2014


David Beniaguev

486 Views

I would like to refine two points that I think are important:

I'll be assuming your data matrix is an m×d matrix, organized such that rows are data samples (m samples) and columns are features (d features).

The first point is that SVD performs low-rank matrix approximation.
In its truncated form, your input to SVD is a number k (smaller than both m and d), and the SVD procedure returns a set of k vectors of d dimensions (which can be organized in a k×d matrix) and a set of k coefficients for each data sample (there are m data samples, so these can be organized in an m×k matrix). Multiplying each sample's k coefficients by the k vectors and summing gives a reconstruction of that sample, and no other choice of k vectors and coefficients achieves a smaller total squared (Euclidean) reconstruction error over all the data samples.
So in a sense, the SVD procedure finds the optimal k vectors that together span a subspace in which most of the data samples lie (up to a small reconstruction error).
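A sketch of this low-rank approximation using NumPy, on hypothetical data that lies close to a 2-D subspace of a 5-D space. The helper `rank_k_approx` is an illustrative name, not a library function; it keeps the top k right singular vectors as the k×d basis and the scaled left singular vectors as the m×k coefficients. The final check reflects the Eckart-Young theorem: the reconstruction error equals the root-sum-square of the discarded singular values.

```python
import numpy as np

def rank_k_approx(X, k):
    """Best rank-k approximation of X in the least-squares (Frobenius) sense."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k]                 # k x d: the k vectors spanning the subspace
    coeffs = U[:, :k] * s[:k]      # m x k: k coefficients per data sample
    return coeffs @ basis, coeffs, basis, s

rng = np.random.default_rng(2)
# Hypothetical data: 100 samples near a 2-D subspace of 5-D space, plus small noise.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(100, 5))

X_hat, coeffs, basis, s = rank_k_approx(X, k=2)
err = np.linalg.norm(X - X_hat)    # Frobenius norm of the reconstruction error

# Eckart-Young: the error equals the root-sum-square of the discarded singular values.
print(np.isclose(err, np.sqrt(np.sum(s[2:] ** 2))))  # True
print(coeffs.shape, basis.shape)                      # (100, 2) (2, 5)
```

Note that no mean is subtracted here, which matches the answer's framing of plain SVD as approximating the data samples themselves.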

PCA, on the other hand, is:
1) subtract the mean sample from each row of the data matrix.
2) perform SVD on the resulting matrix.
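The two steps translate almost line for line into NumPy. This is a minimal sketch; the function name `pca` and the toy data are assumptions for illustration, and a library implementation may differ in details such as how the signs of the components are chosen.

```python
import numpy as np

def pca(X, k):
    """PCA as two steps: subtract the mean sample, then run SVD on the result."""
    mean = X.mean(axis=0)                      # the "mean data sample", one value per feature
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]                        # k x d principal directions
    scores = (X - mean) @ components.T         # m x k coordinates of each sample
    return mean, components, scores

rng = np.random.default_rng(3)
X = rng.normal(loc=10.0, size=(50, 4))         # hypothetical data with a non-zero mean
mean, components, scores = pca(X, k=2)

# Approximate reconstruction: map the scores back into feature space, then add the mean.
X_approx = mean + scores @ components
print(components.shape, scores.shape)          # (2, 4) (50, 2)
```

The only difference from plain SVD is the subtraction of `mean` before decomposing and its addition after reconstructing.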

So, the second point is that PCA gives you as output the subspace that spans the deviations from the mean data sample, whereas SVD provides a subspace that spans the data samples themselves (or, equivalently, a subspace that spans the deviations from zero).

Note that these two subspaces are usually NOT the same, and will be the same only if the mean data sample is zero.

In order to understand a little better why they are not the same, let's think of a data set where all feature values for all data samples are in the range 999-1001, and each feature's mean is 1000.

From the SVD point of view, the main way in which these samples deviate from zero is along the vector (1,1,1,...,1).
From the PCA point of view, on the other hand, the main way in which these data samples deviate from the mean data sample depends on the precise data distribution around the mean data sample...
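The 999-1001 example can be simulated directly. In this NumPy sketch the data-generating choices are assumptions made for illustration (features 0 and 1 share a common random component, feature 2 varies independently); plain SVD and PCA are run on the same matrix and their top directions compared.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 300
# Hypothetical data: every value stays in [999, 1001] and each feature's mean is ~1000.
shared = rng.uniform(-1.0, 1.0, size=m)
X = 1000.0 + np.column_stack([
    0.9 * shared + 0.1 * rng.uniform(-1.0, 1.0, size=m),
    0.9 * shared + 0.1 * rng.uniform(-1.0, 1.0, size=m),
    rng.uniform(-0.5, 0.5, size=m),
])

# Plain SVD (no centering): the top right singular vector is essentially
# (1, 1, 1) / sqrt(3), the direction from zero toward the mean sample.
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)
print(np.abs(Vt_raw[0]))

# PCA (SVD after subtracting the mean sample): the top direction reflects the
# spread around the mean, here close to (1, 1, 0) / sqrt(2).
_, _, Vt_pca = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
print(np.abs(Vt_pca[0]))
```

Absolute values are printed because the sign of a singular vector is arbitrary.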

In short, we can think of SVD as "something that compactly summarizes the main ways in which my data is deviating from zero" and PCA as "something that compactly summarizes the main ways in which my data is deviating from the mean data sample".

Written 11d ago

Tigran Ishkhanov

1.3k Views

PCA is a statistical technique in which SVD is used as a low-level linear algebra algorithm. One can apply SVD to any matrix C. In PCA, this matrix C arises from the data and has a statistical meaning: the element c_ij is the covariance between the i-th and j-th coordinates of your dataset after mean-normalization.
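Following this view, SVD can be applied to the covariance matrix itself. A minimal NumPy sketch on hypothetical correlated data (the mixing matrix is an arbitrary choice for illustration): because this covariance matrix is symmetric and, for full-rank data like this, positive definite, its SVD coincides with its eigendecomposition, so the singular values are the variances along the principal directions.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical correlated data: 400 samples of 3 coordinates.
X = rng.normal(size=(400, 3)) @ np.array([[2.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.3]])

# C[i, j] is the covariance between coordinates i and j after mean-normalization.
C = np.cov(X, rowvar=False)

# Apply SVD to C itself. For a symmetric positive definite matrix,
# U equals V and the singular values equal the eigenvalues.
U, s, Vt = np.linalg.svd(C)
print(np.allclose(U, Vt.T))                                  # True
print(np.allclose(s, np.sort(np.linalg.eigvalsh(C))[::-1]))  # True
```

In practice, running SVD on the centered data matrix (rather than forming C explicitly) is often preferred for numerical accuracy, but both routes yield the same principal directions.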

Written 30 Sep, 2014