Dimensionality Reduction - Choosing the number of principal components

Abstract: This article is the transcript of Lecture 119, "Choosing the number of principal components", from Chapter 15, "Dimensionality Reduction", of Andrew Ng's Machine Learning course. I wrote it down while watching the videos and edited it for concision and readability, for my own future reference. I'm sharing it here; if you spot any errors, corrections are welcome and sincerely appreciated. I hope it helps with your studies.

In the PCA algorithm, we take n-dimensional features and reduce them to a k-dimensional feature representation. This number k is a parameter of the PCA algorithm, and it is also called the number of principal components. In this video, I'd like to give you some guidelines for how people tend to think about choosing this parameter k for PCA.

In order to choose k, the number of principal components, here are a couple of useful concepts. What PCA tries to do is minimize the average squared projection error. That is:

\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}

Also, let's define the total variation of the data:

\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2}

When we are trying to choose k, a pretty common rule of thumb is to choose the smallest value of k so that the ratio between these two quantities is at most 0.01:

\frac{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2} }\leq 0.01
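To make these quantities concrete, here is a minimal NumPy sketch (the lecture itself uses Octave; the function name variance_lost_ratio is my own) that computes the average squared projection error, the total variation, and their ratio for a given reconstruction:

```python
import numpy as np

def variance_lost_ratio(X, X_approx):
    """Ratio of average squared projection error to total variation in the data.

    X        : (m, n) data matrix, assumed already mean-normalized
    X_approx : (m, n) reconstruction of X from its k-dimensional projection
    """
    # Average squared projection error: (1/m) * sum_i ||x^(i) - x_approx^(i)||^2
    projection_error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
    # Total variation in the data: (1/m) * sum_i ||x^(i)||^2
    total_variation = np.mean(np.sum(X ** 2, axis=1))
    return projection_error / total_variation
```

A return value of at most 0.01 corresponds to the rule of thumb above.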

Another way to say this, using the language of PCA, is that 99% of the variance is retained. Don't worry about what this phrase means technically; it just means that the quantity on the left is at most 0.01. So if you're using PCA and want to tell someone how many principal components you retained, it is more common to say that you chose k so that 99% of the variance is retained. This number, 0.01, is what people often use. Another common value is 0.05, or 5%, in which case we say that 95% of the variance is retained. And for many data sets, you might be surprised that, in order to retain 99% of the variance, you can often reduce the dimension of the data significantly and still retain most of the variance. That is because for most real data sets many features are highly correlated, so it turns out to be possible to compress the data a lot while still retaining 99% of the variance.

So how do you implement this? Here's one algorithm you might use. Start off with k=1, run PCA, and compute U_{reduced}, z^{(1)}, z^{(2)},...,z^{(m)}, x^{(1)}_{approx},...,x^{(m)}_{approx}, then check whether 99% of the variance is retained. If it isn't, next try k=2: run the same procedure and check whether the expression is satisfied. If not, try k=3, and so on. Maybe at k=17 we find that 99% of the variance is retained, and then we use k=17. But this procedure is not efficient.
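As a sketch of this trial-and-error loop (in NumPy rather than the course's Octave, assuming X has already been mean-normalized as described earlier in the course), it might look like the following; note that the projection, reconstruction, and error are recomputed for every candidate k:

```python
import numpy as np

def choose_k_naive(X, threshold=0.01):
    """Try k = 1, 2, ... until at most 1% of the variance is lost (inefficient)."""
    m, n = X.shape
    Sigma = (X.T @ X) / m                 # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)
    total_variation = np.mean(np.sum(X ** 2, axis=1))
    for k in range(1, n + 1):
        U_reduce = U[:, :k]               # first k principal components
        Z = X @ U_reduce                  # project: z = U_reduce' * x
        X_approx = Z @ U_reduce.T         # reconstruct: x_approx = U_reduce * z
        projection_error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
        if projection_error / total_variation <= threshold:
            return k
    return n
```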

Fortunately, when we implement PCA, the svd routine actually gives us a quantity that makes this much easier. Specifically, when you call svd to get the matrices U, S, V, it also gives us the matrix S, which is a square diagonal matrix. It turns out that, for a given value of k, what we can do is check whether the following is satisfied:

\frac{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2} }=1-\frac{\sum _{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}}\leq 0.01

OR

\frac{\sum _{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}}\geq 0.99

For example, for k=3, the above becomes:

1-\frac{s_{11}+s_{22}+s_{33}}{s_{11}+s_{22}+\cdots +s_{nn}}\leq 0.01

OR

\frac{s_{11}+s_{22}+s_{33}}{s_{11}+s_{22}+\cdots +s_{nn}}\geq 0.99

So what we can do is slowly increase k and test the quantity above to find the smallest value of k that ensures 99% of the variance is retained. If you do this, you need to call the svd function only once, because that gives you the S matrix. With the S matrix in hand, you can keep repeating the calculation above for increasing values of k, so you don't need to call svd over and over again to test different values of k. This is much more efficient, since you don't have to run PCA from scratch each time.
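A compact sketch of this more efficient procedure (again NumPy rather than Octave; the function name choose_k is mine, and X is assumed mean-normalized) could be:

```python
import numpy as np

def choose_k(X, retain=0.99):
    """Smallest k retaining at least `retain` of the variance, with one svd call."""
    m = X.shape[0]
    Sigma = (X.T @ X) / m                  # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)        # S is the vector of diagonal entries s_ii
    variance_retained = np.cumsum(S) / np.sum(S)
    k = int(np.searchsorted(variance_retained, retain)) + 1
    return k, variance_retained[k - 1]
```

For example, k, r = choose_k(X) returns the smallest k that retains at least 99% of the variance, together with the exact fraction retained.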

To summarize, the way I often choose k when using PCA for compression is to call svd once on the covariance matrix, and then pick the smallest value of k for which this expression is satisfied. By the way, even if you pick the value of k manually, say choosing k=100 for 1000-dimensional data, a good way to explain the performance of your implementation to others is to compute this quantity and report the percentage of variance retained. If you report that number, then people familiar with PCA can use it to understand how well your 100-dimensional representation approximates your original data set.
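For instance, reusing the S array from the single svd call above, the number you would report for a manually chosen k=100 is just (assuming n is at least 100):

```python
# Fraction of variance retained by the first 100 principal components
retained = np.sum(S[:100]) / np.sum(S)
print(f"{100 * retained:.1f}% of variance retained")
```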

<end>
