Dimensionality Reduction - Choosing the number of principal components

Abstract: This article is the transcript of Lecture 119, "Choosing the number of principal components", from Chapter 15, "Dimensionality Reduction", of Andrew Ng's Machine Learning course. I wrote it down while watching the videos and edited it for concision and readability, for my own future reference. I'm sharing it here; if you spot any errors, corrections are welcome and sincerely appreciated. I hope it helps with your studies.

In the PCA algorithm, we take n-dimensional features and reduce them to a k-dimensional feature representation. This number k is a parameter of the PCA algorithm, and it is also called the number of principal components. In this video, I'd like to give you some guidelines for how people tend to think about choosing this parameter k for PCA.

In order to choose k, the number of principal components, here are a couple of useful concepts. What PCA tries to do is minimize the average squared projection error. That is:

\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}

Also, let's define the total variation of the data:

\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2}

When we are trying to choose k, a pretty common rule of thumb is to choose the smallest value of k so that the ratio between these two quantities is at most 0.01:

\frac{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2} }\leq 0.01
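To make these quantities concrete, here is a minimal NumPy sketch (the lecture itself uses Octave; the function name variance_lost_ratio is my own) that computes the average squared projection error, the total variation, and their ratio for a given reconstruction:

```python
import numpy as np

def variance_lost_ratio(X, X_approx):
    """Ratio of average squared projection error to total variation in the data.

    X        : (m, n) data matrix, assumed already mean-normalized
    X_approx : (m, n) reconstruction of X from its k-dimensional projection
    """
    # Average squared projection error: (1/m) * sum_i ||x^(i) - x_approx^(i)||^2
    projection_error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
    # Total variation in the data: (1/m) * sum_i ||x^(i)||^2
    total_variation = np.mean(np.sum(X ** 2, axis=1))
    return projection_error / total_variation
```

A return value of at most 0.01 corresponds to the rule of thumb above.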

Another way to say this, using the language of PCA, is that 99% of the variance is retained. Don't worry about what this phrase means technically; it just means that the quantity on the left is at most 0.01. So if you're using PCA and want to tell someone how many principal components you retained, it is more common to say that you chose k so that 99% of the variance is retained. This number, 0.01, is what people often use. Another common value is 0.05, or 5%, in which case we say that 95% of the variance is retained. And for many data sets, you might be surprised that, in order to retain 99% of the variance, you can often reduce the dimension of the data significantly and still retain most of the variance. That is because for most real data sets many features are highly correlated, so it turns out to be possible to compress the data a lot while still retaining 99% of the variance.

So how do you implement this? Here's one algorithm you might use. Start off with k=1, run PCA, and compute U_{reduced}, z^{(1)}, z^{(2)},...,z^{(m)}, x^{(1)}_{approx},...,x^{(m)}_{approx}, then check whether 99% of the variance is retained. If it isn't, next try k=2: run the same procedure and check whether the expression is satisfied. If not, try k=3, and so on. Maybe at k=17 we find that 99% of the variance is retained, and then we use k=17. But this procedure is not efficient.
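As a sketch of this trial-and-error loop (in NumPy rather than the course's Octave, assuming X has already been mean-normalized as described earlier in the course), it might look like the following; note that the projection, reconstruction, and error are recomputed for every candidate k:

```python
import numpy as np

def choose_k_naive(X, threshold=0.01):
    """Try k = 1, 2, ... until at most 1% of the variance is lost (inefficient)."""
    m, n = X.shape
    Sigma = (X.T @ X) / m                 # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)
    total_variation = np.mean(np.sum(X ** 2, axis=1))
    for k in range(1, n + 1):
        U_reduce = U[:, :k]               # first k principal components
        Z = X @ U_reduce                  # project: z = U_reduce' * x
        X_approx = Z @ U_reduce.T         # reconstruct: x_approx = U_reduce * z
        projection_error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
        if projection_error / total_variation <= threshold:
            return k
    return n
```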

Fortunately, when we implement PCA, the svd routine actually gives us a quantity that makes this much easier. Specifically, when you call svd to get the matrices U, S, V, it also gives us the matrix S, which is a square diagonal matrix. It turns out that, for a given value of k, what we can do is check whether the following is satisfied:

\frac{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)}-x^{(i)}_{approx} \right \|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left \| x^{(i)} \right \|^{2} }=1-\frac{\sum _{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}}\leq 0.01

OR

\frac{\sum _{i=1}^{k}s_{ii}}{\sum_{i=1}^{n}s_{ii}}\geq 0.99

For example, for k=3, the above becomes:

1-\frac{s_{11}+s_{22}+s_{33}}{s_{11}+s_{22}+\cdots +s_{nn}}\leq 0.01

OR

\frac{s_{11}+s_{22}+s_{33}}{s_{11}+s_{22}+\cdots +s_{nn}}\geq 0.99

So what we can do is slowly increase k and test the quantity above to find the smallest value of k that ensures 99% of the variance is retained. If you do this, you need to call the svd function only once, because that gives you the S matrix. With the S matrix in hand, you can keep repeating the calculation above for increasing values of k, so you don't need to call svd over and over again to test different values of k. This is much more efficient, since you don't have to run PCA from scratch each time.
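A compact sketch of this more efficient procedure (again NumPy rather than Octave; the function name choose_k is mine, and X is assumed mean-normalized) could be:

```python
import numpy as np

def choose_k(X, retain=0.99):
    """Smallest k retaining at least `retain` of the variance, with one svd call."""
    m = X.shape[0]
    Sigma = (X.T @ X) / m                  # covariance matrix
    U, S, Vt = np.linalg.svd(Sigma)        # S is the vector of diagonal entries s_ii
    variance_retained = np.cumsum(S) / np.sum(S)
    k = int(np.searchsorted(variance_retained, retain)) + 1
    return k, variance_retained[k - 1]
```

For example, k, r = choose_k(X) returns the smallest k that retains at least 99% of the variance, together with the exact fraction retained.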

To summarize, the way I often choose k when using PCA for compression is to call svd once on the covariance matrix, and then pick the smallest value of k for which this expression is satisfied. By the way, even if you pick the value of k manually, say choosing k=100 for 1000-dimensional data, a good way to explain the performance of your implementation to others is to compute this quantity and report the percentage of variance retained. If you report that number, then people familiar with PCA can use it to understand how well your 100-dimensional representation approximates your original data set.
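For instance, reusing the S array from the single svd call above, the number you would report for a manually chosen k=100 is just (assuming n is at least 100):

```python
# Fraction of variance retained by the first 100 principal components
retained = np.sum(S[:100]) / np.sum(S)
print(f"{100 * retained:.1f}% of variance retained")
```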

<end>
