A Clear, Well-Illustrated, and Pretty Cool Explanation of PCA

What is Principal Component Analysis?

PCA (Principal Component Analysis)
  • An unsupervised machine learning algorithm.
  • Mainly used for dimensionality reduction.
  • Through dimensionality reduction, we can find features that are easier for humans to understand or analyze.
  • Other applications: visualization and de-noising.
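As a quick illustration of the dimensionality-reduction use case, here is a minimal sketch using scikit-learn's PCA class; the data is made up for demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Made-up 2D data: two correlated features.
rng = np.random.default_rng(42)
f1 = rng.normal(0, 1, 100)
X = np.column_stack([f1, 0.8 * f1 + rng.normal(0, 0.2, 100)])

# Reduce from 2 dimensions to 1 principal component.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)        # shape (100, 1)

print(pca.components_)                  # direction of the new axis
print(pca.explained_variance_ratio_)    # fraction of variance kept
```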
Simple explanation of PCA


Suppose we have a sample of 2D data points. If we want to do dimensionality reduction, we can reduce it to 1 dimension. So, how do we reduce it?

One obvious way is: of the two features, keep one and discard the other. For example, if we keep feature 1 and discard feature 2, then all points are projected onto the x-axis accordingly, and we get the following figure:
[Figure: all sample points projected onto the x-axis]
Similarly, if we keep feature 2 and discard feature 1, then all points are projected onto the y-axis.
[Figure: all sample points projected onto the y-axis]
So, up to now, we have two solutions for dimensionality reduction.
[Figure: the two projections shown side by side]
On the left, all points are projected onto the y-axis, which discards feature 1. On the right, all points are projected onto the x-axis, which discards feature 2. Now, comparing these two figures, which one is better?

Obviously, the right-hand figure is better. You may ask: WHY?
Because when we project the points onto the x-axis, the distances between the points are much larger, which indicates a higher degree of distinction between samples. The larger distances also better reflect the distances between the original sample points. The points on the left side, however, are too dense, so their distribution differs greatly from that of the original points.
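To make this concrete, here is a small sketch (with made-up data) that performs both candidate projections and compares the variance of the results:

```python
import numpy as np

# Made-up 2D samples, more spread out along feature 1 than feature 2.
rng = np.random.default_rng(0)
f1 = rng.normal(0, 3, 50)                # feature 1: large spread
f2 = 0.3 * f1 + rng.normal(0, 0.5, 50)   # feature 2: small spread
X = np.column_stack([f1, f2])

# Projecting onto the x-axis keeps feature 1; onto the y-axis keeps feature 2.
proj_x = X[:, 0]
proj_y = X[:, 1]

print(np.var(proj_x))   # larger: the points stay spread out
print(np.var(proj_y))   # smaller: the points collapse together
```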

So, if we have to choose between these two solutions to reduce the dimension from 2 to 1, we will obviously choose the right-hand figure. But now there is one question:
Is this the BEST solution? Is there a better one?
[Figure: a slanted axis through the data, with all points projected onto it]
We can see that if we project the points onto this slanted axis, the distribution of the projected points does not differ much from that of the original points. In other words, the distances between the projected points are larger than when projecting onto either the x-axis or the y-axis.

Our problem then becomes: how do we find the axis that maximizes the spread between samples after projection? To find this target axis, we first need a way to measure the spread between samples.
We use the variance.
The larger the variance, the more spread out the samples; the smaller the variance, the closer together the samples are.
$$\mathrm{Var}(x) = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \bar{x}\right)^2$$
That is: subtract the mean from each sample, sum the squares of the differences, and divide by the total number of samples m.
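As a quick sanity check (with made-up numbers), this formula is exactly what NumPy's np.var computes:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])             # made-up 1D samples
var_by_formula = np.mean((x - x.mean()) ** 2)  # the formula above
print(np.isclose(var_by_formula, np.var(x)))   # True
```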

Our problem now becomes: find an axis such that, when all points are projected onto it, the variance of the projections is the largest.
To deal with this problem, before we do PCA, we first demean all sample points. The distribution of the sample points does not change; in fact, we have just shifted the coordinate axes so that the sample points have mean 0 in each dimension (a small demeaning sketch follows the figure below).
[Figure: the same sample points before and after demeaning; only the origin has moved]
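A minimal demeaning sketch in NumPy (the data is made up):

```python
import numpy as np

X = np.array([[2.0, 3.0],
              [4.0, 5.0],
              [6.0, 10.0]])      # made-up 2D samples

X_demeaned = X - X.mean(axis=0)  # subtract the per-dimension mean
print(X_demeaned.mean(axis=0))   # ~[0. 0.]: mean 0 in each dimension
```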
Then, the variance can be simplified as follows:
$$\mathrm{Var}(x) = \frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)}\right)^2$$
Note that the x^(i) here are the new sample points after projection onto the axis. Since our example is 2-dimensional, the axis can be written as w = (w1, w2).
So, in summary, what we call PCA is:

  1. Demean all samples.
  2. Find the direction of the axis w = (w1, w2) such that the projections of all sample points onto w have the largest variance (a code sketch of the resulting objective follows the derivation below).
    [Figure: a sample vector X^(i), the axis direction w, and its projection X_project^(i) onto w]
    Assume the red line is the axis w we want to find. Each sample X^(i) is also a vector. Projecting X^(i) onto the w axis gives X_project^(i). What we want to maximize is:
    $$\mathrm{Var}(X_{project}) = \frac{1}{m}\sum_{i=1}^{m}\left\|X_{project}^{(i)}\right\|^2$$
    The term inside the sum, ||X_project^(i)||, is just the length of the blue line.
    This kind of projection is exactly the definition of the dot product of two vectors (A·B):
    $$X^{(i)} \cdot w = \left\|X^{(i)}\right\| \cdot \left\|w\right\| \cdot \cos\theta$$
    Here we only care about the direction of w, so w can be treated as a unit direction vector, i.e. ||w|| = 1. Then we can derive:
    $$X^{(i)} \cdot w = \left\|X^{(i)}\right\| \cdot \cos\theta$$
    Here ||X^(i)|| is the length of the vector X^(i); multiplied by cos(theta), it equals the length of X_project^(i).
    $$X^{(i)} \cdot w = \left\|X_{project}^{(i)}\right\|$$
    So we find that the length of X_project^(i) is simply the dot product between the sample point X^(i) and the axis w.
    Substituting this into the variance, we get:
    $$\mathrm{Var}(X_{project}) = \frac{1}{m}\sum_{i=1}^{m}\left(X^{(i)} \cdot w\right)^2$$
    Hence, finally, our objective function is:
    Find w to maximize:
    $$\mathrm{Var}(X_{project}) = \frac{1}{m}\sum_{i=1}^{m}\left(X^{(i)} \cdot w\right)^2$$
    which can also be expanded as follows:
    $$\mathrm{Var}(X_{project}) = \frac{1}{m}\sum_{i=1}^{m}\left(X^{(i)}_1 w_1 + X^{(i)}_2 w_2 + \cdots + X^{(i)}_n w_n\right)^2$$
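As promised above, here is a minimal sketch of this objective (with made-up data): the function below computes the variance of the projections for a given direction w, and a brute-force search over 2D directions recovers the axis with maximal variance, i.e. the first principal component.

```python
import numpy as np

def projection_variance(X, w):
    """Variance of the samples projected onto the direction w.

    Assumes X is already demeaned, so Var = mean((X . w)^2).
    """
    w = w / np.linalg.norm(w)  # treat w as a unit direction vector
    return np.mean((X @ w) ** 2)

# Made-up, demeaned 2D data.
rng = np.random.default_rng(1)
f1 = rng.normal(0, 3, 200)
X = np.column_stack([f1, 0.5 * f1 + rng.normal(0, 0.5, 200)])
X = X - X.mean(axis=0)

# Brute-force search over angles (fine for a 2D illustration).
thetas = np.linspace(0, np.pi, 1000)
best = max(thetas, key=lambda t: projection_variance(X, np.array([np.cos(t), np.sin(t)])))
print(np.cos(best), np.sin(best))  # direction of the axis with maximal variance
```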