Dimensionality Reduction - Principal Component Analysis problem formulation

Abstract: This article is the original video transcript of Lesson 117, "Principal Component Analysis Problem Formulation," from Chapter 15, "Dimensionality Reduction," of Andrew Ng's Machine Learning course. I transcribed it while studying the videos and revised it to make it more concise and easier to read, for future reference. I am sharing it here; if you find any errors, corrections are welcome and sincerely appreciated. I also hope it is helpful for your studies.
————————————————

For the problem of dimensionality reduction, by far the most popular and commonly used algorithm is something called principal component analysis, or PCA. In this video, I'd like to start to talk about the problem formulation for PCA. In other words, let's try to formulate precisely what we would like PCA to do.

Let's say we have a data set like this. So this is a data set of examples x\in \mathbb{R}^{2}, and let's say I want to reduce the dimension of the data from two dimensions to one dimension. In other words, I would like to find a line onto which to project the data. So what seems like a good line onto which to project the data? A line like this might be a pretty good choice. And the reason this might be a good choice is that if you look at where the projected versions of the points go: I'm going to take this point and project it down here, this point gets projected here, and so on. What we find is that the distance between each point and its projected version is pretty small. That is, these blue line segments are pretty short. So what PCA does is it tries to find a lower-dimensional surface, really a line in this case, onto which to project the data, so that the sum of squares of these little blue segments is minimized. The length of those blue line segments is sometimes also called the projection error, and so what PCA does is it tries to find the surface onto which to project the data so as to minimize that. As an aside, before applying PCA it's standard practice to first perform mean normalization and feature scaling, so that the features x_{1} and x_{2} have zero mean and comparable ranges of values. I've already done this for this example, and I'll come back and talk more about feature scaling and mean normalization in the context of PCA later. But coming back to this example, in contrast to the red line that I just drew, here's a different line onto which I could project my data: this magenta line. And as you can see, this magenta line is a much worse direction onto which to project my data. If I were to project my data onto the magenta line, I'd get this other set of points, and the projection errors, that is, these blue line segments, would be huge. These points have to move a huge distance in order to get projected onto the magenta line. And so that's why PCA, principal component analysis, would choose something like the red line rather than the magenta line down here.
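As a concrete illustration of the idea above, here is a minimal sketch in Python, using small hypothetical numbers that are not from the lecture, of mean normalization, feature scaling, and the squared projection error onto a candidate direction (the quantity PCA tries to minimize):

```python
import numpy as np

# Hypothetical 2D data set: m examples, n = 2 features.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Mean normalization and feature scaling: zero mean, comparable ranges.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

def projection_error(X, u):
    """Sum of squared orthogonal distances from each point to the line
    through the origin with direction u (the blue segments in the figure)."""
    u = u / np.linalg.norm(u)            # work with a unit-length direction
    projections = (X @ u)[:, None] * u   # each point projected onto the line
    return np.sum((X - projections) ** 2)

# Compare a good direction with a bad one, as with the red vs. magenta lines.
print(projection_error(X_norm, np.array([1.0, 1.0])))   # small error
print(projection_error(X_norm, np.array([1.0, -1.0])))  # much larger error
```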

Let's write out the PCA problem a little more formally. The goal of PCA, if we want to reduce data from two dimensions to one dimension, is to find a vector u^{(1)}\in \mathbb{R}^{n}, so in \mathbb{R}^{2} in this case: a direction onto which to project the data so as to minimize the projection error. So in this example I'm hoping that PCA will find this vector, which I'm going to call u^{(1)}, so that when I project the data onto the line defined by extending out this vector, I end up with pretty small projection errors, and the projected data looks like this. By the way, I should mention that whether PCA gives me u^{(1)} or -u^{(1)} doesn't matter. If it gives me a positive vector in this direction, that's fine; if it gives me the opposite vector, facing in the opposite direction, so that would be -u^{(1)}, it also doesn't matter, because each of these vectors defines the same red line onto which I'm projecting my data. So this is the case of reducing data from 2 dimensions to 1 dimension. In the more general case, we have n-dimensional data and we want to reduce it to k dimensions. In that case, we want to find not just a single vector onto which to project the data, but k vectors onto which to project the data, so as to minimize the projection error. Here's an example. If I have a 3D point cloud like this, then maybe what I want to do is find a pair of vectors extending from the origin: here's u^{(1)}, and here's my second vector u^{(2)}. And together these two vectors define a plane, or a 2D surface, kind of like this, onto which I'm going to project my data. For those of you that are familiar with linear algebra, the formal definition is that we're going to find a set of vectors u^{(1)}, u^{(2)}, ..., u^{(k)}, and we're going to project the data onto the linear subspace spanned by this set of k vectors. If you're not familiar with linear algebra, just think of it as finding k directions, instead of just one direction, onto which to project the data. So, finding a k-dimensional surface, really a 2D plane in this case as shown in this figure, we can define the position of the points in the plane using k directions. That's why for PCA we want to find k vectors onto which to project the data. And so, more formally, in PCA what we want to do is find a way to project the data so as to minimize the projection distance, which is the distance between the points and their projections. In this 3D example, given a point, we would take the point and project it onto this 2D surface, so the projection error would be the distance between the point and where it gets projected down onto my 2D surface. And so what PCA does is it tries to find a line or a plane or whatever onto which to project the data, to try to minimize that 90-degree, or orthogonal, projection error. Finally, one question I sometimes get asked is how PCA relates to linear regression, because when I explain PCA I sometimes end up drawing diagrams like these, and that looks a little bit like linear regression.
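To make the general n-to-k case concrete, here is a rough sketch on hypothetical data of projecting points onto the subspace spanned by k directions u^{(1)}, ..., u^{(k)}. How PCA actually computes these directions is the subject of the next video; this sketch simply takes them from the SVD of the covariance matrix, which is one standard way to obtain them:

```python
import numpy as np

def project_onto_top_k(X, k):
    """Given mean-normalized data X (m x n), project it onto the subspace
    spanned by k directions u^(1), ..., u^(k). Returns the k-dimensional
    coordinates and the projected points back in R^n."""
    m, n = X.shape
    Sigma = (X.T @ X) / m            # n x n covariance matrix
    U, S, _ = np.linalg.svd(Sigma)   # columns of U are candidate directions
    U_reduce = U[:, :k]              # keep the first k directions
    Z = X @ U_reduce                 # new k-dimensional coordinates
    X_approx = Z @ U_reduce.T        # projected points, still in R^n
    return Z, X_approx

# Hypothetical 3D point cloud lying close to a 2D plane, reduced to 2D.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 3)) \
    + 0.05 * rng.standard_normal((100, 3))
X = X - X.mean(axis=0)               # mean normalization

Z, X_approx = project_onto_top_k(X, k=2)
print(np.sum((X - X_approx) ** 2))   # total squared projection error (small)
```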

It turns out PCA is not linear regression. Despite some cosmetic similarity, these are totally different algorithms. If we were doing linear regression, what we would do, on the left, is try to predict the value of some variable y given some input features x. So in linear regression, we're fitting a straight line so as to minimize the squared error between the points and the straight line, and what we are minimizing is the squared magnitude of these blue lines. Notice that I'm drawing these blue lines vertically: they are the vertical distances between each point and the value predicted by the hypothesis. In contrast, PCA tries to minimize the magnitude of these blue lines drawn at an angle; these are really the shortest, orthogonal distances between each point x and the red line. And this gives very different effects, depending on the data set. More generally, when you're doing linear regression there is a distinguished variable y that we're trying to predict; all that linear regression is about is taking all the values of x and using them to predict y. Whereas in PCA, there is no distinguished variable y that we're trying to predict; instead we have a list of features x_{1}, x_{2}, ..., x_{n}, and all these features are treated equally, so no one of them is special.
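The distinction can be made concrete with a small sketch on hypothetical 2D data: linear regression minimizes the vertical distances to the fitted line, whereas PCA minimizes the orthogonal distances to the principal direction of the mean-centered data. The two error values it prints measure different things on the same points:

```python
import numpy as np

# Hypothetical 2D data: treat the second column as "y" for regression,
# but as just another feature for PCA.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Linear regression: least squares on the VERTICAL distances to the line.
theta1, theta0 = np.polyfit(x, y, 1)
vertical_error = np.sum((y - (theta0 + theta1 * x)) ** 2)

# PCA: orthogonal distances to the first principal direction of centered data.
X = np.column_stack([x, y])
X = X - X.mean(axis=0)
U, _, _ = np.linalg.svd(X.T @ X / len(x))
u1 = U[:, 0]                                   # first principal direction
orthogonal_error = np.sum((X - (X @ u1)[:, None] * u1) ** 2)

print(vertical_error, orthogonal_error)
```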

As one last example, if I have three-dimensional data and I want to reduce it from 3D to 2D, then maybe I want to find two directions u^{(1)} and u^{(2)} onto which to project my data. Then what I have is three features x_{1}, x_{2}, x_{3}, and all of these are treated alike; all of these are treated symmetrically, and there is no special variable y that I'm trying to predict. And so PCA is not linear regression, and even though at some cosmetic level they might look related, they are actually very different algorithms.

So hopefully you now understand what PCA is doing: it's trying to find a lower-dimensional surface onto which to project the data, so as to minimize the squared projection error, that is, the squared distance between each point and the location where it gets projected. In the next video, we'll start to talk about how to actually find this lower-dimensional surface onto which to project the data.

<end>
