Dimensionality Reduction - Motivation I: Data Compression

Abstract: This article is the transcript of Lesson 115, "Motivation I: Data Compression", from Chapter 15, "Dimensionality Reduction", of Andrew Ng's (吴恩达) Machine Learning course. I wrote it down while watching the video and lightly edited it to make it more concise and readable, so that I can refer back to it later. I'm sharing it here; if there are any mistakes, corrections are welcome and sincerely appreciated. I also hope it is helpful for your own studies.
————————————————

In this video, I'd like to start talking about a second type of unsupervised learning problem called dimensionality reduction. There are a couple of different reasons why one might want to do dimensionality reduction. One is data compression. And as we'll see later, data compression not only allows us to compress the data and therefore use less computer memory or disk space, but it will also allow us to speed up our learning algorithms. But first, let's start by talking about what dimensionality reduction is.

As a motivating example, let's say that we've collected a data set with many features, and I've plotted just two of them here. And let's say that, unknown to us, one of the features, x_{1}, was actually the length of something in centimeters, and a different feature, x_{2}, is the length of the same thing in inches. So this gives us a highly redundant representation, and maybe instead of having two separate features x_{1} and x_{2}, both of which basically measure the length, maybe what we want to do is reduce the data to one dimension and just have one number measuring this length. In case this example seems a bit contrived, this centimeters-and-inches example is actually not that unrealistic, and not that different from things that I see happening in industry. If you have hundreds or thousands of features, it is often easy to lose track of exactly what features you have. And sometimes you may have a few different engineering teams: maybe one engineering team gives you 200 features, a second engineering team gives you another 300 features, and a third engineering team gives you 500 features. So you have 1000 features altogether, and it actually becomes hard to keep track of exactly which features you got from which team. And it's actually not that hard to have highly redundant features like these. And if the length in centimeters were rounded off to the nearest centimeter and the length in inches were rounded off to the nearest inch, that's why these examples don't lie perfectly on a straight line: because of round-off errors to the nearest centimeter or the nearest inch. And if we can reduce the data to one dimension instead of two dimensions, that reduces the redundancy.

For a different example, maybe one that seems slightly less contrived: for many years I've been working with autonomous helicopter pilots. And so, if you were to do a survey or a test of these different pilots, you might have one feature, x_{1}, which is maybe the skill of these helicopter pilots, and maybe x_{2} could be the pilots' enjoyment, that is, how much they enjoy flying. Maybe these two features will be highly correlated. And what you really care about might be this direction, a different feature that really measures pilot aptitude. And I'm making up the name "aptitude", of course. And again, if you have highly correlated features, maybe you really want to reduce the dimension.

So, let me say a little bit more about what it really means to reduce the dimension of the data from 2 dimensions, that is from 2D, to 1 dimension, or 1D. Let me color in these examples by using different colors. In this case, by reducing the dimension, what I mean is that I would like to find maybe this line, this direction on which most of the data seems to lie, and project all the data onto that line that I just drew. And by doing so, what I can do is just measure the position of each of the examples on that line. And what I can do is come up with a new feature z_{1}; to specify the position on the line I need only one number. So z_{1} is a new feature that specifies the location of each of those points on this green line. And what it means is that whereas previously, if I had an example x^{(1)}, in order to represent x^{(1)} I needed a two-dimensional number, or a two-dimensional feature vector, instead now I can use z^{(1)} to represent my first example, and that's going to be a real number.
And similarly, if x^{(2)} is my second example, previously this required two numbers to represent; if I instead compute the projection of that black cross onto the line, now I need only one real number z^{(2)} to represent the location. And so on, through my m examples. Just to summarize: if we allow ourselves to approximate the original data set by projecting all of my original examples onto this green line over here, then I need only one number to specify the position of a point on the line. And so what I can do is therefore use just one number to represent the location of each of my training examples after they are projected onto that green line. So this is an approximation to the original training set, because I have projected all of my training examples onto a line. Now, I need to keep around only one number for each of my examples. So this halves the memory requirement, or the disk space requirement, for how to store the data. And perhaps more interestingly, more importantly, this will allow our learning algorithms to run more quickly as well. And that is perhaps even the more interesting application of this data compression, rather than reducing the memory or disk space requirement.
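To make this 2D-to-1D projection concrete, here is a minimal NumPy sketch (my own illustration, not part of the lecture). It assumes the direction of the green line is taken to be the top singular vector of the centered data; the lecture has not yet said how to find that line, and PCA, introduced in the following videos, is the method that actually does so.

```python
import numpy as np

# Toy 2D data: x1 is a length in centimeters and x2 is (roughly) the same
# length in inches, with a little round-off noise so the points lie almost,
# but not exactly, on a straight line.
rng = np.random.default_rng(0)
cm = rng.uniform(100, 200, size=50)
X = np.column_stack([cm, cm / 2.54]) + rng.normal(0, 0.5, size=(50, 2))

# Center the data, then take the top singular vector as the direction of
# the line (an assumption for this sketch; PCA formalizes this later).
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u = Vt[0]                       # unit vector along the line

# Each 2D example x^(i) is replaced by a single real number z^(i):
# its signed position along the line.
z = Xc @ u                      # shape (50,), one number per example

# The approximation to the original data: the projection back onto the line.
X_approx = mu + np.outer(z, u)
print(z[:3])
print(X_approx[:3])
```

Storing `z` needs one number per example instead of two, which is exactly the halving of the memory or disk requirement described above.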

On the previous slide we showed an example of reducing data from 2D to 1D. On this slide, I'm going to show another example of reducing data from three dimensions (3D) to two dimensions (2D). By the way, in the more typical example of dimensionality reduction, we might have 1,000-dimensional data, or 1000D data, that we might want to reduce to, let's say, 100 dimensions, or 100D. But because of the limitations of what I can plot on the slide, I'm going to use examples of 3D to 2D, or 2D to 1D. So, let's have a data set like that shown here. And so, I would have a set of examples x^{(i)}\in \mathbb{R}^{3}. It might be hard to see here, but maybe all of this data lies roughly on a plane, like so. So what we can do with dimensionality reduction is take all of this data and project it down onto a two-dimensional plane. So what I have done is, I've taken all the data and projected it so that it all lies on the plane. Finally, in order to specify the location of a point within the plane, we need two numbers, right? We need to specify the location of a point along this axis, and also specify its location along that axis. So, we need two numbers, maybe called z_{1} and z_{2}, to specify the location of a point within the plane. What I mean is that we can now represent each example using the two numbers that I've drawn here, z_{1} and z_{2}. So our data can be represented using vectors z, which are in \mathbb{R}^{2} (z\in \mathbb{R}^{2}). Or z=\begin{bmatrix} z_{1}\\ z_{2} \end{bmatrix}. Generally we have z^{(i)}=\begin{bmatrix} z^{(i)}_{1}\\ z^{(i)}_{2} \end{bmatrix}.
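Again as my own illustration rather than part of the lecture, here is a short NumPy sketch of the 3D-to-2D case. The plane is assumed, for illustration only, to be spanned by the top two singular vectors of the centered data; each example x^{(i)}\in \mathbb{R}^{3} is then replaced by a vector z^{(i)}\in \mathbb{R}^{2}.

```python
import numpy as np

# Toy 3D data lying close to a 2D plane: x3 is approximately a linear
# combination of x1 and x2, plus a little noise.
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 0.5 * x1 - 0.2 * x2 + rng.normal(0, 0.05, size=100)
X = np.column_stack([x1, x2, x3])      # each example x^(i) is in R^3

# Center the data and take the top two directions as a basis for the plane
# (an assumed choice here; PCA is what justifies it in the later videos).
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U2 = Vt[:2].T                          # 3x2 matrix whose columns span the plane

# z^(i) = [z1, z2] is the position of example i within the plane.
Z = Xc @ U2                            # shape (100, 2)

# Projection of the original 3D points onto the plane (the approximation).
X_approx = mu + Z @ U2.T
print(Z[:3])
```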

Now let me just make sure these figures make sense. Let me just re-show these exact three figures again, but with 3D plots. So the process we went through is: on the left is the original data set, in the middle is the data set projected onto the 2D plane, and on the right is the 2D data set with z_{1} and z_{2} as the axes. Let's look at them a little bit further.

Here's my original data set, shown on the left. And so I had started off with a 3D point cloud like so, where the axes are labeled x_{1}, x_{2}, x_{3}. But most of the data maybe lies roughly on, or not too far from, some 2D plane. So what we can do is take this data and project it onto 2D. So, I've projected this data so that all of it now lies on this 2D surface. As you can see, all the data lies on a plane. And so what this means is that now I need only two numbers, z_{1} and z_{2}, to represent the location of a point on the plane.

And so that's the process that we can go through to reduce our data from three dimensions to two dimensions.

So that's dimensionality reduction and how we can use it to compress our data. And as we'll see later, this will allow us to make some of our learning algorithms run much faster as well.

<end>
