Machine Learning Study Notes 1.1.2.1.1: Multiple Features (Linear Regression with Multiple Features)

Welcome back. In this week, we'll learn to make linear regression much faster and much more powerful, and by the end of this week, you'll be two thirds of the way to finishing this first course. Let's start by looking at the version of linear regression that looks at not just one feature, but a lot of different features. Let's take a look. In the original version of linear regression, you had a single feature x, the size of the house, and you were able to predict y, the price of the house. The model was f_w,b(x) = wx + b. But now, what if you not only had the size of the house as a feature with which to try to predict the price, but also knew the number of bedrooms, the number of floors, and the age of the home in years? It seems like this would give you a lot more information with which to predict the price.

To introduce a little bit of new notation, we're going to use the variables X_1, X_2, X_3, and X_4 to denote the four features. For simplicity, let's introduce a little bit more notation. We'll write X subscript j, or sometimes just X sub j for short, to represent the jth feature. Here, j will go from one to four, because we have four features. I'm going to use lowercase n to denote the total number of features, so in this example, n is equal to 4. As before, we'll use X superscript (i) to denote the ith training example. Here X^(i) is actually going to be a list of four numbers, or sometimes we'll call this a vector, that includes all the features of the ith training example.

As a concrete example, X superscript (2) will be a vector of the features for the second training example, so it will be equal to 1416, 3, 2, and 40. Technically, I'm writing these numbers in a row, so sometimes this is called a row vector rather than a column vector. But if you don't know what the difference is, don't worry about it; it's not that important for this purpose. To refer to a specific feature in the ith training example, I will write X superscript (i), subscript j. So for example, X superscript (2) subscript 3 will be the value of the third feature, that is, the number of floors in the second training example, and so that's going to be equal to 2. Sometimes, in order to emphasize that this X^(2) is not a number but is actually a list of numbers, that is, a vector, we'll draw an arrow on top of it just to visually show that it is a vector, and over here as well, but you don't have to draw this arrow in your notation. You can think of the arrow as an optional signifier. It's sometimes used just to emphasize that this is a vector and not a number.
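To make this indexing notation concrete, here is a minimal NumPy sketch. Only the second row (1416, 3, 2, 40) comes from the example above; the other rows and the variable name X_train are made up purely for illustration.

```python
import numpy as np

# Each row is one training example x^(i); each column is one feature x_j:
# size in square feet, number of bedrooms, number of floors, age in years.
# Only the second row comes from the text above; the other rows are made up.
X_train = np.array([
    [2000, 4, 2, 10],
    [1416, 3, 2, 40],
    [1000, 2, 1, 30],
])

n = X_train.shape[1]       # number of features, n = 4
x_2 = X_train[1]           # x^(2), the second training example (row index 1)
x_2_3 = X_train[1, 2]      # x^(2)_3, its third feature (number of floors) = 2

print(x_2)     # [1416    3    2   40]
print(x_2_3)   # 2
```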

Now that we have multiple features, let's take a look at what a model would look like. Previously, this is how we defined the model, where X was a single feature, so a single number. But now with multiple features, we're going to define it differently. Instead, the model will be f_w,b(X) = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b. Concretely, for housing price prediction, one possible model may be that we estimate the price of the house as 0.1 times X_1, the size of the house, plus 4 times X_2, the number of bedrooms, plus 10 times X_3, the number of floors, minus 2 times X_4, the age of the house in years, plus 80. Let's think a bit about how you might interpret these parameters. If the model is trying to predict the price of the house in thousands of dollars, you can think of this b = 80 as saying that the base price of a house starts off at maybe $80,000, assuming it has no size, no bedrooms, no floors, and no age. You can think of this 0.1 as saying that maybe for every additional square foot, the price will increase by 0.1 times $1,000, or by $100, because we're saying that for each square foot, the price increases by 0.1 times $1,000, which is $100. Maybe for each additional bedroom, the price increases by $4,000; for each additional floor, the price may increase by $10,000; and for each additional year of the house's age, the price may decrease by $2,000, because the parameter is negative 2.
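As a quick sanity check on that worked example, here is a small Python sketch. It simply spells out the formula above term by term, plugging in the parameters from the worked example and the feature values of the second training example from earlier.

```python
import numpy as np

# Parameters from the worked example; the price is predicted in units of $1,000s.
w = np.array([0.1, 4, 10, -2])    # per sq ft, per bedroom, per floor, per year of age
b = 80                            # base price of $80,000

x = np.array([1416, 3, 2, 40])    # size, bedrooms, floors, age of one house

# f_w,b(x) = w_1*x_1 + w_2*x_2 + w_3*x_3 + w_4*x_4 + b, written out term by term
price = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + w[3]*x[3] + b
print(price)   # 173.6, i.e. a predicted price of about $173,600
```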

In general, if you have n features, then the model will look like this: f_w,b(X) = w_1x_1 + w_2x_2 + ... + w_nx_n + b. What we're going to do next is introduce a little bit of notation to rewrite this expression in a simpler but equivalent way. Let's define W as a list of numbers that lists the parameters W_1, W_2, W_3, all the way through W_n. In mathematics, this is called a vector, and sometimes, to designate that this is a vector, which just means a list of numbers, I'm going to draw a little arrow on top. You don't always have to draw this arrow, and you can do so or not in your own notation, so you can think of this little arrow as just an optional signifier to remind us that this is a vector. If you've taken a linear algebra class before, you might recognize that this is a row vector as opposed to a column vector. But if you don't know what those terms mean, you don't need to worry about it.

Next, same as before, b is a single number and not a vector, and so this vector W together with this number b are the parameters of the model. Let me also write X as a list, or a vector, again a row vector, that lists all of the features X_1, X_2, X_3, up to X_n. This is again a vector, so I'm going to add a little arrow up on top to signify it. In the notation up on top, we can also add little arrows here and here to signify that that W and that X are actually these lists of numbers, that they're actually these vectors. With this notation, the model can now be rewritten more succinctly as f(X) = the vector W dot (and this dot refers to a dot product from linear algebra) the vector X, plus the number b. What is this dot product thing? Well, the dot product of two vectors, of two lists of numbers W and X, is computed by taking the corresponding pairs of numbers, multiplying W_1 and X_1, multiplying W_2 and X_2, multiplying W_3 and X_3, all the way up to multiplying W_n and X_n, and then summing up all of these products.

Writing that out, this means that the dot product is equal to W_1X_1 + W_2X_2 + W_3X_3 + ... + W_nX_n. Then finally we add back in the b on top. You'll notice that this gives us exactly the same expression as we had on top. The dot product notation lets you write the model in a more compact form with fewer characters. The name for this type of linear regression model with multiple input features is multiple linear regression. This is in contrast to univariate regression, which has just one feature. By the way, you might think this algorithm is called multivariate regression, but that term actually refers to something else that we won't be using here. I'm going to refer to this model as multiple linear regression.
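To show how this dot-product form maps onto code, here is the same prediction written with NumPy's dot product, reusing the parameters and features from the sketch above; np.dot computes exactly the sum of products W_1X_1 + ... + W_nX_n described here.

```python
import numpy as np

w = np.array([0.1, 4, 10, -2])    # parameters from the worked example
b = 80
x = np.array([1416, 3, 2, 40])    # features of one house

# f_w,b(x) = W . X + b: multiply corresponding entries w_j * x_j and sum them
price = np.dot(w, x) + b
print(price)   # 173.6, identical to the term-by-term expression
```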

That's it for linear regression with multiple features, which is also called multiple linear regression. In order to implement this, there's a really neat trick called vectorization, which will make it much simpler to implement this and many other learning algorithms. Let's go on to the next video to take a look at what vectorization is.

