Chapter 2. Regression -- 01. Introduction to Regression (Translation)

Latest recommended article published 2024-01-07 02:48:27

Stella__Lee


So let's start our unit on regression. First I'll do a recap and talk about simple linear regression, which uses just one feature, and multiple linear regression, which uses many features. Then we'll cover how to evaluate regression models, and we'll do a case study. After that we'll talk about outliers and the concept of leverage.

Let's start with the recap. Regression is for predicting real-valued outcomes: things like "How many customers will arrive at our website next week?", "How many TVs will we sell next year?", and "Can we predict someone's income from their click-through information?" Here's our training data. The features are here, the labels are here (they're real-valued; in this case, income), and the predictions, f(x), are over here on the right. Looking at just the label and prediction columns, we have to figure out how to measure how close the truth, y, is to what we predicted, f(x).

Suppose I proposed y - f(x). As we know, that's not a great idea; after all, why not f(x) - y instead? Either choice would be bad, because each one looks at errors in only one direction, and I want to penalize errors in either direction. I could use the absolute value, |y - f(x)|, which counts deviations in either direction. But that's not actually what we're going to use. We're going to use the squared error, (y - f(x))^2. It's easier to work with computationally and analytically, because it's differentiable everywhere, whereas the absolute value has a kink at zero.
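As a small illustration, assuming three made-up (y, f(x)) pairs and hypothetical helper names, the sketch below shows why the signed difference is a poor error measure. A miss of +$10,000 and a miss of -$10,000 cancel when summed, whereas the absolute and squared errors count misses in both directions.

```python
# Hypothetical toy data: true labels y and model predictions f(x), in dollars.
y    = [50_000, 80_000, 120_000]
pred = [60_000, 70_000, 120_000]

def signed_errors(y, pred):
    """y - f(x) for each example; the sign depends on the direction of the miss."""
    return [yi - fi for yi, fi in zip(y, pred)]

def absolute_errors(y, pred):
    """|y - f(x)|: penalizes misses in either direction."""
    return [abs(e) for e in signed_errors(y, pred)]

def squared_errors(y, pred):
    """(y - f(x))**2: also direction-free, and differentiable everywhere."""
    return [e ** 2 for e in signed_errors(y, pred)]

print(sum(signed_errors(y, pred)))    # 0: the two misses cancel out
print(sum(absolute_errors(y, pred)))  # 20000
print(sum(squared_errors(y, pred)))   # 200000000
```

The squared error also weights the two $10,000 misses far more heavily than many small ones would be weighted, which is one reason outliers matter in regression.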
But you should think of the squared error as capturing errors in both directions, just like the absolute value I had on the previous slide. Same deal: it penalizes how far y is from f(x) in either direction. Here's a picture of it. In this case we'd want to use f(x) - y, and in this case we'd want to use y - f(x). We could use the absolute value to capture both cases, but instead we're going to square the difference to get the squared error. If we add up the squared errors over all the examples, we get what's called the sum of squares error. We'll get back to it in a minute, but I want you to remember that it's a fundamental quantity in regression.

Now I want to talk about what the model, f, might look like. We're going to choose f so that it's a good model, meaning that it minimizes the sum of squares error. So let's talk about simple linear regression.
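For reference, the sum of squares error and the criterion for choosing f can be written out as follows. The index notation (n training examples (x_i, y_i)) is our own and may differ from the slides.

```latex
\mathrm{SSE}(f) = \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2,
\qquad
f^{\star} = \arg\min_{f} \, \mathrm{SSE}(f)
```

Each term is zero only when the prediction matches the label exactly, and it grows quadratically with the size of the miss in either direction.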
In simple linear regression, we have only one feature. Maybe we're predicting income from a single feature, say, the number of Business Week clicks the person makes. So now we have to figure out what our function f is going to look like.

Here I've put up a very simple function f. It gives everyone a baseline of $100,000 just for existing, and it estimates that for each click they make on the Business Week website, they're $5,000 richer. This is kind of a silly model, since it predicts that anyone who spends all of their time on Business Week is a gazillionaire, but hey, it's just an example. For our function that estimates y from x, we're going to choose a model of this form: a baseline (called b0) plus a multiplier (called b1) times the number of Business Week clicks, x1. So there's the formula again: f(x) = b0 + b1·x1.

Before we do any fitting, all we have is data. We don't know the $100,000 or the $5,000; I just made those up. We have to estimate them from the data. Remember, we want the sum of squares error to be small, so we choose b0 and b1 to minimize the total squared error on the training set. That is the procedure of simple linear regression, also called least squares regression.

So let's pretend I did this. As it turned out, the model I had before wasn't so good: when I fit the model to the data, I got this model instead. The new model fits the data in my training set pretty well, but how well does it perform out of sample? I didn't tell you, but I actually held out part of the data for evaluation, and there it is. Let's look at the errors on the held-out data: they're not too bad, so it looks like we did a pretty good job. So that is simple linear regression: least squares regression for a single feature. You don't need to solve the minimization problem yourself to find b0 and b1; the machine learning algorithm does it for you under the hood.
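The software solves this minimization under the hood. For a single feature, though, the least squares solution has a well-known closed form: b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2) and b0 = y_bar - b1 * x_bar, where x_bar and y_bar are the training means. Below is a minimal sketch of that fit plus an out-of-sample check. It assumes a tiny synthetic clicks-versus-income dataset invented for illustration (not the lecture's data), and the function names `fit_simple_linear` and `sse` are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least squares fit of f(x) = b0 + b1*x for a single feature.

    Uses the closed-form minimizer of the sum of squares error.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # requires at least two distinct x values
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(b0, b1, xs, ys):
    """Sum of squares error of the line b0 + b1*x on the data (xs, ys)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Synthetic (hypothetical) data: weekly clicks -> income in dollars.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [41_000, 43_000, 48_000, 52_000, 55_000, 61_000]
test_x = [3, 7]               # held out: never used for fitting
test_y = [53_000, 67_000]

b0, b1 = fit_simple_linear(train_x, train_y)
print(f"fitted: f(x) = {b0:.0f} + {b1:.0f} * x")  # fitted: f(x) = 40000 + 4000 * x
print("train SSE:", sse(b0, b1, train_x, train_y))  # train SSE: 4000000.0
print("test SSE:", sse(b0, b1, test_x, test_y))     # test SSE: 2000000.0

# The made-up model from the lecture: $100,000 baseline, $5,000 per click.
print("made-up model test SSE:", sse(100_000, 5_000, test_x, test_y))  # 8468000000
```

On this toy data, the fitted line's held-out error is orders of magnitude smaller than that of the hand-picked $100,000 + $5,000-per-click model. In practice, a library routine such as Python 3.10+'s `statistics.linear_regression` or scikit-learn's `LinearRegression` performs the same single-feature least squares fit.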
