Andrew Ng Machine Learning Notes II (Machine learning algorithms -- Linear regression)

Linear regression

For example:

We're going to use a data set of housing prices from the city of Portland, Oregon, and plot the sizes of a number of houses against the prices they sold for. Now suppose you have a friend who is trying to sell a house. If the friend's house is 1250 square feet, you want to tell them how much they might be able to sell it for. This is a regression problem.

More formally, in supervised learning we have a data set, and this data set is called a training set. So for the housing prices example, we have a training set of different housing prices, and our job is to learn from this data how to predict house prices.

Let's define some notation that we'll be using throughout this course. We're going to define quite a lot of symbols. It's OK if you don't remember all of them right now, but as the course progresses, it will be useful to have a convenient notation.

1. Notation:

m = number of training examples

x = input variable / feature

y = output variable / target variable

(x, y) = one training example

$(x^{(i)}, y^{(i)})$ = the $i^{th}$ training example

For example:

$x^{(1)} = 2104$, $x^{(2)} = 1416$, $y^{(1)} = 460$
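As a quick illustration of the notation in plain Python (the first two rows match the example values above; the remaining pairs are made-up placeholders, not a real data set):

```python
# Training set: house sizes (square feet) and sold prices (in $1000s).
# The first two rows match the notation example; the rest are illustrative.
x = [2104, 1416, 1534, 852]   # input variable / feature
y = [460, 232, 315, 178]      # output variable / target variable

m = len(x)  # m = number of training examples -> 4

# The course notation is 1-based, while Python lists are 0-based,
# so x^(1) is x[0] and y^(1) is y[0]:
print(x[0], y[0])   # 2104 460, i.e. the training example (x^(1), y^(1))
print(x[1])         # 1416, i.e. x^(2)
```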

2. How this supervised learning algorithm works:

[Figure: a training set is fed to the learning algorithm, which outputs a hypothesis h mapping house size x to an estimated price y]

We saw that with a training set like our training set of housing prices, we feed it to our learning algorithm. The job of the learning algorithm is then to output a function, which by convention is usually denoted lowercase h, where h stands for hypothesis. The hypothesis is a function that takes as input the size of a house, like maybe the size of the new house your friend is trying to sell; it takes in the value of x and tries to output the estimated value of y for the corresponding house. So h is a function that maps from x's to y's. When designing a learning algorithm, the next thing we need to decide is how to represent this hypothesis h.

We are going to represent h as follows:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
And why a linear function? Sometimes we'll want to fit more complicated, perhaps non-linear functions as well. But since the linear case is the simple building block, we will start with this example of fitting a linear function, and we will build on it to eventually get more complex models and more complex learning algorithms. Let's also give this particular model a name: this model is called linear regression. This example is actually linear regression with one variable, the variable being x; that is, we're predicting all the prices as a function of the single variable x. Another name for this model is univariate linear regression.
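As a minimal sketch, the hypothesis is just a line in code (the function name `h` and the sample parameter values below are assumptions for illustration, not values from the lecture):

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Predict the price (in $1000s) of the friend's 1250-square-foot house
# for one arbitrary, illustrative choice of parameters:
print(h(0.0, 0.2, 1250))  # 250.0 -> an estimate of $250k
```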

3. How we go about implementing this model:

In this section we'll define something called the cost function. This will let us figure out how to fit the best possible straight line to our data. In linear regression we have a training set like the one shown here. Remember, in our notation m was the number of training examples, so maybe m = 47. And the form of the hypothesis, which we use to make predictions, is this linear function. To introduce a little more terminology: $\theta_0$ and $\theta_1$, these $\theta_i$'s, are what we call the parameters of the model. What we're going to do is talk about how to go about choosing the values of these two parameters, $\theta_0$ and $\theta_1$. With different choices of the parameters $\theta_0$ and $\theta_1$ we get different hypotheses, different hypothesis functions.

[Figure: the hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ with parameters $\theta_0$, $\theta_1$]

Different hypothesis functions ($h(x)$ is shorthand for $h_\theta(x)$):

[Figure: plots of different hypothesis functions for different choices of $\theta_0$ and $\theta_1$]
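For example, here is a small sketch of how three different parameter choices (values assumed for illustration, echoing the kind of examples drawn in the lecture) give three different lines:

```python
# Each (theta0, theta1) pair defines a different hypothesis function:
#   (1.5, 0.0) -> h(x) = 1.5           (a horizontal line)
#   (0.0, 0.5) -> h(x) = 0.5 * x       (a line through the origin)
#   (1.0, 0.5) -> h(x) = 1 + 0.5 * x   (a shifted line)
for theta0, theta1 in [(1.5, 0.0), (0.0, 0.5), (1.0, 0.5)]:
    print(f"theta0={theta0}, theta1={theta1}: h(2) = {theta0 + theta1 * 2}")
```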

In linear regression we have a training set, like maybe the one plotted here. What we want to do is come up with values for the parameters $\theta_0$ and $\theta_1$, so that the straight line we get out of this somehow fits the data well, like maybe that line over here. So how do we come up with values $\theta_0$, $\theta_1$ that correspond to a good fit to the data? The idea is that we're going to choose our parameters $\theta_0$, $\theta_1$ so that $h(x)$, meaning the value we predict on input $x$, is at least close to the value $y$ for the examples in our training set. In our training set we're given a number of examples where we know the size $x$ of the house and the actual price it sold for. So let's try to choose values for the parameters so that, at least in the training set, given the $x$'s in the training set, we make reasonably accurate predictions for the $y$ values. Let's formalize this. In linear regression, what we're going to do is solve a minimization problem: I'm going to write minimize over $\theta_0$, $\theta_1$, and I want $(h(x)-y)^2$ to be small.


So what I really want is to sum over my training set: the sum from $i = 1$ to $m$ of the squared difference $(h(x^{(i)})-y^{(i)})^2$ between the prediction of my hypothesis when it is given as input the size of house number $i$, and the actual price that house number $i$ sold for. I want to minimize, over my training set, this sum of squared errors between the predicted price of each house and the price it actually sold for. The cost function is:
$$J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h(x^{(i)})-y^{(i)}\right)^2$$

$$\text{minimize}\quad J(\theta_0,\theta_1)$$

$$\text{where}\quad h(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$$
This means we should find the values of $\theta_0$ and $\theta_1$ that cause this expression to be minimized. This cost function is also called the squared error function, or sometimes the squared error cost function. Why do we take the squares of the errors? It turns out that the squared error cost function is a reasonable choice and will work well for most problems. There are other cost functions that would work pretty well, but the squared error cost function is probably the most commonly used one for regression problems.

(Sum over all training examples: for $i = 1$ to $m$, take the predicted value for house $i$ minus the actual price of house $i$, square the difference, and add up the results. The cost function is also called the squared error function, and sometimes the squared error cost function.)
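A minimal sketch of the cost function and of what "minimize $J(\theta_0,\theta_1)$" asks for. The brute-force grid search below is my own illustration for intuition only; the training-set values are the same illustrative placeholders as above, and the course actually introduces gradient descent for this minimization later:

```python
def J(theta0, theta1, x, y):
    """Squared error cost: J = (1/2m) * sum_i (h(x^(i)) - y^(i))^2."""
    m = len(x)
    return sum((theta0 + theta1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

# Tiny illustrative training set (sizes in sq ft, prices in $1000s):
x = [2104, 1416, 1534, 852]
y = [460, 232, 315, 178]

# "minimize J(theta0, theta1)" by brute force over a coarse parameter grid:
candidates = [(t0, t1 / 100) for t0 in range(-100, 101, 10) for t1 in range(0, 51)]
theta0, theta1 = min(candidates, key=lambda t: J(t[0], t[1], x, y))
print("best grid point:", theta0, theta1, "with cost", J(theta0, theta1, x, y))
```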
