Univariate Linear Regression (单变量线性回归)

Let us start with a motivating example of predicting housing prices. We will use a data set of houses of different sizes that were sold for a range of different prices, and plot that data set.

Given that data set, suppose a house's size is 1250 square feet. One thing you could do is fit a model, perhaps a straight line, to this data, and read off a predicted price for the house of around 220,000.

This is an example of a supervised learning problem, because we are given the "right answer" for each example in the data; it is also a regression problem, since we predict a continuous-valued output.

So what exactly is defined by the training set?

Here is how this supervised learning algorithm works: we take the training set (our training set of house prices) and feed it to the learning algorithm. The job of the learning algorithm is to output a function, which by convention is usually denoted by a lowercase h, where h stands for hypothesis. The job of the hypothesis is to take as input the size of a house and output the predicted price for that input.

Hypothesis: h_\theta(x) = \theta_0 + \theta_1 x, parameters: \theta_0, \theta_1.
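To make the hypothesis concrete, here is a minimal sketch in Python; the function name h and the parameter values in the example are made up for illustration, not taken from the course data:

```python
def h(x, theta0, theta1):
    """Univariate linear hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Made-up parameter values: predict the price (in thousands) of a 1250 sq ft house.
print(h(1250, 50.0, 0.1))  # 50 + 0.1 * 1250 = 175.0
```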

Now the question is how to choose \theta_0 and \theta_1.

The idea is to choose \theta_0, \theta_1 so that h_\theta(x) is close to y for our training examples (x, y).

Given the x's in the training set, we want to make reasonably accurate predictions for the y values. Let's formalize this: in linear regression, what we are going to do is solve a minimization problem, minimizing over \theta_0 and \theta_1.

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}

\min_{\theta_0,\, \theta_1} J(\theta_0, \theta_1)

This function J is called the cost function (also known as the squared error function).
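Here is a minimal sketch of the squared error cost in Python; the helper name compute_cost and the tiny toy training set are assumptions for illustration, not the housing data from the lecture:

```python
def compute_cost(x, y, theta0, theta1):
    """J(theta0, theta1) = (1 / 2m) * sum over i of (h_theta(x_i) - y_i)^2."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        prediction = theta0 + theta1 * xi   # h_theta(x_i)
        total += (prediction - yi) ** 2     # squared error for one training example
    return total / (2 * m)

# Toy training set (made up): y happens to equal x, so theta0 = 0, theta1 = 1 fits perfectly.
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]
print(compute_cost(x_train, y_train, 0.0, 1.0))  # 0.0   (perfect fit)
print(compute_cost(x_train, y_train, 0.0, 0.5))  # ~0.583 (worse fit)
```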

Example: to build intuition, let's work with a simplified hypothesis function h_\theta(x) = \theta_1 x (that is, \theta_0 = 0), so the cost function depends on a single parameter \theta_1.

What does the fit look like when \theta_1 = 0.5? And when \theta_1 = 0?

After computing J(\theta_1) for a range of values of \theta_1, the conclusion is that \theta_1 = 1 gives the lowest cost; that is indeed the best possible straight line for this data.

So for each value of \theta_1 we end up with a different value of J(\theta_1), and we can use this to trace out the plot of J against \theta_1 shown in the picture above. The optimization objective for our learning algorithm is to choose the value of \theta_1 that minimizes J(\theta_1); this is our objective function for linear regression.
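The "trace out the plot" idea can be sketched by sweeping \theta_1 over a range and recording J(\theta_1), reusing compute_cost and the toy data from the sketch above (again just an illustration, not the original plot):

```python
# Sweep theta1 with theta0 fixed at 0 and record the cost at each value.
thetas = [i / 10 for i in range(-5, 26)]   # theta1 from -0.5 to 2.5 in steps of 0.1
costs = [compute_cost(x_train, y_train, 0.0, t) for t in thetas]

# The value of theta1 with the smallest cost is the best straight line for this toy data.
best_theta1 = thetas[costs.index(min(costs))]
print(best_theta1)  # 1.0
```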


Gradient Descent (梯度下降算法)

  • Purpose: we have some function J(\theta_0, \theta_1)
  • Want: \min_{\theta_0, \theta_1} J(\theta_0, \theta_1)

Outline 

  • Start with some \theta_0 and \theta_1.
  • Keep changing \theta_0 and \theta_1 to reduce J(\theta_0, \theta_1),

until we hopefully end up at a minimum.

So here is the problem setup, and we want to come up with an algorithm for minimizing J as a function of \theta_0 and \theta_1.

Here is the idea behind gradient descent: we start off with some initial guesses for \theta_0 and \theta_1 (a common choice is to set both \theta_0 and \theta_1 to zero), and we keep changing \theta_0 and \theta_1 a little bit to try to reduce J, until we wind up at a minimum.

The definition of the gradient descent algorithm:

repeat until convergence {
    \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)    (for j = 0 and j = 1)
}

Correct: simultaneous update of \theta_0 and \theta_1 (compute both new values before assigning either one).

":="为C语言中的赋值符号.

The symbol \alpha (alpha) is called the learning rate. If \alpha is very large, that corresponds to a very aggressive gradient descent procedure that takes big steps downhill; if \alpha is small, the steps are small and descent is slower.
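Putting the update rule and the learning rate together, here is a minimal sketch of batch gradient descent for univariate linear regression; the function name, starting values, and the reuse of the toy x_train/y_train data from earlier are assumptions for illustration:

```python
def gradient_descent(x, y, alpha=0.1, num_iters=2000):
    """Batch gradient descent for h_theta(x) = theta0 + theta1 * x.

    For the squared error cost J, the partial derivatives are:
      dJ/dtheta0 = (1/m) * sum(h_theta(x_i) - y_i)
      dJ/dtheta1 = (1/m) * sum((h_theta(x_i) - y_i) * x_i)
    """
    m = len(x)
    theta0, theta1 = 0.0, 0.0                      # common starting point: both zero
    for _ in range(num_iters):
        errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # Simultaneous update: compute both new values before assigning either.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

theta0, theta1 = gradient_descent(x_train, y_train, alpha=0.1)
print(theta0, theta1)  # approaches (0.0, 1.0) for the toy y = x data
```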


In order to convey these intuitions, I want to use a slightly simpler example where we minimize a function of just one parameter: \theta_1 is a real number, so we can use 1-D plots.

In the top picture the curve has a positive slope at the current point, so the derivative is positive and the update decreases \theta_1; in the bottom picture the slope is negative, so the derivative is negative and the update increases \theta_1. In either case \theta_1 gradually moves closer to the minimum.


Now let us see what happens as the learning rate \alpha varies: if \alpha is too small, gradient descent takes tiny steps and is slow to converge; if \alpha is too large, it can overshoot the minimum and may fail to converge, or even diverge.


What if your parameter \theta_1 is already at a local minimum?

What do you think one step of gradient descent will do?

The result is that \theta_1 stops changing, because the slope (the derivative) at that point is equal to zero.

Gradient descent can converge to a local minimum even with the learning rate \alpha fixed.

As we approach a local minimum, gradient descent will automatically take smaller steps (because the derivative shrinks), so there is no need to decrease \alpha over time.

That's all for this note. Thank you for reading.
