m = number of training examples
h (hypothesis) = the function that maps inputs x to predicted outputs
Linear regression with one variable (univariate linear regression).
Idea: choose θ0, θ1 so that hθ(x) is close to y for our training examples (x, y).
Cost Function:
squared error function: J(θ0, θ1) = (1/2m) · Σ (hθ(x^(i)) − y^(i))², summed over i = 1..m
minimize J(θ0, θ1): the overall objective for linear regression
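As a concrete sketch (Python with numpy; the function name and toy data are made up for illustration), the squared-error cost can be computed like this:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h_theta(x) - y)^2)."""
    m = len(y)                          # number of training examples
    predictions = theta0 + theta1 * x   # h_theta(x) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# toy data, just for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
print(compute_cost(0.0, 2.0, x, y))     # small J: theta1 = 2 already fits this data well
```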
Gradient Descent
outline:
start with some θ0, θ1 (e.g. initialize them to 0)
(in general the result depends on where you start on the graph; however, for linear regression J(θ0, θ1) is a convex function, so gradient descent always reaches the global minimum)
repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)    (for j = 0 and j = 1)
}
∂/∂θj J(θ0, θ1) is the derivative term
α is the learning rate
*simultaneous update:
temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
θ0 := temp0
θ1 := temp1
If you do not update them simultaneously, it may still happen to work, but it is not what we call the gradient descent algorithm.
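A minimal sketch of the loop with the simultaneous update written out (Python with numpy; names and toy data are my own):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=2000):
    """Batch gradient descent for univariate linear regression."""
    m = len(y)
    theta0, theta1 = 0.0, 0.0                           # initialize both parameters to 0
    for _ in range(num_iters):
        error = theta0 + theta1 * x - y                 # h_theta(x) - y for every example
        temp0 = theta0 - alpha * np.sum(error) / m      # compute BOTH temps first ...
        temp1 = theta1 - alpha * np.sum(error * x) / m
        theta0, theta1 = temp0, temp1                   # ... then assign: simultaneous update
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
print(gradient_descent(x, y))                           # roughly theta0 ≈ -0.05, theta1 ≈ 2.05
```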
choosing the learning rate α:
if α is too small, gradient descent can be slow.
if α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
GD can converge to a local minimum even with the learning rate α fixed, because as we approach a local minimum the derivative term gets smaller, so GD automatically takes smaller steps.
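One way to see both failure modes, using the same toy data as above (the specific α values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
m = len(y)

# run a fixed number of steps with different learning rates and compare the final cost
for alpha in (0.001, 0.01, 0.1, 1.0):
    theta0, theta1 = 0.0, 0.0
    for _ in range(100):
        error = theta0 + theta1 * x - y
        theta0, theta1 = (theta0 - alpha * error.sum() / m,
                          theta1 - alpha * (error * x).sum() / m)
    J = ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)
    print(f"alpha={alpha}: J={J:.4g}")   # tiny alpha: still far from minimum; alpha=1.0: diverges here
```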
Linear Regression with Multiple Features
x_j^(i) is the value of feature j in the i-th training example
hθ(x) = θ0·x0 + θ1·x1 + ... + θn·xn = θ^T x
(θ and x both have n + 1 elements; define x0 = 1)
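With x0 = 1 prepended, the hypothesis is just a matrix-vector product; a sketch in numpy (the example numbers are made up):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x for every row of X; column 0 of X is the x0 = 1 term."""
    return X @ theta

# n = 2 features plus x0 = 1, so theta has n + 1 = 3 elements
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1416.0, 2.0]])
theta = np.array([80.0, 0.1, 50.0])
print(hypothesis(theta, X))   # one prediction per training example
```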
feature scaling
if different features take on similar ranges of values, GD will converge more quickly.
It speeds up gradient descent by making it require fewer iterations to get to a good solution.
a range of roughly −3 to 3, or −1/3 to 1/3, is fine;
otherwise (e.g. a feature x1 that ranges up to 10000) rescale it:
get every feature into approximately a −1 ≤ xi ≤ 1 range (divide by the range, or by the standard deviation)
mean normalization
makes features have approximately zero mean
to implement both measures above:
x = (value - average_value)/(max_value - min_value)
feature scaling doesn’t have to be too exact
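A sketch of both measures applied together (dividing by the range here; dividing by the standard deviation also works; apply it to the raw features, not to the x0 = 1 column):

```python
import numpy as np

def mean_normalize(X):
    """Scale each feature (column) to roughly [-1, 1] with approximately zero mean."""
    mu = X.mean(axis=0)                            # average value per feature
    feature_range = X.max(axis=0) - X.min(axis=0)  # max_value - min_value per feature
    return (X - mu) / feature_range                # x := (value - average) / (max - min)

X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [ 852.0, 1.0]])
print(mean_normalize(X))                           # every column is now roughly in [-1, 1]
```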
judging convergence
There is an automatic convergence test (e.g. declare convergence when J decreases by less than some small threshold ε in one iteration), but the threshold ε is not easy to choose, so it is better to plot J against the number of iterations.
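One practical way to do this: record J after every iteration, then plot it (or apply a threshold test, keeping in mind that the threshold is hard to choose). A sketch, reusing the toy data above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.0, 8.2])
m, alpha = len(y), 0.1
theta0, theta1 = 0.0, 0.0
history = []                                       # J after every iteration, for plotting

for it in range(2000):
    error = theta0 + theta1 * x - y
    theta0, theta1 = (theta0 - alpha * error.sum() / m,
                      theta1 - alpha * (error * x).sum() / m)
    J = ((theta0 + theta1 * x - y) ** 2).sum() / (2 * m)
    history.append(J)
    # example automatic test: stop when J decreases by less than 1e-6 in one iteration
    if it > 0 and history[-2] - history[-1] < 1e-6:
        break

print(f"stopped after {it + 1} iterations, J = {history[-1]:.4f}")
# plotting `history` against the iteration number shows whether J is still decreasing
```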