Andrew Ng · Machine Learning || Chap 2 Linear regression with one variable (notes)

2 Linear regression with one variable

2-1 Model representation

Training set

m = Number of training examples

x’s = “input” variable/features

y’s = “output” variable/“target” variable

(x,y) = one training example

Hypothesis (hypothesis function)
$h_\theta(x) = \theta_0 + \theta_1 x$
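A minimal sketch of this hypothesis in code (Python/NumPy is a choice made in these notes, not part of the lecture):

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Univariate linear regression hypothesis: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)
```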

2-2 Cost Function

Goal: minimize
$\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$

Dividing by m averages the squared error over the training examples; dividing by 2 is a calculus convenience that cancels the factor of 2 produced when taking the partial derivatives.

Cost function (squared error function)
$J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
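A minimal NumPy sketch of this cost function (illustrative, not from the lecture):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost: J(theta0, theta1) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(y)
    errors = (theta0 + theta1 * x) - y   # h_theta(x^(i)) - y^(i) for each example
    return np.sum(errors ** 2) / (2 * m)
```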

2-3 Cost Function Intuition I

Simplified case: set $\theta_0 = 0$, so the hypothesis is $h_\theta(x) = \theta_1 x$ and the cost becomes a function of $\theta_1$ alone, $J(\theta_1)$.
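A quick worked example (an illustrative dataset, not from the lecture): with training set $(1,1), (2,2), (3,3)$ and $h_\theta(x) = \theta_1 x$,

$J(1) = \frac{1}{2 \cdot 3}\left[(1-1)^2 + (2-2)^2 + (3-3)^2\right] = 0$

$J(0.5) = \frac{1}{6}\left[(0.5-1)^2 + (1-2)^2 + (1.5-3)^2\right] = \frac{3.5}{6} \approx 0.58$

so $\theta_1 = 1$ fits this data better than $\theta_1 = 0.5$.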

2-4 Cost Function Intuition II

Contour plot/figure

2-5 Gradient descent

Outline

  • Start with some $\theta_0, \theta_1$

  • Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum

Gradient descent algorithm

repeat until convergence {

$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$  (for $j=0$ and $j=1$)

}

$\alpha$: learning rate

Correct: simultaneous update

temp0 := $\theta_0 - \alpha\frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1)$

temp1 := $\theta_1 - \alpha\frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)$

$\theta_0$ := temp0

$\theta_1$ := temp1
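A minimal Python sketch of one such simultaneous update step (the partial-derivative functions are passed in as parameters; all names are illustrative):

```python
def simultaneous_update(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    """One gradient-descent step with a simultaneous update: both temporaries
    are evaluated at the old (theta0, theta1) before either parameter is overwritten."""
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1
```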

2-6 Gradient descent Intuition

Learning rate $\alpha$

If $\alpha$ is too small, gradient descent can be slow.

If $\alpha$ is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

With the learning rate $\alpha$ held fixed: as we approach a local minimum, gradient descent automatically takes smaller steps (the derivative term shrinks), so there is no need to decrease $\alpha$ over time.
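For instance (an illustrative cost, not from the lecture), with $J(\theta) = \theta^2$ the derivative is $2\theta$, so the update $\theta := \theta - 2\alpha\theta = (1 - 2\alpha)\theta$ takes steps proportional to $\theta$ itself: as $\theta$ approaches the minimum at $0$, the steps shrink automatically even though $\alpha$ stays fixed.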

2-7 Gradient descent for linear regression

Gradient descent algorithm

repeat until convergence {

$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)$

$\theta_1 := \theta_1 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\cdot x^{(i)}$

}

Update $\theta_0$ and $\theta_1$ simultaneously.

Convex function

"Batch" Gradient Descent

"Batch": each step of gradient descent uses all the training examples.
