[Note] Machine Learning: Chapter 2

2-1 Model representation

Linear regression with one variable

2-2 Cost function

The cost function for linear regression is also called the squared error function.

  • Hypothesis:

$h_\theta(x) = \theta_0 + \theta_1 x$

  • Parameters: $\theta_0, \theta_1$

  • Cost function:

$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • Goal:

$\underset{\theta_0,\, \theta_1}{\text{minimize}} \; J(\theta_0, \theta_1)$
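As a concrete illustration (my own sketch, not part of the course material), the snippet below computes the hypothesis and the squared error cost on a tiny made-up dataset; all names and numbers are hypothetical.

```python
# Minimal sketch of the hypothesis and squared error cost for linear
# regression with one variable (illustration only; the data is made up).

def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x"""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1 / 2m) * sum_i (h_theta(x_i) - y_i)^2"""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Toy dataset generated from y = 2x: the cost is 0 at (theta0, theta1) = (0, 2).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(cost(0.0, 2.0, xs, ys))  # 0.0
print(cost(0.0, 0.0, xs, ys))  # (4 + 16 + 36) / 6 ≈ 9.33
```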

2-3 Gradient descent

Have some function: $J(\theta_0, \theta_1)$

Want: $\min J(\theta_0, \theta_1)$

Outlines:

  • Start with some $\theta_0, \theta_1$
  • Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum (perhaps only a local minimum)

Gradient descent algorithm

repeat until convergence
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \quad (\text{for } j = 0 \text{ and } j = 1)$

  • “:=” denotes assignment; it is different from the truth assertion “=”.
  • “α” is the learning rate: it controls how big a step we take downhill on each gradient descent update. If α is very large, the procedure takes very aggressive steps.

Correct: Simultaneous update

  • $\text{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$

  • $\text{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$

  • $\theta_0 := \text{temp0}$

  • $\theta_1 := \text{temp1}$

Incorrect:

  • $\text{temp0} := \theta_0 - \alpha \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1)$
  • $\theta_0 := \text{temp0}$
  • $\text{temp1} := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1)$
  • $\theta_1 := \text{temp1}$
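The ordering matters because, in the incorrect version, $\theta_0$ is overwritten before temp1 is computed, so the derivative for $\theta_1$ is evaluated at the new $\theta_0$ rather than the old one. A minimal Python sketch of both orderings (my own illustration; `grad0` and `grad1` are placeholder callables for the two partial derivatives):

```python
# Sketch of one gradient descent step (illustration only). `grad0` and `grad1`
# are callables returning dJ/dtheta0 and dJ/dtheta1 at the given parameters.

def step_simultaneous(theta0, theta1, grad0, grad1, alpha):
    # Correct: both temporaries use the OLD theta0 and theta1 ...
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    # ... and only then are the parameters overwritten.
    return temp0, temp1

def step_sequential(theta0, theta1, grad0, grad1, alpha):
    # Incorrect: theta0 is updated first, so grad1 sees the NEW theta0.
    theta0 = theta0 - alpha * grad0(theta0, theta1)
    theta1 = theta1 - alpha * grad1(theta0, theta1)
    return theta0, theta1
```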

  • If α is too small, gradient descent can be slow.
  • If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

Gradient descent can converge to a local minimum, even with the learning rate α held fixed.

As we approach a local minimum, the derivative term becomes smaller, so gradient descent automatically takes smaller steps. There is no need to decrease α over time.
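To see these effects on a toy problem (my own example, not from the notes), consider the one-parameter cost $J(\theta) = \theta^2$, whose derivative is $2\theta$:

```python
# Effect of the learning rate on J(theta) = theta^2 (derivative: 2 * theta).

def run(alpha, theta=1.0, steps=10):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # gradient descent update
    return theta

print(run(alpha=0.01))  # ~0.82  -> too small: slow progress toward 0
print(run(alpha=0.4))   # ~1e-7  -> converges quickly
print(run(alpha=1.5))   # 1024   -> too large: overshoots and diverges
```

The same run also shows why the steps shrink near the minimum: the derivative $2\theta$ itself shrinks as $\theta$ approaches 0, so each update moves less even though α stays fixed.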

Gradient descent for linear regression

Gradient descent algorithm

The partial derivatives of the cost function are:

$\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$

Plugging them into the update rule: repeat until convergence (updating $\theta_0$ and $\theta_1$ simultaneously)

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}$

When gradient descent is applied to the cost function of linear regression, $J(\theta_0, \theta_1)$ is a convex, bowl-shaped function with a single global optimum and no other local optima, so gradient descent always converges to the global optimum (assuming the learning rate α is not too large).
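Putting the pieces together, here is a minimal batch gradient descent loop for linear regression with one variable (my own sketch; the dataset, α, and iteration count are arbitrary choices for illustration):

```python
# Batch gradient descent for linear regression with one variable
# (illustration only). Every step sums over all m training examples.

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x; the estimates should approach (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(gradient_descent(xs, ys))
```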

“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all of the training examples. When computing the derivatives in gradient descent, we compute sums over all m training examples.
