Notes on Machine Learning By Andrew Ng (4)

Click here to see the previous notes.

Regularization

The problem of overfitting

If we have too many features, the learned hypothesis may fit the training set very well, yet fail to generalize to new examples.

Addressing overfitting

Options:

  • Reduce # of features
    • Manually select which features to keep.
    • Model selection algorithm (later in the course).
    • Downside: this may abandon some useful information.
  • Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
    • Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Cost Function

Regularization

Small values for the parameters $\theta_i, 1 \leq i \leq n$ give us:

  • “Simpler” hypothesis
  • Less prone to overfitting

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda \sum_{i=1}^{n} \theta_i^2\right]$$

Regularization term: $\lambda \sum_{i=1}^{n} \theta_i^2$

Regularization parameter: $\lambda$, which controls a trade-off between two different goals.

  • The first goal, captured by the first term of the objective, $\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, is to fit the training data well.
  • The second goal is to keep the parameters small, so as to avoid overfitting.

If $\lambda$ is set too large, then $\theta_i \approx 0$ for $1 \leq i \leq n$, leaving roughly $h_\theta(x) = \theta_0$, which leads to underfitting.
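As a concrete illustration, here is a minimal NumPy sketch of this regularized cost (the function and variable names are my own, not from the course). It assumes $X$ is an $m \times (n+1)$ design matrix whose first column is all ones, so that $\theta_0$ is excluded from the penalty, matching the sum starting at $i = 1$:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) -- a sketch.

    X     : (m, n+1) design matrix, first column all ones
    theta : (n+1,) parameter vector; theta[0] is not penalized
    lam   : regularization parameter lambda
    """
    m = len(y)
    errors = X @ theta - y                   # h_theta(x^(i)) - y^(i) for all i
    fit_term = np.sum(errors ** 2)           # sum of squared errors
    reg_term = lam * np.sum(theta[1:] ** 2)  # lambda * sum of theta_i^2, i >= 1
    return (fit_term + reg_term) / (2 * m)
```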

Regularized linear regression

Gradient descent

Modify the update rule, so that $\theta_0$ is not regularized:

Repeat{

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_0^{(i)}$

$\theta_j := \theta_j - \alpha \left[\frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, 2, \cdots, n)$

}

$\rightarrow \theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

Note that $1 - \alpha\frac{\lambda}{m} < 1$, so $\theta_j$ shrinks a little on every iteration before the usual gradient step is applied.
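Here is a minimal NumPy sketch of one such update, with my own names (not from the course); the line touching `theta[1:]` adds the $\frac{\lambda}{m}\theta_j$ term, which is exactly the shrinkage factor above written as part of the gradient:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update -- a sketch.

    X: (m, n+1) design matrix with a leading column of ones,
    so theta[0] is the intercept and receives no regularization.
    """
    m = len(y)
    errors = X @ theta - y             # h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m          # (1/m) * sum_i errors_i * x_j^(i), all j
    grad[1:] += (lam / m) * theta[1:]  # add (lambda/m) * theta_j for j >= 1
    return theta - alpha * grad        # simultaneous update of all theta_j
```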

Normal equation

Modify it by adding a regularization matrix:

$$\theta = \left(X^T X + \lambda \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}\right)^{-1} X^T y$$

The added matrix is $(n+1) \times (n+1)$, with a $0$ in the top-left corner so that $\theta_0$ is not penalized.

Non-invertibility

If $m \leq n$, where $m$ is the # of examples and $n$ is the # of features, then $X^T X$ is a singular (non-invertible) matrix.

But once you add the $\lambda$ term (with $\lambda > 0$), the resulting matrix becomes invertible.
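A minimal NumPy sketch of the regularized normal equation (names are my own). It builds the $(n+1)\times(n+1)$ regularization matrix with a zero in the top-left corner and uses `np.linalg.solve` rather than an explicit inverse:

```python
import numpy as np

def normal_equation(X, y, lam):
    """Closed-form regularized linear regression -- a sketch.

    X: (m, n+1) design matrix with a leading column of ones.
    """
    L = np.eye(X.shape[1])   # (n+1) x (n+1) identity
    L[0, 0] = 0.0            # zero in the corner: theta_0 is not penalized
    # For lam > 0, X^T X + lam * L is invertible (as noted above),
    # even when m <= n.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```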

Regularized logistic regression

Just like linear regression, we add a term to the cost function that penalizes the parameters $\theta_j$.

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2$$
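A minimal NumPy sketch of this cost together with its gradient (my own names; it assumes the same design-matrix convention as the sketches above). Note the $\frac{\lambda}{2m}$ factor in the cost, whose derivative gives the $\frac{\lambda}{m}\theta_j$ term in the gradient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_grad(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient -- a sketch."""
    m = len(y)
    h = sigmoid(X @ theta)                                    # h_theta(x^(i))
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))  # cross-entropy
    cost += (lam / (2 * m)) * np.sum(theta[1:] ** 2)          # skip theta_0
    grad = (X.T @ (h - y)) / m                                # same form as before
    grad[1:] += (lam / m) * theta[1:]                         # regularize j >= 1
    return cost, grad
```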

Annotation

It has to be clear that although logistic regression and linear regression share the same update equation in gradient descent, they are two very different algorithms: each has its own hypothesis $h_\theta$. They only look alike in gradient descent because their cost functions, though different, are each chosen so that the gradient takes the same form, which lets us unify the update equation.
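Concretely, both algorithms use the gradient

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$

but with $h_\theta(x) = \theta^T x$ for linear regression and $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ for logistic regression, so the two updates are not actually the same function of $\theta$.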

Click here to see the next note.
