Andrew Ng Machine Learning Notes (5): Regularization

Regularization

The problem of overfitting

  • Underfitting –> high bias
  • Overfitting –> high variance
  • Overfitting: if we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples (e.g., predicting prices on new examples). The model generalizes poorly.

Addressing overfitting. Options:
    1) Reduce the number of features
    – Manually select which features to keep
    – Model selection algorithm
    2) Regularization
    – Keep all the features, but reduce the magnitude/values of the parameters.
    – Works well when we have a lot of features, each of which contributes a bit to predicting y.

Cost function (regularized cost function)

Intuition: add a penalty so that two of the parameter values cannot become large; with a large penalty in the cost, those parameters are driven close to zero and the hypothesis becomes simpler.
Adding a penalty term on the parameters $\theta_j$: in regularized linear regression, we choose $\theta$ to minimize the regularized cost function
$$J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$
Goal: $\underset{\theta}{\min}\,J(\theta)$
$\lambda$: regularization parameter
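
As a quick illustration, here is a minimal NumPy sketch of this regularized cost; the function and variable names (`regularized_linear_cost`, `lam`) are my own, not from the course.

```python
import numpy as np

def regularized_linear_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).

    X     : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y     : (m,) target values
    theta : (n+1,) parameter vector
    lam   : regularization parameter lambda
    """
    m = len(y)
    errors = X @ theta - y                    # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)    # theta_0 is not regularized
    return (np.sum(errors ** 2) + penalty) / (2 * m)
```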

  • What happens if λ is very large? All the parameters θ_1, …, θ_n are penalized heavily and end up close to zero, so the hypothesis degenerates to roughly h_θ(x) = θ_0 (a flat line) and underfits the data.


Regularized linear regression

Gradient descent
Gradient descent algorithm:
Repeat {
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$
$$\theta_j:=\theta_j-\alpha\frac{1}{m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right]\qquad(j=1,2,\dots,n)$$
}
Equivalently:
$$\theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$
$$\theta_j:=\theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
Note that $\theta_0$ is not regularized, so its update is unchanged; for $j\geq 1$, the factor $\left(1-\alpha\frac{\lambda}{m}\right)$ shrinks $\theta_j$ slightly toward zero on every iteration before the usual gradient step.
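
A minimal sketch of one batch update under these rules, assuming a NumPy design matrix with a leading column of ones; the helper name `gradient_descent_step` is illustrative, not from the course.

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta_0 (the intercept) is updated without the lambda term.
    """
    m = len(y)
    errors = X @ theta - y                  # (m,) vector of h_theta(x^(i)) - y^(i)
    grad = (X.T @ errors) / m               # unregularized gradient for all theta_j
    grad[1:] += (lam / m) * theta[1:]       # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad
```

Repeating this step until convergence corresponds to the `Repeat { ... }` loop above.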

Normal equation

Normal equation (without regularization):
$$\theta=(X^TX)^{-1}X^Ty$$
Suppose $m\leq n$ (number of examples $\leq$ number of features); then $X^TX$ is non-invertible (singular).
If $\lambda>0$:
$$\theta=\left(X^TX+\lambda\underbrace{\begin{bmatrix}0&&&&\\&1&&&\\&&1&&\\&&&\ddots&\\&&&&1\end{bmatrix}}_{(n+1)\times(n+1)}\right)^{-1}X^Ty$$
As long as $\lambda>0$, the matrix inside the parentheses is guaranteed to be non-singular, i.e., invertible.
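
A sketch of the regularized normal equation in NumPy, under the same assumption that X already contains the bias column of ones (the function name is my own):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """Closed-form solution for regularized linear regression."""
    L = np.eye(X.shape[1])                   # (n+1) x (n+1) identity
    L[0, 0] = 0                              # do not regularize theta_0
    # For lam > 0 the matrix X^T X + lam * L is invertible, so solve() succeeds.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Using `np.linalg.solve` instead of forming the inverse explicitly is numerically safer, but it computes the same θ as the formula above.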

Regularized logistic regression

Regularized logistic regression cost function:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
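
And a corresponding minimal sketch of this regularized logistic regression cost (again, names are illustrative, not from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta)."""
    m = len(y)
    h = sigmoid(X @ theta)                                   # h_theta(x^(i))
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)       # theta_0 is not penalized
    return cross_entropy + penalty
```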
