Regularization for Linear Regression and Logistic Regression

With too few features, the model underfits: it cannot fit even the training data well.
With too many features, the model overfits: it fits the training data well but fails to generalize to test data.
To counter overfitting, add a penalty term to the cost function: the more complex the model, the larger the penalty.

1. Linear Regression Regularization

Cost function:
$$J(\vec{\theta}) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n} \theta_j^2\right)$$
where:
$$\vec{y}=\left[y^{(1)},y^{(2)},\dots,y^{(m)}\right]^T\in\mathbb{R}^{m\times 1} \quad (m \text{ is the number of training samples})$$
Then:
$$\frac{\partial J(\vec{\theta})}{\partial\theta_j}=
\begin{cases}
\frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0\\
\frac{1}{m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right) & j=1,2,\dots,n
\end{cases}$$
Setting $\frac{\partial J(\vec{\theta})}{\partial\theta_j}=0$ gives:
$$\begin{aligned}
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} & h(\vec{x}^{(2)})-y^{(2)} & \cdots & h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} = -\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} = -\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)}\\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)}\\ \vdots & \vdots & & \vdots\\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
= -\lambda \begin{pmatrix} 0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix} \\
&\begin{pmatrix} {\vec{x}^{(1)}}^T\vec{\theta}-y^{(1)} \\ {\vec{x}^{(2)}}^T\vec{\theta}-y^{(2)} \\ \vdots \\ {\vec{x}^{(m)}}^T\vec{\theta}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)}\\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)}\\ \vdots & \vdots & & \vdots\\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
= -\lambda \begin{pmatrix} \theta_0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix}
\begin{pmatrix} 0\\ &1\\ &&\ddots\\ &&&1 \end{pmatrix}
\end{aligned}$$
Writing all components in matrix form and solving yields:
$$\begin{aligned}
(X\vec{\theta}-\vec{y})^TX&=-\lambda\vec{\theta}^T
\begin{pmatrix} 0\\ &1\\ &&\ddots\\ &&&1 \end{pmatrix} \\
\vec{\theta} &= \left(X^TX+\lambda
\begin{pmatrix} 0\\ &1\\ &&\ddots\\ &&&1 \end{pmatrix}\right)^{-1}X^T\vec{y}
\end{aligned}$$
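As a sanity check on this closed-form solution, here is a minimal NumPy sketch (the synthetic data, variable names, and λ value are illustrative assumptions, not from the original post):

```python
import numpy as np

# Hypothetical synthetic data: m samples, n features, plus a bias column x_0 = 1.
rng = np.random.default_rng(0)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # shape (m, n+1)
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.normal(size=m)

lam = 1.0  # regularization strength lambda (illustrative)

# D = diag(0, 1, ..., 1): theta_0 (the bias) is not penalized.
D = np.eye(n + 1)
D[0, 0] = 0.0

# theta = (X^T X + lambda * D)^{-1} X^T y, solved as a linear system
theta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
print(theta)
```

Using `np.linalg.solve` rather than forming the explicit inverse is numerically preferable.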
Gradient descent gives the updates:
$$\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}$$
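A minimal sketch of these updates in NumPy, assuming X already carries a bias column x_0 = 1 (the function name and hyperparameter defaults are illustrative):

```python
import numpy as np

def ridge_gradient_descent(X, y, lam=1.0, alpha=0.01, iters=1000):
    """Batch gradient descent for L2-regularized linear regression.

    Assumes X has a bias column x_0 = 1; theta_0 is never shrunk.
    """
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        residual = X @ theta - y           # theta^T x^(i) - y^(i) for all i
        grad = (X.T @ residual) / m        # gradient of the unregularized loss
        grad[1:] += (lam / m) * theta[1:]  # penalty term for j = 1..n only
        theta -= alpha * grad
    return theta
```

Adding `(lam / m) * theta[1:]` to the gradient before the step is algebraically the same as the $(1-\alpha\frac{\lambda}{m})$ shrinkage factor in the update above.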

2. Logistic Regression Regularization

Cost function:
$$J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\ln\left(h(\vec{x}^{(i)})\right)+\left(1-y^{(i)}\right)\ln\left(1-h(\vec{x}^{(i)})\right)\right)+\frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2$$
where $h(\vec{x}) = \frac{1}{1+e^{-\vec{\theta}^T\vec{x}}}$ is the sigmoid hypothesis.
Gradient descent gives the updates:
$$\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}$$
The updates have the same form as for linear regression, except that the hypothesis $h(\vec{x}^{(i)})$ is now the sigmoid rather than $\vec{\theta}^T\vec{x}^{(i)}$.
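The same update with the sigmoid hypothesis, as a minimal NumPy sketch (names and defaults are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_gd(X, y, lam=1.0, alpha=0.1, iters=1000):
    """Batch gradient descent for L2-regularized logistic regression.

    Assumes X has a bias column x_0 = 1 and y holds 0/1 labels.
    """
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        residual = sigmoid(X @ theta) - y  # h(x^(i)) - y^(i)
        grad = (X.T @ residual) / m
        grad[1:] += (lam / m) * theta[1:]  # theta_0 is not penalized
        theta -= alpha * grad
    return theta
```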

3. Lasso Regression

Lasso adds an L1 regularization term to the cost function.
$$\widehat{J(\vec{\theta})}= J(\vec{\theta})+\alpha\sum_{j=1}^{n} |\theta_j|$$
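The L1 penalty is not differentiable at zero, so in practice it is handled by a dedicated solver such as coordinate descent, e.g. scikit-learn's `Lasso` (shown here on hypothetical synthetic data; note that scikit-learn scales the squared loss by $\frac{1}{2m}$, so its objective differs from the one above by a constant factor on the data term):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)

# alpha is the L1 weight; the L1 penalty drives many coefficients to exactly 0,
# which is the feature-selection effect Lasso is known for.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)
```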

4. Ridge Regression

Ridge regression adds an L2 regularization term to the cost function. The regularized linear regression and logistic regression of Sections 1 and 2 are precisely ridge regression.
$$\widehat{J(\vec{\theta})}= J(\vec{\theta})+\alpha\sum_{j=1}^{n} \theta_j^2$$
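A quick scikit-learn equivalent of the ridge penalty (synthetic data again; `Ridge` fits the intercept separately, so it is not penalized, matching the treatment of $\theta_0$ above):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

# Minimizes ||y - Xw||^2 + alpha * ||w||^2 over the weights w.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_, model.intercept_)
```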

5. ElasticNet Regression

ElasticNet adds a combined L1 + L2 regularization term to the cost function.
$$\widehat{J(\vec{\theta})}= J(\vec{\theta})+\sum_{j=1}^{n} \left(\alpha\rho|\theta_j|+\frac{\alpha(1-\rho)}{2} \theta_j^2\right)$$
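scikit-learn's `ElasticNet` exposes exactly this parameterization, with `l1_ratio` playing the role of $\rho$ (the synthetic data is illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)

# l1_ratio = 1 recovers Lasso; l1_ratio = 0 leaves only the L2 penalty.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)
```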