With too few features, the model underfits: it cannot fit even the training data well.
With too many features, the model overfits: it fits the training data very well but fails to fit the test data.
To counter overfitting, a penalty term is added to the cost function: the more complex the model, the larger the penalty.
1. Regularized linear regression
Cost function:
$$
J(\vec{\theta}) = \frac{1}{2m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right)
$$
where:
$$
\vec{y}=\left[y^{(1)},y^{(2)},\dots,y^{(m)}\right]^T\in\mathbb{R}^{m\times1} \quad (m \text{ is the number of training samples})
$$
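As a minimal NumPy sketch of this cost (the function name is mine; it assumes $X$ is the $m\times(n+1)$ design matrix whose first column is all ones, so $\theta_0$ is the unpenalized bias):

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * (sum of squared residuals + lam * sum_{j>=1} theta_j^2).

    X is m x (n+1) with a leading column of ones; theta_0 is not penalized.
    """
    m = X.shape[0]
    residual = X @ theta - y                  # h(x^(i)) - y^(i) for all i
    penalty = lam * np.sum(theta[1:] ** 2)    # skip the bias theta_0
    return (residual @ residual + penalty) / (2 * m)
```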
Then:
$$
\frac{\partial J(\vec{\theta})}{\partial\theta_j}=
\begin{cases}
\frac{1}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0\\
\frac{1}{m}\left(\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)}+\lambda\theta_j\right) & j=1,2,\dots,n
\end{cases}
$$
Setting $\frac{\partial J(\vec{\theta})}{\partial\theta_j}=0$ gives:
$$
\begin{aligned}
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} & h(\vec{x}^{(2)})-y^{(2)} & \cdots & h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
=-\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
=-\lambda\theta_j \\
&\begin{pmatrix} h(\vec{x}^{(1)})-y^{(1)} \\ h(\vec{x}^{(2)})-y^{(2)} \\ \vdots \\ h(\vec{x}^{(m)})-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
=-\lambda\begin{pmatrix} 0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix} \\
&\begin{pmatrix} {\vec{x}^{(1)}}^T\vec{\theta}-y^{(1)} \\ {\vec{x}^{(2)}}^T\vec{\theta}-y^{(2)} \\ \vdots \\ {\vec{x}^{(m)}}^T\vec{\theta}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}
=-\lambda\begin{pmatrix} \theta_0 & \theta_1 & \theta_2 & \cdots & \theta_n \end{pmatrix}
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix}
\end{aligned}
$$
The last step uses $h(\vec{x}^{(i)})={\vec{x}^{(i)}}^T\vec{\theta}$, so with the design matrix $X$ (whose $i$-th row is ${\vec{x}^{(i)}}^T$) the system can be written and solved in closed form:
$$
\begin{aligned}
(X\vec{\theta}-\vec{y})^TX &= -\lambda\vec{\theta}^T
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix} \\
\vec{\theta} &= \left(X^TX+\lambda
\begin{pmatrix} 0 \\ & 1 \\ & & \ddots \\ & & & 1 \end{pmatrix}\right)^{-1}X^T\vec{y}
\end{aligned}
$$
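A minimal NumPy sketch of this closed-form solution (the function name is mine; it assumes, as above, that $X$ carries a leading column of ones):

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Solve theta = (X^T X + lam * D)^(-1) X^T y, where D is the
    identity with D[0, 0] = 0 so the bias theta_0 is not penalized."""
    D = np.eye(X.shape[1])
    D[0, 0] = 0.0            # the top-left 0 in the diagonal matrix above
    # solve() is preferred over an explicit inverse for numerical stability
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

A side benefit of the penalty: for $\lambda>0$, $X^TX+\lambda D$ is typically invertible even when $X^TX$ itself is singular.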
Gradient descent gives the update rules:
$$
\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}
$$
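The same updates in vectorized NumPy (a sketch; the function name, learning-rate default, and iteration count are mine). The $(1-\alpha\lambda/m)\theta_j$ form above is just the penalty gradient $\frac{\lambda}{m}\theta_j$ folded into the update:

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, alpha=0.01, n_iters=1000):
    """Batch gradient descent for L2-regularized linear regression.

    X is m x (n+1) with a leading column of ones; the penalty gradient
    (lam/m) * theta_j is applied only for j = 1..n, never to theta_0.
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = X @ theta - y              # theta^T x^(i) - y^(i), all i
        grad = (X.T @ residual) / m           # unregularized gradient
        grad[1:] += (lam / m) * theta[1:]     # shrink only j = 1..n
        theta -= alpha * grad
    return theta
```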
2. Regularized logistic regression
Cost function:
$$
J(\vec{\theta}) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\ln\left(h(\vec{x}^{(i)})\right)+\left(1-y^{(i)}\right)\ln\left(1-h(\vec{x}^{(i)})\right)\right)+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
$$

where $h(\vec{x})=\dfrac{1}{1+e^{-\vec{\theta}^T\vec{x}}}$ is the sigmoid hypothesis.
Gradient descent gives update rules identical in form to the linear case, but with the sigmoid $h$:
$$
\begin{cases}
\theta_0:=\theta_0 - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right) & j=0 \\
\theta_j:=\left(1-\alpha\frac{\lambda}{m}\right)\theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)x_j^{(i)} & j=1,2,\dots,n
\end{cases}
$$
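A corresponding NumPy sketch (again the function name and defaults are mine); the only change from the linear case is the sigmoid inside the residual:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, lam, alpha=0.1, n_iters=1000):
    """Batch gradient descent for L2-regularized logistic regression.
    X is m x (n+1) with a leading column of ones; theta_0 is not penalized."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residual = sigmoid(X @ theta) - y     # h(x^(i)) - y^(i)
        grad = (X.T @ residual) / m
        grad[1:] += (lam / m) * theta[1:]     # skip the bias term
        theta -= alpha * grad
    return theta
```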
3. Lasso regression
That is, an L1 regularization term is added to the cost function.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\alpha\sum_{j=1}^{n}|\theta_j|
$$
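Because $|\theta_j|$ is not differentiable at 0, Lasso is usually solved with coordinate descent rather than plain gradient descent; in practice one typically calls a library such as scikit-learn (a usage sketch with toy data; note scikit-learn scales the squared-error term by $1/(2m)$, so its `alpha` is not numerically identical to the $\alpha$ above):

```python
import numpy as np
from sklearn.linear_model import Lasso

X = np.random.randn(100, 5)                 # toy data: 100 samples, 5 features
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)        # L1 drives some coefficients to exactly 0
```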
4. Ridge regression
That is, an L2 regularization term is added to the cost function. The regularized linear regression and logistic regression of Sections 1 and 2 are exactly ridge regression.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\alpha\sum_{j=1}^{n}\theta_j^2
$$
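The closed-form and gradient-descent sketches in Section 1 already implement this; with scikit-learn it is one call (a usage sketch; scikit-learn's Ridge minimizes $\|y-Xw\|_2^2+\alpha\|w\|_2^2$ and fits the unpenalized intercept separately):

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.random.randn(100, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)        # coefficients are shrunk toward 0 but rarely exactly 0
```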
5. ElasticNet regression
That is, both L1 and L2 regularization terms are added to the cost function.
$$
\widehat{J(\vec{\theta})} = J(\vec{\theta})+\sum_{j=1}^{n}\left(\alpha\rho|\theta_j|+\frac{\alpha(1-\rho)}{2}\theta_j^2\right)
$$
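Here $\rho\in[0,1]$ trades off the two penalties: $\rho=1$ recovers Lasso and $\rho=0$ recovers ridge. In scikit-learn this $\rho$ corresponds to the `l1_ratio` parameter (a usage sketch):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

X = np.random.randn(100, 5)
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * np.random.randn(100)

# alpha sets the overall penalty strength, l1_ratio plays the role of rho
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)
```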