ML (7): Regularization


ROY_MENG

Article link: https://blog.csdn.net/weixin_31270811/article/details/79641666

Regularization

Tags: ML, Stanford machine learning video notes

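The Problem of Overfitting

Regularization is used to identify and deal with overfitting. Both prediction (regression) and classification problems can suffer from overfitting or underfitting. An underfit hypothesis fails to capture the trend of the training set, usually because the model is too simple; an overfit hypothesis follows the training set closely but generalizes poorly, usually because the model is too complex. In general, the more influence the high-order terms of the model have, the more likely overfitting becomes.
There are two common ways to address overfitting:
1. Reduce the number of features
Manually select which features to keep
Use a selection algorithm
2. Regularization: keep all the features but reduce the magnitude (not the number!) of the parameters $\theta_j$. This works well when there are many features that are each slightly useful.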

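Cost Function

The influence of certain parameters $\theta$ can be reduced by modifying the cost function. For example, suppose we want to suppress the cubic and quartic terms in the following hypothesis: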

$\theta_0 + \theta_1x + \theta_2x^2 + \theta_3x^3 + \theta_4x^4$

$\min_\theta\ \dfrac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\cdot\theta_3^2 + 1000\cdot\theta_4^2$
To make this cost small, $\theta_3$ and $\theta_4$ clearly have to be as close to 0 as possible. In the same way, all parameters can be "penalized" at once:

$\min_\theta\ \dfrac{1}{2m}\ \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\ \sum_{j=1}^n \theta_j^2 \right]$
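The regularization parameter $\lambda$ controls the degree of fit. The larger $\lambda$ is, the less closely the hypothesis fits the training set and the smoother the curve becomes. Note that the penalty sum starts at $j = 1$, so $\theta_0$ is not penalized.
Regularization applies to both linear regression and logistic regression.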
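
As an illustration of the regularized cost above, here is a minimal NumPy sketch. The function name `regularized_cost`, the toy data, and the convention that the first column of `X` is all ones (so `theta[0]` is $\theta_0$) are assumptions made for this example and are not part of the original notes.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    Assumes X is an (m, n+1) design matrix whose first column is all ones,
    so theta[0] is the intercept and is left out of the penalty.
    """
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i) for every i
    penalty = lam * np.sum(theta[1:] ** 2)    # lambda * sum_{j>=1} theta_j^2
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Hypothetical toy data that satisfies y = 1 + 2x exactly.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([1.0, 2.0])

cost_no_reg = regularized_cost(theta, X, y, lam=0.0)  # perfect fit, no penalty
cost_reg = regularized_cost(theta, X, y, lam=3.0)     # only the penalty term remains
```

With a perfect fit the squared-error part is zero, so any remaining cost comes from the penalty alone. That makes it easy to see how a larger $\lambda$ pushes the minimizer toward smaller $\theta_j$.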

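Regularized Linear Regression

For linear regression we previously derived two learning algorithms: one based on gradient descent and one based on the normal equation.
1. Gradient descent
Modify the gradient descent update, noting that $\theta_0$ must be excluded from regularization.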

$\begin{align*} & \text{Repeat}\ \lbrace \\ & \quad \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ & \quad \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\dots,n\rbrace \\ & \rbrace \end{align*}$
The update for $\theta_j$ can be rearranged as:

$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
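This can be read as deliberately shrinking $\theta_j$ a little on every update: the factor $1 - \alpha\frac{\lambda}{m}$ is slightly less than 1, and the second term is the usual unregularized gradient step.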
2. Normal equation
The equation is modified to:

$\begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \\ & \text{where}\ \ L = \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}\end{align*}$
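Here $L$, an $(n+1)\times(n+1)$ matrix, implements the regularization; the 0 in its top-left corner leaves $\theta_0$ unpenalized. Previously, when $m \le n$, $X^TX$ was not invertible. With $\lambda > 0$, $X^TX + \lambda \cdot L$ is invertible, so adding the matrix $L$ solves this problem as well.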
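
The two learning algorithms above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names, the toy data, the learning rate, and the iteration count are made up for this example, and `X` is assumed to carry a leading column of ones so that `theta[0]` is $\theta_0$. The sketch checks that gradient descent in the "shrink, then step" form approaches the solution of the regularized normal equation, and that the regularized system can still be solved when $m \le n$.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized update in the 'shrink theta_j, then step' form."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m                      # unregularized gradient
    shrink = np.full_like(theta, 1.0 - alpha * lam / m)   # factor (1 - alpha*lambda/m)
    shrink[0] = 1.0                                       # theta_0 is not shrunk
    return theta * shrink - alpha * grad

def fit_gradient_descent(X, y, alpha, lam, iters):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta

def fit_normal_equation(X, y, lam):
    """Solve (X^T X + lambda * L) theta = X^T y without forming an inverse."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                                         # do not penalize theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Hypothetical toy data: m = 4 examples, n = 1 feature.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_gd = fit_gradient_descent(X, y, alpha=0.1, lam=5.0, iters=5000)
theta_ne = fit_normal_equation(X, y, lam=5.0)

# Hypothetical m <= n case: 2 examples, 3 features plus the ones column.
X_wide = np.array([[1.0, 1.0, 2.0, 3.0],
                   [1.0, 2.0, 0.0, 1.0]])
y_wide = np.array([1.0, 2.0])
rank_plain = np.linalg.matrix_rank(X_wide.T @ X_wide)     # below 4, so X^T X is singular
theta_wide = fit_normal_equation(X_wide, y_wide, lam=1.0)
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice of this sketch; it solves the same linear system as the formula above.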

Regularized Logistic Regression

The cost function for logistic regression is:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)})) \right]$
With regularization it becomes:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$
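
The regularized logistic cost can be written out as a short NumPy sketch. The names `sigmoid` and `logistic_cost` and the toy data are assumptions for illustration; `X` is again assumed to have a leading column of ones, and $h_\theta(x)$ is taken to be the sigmoid of $\theta^Tx$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lambda / 2m) * sum_{j>=1} theta_j^2."""
    m = len(y)
    h = sigmoid(X @ theta)                                 # h_theta(x^(i)) for every i
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)       # theta_0 is skipped
    return cross_entropy + penalty

# Hypothetical toy data: two examples, one feature.
X = np.array([[1.0, -1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])

# With theta = 0 every prediction is 0.5, so the cost is log(2) and the penalty is 0.
cost_zero = logistic_cost(np.zeros(2), X, y, lam=1.0)

# The gap between lam=4 and lam=0 is exactly the penalty: 4 / (2*2) * 2^2.
theta = np.array([3.0, 2.0])
extra = logistic_cost(theta, X, y, lam=4.0) - logistic_cost(theta, X, y, lam=0.0)
```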
Note that, as before, $\theta_0$ is not regularized. Gradient descent for logistic regression then becomes:

$\begin{align*}& \text{Repeat}\ \lbrace \\ & \quad \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ & \quad \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & j \in \lbrace 1,2,\dots,n\rbrace \\ & \rbrace\end{align*}$
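
The update rule has the same form as in linear regression; only $h_\theta$ changes to the sigmoid. A minimal sketch, assuming a leading column of ones in `X` and using hypothetical function names, toy data, learning rate, and iteration count:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m   # same form as linear regression
    reg = (lam / m) * theta
    reg[0] = 0.0                                # theta_0 is not regularized
    return theta - alpha * (grad + reg)

def fit_logistic(X, y, alpha, lam, iters):
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta = logistic_gradient_step(theta, X, y, alpha, lam)
    return theta

# Hypothetical, linearly separable toy data with one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta_weak = fit_logistic(X, y, alpha=0.5, lam=0.01, iters=2000)
theta_strong = fit_logistic(X, y, alpha=0.5, lam=1.0, iters=2000)

# Classify with the usual 0.5 threshold on h_theta(x).
predictions = (sigmoid(X @ theta_strong) >= 0.5).astype(float)
```

Comparing `theta_weak[1]` with `theta_strong[1]` shows the effect of $\lambda$: the stronger penalty keeps the feature weight smaller while the toy examples are still classified correctly.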
