Machine Learning Notes: Regularization, Underfitting, and Overfitting

Latest recommended article published on 2024-09-06 21:32:24

Simp丶

Category: Machine Learning. Tags: regularization, overfitting

Article link: https://blog.csdn.net/sp1206/article/details/80272935

Underfitting

High bias: the curve does not fit the training data well.

Overfitting

High variance, usually caused by too many features: the curve fits the training data very closely but does not generalize to new data.
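
The contrast between the two cases can be sketched numerically. The example below is only an illustration under assumed settings: the quadratic `true_fn`, the noise level, the 12 training points, and the polynomial degrees 1, 2, and 9 are all made up for this note. It fits polynomials of increasing degree and reports training and test error for each.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground truth: a quadratic.
    return 1.0 + 2.0 * x - 3.0 * x ** 2

# Small noisy training set and a separate test set.
x_train = np.linspace(-1.0, 1.0, 12)
y_train = true_fn(x_train) + rng.normal(scale=0.3, size=x_train.shape)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = true_fn(x_test) + rng.normal(scale=0.3, size=x_test.shape)

def mse(coeffs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The quantity to watch is the gap between training and test error: a degree-1 fit typically has high error on both (high bias), while a degree-9 fit tends to have very low training error and a noticeably worse test error (high variance).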

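Ways to address overfitting

Reduce the number of features

Manually select which features to keep, or use a model selection algorithm
Regularization

Keep all the features, but reduce the magnitude of $\theta_j$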

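Regularization

When the hypothesis function overfits, we can reduce the weight of certain terms by raising the cost of their coefficients.
For example:
Given the hypothesis $\theta_0+\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4$ , suppose we want to reduce the influence of $\theta_3x^3$ and $\theta_4x^4$ so that the hypothesis is closer to a quadratic function. Without discarding these features or changing the form of the hypothesis, we can modify the cost function so that $\theta_3$ and $\theta_4$ carry a higher cost, which drives them toward 0.
For instance, if a large penalty on $\theta_3^2$ and $\theta_4^2$ is added to the cost function, then minimizing it pushes $\theta_3$ and $\theta_4$ toward 0.
More generally, define a new cost function $J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]$ , where $\lambda$ is the regularization parameter: it sets the cost of letting the $\theta$ parameters grow.
A regularized cost function can address overfitting, but $\lambda$ must be chosen with care: if it is too large the model underfits, and if it is too small the overfitting remains.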
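
As a minimal sketch of the regularized cost, assuming the standard squared-error form with an L2 penalty, $J(\theta)=\frac{1}{2m}\left[\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\right]$, the function below evaluates it for a linear hypothesis. The name `regularized_cost` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error cost plus an L2 penalty on theta[1:].

    theta[0] is the bias term and is left out of the penalty.
    """
    m = len(y)
    residuals = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)
    return (np.sum(residuals ** 2) + penalty) / (2 * m)

# Toy data: the first column of X is the constant feature x_0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])  # fits the toy data exactly

for lam in (0.0, 3.0, 30.0):
    print(lam, regularized_cost(theta, X, y, lam))
```

Even though this `theta` fits the data perfectly, its cost grows with $\lambda$. That is the trade-off described above: a very large $\lambda$ makes small parameters matter more than fitting the data, which leads toward underfitting.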

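Regularized linear regression

To apply regularization in gradient descent, modify the iterative update rule: for every $\theta_j$ except $\theta_0$ , the gradient term gains an extra $\frac{\lambda}{m}\theta_j$ , that is,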

$$\begin{aligned} & \text{Repeat}\ \lbrace \\ & \quad \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ & \quad \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] \qquad j \in \lbrace 1,2,\dots,n\rbrace \\ & \rbrace \end{aligned}$$
Rearranging gives

$\theta_j := \theta_j(1-\alpha\frac{\lambda}{m}) - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$ .

Since $\alpha$ , $\lambda$ , and $m$ are all positive, the factor $1-\alpha\frac{\lambda}{m}$ is always less than 1, so it is easy to see that each update shrinks $\theta_j$ before the usual gradient step is applied.
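
The shrink-then-step form of the update can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name `fit_ridge_gd`, the learning rate, the iteration count, and the toy data are all assumptions made for this example.

```python
import numpy as np

def fit_ridge_gd(X, y, lam, alpha=0.5, iters=2000):
    """Gradient descent for regularized linear regression.

    X must have a leading column of ones; theta[0] is not regularized.
    """
    m, n_params = X.shape
    theta = np.zeros(n_params)
    shrink = 1.0 - alpha * lam / m  # the factor (1 - alpha * lambda / m)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # unregularized gradient for every j
        new_theta = theta * shrink - alpha * grad  # update for j >= 1
        new_theta[0] = theta[0] - alpha * grad[0]  # theta_0: no shrinking
        theta = new_theta
    return theta

# Hypothetical toy data: a noisy line fitted with features 1, x, x^2.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x, x ** 2])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.shape)

theta_plain = fit_ridge_gd(X, y, lam=0.0)
theta_reg = fit_ridge_gd(X, y, lam=5.0)
print("lambda = 0:", theta_plain)
print("lambda = 5:", theta_reg)
```

With $\lambda > 0$ the non-bias parameters end up with a smaller overall magnitude than in the unregularized fit, which is the shrinking effect described above.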

Regularized normal equation

$$\begin{aligned} & \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \\ & \text{where}\ \ L = \begin{bmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} \end{aligned}$$
L is a square matrix of order n+1.
When the number of samples m is smaller than the number of features n, $X^TX$ is not invertible. With the regularized normal equation and $\lambda > 0$ , the matrix $X^TX+\lambda L$ becomes invertible, so regularization also helps with some non-invertibility problems.
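
The invertibility point can be checked on a small example. The sketch below assumes a design matrix whose first column is all ones, with m = 3 samples and n = 5 features drawn at random; the function name `normal_equation_regularized` and the data are hypothetical.

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """Solve (X^T X + lam * L) theta = X^T y, where L is the identity
    matrix with its top-left entry set to 0 (theta_0 is not penalized)."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    # Solving the linear system avoids forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

rng = np.random.default_rng(0)
m, n = 3, 5  # fewer samples than features
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = rng.normal(size=m)

rank = np.linalg.matrix_rank(X.T @ X)
print("rank of X^T X:", rank, "for a", X.shape[1], "x", X.shape[1], "matrix")

theta = normal_equation_regularized(X, y, lam=1.0)
print("theta:", theta)
```

Here $X^TX$ is a 6 × 6 matrix whose rank cannot exceed 3, so it is singular, yet the regularized system still has a unique solution when $\lambda > 0$.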

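Regularized logistic regression

Regularized cost function
$J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$

Note:
The bias term is not regularized, so the penalty sum does not include $\theta_0$ .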

Modified gradient descent update

$$\begin{aligned} & \text{Repeat}\ \lbrace \\ & \quad \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \\ & \quad \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] \qquad j \in \lbrace 1,2,\dots,n\rbrace \\ & \rbrace \end{aligned}$$

The bracketed term is the partial derivative of the cost function for $j \in \lbrace 1,2,\dots,n\rbrace$ :

$\frac{\partial J(\theta)}{\partial \theta_j}= \left(\frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j$
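
One way to sanity-check the gradient formula is to compare it with a finite-difference estimate. The sketch below assumes the standard regularized cross-entropy cost, $J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[ y^{(i)}\log h_\theta(x^{(i)}) + (1-y^{(i)})\log(1-h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$ with the sigmoid hypothesis $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$. The function names and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y, lam):
    """Regularized logistic-regression cost and its gradient.

    X has a leading column of ones; theta[0] is excluded from the penalty.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    penalized = theta.copy()
    penalized[0] = 0.0  # do not regularize the bias term
    cost = (-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
            + lam / (2 * m) * np.sum(penalized ** 2))
    grad = X.T @ (h - y) / m + (lam / m) * penalized
    return cost, grad

def numerical_gradient(theta, X, y, lam, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        cost_plus = cost_and_gradient(theta + step, X, y, lam)[0]
        cost_minus = cost_and_gradient(theta - step, X, y, lam)[0]
        grad[j] = (cost_plus - cost_minus) / (2 * eps)
    return grad

X = np.array([[1.0, 0.5, -1.2],
              [1.0, -0.3, 0.8],
              [1.0, 1.5, 0.1],
              [1.0, -1.0, -0.4]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.array([0.1, -0.2, 0.3])
lam = 2.0

_, analytic = cost_and_gradient(theta, X, y, lam)
numeric = numerical_gradient(theta, X, y, lam)
print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

If the two gradients agree to within numerical tolerance, the analytic formula, including the missing penalty on $\theta_0$, matches the cost function it is supposed to differentiate.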