5-Regularization

DawnRanger

Category: machine-learning · Tags: machine-learning

Article link: https://blog.csdn.net/DawnRanger/article/details/48007707

1 - The Problem of Overfitting

housing price prediction

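Overfitting: with too many features, the hypothesis can fit the training set almost perfectly, so that J(θ) ≈ 0, but it fails to generalize and predicts poorly on the test set.
How to address it:
- Reduce the number of features: manually choose which features to keep, or use a model selection algorithm (covered in a later chapter).
- Regularization:
  - Keep all the features, but shrink the magnitudes of the parameters θ.
  - Works well even with many features, each of which contributes a little to predicting y.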

2 - Cost Function

$J(\theta)=\frac{1}{2m}\bigg[\sum\limits_{i=1}^m \big(h_\theta(x^{(i)})-y^{(i)}\big)^2 +\lambda \sum\limits_{j=1}^n\theta_j^2 \bigg]$

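Smaller values of θ give a simpler hypothesis that is less prone to overfitting.
What happens if λ is set to a very large value?
- The algorithm works fine (many features are effectively discarded)
- The algorithm fails to eliminate overfitting
- The algorithm underfits
- Gradient descent fails to converge

The third option is what actually happens: a very large λ penalizes θ1, …, θn so heavily that they are all driven toward 0, leaving h_θ(x) ≈ θ0, a flat line that fits even the training set poorly.

Note:

Wherever regularization is used, penalize only the parameters that multiply an input feature x. For example, θ0² must not be added to the penalty here; the sum runs over θ1 through θn only.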
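As a rough illustration of the regularized cost, here is a minimal pure-Python sketch. The function name `regularized_cost`, the list-of-rows data layout, and the toy data are assumptions made for this example and do not come from the original notes.

```python
def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost J(theta) for linear regression.

    theta: list of n+1 parameters (theta[0] is the intercept)
    X:     list of m rows, each [1.0, x_1, ..., x_n] (x_0 = 1 included)
    y:     list of m targets
    lam:   regularization strength lambda
    """
    m = len(y)
    # Sum of squared prediction errors over the training set.
    sq_err = 0.0
    for row, target in zip(X, y):
        pred = sum(t * x for t, x in zip(theta, row))
        sq_err += (pred - target) ** 2
    # The penalty skips theta[0]: only theta_1..theta_n are shrunk.
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (sq_err + penalty) / (2 * m)


# Hypothetical toy data: y = 2 * x, so theta = [0, 2] fits it exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]
theta = [0.0, 2.0]

cost_plain = regularized_cost(theta, X, y, lam=0.0)  # data term only
cost_reg = regularized_cost(theta, X, y, lam=3.0)    # adds lam * 2^2 / (2m)
print(cost_plain, cost_reg)
```

Because the penalty starts at index 1, changing λ has no effect on the cost of a hypothesis whose only nonzero parameter is θ0.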

3 - Regularized Linear Regression

Gradient Descent

$\begin{aligned} &\text{Repeat } \{ \\ &\quad\theta_0 := \theta_0-\alpha\frac{1}{m}\sum\limits_{i=1}^m \big(h_\theta(x^{(i)}) -y^{(i)}\big)x_0^{(i)} \\ &\quad\theta_j :=\theta_j\Big(1-\alpha\frac{\lambda}{m}\Big) - \alpha\frac{1}{m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} \qquad (j = 1, 2, \dots, n) \\ &\} \end{aligned}$
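The update rule can be sketched in pure Python as follows. The helper names (`predict`, `gradient_step`, `fit`), the learning rate, the iteration count, and the toy data are all illustrative assumptions rather than part of the original notes.

```python
def predict(theta, row):
    """Linear hypothesis h_theta(x) = theta^T x for one row [1, x_1, ..., x_n]."""
    return sum(t * x for t, x in zip(theta, row))


def gradient_step(theta, X, y, alpha, lam):
    """One simultaneous update of all parameters."""
    m = len(y)
    # Errors are computed once from the old theta (simultaneous update).
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j == 0:
            # The intercept is not regularized.
            new_theta.append(theta[j] - alpha * grad_j)
        else:
            # Shrink theta_j by (1 - alpha * lam / m), then take the usual step.
            shrink = 1 - alpha * lam / m
            new_theta.append(theta[j] * shrink - alpha * grad_j)
    return new_theta


def fit(X, y, alpha=0.1, lam=0.0, iterations=5000):
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta


# Hypothetical toy data generated from y = 1 + 2 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta_plain = fit(X, y, lam=0.0)
theta_reg = fit(X, y, lam=10.0)
print(theta_plain, theta_reg)
```

On this toy data the unregularized fit recovers roughly θ = (1, 2), while λ = 10 pulls the slope down to about 0.67 and moves the intercept toward the mean of y. Pushing λ much higher flattens the line further, which is the underfitting behaviour that a very large λ produces.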
Normal equation

$\begin{aligned} &X =\begin{bmatrix} (x^{(1)})^T\\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \quad y = \begin{bmatrix} y^{(1)}\\ \vdots \\ y^{(m)} \end{bmatrix} \\ &\theta= \left(X^TX + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\right)^{-1} X^Ty \end{aligned}$
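Non-invertibility: regularization also resolves the problem of a non-invertible matrix.
- When m ≤ n (fewer training examples than features), $X^TX$ is singular, so the unregularized normal equation cannot be solved by ordinary matrix inversion.
- It can be shown that when λ > 0, the matrix inverted in the expression for θ above, $X^TX$ plus λ times the diagonal matrix, is always invertible.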
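To make the invertibility point concrete, here is a small dependency-free sketch of the regularized normal equation. The function names `solve` and `normal_equation` and the toy data are assumptions for illustration. The hand-written Gaussian elimination only keeps the example self-contained; a linear algebra library would normally be used instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^(-1) X^T y, with L = diag(0, 1, ..., 1)."""
    m, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(m)) for b in range(p)]
         for a in range(p)]
    for j in range(1, p):  # skip j = 0 so the intercept is not penalized
        A[j][j] += lam
    rhs = [sum(X[i][a] * y[i] for i in range(m)) for a in range(p)]
    return solve(A, rhs)


# Hypothetical data with m = 2 examples and n = 2 features (m <= n).
X = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
y = [1.0, 2.0]

theta = normal_equation(X, y, lam=1.0)
print(theta)
# With lam=0.0 the same call raises ValueError, because X^T X is singular here.
```

Solving the linear system directly, rather than forming the inverse explicitly, is a design choice of this sketch; both give the same θ whenever the matrix is invertible.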

4 - Regularized Logistic Regression

Hypothesis and cost function, where $g(z)=\frac{1}{1+e^{-z}}$ is the sigmoid function:
$h_\theta(x)=g(\theta^Tx)$
$J(\theta)=-\frac{1}{m}\sum\limits_{i=1}^m\Big[y^{(i)}\log\big(h_\theta(x^{(i)})\big) +(1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum\limits_{j=1}^n\theta_j^2$
For comparison, the regularized cost function for linear regression:
$J(\theta)=\frac{1}{2m}\bigg[\sum\limits_{i=1}^m \big(h_\theta(x^{(i)})-y^{(i)}\big)^2 +\lambda \sum\limits_{j=1}^n\theta_j^2 \bigg]$
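A minimal sketch of the regularized logistic regression cost, assuming the same list-of-rows layout with x_0 = 1 in each row. The names `sigmoid` and `logistic_cost` and the toy data are hypothetical. No numerical safeguards are included, so very large |θᵀx| could overflow or hit log(0).

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    log_likelihood = 0.0
    for row, label in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        log_likelihood += label * math.log(h) + (1 - label) * math.log(1 - h)
    # theta[0] is left out of the penalty, as in linear regression.
    penalty = lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    return -log_likelihood / m + penalty


# Hypothetical toy data: negative x labelled 0, positive x labelled 1.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

cost_at_zero = logistic_cost([0.0, 0.0], X, y, lam=1.0)  # h = 0.5 everywhere
cost_plain = logistic_cost([0.0, 2.0], X, y, lam=0.0)
cost_reg = logistic_cost([0.0, 2.0], X, y, lam=4.0)
print(cost_at_zero, cost_plain, cost_reg)
```

At θ = 0 every prediction is 0.5, so the cost equals ln 2 regardless of λ; for θ = (0, 2) the two costs differ only by the penalty term λ·θ1²/(2m).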

The algorithm in detail:

Gradient Descent

$\begin{aligned} &\text{Repeat } \{ \\ &\quad\theta_0 := \theta_0-\alpha\frac{1}{m}\sum\limits_{i=1}^m \big(h_\theta(x^{(i)}) -y^{(i)}\big)x_0^{(i)} \qquad (j = 0) \\ &\quad\theta_j :=\theta_j - \alpha\Big[\frac{1}{m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} +\frac{\lambda}{m}\theta_j \Big] \qquad (j = 1, 2, 3, \dots, n) \\ &\} \end{aligned}$

This is in fact the same formula as in linear regression, with $h_\theta(x)$ now denoting the sigmoid hypothesis:

$\begin{aligned} &\text{Repeat } \{ \\ &\quad\theta_0 := \theta_0-\alpha\frac{1}{m}\sum\limits_{i=1}^m \big(h_\theta(x^{(i)}) -y^{(i)}\big)x_0^{(i)} \\ &\quad\theta_j :=\theta_j\Big(1-\alpha\frac{\lambda}{m}\Big) - \alpha\frac{1}{m}\sum\limits_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} \qquad (j = 1, 2, \dots, n) \\ &\} \end{aligned}$
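The logistic regression update can be sketched in the same style. Everything named here (`sigmoid`, `logistic_gradient_step`, `fit_logistic`, the learning rate, the iteration count, and the toy data) is an assumption made for illustration. The only change from the linear case is that the hypothesis is passed through the sigmoid.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_gradient_step(theta, X, y, alpha, lam):
    """One simultaneous gradient descent update for regularized logistic regression."""
    m = len(y)
    errors = [sigmoid(sum(t * x for t, x in zip(theta, row))) - label
              for row, label in zip(X, y)]
    new_theta = []
    for j in range(len(theta)):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j > 0:
            # The penalty gradient (lam / m) * theta_j is ADDED, for j >= 1 only.
            grad_j += lam / m * theta[j]
        new_theta.append(theta[j] - alpha * grad_j)
    return new_theta


def fit_logistic(X, y, alpha=0.5, lam=0.0, iterations=2000):
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = logistic_gradient_step(theta, X, y, alpha, lam)
    return theta


# Hypothetical, linearly separable toy data.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]

theta_plain = fit_logistic(X, y, lam=0.0)
theta_reg = fit_logistic(X, y, lam=1.0)
print(theta_plain, theta_reg)
```

Because this toy data is linearly separable, the unregularized weight keeps growing the longer gradient descent runs, whereas the regularized weight settles at a finite value that still classifies all four points correctly.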
Advanced optimization
