[Study Notes] Andrew Ng Machine Learning | Chapter 6 | Regularization

Benjamin Chen.

Article link: https://blog.csdn.net/jermy00/article/details/131712393

Brief disclaimer

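Course-related links
Since the course is taught in English, the notes record the content in English, with brief explanations in Chinese.
These notes are purely a way to understand the material more deeply. If there are any mistakes, your patience and corrections are kindly appreciated.
Many thanks to Professor Andrew Ng for his generous dedication!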

Contents

Brief disclaimer
Terminology
The problem of overfitting
  - Overfitting
  - Addressing overfitting
Regularization cost function
  - Regularization
Regularized linear regression
Regularized logistic regression
  - Regularized logistic regression
  - Gradient descent
Quotes from Professor Andrew Ng

Terminology

Underfitting	欠拟合	high bias	高偏差
Overfitting	过拟合	high variance	高方差
generalize	泛化	Regularization	正则化

The problem of overfitting

Overfitting

underfitting → high bias
overfitting → high variance
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (the cost function is almost zero) but fail to generalize to new examples (it cannot make good predictions on them)
Generalize: how well a hypothesis applies to new examples

Addressing overfitting

Reduce number of features
1. Manually select which features to keep
2. Model selection algorithm (automatically selects which features to keep)
3. Drawback: discarding some features also discards some information relevant to the problem
Regularization
1. Keep all the features, but reduce the magnitude/values of parameters θ_j
2. Works well when we have a lot of features, each of which contributes a bit to predicting y

Regularization cost function

Regularization

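Small values for parameters θ (achieved by adding a penalty term):
1. “Simpler” hypothesis
2. Less prone to overfitting
Add an extra regularization term to the cost function → shrinks every parameter θ_1, …, θ_n
1. θ_0 is not penalized; in practice, whether or not θ_0 is included makes little difference to the result
2. λ is called the regularization parameter → it controls the trade-off between two different goals, i.e. the balance between the two terms
  1. First goal (first term): fit the training data well
  2. Second goal (regularization term): keep the parameters small
3. If λ is set too large, the parameters are penalized too heavily and θ_1, …, θ_n all approach 0 → h_θ(x) ≈ θ_0 → underfitting
4. A suitable value of the regularization parameter λ must be chosen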

$J(\theta)=\frac{1}{2m} [\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda\sum_{j=1}^n\theta_j^2]$
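A small worked case shows why an overly large λ leads to underfitting. Assume, purely for illustration, a single feature (n = 1). Let $\bar{x}$ and $\bar{y}$ denote the means of the $x^{(i)}$ and $y^{(i)}$, and leave θ_0 unpenalized as in the cost function above. Setting $\frac{\partial J}{\partial\theta_0}=0$ and $\frac{\partial J}{\partial\theta_1}=0$ and solving gives:

$\theta_0=\bar{y}-\theta_1\bar{x},\qquad \theta_1=\frac{\sum_{i=1}^m(x^{(i)}-\bar{x})(y^{(i)}-\bar{y})}{\sum_{i=1}^m(x^{(i)}-\bar{x})^2+\lambda}$

With λ = 0 this is the ordinary least-squares line. A larger λ only enlarges the denominator, so as λ → ∞, θ_1 → 0 and θ_0 → $\bar{y}$. The hypothesis flattens to the constant $h_\theta(x)=\bar{y}$, which is the underfitting case $h_\theta(x)=\theta_0$.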

Regularized linear regression

$J(\theta)=\frac{1}{2m} [\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda\sum_{j=1}^n\theta_j^2]$

$\min\limits_{\theta} \ J(\theta)$
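The cost function above can be sketched in plain Python using only the standard library. The function name `regularized_cost` and the three-point dataset are hypothetical and chosen for illustration. Each row of `X` is assumed to already contain the bias feature x_0 = 1, so the penalty simply skips index 0.

```python
def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost J(theta).

    X: list of m rows [x_0 = 1, x_1, ..., x_n]; y: list of m targets.
    theta: [theta_0, ..., theta_n]; theta_0 is NOT penalized.
    """
    m = len(y)
    # Squared-error term, summed over the m training examples.
    sq_err = 0.0
    for row, target in zip(X, y):
        h = sum(t * x for t, x in zip(theta, row))  # h_theta(x) = theta^T x
        sq_err += (h - target) ** 2
    # Penalty term: lambda * sum of theta_j^2 for j = 1..n (index 0 skipped).
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (sq_err + penalty) / (2 * m)


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 3.0, 4.0]
theta = [1.0, 1.0]  # fits y = 1 + x exactly
print(regularized_cost(theta, X, y, lam=0.0))  # 0.0
print(regularized_cost(theta, X, y, lam=3.0))  # 0.5
```

Even a perfect fit now has a non-zero cost once λ > 0, because θ_1 = 1 is still penalized. This is the pressure that pushes the parameters toward smaller values.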

Gradient descent

Repeat {

$\theta_0:=\theta_0-\alpha\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_0^{(i)}$

$\theta_j:=\theta_j-\alpha[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j] \quad (j=1,2,3,\cdots,n)$

}

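$\theta_j:=\theta_j(1-\alpha\frac{\lambda}{m})-\alpha\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)} \quad (j=1,2,3,\cdots,n)$

The regularization term does not include θ_0, so θ_0 and θ_j (j ≥ 1) are updated with separate rules
1 − α(λ/m) < 1: α(λ/m) is positive, and because the learning rate α is usually small while m is large, α(λ/m) is small
On every iteration θ_j is therefore multiplied by a number slightly less than 1 (e.g. 0.99), which shrinks the parameter, and then the usual gradient step is applied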
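Below is a minimal plain-Python sketch of regularized gradient descent, written in the shrinkage form. θ_0 receives the plain gradient step, while every θ_j with j ≥ 1 is first multiplied by (1 − αλ/m). The function names, learning rate, iteration count, and toy dataset are hypothetical. Rows of `X` are assumed to include x_0 = 1.

```python
def regularized_gd_step(theta, X, y, alpha, lam):
    """One simultaneous update of all theta_j for regularized linear regression.

    X: list of m rows [x_0 = 1, x_1, ..., x_n]; y: list of m targets.
    """
    m = len(y)
    # Errors h_theta(x^(i)) - y^(i), all computed from the *old* theta.
    errors = [sum(t * x for t, x in zip(theta, row)) - target
              for row, target in zip(X, y)]
    new_theta = []
    for j, theta_j in enumerate(theta):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j == 0:
            # theta_0 is not regularized: plain gradient step.
            new_theta.append(theta_j - alpha * grad_j)
        else:
            # Shrink by (1 - alpha * lam / m), then take the usual gradient step.
            new_theta.append(theta_j * (1 - alpha * lam / m) - alpha * grad_j)
    return new_theta


def regularized_gd(X, y, alpha, lam, iterations):
    theta = [0.0] * len(X[0])  # start from all zeros
    for _ in range(iterations):
        theta = regularized_gd_step(theta, X, y, alpha, lam)
    return theta


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 3.0, 4.0]
print(regularized_gd(X, y, alpha=0.1, lam=0.0, iterations=5000))   # close to [1.0, 1.0]
print(regularized_gd(X, y, alpha=0.1, lam=10.0, iterations=5000))  # close to [2.667, 0.167]
```

With λ = 0 the fit recovers y = 1 + x. With λ = 10 the slope θ_1 shrinks to about 1/6, and the unpenalized θ_0 rises to about 8/3 to compensate.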

Normal equation

$\theta=(X^TX+\lambda \begin{bmatrix} 0 \\ & 1 \\ & & 1 \\ & & & \ddots \\ & & & &1 \end{bmatrix} )^{-1}X^Ty$

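The "interesting" matrix: an (n+1)×(n+1) matrix whose top-left element is 0, whose other diagonal elements are all 1, and whose remaining elements are all 0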
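Below is a minimal plain-Python sketch of the regularized normal equation. Instead of forming the inverse explicitly, it solves (X^TX + λL)θ = X^Ty by Gaussian elimination. When the matrix is invertible, this gives the same θ. The function names and the tiny dataset are hypothetical. Rows of `X` are assumed to include x_0 = 1, and `L` is the matrix described above.

```python
def solve_linear_system(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is (numerically) singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lam * L)^(-1) X^T y, with L = diag(0, 1, ..., 1)."""
    k = len(X[0])  # n + 1 columns, including x_0 = 1
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(k)]
         for i in range(k)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]  # X^T y
    return solve_linear_system(A, b)


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 3.0, 4.0]
print(regularized_normal_equation(X, y, 0.0))   # ≈ [1.0, 1.0]
print(regularized_normal_equation(X, y, 10.0))  # ≈ [2.667, 0.167]
```

Solving the linear system is usually preferred to computing the inverse explicitly, since it avoids extra work and rounding error.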
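Non-invertibility (optional/advanced)
1. If m ≤ n (number of examples ≤ number of features), X^TX is non-invertible (singular/degenerate)
2. As long as the regularization parameter λ > 0, X^TX + λ·(the matrix above) is guaranteed to be non-singular, i.e. invertible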
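The invertibility claim can be checked on a tiny case. The sketch below builds X^TX + λL for a made-up design matrix with m = 2 examples and n = 2 features, so m ≤ n. It then computes the determinant using exact rational arithmetic from Python's `fractions` module, so "singular" means exactly zero rather than a rounding artifact. The function names and data are hypothetical.

```python
from fractions import Fraction


def gram_plus_penalty(X, lam):
    """Return X^T X + lam * L with L = diag(0, 1, ..., 1), using exact Fractions."""
    k = len(X[0])  # n + 1 columns, including x_0 = 1
    return [[sum(Fraction(row[i]) * row[j] for row in X)
             + (Fraction(lam) if i == j and i > 0 else 0)
             for j in range(k)]
            for i in range(k)]


def determinant(A):
    """Exact determinant by Gaussian elimination with row swaps."""
    M = [row[:] for row in A]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        # Pick the row (at or below col) with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            return Fraction(0)  # no usable pivot: the matrix is singular
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return det


X = [[1, 1, 2],
     [1, 2, 3]]  # m = 2 examples, n = 2 features (first column is x_0 = 1)
print(determinant(gram_plus_penalty(X, 0)))  # 0 -> X^T X is singular
print(determinant(gram_plus_penalty(X, 1)))  # 4 -> invertible once lambda > 0
```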

Regularized logistic regression

$h_{\theta}(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$

$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m} [y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))] + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$
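The regularized logistic cost can be sketched the same way in plain Python. It uses the sigmoid hypothesis $h_\theta(x)=g(\theta^Tx)$ and leaves θ_0 out of the penalty. The function names and the two-example dataset are hypothetical. This naive version does not guard against `math.log(0)` or `math.exp` overflow when |θ^Tx| is very large.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: list of m rows [x_0 = 1, x_1, ..., x_n]; y: list of 0/1 labels.
    theta_0 (index 0) is left out of the penalty.
    """
    m = len(y)
    log_likelihood = 0.0
    for row, target in zip(X, y):
        h = sigmoid(sum(t * x for t, x in zip(theta, row)))
        log_likelihood += target * math.log(h) + (1 - target) * math.log(1 - h)
    penalty = lam / (2 * m) * sum(t ** 2 for t in theta[1:])
    return -log_likelihood / m + penalty


X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, 0]
print(regularized_logistic_cost([0.0, 0.0], X, y, lam=5.0))  # log(2) ≈ 0.6931
```

With θ = 0 every prediction is 0.5, so the cost is log 2 regardless of λ. The penalty only grows once θ_1, …, θ_n move away from zero.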

Gradient descent

Repeat {

$\theta_0:=\theta_0-\alpha\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_0^{(i)}$

$\theta_j:=\theta_j-\alpha[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})\cdot x_j^{(i)} + \frac{\lambda}{m}\theta_j] \quad (j=1,2,3,\cdots,n)$

}
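These update rules look identical to the ones for regularized linear regression. The difference lies in h_θ(x), which is now the sigmoid g(θ^Tx). Below is a minimal plain-Python sketch. The function names, learning rate, iteration count, and the symmetric toy dataset are hypothetical, and rows of `X` are assumed to include x_0 = 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_gd_step(theta, X, y, alpha, lam):
    m = len(y)
    # Same form as linear regression, but h_theta(x) = g(theta^T x).
    errors = [sigmoid(sum(t * x for t, x in zip(theta, row))) - target
              for row, target in zip(X, y)]
    new_theta = []
    for j, theta_j in enumerate(theta):
        grad_j = sum(e * row[j] for e, row in zip(errors, X)) / m
        if j == 0:
            new_theta.append(theta_j - alpha * grad_j)  # no penalty on theta_0
        else:
            new_theta.append(theta_j - alpha * (grad_j + lam / m * theta_j))
    return new_theta


def logistic_gd(X, y, alpha, lam, iterations):
    theta = [0.0] * len(X[0])  # start from all zeros
    for _ in range(iterations):
        theta = logistic_gd_step(theta, X, y, alpha, lam)
    return theta


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
weak = logistic_gd(X, y, alpha=0.3, lam=0.1, iterations=5000)
strong = logistic_gd(X, y, alpha=0.3, lam=10.0, iterations=5000)
print(weak, strong)  # strong[1] is much smaller than weak[1]
```

This toy data is linearly separable. Without regularization, θ_1 would keep growing, whereas λ > 0 keeps it finite, and a larger λ pulls it closer to 0.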

Quotes from Professor Andrew Ng

“When I walk around Silicon Valley, I live here in Silicon Valley, there are a lot of engineers that are frankly making a ton of money for the companies using machine learning algorithms.”
“By now, frankly, you probably know quite a lot more machine learning than many certainly now, but you probably know quite a lot more machine learning right now than frankly, many of the Silicon Valley engineers, while they’re having very successful careers, making tons of money for the companies or building great products using machine learning algorithms.”
“So, congratulations, you’ve actually come a long ways and you can actually know enough to apply this stuff and get to work on many problems.”