Machine Learning Study Notes (2): Solving Polynomial Regression with Gradient Descent and the Normal Equation

lancetop-stardrms

Article link: https://blog.csdn.net/weixin_41645983/article/details/89498613

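Modeling the hypothesis function:

Suppose we have a dataset with k features. Start by roughly plotting the features against the target values.

Then propose a hypothesis function. With only the original k features, the data will not necessarily fit the linear form of the hypothesis, $h_{\theta}(x)=\theta_{0}x_{0}+\theta_{1}x_{1}+\cdots+\theta_{n}x_{n}$.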

We can combine multiple features into one. For example, we can combine $x_{1}$ and $x_{2}$ into a new feature $x_{3}$ by taking $x_{1}\cdot x_{2}$.

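So we need to combine or expand the k features into n features, for example { x1, x2 } -> { x3 = x1*x2 } or { x1 } -> { x1 = x1, x2 = x1^2 }. The plot of the data does not have to be two-dimensional. This lets us build a reasonable hypothesis model purely by combining and expanding features, without changing the form of the hypothesis function.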
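The feature expansion described above can be sketched in a few lines of Python. This is only an illustration: the function name `expand_features` and its parameters `degree` and `add_product` are hypothetical and do not come from the original notes.

```python
def expand_features(rows, degree=2, add_product=False):
    """Expand k raw features per example into n derived features.

    rows: list of examples, each a list of k floats.
    degree: every raw feature x contributes x, x^2, ..., x^degree.
    add_product: if True, also append x1 * x2 (product of the first two raw features).
    """
    expanded = []
    for row in rows:
        new_row = []
        for x in row:
            # powers of a single feature: { x1 } -> { x1, x1^2, ... }
            new_row.extend(x ** p for p in range(1, degree + 1))
        if add_product and len(row) >= 2:
            # combined feature: { x1, x2 } -> { x3 = x1 * x2 }
            new_row.append(row[0] * row[1])
        expanded.append(new_row)
    return expanded


print(expand_features([[2.0], [3.0]]))  # [[2.0, 4.0], [3.0, 9.0]]
```

After the expansion, each derived column is treated as an ordinary feature, so the hypothesis stays linear in the parameters.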

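This gives the cost function:

$J(\theta)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^{2}$

Here $m$ is the number of training examples and $(x^{(i)}, y^{(i)})$ is the $i$-th example.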

As an Octave matrix expression: J = sum((X * theta - y) .^ 2) / 2 / m;
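For readers without Octave, the same cost can be written in plain Python. This is a minimal sketch that assumes `X` is a list of rows whose first entry is the constant 1, and `theta` is a list of the same length as a row. The name `compute_cost` is an illustrative choice.

```python
def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum((X*theta - y)^2), mirroring the Octave one-liner."""
    m = len(y)
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * x for t, x in zip(theta, row))  # h_theta(x)
        total += (prediction - target) ** 2
    return total / (2 * m)


X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first column is x0 = 1
y = [1.0, 2.0, 3.0]
print(compute_cost(X, y, [0.0, 1.0]))  # 0.0, a perfect fit
print(compute_cost(X, y, [0.0, 0.0]))  # (1 + 4 + 9) / 6, about 2.33
```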

The goal is to find the $\theta$ that minimizes the cost:

$\underset{\theta}{\text{minimize}}\ J(\theta)$

In this way, the polynomial regression problem is turned into a linear regression problem.

Next, the implementation steps of gradient descent and the normal equation in more detail:

Gradient descent:

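The pseudocode is as follows:

repeat until convergence: $\theta_{j} := \theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})\,x_{j}^{(i)}$, updating every $j = 0, \dots, n$ simultaneously, where $x_{j}^{(i)}$ is feature $j$ of the $i$-th example.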

The Octave matrix expression inside each iteration: theta = theta - alpha / m * X' * (X * theta - y);
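The update rule above can be sketched as a small Python loop. The function name `gradient_descent`, the fixed iteration count used in place of a convergence test, and the toy data are all assumptions made for illustration.

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent: theta = theta - alpha / m * X' * (X * theta - y)."""
    m = len(y)
    theta = list(theta)  # work on a copy
    for _ in range(num_iters):
        # errors = X * theta - y
        errors = [sum(t * x for t, x in zip(theta, row)) - target
                  for row, target in zip(X, y)]
        # gradient = X' * errors / m
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(len(theta))]
        # update all parameters simultaneously
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # toy data generated from y = 1 + 2x
y = [1.0, 3.0, 5.0]
theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, num_iters=5000)
print(theta)  # close to [1.0, 2.0]
```

Building the whole gradient before touching `theta` is what makes the update simultaneous, which matches the vectorized Octave expression.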

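First, apply feature scaling (also called normalization) to the dataset. Without it, gradient descent needs many more steps and the algorithm becomes slow. Try to bring every feature into roughly the range [-1, 1].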

$x_{i}= \frac{x_{i}-\mu _{i}}{s_{i}}$

Where $\mu_{i}$ is the average of all the values for feature (i) and $s_{i}$ is the range of values (max - min), or $s_{i}$ is the standard deviation.

In other words, $\mu_{i}$ is the mean of feature $x_{i}$, and $s_{i}$ is either its range (max - min) or its standard deviation.
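A minimal Python sketch of this scaling step follows. The name `scale_features`, the `use_std` switch, the choice of the population standard deviation, and the guard for constant columns are assumptions, not part of the original notes. The function is meant to be applied to the raw features before the constant column $x_{0}=1$ is added.

```python
def scale_features(rows, use_std=False):
    """Mean-normalize each column: x_i = (x_i - mu_i) / s_i.

    s_i is the range (max - min) by default, or the population standard
    deviation when use_std=True. Returns (scaled_rows, mu, s) so the same
    mu and s can be reused on new inputs at prediction time.
    """
    m = len(rows)
    n = len(rows[0])
    mu, s = [], []
    for j in range(n):
        column = [row[j] for row in rows]
        mean = sum(column) / m
        if use_std:
            spread = (sum((v - mean) ** 2 for v in column) / m) ** 0.5
        else:
            spread = max(column) - min(column)
        mu.append(mean)
        # assumption: leave constant columns unscaled to avoid dividing by zero
        s.append(spread if spread != 0 else 1.0)
    scaled = [[(row[j] - mu[j]) / s[j] for j in range(n)] for row in rows]
    return scaled, mu, s


scaled, mu, s = scale_features([[100.0], [200.0], [300.0]])
print(scaled)  # [[-0.5], [0.0], [0.5]]
```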

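Second, choose a reasonable learning rate $\alpha$. You can tune it by plotting the cost function $J(\theta)$ against the number of iterations. With a suitable $\alpha$, the cost should decrease and gradually converge.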

If $\alpha$ is too small: slow convergence.

If $\alpha$ is too large: may not decrease on every iteration and thus may not converge.
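Instead of a plot, the two failure modes above can also be observed by recording $J(\theta)$ after every iteration for a few candidate learning rates. The sketch below does that on a toy dataset. The helper name `cost_history`, the candidate values of `alpha`, and the data are illustrative assumptions.

```python
def cost_history(X, y, alpha, num_iters):
    """Run batch gradient descent from theta = 0; return J(theta) after each step."""
    m = len(y)
    theta = [0.0] * len(X[0])

    def errors(th):
        # X * theta - y
        return [sum(t * x for t, x in zip(th, row)) - target
                for row, target in zip(X, y)]

    history = []
    for _ in range(num_iters):
        err = errors(theta)
        gradient = [sum(e * row[j] for e, row in zip(err, X)) / m
                    for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
        history.append(sum(e * e for e in errors(theta)) / (2 * m))
    return history


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # toy data generated from y = 1 + 2x
y = [1.0, 3.0, 5.0]
for alpha in (0.01, 0.1, 1.0):
    history = cost_history(X, y, alpha, num_iters=50)
    decreasing = all(a >= b for a, b in zip(history, history[1:]))
    print(alpha, "first J:", history[0], "last J:", history[-1], "decreasing:", decreasing)
```

On this data the smallest rate decreases the cost only slowly, the middle rate gets much closer to zero in the same number of steps, and the largest rate makes the cost grow. The returned list can be passed to any plotting tool to get the curve described in the text.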

Normal equation:

The normal equation formula is given below:

$\theta=(X^{T}X)^{-1}X^{T}y$

Note: set $x_{0}$ to 1 for every example.
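The following Python sketch applies the normal equation to a small polynomial fit. To stay within the standard library it does not form the inverse of $X^{T}X$ explicitly. Instead it solves the equivalent linear system $(X^{T}X)\theta = X^{T}y$ by Gaussian elimination with partial pivoting, which is a swapped-in technique and not something the original notes prescribe. The function name `normal_equation` and the toy data are assumptions.

```python
def normal_equation(X, y):
    """Solve (X'X) theta = X'y by Gaussian elimination with partial pivoting."""
    n = len(X[0])
    # A = X' * X and b = X' * y
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]

    # forward elimination
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (redundant features or too few examples)")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # back substitution
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - tail) / A[i][i]
    return theta


# columns: x0 = 1, x1 = x, x2 = x^2; targets generated from y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, x, x * x] for x in xs]
y = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
print(normal_equation(X, y))  # close to [1.0, 2.0, 3.0]
```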

Choosing between the two methods:

The following is a comparison of gradient descent and the normal equation:

| Gradient Descent | Normal Equation |
| --- | --- |
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| $O(kn^{2})$ | $O(n^{3})$, need to calculate inverse of $X^{T}X$ |
| Works well when n is large | Slow if n is very large |

With the normal equation, computing the inversion has complexity $O(n^{3})$. So if we have a very large number of features, the normal equation will be slow. In practice, when n exceeds 10,000 it might be a good time to go from a normal solution to an iterative process.

In short: if the number of features n is very large (more than about 10,000), use gradient descent. Otherwise, use the normal equation.
