first week of machine learning on Coursera

腾原

Category: coursera机器学习笔记 (Coursera machine learning notes). Tags: coursera机器学习笔记

Article link: https://blog.csdn.net/tengyuan93/article/details/78075222


@(Coursera)
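The usual practice is to prototype an algorithm in Matlab/Octave first, confirm that it works, and only then port it to another environment. Matlab/Octave ships with many machine learning algorithms and common numerical routines built in, so prototyping is fast and the code stays short.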

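The squared error function is the most commonly used cost function for regression problems.
Our goal is to make the hypothesis function fit the training samples $(x,y)$ as closely as possible. The hypothesis function is written $h_\theta(x)=\theta_1 x+\theta_0$. How well it fits the training samples is measured by the cost function $J(\theta)$, which is a function of $\theta_1$ and $\theta_0$.
So the goal is to find the pair $\theta_1$, $\theta_0$ that minimizes $J(\theta)$.
We use gradient descent to find $\theta_1$ and $\theta_0$.
Intuitively, gradient descent is like a person standing on a hilltop who takes steps of length $\alpha$. Any direction is possible at each step, but the goal is to get down as quickly as possible. So at each step the person picks the downhill direction perpendicular to the contour line through the current position, which is exactly the direction of the negative gradient.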
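
To make the hilltop picture concrete, here is a small sketch that tries many directions from one point and checks which one lowers the function the most. The surface `f`, the starting point `(1.0, 1.0)`, and the helper names are all assumptions chosen for illustration; they are not from the course.

```python
import math

def f(x, y):
    # Example bowl-shaped surface (an illustrative choice, not from the notes).
    return x ** 2 + 2 * y ** 2

def grad_f(x, y):
    # Analytic gradient of f: (df/dx, df/dy).
    return (2 * x, 4 * y)

def best_sampled_direction(x, y, step=1e-3, n_angles=360):
    """Try unit directions at evenly spaced angles and return the one
    that gives the lowest value of f after a step of length `step`."""
    best_dir, best_val = None, float("inf")
    for k in range(n_angles):
        angle = 2 * math.pi * k / n_angles
        d = (math.cos(angle), math.sin(angle))
        val = f(x + step * d[0], y + step * d[1])
        if val < best_val:
            best_dir, best_val = d, val
    return best_dir

def negative_gradient_direction(x, y):
    # Unit vector pointing opposite to the gradient.
    gx, gy = grad_f(x, y)
    norm = math.hypot(gx, gy)
    return (-gx / norm, -gy / norm)

best = best_sampled_direction(1.0, 1.0)
steepest = negative_gradient_direction(1.0, 1.0)
# Cosine of the angle between the two unit vectors; close to 1 means they agree.
cosine = best[0] * steepest[0] + best[1] * steepest[1]
print(cosine)
```

The best sampled direction should line up with the negative gradient up to the one-degree sampling resolution.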

$\theta_j:=\theta_j-\alpha\frac{\partial J(\theta_0,\theta_1)}{\partial \theta_j}\quad(\text{for } j=0 \text{ and } j=1)$
Here $\alpha$ is the learning rate, that is, the step size taken when descending.
$\text{temp0}:=\theta_0-\alpha\frac{\partial J(\theta_0,\theta_1)}{\partial \theta_0}$

$\text{temp1}:=\theta_1-\alpha\frac{\partial J(\theta_0,\theta_1)}{\partial \theta_1}$

$\theta_0:=\text{temp0}$

$\theta_1:=\text{temp1}$
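
The temporary variables matter because both partial derivatives have to be evaluated at the old $(\theta_0,\theta_1)$. A minimal sketch of one such step follows. The toy cost $J=\theta_0^2+\theta_0\theta_1+\theta_1^2$ and the function names are assumptions made up for this example.

```python
def simultaneous_step(theta0, theta1, dJ_dtheta0, dJ_dtheta1, alpha):
    """One gradient descent step that updates both parameters at once.
    dJ_dtheta0 and dJ_dtheta1 are callables returning the partial derivatives."""
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)  # still sees the old theta0
    return temp0, temp1

# Toy cost J = theta0^2 + theta0*theta1 + theta1^2 (illustrative assumption).
def d0(t0, t1):
    return 2 * t0 + t1

def d1(t0, t1):
    return t0 + 2 * t1

new0, new1 = simultaneous_step(1.0, 1.0, d0, d1, alpha=0.1)
print(new0, new1)

# For contrast: updating theta0 first and reusing it gives a different theta1.
seq0 = 1.0 - 0.1 * d0(1.0, 1.0)
seq1 = 1.0 - 0.1 * d1(seq0, 1.0)
print(seq0, seq1)
```

The second pair differs from the first in its `theta1`, which is why the notes assign through `temp0` and `temp1`.
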
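Gradient descent keeps updating $\theta_0$ and $\theta_1$ until $J(\theta)$ converges. The $J(\theta)$ shown in the figure below is a convex function, so the point it converges to is the global minimum.
If the step size is too large, gradient descent may fail to converge: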
![Alt text](./屏幕快照 2017-09-23 下午7.14.33.png)
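
The effect of the step size can be reproduced on a one-parameter example. This sketch assumes the toy cost $J(\theta)=\theta^2$ (derivative $2\theta$), which is not from the notes, and compares a small and a large $\alpha$.

```python
def run_gradient_descent(alpha, theta=1.0, steps=20):
    """Minimize the toy cost J(theta) = theta^2 and return the final theta."""
    for _ in range(steps):
        gradient = 2 * theta          # dJ/dtheta
        theta = theta - alpha * gradient
    return theta

small_step = run_gradient_descent(alpha=0.1)   # each step multiplies theta by 0.8
large_step = run_gradient_descent(alpha=1.1)   # each step multiplies theta by -1.2
print(abs(small_step), abs(large_step))
```

With the small step, $\theta$ shrinks toward the minimum at 0. With the large step, every update overshoots the minimum and lands farther away than it started, so $|\theta|$ keeps growing.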

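For the linear model:
Hypothesis function: $h_\theta (x^i)=\theta_1 x^i+\theta_0$
Cost function: $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\theta (x^i)-y^i)^2$
Here $x^i$ and $y^i$ denote the $i$-th training sample (the superscript is an index, not a power). Why multiply by $\frac{1}{2m}$? The factor $\frac{1}{m}$ averages the squared errors over the $m$ training samples, giving the mean squared error, a standard measure in statistics. The factor $\frac{1}{2}$ is there because differentiating $J(\theta_0,\theta_1)$ produces a factor of 2, which the $\frac{1}{2}$ cancels so that the expressions look simpler. Since the goal is only to find the $\theta_0$ and $\theta_1$ that minimize the cost function, multiplying by a positive constant does not change the answer.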
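
The cost function above translates almost line for line into code. This is a minimal sketch in plain Python. The function names and the three sample points are assumptions made up for illustration.

```python
def hypothesis(theta0, theta1, x):
    # h_theta(x) = theta1 * x + theta0
    return theta1 * x + theta0

def cost(theta0, theta1, xs, ys):
    """Squared error cost J(theta0, theta1) with the 1/(2m) factor."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Made-up training set lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit = cost(0.0, 2.0, xs, ys)   # every error is zero
poor_fit = cost(0.0, 1.0, xs, ys)      # errors are -1, -2, -3, so J = (1 + 4 + 9) / 6
print(perfect_fit, poor_fit)
```

The hypothesis that passes through every sample has zero cost, and any other choice of parameters has a larger one.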
In this case, the update rules become:

$\theta_0:=\theta_0-\alpha \frac{\partial}{\partial\theta_0}J(\theta_0,\theta_1) =\theta_0-\alpha \frac{1}{m} \sum_{i=1}^{m}(h_\theta(x^i)-y^i)$

$\theta_1:=\theta_1-\alpha \frac{\partial}{\partial\theta_1}J(\theta_0,\theta_1)=\theta_1-\alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta (x^i)-y^i) x^i$
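Batch: each gradient descent step uses all $m$ training samples, which is why the update formulas sum over the whole training set. The step size, also called the learning rate, is the $\alpha$ in the formulas above.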
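
Putting the two update rules together gives batch gradient descent for the one-variable linear model. The sketch below is one possible implementation, not the course's reference code. The training data, `alpha=0.1`, and the iteration count are assumptions picked so the example converges.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta1 * x + theta0 by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Errors h(x^i) - y^i over the whole training set (the "batch").
        errors = [theta1 * x + theta0 - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)
```

On this data the fitted parameters should approach the intercept 1 and the slope 2 used to generate it.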

Vector: an $n\times 1$ matrix.
Matrix computation:
Identity matrix: a square matrix whose diagonal entries are 1 and whose other entries are 0.

$I= \begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&0&1 \end{bmatrix}$
Matrix multiplication is in general not commutative:

$A\times B \neq B \times A$
One exception is multiplication by the identity matrix:

$A\times I=I\times A=A$
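
Both statements are easy to check numerically. This sketch uses plain nested lists. The helper functions `matmul` and `identity` and the two $2\times 2$ matrices are assumptions written for this example.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def identity(n):
    """n x n identity matrix: 1 on the diagonal, 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
I = identity(2)

AB = matmul(A, B)   # swaps the columns of A
BA = matmul(B, A)   # swaps the rows of A
print(AB, BA)
print(matmul(A, I), matmul(I, A))
```

Here `AB` and `BA` come out different, while multiplying by `I` on either side returns `A` unchanged.
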
Matrix inverse: if $A$ is a square matrix ($m\times m$) and has an inverse $A^{-1}$, then

$A\times A^{-1}=A^{-1}\times A=I$
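
As a quick check of this identity, here is a sketch for the $2\times 2$ case using the standard closed-form inverse. It uses exact fractions so the products come out as exactly the identity. The helper names and the sample matrix are assumptions for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] via the closed-form formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    det = Fraction(det)  # exact arithmetic avoids floating-point rounding
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

A = [[4, 7], [2, 6]]          # determinant is 4*6 - 7*2 = 10
A_inv = inverse_2x2(A)
left = matmul(A, A_inv)
right = matmul(A_inv, A)
print(left, right)
```

Both products equal the identity matrix. A matrix with zero determinant, such as `[[1, 2], [2, 4]]`, has no inverse, and the function raises an error for it.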
