UFLDL is an early deep-learning introduction written by Andrew Ng's team. The rhythm of theory plus exercises is excellent: after each bit of theory you want to jump straight into the coding exercise, because the whole code skeleton is already built for you, with detailed comments, so you only need to fill in a small amount of core code. Very quick to get started!

The first section is: Linear Regression.
Linear regression, as the name suggests, makes predictions with a linear model. Given the training data

$$\left\{\left(x^{(1)}, y^{(1)}\right), \ldots,\left(x^{(m)}, y^{(m)}\right)\right\}$$

we want to train a linear model, i.e. a linear function:
$$h_{\theta}(x)=\sum_{j} \theta_{j} x_{j}=\theta^{\top} x$$
such that for every training sample,

$$y^{(i)} \approx h\left(x^{(i)}\right)$$
What we need to do now is:

- Find an objective to optimize, also called a cost function. It measures how far the predictions deviate from the true values, which is the hallmark of supervised learning.
- Once we have the objective function, find an optimization method that drives its value down. The most common one is gradient descent.
Following these two points, our loss function is the following (half the squared L2 norm of the residuals):

$$J(\theta)=\frac{1}{2} \sum_{i}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)^{2}=\frac{1}{2} \sum_{i}\left(\theta^{\top} x^{(i)}-y^{(i)}\right)^{2}$$
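As a quick sanity check of this definition (a NumPy sketch with made-up numbers, not part of the exercise):

```python
import numpy as np

# Columns of X are examples, matching the exercise's X(j, i) layout.
X = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 4.0]])   # n = 2 features, m = 3 examples
y = np.array([5.0, 7.0, 9.0])
theta = np.array([0.0, 2.0])

residual = theta @ X - y          # theta^T x^(i) - y^(i) for every i
J = 0.5 * np.sum(residual ** 2)
print(J)                          # 1.5: each residual is -1, so 0.5 * 3
```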
Gradient descent computes the gradient $\nabla_{\theta} J(\theta)$ and repeatedly applies the update

$$\theta := \theta-\alpha \nabla_{\theta} J(\theta)$$

where $\alpha$ is the learning rate; each step moves $\theta$ downhill, so the loss keeps shrinking. The key here is computing the gradient, and the tutorial already gives the formula:
$$\nabla_{\theta} J(\theta)=\left[\begin{array}{c}{\frac{\partial J(\theta)}{\partial \theta_{1}}} \\ {\frac{\partial J(\theta)}{\partial \theta_{2}}} \\ {\vdots} \\ {\frac{\partial J(\theta)}{\partial \theta_{n}}}\end{array}\right]$$
where the partial derivative with respect to each $\theta_{j}$ is:

$$\frac{\partial J(\theta)}{\partial \theta_{j}}=\sum_{i} x_{j}^{(i)}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right)$$
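Putting the formula to work: a minimal NumPy sketch of the gradient-descent loop (the toy data, learning rate, and iteration count are my own choices, not from the exercise):

```python
import numpy as np

# Toy data: the first row of X is a constant-1 intercept feature,
# and columns are examples (the exercise stores X the same way).
X = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 4.0]])      # shape (n, m)
y = np.array([5.0, 7.0, 9.0])        # exactly y = 1 + 2 * x_2
theta = np.zeros(2)
alpha = 0.01                         # learning rate, hand-picked

for _ in range(20000):
    residual = theta @ X - y         # h_theta(x^(i)) - y^(i) for every i
    grad = X @ residual              # g_j = sum_i x_j^(i) * residual_i
    theta = theta - alpha * grad     # theta := theta - alpha * grad J

print(theta)                         # approaches the exact solution [1., 2.]
```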
For the exercise, all we need to do in linear_regression.m is assign the computed objective value to f and the gradient to g. In the main script ex1_linreg.m, comment out the line that calls the vectorized version (a vectorized implementation comes in a later exercise; for now we use loops) and it will run. Here is my linear_regression.m:
```matlab
function [f,g] = linear_regression(theta, X, y)
%
% Arguments:
%   theta - A vector containing the parameter values to optimize. 14 rows, 1 column.
%   X - The examples stored in a matrix.
%       X(i,j) is the i'th coordinate of the j'th example.
%   y - The target value for each example. y(j) is the target for example j.
%
  m = size(X,2);   % number of examples
  n = size(X,1);   % number of features
  f = 0;
  g = zeros(size(theta));

  % Objective: accumulate 0.5 * (h_theta(x^(i)) - y^(i))^2 over the examples.
  for i = 1:m
    temp = 0;
    for j = 1:n
      temp = temp + theta(j) * X(j,i);   % h_theta(x^(i)) = theta' * x^(i)
    end
    f = f + 0.5 * (temp - y(i))^2;
  end

  % Gradient: g(j) = sum_i X(j,i) * (h_theta(x^(i)) - y^(i)).
  for j = 1:n
    for i = 1:m
      temp = 0;
      for k = 1:n
        temp = temp + theta(k) * X(k,i);
      end
      g(j) = g(j) + X(j,i) * (temp - y(i));
    end
  end
```
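The loops above collapse into a few matrix operations. Here is a NumPy sketch of the vectorized equivalent (the tutorial's own vectorized version is a later exercise; the data and names here are mine):

```python
import numpy as np

def linear_regression(theta, X, y):
    """Vectorized objective and gradient; X stores one example per
    column, matching the exercise's X(j, i) layout."""
    residual = theta @ X - y           # h_theta(x^(i)) - y^(i) for all i
    f = 0.5 * np.sum(residual ** 2)    # objective J(theta)
    g = X @ residual                   # gradient, one entry per theta_j
    return f, g

# Toy usage
X = np.array([[1.0, 1.0, 1.0],
              [2.0, 3.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])
f, g = linear_regression(np.array([0.0, 2.0]), X, y)
print(f)   # 1.5
```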
Here is my training result, which matches the tutorial:
(Yours may look slightly different depending on the random choice of training and testing sets.) Typical values for the RMS training and testing error are between 4.5 and 5.
If I have misunderstood anything, please point it out; if you have better ideas, feel free to discuss in the comments below!