Data generation
The training set is generated from

y=4x^4+3x^3+2x^2+x+1+\mathrm{rand}

where \mathrm{rand} is a random number between 0 and 1, added so that the data better resembles real measurements.
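As a minimal sketch of the same recipe in plain Python (the fitting code later in this post is MATLAB; `make_data` and its parameters are illustrative names, not part of the original):

```python
import random

def make_data(n=101, seed=0):
    """Sample y = 4x^4 + 3x^3 + 2x^2 + x + 1 + rand on [0, 1]."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]
    ys = [4*x**4 + 3*x**3 + 2*x**2 + x + 1 + rng.random() for x in xs]
    return xs, ys

xs, ys = make_data()  # 101 points, matching x = 0:0.01:1 in the MATLAB code
```

Seeding the generator keeps the noise reproducible across runs.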
Hypothesis
We assume the following function to fit the data above:

h(x)=\theta_0 +\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4
Let

\Theta = [\theta_0 ,\theta_1,\theta_2,\theta_3,\theta_4]
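As a hedged Python sketch: with \Theta stored as a plain list of five coefficients, the hypothesis is just a sum over powers of x (`h` is an illustrative name):

```python
def h(theta, x):
    """h(x) = theta_0 + theta_1*x + theta_2*x^2 + theta_3*x^3 + theta_4*x^4."""
    return sum(t * x**j for j, t in enumerate(theta))

# For Theta = [1, 1, 2, 3, 4] at x = 1, every power of x is 1,
# so h is just the sum of the coefficients: 1 + 1 + 2 + 3 + 4 = 11.
```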
Cost function
J(\Theta)=\frac {1}{2m}\sum^{m}_{i=1}(h(x^{(i)}) - y^{(i)})^2

where m is the number of data points.
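The cost can be sketched directly from this formula (a minimal Python illustration; the names are ours, not the post's):

```python
def cost(theta, xs, ys):
    """J(Theta) = 1/(2m) * sum_i (h(x^(i)) - y^(i))^2."""
    m = len(xs)
    def h(x):
        return sum(t * x**j for j, t in enumerate(theta))
    return sum((h(x) - y)**2 for x, y in zip(xs, ys)) / (2 * m)
```

J(\Theta) is zero exactly when the hypothesis passes through every data point.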
Gradient descent
\theta_i ^k= \theta_i^{k-1} - \alpha\frac {\partial J(\Theta)} {\partial \theta_i}

where \alpha is the learning rate, which controls how fast gradient descent proceeds.
This update says that the new \theta_i equals the previous \theta_i minus \alpha\frac {\partial J(\Theta)} {\partial \theta_i}, so \theta_i always moves in the direction that decreases the cost function J(\Theta).
Run gradient descent for 100 iterations, or stop once the value of J(\Theta) has fallen within an acceptable error tolerance.
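That stopping rule can be sketched as follows (a minimal illustration; `step` and `cost_fn` are hypothetical callables standing in for the coefficient update and for J(\Theta)):

```python
def train(step, cost_fn, theta, max_iters=100, tol=1e-3):
    """Iterate until the cost drops to tol or max_iters is reached."""
    for _ in range(max_iters):
        if cost_fn(theta) <= tol:
            break          # already within the acceptable error
        theta = step(theta)
    return theta
```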
The partial derivatives \frac {\partial J(\Theta)} {\partial \theta_i} work out to:
\frac {\partial J(\Theta)} {\partial \theta_0}=\frac {1}{m}\sum^{m}_{i=1}(h(x^{(i)}) - y^{(i)})

\frac {\partial J(\Theta)} {\partial \theta_1}=\frac {1}{m}\sum^{m}_{i=1}((h(x^{(i)}) - y^{(i)})\cdot x^{(i)})

\frac {\partial J(\Theta)} {\partial \theta_2}=\frac {1}{m}\sum^{m}_{i=1}((h(x^{(i)}) - y^{(i)})\cdot (x^{(i)})^2)

\frac {\partial J(\Theta)} {\partial \theta_3}=\frac {1}{m}\sum^{m}_{i=1}((h(x^{(i)}) - y^{(i)})\cdot (x^{(i)})^3)

\frac {\partial J(\Theta)} {\partial \theta_4}=\frac {1}{m}\sum^{m}_{i=1}((h(x^{(i)}) - y^{(i)})\cdot (x^{(i)})^4)
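These five updates can be sketched as one batch step (plain Python; `gradient_step` is an illustrative name mirroring the loop body of the MATLAB code below):

```python
def gradient_step(theta, xs, ys, alpha):
    """One batch gradient-descent update of all coefficients."""
    m = len(xs)
    def h(x):
        return sum(t * x**j for j, t in enumerate(theta))
    residuals = [h(x) - y for x, y in zip(xs, ys)]
    # dJ/dtheta_j = 1/m * sum_i (h(x^(i)) - y^(i)) * (x^(i))^j
    grads = [sum(r * x**j for r, x in zip(residuals, xs)) / m
             for j in range(len(theta))]
    return [t - alpha * g for t, g in zip(theta, grads)]
```

Note that the residuals are computed once, before any coefficient changes: all five derivatives must use the same h, which is why it is a batch update.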
Code
%Generate the data and add noise
close all;
x = 0:0.01:1;
y = 4*x.^4+3*x.^3+2*x.^2+x+1+rand(1,length(x));
m = length(x);%number of data points
figure(1);
hold on;
plot(x,y,'r');
%Hypothesis h(x) = theta0 + theta1*x + theta2*x^2 + theta3*x^3 + theta4*x^4
theta = 10*rand(1,5);%random initial coefficients
alpha = 1.2;%learning rate
for i=1:100%iterate 100 times
%hypothesis values
h_theta = theta(1)*ones(1,m)+theta(2)*x+theta(3)*x.^2+theta(4)*x.^3+theta(5)*x.^4;
plot(x,h_theta,'y');
cost(i) = 1/(2*m)*sum((h_theta - y).^2);%cost function J(Theta)
%gradient-descent update of the coefficients; J(Theta) = 1/(2m)*sum((h(x) - y)^2)
theta(1) = theta(1) - alpha*1/m*sum(h_theta - y);
theta(2) = theta(2) - alpha*1/m*sum((h_theta - y).*x);
theta(3) = theta(3) - alpha*1/m*sum((h_theta - y).*x.^2);
theta(4) = theta(4) - alpha*1/m*sum((h_theta - y).*x.^3);
theta(5) = theta(5) - alpha*1/m*sum((h_theta - y).*x.^4);
end
%final fitted curve
h_theta = theta(1)*ones(1,m)+theta(2)*x+theta(3)*x.^2+theta(4)*x.^3+theta(5)*x.^4;
plot(x,h_theta,'b');
title('Training data and fitted curves');
hold off;
%how the cost function 1/(2m)*sum((h(x) - y)^2) changes over the iterations
figure(2);
plot(1:100,cost);
title('Cost function over the iterations');
In the first figure, the red line is the original training data, the yellow curves show how the fit changes during the iterations, and the blue curve is the final fitted curve.
The second figure shows how the variance, i.e. the value of the cost function, changes during fitting.