Data generation
The training set is generated by
y = 1 + \mathrm{rand} + 4x_1 + 3x_2 + 2x_3 + x_4
A random number in [0, 1) is added so the data better resembles real measurements.
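The generation step can be sketched in NumPy (a sketch for illustration; the original implementation below uses MATLAB, and the fixed seed is an assumption added for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
m = 101                          # number of samples, matching MATLAB's 0:0.01:1
x = np.linspace(0.0, 1.0, m)     # here all four features share the same grid
# y = 1 + rand + 4*x1 + 3*x2 + 2*x3 + x4, with rand uniform in [0, 1)
y = 1.0 + rng.random(m) + 4*x + 3*x + 2*x + x
```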
Hypothesis
We assume a hypothesis function to fit the data above:
h(x)=\theta_0 +\theta_1 x_1+\theta_2 x_2+\theta_3 x_3+\theta_4 x_4
Let
\Theta = [\theta_0 ,\theta_1,\theta_2,\theta_3,\theta_4]
X = [1 ,x_1,x_2,x_3,x_4]^T
so that
h(x)=\Theta X
Cost function
J(\Theta)=\frac {1}{2m}\sum^{m}_{i=1}\left(h(x^{(i)}) - y^{(i)}\right)^2
where m is the number of data points.
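The cost function can be written compactly in NumPy (a sketch; `cost` is a hypothetical helper name, and X is assumed to have the (5, m) layout defined above):

```python
import numpy as np

def cost(theta, X, y):
    """J(Theta) = 1/(2m) * sum((h(x) - y)^2), with X of shape (5, m)."""
    m = y.size
    h = theta @ X                  # hypothesis h(x) = Theta * X for every sample
    return np.sum((h - y) ** 2) / (2 * m)
```

For parameters that reproduce y exactly, the cost is zero; any other parameters give a strictly positive cost.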
Gradient descent
\theta_i ^k= \theta_i^{k-1} - \alpha\frac {\partial J(\Theta)} {\partial \theta_i}
\alpha is the learning rate, which controls how fast gradient descent proceeds.
This update sets the new \theta_i to the previous \theta_i minus \alpha\frac {\partial J(\Theta)} {\partial \theta_i}, so \theta_i always moves in the direction that decreases the cost function J(\Theta).
We run gradient descent for 100 iterations, or until the value of J(\Theta) has fallen within an acceptable error tolerance.
The partial derivatives \frac {\partial J(\Theta)} {\partial \theta_i} are:
\frac {\partial J(\Theta)} {\partial \theta_0}=\frac {1}{m}\sum^{m}_{i=1}\left((h(x^{(i)}) - y^{(i)})\cdot 1\right)
\frac {\partial J(\Theta)} {\partial \theta_1}=\frac {1}{m}\sum^{m}_{i=1}\left((h(x^{(i)}) - y^{(i)})\cdot x_1^{(i)}\right)
\frac {\partial J(\Theta)} {\partial \theta_2}=\frac {1}{m}\sum^{m}_{i=1}\left((h(x^{(i)}) - y^{(i)})\cdot x_2^{(i)}\right)
\frac {\partial J(\Theta)} {\partial \theta_3}=\frac {1}{m}\sum^{m}_{i=1}\left((h(x^{(i)}) - y^{(i)})\cdot x_3^{(i)}\right)
\frac {\partial J(\Theta)} {\partial \theta_4}=\frac {1}{m}\sum^{m}_{i=1}\left((h(x^{(i)}) - y^{(i)})\cdot x_4^{(i)}\right)
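As a sanity check (not part of the original post), the analytic partials above can be compared against a central finite difference; `grad` and `numeric_grad` are hypothetical helper names:

```python
import numpy as np

def cost(theta, X, y):
    """J(Theta) = 1/(2m) * sum((h(x) - y)^2), with X of shape (5, m)."""
    m = y.size
    return np.sum((theta @ X - y) ** 2) / (2 * m)

def grad(theta, X, y):
    """Analytic gradient: (1/m) * sum_i (h(x^(i)) - y^(i)) * x^(i)."""
    m = y.size
    d = theta @ X - y              # residuals, shape (m,)
    return (d * X).sum(axis=1) / m # shape (5,): one partial per theta_i

def numeric_grad(theta, X, y, eps=1e-6):
    """Central finite difference of the cost, one coordinate at a time."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)
    return g
```

The two gradients should agree to within the finite-difference error, which is a quick way to catch sign or indexing mistakes in the derivation.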
Let
D = h(x^{(i)}) - y^{(i)}
\Theta^{k}=\Theta^{k-1}-\alpha \frac{1}{m}\sum_{i=1}^m(D.*X)
(.* denotes element-wise multiplication, as in MATLAB.)
Code
% Generate the data and add noise
close all;
clear;
x1 = 0:0.01:1;
x2 = 0:0.01:1;
x3 = 0:0.01:1;
x4 = 0:0.01:1;
% The ratio between the largest and smallest x values should not be too big;
% otherwise gradient descent fails and the cost function grows rapidly.
m = length(x1);
y = 1 + rand(1,m) + 4*x1 + 3*x2 + 2*x3 + x4;
% Hypothesis: h(x) = theta0 + theta1*x1 + theta2*x2 + theta3*x3 + theta4*x4
theta = rand(1,5);   % random initial coefficients
alpha = 0.1;         % learning rate
X = [ones(1,m); x1; x2; x3; x4];
iters = 100;
J = zeros(1,iters);
for i = 1:iters      % iterate 100 times
    h_theta = theta*X;           % hypothesis values
    d = h_theta - y;             % residual between hypothesis and data
    J(i) = 1/(2*m)*sum(d.^2);    % cost J(Theta) = 1/(2m)*sum((h(x)-y).^2)
    % Gradient descent step on the coefficients
    theta = theta - alpha*1/m*sum(transpose(d.*X));
end
% Evolution of the cost function J(Theta) over the iterations
plot(1:iters, J);
title('Cost function over iterations');
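For readers without MATLAB, the same training loop can be ported to NumPy (a sketch under the same assumptions: 100 iterations, learning rate 0.1, all four features on the same 0 to 1 grid; the fixed seed is an addition for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 101
x = np.linspace(0.0, 1.0, m)
# y = 1 + rand + 4*x1 + 3*x2 + 2*x3 + x4, with uniform noise in [0, 1)
y = 1.0 + rng.random(m) + 4*x + 3*x + 2*x + x

X = np.vstack([np.ones(m), x, x, x, x])  # (5, m): bias row plus four features
theta = rng.random(5)                    # random initial coefficients
alpha = 0.1                              # learning rate
costs = np.zeros(100)

for k in range(100):
    d = theta @ X - y                    # residual h(x) - y for every sample
    costs[k] = np.sum(d ** 2) / (2 * m)  # cost J(Theta)
    theta -= alpha * (d * X).sum(axis=1) / m  # vectorized gradient step
```

Plotting `costs` with matplotlib gives the same decreasing curve as the MATLAB `plot(1:iters, J)` above.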