Data:
Assume the linear model

$\hat{y} = x * \omega$

We then want to estimate the value of the weight $w$:
During training, the weight $w$ is updated by gradient descent:

$\omega = \omega - \alpha \frac{\partial \operatorname{cost}}{\partial \omega}$

where $\alpha$ is the learning rate.
Derivation of the gradient:

$$\begin{aligned} \frac{\partial \operatorname{cost}(\omega)}{\partial \omega} &=\frac{\partial}{\partial \omega} \frac{1}{N} \sum_{n=1}^{N}\left(x_{n} \cdot \omega-y_{n}\right)^{2} \\ &=\frac{1}{N} \sum_{n=1}^{N} \frac{\partial}{\partial \omega}\left(x_{n} \cdot \omega-y_{n}\right)^{2} \\ &=\frac{1}{N} \sum_{n=1}^{N} 2 \cdot\left(x_{n} \cdot \omega-y_{n}\right) \frac{\partial\left(x_{n} \cdot \omega-y_{n}\right)}{\partial \omega} \\ &=\frac{1}{N} \sum_{n=1}^{N} 2 \cdot x_{n} \cdot\left(x_{n} \cdot \omega-y_{n}\right) \end{aligned}$$
This gives the update rule:

$\omega=\omega-\alpha \frac{1}{N} \sum_{n=1}^{N} 2 \cdot x_{n} \cdot\left(x_{n} \cdot \omega-y_{n}\right)$
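The update rule can be checked numerically. A quick sketch below computes one gradient step for the example data used later in this post (x = [1, 2, 3], y = [2, 4, 6], initial w = 1.0, learning rate 0.01 — all taken from the example, not new assumptions):

```python
# One step of the update rule above, evaluated by hand on the example data.
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

# grad = (1/N) * sum(2 * x_n * (x_n * w - y_n))
grad = sum(2 * x * (x * w - y) for x, y in zip(x_data, y_data)) / len(x_data)
print(grad)   # -28/3 ≈ -9.33, since each residual x_n*w - y_n equals -x_n here

w_new = w - 0.01 * grad
print(w_new)  # ≈ 1.0933 — w moves toward the true value 2.0
```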
Example:
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]  # training data
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initialize the weight

def forward(x):  # the model: y_hat = x * w
    return x * w

def cost(xs, ys):  # mean squared error over the whole dataset
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

def gradient(xs, ys):  # gradient of the cost with respect to w
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print("Predict before training", 4, forward(4))
for epoch in range(100):  # train for 100 epochs
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val
    print("Epoch:", epoch, "w=", w, "loss=", cost_val)
print("Predict after training", 4, forward(4))
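Since the per-sample loops just compute sums, the same batch gradient descent can be vectorized with NumPy (a sketch, assuming the same data, learning rate, and epoch count as the example above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
w = 1.0

for epoch in range(100):
    y_pred = x * w                        # forward pass for all samples at once
    grad = np.mean(2 * x * (y_pred - y))  # batch gradient, matching the formula
    w -= 0.01 * grad

print(w)      # ≈ 2.0 after training
print(4 * w)  # prediction for x = 4, ≈ 8.0
```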
Training results:
Stochastic gradient descent (SGD): instead of averaging the gradient over all N samples before each update, SGD updates w using the gradient of a single sample's loss at a time.
Example:
import numpy as np
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]  # training data
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initialize the weight

def forward(x):  # the model: y_hat = x * w
    return x * w

def loss(x, y):  # squared error for a single sample
    y_pred = forward(x)
    return (y_pred - y) ** 2

def gradient(x, y):  # derivative of the single-sample loss w.r.t. w
    return 2 * x * (x * w - y)

print("Predict before training", 4, forward(4))
for epoch in range(100):  # update w once per sample, for 100 epochs
    for x, y in zip(x_data, y_data):
        grad = gradient(x, y)
        w = w - 0.01 * grad
        print("\tgrad:", x, y, grad)
        l = loss(x, y)
    print("progress:", epoch, "w=", w, "loss=", l)
print("Predict after training", 4, forward(4))
Results:
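In practice, SGD usually visits the samples in a freshly shuffled order each epoch rather than a fixed one. A minimal sketch of that refinement (the shuffling is an addition here, not part of the original example):

```python
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0

random.seed(0)
for epoch in range(100):
    samples = list(zip(x_data, y_data))
    random.shuffle(samples)            # visit samples in a random order each epoch
    for x, y in samples:
        grad = 2 * x * (x * w - y)     # per-sample gradient, as in the SGD example
        w -= 0.01 * grad

print(w)  # converges to ≈ 2.0
```

Shuffling avoids the weight oscillating in a fixed per-epoch pattern and is the standard way SGD is run on real datasets.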