一、Theory
Given a dataset $D = \{(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_m, y_m)\}$, linear regression seeks a function $f$ such that each prediction $f(x_i) = wx_i + b$ is as close as possible to the corresponding $y_i$. Since the sample points are fixed, the goal of training is to learn suitable parameters $w$ and $b$ that minimize the difference between $y_i$ and $f(x_i)$ for every sample.
1. Define the one-dimensional linear function:

$$f(x_i) = wx_i + b$$
2. Define the loss function:

$$\left( w^{*}, b^{*}\right) = \underset{w,b}{\arg\min} \sum^{m}_{i=1} \left( f(x_{i}) - y_{i} \right)^{2} = \underset{w,b}{\arg\min} \sum^{m}_{i=1} \left( y_{i} - wx_{i} - b \right)^{2}$$
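This sum-of-squares loss can be sketched as a small Python function; the toy arrays below are illustrative only, not the dataset used later:

```python
import numpy as np

def sum_squared_loss(w, b, x, y):
    """Sum of squared residuals: sum_i (y_i - w*x_i - b)^2."""
    residuals = y - (w * x + b)
    return np.sum(residuals ** 2)

# Toy data lying exactly on y = 2x + 1: the loss at (w, b) = (2, 1) is 0,
# and any other (w, b) gives a strictly larger value.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
print(sum_squared_loss(2.0, 1.0, x, y))  # 0.0
print(sum_squared_loss(1.0, 0.0, x, y))  # 30.0
```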
3. Find the parameters $w$ and $b$ that minimize the loss function (the loss is convex in $w$ and $b$, so the point where both derivatives equal zero is the minimum):

$$\frac{\partial Loss_{w,b}}{\partial w} = 2\left( w\sum^{m}_{i=1} x^{2}_{i} - \sum^{m}_{i=1} ( y_{i} - b) x_{i} \right) = 0$$

$$\frac{\partial Loss_{w,b}}{\partial b} = 2\left( mb - \sum^{m}_{i=1} ( y_{i} - wx_{i}) \right) = 0$$
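As a sanity check on the two derivative formulas, a short Python sketch (the synthetic data and the check point $(w, b) = (1.5, 0.5)$ are assumptions made here) can compare them against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 3.0 * x + 10.0 + rng.normal(size=20)
w, b = 1.5, 0.5  # arbitrary point at which to check the gradients

def loss(w, b):
    return np.sum((y - w * x - b) ** 2)

# Analytic gradients, taken directly from the two formulas above
dw = 2 * (w * np.sum(x ** 2) - np.sum((y - b) * x))
db = 2 * (len(x) * b - np.sum(y - w * x))

# Central finite differences as an independent numerical check
eps = 1e-6
dw_num = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
db_num = (loss(w, b + eps) - loss(w, b - eps)) / (2 * eps)
print(np.isclose(dw, dw_num), np.isclose(db, db_num))
```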
Because this is a single-variable problem, the minimum can be found directly by setting the derivatives to zero; stochastic gradient descent is not needed. Solving the two equations above simultaneously (the second gives $b = \bar{y} - w\bar{x}$, which can be substituted into the first) yields the optimal $w$ and $b$:
$$w = \frac{\sum^{m}_{i=1} y_{i}\left( x_{i} - \overline{x}\right)}{\sum^{m}_{i=1} x^{2}_{i} - \frac{1}{m}\left(\sum^{m}_{i=1} x_{i}\right)^{2}}$$
$$b = \frac{1}{m}\sum^{m}_{i=1} ( y_{i} - wx_{i})$$
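The two closed-form expressions can be computed directly in a few lines of NumPy. The synthetic data below (slope 3, intercept 10, small Gaussian noise) is an assumed example, and `np.polyfit` serves as an independent least-squares cross-check:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 10.0 + rng.normal(scale=0.1, size=100)

# Closed-form solution from the two formulas above
m = len(x)
x_bar = x.mean()
w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
b = np.mean(y - w * x)
print(w, b)  # w ≈ 3, b ≈ 10

# Cross-check against NumPy's least-squares polynomial fit
w_ref, b_ref = np.polyfit(x, y, 1)
print(np.allclose([w, b], [w_ref, b_ref]))
```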
二、Code Implementation
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
# Build a dataset from y = 3x + 10, adding noise drawn by torch.rand()
# from the uniform distribution on [0, 1)
y = 3 * x + 10 + torch.rand(x.size())

# Define the model
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)  # input and output are 1-dimensional
    def forward(self, x):
        output = self.linear(x)
        return output

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LinearRegression().to(device)

# Define the loss and the optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# Train the model
num_epoch = 3000
inputs = x.to(device)
target = y.to(device)
for epoch in range(num_epoch):
    # forward pass
    out = model(inputs)
    loss = criterion(out, target)
    # backward pass and parameter update
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 20 == 0:
        print(f'Epoch[{epoch+1}/{num_epoch}], loss: {loss.item():.6f}')

model.eval()
with torch.no_grad():
    predict = model(inputs).cpu().numpy()
plt.plot(x.numpy(), y.numpy(), 'ro', label='Original data')
plt.plot(x.numpy(), predict, label='Fitting Line')
plt.legend()
plt.show()
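As a quick check that training recovers the generating parameters, the sketch below is a condensed, self-contained variant of the script above (the larger learning rate and shorter training run are assumptions made here for faster convergence). Note that the uniform noise from `torch.rand` has mean 0.5, so the learned intercept should land near 10.5 rather than 10:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)
y = 3 * x + 10 + torch.rand(x.size())

# A bare nn.Linear(1, 1) is the same model as the LinearRegression class
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

# Uniform [0, 1) noise has mean 0.5, so expect w ≈ 3 and b ≈ 10.5
w = model.weight.item()
b = model.bias.item()
print(f'learned w = {w:.3f}, b = {b:.3f}')
```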
The fitting result is shown in the figure below:
Reference: this article is a set of reading notes on 《深度学习之Pytorch》 (Deep Learning with PyTorch).