Preface
Notes for the second check-in of the "Dive into Deep Learning" (《动手学深度学习》) study group.
Check-in content
Implementing linear regression (with PyTorch)
Theory review
For the theory behind linear regression, see the previous post.
Implementing linear regression from scratch
The code is run in Jupyter, which makes it easy to show the output of each step clearly.
1. Import basic modules
In [ ]:
# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
from mpl_toolkits import mplot3d as p3d
import numpy as np
import random
print(torch.__version__)
2. Generate the dataset
We use a linear model to generate a dataset of 1000 samples; the linear relationship used to generate the data is:
$$price = w_{area} \cdot area + w_{age} \cdot age + b$$
In [ ]:
# set input feature number
num_inputs = 2
# set example number
num_examples = 1000
# set true weight and bias in order to generate corresponded label
true_w = [2, -3.4]
true_b = 4.2
features = torch.randn(num_examples, num_inputs, dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float32)
The randomly generated features is a $1000 \times 2$ tensor: the first column holds the attribute $area$ and the second column the attribute $age$. labels corresponds to $price$ in the formula and is a 1-D tensor of length 1000 (not a $1000 \times 1$ matrix). The last line adds small random noise to the computed labels to make the data more realistic: clearly, $price$ is not determined by $area$ and $age$ alone, so the noise stands in for the influence of other, unmodeled factors. A quick sanity check on the shapes follows below.
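A minimal sanity-check sketch, reusing the variables from the cell above, confirms the shapes just described:
In [ ]:
# Sanity check (sketch): confirm the tensor shapes described above
print(features.shape)  # torch.Size([1000, 2])
print(labels.shape)    # torch.Size([1000]) -- 1-D, not 1000 x 1
print(features[0], labels[0])  # one (area, age) sample and its noisy price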
Of course, this is just a simple example: the attribute values are not simulated carefully enough to match reality. For instance, in real life $area$ and $age$ are necessarily positive, and the two differ by an order of magnitude or more. (A hypothetical sketch of a more realistic simulation follows this paragraph.)
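For illustration only, such a simulation might look like the sketch below. The value ranges (area in square meters, age in years) are assumptions, not part of the original example; features on such different scales are usually standardized before training.
In [ ]:
# Hypothetical sketch: more realistic raw features (ranges are assumptions)
area = torch.tensor(np.random.uniform(50, 200, num_examples), dtype=torch.float32)  # square meters
age = torch.tensor(np.random.uniform(0, 50, num_examples), dtype=torch.float32)     # years
raw_features = torch.stack([area, age], dim=1)
# standardize each column to zero mean and unit variance
realistic_features = (raw_features - raw_features.mean(dim=0)) / raw_features.std(dim=0)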
This post focuses on understanding the linear regression workflow from the code side; in a real project, the data naturally comes from its own sources.
3. Visualize the generated data
In [ ]: 2-D scatter plot showing how one feature column relates to labels
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);
In [ ]: 3-D scatter plot showing how both feature columns relate to labels
fig = plt.figure()
ax = p3d.Axes3D(fig)
X = features[:, 0].numpy()
Y = features[:, 1].numpy()
Z = labels.numpy()
ax.scatter3D(X, Y, Z);
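Note that on newer Matplotlib versions, constructing Axes3D(fig) directly may no longer attach the axes to the figure automatically; a more version-robust sketch goes through add_subplot instead:
In [ ]:
# Version-robust alternative (sketch): create the 3-D axes via add_subplot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter3D(X, Y, Z);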
4. Read the dataset
In [ ]:
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle the indices so samples are read in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])  # the last batch may contain fewer samples
        yield features.index_select(0, j), labels.index_select(0, j)
In [ ]: Take out one batch of 10 samples for a look
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
5. Initialize model parameters
In [ ]:
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
Initialize the parameters the model needs to learn and enable gradient tracking on them (during optimization, both parameters are updated iteratively via their gradients). $w$ is a $2 \times 1$ tensor, i.e. $\begin{bmatrix} w_{area} \\ w_{age} \end{bmatrix}$; $b$ is a scalar.
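Equivalently, the parameters can be created with gradient tracking enabled in one step; a sketch:
In [ ]:
# One-step alternative (sketch): same shapes and init scale as above
w = (torch.randn(num_inputs, 1) * 0.01).requires_grad_()
b = torch.zeros(1, requires_grad=True)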
6. Define the model
Define the model used to train the parameters:
$$price = w_{area} \cdot area + w_{age} \cdot age + b$$
In [ ]:
def linreg(X, w, b):
    return torch.mm(X, w) + b  # matrix product: (batch_size, 2) x (2, 1) -> (batch_size, 1)
7. Define the loss function
We use the squared-error loss function:
$$l^{(i)}(\bm w, b) = \frac{1}{2} \left( \hat{y}^{(i)} - y^{(i)} \right)^2$$
In [ ]:
def squared_loss(y_hat, y):
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
Note: y.view() plays the same role as y.reshape() here; it reshapes y from (batch_size,) to y_hat's shape (batch_size, 1). Without it, y_hat - y would silently broadcast to a (batch_size, batch_size) tensor.
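A tiny demo of that broadcasting trap (a sketch reusing squared_loss from above):
In [ ]:
# Demo (sketch): the view aligns shapes and avoids accidental broadcasting
y_hat = torch.tensor([[1.1], [1.9], [3.2]])
y = torch.tensor([1.0, 2.0, 3.0])
print((y_hat - y).shape)                      # torch.Size([3, 3]) -- broadcast trap
print((y_hat - y.view(y_hat.size())).shape)   # torch.Size([3, 1]) -- intended
print(squared_loss(y_hat, y))                 # uses the corrected shape internally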
8. Define the optimization function
We use mini-batch stochastic gradient descent:
$$(\bm w, b) \leftarrow (\bm w, b) - \frac{\eta}{|B|} \sum_{i \in B} \partial_{(\bm w, b)} l^{(i)}(\bm w, b)$$
In [ ]:
def sgd(params, lr, batch_size):
    for param in params:
        param.data -= lr * param.grad / batch_size  # use .data to update param outside of gradient tracking
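An equivalent way to keep the update step out of the autograd graph is torch.no_grad(); a sketch:
In [ ]:
# Alternative (sketch): torch.no_grad() instead of the .data trick
def sgd_no_grad(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size  # in-place update, not tracked by autograd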
9. Training
Once the dataset, model, loss function, and optimizer are all defined, we are ready to train the model.
In [ ]:
# hyperparameter initialization
lr = 0.03
num_epochs = 5
net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, every sample in the dataset is used once;
    # X holds the features and y the labels of one mini-batch
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # compute the gradient of the mini-batch loss
        l.backward()
        # update the model parameters with mini-batch stochastic gradient descent
        sgd([w, b], lr, batch_size)
        # reset the parameter gradients
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
In [ ]: A quick look at the trained parameters next to the true parameters
w, true_w, b, true_b
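If training went well, the gaps between learned and true parameters should be close to zero; a quick numeric check (sketch):
In [ ]:
# Quick check (sketch): parameter error after training
print(torch.tensor(true_w).view(-1, 1) - w)  # weight gap, near zero
print(true_b - b)                            # bias gap, near zero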
This completes the from-scratch implementation of the linear regression model.
Of course, we can also implement the linear regression model concisely with PyTorch's built-in modules.
1. Import basic modules
In [ ]:
import torch
from torch import nn
import numpy as np
torch.manual_seed(1)
print(torch.__version__)
torch.set_default_tensor_type('torch.FloatTensor')
2. Generate the dataset
In [ ]:
num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
3. Read the dataset
In [ ]:
import torch.utils.data as Data

batch_size = 10

# combine the features and labels of the dataset
dataset = Data.TensorDataset(features, labels)

# put the dataset into a DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,        # torch TensorDataset format
    batch_size=batch_size,  # mini-batch size
    shuffle=True,           # whether to shuffle the data
    num_workers=2,          # load data with multiple worker processes
)
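One practical caveat: with num_workers > 0 the DataLoader loads batches in separate worker processes (not threads), which can fail in some Jupyter or Windows environments; single-process loading is a safe fallback (sketch):
In [ ]:
# Fallback (sketch): single-process data loading
data_iter = Data.DataLoader(dataset, batch_size=batch_size, shuffle=True,
                            num_workers=0)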
In [ ]: Take out one batch of 10 samples for a look
for X, y in data_iter:
    print(X, '\n', y)
    break
4. Define the model
In [ ]:
class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()  # call the parent constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)
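As an aside, the same one-layer model can be written with nn.Sequential; in that container the layer is reached by index (net_seq[0]) rather than by attribute as in our net.linear. A sketch, kept under a separate name so the cells below keep using net:
In [ ]:
# Alternative (sketch): the same model as an nn.Sequential container
net_seq = nn.Sequential(
    nn.Linear(num_inputs, 1)
)
print(net_seq)  # the layer would be accessed as net_seq[0]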
5. Initialize model parameters
In [ ]:
from torch.nn import init

init.normal_(net.linear.weight, mean=0.0, std=0.01)  # note: `net[0]` only works for nn.Sequential; our LinearNet exposes the layer as `net.linear`
init.constant_(net.linear.bias, val=0.0)  # or modify it directly with `net.linear.bias.data.fill_(0)`
In [ ]: Inspect the network parameters
for param in net.parameters():
    print(param)
6. Define the loss function
In [ ]:
loss = nn.MSELoss() # nn built-in squared loss function
# function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`
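Note that with the default reduction='mean', nn.MSELoss averages the squared error over the batch and has no factor 1/2, so its value differs from our hand-written squared_loss by a constant factor; a tiny check (sketch):
In [ ]:
# Demo (sketch): nn.MSELoss is the batch mean of (y_hat - y)^2
y_hat = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[0.5], [2.5]])
print(loss(y_hat, y))             # tensor(0.2500)
print(((y_hat - y) ** 2).mean())  # identical by definition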
7. Define the optimization function
In [ ]:
import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=0.03) # built-in stochastic gradient descent
print(optimizer) # function prototype: `torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)`
8. Training
In [ ]:
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad()  # reset gradients, equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
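Optionally, mirroring the from-scratch version, the loss on the full dataset can be evaluated after training; a sketch:
In [ ]:
# Optional (sketch): loss over all 1000 samples after training
with torch.no_grad():
    full_l = loss(net(features), labels.view(-1, 1))
print('loss on the full dataset: %f' % full_l.item())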
In [ ]: A quick look at the trained parameters next to the true parameters
# result comparison
dense = net.linear  # `net[0]` would only work for an nn.Sequential container
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
Comparing the two implementations
- From-scratch implementation (recommended for learning): gives a better understanding of the model and of what neural networks do under the hood.
- Concise implementation with PyTorch: lets you design and implement a model much more quickly.