Learning resources
- PyTorch 动态神经网络 (a PyTorch tutorial series on dynamic neural networks)
- https://pytorch.org/docs/stable/index.html (official docs)
- [深度学习框架]PyTorch常用代码段 (a post of common PyTorch code snippets)
1 Why PyTorch?
PyTorch is the Python descendant of Torch: Torch is a neural-network library built on the Lua language. Torch itself is very good, but Lua never became particularly popular, so the development team ported Torch from Lua to the far more popular Python.
Some well-known adopters: part of Kaiming He's open-source work at FAIR uses PyTorch, and the cs231n course tutorials use PyTorch!
Installation:
Go to the official site https://pytorch.org/ , pick the build you want, and run the command the page generates for your system.
Earlier releases are listed at
https://pytorch.org/get-started/previous-versions/#linux-and-windows-30
e.g. pip install torch torchvision
- torch is the main module, used to build neural networks.
- torchvision is a companion module with datasets and pretrained networks ready to use (e.g. VGG, AlexNet, ResNet).
A successful installation prints a confirmation message.
Check the installed versions.
The pytorch and torchvision versions must correspond; otherwise some functionality will raise errors later. The link below lists the compatible pairs:
https://github.com/pytorch/vision
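A quick way to verify the installed pair (a minimal check, assuming both packages imported successfully):

import torch
import torchvision

print(torch.__version__)          # e.g. 1.x.x
print(torchvision.__version__)    # must be the release built against this torch
print(torch.cuda.is_available())  # True only for a CUDA build with a visible GPU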
2 Tensor or Numpy
2.1 Tensor Definition and Basic Attributes
Attributes of a tensor:
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)
- data: the data; can be a list or a numpy ndarray
- dtype: data type; defaults to the same type as data
- device: the device the tensor lives on, gpu/cpu
- requires_grad: whether gradients are needed (neural networks usually require them)
- pin_memory: whether to store the tensor in pinned (page-locked) memory
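A minimal sketch exercising these parameters (the commented lines assume a CUDA-capable machine):

import numpy as np
import torch

a = torch.tensor([1.0, 2.0, 3.0])                    # dtype inferred (float32)
b = torch.tensor(np.arange(4), dtype=torch.float64)  # override the inferred dtype
c = torch.tensor([1.0, 2.0], requires_grad=True)     # track gradients through c
# d = torch.tensor([1.0], device='cuda')             # place directly on the GPU
# e = torch.tensor([1.0], pin_memory=True)           # page-locked CPU memory (needs CUDA)
print(a.dtype, b.dtype, c.requires_grad)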
Reference: 从零开始深度学习Pytorch笔记(2)——张量的创建(上)
Reference: torch之DataLoader参数pin_memory解析
Blocking and non-blocking usually describe how threads affect one another. For example, if one thread occupies a critical-section resource, every other thread that needs that resource must wait at the critical section, and waiting suspends the thread; that is blocking. If the occupying thread never releases the resource, none of the threads blocked on that critical section can make progress.
Non-blocking, by contrast, allows multiple threads to enter the critical section at the same time.
tensor = tensor.cuda(non_blocking=True)   # note: called on a tensor; torch.cuda itself is a module
Source: 阻塞(Blocking)和非阻塞(Non-Blocking)
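Pinned memory and non_blocking usually appear together: the asynchronous copy only actually overlaps with computation when the source tensor is in page-locked memory. A sketch, assuming a CUDA device is present:

import torch

if torch.cuda.is_available():
    batch = torch.randn(64, 3, 224, 224).pin_memory()  # page-locked host memory
    batch_gpu = batch.cuda(non_blocking=True)           # asynchronous host-to-GPU copy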
tensor = torch.randn(3, 4, 5)
print(tensor.type())   # data type
print(tensor.size())   # shape of the tensor (a tuple-like torch.Size)
print(tensor.dim())    # number of dimensions
abs, mean, sin
import torch
import numpy as np

# abs
data = [-1, -2, 1, 2]
tensor = torch.FloatTensor(data)   # 32-bit floating point
print('abs numpy:\n', np.abs(data))
print('abs torch:\n', torch.abs(tensor), '\n')
# mean
print('mean numpy:\n',np.mean(data))
print('mean torch:\n',torch.mean(tensor),'\n')
# sin
print('sin numpy:\n',np.sin(data))
print('sin torch:\n',torch.sin(tensor),'\n')
output
abs numpy:
[1 2 1 2]
abs torch:
tensor([1., 2., 1., 2.])
mean numpy:
0.0
mean torch:
tensor(0.)
sin numpy:
[-0.84147098 -0.90929743 0.84147098 0.90929743]
sin torch:
tensor([-0.8415, -0.9093, 0.8415, 0.9093])
matrix multiply
data = np.array([[1,2],[3,4]])
tensor = torch.from_numpy(data)
print('matrix multiply numpy:\n',np.matmul(data,data),'\n')
print('matrix multiply torch:\n',torch.mm(tensor,tensor),'\n')
print('element-wise multiply torch:\n',torch.mul(tensor,tensor),'\n')
tensor1 = torch.from_numpy(data.flatten())
print('dot numpy:\n',data.dot(data),'\n')
print('dot torch:\n',tensor1.dot(tensor1))
# if no flatten()
# RuntimeError: dot: Expected 1-D argument self, but got 2-D
output
matrix multiply numpy:
[[ 7 10]
[15 22]]
matrix multiply torch:
tensor([[ 7, 10],
[15, 22]])
element-wise multiply torch:
tensor([[ 1, 4],
[ 9, 16]])
dot numpy:
[[ 7 10]
[15 22]]
dot torch:
tensor(30)
# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)
# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p)
result = torch.bmm(tensor1, tensor2)
# Element-wise multiplication.
result = tensor1 * tensor2
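The snippet above is a reference pattern with tensor1/tensor2 left undefined; a runnable sketch with concrete shapes:

import torch

t1 = torch.randn(2, 3)
t2 = torch.randn(3, 4)
print(torch.mm(t1, t2).shape)    # torch.Size([2, 4])

b1 = torch.randn(5, 2, 3)
b2 = torch.randn(5, 3, 4)
print(torch.bmm(b1, b2).shape)   # torch.Size([5, 2, 4])

print((t1 * t1).shape)           # element-wise: torch.Size([2, 3])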
reshape
# When feeding conv-layer output into a fully connected layer, the tensor usually needs reshaping.
# Unlike torch.view, torch.reshape automatically handles non-contiguous input tensors.
tensor = torch.rand(2,3,4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)
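To see why reshape is the safer choice, here is a sketch where view would fail but reshape succeeds:

import torch

t = torch.rand(2, 3, 4).transpose(0, 1)  # transpose produces a non-contiguous tensor
print(t.is_contiguous())                  # False
r = torch.reshape(t, (6, 4))              # fine: reshape copies when it must
# t.view(6, 4)                            # would raise a RuntimeError (view needs contiguous memory)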
2.2 Tensor CPU or Tensor GPU
There are 9 CPU tensor types and 9 GPU tensor types.
Type conversion
# Set the default tensor type; in pytorch FloatTensor is much faster than DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)
# Type conversions
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()
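Since PyTorch 0.4, the device-agnostic idiom with torch.device and .to() covers both kinds of conversion above; a minimal sketch:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = torch.rand(3, 4)
tensor = tensor.to(device)        # replaces the hard-coded .cuda()/.cpu() pair
tensor = tensor.to(torch.long)    # .to also performs dtype conversion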
2.3 Torch to Numpy, Numpy to Torch
import torch
import numpy as np
np_data = np.arange(6).reshape((2,3))
# numpy to torch tensor
torch_data = torch.from_numpy(np_data)
# torch tensor to numpy
tensor2numpy = torch_data.numpy()
print("np_data:\n",np_data,'\n')
print("torch_data:\n",torch_data,'\n')
print("tensor2nmupy:\n",tensor2nmupy)
output
np_data:
[[0 1 2]
[3 4 5]]
torch_data:
tensor([[0, 1, 2],
[3, 4, 5]])
tensor2numpy:
[[0 1 2]
[3 4 5]]
Note: a tensor created with torch.from_numpy shares memory with the source ndarray; modifying one also modifies the other.
Except for CharTensor, every CPU tensor type supports converting to numpy and back.
ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float() # If ndarray has negative stride.
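A tiny demonstration of the memory sharing noted above (nothing assumed beyond numpy and torch):

import numpy as np
import torch

arr = np.zeros(3)
t = torch.from_numpy(arr)
arr[0] = 7.0
print(t)                             # tensor([7., 0., 0.], dtype=torch.float64) -- the tensor sees the change
safe = torch.from_numpy(arr.copy())  # copy() breaks the sharing
arr[1] = 9.0
print(safe)                          # unchanged: tensor([7., 0., 0.], dtype=torch.float64)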
2.4 Tensor Naming
Naming tensor dimensions is very handy: you can index and manipulate dimensions by name, which greatly improves readability and usability and helps prevent mistakes.
# Before PyTorch 1.3, dimension order had to be tracked with comments
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

# From PyTorch 1.3 onward
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)

# names can also be set directly at creation time
tensor = torch.rand(3, 4, 1, 2, names=('C', 'N', 'H', 'W'))
# align_to conveniently reorders dimensions by name
tensor = tensor.align_to('N', 'C', 'H', 'W')
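A couple more named-tensor operations that come in handy (the named-tensor API is still marked experimental, so treat this as a sketch):

import torch

imgs = torch.randn(32, 3, 56, 56, names=('N', 'C', 'H', 'W'))
print(imgs.names)                                 # ('N', 'C', 'H', 'W')
plain = imgs.rename(None)                         # drop names to use ops that don't support them
named = plain.refine_names('N', 'C', 'H', 'W')    # re-attach the names afterwards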
2.5 Torch.tensor to PIL.Image
# pytorch image tensors use [C, H, W] channel order with values in [0, 1],
# so the conversion needs a permute plus rescaling
import numpy as np
import PIL.Image
import torch
import torchvision

# torch.Tensor -> PIL.Image
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # equivalent way

# PIL.Image -> torch.Tensor
path = r'./figure.jpg'
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # equivalent way
Converting between np.ndarray and PIL.Image
image = PIL.Image.fromarray(ndarray.astype(np.uint8))
ndarray = np.asarray(PIL.Image.open(path))
3 Variable
(This wrapper was later deprecated: since PyTorch 0.4, Variable and Tensor have been merged, and tensors support autograd directly.)
A Variable in Torch is a container holding a value that keeps changing, like a basket of eggs where the egg count keeps varying. Who are the eggs inside? Torch tensors, of course. If you compute with a Variable, the result is also a Variable of the same type.
import torch
import numpy as np
from torch.autograd import Variable
tensor = torch.FloatTensor([[1,2],[3,4]])        # no gradient tracking, no backpropagation
variable = Variable(tensor, requires_grad=True)  # participates in backpropagation
print(tensor)
print(variable)
t_out = torch.mean(tensor*tensor)
v_out = torch.mean(variable*variable)
print(t_out)
print(v_out)
output
tensor([[1., 2.],
[3., 4.]])
tensor([[1., 2.],
[3., 4.]], requires_grad=True)
tensor(7.5000)
tensor(7.5000, grad_fn=<MeanBackward1>)
While a Variable computes, it silently builds a huge system behind the scenes called a computational graph. What is this graph for? It links all the computation steps (nodes) together, so that during error backpropagation the gradients for all variables are computed in one pass; a plain tensor lacks this ability.
v_out.backward()
# v_out = 1/4 * sum(variable*variable)
# d(v_out)/d(variable) = 1/4 * 2 * variable = variable/2
print(v_out.grad, '\n')          # gradient (None: v_out is not a leaf node)
print(variable.grad, '\n')       # gradient
print(variable, '\n')            # Variable
print(variable.data, '\n')       # tensor
print(variable.data.numpy())     # numpy
output
None
tensor([[0.5000, 1.0000],
[1.5000, 2.0000]])
tensor([[1., 2.],
[3., 4.]], requires_grad=True)
tensor([[1., 2.],
[3., 4.]])
[[1. 2.]
[3. 4.]]
tensor to variable
import torch
from torch.autograd import Variable
import torch.nn as nn
"tensor to variable"
data = torch.arange(1, 10, 1).float()
data1 = Variable(data, requires_grad=True)
print(data)
print(data1)
"""
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.], requires_grad=True)
"""
variable to tensor
"variable to tensor"
data = torch.arange(1, 10, 1).float()
data1 = Variable(data, requires_grad=True).data
print(data)
print(data1)
"""
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
"""
4 Activation Functions
import torch
import numpy as np
from torch.autograd import Variable
import torch.nn.functional as F
import matplotlib.pyplot as plt
# fake data
x = torch.linspace(-5, 5, 200)   # x data
x = Variable(x)
x_np = x.data.numpy()   # .data is the tensor; .data.numpy() is the numpy array
y_relu = F.relu(x).data.numpy()
y_sigmoid = torch.sigmoid(x).data.numpy()   # F.sigmoid is deprecated in newer versions
y_tanh = torch.tanh(x).data.numpy()         # F.tanh is deprecated in newer versions
y_softplus = F.softplus(x).data.numpy()
# plt
plt.figure(1, figsize=(8, 6))
plt.subplot(221)
plt.plot(x_np, y_relu, c='red', label='relu')
plt.ylim((-1, 5))
plt.legend(loc='best')
plt.grid()
plt.subplot(222)
plt.plot(x_np, y_sigmoid, c='red', label='sigmoid')
plt.ylim((-0.2, 1.2))
plt.legend(loc='best')
plt.grid()
plt.subplot(223)
plt.plot(x_np, y_tanh, c='red', label='tanh')
plt.ylim((-1.2, 1.2))
plt.legend(loc='best')
plt.grid()
plt.subplot(224)
plt.plot(x_np, y_softplus, c='red', label='softplus')
plt.ylim((-0.2, 6))
plt.legend(loc='best')
plt.grid()
plt.show()
output: (figure) the four activation curves, relu, sigmoid, tanh, and softplus, over x in [-5, 5]
5 Regression and Classification
5.1 Regression
Code: https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/301_regression.py
Generate the dataset:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) # noisy y data (tensor), shape=(100, 1)
plt.scatter(x.data.numpy(), y.data.numpy())
plt.show()
Build the network: three layers (input, hidden, output):
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x

net = Net(n_feature=1, n_hidden=10, n_output=1)   # define the network
print(net)   # net architecture
output
Net(
(hidden): Linear(in_features=1, out_features=10, bias=True)
(predict): Linear(in_features=10, out_features=1, bias=True)
)
Define the optimizer and the loss function:
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
Train for 200 iterations, plotting the result every 5 steps:
plt.ion()   # something about plotting

for t in range(200):
    prediction = net(x)               # input x and predict based on x
    loss = loss_func(prediction, y)   # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 5 == 0:
        # plot and show learning process
        plt.cla()
        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.grid()
        plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.pause(0.1)

plt.ioff()
plt.show()
Selected snapshots (figures) at iterations 1, 25, 50, 75, 100, and 200.
5.2 Classification
Code: https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/302_classification.py
Generate the data and visualize it:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# make fake data
n_data = torch.ones(100, 2)
x0 = torch.normal(2*n_data, 1) # class0 x data (tensor), shape=(100, 2)
y0 = torch.zeros(100) # class0 y data (tensor), shape=(100, 1)
x1 = torch.normal(-2*n_data, 1) # class1 x data (tensor), shape=(100, 2)
y1 = torch.ones(100) # class1 y data (tensor), shape=(100, 1)
x = torch.cat((x0, x1), 0).type(torch.FloatTensor) # shape (200, 2) FloatTensor = 32-bit floating
y = torch.cat((y0, y1), ).type(torch.LongTensor) # shape (200,) LongTensor = 64-bit integer
# The code below is deprecated in Pytorch 0.4. Now, autograd directly supports tensors
# x, y = Variable(x), Variable(y)
plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=y.data.numpy(), s=100, lw=0, cmap='RdYlGn')
plt.grid(ls='--')
plt.show()
Build the network and define the optimizer and loss function:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output)       # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.out(x)
        return x

net = Net(n_feature=2, n_hidden=10, n_output=2)   # define the network
print(net)   # net architecture
optimizer = torch.optim.SGD(net.parameters(), lr=0.02)
loss_func = torch.nn.CrossEntropyLoss()   # the target label is NOT one-hotted
output
Net(
(hidden): Linear(in_features=2, out_features=10, bias=True)
(out): Linear(in_features=10, out_features=2, bias=True)
)
Train for 100 iterations, plotting the result every 2 steps:
plt.ion()   # something about plotting

for t in range(100):
    out = net(x)               # input x and predict based on x
    loss = loss_func(out, y)   # must be (1. nn output, 2. target), the target label is NOT one-hotted

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 2 == 0:
        # plot and show learning process
        plt.cla()
        prediction = torch.max(out, 1)[1]   # out has two values per sample; max over dim=1 returns [0] the larger value, [1] its index
        pred_y = prediction.data.numpy()
        target_y = y.data.numpy()
        plt.grid(ls='--')
        plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=pred_y, s=100, lw=0, cmap='RdYlGn')
        accuracy = float((pred_y == target_y).astype(int).sum()) / float(target_y.size)
        plt.text(1.5, -4, 'Accuracy=%.2f' % accuracy, fontdict={'size': 20, 'color': 'red'})
        plt.pause(0.1)

plt.ioff()
plt.show()
Selected snapshots (figures) at iterations 1 and 3.
5.3 Sequential
Defining a full class as above is a bit verbose (tensorflow-style); here is a quicker way (like keras's Sequential).
Replace
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output)       # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.out(x)
        return x

net = Net(n_feature=2, n_hidden=10, n_output=2)   # define the network
print(net)   # net architecture
with
net = torch.nn.Sequential(
    torch.nn.Linear(2, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 2)
)
print(net) # net architecture
output
Sequential(
(0): Linear(in_features=2, out_features=10, bias=True)
(1): ReLU()
(2): Linear(in_features=10, out_features=2, bias=True)
)
in contrast to the earlier version's
Net(
(hidden): Linear(in_features=2, out_features=10, bias=True)
(out): Linear(in_features=10, out_features=2, bias=True)
)
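If you want Sequential but with named sub-modules like the class version, an OrderedDict does it (a small sketch):

from collections import OrderedDict
import torch

net = torch.nn.Sequential(OrderedDict([
    ('hidden', torch.nn.Linear(2, 10)),
    ('act', torch.nn.ReLU()),
    ('out', torch.nn.Linear(10, 2)),
]))
print(net)   # prints (hidden), (act), (out) instead of (0), (1), (2)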
6 Save and Load
Code from https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/304_save_reload.py
Two ways to save a model (save):
- save the entire network
torch.save(net1, 'net.pkl')
- save only the parameters
torch.save(net1.state_dict(), 'net_params.pkl')
Loading for each of the two methods (load):
net2 = torch.load('net.pkl')
net3.load_state_dict(torch.load('net_params.pkl'))
Note that net3 must be rebuilt with the original architecture before its parameters can be loaded.
import torch
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# fake data
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) # noisy y data (tensor), shape=(100, 1)
# The code below is deprecated in Pytorch 0.4. Now, autograd directly supports tensors
# x, y = Variable(x, requires_grad=False), Variable(y, requires_grad=False)
def save():
    # save net1
    net1 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)
    loss_func = torch.nn.MSELoss()

    for t in range(100):
        prediction = net1(x)
        loss = loss_func(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # plot result
    plt.figure(1, figsize=(10, 3))
    plt.subplot(131)
    plt.title('Net1')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

    # 2 ways to save the net
    torch.save(net1, 'net.pkl')                      # save entire net
    torch.save(net1.state_dict(), 'net_params.pkl')  # save only the parameters

def restore_net():
    # restore entire net1 to net2
    net2 = torch.load('net.pkl')
    prediction = net2(x)

    # plot result
    plt.subplot(132)
    plt.title('Net2')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

def restore_params():
    # restore only the parameters in net1 to net3
    net3 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    # copy net1's parameters into net3
    net3.load_state_dict(torch.load('net_params.pkl'))
    prediction = net3(x)

    # plot result
    plt.subplot(133)
    plt.title('Net3')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
    plt.show()

# save net1
save()
# restore entire net (may slow)
restore_net()
# restore only the net parameters
restore_params()
output: (figure) three subplots, Net1 / Net2 / Net3, all showing the same fitted curve
7 Batch Training (DataLoader)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.1.11
"""
import torch
import torch.utils.data as Data
torch.manual_seed(1) # reproducible
BATCH_SIZE = 5
# BATCH_SIZE = 8
x = torch.linspace(1, 10, 10) # this is x data (torch tensor)
y = torch.linspace(10, 1, 10) # this is y data (torch tensor)
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,   # torch TensorDataset format
    batch_size=BATCH_SIZE,   # mini batch size
    shuffle=True,            # random shuffle for training
    num_workers=2,           # subprocesses for loading data
)

def show_batch():
    for epoch in range(3):   # train entire dataset 3 times
        for step, (batch_x, batch_y) in enumerate(loader):   # for each training step
            # train your data...
            print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
                  batch_x.numpy(), '| batch y: ', batch_y.numpy())

show_batch()
output
Epoch: 0 | Step: 0 | batch x: [ 5. 7. 10. 3. 4.] | batch y: [6. 4. 1. 8. 7.]
Epoch: 0 | Step: 1 | batch x: [2. 1. 8. 9. 6.] | batch y: [ 9. 10. 3. 2. 5.]
Epoch: 1 | Step: 0 | batch x: [ 4. 6. 7. 10. 8.] | batch y: [7. 5. 4. 1. 3.]
Epoch: 1 | Step: 1 | batch x: [5. 3. 2. 1. 9.] | batch y: [ 6. 8. 9. 10. 2.]
Epoch: 2 | Step: 0 | batch x: [ 4. 2. 5. 6. 10.] | batch y: [7. 9. 6. 5. 1.]
Epoch: 2 | Step: 1 | batch x: [3. 9. 1. 8. 7.] | batch y: [ 8. 2. 10. 3. 4.]
Change shuffle to False:
output
Epoch: 0 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 0 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Epoch: 1 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 1 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Epoch: 2 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 2 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Now set BATCH_SIZE to 8, keep shuffle=False, and look at the output:
output
Epoch: 0 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 0 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
Epoch: 1 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 1 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
Epoch: 2 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 2 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
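The leftover batch of size 2 above comes from 10 samples not dividing evenly by 8; DataLoader's drop_last flag controls this behavior. A sketch:

loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=8,
    shuffle=False,
    drop_last=True,   # discard the final incomplete batch (the [9., 10.] above)
)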
8 Optimizers
SGD
Momentum (adds inertia to the updates)
Adagrad (adds resistance along wrong directions, i.e. per-parameter learning rates)
RMSprop (a partial combination of Momentum and Adagrad)
Adam (a full combination of Momentum and Adagrad)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.4
matplotlib
"""
import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
LR = 0.01
BATCH_SIZE = 32
EPOCH = 12
# fake dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))
# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()
# put dataset into torch dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
)

# default network
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)    # hidden layer
        self.predict = torch.nn.Linear(20, 1)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x

if __name__ == '__main__':
    # different nets
    net_SGD = Net()
    net_Momentum = Net()
    net_Adagrad = Net()
    net_RMSprop = Net()
    net_Adam = Net()
    nets = [net_SGD, net_Momentum, net_Adagrad, net_RMSprop, net_Adam]

    # different optimizers
    opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
    opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
    opt_Adagrad = torch.optim.Adagrad(net_Adagrad.parameters(), lr=LR)
    opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
    opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
    optimizers = [opt_SGD, opt_Momentum, opt_Adagrad, opt_RMSprop, opt_Adam]

    loss_func = torch.nn.MSELoss()
    losses_his = [[], [], [], [], []]   # record loss

    # training
    for epoch in range(EPOCH):
        print('Epoch: ', epoch)
        for step, (b_x, b_y) in enumerate(loader):   # for each training step
            for net, opt, l_his in zip(nets, optimizers, losses_his):
                output = net(b_x)                 # get output for every net
                loss = loss_func(output, b_y)     # compute loss for every net
                opt.zero_grad()                   # clear gradients for next train
                loss.backward()                   # backpropagation, compute gradients
                opt.step()                        # apply gradients
                l_his.append(loss.data.numpy())   # loss recorder

    labels = ['SGD', 'Momentum', 'Adagrad', 'RMSprop', 'Adam']
    for i, l_his in enumerate(losses_his):
        plt.plot(l_his, label=labels[i])
    plt.legend(loc='best')
    plt.xlabel('Steps')
    plt.ylabel('Loss')
    plt.ylim((0, 0.2))
    # plt.savefig("1.png")
    plt.show()
(Figures: the scatter of the dataset, and the resulting loss curves of the five optimizers.)
Now look at the plot together with the list again:
SGD
Momentum (adds inertia)
Adagrad (adds resistance along wrong directions)
RMSprop (a partial combination of Momentum and Adagrad)
Adam (a full combination of Momentum and Adagrad)
9 CNN for MNIST (CPU)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.4
torchvision
matplotlib
"""
# library
# standard library
import os
# third-party library
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# Hyper Parameters
EPOCH = 1 # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 50
LR = 0.001 # learning rate
DOWNLOAD_MNIST = False
# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
# no mnist dir, or the mnist dir is empty
DOWNLOAD_MNIST = True
train_data = torchvision.datasets.MNIST(
root='./mnist/',
train=True, # this is training data
transform=torchvision.transforms.ToTensor(), # Converts a PIL.Image or numpy.ndarray to
# torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
download=DOWNLOAD_MNIST,
)
# plot one example
print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[0])
plt.show()
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(       # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,            # input height
                out_channels=16,          # n_filters
                kernel_size=5,            # filter size
                stride=1,                 # filter movement/step
                padding=2,                # if want same width and length of this image after Conv2d, padding=(kernel_size-1)/2 if stride=1
            ),                            # output shape (16, 28, 28)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(       # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),   # output shape (32, 14, 14)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(2),              # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)   # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x            # return x for visualization
cnn = CNN()
print(cnn) # net architecture
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR) # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss() # the target label is not one-hotted
# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except: HAS_SK = False; print('Please install sklearn for layer visualization')
def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)
plt.ion()
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader
        output = cnn(b_x)[0]            # cnn output
        loss = loss_func(output, b_y)   # cross entropy loss
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
            if HAS_SK:
                # Visualization of trained flatten layer (T-SNE)
                tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
                plot_only = 500
                low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
                labels = test_y.numpy()[:plot_only]
                plot_with_labels(low_dim_embs, labels)
plt.ioff()
# print 10 predictions from test data
test_output, _ = cnn(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
output
torch.Size([60000, 28, 28])
torch.Size([60000])
CNN(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (out): Linear(in_features=1568, out_features=10, bias=True)
)
We train only one epoch with batch size 50, printing the loss, accuracy, and classification snapshot every 50 steps. Below, only the first and the last of the printouts (60000/50/50 = 24 of them) are shown, evaluated on the 2000 sampled test examples (the CPU is too slow to use the full test set).
Accuracy reaches about 98%.
The last output prints 10 predictions next to the true labels:
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
The CPU really is too slow; next, let's see how to use the GPU.
10 CNN for MNIST (GPU)
Building on the previous chapter, we add GPU usage; the main change is calling cuda on the data and the model:
model=model.cuda()
x=x.cuda()
y=y.cuda()
Every changed line is marked with a change comment; compare them to see the differences, which mostly stem from numpy operations not working on GPU tensors.
I have commented out the part that plots the classification results.
import os
# third-party library
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
torch.cuda.set_device(1) # change
# torch.manual_seed(1) # reproducible
# Hyper Parameters
EPOCH = 1 # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 50
LR = 0.001 # learning rate
DOWNLOAD_MNIST = False
# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
# no mnist dir, or the mnist dir is empty
DOWNLOAD_MNIST = True
train_data = torchvision.datasets.MNIST(
root='./mnist/',
train=True, # this is training data
transform=torchvision.transforms.ToTensor(), # Converts a PIL.Image or numpy.ndarray to
# torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
download=DOWNLOAD_MNIST,
)
# plot one example
print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[0])
plt.show()
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# !!!!!!!! Change in here !!!!!!!!! #
#test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
#test_y = test_data.test_labels[:2000]
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000].cuda()/255. # Tensor on GPU
test_y = test_data.test_labels[:2000].cuda()
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(       # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,            # input height
                out_channels=16,          # n_filters
                kernel_size=5,            # filter size
                stride=1,                 # filter movement/step
                padding=2,                # if want same width and length of this image after Conv2d, padding=(kernel_size-1)/2 if stride=1
            ),                            # output shape (16, 28, 28)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(       # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),   # output shape (32, 14, 14)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(2),              # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)   # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x            # return x for visualization
cnn = CNN()
print(cnn) # net architecture
# !!!!!!!! Change in here !!!!!!!!! #
cnn.cuda() # Moves all model parameters and buffers to the GPU.
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR) # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss() # the target label is not one-hotted
# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except: HAS_SK = False; print('Please install sklearn for layer visualization')
def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)
plt.ion()
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader

        # !!!!!!!! Change in here !!!!!!!!! #
        b_x = b_x.cuda()   # Tensor on GPU
        b_y = b_y.cuda()   # Tensor on GPU

        output = cnn(b_x)[0]            # cnn output
        loss = loss_func(output, b_y)   # cross entropy loss
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x)

            # !!!!!!!! Change in here !!!!!!!!! #
            # pred_y = torch.max(test_output, 1)[1].data.numpy()
            pred_y = torch.max(test_output, 1)[1].cuda().data   # keep the computation on the GPU
            # accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            accuracy = torch.sum(pred_y == test_y).type(torch.FloatTensor) / test_y.size(0)
            # print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.2f' % accuracy)

            # if HAS_SK:
            #     # Visualization of trained flatten layer (T-SNE)
            #     tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
            #     plot_only = 50   # reduced a bit
            #     # !!!!!!!! Change in here !!!!!!!!! #
            #     # low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
            #     # labels = test_y.numpy()[:plot_only]
            #     low_dim_embs = tsne.fit_transform(last_layer.data.cpu().numpy()[:plot_only, :])
            #     labels = test_y.data.cpu().numpy()[:plot_only]
            #     plot_with_labels(low_dim_embs, labels)

plt.ioff()
# print 10 predictions from test data
test_output, _ = cnn(test_x[:10])
# !!!!!!!! Change in here !!!!!!!!! #
#pred_y = torch.max(test_output, 1)[1].data.numpy()
pred_y = torch.max(test_output, 1)[1].cuda().data   # keep the computation on the GPU
print(pred_y, 'prediction number')
# !!!!!!!! Change in here !!!!!!!!! #
#print(test_y[:10].numpy(), 'real number')
print(test_y[:10], 'real number')
The training results:
Epoch: 0 | train loss: 2.2915 | test accuracy: 0.10
Epoch: 0 | train loss: 0.2737 | test accuracy: 0.84
Epoch: 0 | train loss: 0.3125 | test accuracy: 0.90
Epoch: 0 | train loss: 0.1717 | test accuracy: 0.92
Epoch: 0 | train loss: 0.1851 | test accuracy: 0.94
Epoch: 0 | train loss: 0.4500 | test accuracy: 0.94
Epoch: 0 | train loss: 0.0509 | test accuracy: 0.94
Epoch: 0 | train loss: 0.2175 | test accuracy: 0.94
Epoch: 0 | train loss: 0.0376 | test accuracy: 0.95
Epoch: 0 | train loss: 0.0455 | test accuracy: 0.96
Epoch: 0 | train loss: 0.1451 | test accuracy: 0.96
Epoch: 0 | train loss: 0.1239 | test accuracy: 0.97
Epoch: 0 | train loss: 0.2214 | test accuracy: 0.97
Epoch: 0 | train loss: 0.3967 | test accuracy: 0.97
Epoch: 0 | train loss: 0.2205 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1078 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0480 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1128 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0227 | test accuracy: 0.98
Epoch: 0 | train loss: 0.1140 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0608 | test accuracy: 0.96
Epoch: 0 | train loss: 0.0785 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1001 | test accuracy: 0.98
Epoch: 0 | train loss: 0.0149 | test accuracy: 0.98
tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], device='cuda:1') prediction number
tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], device='cuda:1') real number
Recommended reading: https://github.com/lxztju/pytorch_classification