Learning resources
- PyTorch 动态神经网络 (a PyTorch tutorial series on dynamic neural networks)
- https://pytorch.org/docs/stable/index.html (official docs)
- [深度学习框架]PyTorch常用代码段 (a post of common PyTorch code snippets)
1 Why PyTorch?
PyTorch is the Python descendant of Torch: Torch is a neural-network library built on the Lua language. Torch itself is very good, but Lua never became particularly popular, so the development team ported Torch from Lua to the far more popular Python.
Some well-known adopters: part of Kaiming He's open-source work at FAIR uses PyTorch, and the cs231n course tutorials use PyTorch!
Installation:
Go to the official site https://pytorch.org/ , pick the build you want, and run the command the page generates for your system.
Earlier releases are listed at
https://pytorch.org/get-started/previous-versions/#linux-and-windows-30
e.g. pip install torch torchvision
- torch is the main module, used to build neural networks.
- torchvision is a companion module with datasets and pretrained networks ready to use (e.g. VGG, AlexNet, ResNet).
A successful installation prints a confirmation message.
Check the installed versions.
The pytorch and torchvision versions must correspond; otherwise some functionality will raise errors later. The link below lists the compatible pairs:
https://github.com/pytorch/vision
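A quick way to verify the installed pair (a minimal check, assuming both packages imported successfully):

import torch
import torchvision

print(torch.__version__)          # e.g. 1.x.x
print(torchvision.__version__)    # must be the release built against this torch
print(torch.cuda.is_available())  # True only for a CUDA build with a visible GPU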
2 Tensor or Numpy
2.1 Tensor Definition and Basic Attributes
Attributes of a tensor:
torch.tensor(data, dtype=None, device=None, requires_grad=False, pin_memory=False)
- data: the data; can be a list or a numpy ndarray
- dtype: data type; defaults to the same type as data
- device: the device the tensor lives on, gpu/cpu
- requires_grad: whether gradients are needed (neural networks usually require them)
- pin_memory: whether to store the tensor in pinned (page-locked) memory
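A minimal sketch exercising these parameters (the commented lines assume a CUDA-capable machine):

import numpy as np
import torch

a = torch.tensor([1.0, 2.0, 3.0])                    # dtype inferred (float32)
b = torch.tensor(np.arange(4), dtype=torch.float64)  # override the inferred dtype
c = torch.tensor([1.0, 2.0], requires_grad=True)     # track gradients through c
# d = torch.tensor([1.0], device='cuda')             # place directly on the GPU
# e = torch.tensor([1.0], pin_memory=True)           # page-locked CPU memory (needs CUDA)
print(a.dtype, b.dtype, c.requires_grad)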
Reference: 从零开始深度学习Pytorch笔记(2)——张量的创建(上)
Reference: torch之DataLoader参数pin_memory解析
Blocking and non-blocking usually describe how threads affect one another. For example, if one thread occupies a critical-section resource, every other thread that needs that resource must wait at the critical section, and waiting suspends the thread; that is blocking. If the occupying thread never releases the resource, none of the threads blocked on that critical section can make progress.
Non-blocking, by contrast, allows multiple threads to enter the critical section at the same time.
tensor = tensor.cuda(non_blocking=True)   # note: called on a tensor; torch.cuda itself is a module
Source: 阻塞(Blocking)和非阻塞(Non-Blocking)
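Pinned memory and non_blocking usually appear together: the asynchronous copy only actually overlaps with computation when the source tensor is in page-locked memory. A sketch, assuming a CUDA device is present:

import torch

if torch.cuda.is_available():
    batch = torch.randn(64, 3, 224, 224).pin_memory()  # page-locked host memory
    batch_gpu = batch.cuda(non_blocking=True)           # asynchronous host-to-GPU copy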
tensor = torch.randn(3, 4, 5)
print(tensor.type())   # data type
print(tensor.size())   # shape of the tensor (a tuple-like torch.Size)
print(tensor.dim())    # number of dimensions
abs, mean, sin
import torch
import numpy as np

# abs
data = [-1, -2, 1, 2]
tensor = torch.FloatTensor(data)   # 32-bit floating point
print('abs numpy:\n', np.abs(data))
print('abs torch:\n', torch.abs(tensor), '\n')
# mean
print('mean numpy:\n',np.mean(data))
print('mean torch:\n',torch.mean(tensor),'\n')
# sin
print('sin numpy:\n',np.sin(data))
print('sin torch:\n',torch.sin(tensor),'\n')
output
abs numpy:
[1 2 1 2]
abs torch:
tensor([1., 2., 1., 2.])
mean numpy:
0.0
mean torch:
tensor(0.)
sin numpy:
[-0.84147098 -0.90929743 0.84147098 0.90929743]
sin torch:
tensor([-0.8415, -0.9093, 0.8415, 0.9093])
matrix multiply
data = np.array([[1,2],[3,4]])
tensor = torch.from_numpy(data)
print('matrix multiply numpy:\n',np.matmul(data,data),'\n')
print('matrix multiply torch:\n',torch.mm(tensor,tensor),'\n')
print('element-wise multiply torch:\n',torch.mul(tensor,tensor),'\n')
tensor1 = torch.from_numpy(data.flatten())
print('dot numpy:\n',data.dot(data),'\n')
print('dot torch:\n',tensor1.dot(tensor1))
# if no flatten()
# RuntimeError: dot: Expected 1-D argument self, but got 2-D
output
matrix multiply numpy:
[[ 7 10]
[15 22]]
matrix multiply torch:
tensor([[ 7, 10],
[15, 22]])
element-wise multiply torch:
tensor([[ 1, 4],
[ 9, 16]])
dot numpy:
[[ 7 10]
[15 22]]
dot torch:
tensor(30)
# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)
# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p)
result = torch.bmm(tensor1, tensor2)
# Element-wise multiplication.
result = tensor1 * tensor2
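The snippet above is a reference pattern with tensor1/tensor2 left undefined; a runnable sketch with concrete shapes:

import torch

t1 = torch.randn(2, 3)
t2 = torch.randn(3, 4)
print(torch.mm(t1, t2).shape)    # torch.Size([2, 4])

b1 = torch.randn(5, 2, 3)
b2 = torch.randn(5, 3, 4)
print(torch.bmm(b1, b2).shape)   # torch.Size([5, 2, 4])

print((t1 * t1).shape)           # element-wise: torch.Size([2, 3])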
reshape
# When feeding conv-layer output into a fully connected layer, the tensor usually needs reshaping.
# Unlike torch.view, torch.reshape automatically handles non-contiguous input tensors.
tensor = torch.rand(2,3,4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)
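To see why reshape is the safer choice, here is a sketch where view would fail but reshape succeeds:

import torch

t = torch.rand(2, 3, 4).transpose(0, 1)  # transpose produces a non-contiguous tensor
print(t.is_contiguous())                  # False
r = torch.reshape(t, (6, 4))              # fine: reshape copies when it must
# t.view(6, 4)                            # would raise a RuntimeError (view needs contiguous memory)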
2.2 Tensor CPU or Tensor GPU
There are 9 CPU tensor types and 9 GPU tensor types.
Type conversion
# Set the default tensor type; in pytorch FloatTensor is much faster than DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)
# Type conversions
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()
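Since PyTorch 0.4, the device-agnostic idiom with torch.device and .to() covers both kinds of conversion above; a minimal sketch:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tensor = torch.rand(3, 4)
tensor = tensor.to(device)        # replaces the hard-coded .cuda()/.cpu() pair
tensor = tensor.to(torch.long)    # .to also performs dtype conversion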
2.3 Torch to Numpy, Numpy to Torch
import torch
import numpy as np
np_data = np.arange(6).reshape((2,3))
# numpy to torch tensor
torch_data = torch.from_numpy(np_data)
# torch tensor to numpy
tensor2numpy = torch_data.numpy()
print("np_data:\n",np_data,'\n')
print("torch_data:\n",torch_data,'\n')
print("tensor2nmupy:\n",tensor2nmupy)
output
np_data:
[[0 1 2]
[3 4 5]]
torch_data:
tensor([[0, 1, 2],
[3, 4, 5]])
tensor2numpy:
[[0 1 2]
[3 4 5]]
Note: a tensor created with torch.from_numpy shares memory with the source ndarray; modifying one also modifies the other.
Except for CharTensor, every CPU tensor type supports converting to numpy and back.
ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float() # If ndarray has negative stride.
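A tiny demonstration of the memory sharing noted above (nothing assumed beyond numpy and torch):

import numpy as np
import torch

arr = np.zeros(3)
t = torch.from_numpy(arr)
arr[0] = 7.0
print(t)                             # tensor([7., 0., 0.], dtype=torch.float64) -- the tensor sees the change
safe = torch.from_numpy(arr.copy())  # copy() breaks the sharing
arr[1] = 9.0
print(safe)                          # unchanged: tensor([7., 0., 0.], dtype=torch.float64)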
2.4 Tensor Naming
Naming tensor dimensions is very handy: you can index and manipulate dimensions by name, which greatly improves readability and usability and helps prevent mistakes.
# Before PyTorch 1.3, dimension order had to be tracked with comments
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

# From PyTorch 1.3 onward
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)

# names can also be set directly at creation time
tensor = torch.rand(3, 4, 1, 2, names=('C', 'N', 'H', 'W'))
# align_to conveniently reorders dimensions by name
tensor = tensor.align_to('N', 'C', 'H', 'W')
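A couple more named-tensor operations that come in handy (the named-tensor API is still marked experimental, so treat this as a sketch):

import torch

imgs = torch.randn(32, 3, 56, 56, names=('N', 'C', 'H', 'W'))
print(imgs.names)                                 # ('N', 'C', 'H', 'W')
plain = imgs.rename(None)                         # drop names to use ops that don't support them
named = plain.refine_names('N', 'C', 'H', 'W')    # re-attach the names afterwards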
2.5 Torch.tensor to PIL.Image
# pytorch image tensors use [C, H, W] channel order with values in [0, 1],
# so the conversion needs a permute plus rescaling
import numpy as np
import PIL.Image
import torch
import torchvision

# torch.Tensor -> PIL.Image
image = PIL.Image.fromarray(torch.clamp(tensor * 255, min=0, max=255).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # equivalent way

# PIL.Image -> torch.Tensor
path = r'./figure.jpg'
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # equivalent way
Converting between np.ndarray and PIL.Image
image = PIL.Image.fromarray(ndarray.astype(np.uint8))
ndarray = np.asarray(PIL.Image.open(path))
3 Variable
(This wrapper was later deprecated: since PyTorch 0.4, Variable and Tensor have been merged, and tensors support autograd directly.)
A Variable in Torch is a container holding a value that keeps changing, like a basket of eggs where the egg count keeps varying. Who are the eggs inside? Torch tensors, of course. If you compute with a Variable, the result is also a Variable of the same type.
import torch
import numpy as np
from torch.autograd import Variable
tensor = torch.FloatTensor([[1,2],[3,4]])        # no gradient tracking, no backpropagation
variable = Variable(tensor, requires_grad=True)  # participates in backpropagation
print(tensor)
print(variable)
t_out = torch.mean(tensor*tensor)
v_out = torch.mean(variable*variable)
print(t_out)
print(v_out)
output
tensor([[1., 2.],
[3., 4.]])
tensor([[1., 2.],
[3., 4.]], requires_grad=True)
tensor(7.5000)
tensor(7.5000, grad_fn=<MeanBackward1>)
While a Variable computes, it silently builds a huge system behind the scenes called a computational graph. What is this graph for? It links all the computation steps (nodes) together, so that during error backpropagation the gradients for all variables are computed in one pass; a plain tensor lacks this ability.
v_out.backward()
# v_out = 1/4 * sum(variable*variable)
# d(v_out)/d(variable) = 1/4 * 2 * variable = variable/2
print(v_out.grad, '\n')          # gradient (None: v_out is not a leaf node)
print(variable.grad, '\n')       # gradient
print(variable, '\n')            # Variable
print(variable.data, '\n')       # tensor
print(variable.data.numpy())     # numpy
output
None
tensor([[0.5000, 1.0000],
[1.5000, 2.0000]])
tensor([[1., 2.],
[3., 4.]], requires_grad=True)
tensor([[1., 2.],
[3., 4.]])
[[1. 2.]
[3. 4.]]
tensor to variable
import torch
from torch.autograd import Variable
import torch.nn as nn
"tensor to variable"
data = torch.arange(1, 10, 1).float()
data1 = Variable(data, requires_grad=True)
print(data)
print(data1)
"""
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.], requires_grad=True)
"""
variable to tensor
"variable to tensor"
data = torch.arange(1, 10, 1).float()
data1 = Variable(data, requires_grad=True).data
print(data)
print(data1)
"""
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.])
"""
4 Activation Functions
import torch
import numpy as np
from torch.autograd import Variable
import torch.nn.functional as F
import matplotlib.pyplot as plt
# fake data
x = torch.linspace(-5, 5, 200)   # x data
x = Variable(x)
x_np = x.data.numpy()   # .data is the tensor; .data.numpy() is the numpy array
y_relu = F.relu(x).data.numpy()
y_sigmoid = torch.sigmoid(x).data.numpy()   # F.sigmoid is deprecated in newer versions
y_tanh = torch.tanh(x).data.numpy()         # F.tanh is deprecated in newer versions
y_softplus = F.softplus(x).data.numpy()
# plt
plt.figure(1, figsize=(8, 6))
plt.subplot(221)
plt.plot(x_np, y_relu, c='red', label='relu')
plt.ylim((-1, 5))
plt.legend(loc='best')
plt.grid()
plt.subplot(222)
plt.plot(x_np, y_sigmoid, c='red', label='sigmoid')
plt.ylim((-0.2, 1.2))
plt.legend(loc='best')
plt.grid()
plt.subplot(223)
plt.plot(x_np, y_tanh, c='red', label='tanh')
plt.ylim((-1.2, 1.2))
plt.legend(loc='best')
plt.grid()
plt.subplot(224)
plt.plot(x_np, y_softplus, c='red', label='softplus')
plt.ylim((-0.2, 6))
plt.legend(loc='best')
plt.grid()
plt.show()
output: (figure) the four activation curves, relu, sigmoid, tanh, and softplus, over x in [-5, 5]
5 Regression and Classification
5.1 Regression
Code: https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/301_regression.py
Generate the dataset:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) # noisy y data (tensor), shape=(100, 1)
plt.scatter(x.data.numpy(), y.data.numpy())
plt.show()
Build the network: three layers (input, hidden, output):
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x

net = Net(n_feature=1, n_hidden=10, n_output=1)   # define the network
print(net)   # net architecture
output
Net(
(hidden): Linear(in_features=1, out_features=10, bias=True)
(predict): Linear(in_features=10, out_features=1, bias=True)
)
Define the optimizer and the loss function:
optimizer = torch.optim.SGD(net.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss() # this is for regression mean squared loss
Train for 200 iterations, plotting the result every 5 steps:
plt.ion()   # something about plotting

for t in range(200):
    prediction = net(x)               # input x and predict based on x
    loss = loss_func(prediction, y)   # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 5 == 0:
        # plot and show learning process
        plt.cla()
        plt.scatter(x.data.numpy(), y.data.numpy())
        plt.grid()
        plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
        plt.text(0.5, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 20, 'color': 'red'})
        plt.pause(0.1)

plt.ioff()
plt.show()
Selected snapshots (figures) at iterations 1, 25, 50, 75, 100, and 200.
5.2 Classification
Code: https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/302_classification.py
Generate the data and visualize it:
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# make fake data
n_data = torch.ones(100, 2)
x0 = torch.normal(2*n_data, 1) # class0 x data (tensor), shape=(100, 2)
y0 = torch.zeros(100) # class0 y data (tensor), shape=(100, 1)
x1 = torch.normal(-2*n_data, 1) # class1 x data (tensor), shape=(100, 2)
y1 = torch.ones(100) # class1 y data (tensor), shape=(100, 1)
x = torch.cat((x0, x1), 0).type(torch.FloatTensor) # shape (200, 2) FloatTensor = 32-bit floating
y = torch.cat((y0, y1), ).type(torch.LongTensor) # shape (200,) LongTensor = 64-bit integer
# The code below is deprecated in Pytorch 0.4. Now, autograd directly supports tensors
# x, y = Variable(x), Variable(y)
plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=y.data.numpy(), s=100, lw=0, cmap='RdYlGn')
plt.grid(ls='--')
plt.show()
Build the network and define the optimizer and loss function:
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output)       # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.out(x)
        return x

net = Net(n_feature=2, n_hidden=10, n_output=2)   # define the network
print(net)   # net architecture
optimizer = torch.optim.SGD(net.parameters(), lr=0.02)
loss_func = torch.nn.CrossEntropyLoss()   # the target label is NOT one-hotted
output
Net(
(hidden): Linear(in_features=2, out_features=10, bias=True)
(out): Linear(in_features=10, out_features=2, bias=True)
)
Train for 100 iterations, plotting the result every 2 steps:
plt.ion()   # something about plotting

for t in range(100):
    out = net(x)               # input x and predict based on x
    loss = loss_func(out, y)   # must be (1. nn output, 2. target), the target label is NOT one-hotted

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

    if t % 2 == 0:
        # plot and show learning process
        plt.cla()
        prediction = torch.max(out, 1)[1]   # out has two values per sample; max over dim=1 returns [0] the larger value, [1] its index
        pred_y = prediction.data.numpy()
        target_y = y.data.numpy()
        plt.grid(ls='--')
        plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=pred_y, s=100, lw=0, cmap='RdYlGn')
        accuracy = float((pred_y == target_y).astype(int).sum()) / float(target_y.size)
        plt.text(1.5, -4, 'Accuracy=%.2f' % accuracy, fontdict={'size': 20, 'color': 'red'})
        plt.pause(0.1)

plt.ioff()
plt.show()
Selected snapshots (figures) at iterations 1 and 3.
5.3 Sequential
Defining a full class as above is a bit verbose (tensorflow-style); here is a quicker way (like keras's Sequential).
Replace
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.out = torch.nn.Linear(n_hidden, n_output)       # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.out(x)
        return x

net = Net(n_feature=2, n_hidden=10, n_output=2)   # define the network
print(net)   # net architecture
with
net = torch.nn.Sequential(
    torch.nn.Linear(2, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 2)
)
print(net) # net architecture
output
Sequential(
(0): Linear(in_features=2, out_features=10, bias=True)
(1): ReLU()
(2): Linear(in_features=10, out_features=2, bias=True)
)
in contrast to the earlier version's
Net(
(hidden): Linear(in_features=2, out_features=10, bias=True)
(out): Linear(in_features=10, out_features=2, bias=True)
)
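If you want Sequential but with named sub-modules like the class version, an OrderedDict does it (a small sketch):

from collections import OrderedDict
import torch

net = torch.nn.Sequential(OrderedDict([
    ('hidden', torch.nn.Linear(2, 10)),
    ('act', torch.nn.ReLU()),
    ('out', torch.nn.Linear(10, 2)),
]))
print(net)   # prints (hidden), (act), (out) instead of (0), (1), (2)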
6 Save and Load
Code from https://github.com/MorvanZhou/PyTorch-Tutorial/blob/master/tutorial-contents/304_save_reload.py
Two ways to save a model (save):
- save the entire network
torch.save(net1, 'net.pkl')
- save only the parameters
torch.save(net1.state_dict(), 'net_params.pkl')
Loading for each of the two methods (load):
net2 = torch.load('net.pkl')
net3.load_state_dict(torch.load('net_params.pkl'))
Note that net3 must be rebuilt with the original architecture before its parameters can be loaded.
import torch
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# fake data
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1) # x data (tensor), shape=(100, 1)
y = x.pow(2) + 0.2*torch.rand(x.size()) # noisy y data (tensor), shape=(100, 1)
# The code below is deprecated in Pytorch 0.4. Now, autograd directly supports tensors
# x, y = Variable(x, requires_grad=False), Variable(y, requires_grad=False)
def save():
    # save net1
    net1 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)
    loss_func = torch.nn.MSELoss()

    for t in range(100):
        prediction = net1(x)
        loss = loss_func(prediction, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # plot result
    plt.figure(1, figsize=(10, 3))
    plt.subplot(131)
    plt.title('Net1')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

    # 2 ways to save the net
    torch.save(net1, 'net.pkl')                      # save entire net
    torch.save(net1.state_dict(), 'net_params.pkl')  # save only the parameters

def restore_net():
    # restore entire net1 to net2
    net2 = torch.load('net.pkl')
    prediction = net2(x)

    # plot result
    plt.subplot(132)
    plt.title('Net2')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)

def restore_params():
    # restore only the parameters in net1 to net3
    net3 = torch.nn.Sequential(
        torch.nn.Linear(1, 10),
        torch.nn.ReLU(),
        torch.nn.Linear(10, 1)
    )
    # copy net1's parameters into net3
    net3.load_state_dict(torch.load('net_params.pkl'))
    prediction = net3(x)

    # plot result
    plt.subplot(133)
    plt.title('Net3')
    plt.scatter(x.data.numpy(), y.data.numpy())
    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)
    plt.show()

# save net1
save()
# restore entire net (may slow)
restore_net()
# restore only the net parameters
restore_params()
output: (figure) three subplots, Net1 / Net2 / Net3, all showing the same fitted curve
7 Batch Training (DataLoader)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.1.11
"""
import torch
import torch.utils.data as Data
torch.manual_seed(1) # reproducible
BATCH_SIZE = 5
# BATCH_SIZE = 8
x = torch.linspace(1, 10, 10) # this is x data (torch tensor)
y = torch.linspace(10, 1, 10) # this is y data (torch tensor)
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,   # torch TensorDataset format
    batch_size=BATCH_SIZE,   # mini batch size
    shuffle=True,            # random shuffle for training
    num_workers=2,           # subprocesses for loading data
)

def show_batch():
    for epoch in range(3):   # train entire dataset 3 times
        for step, (batch_x, batch_y) in enumerate(loader):   # for each training step
            # train your data...
            print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',
                  batch_x.numpy(), '| batch y: ', batch_y.numpy())

show_batch()
output
Epoch: 0 | Step: 0 | batch x: [ 5. 7. 10. 3. 4.] | batch y: [6. 4. 1. 8. 7.]
Epoch: 0 | Step: 1 | batch x: [2. 1. 8. 9. 6.] | batch y: [ 9. 10. 3. 2. 5.]
Epoch: 1 | Step: 0 | batch x: [ 4. 6. 7. 10. 8.] | batch y: [7. 5. 4. 1. 3.]
Epoch: 1 | Step: 1 | batch x: [5. 3. 2. 1. 9.] | batch y: [ 6. 8. 9. 10. 2.]
Epoch: 2 | Step: 0 | batch x: [ 4. 2. 5. 6. 10.] | batch y: [7. 9. 6. 5. 1.]
Epoch: 2 | Step: 1 | batch x: [3. 9. 1. 8. 7.] | batch y: [ 8. 2. 10. 3. 4.]
Change shuffle to False:
output
Epoch: 0 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 0 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Epoch: 1 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 1 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Epoch: 2 | Step: 0 | batch x: [1. 2. 3. 4. 5.] | batch y: [10. 9. 8. 7. 6.]
Epoch: 2 | Step: 1 | batch x: [ 6. 7. 8. 9. 10.] | batch y: [5. 4. 3. 2. 1.]
Now set BATCH_SIZE to 8, keep shuffle=False, and look at the output:
output
Epoch: 0 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 0 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
Epoch: 1 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 1 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
Epoch: 2 | Step: 0 | batch x: [1. 2. 3. 4. 5. 6. 7. 8.] | batch y: [10. 9. 8. 7. 6. 5. 4. 3.]
Epoch: 2 | Step: 1 | batch x: [ 9. 10.] | batch y: [2. 1.]
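The leftover batch of size 2 above comes from 10 samples not dividing evenly by 8; DataLoader's drop_last flag controls this behavior. A sketch:

loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=8,
    shuffle=False,
    drop_last=True,   # discard the final incomplete batch (the [9., 10.] above)
)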
8 Optimizers
SGD
Momentum (adds inertia to the updates)
Adagrad (adds resistance along wrong directions, i.e. per-parameter learning rates)
RMSprop (a partial combination of Momentum and Adagrad)
Adam (a full combination of Momentum and Adagrad)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.4
matplotlib
"""
import torch
import torch.utils.data as Data
import torch.nn.functional as F
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
LR = 0.01
BATCH_SIZE = 32
EPOCH = 12
# fake dataset
x = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)
y = x.pow(2) + 0.1*torch.normal(torch.zeros(*x.size()))
# plot dataset
plt.scatter(x.numpy(), y.numpy())
plt.show()
# put dataset into torch dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2,
)

# default network
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(1, 20)    # hidden layer
        self.predict = torch.nn.Linear(20, 1)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))   # activation function for hidden layer
        x = self.predict(x)          # linear output
        return x

if __name__ == '__main__':
    # different nets
    net_SGD = Net()
    net_Momentum = Net()
    net_Adagrad = Net()
    net_RMSprop = Net()
    net_Adam = Net()
    nets = [net_SGD, net_Momentum, net_Adagrad, net_RMSprop, net_Adam]

    # different optimizers
    opt_SGD = torch.optim.SGD(net_SGD.parameters(), lr=LR)
    opt_Momentum = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)
    opt_Adagrad = torch.optim.Adagrad(net_Adagrad.parameters(), lr=LR)
    opt_RMSprop = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)
    opt_Adam = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))
    optimizers = [opt_SGD, opt_Momentum, opt_Adagrad, opt_RMSprop, opt_Adam]

    loss_func = torch.nn.MSELoss()
    losses_his = [[], [], [], [], []]   # record loss

    # training
    for epoch in range(EPOCH):
        print('Epoch: ', epoch)
        for step, (b_x, b_y) in enumerate(loader):   # for each training step
            for net, opt, l_his in zip(nets, optimizers, losses_his):
                output = net(b_x)                 # get output for every net
                loss = loss_func(output, b_y)     # compute loss for every net
                opt.zero_grad()                   # clear gradients for next train
                loss.backward()                   # backpropagation, compute gradients
                opt.step()                        # apply gradients
                l_his.append(loss.data.numpy())   # loss recorder

    labels = ['SGD', 'Momentum', 'Adagrad', 'RMSprop', 'Adam']
    for i, l_his in enumerate(losses_his):
        plt.plot(l_his, label=labels[i])
    plt.legend(loc='best')
    plt.xlabel('Steps')
    plt.ylabel('Loss')
    plt.ylim((0, 0.2))
    # plt.savefig("1.png")
    plt.show()
(Figures: the scatter of the dataset, and the resulting loss curves of the five optimizers.)
Now look at the plot together with the list again:
SGD
Momentum (adds inertia)
Adagrad (adds resistance along wrong directions)
RMSprop (a partial combination of Momentum and Adagrad)
Adam (a full combination of Momentum and Adagrad)
9 CNN for MNIST (CPU)
"""
View more, visit my tutorial page: https://morvanzhou.github.io/tutorials/
My Youtube Channel: https://www.youtube.com/user/MorvanZhou
Dependencies:
torch: 0.4
torchvision
matplotlib
"""
# library
# standard library
import os
# third-party library
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
# torch.manual_seed(1) # reproducible
# Hyper Parameters
EPOCH = 1 # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 50
LR = 0.001 # learning rate
DOWNLOAD_MNIST = False
# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
# no mnist dir, or the mnist dir is empty
DOWNLOAD_MNIST = True
train_data = torchvision.datasets.MNIST(
root='./mnist/',
train=True, # this is training data
transform=torchvision.transforms.ToTensor(), # Converts a PIL.Image or numpy.ndarray to
# torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
download=DOWNLOAD_MNIST,
)
# plot one example
print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[0])
plt.show()
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(       # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,            # input height
                out_channels=16,          # n_filters
                kernel_size=5,            # filter size
                stride=1,                 # filter movement/step
                padding=2,                # if want same width and length of this image after Conv2d, padding=(kernel_size-1)/2 if stride=1
            ),                            # output shape (16, 28, 28)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(       # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),   # output shape (32, 14, 14)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(2),              # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)   # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x            # return x for visualization
cnn = CNN()
print(cnn) # net architecture
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR) # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss() # the target label is not one-hotted
# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except: HAS_SK = False; print('Please install sklearn for layer visualization')
def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)
plt.ion()
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader
        output = cnn(b_x)[0]            # cnn output
        loss = loss_func(output, b_y)   # cross entropy loss
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
            if HAS_SK:
                # Visualization of trained flatten layer (T-SNE)
                tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
                plot_only = 500
                low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
                labels = test_y.numpy()[:plot_only]
                plot_with_labels(low_dim_embs, labels)
plt.ioff()
# print 10 predictions from test data
test_output, _ = cnn(test_x[:10])
pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
output
torch.Size([60000, 28, 28])
torch.Size([60000])
CNN(
  (conv1): Sequential(
    (0): Conv2d(1, 16, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(16, 32, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (out): Linear(in_features=1568, out_features=10, bias=True)
)
We train only one epoch with batch size 50, printing the loss, accuracy, and classification snapshot every 50 steps. Below, only the first and the last of the printouts (60000/50/50 = 24 of them) are shown, evaluated on the 2000 sampled test examples (the CPU is too slow to use the full test set).
Accuracy reaches about 98%.
The last output prints 10 predictions next to the true labels:
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
The CPU really is too slow; next, let's see how to use the GPU.
10 CNN for MNIST (GPU)
Building on the previous chapter, we add GPU usage; the main change is calling cuda on the data and the model:
model=model.cuda()
x=x.cuda()
y=y.cuda()
Every changed line is marked with a change comment; compare them to see the differences, which mostly stem from numpy operations not working on GPU tensors.
I have commented out the part that plots the classification results.
import os
# third-party library
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision
import matplotlib.pyplot as plt
torch.cuda.set_device(1) # change
# torch.manual_seed(1) # reproducible
# Hyper Parameters
EPOCH = 1 # train the training data n times, to save time, we just train 1 epoch
BATCH_SIZE = 50
LR = 0.001 # learning rate
DOWNLOAD_MNIST = False
# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
# no mnist dir, or the mnist dir is empty
DOWNLOAD_MNIST = True
train_data = torchvision.datasets.MNIST(
root='./mnist/',
train=True, # this is training data
transform=torchvision.transforms.ToTensor(), # Converts a PIL.Image or numpy.ndarray to
# torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]
download=DOWNLOAD_MNIST,
)
# plot one example
print(train_data.train_data.size()) # (60000, 28, 28)
print(train_data.train_labels.size()) # (60000)
plt.imshow(train_data.train_data[0].numpy(), cmap='gray')
plt.title('%i' % train_data.train_labels[0])
plt.show()
# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
# !!!!!!!! Change in here !!!!!!!!! #
#test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255. # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
#test_y = test_data.test_labels[:2000]
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000].cuda()/255. # Tensor on GPU
test_y = test_data.test_labels[:2000].cuda()
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(       # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,            # input height
                out_channels=16,          # n_filters
                kernel_size=5,            # filter size
                stride=1,                 # filter movement/step
                padding=2,                # if want same width and length of this image after Conv2d, padding=(kernel_size-1)/2 if stride=1
            ),                            # output shape (16, 28, 28)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(kernel_size=2),  # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(       # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),   # output shape (32, 14, 14)
            nn.ReLU(),                    # activation
            nn.MaxPool2d(2),              # output shape (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)   # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        output = self.out(x)
        return output, x            # return x for visualization
cnn = CNN()
print(cnn) # net architecture
# !!!!!!!! Change in here !!!!!!!!! #
cnn.cuda() # Moves all model parameters and buffers to the GPU.
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR) # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss() # the target label is not one-hotted
# following function (plot_with_labels) is for visualization, can be ignored if not interested
from matplotlib import cm
try: from sklearn.manifold import TSNE; HAS_SK = True
except: HAS_SK = False; print('Please install sklearn for layer visualization')
def plot_with_labels(lowDWeights, labels):
    plt.cla()
    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]
    for x, y, s in zip(X, Y, labels):
        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)
    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)
plt.ion()
# training and testing
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader

        # !!!!!!!! Change in here !!!!!!!!! #
        b_x = b_x.cuda()   # Tensor on GPU
        b_y = b_y.cuda()   # Tensor on GPU

        output = cnn(b_x)[0]            # cnn output
        loss = loss_func(output, b_y)   # cross entropy loss
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x)

            # !!!!!!!! Change in here !!!!!!!!! #
            # pred_y = torch.max(test_output, 1)[1].data.numpy()
            pred_y = torch.max(test_output, 1)[1].cuda().data   # keep the computation on the GPU
            # accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            accuracy = torch.sum(pred_y == test_y).type(torch.FloatTensor) / test_y.size(0)
            # print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.numpy(), '| test accuracy: %.2f' % accuracy)
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.2f' % accuracy)

            # if HAS_SK:
            #     # Visualization of trained flatten layer (T-SNE)
            #     tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
            #     plot_only = 50   # reduced a bit
            #     # !!!!!!!! Change in here !!!!!!!!! #
            #     # low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])
            #     # labels = test_y.numpy()[:plot_only]
            #     low_dim_embs = tsne.fit_transform(last_layer.data.cpu().numpy()[:plot_only, :])
            #     labels = test_y.data.cpu().numpy()[:plot_only]
            #     plot_with_labels(low_dim_embs, labels)

plt.ioff()
# print 10 predictions from test data
test_output, _ = cnn(test_x[:10])
# !!!!!!!! Change in here !!!!!!!!! #
#pred_y = torch.max(test_output, 1)[1].data.numpy()
pred_y = torch.max(test_output, 1)[1].cuda().data   # keep the computation on the GPU
print(pred_y, 'prediction number')
# !!!!!!!! Change in here !!!!!!!!! #
#print(test_y[:10].numpy(), 'real number')
print(test_y[:10], 'real number')
The training results:
Epoch: 0 | train loss: 2.2915 | test accuracy: 0.10
Epoch: 0 | train loss: 0.2737 | test accuracy: 0.84
Epoch: 0 | train loss: 0.3125 | test accuracy: 0.90
Epoch: 0 | train loss: 0.1717 | test accuracy: 0.92
Epoch: 0 | train loss: 0.1851 | test accuracy: 0.94
Epoch: 0 | train loss: 0.4500 | test accuracy: 0.94
Epoch: 0 | train loss: 0.0509 | test accuracy: 0.94
Epoch: 0 | train loss: 0.2175 | test accuracy: 0.94
Epoch: 0 | train loss: 0.0376 | test accuracy: 0.95
Epoch: 0 | train loss: 0.0455 | test accuracy: 0.96
Epoch: 0 | train loss: 0.1451 | test accuracy: 0.96
Epoch: 0 | train loss: 0.1239 | test accuracy: 0.97
Epoch: 0 | train loss: 0.2214 | test accuracy: 0.97
Epoch: 0 | train loss: 0.3967 | test accuracy: 0.97
Epoch: 0 | train loss: 0.2205 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1078 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0480 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1128 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0227 | test accuracy: 0.98
Epoch: 0 | train loss: 0.1140 | test accuracy: 0.97
Epoch: 0 | train loss: 0.0608 | test accuracy: 0.96
Epoch: 0 | train loss: 0.0785 | test accuracy: 0.97
Epoch: 0 | train loss: 0.1001 | test accuracy: 0.98
Epoch: 0 | train loss: 0.0149 | test accuracy: 0.98
tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], device='cuda:1') prediction number
tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9], device='cuda:1') real number
Recommended reading: https://github.com/lxztju/pytorch_classification