Abstract
After working through the installation and basic usage of PyTorch a while ago, this week I studied the CIFAR10 dataset that ships with torchvision and used it as the dataset for learning to build neural networks, training on images by calling PyTorch's convolution, pooling, non-linear activation, and linear layers.
1 torch and torchvision
torch and torchvision are two important libraries provided by the PyTorch deep learning framework:
- torch: the core library of PyTorch, providing tensor operations, neural network building blocks, automatic differentiation, and more. Its main features include:
  - Tensor operations: creation, manipulation, and arithmetic on tensors (multi-dimensional arrays), with GPU-accelerated computation.
  - Neural network construction: layers of various types, activation functions, and optimizers that make it easy to build and train models.
  - Automatic differentiation: PyTorch's autograd engine computes gradients of tensors automatically, supporting training via backpropagation.
- torchvision: PyTorch's computer vision library, providing tools and datasets for image processing and computer vision tasks. Its main features include:
  - Data loading: loading and preprocessing for common image datasets such as ImageNet and CIFAR-10.
  - Image transforms: cropping, scaling, rotation, and other operations for data augmentation and preprocessing.
  - Model zoo: classic computer vision models (e.g. ResNet, AlexNet) for image classification, object detection, and similar tasks.
torch provides the core functionality of the deep learning framework, while torchvision is an extension focused on image processing and computer vision. Together they make it easy to build and train deep learning models that perform well on vision tasks.
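As a minimal sketch of the tensor and autograd features listed above (the values here are arbitrary examples):
import torch

# tensor operations: create a tensor and compute with it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x * x).sum()  # y = x1^2 + x2^2 + x3^2

# automatic differentiation: backpropagation fills in dy/dx = 2x
y.backward()
print(x.grad)  # tensor([2., 4., 6.])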
1.1 Inspecting the CIFAR10 dataset
import torchvision
# no transform is passed here, so samples are returned as PIL images
train_set = torchvision.datasets.CIFAR10(root="./dataset2", train=True, download=True)
test_set = torchvision.datasets.CIFAR10(root="./dataset2", train=False, download=True)

# look at the first sample
print(test_set[0])
# Output: (<PIL.Image.Image image mode=RGB size=32x32 at 0x20CC5155A60>, 3)
# so each sample is a pair: the image first, then the target label

# list the classes present in the test set
print(test_set.classes)  # ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
img, target = test_set[0]  # unpack the image and the target separately
print(img)     # <PIL.Image.Image image mode=RGB size=32x32 at 0x20CC5155AF0>
print(target)  # 3
print(test_set.classes[target])  # cat, i.e. classes[3]
img.show()
Viewing the data in TensorBoard
import torchvision
from torch.utils.tensorboard import SummaryWriter

dataset_transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()
])
train_set = torchvision.datasets.CIFAR10(root="./dataset2", train=True, transform=dataset_transform, download=True)
test_set = torchvision.datasets.CIFAR10(root="./dataset2", train=False, transform=dataset_transform, download=True)

writer = SummaryWriter("P10")
for i in range(10):
    img, target = test_set[i]
    writer.add_image("test_set", img, i)
writer.close()
tensorboard --logdir=P10
(then open the URL TensorBoard prints, http://localhost:6006 by default)
1.2 Using DataLoader
If the dataset is a deck of cards, the dataloader is the hand that draws from it: it decides how many cards to take at a time, whether to shuffle the deck, and so on.
Usage of the dataloader is documented under
torch.utils.data.DataLoader
import torchvision
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

test_data = torchvision.datasets.CIFAR10("./dataset2", train=False, transform=torchvision.transforms.ToTensor())
test_loader = DataLoader(test_data, batch_size=64, shuffle=True, num_workers=0, drop_last=False)

writer = SummaryWriter("dataloader")
step = 0
for data in test_loader:
    imgs, targets = data
    writer.add_images("test_data_drop_last", imgs, step)
    step = step + 1
writer.close()
batch_size sets how many samples are drawn per step.
shuffle sets whether the data order is reshuffled each time the full dataset is iterated over.
drop_last sets whether the final step's batch is discarded when fewer than batch_size samples remain; the sketch below shows its effect.
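A minimal sketch of drop_last, assuming the CIFAR10 test set of 10000 images (10000 = 156 × 64 + 16):
import torchvision
from torch.utils.data import DataLoader

test_data = torchvision.datasets.CIFAR10("./dataset2", train=False, transform=torchvision.transforms.ToTensor())
# drop_last=False keeps the final, smaller batch of 16 images
print(len(DataLoader(test_data, batch_size=64, drop_last=False)))  # 157
# drop_last=True discards it
print(len(DataLoader(test_data, batch_size=64, drop_last=True)))   # 156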
And since shuffle=True, iterating over the loader twice yields the batches in different orders:
for epoch in range(2):
    step = 0
    for data in test_loader:
        imgs, targets = data
        writer.add_images("Epoch:{}".format(epoch), imgs, step)
        step = step + 1
2 Building a neural network
2.1 The basic skeleton of a neural network
import torch
from torch import nn

class Cc(nn.Module):
    # call the parent class's initialization
    def __init__(self):
        super(Cc, self).__init__()

    # override the forward function
    def forward(self, input):
        output = input + 1
        return output

cxc = Cc()
x = torch.tensor(1.0)
# cxc(x) goes through nn.Module.__call__, which dispatches to forward
output = cxc(x)
print(output)  # tensor(2.)
2.2 How convolution layers work
2.2.1 Basics of convolution
A convolution takes an input image and a kernel, with a configured step size stride:
- The kernel slides rightward across the input image stride cells at a time; upon reaching the right boundary it moves down stride cells and starts again from the left.
- This repeats until the kernel reaches the bottom boundary of the image.
Code example:
import torch
import torch.nn.functional as F

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]])
kernel = torch.tensor([[1, 2, 1],
                      [0, 1, 0],
                      [2, 1, 0]])
print(input.shape)   # torch.Size([5, 5])
print(kernel.shape)  # torch.Size([3, 3])

# F.conv2d expects
# input – input tensor of shape (minibatch, in_channels, iH, iW)
input = torch.reshape(input, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))
print(input.shape)   # torch.Size([1, 1, 5, 5])
print(kernel.shape)  # torch.Size([1, 1, 3, 3])

output = F.conv2d(input, kernel, stride=1)
print(output)  # tensor([[[[10, 12, 12],
               #           [18, 16, 16],
               #           [13,  9,  3]]]])

output = F.conv2d(input, kernel, stride=2)
print(output)  # tensor([[[[10, 12],
               #           [13,  3]]]])
2.2.2 padding
padding surrounds the input with rings of zeros before convolving, which enlarges the output. Code example:
import torch
import torch.nn.functional as F

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]])
kernel = torch.tensor([[1, 2, 1],
                      [0, 1, 0],
                      [2, 1, 0]])
# reshape to (minibatch, in_channels, iH, iW) as before
input = torch.reshape(input, (1, 1, 5, 5))
kernel = torch.reshape(kernel, (1, 1, 3, 3))

# with one ring of zero padding, the 5x5 input stays 5x5 through the 3x3 kernel
output = F.conv2d(input, kernel, stride=1, padding=1)
print(output)  # tensor([[[[ 1,  3,  4, 10,  8],
               #           [ 5, 10, 12, 12,  6],
               #           [ 7, 18, 16, 16,  8],
               #           [11, 13,  9,  3,  4],
               #           [14, 13,  9,  7,  4]]]])
2.3 Building a convolutional neural network
Formula for the output size of a convolution:
Height: $H_2=\frac{H_1-F_H+2P}{S}+1$
Width: $W_2=\frac{W_1-F_W+2P}{S}+1$
where $H_1$ and $W_1$ are the height and width of the input; $H_2$ and $W_2$ are the height and width of the output feature map; $F_H$ and $F_W$ are the height and width of the kernel; $S$ is the stride of the sliding window; and $P$ is the border padding (how many rings of zeros are added).
The official documentation gives a fuller formula that also takes the dilation attribute into account.
About dilation: it controls the spacing between kernel points; the default is 1, and values greater than 1 make the operation a dilated convolution.
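As a sanity check of the formula, a small sketch (conv_out_size is a hypothetical helper of my own, not a PyTorch function):
def conv_out_size(h1, f, p, s):
    # H2 = (H1 - F + 2P) / S + 1
    return (h1 - f + 2 * p) // s + 1

print(conv_out_size(5, 3, 0, 1))   # 3  -> matches the stride=1 example above
print(conv_out_size(5, 3, 1, 1))   # 5  -> matches the padding=1 example
print(conv_out_size(32, 5, 2, 1))  # 32 -> the padding=2 convolutions in section 2.7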
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader
from torch.nn import Conv2d
from torch.utils.tensorboard import SummaryWriter

# prepare the data
test_data = torchvision.datasets.CIFAR10("dataset2", train=False,
                                         transform=torchvision.transforms.ToTensor(),
                                         download=True)
# set the batch_size
data_loader = DataLoader(test_data, batch_size=64)

# build the network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

# instantiate the network
cnn = CNN()
print(cnn)

writer = SummaryWriter("data_conv2d")
step = 0
for data in data_loader:
    imgs, targets = data
    output = cnn(imgs)
    print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    print(output.shape)  # torch.Size([64, 6, 30, 30])
    writer.add_images("input", imgs, step)
    # 6-channel images cannot be displayed; fold the extra channels into the
    # batch dimension: [64, 6, 30, 30] -> [128, 3, 30, 30]
    output = torch.reshape(output, (-1, 3, 30, 30))
    writer.add_images("output", output, step)
    step = step + 1
writer.close()
2.4 Pooling layers
Purpose: retain the salient features of the input while reducing the amount of data.
import torch
import torch.nn as nn
from torch.nn import MaxPool2d

input = torch.tensor([[1, 2, 0, 3, 1],
                      [0, 1, 2, 3, 1],
                      [1, 2, 1, 0, 0],
                      [5, 2, 3, 1, 1],
                      [2, 1, 0, 1, 1]], dtype=torch.float32)
input = torch.reshape(input, (-1, 1, 5, 5))

class MaxPool(nn.Module):
    def __init__(self):
        super(MaxPool, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

mp = MaxPool()
output = mp(input)
print(output)
# tensor([[[[2., 3.],
#           [5., 1.]]]])
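ceil_mode=True keeps the partial windows at the right and bottom borders (the output size is rounded up rather than down). Continuing with the same input tensor, a quick check of the default floor behaviour:
output = MaxPool2d(kernel_size=3, ceil_mode=False)(input)
print(output)  # tensor([[[[2.]]]]) - only the single complete 3x3 window remains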
The same thing on an image dataset:
import torch
import torchvision
import torch.nn as nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

# initialize the dataset
dataset = torchvision.datasets.CIFAR10("dataset2", train=False, transform=torchvision.transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=64)

class MaxPool(nn.Module):
    def __init__(self):
        super(MaxPool, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3, ceil_mode=True)

    def forward(self, input):
        output = self.maxpool1(input)
        return output

mp = MaxPool()
writer = SummaryWriter("maxPool")
step = 0
for data in data_loader:
    imgs, target = data
    output = mp(imgs)
    writer.add_images("input", imgs, step)
    writer.add_images("maxpool", output, step)
    step = step + 1
writer.close()
2.5 Non-linear activation
Purpose: introduce non-linearity into the network so it can fit non-linear functions.
2.5.1 Using ReLU
ReLU clips inputs below 0 to 0 and leaves inputs above 0 unchanged.
About the inplace parameter (see the sketch after this list):
- If True, the input variable itself is overwritten with the result.
- If False, the input variable is left untouched and the result is returned as a new tensor, which has the advantage of preserving the original data.
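A minimal sketch of the difference (values chosen arbitrarily):
import torch
from torch.nn import ReLU

x = torch.tensor([-1.0, 2.0])
out = ReLU(inplace=False)(x)
print(x)    # tensor([-1., 2.]) - the input is preserved
print(out)  # tensor([0., 2.])

y = torch.tensor([-1.0, 2.0])
ReLU(inplace=True)(y)
print(y)    # tensor([0., 2.]) - the input itself was overwritten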
import torch
import torch.nn as nn
from torch.nn import ReLU

input = torch.tensor([[1, -0.5],
                      [-1, 3]])
print(input)

class NonLinear(nn.Module):
    def __init__(self):
        super(NonLinear, self).__init__()
        self.relu1 = ReLU()

    def forward(self, input):
        output = self.relu1(input)
        return output

non = NonLinear()
output = non(input)
print(output)  # tensor([[1., 0.],
               #         [0., 3.]])
2.5.2 Using Sigmoid
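Sigmoid squashes every input into the range (0, 1):
$\sigma(x)=\frac{1}{1+e^{-x}}$
Since ToTensor pixel values already lie in [0, 1], applying Sigmoid maps them into roughly (0.5, 0.73), which lowers the contrast of the images shown in TensorBoard below.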
import torchvision
import torch.nn as nn
from torch.nn import Sigmoid
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10("dataset2", train=False, transform=torchvision.transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=64)

class NonLinear(nn.Module):
    def __init__(self):
        super(NonLinear, self).__init__()
        self.sigmoid = Sigmoid()

    def forward(self, input):
        output = self.sigmoid(input)
        return output

writer = SummaryWriter("data_sigmoid")
non = NonLinear()
step = 0
for data in data_loader:
    imgs, target = data
    writer.add_images("input", imgs, global_step=step)
    output = non(imgs)
    writer.add_images("img_sigmoid", output, step)
    step += 1
writer.close()
2.6 Linear layers and other layers
2.6.1 Linear layers
Purpose: apply an affine transformation $y = xA^T + b$ to the input, mapping it into the output space; one of the most frequently used layers.
import torch
import torchvision
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.nn import Linear

dataset = torchvision.datasets.CIFAR10("dataset2", train=False, transform=torchvision.transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=64, drop_last=True)

class Li(nn.Module):
    def __init__(self):
        super(Li, self).__init__()
        self.linear1 = Linear(196608, 10)

    def forward(self, x):
        output = self.linear1(x)
        return output

l = Li()
for data in data_loader:
    imgs, target = data
    print(imgs.shape)    # torch.Size([64, 3, 32, 32])
    output = torch.flatten(imgs)
    print(output.shape)  # torch.Size([196608])
    output = l(output)
    print(output.shape)  # torch.Size([10])
How the 196608 in Linear(196608, 10) is computed:
- It comes from flattening a whole batch, 64 × 3 × 32 × 32 = 196608, e.g. via output = torch.reshape(imgs, (1, 1, 1, -1)).
- output = torch.flatten(imgs) flattens all the dimensions in the same way.
Note: torch.reshape() is the more general of the two.
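A small sketch comparing the two (shapes only, no dataset needed):
import torch

imgs = torch.ones((64, 3, 32, 32))
print(torch.flatten(imgs).shape)                 # torch.Size([196608])
print(torch.reshape(imgs, (1, 1, 1, -1)).shape)  # torch.Size([1, 1, 1, 196608])
# reshape can also keep the batch dimension, which torch.flatten(imgs) does not:
print(torch.reshape(imgs, (64, -1)).shape)       # torch.Size([64, 3072])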
2.6.2 Other layers
torch.nn also provides many other kinds of layers, such as normalization layers, dropout layers, and recurrent layers; see the official torch.nn documentation for details.
2.7 A small hands-on model and using Sequential
2.7.1 Hands-on model
1. First work out the padding needed at each convolution. To keep a 32×32 input at 32×32 with a 5×5 kernel and stride 1, solve $32=\frac{32-5+2P}{1}+1$, which gives $P=2$.
2. Code implementation
import torch
import torch.nn as nn
from torch.nn import Conv2d, MaxPool2d, Linear, Flatten

class conv(nn.Module):
    def __init__(self):
        super(conv, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.maxPool1 = MaxPool2d(kernel_size=2)
        self.conv2 = Conv2d(32, 32, 5, padding=2)
        self.maxPool2 = MaxPool2d(2)
        self.conv3 = Conv2d(32, 64, 5, padding=2)
        self.maxPool3 = MaxPool2d(2)
        self.flatten1 = Flatten()
        self.linear1 = Linear(1024, 64)
        self.linear2 = Linear(64, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.maxPool1(x)
        x = self.conv2(x)
        x = self.maxPool2(x)
        x = self.conv3(x)
        x = self.maxPool3(x)
        x = self.flatten1(x)
        x = self.linear1(x)
        x = self.linear2(x)
        return x

cc = conv()
input = torch.ones((64, 3, 32, 32))
output = cc(input)
print(output.shape)  # torch.Size([64, 10])
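Tracing one sample through the model shows where the 1024 in Linear(1024, 64) comes from (per-sample shapes, batch dimension omitted):
- input: 3 × 32 × 32
- conv1 (5×5 kernel, padding 2): 32 × 32 × 32
- maxPool1: 32 × 16 × 16
- conv2: 32 × 16 × 16, then maxPool2: 32 × 8 × 8
- conv3: 64 × 8 × 8, then maxPool3: 64 × 4 × 4
- flatten: 64 × 4 × 4 = 1024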
2.7.2 Using Sequential
import torch
import torch.nn as nn
from torch.nn import Conv2d, MaxPool2d, Flatten, Linear, Sequential

class seq(nn.Module):
    def __init__(self):
        super(seq, self).__init__()
        self.model = Sequential(
            Conv2d(3, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 64, 5, padding=2),
            MaxPool2d(2),
            Flatten(),
            Linear(1024, 64),
            Linear(64, 10)
        )

    def forward(self, x):
        x = self.model(x)
        return x

s = seq()
input = torch.ones((64, 3, 32, 32))
print(input.shape)  # torch.Size([64, 3, 32, 32])
output = s(input)
print(output.shape)  # torch.Size([64, 10])
Summary
Working through these introductory PyTorch exercises improved my coding skills, but truly mastering PyTorch will take more time with the official documentation and deeper study of my own.