PyTorch Learning (1)
CIFAR-10 Dataset: Image Classification
The dataset comes from the official torchvision package:
torchvision.datasets.CIFAR10()
It contains ten classes of objects; the task is to build a CNN that classifies the images.
The code is as follows (CIFAR_10_Classifier_Self_1.py):
import torch
import torchvision
from torch import optim
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
train_set = torchvision.datasets.CIFAR10(root='./CIFAR_10', train=True, transform=transform,
download=True)
test_set = torchvision.datasets.CIFAR10(root='./CIFAR_10', train=False, transform=transform,
download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_set, batch_size=4, shuffle=True)  # shuffle: randomize the sample order each epoch
test_loader = torch.utils.data.DataLoader(dataset=test_set, batch_size=4, shuffle=True)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck') # CIFAR-10 targets
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()  # initialize the parent class
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16*5*5, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # a convolutional layer is usually followed by a nonlinearity
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = MyNet()
criterion = nn.CrossEntropyLoss()
# optimizer with SGD (stochastic gradient descent)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(2):  # train for two epochs; each epoch is one full pass over the dataset
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):  # iterate over the DataLoader
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 2000 == 1999:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
print('Finished Training')
# save
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)  # save the state dict, so it can be restored with load_state_dict() below
Output:
C:\Users\dell\anaconda3\envs\pytorch\python.exe C:\Users\dell\Desktop\2024Summer\project1\learn_pytorch\pythonProject3\CIFAR_10_Classifier_Self_1.py
Files already downloaded and verified
Files already downloaded and verified
[1, 2000] loss: 2.208
[1, 4000] loss: 1.849
[1, 6000] loss: 1.681
[1, 8000] loss: 1.569
[1, 10000] loss: 1.523
[1, 12000] loss: 1.477
[2, 2000] loss: 1.395
[2, 4000] loss: 1.373
[2, 6000] loss: 1.360
[2, 8000] loss: 1.306
[2, 10000] loss: 1.325
[2, 12000] loss: 1.297
Finished Training
Process finished with exit code 0
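Side note on saving: net.load_state_dict(torch.load(PATH)) in the next script expects a state dict, which is why the training script saves net.state_dict() rather than pickling the whole module. A minimal sketch of the two conventions, reusing net and MyNet from above (the second file name is just for illustration):

import torch

# Option 1 (used here): save only the parameters
torch.save(net.state_dict(), './cifar_net.pth')
net2 = MyNet()  # rebuild the architecture first
net2.load_state_dict(torch.load('./cifar_net.pth'))

# Option 2: pickle the entire module object
torch.save(net, './cifar_net_full.pth')    # hypothetical file name
net3 = torch.load('./cifar_net_full.pth')  # the MyNet class must still be importable;
                                           # newer PyTorch versions may also require weights_only=False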
Change the epoch count in the program above to 10, then run the following program (CIFAR_10_Classifier_Self_2.py). Note that importing MyNet from the first script executes that whole script again (it has no if __name__ == '__main__' guard), which is why the training log reappears in the output below:
import torch
from CIFAR_10_Classifier_Self_1 import MyNet
import torchvision.transforms as transforms
from PIL import Image
net = MyNet()
PATH = './cifar_net.pth'
net.load_state_dict(torch.load(PATH))
net.eval()
transform = transforms.Compose([
    transforms.Resize((32, 32)),  # the network expects 32x32 inputs; other sizes raise an error
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# read image
def image_loader(image_name):
    image = Image.open(image_name)
    image = transform(image).unsqueeze(0)  # add a batch dimension in front (shape [C,H,W] -> [1,C,H,W])
    return image

def classify_image(image_path):
    image = image_loader(image_path)
    outputs = net(image)
    _, predicted = torch.max(outputs, 1)
    return predicted.item()
image_path = './images/ALPINA B3.jpg'
predicted_class = classify_image(image_path)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
print('Predicted Class:', classes[predicted_class])
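The unsqueeze(0) in image_loader matters because the network expects a 4-D (N, C, H, W) batch, not a single 3-D image. A quick illustration of the shape change:

import torch

t = torch.rand(3, 32, 32)    # a single CHW image tensor
print(t.shape)               # torch.Size([3, 32, 32])
print(t.unsqueeze(0).shape)  # torch.Size([1, 3, 32, 32]) -- a batch of one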
For the input image (ALPINA B3.jpg), the output is as follows:
C:\Users\dell\anaconda3\envs\pytorch\python.exe C:\Users\dell\Desktop\2024Summer\project1\learn_pytorch\pythonProject3\CIFAR_10_Classifier_Self_2.py
Files already downloaded and verified
Files already downloaded and verified
[1, 2000] loss: 2.227
[1, 4000] loss: 1.900
[1, 6000] loss: 1.696
[1, 8000] loss: 1.586
[1, 10000] loss: 1.518
[1, 12000] loss: 1.440
[2, 2000] loss: 1.386
[2, 4000] loss: 1.351
[2, 6000] loss: 1.342
[2, 8000] loss: 1.312
[2, 10000] loss: 1.293
[2, 12000] loss: 1.246
[3, 2000] loss: 1.194
[3, 4000] loss: 1.199
[3, 6000] loss: 1.180
[3, 8000] loss: 1.175
[3, 10000] loss: 1.150
[3, 12000] loss: 1.154
[4, 2000] loss: 1.070
[4, 4000] loss: 1.088
[4, 6000] loss: 1.099
[4, 8000] loss: 1.069
[4, 10000] loss: 1.101
[4, 12000] loss: 1.082
[5, 2000] loss: 0.988
[5, 4000] loss: 1.013
[5, 6000] loss: 1.024
[5, 8000] loss: 1.040
[5, 10000] loss: 1.033
[5, 12000] loss: 1.045
[6, 2000] loss: 0.944
[6, 4000] loss: 0.957
[6, 6000] loss: 0.978
[6, 8000] loss: 0.990
[6, 10000] loss: 0.976
[6, 12000] loss: 0.998
[7, 2000] loss: 0.891
[7, 4000] loss: 0.931
[7, 6000] loss: 0.945
[7, 8000] loss: 0.934
[7, 10000] loss: 0.936
[7, 12000] loss: 0.936
[8, 2000] loss: 0.851
[8, 4000] loss: 0.898
[8, 6000] loss: 0.907
[8, 8000] loss: 0.898
[8, 10000] loss: 0.890
[8, 12000] loss: 0.911
[9, 2000] loss: 0.809
[9, 4000] loss: 0.851
[9, 6000] loss: 0.856
[9, 8000] loss: 0.869
[9, 10000] loss: 0.888
[9, 12000] loss: 0.903
[10, 2000] loss: 0.796
[10, 4000] loss: 0.812
[10, 6000] loss: 0.825
[10, 8000] loss: 0.857
[10, 10000] loss: 0.865
[10, 12000] loss: 0.862
Finished Training
Predicted Class: car
Process finished with exit code 0
As the output shows, the image is classified correctly.
(With epoch = 2 earlier, the image was misclassified as ship, presumably because the parameters were still far from good values after so little training; even at epoch = 10 there is no guarantee that a single prediction will be correct.)
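A single image is a weak test in any case; accuracy over the whole test set is more informative. A minimal sketch, reusing net and test_loader from the training script (this evaluation loop is not part of the original code):

import torch

net.eval()
correct, total = 0, 0
with torch.no_grad():  # no gradients needed for evaluation
    for images, labels in test_loader:
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy on the 10000 test images: {100 * correct / total:.1f} %')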
MNIST Handwritten Digit Recognition (LeNet)
import torch
import numpy as np
from matplotlib import pyplot as plt
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
# Fetch the dataset
training_set = datasets.MNIST(root='./minst', train=True, transform=transform, download=True)
train_loader = DataLoader(dataset=training_set, batch_size=64, shuffle=True)
test_set = datasets.MNIST(root='./minst', train=False, transform=transform, download=True)
test_loader = DataLoader(dataset=test_set, batch_size=64, shuffle=True)
# show the dataset *
fig = plt.figure()
for i in range(12):
    plt.subplot(3, 4, i+1)  # arguments: number of subplot rows, number of columns, index of this subplot
    plt.tight_layout()
    plt.imshow(training_set.data[i], cmap='gray', interpolation='none')
    plt.title("Label: {}".format(training_set.targets[i]))
    plt.xticks([])
    plt.yticks([])
plt.show()
class MyLeNet5(nn.Module):
    def __init__(self):
        super(MyLeNet5, self).__init__()
        self.c1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2)
        self.Sigmoid = nn.Sigmoid()
        self.s2 = nn.AvgPool2d(kernel_size=2)
        self.c3 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.s4 = nn.AvgPool2d(kernel_size=2)
        self.c5 = nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5)
        self.flatten = nn.Flatten()
        self.f6 = nn.Linear(in_features=120, out_features=84)
        self.output = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.Sigmoid(self.c1(x))
        x = self.s2(x)
        x = self.Sigmoid(self.c3(x))
        x = self.s4(x)
        x = self.c5(x)
        x = self.flatten(x)
        x = self.f6(x)
        x = self.output(x)
        return x
model = MyLeNet5()
# loss function & optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01, momentum=0.5)  # momentum term
# train
def train(epoch):  # epoch is passed in only for printing
    running_loss = 0.0
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):  # number the batches, starting at 0
        inputs, targets = data  # inputs and targets are batched tensors
        optimizer.zero_grad()  # clear the gradients left over from the previous step
        outputs = model(inputs)
        loss = criterion(outputs, targets)  # compare the outputs with the ground-truth labels
        loss.backward()
        optimizer.step()  # update the network parameters
        running_loss += loss.item()  # .item(): extract the scalar from a one-element tensor (Tensor to int or float)
        _, predicted = torch.max(outputs.data, dim=1)  # the class with the highest score is the prediction
        # dim=0 gives the index of the max in each column; dim=1 gives the index of the max in each row
        running_total += inputs.shape[0]  # .shape[0]: the size of the first dimension, i.e. the batch size
        running_correct += (predicted == targets).sum().item()
        if batch_idx % 300 == 299:
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0
            running_total = 0
            running_correct = 0
# test *
def test(epoch):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = correct / total
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch + 1, 10, 100 * acc))
    return acc
acc_list_test = []
for epoch in range(10):
    train(epoch)
    acc_test = test(epoch)
    acc_list_test.append(acc_test)
plt.plot(acc_list_test)
plt.xlabel('Epoch')
plt.ylabel('Accuracy on Test Set')
plt.show()
Output:
C:\Users\dell\anaconda3\envs\pytorch\python.exe C:\Users\dell\Desktop\2024Summer\project1\learn_pytorch\pythonProject3\MINST_1.py
[1, 300]: loss: 2.302 , acc: 11.04 %
[1, 600]: loss: 2.297 , acc: 11.97 %
[1, 900]: loss: 2.287 , acc: 15.71 %
[1 / 10]: Accuracy on test set: 26.9 %
[2, 300]: loss: 2.212 , acc: 27.77 %
[2, 600]: loss: 1.695 , acc: 50.16 %
[2, 900]: loss: 0.959 , acc: 71.32 %
[2 / 10]: Accuracy on test set: 77.0 %
[3, 300]: loss: 0.655 , acc: 79.78 %
[3, 600]: loss: 0.558 , acc: 82.46 %
[3, 900]: loss: 0.503 , acc: 84.27 %
[3 / 10]: Accuracy on test set: 86.3 %
[4, 300]: loss: 0.431 , acc: 86.77 %
[4, 600]: loss: 0.413 , acc: 87.64 %
[4, 900]: loss: 0.391 , acc: 88.16 %
[4 / 10]: Accuracy on test set: 89.3 %
[5, 300]: loss: 0.361 , acc: 89.16 %
[5, 600]: loss: 0.354 , acc: 89.33 %
[5, 900]: loss: 0.330 , acc: 89.92 %
[5 / 10]: Accuracy on test set: 90.4 %
[6, 300]: loss: 0.322 , acc: 90.38 %
[6, 600]: loss: 0.322 , acc: 90.20 %
[6, 900]: loss: 0.306 , acc: 90.59 %
[6 / 10]: Accuracy on test set: 91.5 %
[7, 300]: loss: 0.296 , acc: 90.87 %
[7, 600]: loss: 0.293 , acc: 91.23 %
[7, 900]: loss: 0.290 , acc: 91.17 %
[7 / 10]: Accuracy on test set: 92.1 %
[8, 300]: loss: 0.279 , acc: 91.47 %
[8, 600]: loss: 0.274 , acc: 91.71 %
[8, 900]: loss: 0.263 , acc: 92.03 %
[8 / 10]: Accuracy on test set: 92.6 %
[9, 300]: loss: 0.252 , acc: 92.34 %
[9, 600]: loss: 0.253 , acc: 92.16 %
[9, 900]: loss: 0.250 , acc: 92.61 %
[9 / 10]: Accuracy on test set: 93.3 %
[10, 300]: loss: 0.240 , acc: 92.69 %
[10, 600]: loss: 0.230 , acc: 93.27 %
[10, 900]: loss: 0.230 , acc: 93.03 %
[10 / 10]: Accuracy on test set: 93.6 %
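As an aside, the Normalize values (0.1307,) and (0.3081,) used in the transform above are the commonly cited mean and standard deviation of the MNIST training pixels. A minimal sketch to verify them (assuming the dataset has already been downloaded to ./minst):

import torch
from torchvision import datasets

raw = datasets.MNIST(root='./minst', train=True, download=True)
data = raw.data.float() / 255.0               # pixel values scaled to [0, 1], shape (60000, 28, 28)
print(data.mean().item(), data.std().item())  # approximately 0.1307 and 0.3081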
Questions
What is the difference between import torchvision and from torchvision import *?
import torchvision
# access the transforms module via the package name
transform = torchvision.transforms.Compose([...])
from torchvision import *
# use names from torchvision directly (if torchvision exposes them publicly)
# note: torchvision does not actually expose everything this way; this is only for illustration
transform = Compose([...])
However, the second approach risks naming conflicts. It also does not import everything: "private" names starting with an underscore are normally left out of a star import.
For these reasons, import torchvision is the recommended style.
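To make the naming-conflict hazard concrete, a minimal sketch using torch itself (exactly which names a star import pulls in depends on the module's __all__):

import torch

print(max(1, 2))  # the built-in max: prints 2

from torch import *  # the star import brings in torch.max, shadowing the built-in
# max(1, 2)          # would now call torch.max, which expects tensors, not plain ints
print(max(torch.tensor([1, 2])))  # tensor(2) -- torch.max has taken over the name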
What does a DataLoader return, and why is it sometimes wrapped in enumerate()?
It returns an object that implements __iter__, so it can be iterated with a for loop, or turned into an iterator to peek at the first batch.
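For example, to peek at a single batch without writing a loop (a minimal sketch reusing the MNIST train_loader from above):

images, labels = next(iter(train_loader))  # take the first batch from the DataLoader
print(images.shape)  # torch.Size([64, 1, 28, 28]) for the MNIST loader above
print(labels.shape)  # torch.Size([64])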
# train
def train(epoch):
    ...
    for batch_idx, data in enumerate(train_loader, 0):
        ...
In the train() snippet above, train_loader is wrapped in enumerate so that each batch gets a running index, batch_idx (used here to print the training stats every 300 batches).
An example of Python's enumerate():
>>> seq = ['one', 'two', 'three']
>>> for i, element in enumerate(seq):
...     print(i, element)
...
0 one
1 two
2 three
What does torch.max() do?
Usage:
import torch
output = torch.tensor([[1, 2, 3], [3, 4, 5]])
predict = torch.max(output, dim=0)  # dim=0: max over each column
print(predict)
predict = torch.max(output, dim=1)  # dim=1: max over each row
print(predict)
Output:
torch.return_types.max(
values=tensor([3, 4, 5]),
indices=tensor([1, 1, 1]))
torch.return_types.max(
values=tensor([3, 5]),
indices=tensor([2, 2]))
The indices are zero-based.
What is the underscore for in _, predicted = torch.max(outputs.data, dim=1)?
The underscore is a placeholder name: it signals that the value (here, the max values themselves) is intentionally ignored.
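If both parts are needed, simply bind both names instead of discarding one; a minimal sketch:

import torch

outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])
values, indices = torch.max(outputs, dim=1)  # keep both the max scores and their class indices
print(values)   # tensor([0.7000, 0.6000])
print(indices)  # tensor([1, 0])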