Today I practiced handwritten digit recognition with a convolutional neural network. This is a classic early AI application that most readers will know: you hand the AI images like these, and it recognizes the digits with very high accuracy.
The two figures above show the final result; as you can see, accuracy is close to 100%.
Let's get down to business. First, we write our own class defining the convolutional neural network model, as shown below:
import torch.nn
import torchvision
import torch.utils.data


class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # in 28 x 28 x 1
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=16),
            torch.nn.ReLU()
        )
        # in 26 x 26 x 16
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # in 12 x 12 x 32
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=64),
            torch.nn.ReLU()
        )
        # in 10 x 10 x 64
        self.layer4 = torch.nn.Sequential()
        self.layer4.add_module('conv2d4', torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3))
        self.layer4.add_module('bn4', torch.nn.BatchNorm2d(128))
        self.layer4.add_module('relu4', torch.nn.ReLU())
        self.layer4.add_module('maxpool4', torch.nn.MaxPool2d(kernel_size=2, stride=2))
        # in 4 x 4 x 128
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_features=128 * 4 * 4, out_features=1024),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=1024, out_features=128),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # print(x.size())
        # both forms below are correct; what matters is that the shape matches the fc layer's input
        # x = x.view(x.size(0), -1)
        x = x.view(-1, 128 * 4 * 4)
        # print(x.size())
        x = self.fc(x)
        return x
Let me explain. CNN is just the name I gave the class; it is descriptive, but it is not a keyword, and you can pick a different name. The network has four convolutional layers, layer1 through layer4; layers 2 and 4 each end with a pooling layer, and ReLU is the activation function throughout. Everything is wrapped with torch.nn.Sequential(). layer4 is written differently from the other three simply because I wanted to practice both styles of building a Sequential in PyTorch. Note that the first line of the constructor __init__(self) should be super().__init__(). This is the Python 3 style; most tutorials and books still use the older style, which passes self and the class name inside super(). In Python 3 you can leave them out, which makes the code simpler. self.fc is the fully connected head: the 2048 features coming out of the four convolutional layers are first reduced to 1024, then to 128, and finally to 10 (because we are recognizing 10 digits, i.e. 10 classes). The input images are grayscale, i.e. single-channel, so the input is 1 x 28 x 28; the input size of each layer is written in the code comments. The output size of a convolutional layer is:
(input width (or height) - kernel width (or height) + 2 x padding) / stride + 1
The output size of a pooling layer is:
(input width (or height) - kernel width (or height)) / stride + 1
The pooling formula looks just like the convolution one, only without the 2 x padding term. (Here the convolutions use stride 1, and the pools use a 2 x 2 kernel with stride 2.) The final convolutional output is a 128-channel 4 x 4 feature map, which then flows into the fully connected layers.
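As a sanity check, the two formulas above can be traced through the network in a few lines of plain Python. The sizes are the ones from the code comments; the floor division mirrors how PyTorch rounds:

```python
def conv_out(size, kernel, padding=0, stride=1):
    # (W - K + 2P) / S + 1
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, kernel, stride):
    # same formula, just without the padding term
    return (size - kernel) // stride + 1

# trace the spatial size through the four layers
s = 28
s = conv_out(s, 3)     # layer1 conv -> 26
s = conv_out(s, 3)     # layer2 conv -> 24
s = pool_out(s, 2, 2)  # layer2 pool -> 12
s = conv_out(s, 3)     # layer3 conv -> 10
s = conv_out(s, 3)     # layer4 conv -> 8
s = pool_out(s, 2, 2)  # layer4 pool -> 4
print(s)  # 4, matching the 128 * 4 * 4 input of self.fc
```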
In the forward function, the first four lines chain the four convolutional layers together. Then comes the view call, for which both forms are correct. The active form pins the number of columns to 128 * 4 * 4 (2048) and leaves the number of rows unconstrained; the commented-out form only pins the number of rows to the batch size, in which case the column count also works out to 2048. In other words, there are different ways to guarantee the column count is 2048 (128 * 4 * 4), which must match the 2048 in_features of the first fully connected layer. (If this is not clear, uncomment the print(x.size()) lines, run the code, and look at the output; then you will get it.)
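A quick sketch showing that the two view forms produce the same result, using a random tensor shaped like layer4's output instead of real data:

```python
import torch

# a fake batch shaped like layer4's output: (batch, channels, height, width)
x = torch.randn(64, 128, 4, 4)

a = x.view(x.size(0), -1)    # pin the batch dimension, infer the columns
b = x.view(-1, 128 * 4 * 4)  # pin the columns, infer the batch dimension
print(a.shape, b.shape)      # both come out as (64, 2048)
```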
Next comes instantiating the model and setting up the loss function and optimizer:
net = CNN()
my_optim = torch.optim.Adam(net.parameters())
my_loss = torch.nn.CrossEntropyLoss()
Here I used the Adam optimizer, which is said to work quite well. The loss function is cross entropy, the usual choice for classification problems (regression problems typically use mean squared error, MSE).
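One detail worth knowing: CrossEntropyLoss takes raw logits (no softmax needed) together with integer class labels. A minimal sketch with made-up numbers:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()
logits = torch.randn(4, 10)          # 4 samples, 10 classes, raw scores
labels = torch.tensor([3, 0, 7, 1])  # the true digit for each sample
loss = loss_fn(logits, labels)       # a single scalar, averaged over the batch
print(loss.item())
```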
Next, use the GPU if one is available:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
Everyone writes it this way.
Define the number of epochs (how many times the whole dataset is trained on):
max_epoch = 5
Get the data:
data_train = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize(mean=0.5, std=0.5)]
), train=True, download=False)
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
This is the standard way to write it, so I won't dwell on it. Looking at data_train: root is the path to the data, and transforms.Compose lists the operations applied to each sample (augmentation, scaling, rotation, and so on). Nothing fancy here: the images are converted to tensors and normalized. train=True marks this as the training set, and download=False means the data is already on disk; if it is not (e.g. the first time you run the code), change False to True to download the images.
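As an aside, Normalize computes (x - mean) / std per channel, so with mean=0.5 and std=0.5 it maps ToTensor's [0, 1] range onto [-1, 1]. The arithmetic, sketched without torchvision:

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])  # ToTensor output lives in [0, 1]
y = (x - 0.5) / 0.5                # what Normalize(mean=0.5, std=0.5) computes
print(y)                           # tensor([-1., 0., 1.])
```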
Wrap the data in loaders:
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
This packs the data so that a fixed number of images (by now tensors) travel together. A concrete analogy: torchvision.datasets takes the goods off the shelf, and torch.utils.data.DataLoader packs them into boxes, say 64 items per box.
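The boxing analogy can be sketched with a stand-in dataset (random tensors instead of MNIST, so it runs without any data on disk). Ten items packed four per box yield boxes of 4, 4, and 2:

```python
import torch
import torch.utils.data

# a stand-in dataset: 10 fake 1x28x28 "images" with random labels
images = torch.randn(10, 1, 28, 28)
labels = torch.randint(0, 10, (10,))
dataset = torch.utils.data.TensorDataset(images, labels)

loader = torch.utils.data.DataLoader(dataset=dataset, batch_size=4, shuffle=True)
shapes = [batch_images.shape for batch_images, batch_labels in loader]
print(shapes)  # batches of 4, 4, and finally 2
```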
Training:
for epoch in range(max_epoch):
    running_loss = 0.0
    running_correct = 0
    start_time = time.time()
    print("Epoch {}/{}".format(epoch, max_epoch))
    print("-" * 10)
    for data in data_loader_train:
        images_train, data_label_train = data
        # .to(device) returns a new tensor, so assign the result back
        images_train = images_train.to(device)
        data_label_train = data_label_train.to(device)
        outputs = net(images_train)
        _, pred = torch.max(outputs.detach(), 1)
        my_optim.zero_grad()
        loss = my_loss(outputs, data_label_train)
        loss.backward()
        my_optim.step()
        running_loss += loss.detach()
        running_correct += torch.sum(pred == data_label_train)
    testing_correct = 0
    for data in data_loader_test:
        images_test, data_label_test = data
        images_test = images_test.to(device)
        data_label_test = data_label_test.to(device)
        outputs = net(images_test)
        _, pred = torch.max(outputs.detach(), dim=1)
        testing_correct += torch.sum(pred == data_label_test)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss / len(data_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
    end_time = time.time()
    print("Elapsed:", end_time - start_time)
The outer for loop runs over epochs: all the data is trained on 5 times in total. The inner for loop hands the model 64 samples at a time (our batch_size is 64, so one step feeds the model 64 images), and outputs likewise holds 64 results. The input is 4-dimensional: B, C, H, W. B is the number of images in the batch, H and W are (as you would expect) the image height and width, and C is the number of channels. The exact order of C, H, W may differ between networks, but B is almost always dimension 0, at the front. The time calls just record elapsed time and the print calls are for humans, so neither is essential; the running_-prefixed variables only accumulate statistics.

The line _, pred = torch.max(outputs.detach(), dim=1) deserves an explanation. torch.max returns two things: the first is the maximum value itself, the second is its index. The underscore receives the actual maximum value, which we do not care about here. Why an underscore? By convention, a value you do not need gets an underscore (you could use any other name, just as temporary variables are commonly called temp). The argument dim=1 means the maximum is taken along dimension 1, i.e. across each row; dim=0 would take it along dimension 0, i.e. across each column. Since outputs here is 64 rows by 10 columns, we end up with 64 numbers, each one the index of the largest of the 10 class scores (the largest score means the highest probability). Note: do not misread this as comparing the first argument with 1; writing dim= in front of the 1 is a kindness to the reader.

The loss and my_optim lines are all routine, so I will not go over them. The second inner for loop is for testing; the first one, discussed above, is for training. Note that many tensors need .to(device) to run on the GPU, and .to(device) returns a new tensor rather than modifying in place, so the result must be assigned back. (There is a shortcut that makes CUDA tensors the default type, but I did not use it here.)
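To see torch.max's two return values and the effect of dim=1 concretely, here is a tiny sketch with a made-up 2 x 3 score matrix standing in for outputs:

```python
import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],
                        [1.2, 0.0, 3.4]])  # 2 samples, 3 "classes"
values, pred = torch.max(outputs, dim=1)   # max along dim 1: one result per row
print(values)  # tensor([2.5000, 3.4000]) -- the maximum values themselves
print(pred)    # tensor([1, 2]) -- their indices, i.e. the predicted classes
```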
Save the model:
torch.save(net.state_dict(), "net_train.pth")
The second argument is the file name; the extension does not have to be pth, any name works. If you do not save, the training was for nothing.
For testing, first load the model:
net = CNN.CNN()
p = torch.load("net_train.pth")
net.load_state_dict(p)
First instantiate a network (CNN), then call torch.load, and hand the loaded parameters to load_state_dict. There is more than one way to write this. The point is to run the test (or rather, the application) with the model we just trained.
Testing:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
start_time = time.time()
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
images_test, data_label_test = next(iter(data_loader_test))
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
outputs = outputs.view(8, -1)
data_label_test = data_label_test.view(8, -1)
print("Predict Label is:", [i for i in outputs.detach()])
print("Real Label is:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)
end_time = time.time()
print("Elapsed:", end_time - start_time)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()
Note: the img lines near the end exist only to show the pictures to a human. img.numpy().transpose swaps the dimension order, moving the channel axis from first to last; this is the HWC vs CHW question mentioned earlier. torchvision.utils.make_grid tiles the images into a grid, so the 64 images are displayed as 8 x 8; look up the details if you are curious. I have written more than 6,600 characters by this point and am running out of steam. As shown:
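The CHW-to-HWC transpose can be sketched with a zero array standing in for the real grid:

```python
import numpy as np

chw = np.zeros((3, 28, 28))   # make_grid output: channels first
hwc = chw.transpose(1, 2, 0)  # move the channel axis to the end for imshow
print(chw.shape, hwc.shape)   # (3, 28, 28) vs (28, 28, 3)
```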
This example is split across three files: CNN.py, main.py, and test.py. CNN.py defines my CNN class, main.py runs the training, and test.py runs the test.
CNN.py:
import torch.nn
import torchvision
import torch.utils.data


class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # in 28 x 28 x 1
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=16),
            torch.nn.ReLU()
        )
        # in 26 x 26 x 16
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # in 12 x 12 x 32
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=64),
            torch.nn.ReLU()
        )
        # in 10 x 10 x 64
        self.layer4 = torch.nn.Sequential()
        self.layer4.add_module('conv2d4', torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3))
        self.layer4.add_module('bn4', torch.nn.BatchNorm2d(128))
        self.layer4.add_module('relu4', torch.nn.ReLU())
        self.layer4.add_module('maxpool4', torch.nn.MaxPool2d(kernel_size=2, stride=2))
        # in 4 x 4 x 128
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_features=128 * 4 * 4, out_features=1024),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=1024, out_features=128),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # print(x.size())
        # both forms below are correct; what matters is that the shape matches the fc layer's input
        # x = x.view(x.size(0), -1)
        x = x.view(-1, 128 * 4 * 4)
        # print(x.size())
        x = self.fc(x)
        return x
main.py:
import time
import matplotlib.pyplot as plt
import torchvision
import torch.utils.data
from CNN import CNN

net = CNN()
my_optim = torch.optim.Adam(net.parameters())
my_loss = torch.nn.CrossEntropyLoss()
max_epoch = 1
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
data_train = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize(mean=0.5, std=0.5)]
), train=True, download=False)
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
for epoch in range(max_epoch):
    running_loss = 0.0
    running_correct = 0
    start_time = time.time()
    print("Epoch {}/{}".format(epoch, max_epoch))
    print("-" * 10)
    for data in data_loader_train:
        images_train, data_label_train = data
        # .to(device) returns a new tensor, so assign the result back
        images_train = images_train.to(device)
        data_label_train = data_label_train.to(device)
        outputs = net(images_train)
        _, pred = torch.max(outputs.detach(), 1)
        my_optim.zero_grad()
        loss = my_loss(outputs, data_label_train)
        loss.backward()
        my_optim.step()
        running_loss += loss.detach()
        running_correct += torch.sum(pred == data_label_train)
    testing_correct = 0
    for data in data_loader_test:
        images_test, data_label_test = data
        images_test = images_test.to(device)
        data_label_test = data_label_test.to(device)
        outputs = net(images_test)
        _, pred = torch.max(outputs.detach(), dim=1)
        testing_correct += torch.sum(pred == data_label_test)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}%".format(
        running_loss / len(data_train),
        100 * running_correct / len(data_train),
        100 * testing_correct / len(data_test)))
    end_time = time.time()
    print("Elapsed:", end_time - start_time)
torch.save(net.state_dict(), "net_train.fsx")
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=4, shuffle=True)
images_test, data_label_test = next(iter(data_loader_test))
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
print("Predict Label is:", [i for i in outputs.detach()])
print("Real Label is:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()
test.py:
import time
import matplotlib.pyplot as plt
import torchvision
import torch.utils.data
import CNN

net = CNN.CNN()
p = torch.load("net_train.fsx")
net.load_state_dict(p)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
start_time = time.time()
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
images_test, data_label_test = next(iter(data_loader_test))
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
outputs = outputs.view(8, -1)
data_label_test = data_label_test.view(8, -1)
print("Predict Label is:", [i for i in outputs.detach()])
print("Real Label is:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)
end_time = time.time()
print("Elapsed:", end_time - start_time)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()
If anything here is stated incorrectly or explained poorly, corrections are welcome.