A Classic CNN Application: Practicing Handwritten Digit Recognition

Today I practiced handwritten digit recognition with a convolutional neural network. This is an early and very classic AI application that most of you will recognize: given images like the ones below, the AI identifies the digit, and the recognition rate is very high.
[Figures: final training and test results]
The two figures above show the final results: accuracy is close to 100%.

Let's get started. First, we write a class that defines our own CNN model, as follows:

import torch.nn


class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # in 28 x 28 x 1
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=16),
            torch.nn.ReLU()
        )
        # in 26 x 26 x 16
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # in 12 x 12 x 32
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=64),
            torch.nn.ReLU()
        )
        # in 10 x 10 x 64
        self.layer4 = torch.nn.Sequential()
        self.layer4.add_module('conv2d4', torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3))
        self.layer4.add_module('bn4', torch.nn.BatchNorm2d(128))
        self.layer4.add_module('relu4', torch.nn.ReLU())
        self.layer4.add_module('maxpool4', torch.nn.MaxPool2d(kernel_size=2, stride=2))
        # in 4 x 4 x 128
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_features=128 * 4 * 4, out_features=1024),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=1024, out_features=128),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # print(x.size())
        # Either of the following two lines works; both flatten to match the fully connected layer
        # x = x.view(x.size(0), -1)
        x = x.view(-1, 128 * 4 * 4)
        # print(x.size())
        x = self.fc(x)
        return x

A few notes. CNN is just a name I chose because it is descriptive; you can name the class anything, it is not a keyword. The network has four convolutional blocks, layer1 through layer4; layers 2 and 4 each end with a max-pooling layer, the activation function is ReLU, and everything is wrapped in torch.nn.Sequential(). layer4 is written differently from the other three blocks on purpose: I wanted to practice both ways of building a Sequential in PyTorch. Note that the first line of the constructor __init__(self) is super().__init__(). This is the Python 3 form; many tutorials and books still use the older super(CNN, self).__init__(), which also works, but the argument-free form is simpler. self.fc is the fully connected head: the 2048 features coming out of the convolutional blocks are reduced to 1024, then to 128, and finally to 10 (we are recognizing 10 digits, so there are 10 classes). The input image is grayscale, i.e. single-channel, so the input is 1 x 28 x 28; the input size of each layer is noted in the code comments. The output size of a convolutional layer is:

(input width (or height) − kernel width (or height) + 2 × padding) / stride + 1

and the output size of a pooling layer is:

(input width (or height) − kernel width (or height)) / stride + 1

Pooling is computed much like convolution, just without the 2 × padding term. Here the convolutions use stride 1 with no padding, and the pooling layers use a 2 x 2 kernel with stride 2. The final convolutional output is a 128-channel 4 x 4 feature map, which then enters the fully connected layers.
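As a sanity check, the size formulas above can be traced through the network in a few lines of plain Python (the helper names here are made up for illustration):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # (W - K + 2P) / S + 1, floored
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, kernel, stride):
    # (W - K) / S + 1, no padding term
    return (size - kernel) // stride + 1

s = 28                    # input: 28 x 28 x 1
s = conv_out(s, 3)        # layer1 conv -> 26
s = conv_out(s, 3)        # layer2 conv -> 24
s = pool_out(s, 2, 2)     # layer2 pool -> 12
s = conv_out(s, 3)        # layer3 conv -> 10
s = conv_out(s, 3)        # layer4 conv -> 8
s = pool_out(s, 2, 2)     # layer4 pool -> 4
print(s, 128 * s * s)     # 4 2048
```

The final 128 x 4 x 4 = 2048 matches the in_features of the first Linear layer.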
In the forward function, the first four lines chain the four convolutional blocks together. Then comes a view call, and both forms shown are correct: the uncommented form pins the number of columns to 128 * 4 * 4 (2048) and lets PyTorch infer the number of rows, while the commented form pins the number of rows to the batch size and lets PyTorch infer the columns, which again come out to 2048. In other words, there is more than one way to guarantee that the flattened width is 2048, and it must match the 2048 input features of the fully connected layer. (If this is not clear, uncomment the print(x.size()) lines, run the code, and look at the output.)
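The two flattening idioms can be compared directly on a dummy tensor shaped like layer4's output:

```python
import torch

x = torch.randn(64, 128, 4, 4)       # fake layer4 output: a batch of 64
a = x.view(x.size(0), -1)            # pin the batch dim, infer the columns
b = x.view(-1, 128 * 4 * 4)          # pin the columns, infer the batch dim
print(a.shape, b.shape)              # both torch.Size([64, 2048])
print(torch.equal(a, b))             # True
```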
Next, instantiate the model and define the optimizer and loss function:

net = CNN()
my_optim = torch.optim.Adam(net.parameters())
my_loss = torch.nn.CrossEntropyLoss()

Here I use the Adam optimizer, a popular adaptive-learning-rate method that generally works well with its defaults. The loss function is cross-entropy, the usual choice for classification problems; regression problems typically use mean squared error (MSE) instead.
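One detail worth knowing about CrossEntropyLoss: it takes the raw network outputs (logits, no softmax needed, since log-softmax is applied internally) and integer class labels, not one-hot vectors. A small illustration on made-up numbers:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()
logits = torch.randn(4, 10)           # raw scores for 4 samples, 10 classes
labels = torch.tensor([3, 0, 7, 1])   # class indices, not one-hot vectors
loss = loss_fn(logits, labels)
print(loss.item())                    # a single positive scalar
```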
Next, move the model to the GPU if one is available:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

This is the standard idiom everyone uses.
Define the number of epochs (how many times the entire dataset is trained):

max_epoch = 5

Load the data:

data_train = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize(mean=0.5, std=0.5)]
), train=True, download=False)

data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)

This is the standard way to write it. Looking at data_train: root is the path to the data, and transforms.Compose lists the operations applied to each sample (augmentation, scaling, rotation, and so on). Nothing fancy is done here: the images are converted to tensors and normalized. train=True marks this as the training set, and download=False means the data is already on disk; if it is not (e.g. on the first run), change False to True to download the image data.
Wrap the datasets in data loaders:

data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)

This batches the data: a fixed number of images (by now they are tensors) are grouped together. A concrete analogy: torchvision.datasets takes the goods off the shelf, and torch.utils.data.DataLoader packs them into boxes, here 64 items per box.
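The packing can be observed on a stand-in dataset (random tensors in place of MNIST, so nothing needs to be downloaded):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in for MNIST: 100 fake 1 x 28 x 28 images with integer labels.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)   # torch.Size([64, 1, 28, 28])  -> B, C, H, W
print(batch_labels.shape)   # torch.Size([64])
```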

Training:

for epoch in range(max_epoch):

    running_loss = 0.0
    running_correct = 0
    start_time = time.time()
    print("Epoch {}/{}".format(epoch, max_epoch))
    print("-" * 10)
    for data in data_loader_train:
        images_train, data_label_train = data
        # .to(device) returns a new tensor, so assign it back; calling it alone does nothing
        images_train = images_train.to(device)
        data_label_train = data_label_train.to(device)
        outputs = net(images_train)
        _, pred = torch.max(outputs.detach(), dim=1)
        my_optim.zero_grad()
        loss = my_loss(outputs, data_label_train)
        loss.backward()
        my_optim.step()
        running_loss += loss.detach()
        running_correct += torch.sum(pred == data_label_train)

    testing_correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in data_loader_test:
            images_test, data_label_test = data
            images_test = images_test.to(device)
            data_label_test = data_label_test.to(device)
            outputs = net(images_test)
            _, pred = torch.max(outputs, dim=1)
            testing_correct += torch.sum(pred == data_label_test)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}".format(running_loss / len(data_train),
                                                                                    100 * running_correct / len(
                                                                                        data_train),
                                                                                    100 * testing_correct / len(
                                                                                        data_test)))
    end_time = time.time()
    print("用时:", end_time - start_time)

The outer for loop runs over epochs: the entire dataset is trained max_epoch times. The inner for loop pulls 64 samples at a time (our batch_size is 64, so the model sees 64 images per step), and the corresponding outputs also contain 64 rows. The input tensor is four-dimensional, B, C, H, W: B is the number of images in the batch, H and W are the image height and width, and C is the number of channels. The exact order of C, H, W can differ between frameworks, but B is almost always dimension 0, at the front. The time calls just measure elapsed time, the prints are for humans, and the running_ variables only accumulate statistics.

One line worth explaining is _, pred = torch.max(outputs.detach(), dim=1). torch.max returns two things: the first is the maximum values themselves, and the second is the indices where they occur. The underscore receives the maximum values, which we do not care about here; by convention, an unused result is assigned to an underscore (any other name would also work, much like calling a throwaway variable temp). The argument dim=1 means the maximum is taken along dimension 1, i.e. the maximum of each row; dim=0 would instead take the maximum of each column. Since outputs has 64 rows and 10 columns, we get 64 indices, one per image: the class with the largest score, i.e. the highest-probability digit. Note that this is not comparing the first argument against the number 1, which is why writing dim= explicitly is friendly to the reader.

The loss and my_optim lines are standard training boilerplate, so I will not dwell on them. The second inner for loop is for testing; the first one, described above, is for training. Note that many tensors must be moved to the GPU with .to(device), and since .to returns a new tensor, the result must be assigned back. (Alternatively you can make CUDA the default tensor type, but I did not do that here.)
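The behavior of torch.max with dim=1 is easy to see on a tiny hand-written example:

```python
import torch

outputs = torch.tensor([[0.1, 2.5, 0.3],
                        [1.7, 0.2, 0.9]])    # 2 samples, 3 classes
values, indices = torch.max(outputs, dim=1)  # max over each row
print(values)    # tensor([2.5000, 1.7000])  the largest scores
print(indices)   # tensor([1, 0])            the predicted class labels
```

With dim=0 the same call would instead return the maximum of each column.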
Save the model:

torch.save(net.state_dict(), "net_train.pth")

The second argument is the file name; the extension does not have to be .pth, any name works. If you do not save, the training is wasted.
For testing, the model is loaded first:

net = CNN.CNN()
p = torch.load("net_train.pth")
net.load_state_dict(p)

First instantiate a network (CNN), then call torch.load, and pass the result to load_state_dict; there is more than one way to write this. The point is to run the test (or the application) with the weights we just trained.
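The save/load round trip can be verified on a tiny stand-in module (the file name here is made up for illustration):

```python
import torch

net = torch.nn.Linear(4, 2)                  # tiny stand-in for the real CNN
torch.save(net.state_dict(), "net_demo.pth")

net2 = torch.nn.Linear(4, 2)                 # fresh, randomly initialized copy
net2.load_state_dict(torch.load("net_demo.pth"))

x = torch.randn(1, 4)
print(torch.equal(net(x), net2(x)))          # True: the loaded weights match exactly
```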
Testing:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
start_time = time.time()
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)

images_test, data_label_test = next(iter(data_loader_test))
data_label_test = data_label_test.to(device)
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
outputs = outputs.view(8, -1)
data_label_test = data_label_test.view(8, -1)
print("Predict Label is:", [i for i in outputs.detach()])
print("真实标签:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)
end_time = time.time()
print("用时:", end_time - start_time)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()

Note: the lines starting from img exist only for human inspection. img.numpy().transpose swaps the dimensions, moving the channel axis from first to last, i.e. from CHW to the HWC layout that matplotlib expects, as discussed above. torchvision.utils.make_grid tiles the images into a grid, so the 64 images are displayed as 8 x 8; the details are worth looking up. The result looks like this:
[Figure: 8 x 8 grid of test digits with their predicted labels]

The example is split into three files: CNN.py, main.py, and test.py. CNN.py defines my CNN class, main.py runs the training, and test.py runs the testing.
CNN.py:

import torch.nn


class CNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # in 28 x 28 x 1
        self.layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=16),
            torch.nn.ReLU()
        )
        # in 26 x 26 x 16
        self.layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=32),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # in 12 x 12 x 32
        self.layer3 = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3),
            torch.nn.BatchNorm2d(num_features=64),
            torch.nn.ReLU()
        )
        # in 10 x 10 x 64
        self.layer4 = torch.nn.Sequential()
        self.layer4.add_module('conv2d4', torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3))
        self.layer4.add_module('bn4', torch.nn.BatchNorm2d(128))
        self.layer4.add_module('relu4', torch.nn.ReLU())
        self.layer4.add_module('maxpool4', torch.nn.MaxPool2d(kernel_size=2, stride=2))
        # in 4 x 4 x 128
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(in_features=128 * 4 * 4, out_features=1024),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=1024, out_features=128),
            torch.nn.ReLU(),
            torch.nn.Linear(in_features=128, out_features=10)
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # print(x.size())
        # Either of the following two lines works; both flatten to match the fully connected layer
        # x = x.view(x.size(0), -1)
        x = x.view(-1, 128 * 4 * 4)
        # print(x.size())
        x = self.fc(x)
        return x

main.py:

import time
import matplotlib.pyplot as plt
import torchvision
import torch.utils.data
from CNN import CNN

net = CNN()
my_optim = torch.optim.Adam(net.parameters())
my_loss = torch.nn.CrossEntropyLoss()
max_epoch = 1
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

data_train = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(),
     torchvision.transforms.Normalize(mean=0.5, std=0.5)]
), train=True, download=False)

data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)

data_loader_train = torch.utils.data.DataLoader(dataset=data_train, batch_size=64, shuffle=True)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)
for epoch in range(max_epoch):

    running_loss = 0.0
    running_correct = 0
    start_time = time.time()
    print("Epoch {}/{}".format(epoch, max_epoch))
    print("-" * 10)
    for data in data_loader_train:
        images_train, data_label_train = data
        # .to(device) returns a new tensor, so assign it back; calling it alone does nothing
        images_train = images_train.to(device)
        data_label_train = data_label_train.to(device)
        outputs = net(images_train)
        _, pred = torch.max(outputs.detach(), dim=1)
        my_optim.zero_grad()
        loss = my_loss(outputs, data_label_train)
        loss.backward()
        my_optim.step()
        running_loss += loss.detach()
        running_correct += torch.sum(pred == data_label_train)

    testing_correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for data in data_loader_test:
            images_test, data_label_test = data
            images_test = images_test.to(device)
            data_label_test = data_label_test.to(device)
            outputs = net(images_test)
            _, pred = torch.max(outputs, dim=1)
            testing_correct += torch.sum(pred == data_label_test)
    print("Loss is:{:.4f},Train Accuracy is:{:.4f}%,Test Accuracy is:{:.4f}".format(running_loss / len(data_train),
                                                                                    100 * running_correct / len(
                                                                                        data_train),
                                                                                    100 * testing_correct / len(
                                                                                        data_test)))
    end_time = time.time()
    print("用时:", end_time - start_time)

torch.save(net.state_dict(), "net_train.pth")

data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=4, shuffle=True)
images_test, data_label_test = next(iter(data_loader_test))
data_label_test = data_label_test.to(device)
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
print("Predict Label is:", [i for i in outputs.detach()])
print("真实标签:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)

std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()

test.py:

import time
import matplotlib.pyplot as plt
import torchvision
import torch.utils.data
import CNN

net = CNN.CNN()
p = torch.load("net_train.pth")
net.load_state_dict(p)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
start_time = time.time()
data_test = torchvision.datasets.MNIST(root="./data/", transform=torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=0.5, std=0.5)]), train=False)
data_loader_test = torch.utils.data.DataLoader(dataset=data_test, batch_size=64, shuffle=True)

images_test, data_label_test = next(iter(data_loader_test))
data_label_test = data_label_test.to(device)
outputs = net(images_test.to(device))
_, outputs = torch.max(outputs.detach(), dim=1)
outputs = outputs.view(8, -1)
data_label_test = data_label_test.view(8, -1)
print("Predict Label is:", [i for i in outputs.detach()])
print("真实标签:", [i for i in data_label_test])
img = torchvision.utils.make_grid(images_test)
img = img.numpy().transpose(1, 2, 0)
end_time = time.time()
print("用时:", end_time - start_time)
std = [0.5, 0.5, 0.5]
mean = [0.5, 0.5, 0.5]
img = img * std + mean
plt.imshow(img)
plt.show()

If anything here is wrong or poorly explained, corrections are welcome.
