A Detailed Walkthrough of Part of the PyTorch Code for TinyMind Chinese Calligraphy Recognition

战死为止

Article link: https://blog.csdn.net/qq_34122861/article/details/106020765

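Contents

0. Preface
1 Problems encountered
2 data.py data conversion file
3 model.py network definition file
- 3.1 Defining the network structure★★★
- 3.2 Saving and loading network parameters
4 train.py network training file★★★
5 test.py network test file★★★
Conclusion
Reference links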

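0. Preface

The lab assignment for the elective course "Neural Networks and Deep Learning" is the TinyMind Chinese calligraphy recognition free practice competition (beginner difficulty).
Competition page: https://www.tinymind.cn/competitions/41#result_list

I found source code published by Link2Link.
GitHub repository: https://github.com/Link2Link/TinyMind-start-with-0
The corresponding CSDN blog post is: "Deep Learning Beginner's Guide: TinyMind Chinese Calligraphy Recognition from Scratch"
The corresponding TinyMind competition write-up is: "[Competition Experience] Deep Learning Beginner's Guide: TinyMind Chinese Calligraphy Recognition from Scratch", by Link
After changing the paths, the code runs without any other errors and scores 96+. Thanks again to the author for the source code!

When I first started working through the code on my own and was completely lost, I also skimmed through: PyTorch Dynamic Neural Networks (Morvan Python tutorials)
Bilibili link: https://www.bilibili.com/video/BV1Vx411j7kT?from=search&seid=15124761477393382413
Learning resources: https://morvanzhou.github.io/tutorials/machine-learning/torch/
Thanks to Morvan!

For installing PyTorch, see the installation steps in my previous post:
Installing PyTorch (GPU version) on Win10 with NVIDIA (CUDA + cuDNN) + Anaconda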

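1 Problems encountered

I ran into two problems right at the start, so I list them first.

1.1 NameError: name 'cv2' is not defined

OpenCV is not installed.
Code that uses OpenCV needs import cv2, which only works once the package is installed.

For the installation steps, see reference link 1.

1.1.1 Downloading OpenCV

OpenCV download page: https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv

Find the OpenCV section and download the build that matches your setup.
I recommend downloading opencv_python+contrib,
because opencv-contrib-python contains both the main modules and the extra modules. The extra modules mainly contain patented, non-free algorithms (such as SIFT feature detection) and newer algorithms that are still being tested (they are merged into the main modules once stable).
See reference link 2.

The file I downloaded is

opencv_python-4.1.2+contrib-cp36-cp36m-win_amd64.whl

cp36: the Python version installed in the PyTorch virtual environment is 3.6
win_amd64: 64-bit Windows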

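1.1.2 Installing OpenCV

Open Anaconda Prompt under Anaconda3

Activate the pytorch_gpu environment

Enter activate followed by the name of your PyTorch environment (mine is activate pytorch_gpu)
To leave the current environment, enter deactivate

Then change into the folder that contains the downloaded OpenCV file

Note: OpenCV is being installed into the pytorch_gpu environment, so activate that environment first, and only then switch to the folder where the OpenCV file is stored.

Install OpenCV with pip

pip install opencv_python-4.1.2+contrib-cp36-cp36m-win_amd64.whl
Replace the file name with the one you downloaded, and run the command to install.

1.1.3 Verifying OpenCV

In the pytorch_gpu environment, enter python to start the interpreter, then enter import cv2. If it runs without an error, the installation succeeded.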

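1.2 RuntimeError: CUDA out of memory.

The GPU ran out of memory. I use a GTX 1650 with 4 GB of video memory, which is not a very powerful card.

See reference link 3.
The fix is to reduce batch_size manually.
Changing batch_size from 512 to 256 solved the problem.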
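
As a rough illustration of why halving batch_size helps, the sketch below estimates the float32 size of the input batch and of the first convolution's output (32 channels at 64×64 for a 128×128 input, following conv1 in model.py). The helper name tensor_mib is made up for this example. It ignores everything else that lives on the GPU (weights, the other layers' activations, gradients, optimizer state), so the numbers only show how memory scales with batch size; they are not a measurement.

```python
BYTES_PER_FLOAT32 = 4


def tensor_mib(batch_size, channels, height, width):
    """Size in MiB of a float32 tensor shaped (batch, channels, height, width)."""
    n_values = batch_size * channels * height * width
    return n_values * BYTES_PER_FLOAT32 / 2**20


for batch_size in (512, 256):
    inp = tensor_mib(batch_size, 1, 128, 128)     # grayscale 128x128 input
    conv1 = tensor_mib(batch_size, 32, 64, 64)    # output of conv1 (stride 2)
    print(f"batch_size={batch_size}: input {inp:.0f} MiB, conv1 output {conv1:.0f} MiB")
```

Activation memory grows linearly with the batch size, so going from 512 to 256 halves each of these terms.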

These were the only two problems I ran into before testing the network. With them solved, we can move on.

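1.3 All libraries used

If you hit an error similar to the OpenCV one above, activate the pytorch_gpu environment and conda install the missing library.
numpy
pytorch
opencv-python
PIL (provided by the pillow package)
tqdm
pandas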
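
A quick way to see which of these libraries are still missing is to ask Python whether each import name can be found. The sketch below uses only the standard library; the helper name missing_modules is made up for this example. Note that the import names differ from some package names (pytorch is imported as torch, opencv-python as cv2, pillow as PIL).

```python
import importlib.util

# Import names, not package names.
REQUIRED_MODULES = ["numpy", "torch", "cv2", "PIL", "tqdm", "pandas"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing:", missing if missing else "none")
```

Run it inside the activated pytorch_gpu environment; anything it lists still needs to be installed.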

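2 data.py data conversion file

The function transData() converts the training set train into numpy arrays and generates data.npy and label.npy.

There are 100 folders (that is, 100 Chinese characters), each containing 400 images.
Shape of data.npy: (40000, 128, 128)
Shape of label.npy: (40000, 100)
Note: the blog post uses img_size = (256, 256), while the code downloaded from GitHub uses img_size = (128, 128).
This does not affect the overall understanding.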
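
The (40000, 100) shape of label.npy, together with the labels.max(1)[1] call in train.py, suggests that each label is stored as a one-hot row. The following plain-Python sketch shows that encoding and how the class index is recovered from it; the helper names one_hot and class_index are invented for this example and are not taken from data.py.

```python
N_CLASSES = 100  # one class per character folder


def one_hot(index, n_classes=N_CLASSES):
    """Encode a class index as a row with a single 1."""
    row = [0] * n_classes
    row[index] = 1
    return row


def class_index(one_hot_row):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(one_hot_row)), key=one_hot_row.__getitem__)


label = one_hot(42)
print(len(label), class_index(label))  # 100 42
```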

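data.py also provides loadtestdata(), a function that converts the images in the test folder into .npy format.

To feed the data to PyTorch during training, the most convenient approach is to use PyTorch's ready-made loader utilities, which requires implementing your own data.Dataset. You only need to subclass data.Dataset and override the two methods __getitem__ and __len__.
PyTorch provides the utility class torch.utils.data.DataLoader. With it, mini-batches can be prepared in parallel by several worker processes (the num_workers argument), which speeds up data loading.
For a slightly more detailed explanation, see reference link 4.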
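
To make the two-method protocol concrete, here is a plain-Python stand-in that does not import PyTorch. ToyCharacterSet and iterate_batches are hypothetical names for this sketch; a real implementation would subclass torch.utils.data.Dataset and hand the object to DataLoader, which additionally shuffles and stacks the samples into tensors.

```python
class ToyCharacterSet:
    """Stand-in for a torch.utils.data.Dataset subclass (hypothetical name)."""

    def __init__(self, images, labels):
        assert len(images) == len(labels)
        self.images = images
        self.labels = labels

    def __getitem__(self, index):
        # Return one (image, label) pair, as a Dataset would.
        return self.images[index], self.labels[index]

    def __len__(self):
        return len(self.images)


def iterate_batches(dataset, batch_size):
    """Yield lists of samples in order, a simplified view of what a loader does."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


toy = ToyCharacterSet(images=[[0.0], [0.1], [0.2], [0.3], [0.4]], labels=[0, 1, 2, 3, 4])
batch_sizes = [len(batch) for batch in iterate_batches(toy, batch_size=2)]
print(batch_sizes)  # [2, 2, 1]
```

Because the loader only relies on __getitem__ and __len__, the storage behind them (here Python lists, in data.py the .npy arrays) can be swapped freely.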

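Different training sets come in different formats and therefore need different conversion code.
I will not go into the details of this part.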

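3 model.py network definition file

This file defines the network net, as well as save_checkpoint, the function that saves the network parameters, and load_checkpoint, the function that loads them.

3.1 Defining the network structure★★★

See reference link 5.

The code, with some comments I added, is as follows: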

import torch.nn as nn
import torch.nn.functional as F


# The network is a class named net that inherits from nn.Module
class net(nn.Module):
    # Define the layers that make up the forward path
    def __init__(self):
        super(net, self).__init__()

        # Pooling
        self.pool = nn.MaxPool2d(2)

        # Dropout
        self.drop = nn.Dropout(p=0.5)

        # Conv, Norm
        self.conv1 = nn.Conv2d(1, 32, 7, stride=2, padding=3)
        self.norm1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=1, padding=1)
        self.norm2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, 3, stride=1, padding=1)
        self.norm3 = nn.BatchNorm2d(64)

        # Sequential chains several operations one after another
        self.convs = nn.Sequential(nn.Conv2d(64, 128, 3, stride=1, padding=1),
                                   nn.BatchNorm2d(128),
                                   # ReLU activation
                                   nn.ReLU(),
                                   nn.Conv2d(128, 128, 3, stride=1, padding=1),
                                   nn.BatchNorm2d(128),
                                   nn.ReLU(),
                                   )

        # FC
        self.out_layers = nn.Sequential(nn.Linear(128 * 8 * 8, 1024),
                                        nn.BatchNorm1d(1024),
                                        nn.ReLU(),
                                        nn.Linear(1024, 256),
                                        nn.BatchNorm1d(256),
                                        nn.ReLU(),
                                        nn.Linear(256, 100),
                                        nn.BatchNorm1d(100),
                                        nn.ReLU(),
                                        )

    # Forward path of the network; the backward pass is handled automatically by PyTorch
    def forward(self, x):
        x = F.relu(self.norm1(self.conv1(x)))   # Conv BN ReLU
        x = self.pool(x)                        # pooling
        x = F.relu(self.norm2(self.conv2(x)))   # Conv BN ReLU
        x = F.relu(self.norm3(self.conv3(x)))   # Conv BN ReLU
        x = self.pool(x)                        # pooling
        x = self.convs(x)                       # chained: Conv -> BN -> ReLU -> Conv -> BN -> ReLU
        x = self.pool(x)                        # pooling
        x = x.view(-1, 128 * 8 * 8)             # flatten the feature maps into a vector
        x = self.drop(x)                        # Dropout
        x = self.out_layers(x)                  # FC
        return x

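To make this easier to follow, assume the layout (n_H, n_W, n_C). The processing flow of the network is then:
[Figure: processing flow of the network; image not included]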
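
The shapes can also be traced by hand from the layer parameters in the class above. The sketch below applies the standard output-size formula, floor((size + 2·padding − kernel) / stride) + 1, to each convolution and pooling layer for a 128×128 single-channel input. The LAYERS table is transcribed from the code above; nn.MaxPool2d(2) is taken as kernel 2 with stride 2 (its default stride equals the kernel size). BatchNorm and ReLU do not change the shape, so they are left out.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels); None keeps the channel count.
LAYERS = [
    ("conv1", 7, 2, 3, 32),
    ("pool", 2, 2, 0, None),
    ("conv2", 3, 1, 1, 32),
    ("conv3", 3, 1, 1, 64),
    ("pool", 2, 2, 0, None),
    ("convs[0]", 3, 1, 1, 128),
    ("convs[3]", 3, 1, 1, 128),
    ("pool", 2, 2, 0, None),
]


def trace(size=128, channels=1):
    """Return (name, n_H, n_W, n_C) after every layer."""
    shapes = [("input", size, size, channels)]
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        channels = out_channels or channels
        shapes.append((name, size, size, channels))
    return shapes


for name, n_h, n_w, n_c in trace():
    print(f"{name:10s} ({n_h}, {n_w}, {n_c})")
```

The last row is (8, 8, 128), which matches the 128 * 8 * 8 = 8192 inputs of the first fully connected layer; the fully connected layers then reduce 8192 to 1024, 256 and finally 100 outputs.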

3.2 Saving and loading network parameters

During training: save_checkpoint saves the network parameters to the model_save folder under the name model_parameters.pth.tar
During testing: load_checkpoint reads that file so that it can be loaded into the network

import os
import torch


def save_checkpoint(state, save_adress='model_save'):
    name = 'model_parameters.pth.tar'

    # Create the target folder on first use
    if not os.path.exists(save_adress):
        os.mkdir(save_adress)
        print('--- create a new folder ---')
    # os.path.join instead of a hard-coded '\\', so the path also works outside Windows
    fulladress = os.path.join(save_adress, name)
    torch.save(state, fulladress)
    print('model saved:', fulladress)


def load_checkpoint(save_adress='model_save'):
    name = 'model_parameters.pth.tar'
    fulladress = os.path.join(save_adress, name)
    return torch.load(fulladress)

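4 train.py network training file★★★

The code, with some comments I added, is as follows:

Set epoch and batch_size

n_epoch, batch_size = 10, 512  # reduce batch_size (e.g. to 256) if the GPU runs out of memory, see 1.2

Load the training data

The training set train is split into two parts: 90% for training and 10% for validation

trainset = data.TrainSet(eval=False)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
evalset = data.TrainSet(eval=True)
evalloader = torch.utils.data.DataLoader(evalset, batch_size=batch_size, shuffle=True)

Create the network and use the cross-entropy loss function

net = model.net()
if torch.cuda.is_available():
    # Use the GPU to speed up computation
    net.cuda()
criterion = nn.CrossEntropyLoss()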
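
For reference, nn.CrossEntropyLoss takes the raw network outputs and integer class indices, applies a log-softmax, and averages the negative log-probability of the correct class over the batch. The plain-Python sketch below computes that quantity for a single sample; the function name cross_entropy is chosen for this example and it is not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    shifted = [z - top for z in logits]
    log_sum_exp = math.log(sum(math.exp(z) for z in shifted))
    return log_sum_exp - shifted[target]


print(cross_entropy([0.0, 0.0], target=0))       # ln(2): two equally likely classes
print(cross_entropy([5.0, 0.0, 0.0], target=0))  # small loss: the correct class dominates
print(cross_entropy([5.0, 0.0, 0.0], target=1))  # large loss: the wrong class dominates
```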

main()

Set the optimizer

    optimizer = optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=0)
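
To show what lr and betas control, here is a sketch of a single Adam update for one scalar parameter, written out from the published update rule with weight_decay=0. The function name adam_step is made up for this example, and eps=1e-8 is assumed as the stabilising constant; this is an illustration, not the optim.Adam source.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a single scalar parameter (t is the 1-based step count)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad          # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(param)  # close to 0.999
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's magnitude.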

Training mode

See reference link 6.

def train(epoch):
    net.train()  # put the network in training mode, which enables dropout
    correct = 0
    total = 0
    T = 0

    # Iterate over the training data
    for batch_index, (datas, labels) in enumerate(trainloader, 0):
        # Convert one-hot labels to class indices and cast to the expected dtypes
        labels = labels.max(1)[1].long()
        datas = datas.float()
        datas = datas.view(-1, 1, 128, 128)
        if torch.cuda.is_available():
            datas = datas.cuda()
            labels = labels.cuda()

        # Reset the gradients to zero
        optimizer.zero_grad()

        # Network output
        outputs = net(datas)

        # Compute the loss
        loss = criterion(outputs, labels)

        # Backpropagation
        loss.backward()

        # Update the weights
        optimizer.step()

        T += 1

        # Predicted labels
        pred_choice = outputs.data.max(1)[1]

        # Compare the predictions with the ground truth
        correct += pred_choice.eq(labels.data).cpu().sum().item()
        total += len(labels)

        # Print the batch index, the epoch, and the running accuracy
        print('batch_index: [%d/%d]' % (batch_index, len(trainloader)),
              'Train epoch: [%d]' % (epoch),
              'correct/sum:%d/%d, %.4f' % (correct, total, correct / total))
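
The accuracy bookkeeping in this loop can be checked on toy numbers without PyTorch. In the sketch below, argmax plays the role of outputs.data.max(1)[1] and count_correct the role of pred_choice.eq(labels).sum(); both helper names and the scores are invented for illustration.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=scores.__getitem__)


def count_correct(batch_outputs, batch_labels):
    """Number of samples whose predicted class equals the true class."""
    return sum(argmax(out) == label for out, label in zip(batch_outputs, batch_labels))


correct = total = 0
batches = [
    ([[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]], [1, 2]),  # one hit, one miss
    ([[0.0, 0.1, 3.0]], [2]),                      # one hit
]
for outputs, labels in batches:
    correct += count_correct(outputs, labels)
    total += len(labels)
    print('correct/sum:%d/%d, %.4f' % (correct, total, correct / total))
```

The printed ratio is a running accuracy over all batches seen so far in the epoch, not the accuracy of the current batch alone.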

Save the network parameters

Save the trained network parameters so they can be used later for testing.

        checkpoint = {'epoch': epoch, 'state_dict': net.state_dict(), 'optimizer': optimizer.state_dict()}
        model.save_checkpoint(checkpoint)

Validation mode

Largely the same as training mode.

def eval(epoch):
    net.eval()  # put the network in evaluation mode: dropout is disabled and BatchNorm uses its running statistics
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for batch_index, (datas, labels) in enumerate(evalloader, 0):
            labels = labels.max(1)[1].long()
            datas = datas.float()
            datas = datas.view(-1, 1, 128, 128)
            if torch.cuda.is_available():
                datas = datas.cuda()
                labels = labels.cuda()
            # No loss, backward pass, or optimizer step during validation
            outputs = net(datas)

            pred_choice = outputs.data.max(1)[1]
            correct += pred_choice.eq(labels.data).cpu().sum().item()
            total += len(labels)

            print('batch_index: [%d/%d]' % (batch_index, len(evalloader)),
                  'Eval epoch: [%d]' % (epoch),
                  'correct/sum:%d/%d, %.4f' % (correct, total, correct / total))

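5 test.py network test file★★★

The code, with some comments I added, is as follows:

Read the properties of the test set test

test1path = 'test1\\'
trainpath = 'train\\'

filename = os.listdir(test1path)
# Original comment: sorted by time, oldest first. Note that os.listdir itself does not
# guarantee any order, so this order must match the one used when the labels were built.
words = os.listdir(trainpath)
words = np.array(words)
testnumber = len(filename)
category_number = len(words)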

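Create the network and switch it to evaluation mode

net = model.net()
if torch.cuda.is_available():
    net.cuda()
net.eval()

main()

Load the network parameters

    checkpoint = model.load_checkpoint()
    net.load_state_dict(checkpoint['state_dict'])

Load the test data

    testdatas = data.loadtestdata()
    # astype returns a new array, so the result has to be assigned;
    # np.float has been removed from recent NumPy releases, so use np.float32
    testdatas = testdatas.astype(np.float32)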

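Set the number of test images N and the batch_size, and split the data into chunks

    n = 0
    N = 10000
    batch_size = 8
    pre = np.array([])
    batch_site = []
    while n < N:
        n += batch_size
        if n < N:
            n1 = n - batch_size
            n2 = n
        else:
            # Last chunk: stop at N (also works when N is smaller than batch_size)
            n1 = n - batch_size
            n2 = N

        batch_site.append([n1, n2])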
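
The same [start, stop) pairs can be produced more compactly with range and min. This is an equivalent sketch rather than the code from the repository, and the function name batch_bounds is made up here.

```python
def batch_bounds(n_items, batch_size):
    """Return [start, stop] pairs that cover range(n_items) in chunks of batch_size."""
    return [[start, min(start + batch_size, n_items)]
            for start in range(0, n_items, batch_size)]


print(batch_bounds(20, 8))         # [[0, 8], [8, 16], [16, 20]]
print(len(batch_bounds(10000, 8)))  # 1250
```

With N = 10000 and batch_size = 8 every chunk is full; the min only matters when N is not a multiple of batch_size.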

Test chunk by chunk and save the indices of the 5 best-matching characters

    pred_choice = []
    for site in tqdm(batch_site):
        test_batch = testdatas[site[0]:site[1]]
        test_batch = torch.from_numpy(test_batch)
        datas = test_batch.float()
        if torch.cuda.is_available():
            datas = datas.cuda()
        datas = datas.view(-1, 1, 128, 128)
        outputs = net(datas)
        outputs = outputs.cpu()
        outputs = outputs.data.numpy()

        # Extract the five best-matching characters
        for out in outputs:
            K = 5
            index = np.argpartition(out, -K)[-K:]
            pred_choice.append(index)
    pre = np.array(pred_choice)
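
One detail worth knowing: np.argpartition only guarantees that the K largest scores end up in the last K positions, not that those K are sorted among themselves. Whether the order of the five characters matters for scoring is not stated here; if it does, an ordered top-K is needed. The plain-Python sketch below returns the indices with the best match first; the function name top_k_indices and the scores are invented for this example.

```python
import heapq


def top_k_indices(scores, k=5):
    """Indices of the k largest scores, best match first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


scores = [0.1, 2.5, 0.3, 1.7, 0.9, 3.2, 0.0]
print(top_k_indices(scores, k=3))  # [5, 1, 3]
```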

Iterate over the test images and look up the training-set characters that correspond to the matched indices

    predicts = []
    for k in range(testnumber):
        index = pre[k]
        predict5 = words[index]
        predict5 = "".join(predict5)
        predicts.append(predict5)

Save the test results as a .csv file and print them

    dataframe = pd.DataFrame({'filename': filename, 'label': predicts})
    dataframe.to_csv("test.csv", index=False, encoding='utf-8')

    read = pd.read_csv('test.csv')
    print(read)