Paddle's Programming Styles
Dynamic-graph programming is a Pythonic, imperative style whose main strength is that it is easy to interact with the data as it flows through the model; the best-known framework built around it is PyTorch.
In contrast, there is the static-graph style: you start a session, run the optimizer, the accuracy ops, and so on inside that session, and build the corresponding feed dict. Its main strength is ease of deployment; the best-known framework built around it is TensorFlow.
Paddle supports both styles, and models written in one can be converted to the other. This article uses the dynamic-graph style.
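As a quick illustration of the dynamic-graph style (a minimal sketch of my own, not part of the original code): operations execute eagerly, so values and gradients can be inspected right away, with no session or feed dict.

import paddle

# Dynamic graph: every operation runs immediately on real data.
x = paddle.to_tensor([[1.0, 2.0], [3.0, 4.0]], stop_gradient=False)
y = (x * x).sum()
print(y.numpy())   # the value is available at once, no session.run needed
y.backward()
print(x.grad)      # gradients are accessible right after backward()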
Automatic Mixed Precision (AMP) Training
Automatic mixed precision training is the method proposed by Baidu and NVIDIA in their 2018 paper "Mixed Precision Training".
In essence, it uses both single-precision (float32) and half-precision (float16) data types during training. The goal is to speed up training and reduce GPU memory consumption while keeping accuracy roughly unchanged (in my experiments the accuracy did drop a little).
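Because float16 has a far narrower dynamic range than float32, very small gradient values can underflow to zero, which is why the training script below also uses a loss scaler. A tiny NumPy demonstration of the effect (my own illustration, not from the paper):

import numpy as np

# The smallest positive float16 value is about 6e-8; anything smaller flushes to zero.
small_grad = np.float32(1e-8)
print(np.float16(small_grad))         # underflows to 0.0 in float16
print(np.float16(small_grad * 1024))  # after scaling by 1024 it survives (~1.02e-05)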
Dataset: CIFAR-10 (npy format)
https://pan.baidu.com/s/1mwieRKLyflrpo7axOrOong
Extraction code: 1234
Model
An ordinary CNN, written quite casually.
Code
import os
from PIL import Image
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import paddle
from paddle import nn
def load_data(normalization=True):
    train_xs = np.load(r'D:\pytorch\lab\dataset\datasets\cifar-10\train_batches.npy', allow_pickle=True).astype('float32')
    train_ys = np.load(r'D:\pytorch\lab\dataset\datasets\cifar-10\train_labels.npy', allow_pickle=True).astype('float32')
    test_xs = np.load(r'D:\pytorch\lab\dataset\datasets\cifar-10\test_batches.npy', allow_pickle=True).astype('float32')
    test_ys = np.load(r'D:\pytorch\lab\dataset\datasets\cifar-10\test_labels.npy', allow_pickle=True).astype('float32')
    # NHWC -> NCHW, the layout paddle's Conv2D expects
    train_xs = np.transpose(train_xs, [0, 3, 1, 2])
    test_xs = np.transpose(test_xs, [0, 3, 1, 2])
    # one-hot labels -> class indices
    train_ys = np.argmax(train_ys, axis=1)
    test_ys = np.argmax(test_ys, axis=1)
    if normalization:  # only scales to [0, 1] to avoid overflow/underflow; no per-channel (x-mean)/(max-min) normalization, purely out of laziness
        train_xs = train_xs / 255.
        test_xs = test_xs / 255.
    return (train_xs[:10000], train_ys[:10000]), (test_xs[:1000], test_ys[:1000])
class CNN(nn.Layer):
    def __init__(self, num_classes):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2D(3, 32, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2D(32),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2D(32, 64, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2D(64),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=2, stride=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2D(64, 96, kernel_size=3, padding=1, stride=1),
            nn.BatchNorm2D(96),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=2, stride=2)
        )
        self.gap = nn.AdaptiveAvgPool2D((1, 1))
        self.layer4 = nn.Linear(96, 96)
        self.layer5 = nn.Linear(96, 128)
        self.layer6 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.gap(x).reshape((x.shape[0], x.shape[1]))
        x = self.layer4(x)
        x = self.layer5(x)
        x = self.layer6(x)
        return x
if __name__ == '__main__':
    # Hyperparameters
    lr = 1e-3
    batch_size = 32
    epochs = 10
    weight_decay_rate = 1e-4
    # Load the data
    (x_train, y_train), (x_valid, y_valid) = load_data(normalization=True)
    # Build the model
    model = CNN(num_classes=10)
    # model.set_state_dict(paddle.load('my_model.pdparams'))  # optionally resume from a saved checkpoint
    # Define the loss and the optimizer. Paddle's AMP (automatic mixed precision training) only needs a few
    # changes around the loss and the optimizer; it speeds up training and lowers GPU memory usage while
    # keeping accuracy roughly the same.
    cost = nn.CrossEntropyLoss()
    optimizer = paddle.optimizer.Adam(lr, parameters=model.parameters(), weight_decay=weight_decay_rate)
    # Define the scaler, which scales the loss so that float16 gradients do not underflow or overflow
    scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
    # Accuracy bookkeeping
    correct_train = 0
    correct_valid = 0
    # Training
    losses = []
    train_steps = len(x_train)//batch_size if len(x_train)%batch_size==0 else len(x_train)//batch_size+1
    valid_steps = len(x_valid)//batch_size if len(x_valid)%batch_size==0 else len(x_valid)//batch_size+1
    for epoch in range(epochs):
        model.train()
        for idx in range(train_steps):
            start = idx*batch_size%len(x_train)
            stop = start+batch_size if start<=len(x_train)-batch_size else None
            x_batch = paddle.to_tensor(x_train[start:stop])
            y_batch = paddle.to_tensor(y_train[start:stop])
            # Forward pass; inside auto_cast the eligible ops run in float16
            with paddle.amp.auto_cast():
                y_pred = model(x_batch)
                loss = cost(y_pred, label=y_batch)
            losses.append(loss.numpy().item())
            # Backward pass with loss scaling
            scaled_loss = scaler.scale(loss)
            scaled_loss.backward()
            scaler.minimize(optimizer, scaled_loss)
            optimizer.clear_grad()
            # Accumulate the number of correct predictions
            correct_train += (y_pred.argmax(axis=1).numpy() == y_batch.numpy()).sum()
        model.eval()
        for idx in range(valid_steps):
            start = idx*batch_size%len(x_valid)
            stop = start+batch_size if start<=len(x_valid)-batch_size else None
            x_batch = paddle.to_tensor(x_valid[start:stop])
            y_batch = paddle.to_tensor(y_valid[start:stop])
            correct_valid += (model(x_batch).argmax(axis=1).numpy() == y_batch.numpy()).sum()
        # Per-epoch accuracy: correct predictions divided by the number of samples
        acc_train = correct_train / len(x_train)
        acc_valid = correct_valid / len(x_valid)
        print('Epoch[{}/{}], Train Accuracy = {:.4f}, Valid Accuracy = {:.4f} \n'.format(epoch+1, epochs, acc_train, acc_valid))
        correct_train = 0
        correct_valid = 0
    # Save the model parameters
    paddle.save(model.state_dict(), 'my_model.pdparams')
    # Plot the loss curve
    plt.plot(losses)
    plt.xlabel('Step')
    plt.ylabel('Loss')
    plt.show()
Results and Discussion
Loss curve and training log without AMP
Loss curve and training log with AMP
With AMP enabled the training is noticeably faster. Comparing the loss curves shows that the model does not converge in fewer steps; AMP simply makes each step cheaper to compute. Comparing the training logs, however, both the training accuracy and the validation accuracy drop slightly with AMP, so whether to use it depends on your own needs.
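If you want to produce both runs from a single script, note that Paddle's two AMP entry points each take an enable switch, so the mixed-precision path can be turned on and off without rewriting the loop. A minimal self-contained sketch (the use_amp flag and the stand-in linear model are my own, not part of the script above):

import paddle
from paddle import nn

use_amp = False  # False -> plain float32 baseline, True -> AMP run

model = nn.Linear(8, 2)  # stand-in model just for the sketch
optimizer = paddle.optimizer.Adam(1e-3, parameters=model.parameters())
scaler = paddle.amp.GradScaler(enable=use_amp, init_loss_scaling=1024)
x = paddle.randn([4, 8])
y = paddle.randint(0, 2, [4])

with paddle.amp.auto_cast(enable=use_amp):  # behaves as a no-op when use_amp is False
    loss = nn.CrossEntropyLoss()(model(x), y)
scaled = scaler.scale(loss)   # returns the loss unchanged when the scaler is disabled
scaled.backward()
scaler.minimize(optimizer, scaled)
optimizer.clear_grad()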