MNIST Algorithm Notes (Version 2)
Preface
This time we use the BP (backpropagation) algorithm, the Adam optimizer, softmax with its loss function, a hand-rolled layer abstraction, and in-place operations on the data to ease memory pressure.
I. Steps
1. Import libraries
The code is as follows:
import numpy as np
import struct
import random
import matplotlib.pyplot as plt
import pandas as pd
import math
2. Load the data
The code is as follows (example):
def load_labels(file):
    with open(file, "rb") as f:
        data = f.read()
    magic_number, num_samples = struct.unpack(">ii", data[:8])
    if magic_number != 2049:  # 0x00000801
        print(f"magic number mismatch {magic_number} != 2049")
        return None
    labels = np.array(list(data[8:]))
    return labels

def load_images(file):
    with open(file, "rb") as f:
        data = f.read()
    magic_number, num_samples, image_width, image_height = struct.unpack(">iiii", data[:16])
    if magic_number != 2051:  # 0x00000803
        print(f"magic number mismatch {magic_number} != 2051")
        return None
    image_data = np.asarray(list(data[16:]), dtype=np.uint8).reshape(num_samples, -1)
    return image_data

def one_hot(labels, classes, label_smoothing=0):
    n = len(labels)
    eoff = label_smoothing / classes
    output = np.ones((n, classes), dtype=np.float32) * eoff
    for row, label in enumerate(labels):
        output[row, label] = 1 - label_smoothing + eoff
    return output
Compared with the previous article, one_hot gains a label_smoothing argument: a float between 0 and 1. When it is 0, no label smoothing is applied; when it is greater than 0, the probability of the true label is reduced slightly and the remainder is spread evenly over the other classes.
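For example, with 10 classes and label_smoothing=0.1, the off-label entries become 0.1 / 10 = 0.01 and the true label becomes 1 - 0.1 + 0.01 = 0.91, which we can verify directly with the function above:

# Quick check of the smoothed encoding produced by one_hot
print(one_hot([3], 10))                       # 1.0 at index 3, 0 elsewhere
print(one_hot([3], 10, label_smoothing=0.1))  # 0.91 at index 3, 0.01 elsewhere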
3. Read the files
val_labels = load_labels("dataset/t10k-labels-idx1-ubyte") # 10000,
val_images = load_images("dataset/t10k-images-idx3-ubyte") # 10000, 784
val_images = val_images / 255 - 0.5
train_labels = load_labels("dataset/train-labels-idx1-ubyte") # 60000,
train_images = load_images("dataset/train-images-idx3-ubyte") # 60000, 784
train_images = train_images / 255 - 0.5
class Dataset:
    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    # fetch one item: dataset = Dataset(...); dataset[index]
    def __getitem__(self, index):
        return self.images[index], self.labels[index]

    # number of samples in the dataset
    def __len__(self):
        return len(self.images)
class DataLoaderIterator:
    def __init__(self, dataloader):
        self.dataloader = dataloader
        self.cursor = 0
        self.indexs = list(range(self.dataloader.count_data))  # 0 ... 59999
        if self.dataloader.shuffle:
            # randomize the sample order
            random.shuffle(self.indexs)

    def __next__(self):
        if self.cursor >= self.dataloader.count_data:
            raise StopIteration()
        batch_data = []
        remain = min(self.dataloader.batch_size, self.dataloader.count_data - self.cursor)  # e.g. 256, 128
        for n in range(remain):
            index = self.indexs[self.cursor]
            data = self.dataloader.dataset[index]
            # if the batch is not initialized yet, create one list per field
            if len(batch_data) == 0:
                batch_data = [[] for _ in range(len(data))]
            # append each field of the sample to its list
            for field, item in enumerate(data):
                batch_data[field].append(item)
            self.cursor += 1
        # merge once with np.vstack instead of concatenating on every step
        for field in range(len(batch_data)):
            batch_data[field] = np.vstack(batch_data[field])
        return batch_data

class DataLoader:
    # shuffle: whether to randomize the sample order each epoch
    def __init__(self, dataset, batch_size, shuffle):
        self.dataset = dataset
        self.shuffle = shuffle
        self.count_data = len(dataset)
        self.batch_size = batch_size

    def __iter__(self):
        return DataLoaderIterator(self)
Compared with the previous article, the way batch_data is built has changed a little; this version is easier to follow.
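A quick sanity check of the loader on the arrays loaded above (the batch size of 32 here is just for illustration):

loader = DataLoader(Dataset(train_images, one_hot(train_labels, 10)), 32, shuffle=True)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # (32, 784) (32, 10)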
Layer usage and initialization
# Using the built-in __call__ method
class Module:
    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        return self.forward(*args)

# These two classes serve as base classes; __call__ dispatches to forward/apply
class Initializer:
    def __init__(self, name):
        self.name = name

    def __call__(self, *args):
        return self.apply(*args)
# Gaussian initializer for the parameters
class GaussInitializer(Initializer):
    def __init__(self, mu, sigma):
        super().__init__("Gauss")
        self.mu = mu
        self.sigma = sigma

    def apply(self, value):
        value[...] = np.random.normal(self.mu, self.sigma, value.shape)
# Parameter holds a value and the gradient computed for it during backprop
class Parameter:
    def __init__(self, value):
        self.value = value
        self.delta = np.zeros_like(value)  # zero array with the same shape as value

    def zero_grad(self):
        self.delta[...] = 0  # [...] assigns in place, reusing the buffer instead of allocating a new one
class LinearLayer(Module):
    def __init__(self, input_feature, output_feature):
        super().__init__("Linear")
        self.input_feature = input_feature
        self.output_feature = output_feature
        # weights and bias
        self.weights = Parameter(np.zeros((input_feature, output_feature)))
        self.bias = Parameter(np.zeros((1, output_feature)))
        # weight initialization
        initer = GaussInitializer(0, 1.0)
        initer.apply(self.weights.value)

    def forward(self, x):
        self.x_save = x.copy()
        return x @ self.weights.value + self.bias.value

    def backward(self, G):
        self.weights.delta = self.x_save.T @ G
        self.bias.delta[...] = np.sum(G, 0)  # in-place value copy
        return G @ self.weights.value.T
# Activation function
class ReLULayer(Module):
    def __init__(self, inplace=True):
        super().__init__("ReLU")
        self.inplace = inplace

    def forward(self, x):
        self.negative_position = x < 0
        if not self.inplace:
            x = x.copy()
        x[self.negative_position] = 0
        return x

    def backward(self, G):
        if not self.inplace:
            G = G.copy()
        G[self.negative_position] = 0
        return G
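As mentioned in the preface, inplace=True lets ReLU zero the negatives directly in the input buffer instead of allocating a copy, which is where the memory saving comes from; a small demonstration on a made-up array:

relu = ReLULayer(inplace=True)
x = np.array([[-1.0, 2.0, -3.0]])
y = relu(x)
print(y is x, x)  # True [[0. 2. 0.]] -- the input buffer itself was modified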
# Loss layers: forward returns the loss, backward returns the G passed back into the model
class SigmoidCrossEntropyLayer(Module):
    def __init__(self):
        super().__init__("SigmoidCrossEntropy")

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def forward(self, x, label_onehot):
        eps = 1e-4
        self.label_onehot = label_onehot
        self.predict = self.sigmoid(x)
        self.predict = np.clip(self.predict, a_max=1 - eps, a_min=eps)  # clip away 0 and 1
        self.batch_size = self.predict.shape[0]
        return -np.sum(label_onehot * np.log(self.predict) +
                       (1 - label_onehot) * np.log(1 - self.predict)) / self.batch_size

    def backward(self):
        return (self.predict - self.label_onehot) / self.batch_size
class SoftmaxCrossEntropyLayer(Module):
    def __init__(self):
        super().__init__("SoftmaxCrossEntropy")

    def softmax(self, x):
        # subtract the row max for numerical stability
        ex = np.exp(x - np.max(x, axis=1, keepdims=True))
        return ex / np.sum(ex, axis=1, keepdims=True)

    def forward(self, x, label_onehot):
        eps = 1e-4
        self.label_onehot = label_onehot
        self.predict = self.softmax(x)
        self.predict = np.clip(self.predict, a_max=1 - eps, a_min=eps)  # clip away 0 and 1
        self.batch_size = self.predict.shape[0]
        # cross entropy over the softmax output; its gradient is exactly (predict - label) / batch_size,
        # which is what backward returns (the binary-CE term used for sigmoid does not belong here)
        return -np.sum(label_onehot * np.log(self.predict)) / self.batch_size

    def backward(self):
        return (self.predict - self.label_onehot) / self.batch_size
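To confirm that backward really is the gradient of this forward, here is a small finite-difference check on made-up logits (the tolerance is loose because of the clipping):

np.random.seed(0)
x = np.random.randn(4, 10)
y = one_hot(np.random.randint(0, 10, 4), 10)
loss_layer = SoftmaxCrossEntropyLayer()
loss = loss_layer(x, y)
G = loss_layer.backward()
d = 1e-5
x2 = x.copy()
x2[0, 0] += d
numeric = (loss_layer(x2, y) - loss) / d  # numerical derivative w.r.t. x[0, 0]
print(abs(numeric - G[0, 0]) < 1e-4)      # True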
1. Defining and using the optimizer
# Optimizer base class: walk the model's layers and collect their parameters
class Optimizer:
    def __init__(self, name, model, lr):
        self.name = name
        self.model = model
        self.lr = lr
        layers = []
        self.params = []
        # collect every Module attribute of the model ...
        for attr in model.__dict__:
            layer = model.__dict__[attr]
            if isinstance(layer, Module):
                layers.append(layer)
        # ... then every Parameter attribute of those layers
        for layer in layers:
            for attr in layer.__dict__:
                layer_param = layer.__dict__[attr]
                if isinstance(layer_param, Parameter):
                    self.params.append(layer_param)

    # zero all gradients
    def zero_grad(self):
        for param in self.params:
            param.zero_grad()

    # set the learning rate
    def set_lr(self, lr):
        self.lr = lr
class SGD(Optimizer):
    def __init__(self, model, lr=1e-3):
        super().__init__("SGD", model, lr)

    def step(self):
        for param in self.params:
            param.value -= self.lr * param.delta
# Adam optimizer
class Adam(Optimizer):
    def __init__(self, model, lr=1e-3, beta1=0.9, beta2=0.999, l2_regularization=1e-3):
        super().__init__("Adam", model, lr)
        self.beta1 = beta1
        self.beta2 = beta2
        self.l2_regularization = l2_regularization
        self.t = 0
        for param in self.params:
            param.m = 0
            param.v = 0

    # exponential moving averages of the gradient and its square
    def step(self):
        eps = 1e-8
        self.t += 1
        for param in self.params:
            g = param.delta
            param.m = self.beta1 * param.m + (1 - self.beta1) * g
            param.v = self.beta2 * param.v + (1 - self.beta2) * g ** 2
            mt_ = param.m / (1 - self.beta1 ** self.t)  # bias-corrected first moment
            vt_ = param.v / (1 - self.beta2 ** self.t)  # bias-corrected second moment
            param.value -= self.lr * mt_ / (np.sqrt(vt_) + eps) + self.l2_regularization * param.value
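For reference, step implements the standard Adam update with bias correction, plus an L2 weight-decay term applied directly to the weights (note that the decay term is not scaled by the learning rate here):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \frac{lr \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} - \lambda\, \theta_{t-1}
\end{aligned}
$$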
2. Model initialization
class Model(Module):
    def __init__(self, num_feature, num_hidden, num_classes):
        super().__init__("Model")
        self.input_to_hidden = LinearLayer(num_feature, num_hidden)
        self.relu = ReLULayer()
        self.hidden_to_output = LinearLayer(num_hidden, num_classes)

    # because the base class defines __call__, calling the model runs forward through the layers in order
    def forward(self, x):
        x = self.input_to_hidden(x)
        x = self.relu(x)
        x = self.hidden_to_output(x)
        return x

    def backward(self, G):
        G = self.hidden_to_output.backward(G)
        G = self.relu.backward(G)
        G = self.input_to_hidden.backward(G)
        return G
LinearLayer fixes the shapes of the linear module; forward propagates activations through the network (the BP model), and backward propagates gradients back for the parameter update.
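A quick shape check of the forward/backward pair (a toy batch of 8, with sizes matching MNIST):

m = Model(784, 256, 10)
xb = np.random.randn(8, 784)
out = m(xb)
m.backward(np.ones_like(out) / 8)
print(out.shape, m.input_to_hidden.weights.delta.shape)  # (8, 10) (784, 256)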
Training the model
classes = 10     # number of classes
batch_size = 32  # samples per batch
epochs = 20      # stopping criterion: at most 20 full passes over the data
lr = 1e-3
numdata, data_dims = train_images.shape  # 60000, 784

model = Model(data_dims, 256, classes)
train_data = DataLoader(Dataset(train_images, one_hot(train_labels, classes)), batch_size, shuffle=True)
loss_func = SigmoidCrossEntropyLayer()
optim = Adam(model, lr)
iters = 0

for epoch in range(epochs):
    # each batch is an (images, labels) pair, stacked with np.vstack as described above
    for index, (images, labels) in enumerate(train_data):
        x = model(images)
        loss = loss_func(x, labels)
        optim.zero_grad()
        G = loss_func.backward()
        model.backward(G)
        optim.step()
        iters += 1
        if iters % 1000 == 0:  # log every 1000 iterations
            print(f"Iter {iters}, {epoch} / {epochs}, Loss {loss:.3f}, LR {lr:g}")
    val_accuracy, val_loss = estimate_val(model(val_images), val_labels, classes, loss_func)
    print(f"Val set, Accuracy: {val_accuracy:.3f}, Loss: {val_loss:.3f}")
Because Adam adapts its step sizes internally, I simply pass a fixed lr. If you want to adjust the learning rate during training, you can add that inside the training loop; it needs to be set up in advance.
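The set_lr method defined on the Optimizer base class is the hook for such a schedule. For example, a simple step decay (the decay points and factor below are arbitrary illustrations, not part of the original run):

# Hypothetical step schedule: shrink lr by 10x at epochs 10 and 15
for epoch in range(epochs):
    if epoch in (10, 15):
        lr *= 0.1
        optim.set_lr(lr)
    # ... the inner batch loop from above goes here ...

The output of the training run: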
Iter 1000, 0 / 20, Loss 0.907, LR 0.001
Val set, Accuracy: 0.916, Loss: 0.707
Iter 2000, 1 / 20, Loss 0.748, LR 0.001
Iter 3000, 1 / 20, Loss 0.568, LR 0.001
Val set, Accuracy: 0.927, Loss: 0.584
Iter 4000, 2 / 20, Loss 0.299, LR 0.001
Iter 5000, 2 / 20, Loss 0.432, LR 0.001
Val set, Accuracy: 0.939, Loss: 0.504
Iter 6000, 3 / 20, Loss 0.450, LR 0.001
Iter 7000, 3 / 20, Loss 0.478, LR 0.001
Val set, Accuracy: 0.939, Loss: 0.472
Iter 8000, 4 / 20, Loss 0.411, LR 0.001
Iter 9000, 4 / 20, Loss 0.640, LR 0.001
Val set, Accuracy: 0.945, Loss: 0.431
Iter 10000, 5 / 20, Loss 0.380, LR 0.001
Iter 11000, 5 / 20, Loss 0.570, LR 0.001
Val set, Accuracy: 0.943, Loss: 0.417
Iter 12000, 6 / 20, Loss 0.208, LR 0.001
Iter 13000, 6 / 20, Loss 0.401, LR 0.001
Val set, Accuracy: 0.945, Loss: 0.424
Iter 14000, 7 / 20, Loss 0.629, LR 0.001
Iter 15000, 7 / 20, Loss 0.212, LR 0.001
Val set, Accuracy: 0.947, Loss: 0.401
Iter 16000, 8 / 20, Loss 0.258, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.410
Iter 17000, 9 / 20, Loss 0.613, LR 0.001
Iter 18000, 9 / 20, Loss 0.545, LR 0.001
Val set, Accuracy: 0.946, Loss: 0.408
Iter 19000, 10 / 20, Loss 0.309, LR 0.001
Iter 20000, 10 / 20, Loss 0.531, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.388
Iter 21000, 11 / 20, Loss 0.765, LR 0.001
Iter 22000, 11 / 20, Loss 0.213, LR 0.001
Val set, Accuracy: 0.942, Loss: 0.429
Iter 23000, 12 / 20, Loss 0.648, LR 0.001
Iter 24000, 12 / 20, Loss 0.390, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.410
Iter 25000, 13 / 20, Loss 0.404, LR 0.001
Iter 26000, 13 / 20, Loss 0.576, LR 0.001
Val set, Accuracy: 0.942, Loss: 0.440
Iter 27000, 14 / 20, Loss 0.184, LR 0.001
Iter 28000, 14 / 20, Loss 0.263, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.420
Iter 29000, 15 / 20, Loss 0.645, LR 0.001
Iter 30000, 15 / 20, Loss 0.655, LR 0.001
Val set, Accuracy: 0.949, Loss: 0.398
Iter 31000, 16 / 20, Loss 0.441, LR 0.001
Val set, Accuracy: 0.951, Loss: 0.393
Iter 32000, 17 / 20, Loss 0.446, LR 0.001
Iter 33000, 17 / 20, Loss 0.709, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.390
Iter 34000, 18 / 20, Loss 0.475, LR 0.001
Iter 35000, 18 / 20, Loss 0.310, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.408
Iter 36000, 19 / 20, Loss 0.425, LR 0.001
Iter 37000, 19 / 20, Loss 0.386, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.392
Summary
This time the model was wrapped into reusable modules; next time I will tune it, trying other models, optimizers, initial parameter settings, and regularization.