MNIST: Improvements and Layer Encapsulation

This article describes in detail how to train a deep learning model on the MNIST dataset using the BP (backpropagation) algorithm and the Adam optimizer, covering data preprocessing, layer encapsulation (linear layer, ReLU layer, loss layers, etc.), the definition and use of optimizers, and the training loop. It also shows how to handle memory issues and how to initialize model parameters.


Preface

This time we will use the BP algorithm, the Adam optimizer, softmax and its loss function, hand-written layer encapsulation, and in-place operations on the data to ease memory pressure.


I. Steps

1. Import the libraries

The code is as follows:

import numpy as np
import struct
import random
import matplotlib.pyplot as plt
import pandas as pd
import math

2. Read the data

The code is as follows (example):

def load_labels(file):
    with open(file, "rb") as f:
        data = f.read()
    
    magic_number, num_samples = struct.unpack(">ii", data[:8])
    if magic_number != 2049:   # 0x00000801
        print(f"magic number mismatch {magic_number} != 2049")
        return None
    
    labels = np.array(list(data[8:]))
    return labels

def load_images(file):
    with open(file, "rb") as f:
        data = f.read()

    magic_number, num_samples, image_width, image_height = struct.unpack(">iiii", data[:16])
    if magic_number != 2051:   # 0x00000803
        print(f"magic number mismatch {magic_number} != 2051")
        return None
    
    image_data = np.asarray(list(data[16:]), dtype=np.uint8).reshape(num_samples, -1)
    return image_data

def one_hot(labels, classes, label_smoothing=0):
    n = len(labels)
    eoff = label_smoothing / classes
    output = np.ones((n, classes), dtype=np.float32) * eoff
    for row, label in enumerate(labels):
        output[row, label] = 1 - label_smoothing + eoff
    return output

Compared with the previous article, one_hot now takes a label-smoothing argument, a float between 0 and 1. When it is 0, no label smoothing is applied; when it is greater than 0, the probability of the true label is slightly reduced and the remaining probability mass is spread evenly over the other classes.
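
For example (a quick check of the function above; the numbers follow directly from the formula), with 3 classes and label_smoothing=0.3 we get eoff = 0.1, so the true class receives 1 - 0.3 + 0.1 = 0.8 and every other class receives 0.1:

print(one_hot([1], classes=3, label_smoothing=0.3))   # [[0.1 0.8 0.1]]
print(one_hot([1], classes=3))                        # [[0. 1. 0.]] -- plain one-hot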


3. Read the dataset files

val_labels = load_labels("dataset/t10k-labels-idx1-ubyte")   #  10000,
val_images = load_images("dataset/t10k-images-idx3-ubyte")   #  10000, 784
val_images = val_images / 255 - 0.5

train_labels = load_labels("dataset/train-labels-idx1-ubyte") # 60000,
train_images = load_images("dataset/train-images-idx3-ubyte") # 60000, 784
train_images = train_images / 255 - 0.5
class Dataset:
    def __init__(self, images, labels):
        self.images = images
        self.labels = labels
        
    # Get a single sample: dataset = Dataset(...), then dataset[index]
    def __getitem__(self, index):
        return self.images[index], self.labels[index]
    
    # Get the number of samples in the dataset
    def __len__(self):
        return len(self.images)
    
class DataLoaderIterator:
    def __init__(self, dataloader):
        self.dataloader = dataloader
        self.cursor = 0
        self.indexs = list(range(self.dataloader.count_data))  # 0, ... 60000
        if self.dataloader.shuffle:
            # shuffle the order of the indices
            random.shuffle(self.indexs)
            
    def __next__(self):
        if self.cursor >= self.dataloader.count_data:
            raise StopIteration()
            
        batch_data = []
        remain = min(self.dataloader.batch_size, self.dataloader.count_data - self.cursor)  #  256, 128
        for n in range(remain):
            index = self.indexs[self.cursor]
            data = self.dataloader.dataset[index]
            
            # If the batch has not been initialized yet, create one empty list per item in the sample
            if len(batch_data) == 0:
                batch_data = [[] for i in range(len(data))]
                
            # Append each item of the sample into its corresponding list
            for item_index, item in enumerate(data):
                batch_data[item_index].append(item)
            self.cursor += 1
            
        # Merge each column once with np.vstack instead of concatenating repeatedly
        for index in range(len(batch_data)):
            batch_data[index] = np.vstack(batch_data[index])
        return batch_data

class DataLoader:
    
    # shuffle: whether to shuffle the sample order
    def __init__(self, dataset, batch_size, shuffle):
        self.dataset = dataset
        self.shuffle = shuffle
        self.count_data = len(dataset)
        self.batch_size = batch_size
        
    def __iter__(self):
        return DataLoaderIterator(self)

Compared with the previous article, the way batch_data is built has changed a little, which makes it easier to understand.
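
As a quick sanity check (my own sketch, not part of the original post), one batch can be pulled from the loader and its shapes inspected, assuming train_images and train_labels have been loaded as above:

check_loader = DataLoader(Dataset(train_images, one_hot(train_labels, 10)), batch_size=32, shuffle=True)
images, labels = next(iter(check_loader))
print(images.shape, labels.shape)   # (32, 784) (32, 10)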

4. Layer definition and initialization

# Module uses the __call__ magic method so that calling a layer object runs its forward method
class Module:
    def __init__(self, name):
        self.name = name
    def __call__(self, *args):
        return self.forward(*args)
# Initializer plays the same role for parameter initializers: calling it runs apply
class Initializer:
    def __init__(self, name):
        self.name = name
        
    def __call__(self, *args):
        return self.apply(*args)
# Gaussian initializer used for our parameter initialization
class GuassInitializer(Initializer):
    def __init__(self, mu, sigma):
        super().__init__("GuassInitializer")
        self.mu = mu
        self.sigma = sigma
    def apply(self, value):
        value[...] = np.random.normal(self.mu, self.sigma, value.shape)
# Parameter stores a value together with the gradient (delta) computed for it
class Parameter:
    def __init__(self, value):
        self.value = value
        self.delta = np.zeros_like(value)  # np.zeros_like creates a zero array with the same shape as value

    def zero_grad(self):
        self.delta[...] = 0  # the [...] in-place assignment clears the gradient without allocating a new array
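
The [...] assignment used in zero_grad (and in the initializer above) writes into the existing buffer instead of binding the name to a freshly allocated array; this is the in-place technique mentioned in the preface for easing memory pressure. A small illustration, not from the original post:

import numpy as np

a = np.ones((2, 2))
b = a                  # b refers to the same underlying buffer as a
a[...] = 0             # in-place write: the existing memory is overwritten, so b sees the change
print(b)               # prints a 2x2 array of zeros
a = np.zeros((2, 2))   # rebinding: a new array is allocated and b still holds the old buffer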
        
        
class LinearLayer(Module):
    def __init__(self, input_feature, output_feature):
        super().__init__("Linear")
        self.input_feature = input_feature
        self.output_feature = output_feature
        # Initialize the weights and bias
        self.weights = Parameter(np.zeros((input_feature, output_feature)))
        self.bias = Parameter(np.zeros((1, output_feature)))
        # Initialize the weights with a Gaussian distribution
        initer = GuassInitializer(0, 1.0)
        initer.apply(self.weights.value)

    def forward(self, x):
        self.x_save = x.copy()
        return x @ self.weights.value + self.bias.value

    def backward(self, G):
        self.weights.delta = self.x_save.T @ G
        self.bias.delta[...] = np.sum(G, 0)  # in-place write of the summed gradient
        return G @ self.weights.value.T
        
        
# ReLU activation layer
class ReLULayer(Module):
    def __init__(self, inplace=True):
        super().__init__("ReLU")
        self.inplace = inplace
        
    def forward(self, x):
        self.negative_position = x < 0
        if not self.inplace:
            x = x.copy()
            
        x[self.negative_position] = 0
        return x
    
    def backward(self, G):
        if not self.inplace:
            G = G.copy()
            
        G[self.negative_position] = 0
        return G
        
        
        
        
        
# Loss layers: they compute the loss and the initial gradient G that is fed back into the model
class SigmoidCrossEntropyLayer(Module):
    def __init__(self):
        super().__init__("SigmoidCrossEntropy")
    def sigmoid(self,x):
        return 1 / (1 + np.exp(-x))
    def forward(self, x, label_onehot):
        eps = 1e-4
        self.label_onehot = label_onehot
        self.predict = self.sigmoid(x)
        self.predict = np.clip(self.predict, a_max=1-eps, a_min=eps)  # clip to avoid log(0)
        self.batch_size = self.predict.shape[0]
        return -np.sum(label_onehot * np.log(self.predict) + (1 - label_onehot) * 
                        np.log(1 - self.predict)) / self.batch_size
    
    def backward(self):
        return (self.predict - self.label_onehot) / self.batch_size
    
class SoftmaxCrossEntropyLayer(Module):
    def __init__(self):
        super().__init__("SoftmaxCrossEntropy")
        
    def softmax(self, x):
        # subtract the row-wise max for numerical stability
        ex = np.exp(x - np.max(x, axis=1, keepdims=True))
        return ex / np.sum(ex, axis=1, keepdims=True)

    def forward(self, x, label_onehot):
        eps = 1e-4
        self.label_onehot = label_onehot
        self.predict = self.softmax(x)
        self.predict = np.clip(self.predict, a_max=1-eps, a_min=eps)  # clip to avoid log(0)
        self.batch_size = self.predict.shape[0]
        # standard softmax cross entropy: -sum(y * log(p)) / N, matching the gradient returned by backward
        return -np.sum(label_onehot * np.log(self.predict)) / self.batch_size
    
    def backward(self):
        return (self.predict - self.label_onehot) / self.batch_size
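
The expression returned by backward is the gradient of the softmax cross-entropy loss with respect to the pre-softmax input. In LaTeX form (a standard derivation, not spelled out in the original post, with one-hot labels $y$):

L = -\frac{1}{N}\sum_{n}\sum_{k} y_{nk}\,\log p_{nk},\qquad
p_{nk} = \frac{e^{x_{nk}}}{\sum_j e^{x_{nj}}},\qquad
\frac{\partial L}{\partial x_{nk}} = \frac{1}{N}\,(p_{nk} - y_{nk})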

5. Optimizer definition and usage

# Optimizer base class: walk the model's layers and collect their parameters
class Optimizer:
    def __init__(self, name, model, lr):
        self.name = name
        self.model = model
        self.lr = lr
        
        layers = []
        self.params = []
        for attr in model.__dict__:
            layer = model.__dict__[attr]
            if isinstance(layer, Module):
                layers.append(layer)
                
        for layer in layers:
            for attr in layer.__dict__:
                layer_param = layer.__dict__[attr]
                if isinstance(layer_param, Parameter):
                    self.params.append(layer_param)
    # Zero all parameter gradients
    def zero_grad(self):
        for param in self.params:
            param.zero_grad()
    # Set the learning rate
    def set_lr(self, lr):
        self.lr = lr

class SGD(Optimizer):
    def __init__(self,model,lr=1e-3):
        super().__init__("SGD", model, lr)
        
    def step(self):
        for param in self.params:
            param.value -= self.lr * param.delta
# Adam optimizer
class Adam(Optimizer):
    def __init__(self, model, lr=1e-3, beta1=0.9, beta2=0.999, l2_regularization = 1e-3):
        super().__init__("Adam", model, lr)
        self.beta1 = beta1
        self.beta2 = beta2
        self.l2_regularization = l2_regularization
        self.t = 0
        
        for param in self.params:
            param.m = 0
            param.v = 0
            
    # exponential moving averages of the gradient and of the squared gradient
    def step(self):
        eps = 1e-8
        self.t += 1
        for param in self.params:
            g = param.delta
            param.m = self.beta1 * param.m + (1 - self.beta1) * g
            param.v = self.beta2 * param.v + (1 - self.beta2) * g ** 2
            mt_ = param.m / (1 - self.beta1 ** self.t)
            vt_ = param.v / (1 - self.beta2 ** self.t)
            param.value -= self.lr * mt_ / (np.sqrt(vt_) + eps) + self.l2_regularization * param.value

A note on the layer variable in Optimizer.__init__: if it were assigned only inside a conditional and then referenced outside of it, an unbound-variable error could occur; here, however, layer is assigned by the for loop itself and only used inside the if statement, so no such problem arises.
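
The Adam step keeps, for every parameter, moving averages m and v of the gradient and of its square, corrects them for the bias toward zero in the first few steps, and then scales the update by 1/sqrt(v). A minimal sketch of one step on a single scalar parameter (my own toy example, mirroring the formulas in Adam.step above, including the added L2 term):

import numpy as np

w, m, v, t = 0.5, 0.0, 0.0, 0
lr, beta1, beta2, eps, l2 = 1e-3, 0.9, 0.999, 1e-8, 1e-3

g = 2 * w                            # gradient of the toy loss w**2
t += 1
m = beta1 * m + (1 - beta1) * g      # moving average of the gradient
v = beta2 * v + (1 - beta2) * g ** 2 # moving average of the squared gradient
m_hat = m / (1 - beta1 ** t)         # bias correction so early steps are not shrunk toward 0
v_hat = v / (1 - beta2 ** t)
w -= lr * m_hat / (np.sqrt(v_hat) + eps) + l2 * w
print(w)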

6. Model initialization

class Model(Module):
    def __init__(self, num_feature, num_hidden, num_classes):
        super().__init__("Model")
        self.input_to_hidden = LinearLayer(num_feature, num_hidden)
        self.relu = ReLULayer()
        self.hidden_to_output = LinearLayer(num_hidden, num_classes)
        
    def forward(self, x):
        x = self.input_to_hidden(x)
        x = self.relu(x)
        x = self.hidden_to_output(x)
        return x
    # Because Module defines __call__, calling the model runs forward automatically, layer by layer
    def backward(self, G):
        G = self.hidden_to_output.backward(G)
        G = self.relu.backward(G)
        G = self.input_to_hidden.backward(G)
        return G

LinearLayer fixes the shape of the linear module; forward carries the input forward through the network, and backward propagates the gradient back for the parameter update (this is the BP algorithm).
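
To convince ourselves that backward really computes the gradient of forward, a small numerical gradient check can be run (my own sketch, reusing the classes defined above; the loss is simply sum(output), so the incoming gradient G is all ones):

import numpy as np

np.random.seed(0)
layer = LinearLayer(4, 3)
x = np.random.randn(2, 4)

G = np.ones_like(layer.forward(x))
layer.backward(G)                            # fills layer.weights.delta

eps, (i, j) = 1e-5, (1, 2)
w = layer.weights.value
w[i, j] += eps
loss_plus = np.sum(layer.forward(x))
w[i, j] -= 2 * eps
loss_minus = np.sum(layer.forward(x))
w[i, j] += eps                               # restore the original weight

numeric = (loss_plus - loss_minus) / (2 * eps)
print(numeric, layer.weights.delta[i, j])    # the two numbers should agree closely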

7. Training the model

classes = 10                  # number of classes
batch_size = 32               # number of samples per batch
epochs = 20                   # stopping criterion: at most 20 passes over the whole training set
lr = 1e-3
numdata, data_dims = train_images.shape  # 60000, 784
model = Model(data_dims, 256, classes)
train_data = DataLoader(Dataset(train_images, one_hot(train_labels, classes)), batch_size, shuffle=True)
loss_func = SigmoidCrossEntropyLayer()
optim = Adam(model, lr)
iters = 0
for epoch in range(epochs):
    # As described above, the dataset yields (image, label) tuples, which the DataLoader stacks vertically with np.vstack
    for index, (images, labels) in enumerate(train_data):
        x = model(images)
        loss = loss_func(x, labels)
        optim.zero_grad()
        G = loss_func.backward()
        model.backward(G)
        optim.step()
        iters += 1
        if iters % 1000 == 0:  # note the indentation
            print(f"Iter {iters}, {epoch} / {epochs}, Loss {loss:.3f}, LR {lr:g}")
            
    val_accuracy, val_loss = estimate_val(model(val_images), val_labels, classes, loss_func)
    print(f"Val set, Accuracy: {val_accuracy:.3f}, Loss: {val_loss:.3f}")
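
The loop above calls estimate_val, which is not shown in this post. Below is a minimal sketch consistent with the call estimate_val(model(val_images), val_labels, classes, loss_func); the original implementation may differ:

def estimate_val(predict, val_labels, classes, loss_func):
    # Accuracy: compare the argmax of the network output with the true labels
    plabel = predict.argmax(axis=1)
    accuracy = np.mean(plabel == val_labels)
    # Loss: reuse the same loss layer on the one-hot encoded validation labels
    loss = loss_func(predict, one_hot(val_labels, classes))
    return accuracy, loss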

Because Adam does not need the learning rate to be adjusted manually later on, I simply pass a fixed lr. If you do want to adjust the learning rate during training, a schedule can be added to the loop, but it has to be set up in advance.
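
For example, a simple step-decay schedule (my own sketch, not part of the run shown below) could use the set_lr method defined on the Optimizer base class:

def lr_schedule(epoch):
    # hypothetical step decay: 1e-3 for the first 10 epochs, then 1e-4, then 1e-5
    if epoch < 10:
        return 1e-3
    elif epoch < 15:
        return 1e-4
    return 1e-5

# at the top of the epoch loop:
#     lr = lr_schedule(epoch)
#     optim.set_lr(lr)

The log below was produced with the fixed learning rate of 1e-3, without any schedule: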

Iter 1000, 0 / 20, Loss 0.907, LR 0.001
Val set, Accuracy: 0.916, Loss: 0.707
Iter 2000, 1 / 20, Loss 0.748, LR 0.001
Iter 3000, 1 / 20, Loss 0.568, LR 0.001
Val set, Accuracy: 0.927, Loss: 0.584
Iter 4000, 2 / 20, Loss 0.299, LR 0.001
Iter 5000, 2 / 20, Loss 0.432, LR 0.001
Val set, Accuracy: 0.939, Loss: 0.504
Iter 6000, 3 / 20, Loss 0.450, LR 0.001
Iter 7000, 3 / 20, Loss 0.478, LR 0.001
Val set, Accuracy: 0.939, Loss: 0.472
Iter 8000, 4 / 20, Loss 0.411, LR 0.001
Iter 9000, 4 / 20, Loss 0.640, LR 0.001
Val set, Accuracy: 0.945, Loss: 0.431
Iter 10000, 5 / 20, Loss 0.380, LR 0.001
Iter 11000, 5 / 20, Loss 0.570, LR 0.001
Val set, Accuracy: 0.943, Loss: 0.417
Iter 12000, 6 / 20, Loss 0.208, LR 0.001
Iter 13000, 6 / 20, Loss 0.401, LR 0.001
Val set, Accuracy: 0.945, Loss: 0.424
Iter 14000, 7 / 20, Loss 0.629, LR 0.001
Iter 15000, 7 / 20, Loss 0.212, LR 0.001
Val set, Accuracy: 0.947, Loss: 0.401
Iter 16000, 8 / 20, Loss 0.258, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.410
Iter 17000, 9 / 20, Loss 0.613, LR 0.001
Iter 18000, 9 / 20, Loss 0.545, LR 0.001
Val set, Accuracy: 0.946, Loss: 0.408
Iter 19000, 10 / 20, Loss 0.309, LR 0.001
Iter 20000, 10 / 20, Loss 0.531, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.388
Iter 21000, 11 / 20, Loss 0.765, LR 0.001
Iter 22000, 11 / 20, Loss 0.213, LR 0.001
Val set, Accuracy: 0.942, Loss: 0.429
Iter 23000, 12 / 20, Loss 0.648, LR 0.001
Iter 24000, 12 / 20, Loss 0.390, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.410
Iter 25000, 13 / 20, Loss 0.404, LR 0.001
Iter 26000, 13 / 20, Loss 0.576, LR 0.001
Val set, Accuracy: 0.942, Loss: 0.440
Iter 27000, 14 / 20, Loss 0.184, LR 0.001
Iter 28000, 14 / 20, Loss 0.263, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.420
Iter 29000, 15 / 20, Loss 0.645, LR 0.001
Iter 30000, 15 / 20, Loss 0.655, LR 0.001
Val set, Accuracy: 0.949, Loss: 0.398
Iter 31000, 16 / 20, Loss 0.441, LR 0.001
Val set, Accuracy: 0.951, Loss: 0.393
Iter 32000, 17 / 20, Loss 0.446, LR 0.001
Iter 33000, 17 / 20, Loss 0.709, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.390
Iter 34000, 18 / 20, Loss 0.475, LR 0.001
Iter 35000, 18 / 20, Loss 0.310, LR 0.001
Val set, Accuracy: 0.948, Loss: 0.408
Iter 36000, 19 / 20, Loss 0.425, LR 0.001
Iter 37000, 19 / 20, Loss 0.386, LR 0.001
Val set, Accuracy: 0.950, Loss: 0.392

Summary

This time the model was encapsulated; next time I will tune it. I will also try other models, optimizers, parameter initialization settings, and regularization.
