MindSpore 25-Day Study Check-in Camp, Day 25 | Garbage Classification with MobileNetV2: Training and Testing


m0_55944995


Article link: https://blog.csdn.net/m0_55944995/article/details/140409050


Continued from the previous section.

4. Training and Testing the MobileNetV2 Model

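TIPS: Common learning-rate schedules:

Fixed learning rate:
- The learning rate stays constant throughout training.
Piecewise constant learning rate:
- A different fixed learning rate is used in each training stage.
Exponential decay:
- The learning rate decays exponentially, e.g. lr = lr0 * r^(t / T_d), where r is the decay rate, T_d the decay period in steps, and t the iteration count.
Natural exponential decay:
- The same idea written with the natural base e: lr = lr0 * exp(-k * t), where k is the decay constant and t the iteration count.
Inverse time decay:
- The learning rate falls off inversely with the iteration count, e.g. lr = lr0 / (1 + k * t).
Polynomial decay:
- The learning rate follows a polynomial, e.g. lr = (lr0 - lr1) * (1 - t / T)^p + lr1, where T is the total number of iterations, p the polynomial power, and lr1 the final learning rate.
Step decay:
- Every fixed number of iterations, the learning rate is multiplied by a decay factor.
Cosine annealing:
- The learning rate follows a cosine curve from its maximum down to a minimum; with warm restarts the curve repeats periodically.
Linear warmup:
- At the start of training the learning rate increases linearly until it reaches the preset value.
Cyclical learning rate (CLR):
- The learning rate cycles between a lower and an upper bound.
Learning rate range test:
- Sweep the learning rate over a range and watch how the loss responds, to pick a suitable learning rate.
Adaptive learning-rate algorithms:
- Methods such as Adam, RMSprop and Adagrad adjust each parameter's step size automatically from gradient statistics collected during training.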
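Most of the schedules in this list are one-line formulas of the step index t. The sketch below writes them out in plain Python so the curves can be printed or plotted; all function names and default parameters are illustrative choices, not a library API. The cyclical schedule is the "triangular" variant of CLR, and the range test and adaptive optimizers are not covered.

```python
import math

def step_decay(lr0, t, drop=0.5, every=10):
    """Multiply lr0 by `drop` once every `every` steps."""
    return lr0 * drop ** (t // every)

def exponential_decay(lr0, t, rate=0.9, decay_steps=10):
    """lr0 * rate^(t / decay_steps)."""
    return lr0 * rate ** (t / decay_steps)

def natural_exp_decay(lr0, t, k=0.1):
    """lr0 * e^(-k t)."""
    return lr0 * math.exp(-k * t)

def inverse_time_decay(lr0, t, k=0.1):
    """lr0 / (1 + k t)."""
    return lr0 / (1 + k * t)

def polynomial_decay(lr0, t, total, lr1=0.0, power=2.0):
    """(lr0 - lr1) * (1 - t/total)^power + lr1, held at lr1 after `total` steps."""
    t = min(t, total)
    return (lr0 - lr1) * (1 - t / total) ** power + lr1

def cosine_annealing(lr_max, t, total, lr_min=0.0):
    """Half a cosine period from lr_max (t = 0) down to lr_min (t = total)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / total))

def linear_warmup(lr_target, t, warmup_steps):
    """Ramp linearly up to lr_target over the first warmup_steps steps, then hold it."""
    return lr_target * min(1.0, (t + 1) / warmup_steps)

def triangular_clr(t, base_lr, max_lr, step_size):
    """Triangular cyclical LR: base_lr -> max_lr over step_size steps, then back down."""
    cycle = t // (2 * step_size)
    x = abs(t / step_size - 2 * cycle - 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

if __name__ == "__main__":
    for t in (0, 10, 50, 90):
        print(t, round(step_decay(0.1, t), 5), round(cosine_annealing(0.1, t, total=100), 5))
```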

Training strategy

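Models are often trained with a static learning rate such as 0.01. However, as training proceeds and the model approaches convergence, the weight updates should become smaller to reduce oscillation late in training. A learning rate that decays during training is therefore commonly used instead.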

Here a cosine decay schedule is used:

import math

def cosine_decay(total_steps, lr_init=0.0, lr_end=0.0, lr_max=0.1, warmup_steps=0):
    """
    Applies cosine decay to generate learning rate array.

    Args:
       total_steps(int): all steps in training.
       lr_init(float): initial learning rate.
       lr_end(float): end learning rate.
       lr_max(float): max learning rate.
       warmup_steps(int): all steps in warmup epochs.

    Returns:
       list, learning rate array.
    """
    lr_init, lr_end, lr_max = float(lr_init), float(lr_end), float(lr_max)
    decay_steps = total_steps - warmup_steps
    lr_all_steps = []
    # Linear warmup: rise from lr_init to lr_max over warmup_steps steps.
    inc_per_step = (lr_max - lr_init) / warmup_steps if warmup_steps else 0
    for i in range(total_steps):
        if i < warmup_steps:
            lr = lr_init + inc_per_step * (i + 1)
        else:
            # Cosine factor falls from 1 toward 0 over the decay phase.
            cosine_factor = 0.5 * (1 + math.cos(math.pi * (i - warmup_steps) / decay_steps))
            lr = (lr_max - lr_end) * cosine_factor + lr_end
        lr_all_steps.append(lr)

    return lr_all_steps
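Written out as a formula, the value cosine_decay produces for step i is shown below, with N = total_steps and W = warmup_steps (the symbols η, N and W are introduced here only for readability):

```latex
\eta_i =
\begin{cases}
\eta_{\text{init}} + (\eta_{\max} - \eta_{\text{init}})\,\dfrac{i+1}{W}, & 0 \le i < W,\\[6pt]
\eta_{\text{end}} + \dfrac{1}{2}\,(\eta_{\max} - \eta_{\text{end}})\left(1 + \cos\dfrac{\pi\,(i - W)}{N - W}\right), & W \le i < N.
\end{cases}
```

The training script later calls it with the defaults lr_init = lr_end = 0 and W = 0, so the schedule reduces to η_i = (lr_max / 2)(1 + cos(πi / N)). For example, with N = 4 and lr_max = 0.1 the four values are 0.1, about 0.0854, 0.05 and about 0.0146.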

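During training, checkpoints can be added to save the model parameters, both for inference and for resuming training after an interruption. Typical scenarios:

Inference after training

- Save the model parameters once training finishes and use them for inference or prediction.
- Evaluate accuracy during training and save the parameters with the highest accuracy for prediction.

Retraining

- For long training jobs, save checkpoint files during training so that a job that exits abnormally does not have to restart from the initial state.
- Fine-tuning: train a model, save its parameters, and use it as the starting point for training on a second, similar task.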
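The training script later in this post simply overwrites save_mobilenetV2_model.ckpt every epoch. The "keep the most accurate parameters" and "resume after a crash" scenarios can be combined by keeping two files. Below is a framework-free sketch of that bookkeeping using pickle; BestCheckpointSaver and read_checkpoint are hypothetical names, and in MindSpore the pickle calls would correspond to ms.save_checkpoint / ms.load_checkpoint.

```python
import os
import pickle
import tempfile

class BestCheckpointSaver:
    """Hypothetical helper: keep a 'latest' checkpoint for resuming and a 'best' one for inference."""

    def __init__(self, directory):
        self.directory = directory
        self.best_acc = float("-inf")

    def _write(self, payload, filename):
        path = os.path.join(self.directory, filename)
        with open(path, "wb") as f:
            pickle.dump(payload, f)
        return path

    def step(self, epoch, params, acc):
        """Call once per epoch; returns True when a new best checkpoint was written."""
        # Always overwrite 'latest' so an interrupted run can resume from the last finished epoch.
        self._write({"epoch": epoch, "params": params}, "latest.pkl")
        # Overwrite 'best' only when validation accuracy improves.
        if acc > self.best_acc:
            self.best_acc = acc
            self._write({"epoch": epoch, "params": params, "acc": acc}, "best.pkl")
            return True
        return False

def read_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saver = BestCheckpointSaver(tmp)
        for epoch, acc in enumerate([0.52, 0.71, 0.66]):  # made-up accuracies for illustration
            saver.step(epoch, {"w": [epoch]}, acc)
        print(read_checkpoint(os.path.join(tmp, "best.pkl"))["epoch"])    # → 1
        print(read_checkpoint(os.path.join(tmp, "latest.pkl"))["epoch"])  # → 2
```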

Here a MobileNetV2 pretrained on ImageNet is loaded for fine-tuning: only the modified final FC layer is trained, and checkpoints are saved during training.

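import mindspore as ms
from mindspore import nn

def switch_precision(net, data_type):
    # On Ascend only: cast the whole network to data_type (e.g. ms.float16), then keep every
    # fully connected (nn.Dense) layer in float32. The script below does not call this helper.
    if ms.get_context('device_target') == "Ascend":
        net.to_float(data_type)
        for _, cell in net.cells_and_names():
            if isinstance(cell, nn.Dense):
                cell.to_float(ms.float32)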
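float16 keeps only about three significant decimal digits, so a small update added to a weight near 1.0 can vanish entirely. This is plausibly one reason a mixed-precision recipe like switch_precision keeps some layers in float32; that motivation is an interpretation, not something the original states. A minimal NumPy illustration:

```python
import numpy as np

weight, update = 1.0, 1e-4  # a typical weight and a small per-step update

fp16_result = np.float16(weight) + np.float16(update)
fp32_result = np.float32(weight) + np.float32(update)

print(fp16_result == np.float16(weight))  # → True: the update is rounded away in float16
print(fp32_result == np.float32(weight))  # → False: float32 keeps it
# Machine epsilon (spacing just above 1.0) for each type.
print(np.finfo(np.float16).eps, np.finfo(np.float32).eps)
```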

Model training and testing

Before the actual training, define the training functions, load the data, instantiate the model, and define the optimizer and loss function.

First, a brief introduction to loss functions and optimizers:

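Loss function: also called the objective function, it measures how far the predictions are from the ground truth. Deep learning iteratively reduces the value of the loss function, and a well-chosen loss function can noticeably improve model performance.
Optimizer: minimizes the loss function, improving the model during training.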

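Once the loss function is defined, its gradient with respect to the weights can be computed. The gradient points in the direction in which the loss increases fastest, so the optimizer moves the weights in the opposite direction to reduce the loss and improve the model.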
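To make "loss, gradient, step against the gradient" concrete, here is a NumPy sketch of mean softmax cross-entropy for integer class labels, intended to mirror what nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean') computes (this is a reimplementation for illustration, not MindSpore's code), together with its gradient and a single gradient-descent step on the logits. The function name and the 0.5 step size are arbitrary choices.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer labels, and its gradient w.r.t. the logits."""
    batch = logits.shape[0]
    rows = np.arange(batch)
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[rows, labels].mean()
    grad = np.exp(log_probs)           # softmax probabilities
    grad[rows, labels] -= 1.0          # minus the one-hot targets
    return loss, grad / batch          # divide by batch size because of the mean reduction

logits = np.array([[0.0, 0.0], [2.0, 0.0]])
labels = np.array([0, 1])
loss, grad = sparse_softmax_cross_entropy(logits, labels)

# One plain gradient-descent step: move the logits against the gradient, and the loss drops.
new_loss, _ = sparse_softmax_cross_entropy(logits - 0.5 * grad, labels)
print(f"{loss:.4f} -> {new_loss:.4f}")
```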

Before training, the parameters of the MobileNetV2Backbone are frozen so that its weights are not updated during training; only the parameters of the MobileNetV2Head are updated.
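Freezing means the backbone is still used in the forward pass but receives no updates; in the script this happens by setting requires_grad = False, so network.trainable_params() hands only the head's parameters to nn.Momentum. The NumPy sketch below imitates that setup with hypothetical stand-ins (a fixed random projection as the "backbone", a linear layer as the "head", random data). The update assumes the non-Nesterov momentum rule MindSpore documents for nn.Momentum, with weight decay added to the gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a fixed random projection plays the frozen pretrained backbone,
# a linear layer with 3 outputs plays the trainable head.
backbone_w = rng.normal(size=(8, 4))
backbone_before = backbone_w.copy()
head_w = np.zeros((4, 3))
velocity = np.zeros_like(head_w)  # momentum buffer exists only for the trainable head

x = rng.normal(size=(32, 8))
y = rng.integers(0, 3, size=32)

def loss_and_head_grad(head_w):
    features = np.maximum(x @ backbone_w, 0.0)  # frozen backbone: used, never differentiated
    logits = features @ head_w
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    dlogits = probs.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)
    return loss, features.T @ dlogits  # gradient w.r.t. head_w only

lr, momentum, weight_decay = 0.01, 0.9, 4e-5
initial_loss, _ = loss_and_head_grad(head_w)
for _ in range(200):
    _, grad = loss_and_head_grad(head_w)
    grad = grad + weight_decay * head_w    # weight decay folded into the gradient
    velocity = momentum * velocity + grad  # v <- m * v + g
    head_w = head_w - lr * velocity        # p <- p - lr * v (head only)
final_loss, _ = loss_and_head_grad(head_w)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```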

MindSpore provides loss functions such as SoftmaxCrossEntropyWithLogits, L1Loss and MSELoss. SoftmaxCrossEntropyWithLogits is used here.

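The loss is printed during training and testing. It fluctuates, but overall it decreases while accuracy rises. Because training involves randomness, your loss values will not exactly match anyone else's.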

After each epoch, the model's accuracy is computed on the test set. The printed accuracies show the predictive ability of MobileNetV2 improving steadily.

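import time

import mindspore as ms
from mindspore import nn, load_checkpoint
from mindspore.amp import FixedLossScaleManager

# config, create_dataset, MobileNetV2Backbone, MobileNetV2Head and mobilenet_v2 come from
# the previous section; cosine_decay is defined above.
LOSS_SCALE = 1024

train_dataset = create_dataset(dataset_path=config.dataset_path, config=config)
# Assumption: the create_dataset helper from the previous section takes a `training` flag that
# selects the test split (without training augmentation) when False. Without it, the
# "test" accuracy below would be measured on the training data.
eval_dataset = create_dataset(dataset_path=config.dataset_path, config=config, training=False)
step_size = train_dataset.get_dataset_size()

backbone = MobileNetV2Backbone()  # optionally: last_channel=config.backbone_out_channels
# Freeze the backbone parameters. Comment out these two lines to fine-tune the whole network.
for param in backbone.get_parameters():
    param.requires_grad = False
# Load the ImageNet-pretrained parameters into the backbone.
load_checkpoint(config.pretrained_ckpt, backbone)

head = MobileNetV2Head(input_channel=backbone.out_channels, num_classes=config.num_classes)
network = mobilenet_v2(backbone, head)

# Define the loss, loss-scale manager and optimizer.
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
# The loss-scale manager only takes effect when passed to mindspore.train.Model;
# the hand-written loop below does not use it.
loss_scale = FixedLossScaleManager(LOSS_SCALE, drop_overflow_update=False)
# The schedule has config.epochs * step_size entries, so run at most config.epochs epochs below.
lrs = cosine_decay(config.epochs * step_size, lr_max=config.lr_max)
# loss_scale is deliberately not passed to nn.Momentum: the optimizer would divide every gradient
# by it, but forward_fn never multiplies the loss by LOSS_SCALE, which would shrink the effective
# learning rate 1024-fold.
opt = nn.Momentum(network.trainable_params(), lrs, config.momentum, config.weight_decay)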

# Define train_loop for training.
def train_loop(model, dataset, loss_fn, optimizer):
    # Forward function: compute the logits and the loss.
    def forward_fn(data, label):
        logits = model(data)
        loss = loss_fn(logits, label)
        return loss

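    # Build the differentiation function grad_fn with mindspore.value_and_grad; it returns the loss and the gradients.
    # We differentiate with respect to the model parameters, so grad_position is None and the trainable parameters are passed in.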
    grad_fn = ms.value_and_grad(forward_fn, None, optimizer.parameters)

    # One training step: compute the loss and gradients, then let the optimizer update the parameters.
    def train_step(data, label):
        loss, grads = grad_fn(data, label)
        optimizer(grads)
        return loss

    size = dataset.get_dataset_size()
    model.set_train()
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss = train_step(data, label)

        if batch % 10 == 0:
            loss, current = loss.asnumpy(), batch
            print(f"loss: {loss:>7f}  [{current:>3d}/{size:>3d}]")

# Define test_loop for evaluation.
def test_loop(model, dataset, loss_fn):
    num_batches = dataset.get_dataset_size()
    model.set_train(False)
    total, test_loss, correct = 0, 0, 0
    for data, label in dataset.create_tuple_iterator():
        pred = model(data)
        total += len(data)
        test_loss += loss_fn(pred, label).asnumpy()
        correct += (pred.argmax(1) == label).asnumpy().sum()
    test_loss /= num_batches
    correct /= total
    print(f"Test: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

print("============== Starting Training ==============")
# Only 2 epochs are run here to save time; adjust as needed.
epoch_begin_time = time.time()
epochs = 2
for t in range(epochs):
    begin_time = time.time()
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(network, train_dataset, loss, opt)
    ms.save_checkpoint(network, "save_mobilenetV2_model.ckpt")
    end_time = time.time()
    times = end_time - begin_time
    print(f"per epoch time: {times}s")
    test_loop(network, eval_dataset, loss)
epoch_end_time = time.time()
times = epoch_end_time - epoch_begin_time
print(f"total time:  {times}s")
print("============== Training Success ==============")
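The script defines LOSS_SCALE = 1024. Loss scaling only pays off when gradients are computed in float16 (for example after switch_precision on Ascend), and only if the loss is multiplied by the scale before differentiation and the gradients are divided by it afterwards. The NumPy sketch below shows why: a gradient of 1e-8 (a made-up value) underflows to zero in float16 but survives once scaled, and unscaling in float32 recovers it.

```python
import numpy as np

LOSS_SCALE = 1024
true_grad = 1e-8  # a gradient too small to represent in float16

unscaled_fp16 = np.float16(true_grad)                 # underflows to 0
scaled_fp16 = np.float16(true_grad * LOSS_SCALE)      # survives as a small (subnormal) float16
recovered = float(np.float32(scaled_fp16)) / LOSS_SCALE  # unscale in higher precision before updating

print(unscaled_fp16, scaled_fp16, recovered)
```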

5. Model Inference

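Load the model checkpoint for inference. When loading with the load_checkpoint interface, load the parameters into the original network, not into a training network wrapped with the optimizer and loss function.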

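import numpy as np
from PIL import Image
from mindspore import Tensor, load_checkpoint

CKPT = "save_mobilenetV2_model.ckpt"

def image_process(image):
    """Process one image at a time.

    Args:
        image: PIL image or array of shape (H, W, C).

    Returns:
        Tensor of shape (1, C, H, W), normalized with the ImageNet mean/std.
    """
    mean = [0.485*255, 0.456*255, 0.406*255]
    std = [0.229*255, 0.224*255, 0.225*255]
    image = (np.array(image) - mean) / std
    image = image.transpose((2, 0, 1))  # HWC -> CHW
    img_tensor = Tensor(np.array([image], np.float32))  # add the batch dimension
    return img_tensor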

def infer_one(network, image_path):
    # PIL's resize expects (width, height); convert("RGB") guards against grayscale/RGBA files.
    image = Image.open(image_path).convert("RGB").resize((config.image_width, config.image_height))
    logits = network(image_process(image))
    pred = np.argmax(logits.asnumpy(), axis=1)[0]
    print(image_path, class_en[pred])

def infer():
    backbone = MobileNetV2Backbone(last_channel=config.backbone_out_channels)
    head = MobileNetV2Head(input_channel=backbone.out_channels, num_classes=config.num_classes)
    network = mobilenet_v2(backbone, head)
    load_checkpoint(CKPT, network)
    network.set_train(False)  # inference mode, e.g. BatchNorm uses its moving statistics
    for i in range(91, 100):
        infer_one(network, f'data_en/test/Cardboard/000{i}.jpg')
infer()
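To see exactly what image_process hands to the network without needing MindSpore or a test image, here is a NumPy-only sketch of the same normalization and layout change; preprocess is a hypothetical name. Note that a PIL image of size (width, height) becomes an array of shape (height, width, 3), which is why the resize call takes width first.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406]) * 255  # same constants as image_process
STD = np.array([0.229, 0.224, 0.225]) * 255

def preprocess(image_hwc):
    """Normalize per channel, HWC -> CHW, and add a batch axis: output shape (1, C, H, W)."""
    x = (np.asarray(image_hwc, dtype=np.float64) - MEAN) / STD
    return x.transpose(2, 0, 1)[np.newaxis].astype(np.float32)

# A synthetic 4x6 "image" whose every pixel equals the channel means, so the output is all zeros.
image = np.broadcast_to(MEAN, (4, 6, 3))
batch = preprocess(image)
print(batch.shape, batch.dtype, float(np.abs(batch).max()))
```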

2024-07-13 16:09:59 Mindstorm
