3 MobileNet V2 Image Classification Experiment
3.1 Experiment Introduction
This experiment uses the lightweight MobileNet V2 network to classify a flower image dataset.
3.2 Experiment Preparation
Before starting, make sure MindSpore is installed correctly. If it is not, follow the MindSpore installation page (https://www.mindspore.cn/install/) to install it on your machine.
Basic Python programming skills and fundamental mathematics (probability, matrices) are also assumed.
Recommended environment:
Framework version: MindSpore 1.7
Programming language: Python 3.7
3.3 Experiment Design and Implementation
3.3.1 Data Preparation
The experiment uses an open-source flower image dataset containing five flower classes: daisy (633 images), dandelion (898), roses (641), sunflowers (699), and tulips (799), stored in five folders, 3670 images in total (about 230 MB). To allow testing after the model is deployed, the dataset is split into two parts, flower_photos_train and flower_photos_test.
The directory structure is as follows:
flower_photos_train
├── daisy
├── dandelion
├── roses
├── sunflowers
├── tulips
├── LICENSE.txt
flower_photos_test
├── daisy
├── dandelion
├── roses
├── sunflowers
├── tulips
├── LICENSE.txt
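As a quick sanity check, the per-class image counts listed above sum to the stated total of 3670:

```python
# Per-class image counts from the dataset description
counts = {'daisy': 633, 'dandelion': 898, 'roses': 641, 'sunflowers': 699, 'tulips': 799}
total = sum(counts.values())
print(total)  # 3670
```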
Dataset download links:
https://ascend-professional-construction-dataset.obs.myhuaweicloud.com/deep-learning/flower_photos_train.zip
https://ascend-professional-construction-dataset.obs.myhuaweicloud.com/deep-learning/flower_photos_test.zip
3.3.2 Experiment Steps
Step 1: Load the Dataset
Define a `create_dataset` function that loads the flower image classification dataset through the `ImageFolderDataset` interface and applies image augmentation to it. Code:
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as CV
import mindspore.dataset.transforms.c_transforms as C
from mindspore import dtype as mstype

train_data_path = 'flower_photos_train'
val_data_path = 'flower_photos_test'

def create_dataset(data_path, batch_size=18, training=True):
    """Create the dataset."""
    data_set = ds.ImageFolderDataset(data_path, num_parallel_workers=8, shuffle=True,
                                     class_indexing={'daisy': 0, 'dandelion': 1, 'roses': 2,
                                                     'sunflowers': 3, 'tulips': 4})
    # Image augmentation operations
    image_size = 224
    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
    if training:
        trans = [
            CV.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
            CV.RandomHorizontalFlip(prob=0.5),
            CV.Normalize(mean=mean, std=std),
            CV.HWC2CHW()
        ]
    else:
        trans = [
            CV.Decode(),
            CV.Resize(256),
            CV.CenterCrop(image_size),
            CV.Normalize(mean=mean, std=std),
            CV.HWC2CHW()
        ]
    # Map the transforms onto the image column and cast labels to int32
    data_set = data_set.map(operations=trans, input_columns="image", num_parallel_workers=8)
    data_set = data_set.map(operations=C.TypeCast(mstype.int32), input_columns="label",
                            num_parallel_workers=8)
    # Batch the data; if the last batch has fewer than batch_size samples, drop it
    data_set = data_set.batch(batch_size, drop_remainder=True)
    return data_set

dataset_train = create_dataset(train_data_path)
dataset_val = create_dataset(val_data_path, training=False)
Step 2: Dataset Visualization
The dataset built by `create_dataset` yields dictionary elements, so a data iterator can be created with the `create_dict_iterator` interface and stepped through with `next`. Since `batch_size` is set to 18 in this chapter, one call to `next` fetches 18 images and their labels. Code:
import matplotlib.pyplot as plt
import numpy as np

data = next(dataset_train.create_dict_iterator())
images = data["image"]
labels = data["label"]
print("Tensor of image", images.shape)
print("Labels:", labels)

# class_name maps labels to class names; labels follow the lexicographic order of the folder names
class_name = {0: 'daisy', 1: 'dandelion', 2: 'roses', 3: 'sunflowers', 4: 'tulips'}

plt.figure(figsize=(15, 7))
for i in range(len(labels)):
    # Get an image and its label
    data_image = images[i].asnumpy()
    data_label = labels[i]
    # Convert CHW back to HWC and undo the normalization for display
    data_image = np.transpose(data_image, (1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    data_image = std * data_image + mean
    data_image = np.clip(data_image, 0, 1)
    # Show the image
    plt.subplot(3, 6, i + 1)
    plt.imshow(data_image)
    plt.title(class_name[int(labels[i].asnumpy())])
    plt.axis("off")
plt.show()
Output: a 3×6 grid of the 18 sampled images, each titled with its class name.
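The de-normalization in the loop above simply inverts the `Normalize` transform from Step 1. A standalone NumPy check of the round trip (the pixel values here are arbitrary illustrative numbers):

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

pixel = np.array([128.0, 64.0, 200.0])           # one RGB pixel in [0, 255]
normalized = (pixel - mean * 255) / (std * 255)  # what CV.Normalize produces
restored = std * normalized + mean               # what the display code computes

# The restored value is the original pixel rescaled to [0, 1]
print(np.allclose(restored, pixel / 255))  # True
```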
Step 3: Create the MobileNet V2 Model
Datasets are very important for training: a good dataset can effectively improve training accuracy and efficiency. MobileNet, proposed by a Google team in 2017, is a lightweight CNN designed for mobile, embedded, and IoT devices. Compared with traditional convolutional neural networks, it builds on depthwise separable convolutions to greatly reduce model parameters and computation at the cost of a small drop in accuracy, and it introduces a width multiplier α and a resolution multiplier β so the model can be adapted to different application scenarios.
Because the ReLU activation in MobileNet loses a great deal of information when processing low-dimensional features, MobileNet V2 designs the network with inverted residual blocks and linear bottlenecks, improving accuracy while making the optimized model smaller.
As shown in the figure above, the inverted residual block first expands the channel dimension with a 1x1 convolution, then applies a 3x3 depthwise convolution, and finally reduces the dimension with a 1x1 convolution. This is the reverse of the classic residual block, which first reduces the dimension with a 1x1 convolution, applies a 3x3 convolution, and then expands the dimension with a 1x1 convolution.
For details, see the MobileNet V2 paper (https://arxiv.org/pdf/1801.04381.pdf).
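To see where the savings of depthwise separable convolution come from: a standard k×k convolution with C_in input and C_out output channels costs k·k·C_in·C_out multiply-accumulates per output position, while the depthwise-plus-pointwise factorization costs k·k·C_in + C_in·C_out. A quick comparison (the channel sizes below are illustrative, not taken from the paper):

```python
def standard_conv_cost(k, c_in, c_out):
    # Multiply-accumulates per output spatial position for a standard conv
    return k * k * c_in * c_out

def depthwise_separable_cost(k, c_in, c_out):
    # k x k depthwise conv (one filter per channel) + 1x1 pointwise conv
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 32, 64
print(standard_conv_cost(k, c_in, c_out))        # 18432
print(depthwise_separable_cost(k, c_in, c_out))  # 2336
```

Here the factorized form is roughly 8x cheaper, consistent with the 1/C_out + 1/k² reduction factor reported in the MobileNet papers.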
Code:
import numpy as np
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that rounding down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class GlobalAvgPooling(nn.Cell):
    def __init__(self):
        super(GlobalAvgPooling, self).__init__()
        self.mean = ops.ReduceMean(keep_dims=False)

    def construct(self, x):
        x = self.mean(x, (2, 3))
        return x

class ConvBNReLU(nn.Cell):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        super(ConvBNReLU, self).__init__()
        padding = (kernel_size - 1) // 2
        in_channels = in_planes
        out_channels = out_planes
        if groups == 1:
            conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad_mode='pad', padding=padding)
        else:
            out_channels = in_planes
            conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, pad_mode='pad',
                             padding=padding, group=in_channels)
        layers = [conv, nn.BatchNorm2d(out_planes), nn.ReLU6()]
        self.features = nn.SequentialCell(layers)

    def construct(self, x):
        output = self.features(x)
        return output

class InvertedResidual(nn.Cell):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]
        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = stride == 1 and inp == oup
        layers = []
        if expand_ratio != 1:
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim,
                       stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, kernel_size=1,
                      stride=1, has_bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.SequentialCell(layers)
        self.add = ops.Add()
        self.cast = ops.Cast()

    def construct(self, x):
        identity = x
        x = self.conv(x)
        if self.use_res_connect:
            return self.add(identity, x)
        return x

class MobileNetV2Backbone(nn.Cell):
    def __init__(self, width_mult=1., inverted_residual_setting=None, round_nearest=8,
                 input_channel=32, last_channel=1280):
        super(MobileNetV2Backbone, self).__init__()
        block = InvertedResidual
        # setting of inverted residual blocks
        self.cfgs = inverted_residual_setting
        if inverted_residual_setting is None:
            self.cfgs = [
                # t, c, n, s
                [1, 16, 1, 1],
                [6, 24, 2, 2],
                [6, 32, 3, 2],
                [6, 64, 4, 2],
                [6, 96, 3, 1],
                [6, 160, 3, 2],
                [6, 320, 1, 1],
            ]
        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.out_channels = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU(input_channel, self.out_channels, kernel_size=1))
        self.features = nn.SequentialCell(features)
        self._initialize_weights()

    def construct(self, x):
        x = self.features(x)
        return x

    def _initialize_weights(self):
        self.init_parameters_data()
        for _, m in self.cells_and_names():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.set_data(ms.Tensor(np.random.normal(0, np.sqrt(2. / n),
                                                             m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(
                        ms.numpy.zeros(m.bias.data.shape, dtype="float32"))
            elif isinstance(m, nn.BatchNorm2d):
                m.gamma.set_data(
                    ms.Tensor(np.ones(m.gamma.data.shape, dtype="float32")))
                m.beta.set_data(
                    ms.numpy.zeros(m.beta.data.shape, dtype="float32"))

    @property
    def get_features(self):
        return self.features

class MobileNetV2Head(nn.Cell):
    def __init__(self, input_channel=1280, num_classes=1000, has_dropout=False, activation="None"):
        super(MobileNetV2Head, self).__init__()
        # mobilenet head
        head = ([GlobalAvgPooling()] if not has_dropout else
                [GlobalAvgPooling(), nn.Dropout(0.2)])
        self.head = nn.SequentialCell(head)
        self.dense = nn.Dense(input_channel, num_classes, has_bias=True)
        self.need_activation = True
        if activation == "Sigmoid":
            self.activation = ops.Sigmoid()
        elif activation == "Softmax":
            self.activation = ops.Softmax()
        else:
            self.need_activation = False
        self._initialize_weights()

    def construct(self, x):
        x = self.head(x)
        x = self.dense(x)
        if self.need_activation:
            x = self.activation(x)
        return x

    def _initialize_weights(self):
        self.init_parameters_data()
        for _, m in self.cells_and_names():
            if isinstance(m, nn.Dense):
                m.weight.set_data(ms.Tensor(np.random.normal(
                    0, 0.01, m.weight.data.shape).astype("float32")))
                if m.bias is not None:
                    m.bias.set_data(
                        ms.numpy.zeros(m.bias.data.shape, dtype="float32"))

class MobileNetV2Combine(nn.Cell):
    def __init__(self, backbone, head):
        super(MobileNetV2Combine, self).__init__(auto_prefix=False)
        self.backbone = backbone
        self.head = head

    def construct(self, x):
        x = self.backbone(x)
        x = self.head(x)
        return x

def mobilenet_v2(num_classes):
    backbone_net = MobileNetV2Backbone()
    head_net = MobileNetV2Head(backbone_net.out_channels, num_classes)
    return MobileNetV2Combine(backbone_net, head_net)
Step 4: Model Training and Evaluation
After creating the model, loss function, and optimizer, initialize the model through the `Model` interface, then train it with the `model.train` interface and evaluate its accuracy with `model.eval`.
This step involves transfer learning, in three parts:
1. Download the pretrained weights
Download the checkpoint already trained on the ImageNet dataset from https://download.mindspore.cn/models/r1.7/mobilenetv2_ascend_v170_imagenet2012_official_cv_top1acc71.88.ckpt and place it in the same directory as the code.
2. Load the pretrained model
Read the pretrained checkpoint file with the load_checkpoint() interface; the result is a dictionary.
3. Modify the pretrained parameters
Adjust the pretrained weight parameters that no longer fit (the pretrained model solves a 1001-class ImageNet task, while the current experiment is 5-class flower classification, so the final fully connected layer of the network must be modified).
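The idea in part 3 can be sketched with plain NumPy: keep only the first five rows of the classifier weight matrix and the first five bias entries. The shapes match the real checkpoint's dense layer, but the values below are random stand-ins:

```python
import numpy as np

# Stand-in for the pretrained 1001-class classifier (random values, real shapes)
pretrained_weight = np.random.randn(1001, 1280).astype(np.float32)
pretrained_bias = np.random.randn(1001).astype(np.float32)

# Keep only the first 5 output rows to match the 5-class flower task
new_weight = pretrained_weight[:5, :]
new_bias = pretrained_bias[:5]
print(new_weight.shape, new_bias.shape)  # (5, 1280) (5,)
```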
Code:
import mindspore
import mindspore.nn as nn
from mindspore.train import Model
from mindspore import Tensor, save_checkpoint
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor
from mindspore.train.serialization import load_checkpoint, load_param_into_net

# Create the model with 5 target classes
network = mobilenet_v2(5)
# Load the pretrained weights
param_dict = load_checkpoint("./mobilenetv2_ascend_v170_imagenet2012_official_cv_top1acc71.88.ckpt")
# Adapt the classifier weights to the modified model structure
param_dict["dense.weight"] = mindspore.Parameter(Tensor(param_dict["dense.weight"][:5, :], mindspore.float32),
                                                 name="dense.weight", requires_grad=True)
param_dict["dense.bias"] = mindspore.Parameter(Tensor(param_dict["dense.bias"][:5], mindspore.float32),
                                               name="dense.bias", requires_grad=True)
# Load the adapted parameters into the model
load_param_into_net(network, param_dict)

train_step_size = dataset_train.get_dataset_size()
epoch_size = 20
# Cosine-decay learning-rate schedule, one value per training step
lr = nn.cosine_decay_lr(min_lr=0.0, max_lr=0.1, total_step=epoch_size * train_step_size,
                        step_per_epoch=train_step_size, decay_epoch=epoch_size)
# Define the optimizer
network_opt = nn.Momentum(params=network.trainable_params(), learning_rate=lr, momentum=0.9)
# Define the loss function
network_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
# Define the evaluation metric
metrics = {"Accuracy": nn.Accuracy()}
# Initialize the model
model = Model(network, loss_fn=network_loss, optimizer=network_opt, metrics=metrics)
# Monitor the loss value once per epoch
loss_cb = LossMonitor(per_print_times=train_step_size)
# Checkpoint config: how often to save and how many checkpoints to keep at most
ckpt_config = CheckpointConfig(save_checkpoint_steps=100, keep_checkpoint_max=10)
# Checkpoint callback: name prefix, save directory, and config
ckpoint_cb = ModelCheckpoint(prefix="mobilenet_v2", directory='./ckpt', config=ckpt_config)

print("============== Starting Training ==============")
# Train for 5 epochs with the training set and the callbacks
model.train(5, dataset_train, callbacks=[loss_cb, ckpoint_cb], dataset_sink_mode=True)
# Evaluate on the test set and print its accuracy
metric = model.eval(dataset_val)
print(metric)
Output:
============== Starting Training ==============
epoch: 1 step: 201, loss is 0.8389087915420532
epoch: 2 step: 201, loss is 0.5519619584083557
epoch: 3 step: 201, loss is 0.26490363478660583
epoch: 4 step: 201, loss is 0.4540162682533264
epoch: 5 step: 201, loss is 0.5963617563247681
{'Accuracy': 0.9166666666666666}
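The `cosine_decay_lr` schedule used above can be reproduced in plain Python. This sketch assumes the standard cosine-annealing formula with the cosine phase advancing once per epoch, which is how the MindSpore interface is documented; treat it as illustrative rather than a drop-in replacement:

```python
import math

def cosine_decay_lr(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    # One learning rate per step; the decay factor depends on the epoch index
    lrs = []
    for i in range(total_step):
        current_epoch = i // step_per_epoch
        factor = 0.5 * (1 + math.cos(math.pi * current_epoch / decay_epoch))
        lrs.append(min_lr + (max_lr - min_lr) * factor)
    return lrs

lr = cosine_decay_lr(min_lr=0.0, max_lr=0.1, total_step=20 * 201,
                     step_per_epoch=201, decay_epoch=20)
print(lr[0])   # 0.1 at the start of training
print(lr[-1])  # close to 0.0 in the final epoch
```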
Step 5: Visualize Model Predictions
Define a visualize_model function that uses the checkpoint with the highest validation accuracy above to predict the classes of input images, and visualize the predictions. Code:
import matplotlib.pyplot as plt
import numpy as np
import mindspore as ms

def visualize_model(best_ckpt_path, val_ds):
    num_class = 5  # 5-class flower classification
    net = mobilenet_v2(num_class)
    # Load the model parameters
    param_dict = ms.load_checkpoint(best_ckpt_path)
    ms.load_param_into_net(net, param_dict)
    model = ms.Model(net)
    # Fetch a batch of validation data
    data = next(val_ds.create_dict_iterator())
    images = data["image"].asnumpy()
    labels = data["label"].asnumpy()
    class_name = {0: 'daisy', 1: 'dandelion', 2: 'roses', 3: 'sunflowers', 4: 'tulips'}
    # Predict the class of each image
    output = model.predict(ms.Tensor(data['image']))
    pred = np.argmax(output.asnumpy(), axis=1)
    # Show the images with their predicted labels
    plt.figure(figsize=(15, 7))
    for i in range(len(labels)):
        plt.subplot(3, 6, i + 1)
        # Blue title for a correct prediction, red for a wrong one
        color = 'blue' if pred[i] == labels[i] else 'red'
        plt.title('predict:{}'.format(class_name[pred[i]]), color=color)
        picture_show = np.transpose(images[i], (1, 2, 0))
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        picture_show = std * picture_show + mean
        picture_show = np.clip(picture_show, 0, 1)
        plt.imshow(picture_show)
        plt.axis('off')
    plt.show()

visualize_model('ckpt/mobilenet_v2-5_201.ckpt', dataset_val)
Output: