Machine Learning Introductory Notes (Week 15)

This week I watched the image-processing course by the Bilibili uploader 霹雳吧啦Wz.

Course link: 霹雳吧啦Wz的个人空间 (霹雳吧啦Wz's homepage on Bilibili)

Below is a summary of the lessons watched this week.

Image Classification with GoogLeNet

GoogLeNet is a convolutional neural network architecture proposed by Google; it achieved notable success in the 2014 ImageNet competition.

GoogLeNet's core innovation is the Inception module, which processes several convolution kernel sizes and a max-pooling operation in parallel, so that a single layer can extract features at multiple scales.

The overall GoogLeNet architecture is as follows:

Highlights of the GoogLeNet network:

The Inception structure is shown below. The 1×1 convolutions on the yellow background in the right-hand figure are used for dimensionality reduction: a 1×1 convolution keeps the spatial size of the feature map while reducing the number of channels, which lowers the computational cost. The input passes through four different branches that extract features, and their outputs are concatenated, so the channel counts add up.

Note: the feature maps produced by each branch must have the same height and width.

The Inception structure above uses 1×1 convolutions for dimensionality reduction. Without them, as shown in the figure below, there would be far more parameters and the computation would be slower and more expensive.

With a 1×1 convolution for reduction, as shown below, the 512 input channels are first reduced to 24 before the 5×5 convolution, and far fewer parameters are needed.
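
As a quick sanity check on this claim, here is a small sketch (my own arithmetic, not course code; the 512 → 24 reduction is the one from the example above, while the 64 output channels of the 5×5 convolution are an assumed value):

# Rough parameter comparison for a 5x5 convolution, bias terms ignored.
c_in, c_red, c_out, k = 512, 24, 64, 5   # the 64 output channels are an assumption

direct = k * k * c_in * c_out                            # 5x5 conv applied directly
reduced = 1 * 1 * c_in * c_red + k * k * c_red * c_out   # 1x1 reduction, then 5x5 conv

print(f'direct 5x5 : {direct:,} parameters')    # 819,200
print(f'1x1 + 5x5  : {reduced:,} parameters')   # 50,688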

To alleviate the vanishing-gradient problem, GoogLeNet adds two auxiliary classifiers at intermediate layers of the network. They provide an extra gradient signal and help train a deeper network, as shown below.

Finally, the last stage of GoogLeNet uses a global average pooling layer: averaging over all spatial positions of each feature map reduces the number of parameters and lowers the risk of overfitting.

In GoogLeNet, global average pooling is applied right before the main classifier, producing a feature map whose height and width are both 1 and thereby cutting the parameter count.
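
A minimal sketch of what this global average pooling step does (the 1024 × 7 × 7 shape matches the feature map before GoogLeNet's classifier; this snippet is only an illustration):

import torch
import torch.nn as nn

gap = nn.AdaptiveAvgPool2d((1, 1))   # average over all spatial positions of each channel
feat = torch.randn(2, 1024, 7, 7)    # N x 1024 x 7 x 7
print(gap(feat).shape)               # torch.Size([2, 1024, 1, 1])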

Likewise, GoogLeNet has far fewer parameters than VGGNet.

Code Implementation

1. Define the BasicConv2d class, a convolution template that applies a convolution layer followed by an activation function.

class BasicConv2d(nn.Module):
    """
    Convolution template: Conv2d followed by ReLU
    """

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

2. Define the Inception class, which implements GoogLeNet's Inception block. The block has four branches: a 1×1 convolution, a 3×3 convolution, a 5×5 convolution, and a pooling branch. Their outputs differ only in channel count; the height and width are the same, and the four outputs are concatenated along the channel dimension.

class Inception(nn.Module):
    # 'red' is short for 'reduce': the channel count produced by the 1x1 reduction convolution
    def __init__(self, in_channels, ch1x1, ch3x3red, ch3x3, ch5x5red, ch5x5, pool_proj):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, ch1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, ch3x3red, kernel_size=1),
            BasicConv2d(ch3x3red, ch3x3, kernel_size=3, padding=1),  # padding keeps the output size equal to the input size
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, ch5x5red, kernel_size=1),
            BasicConv2d(ch5x5red, ch5x5, kernel_size=5, padding=2),  # padding keeps the output size equal to the input size
        )
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_proj, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)
        outputs = [branch1, branch2, branch3, branch4]
        # concatenate along dim 1 (the channel dimension)
        return torch.cat(outputs, dim=1)
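
A quick shape check of this block (the channel numbers are those of the inception3a block used later; this is just an illustration and assumes torch, torch.nn, and the BasicConv2d class above are already available):

import torch

# 64 + 128 + 32 + 32 = 256 output channels
inc = Inception(192, 64, 96, 128, 16, 32, 32)
x = torch.randn(1, 192, 28, 28)
print(inc(x).shape)   # torch.Size([1, 256, 28, 28])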

3. Define the InceptionAux class. Auxiliary classifiers are attached after Inception blocks 4a and 4d; they are only used during training and help alleviate the vanishing-gradient problem.

class InceptionAux(nn.Module):
    """
    Auxiliary classifier
    """

    def __init__(self, in_channels, num_classes):
        super(InceptionAux, self).__init__()
        # average pooling layer
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output (batch,128,4,4)

        self.fc1 = nn.Linear(2048, 1024)
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        # aux1 N*512*14*14  aux2 N*528*14*14
        x = self.averagePool(x)
        # aux1 N*512*4*4  aux2 N*528*4*4
        x = self.conv(x)
        #  N*128*4*4
        x = torch.flatten(x, start_dim=1)  # flatten from the channel dimension onward -> N x 2048
        # the dropout rate can be tuned; model.train()/model.eval() set self.training (True in training mode)
        x = F.dropout(x, 0.5, training=self.training)
        # N * 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        # N * 1024
        x = self.fc2(x)
        return x

4. Define the GoogLeNet network.

class GoogLeNet(nn.Module):
    # aux_logits: whether to use the auxiliary classifiers
    def __init__(self, num_classes=1000, aux_logits=True, init_weights=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.maxpool1 = nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True)  # ceil_mode=True rounds the output size up instead of down

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)

        if self.aux_logits:
            self.aux1 = InceptionAux(512, num_classes)
            self.aux2 = InceptionAux(528, num_classes)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # global average pooling: output height and width are both 1
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_classes)
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14
        if self.training and self.aux_logits:  # only active when the model is in training mode
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:  # skipped in eval mode
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:  # skipped in eval mode
            return x, aux2, aux1  # main classifier output plus both auxiliary classifier outputs

        return x  # main classifier output only

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
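
A small sketch of how the finished network behaves in training versus evaluation mode (not course code, just a check of the two return paths):

import torch

net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
x = torch.randn(1, 3, 224, 224)

net.train()
logits, aux2, aux1 = net(x)                    # three outputs in training mode
print(logits.shape, aux2.shape, aux1.shape)    # each is torch.Size([1, 5])

net.eval()
with torch.no_grad():
    print(net(x).shape)                        # only the main classifier: torch.Size([1, 5])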

5. Select the GPU device.

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

6. Preprocess the images with data augmentation.

data_transform = {
    'train': torchvision.transforms.Compose([
        torchvision.transforms.RandomResizedCrop(224),
        torchvision.transforms.RandomHorizontalFlip(),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ]),
    'val': torchvision.transforms.Compose([
        torchvision.transforms.Resize((224, 224)),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    ])
}

7. Define the training and validation data loaders.

image_path = '../data/flower_data'

train_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'train'),
                                                 transform=data_transform['train'])
train_num = len(train_dataset)  # 3306

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)

val_dataset = torchvision.datasets.ImageFolder(root=os.path.join(image_path, 'val'), transform=data_transform['val'])
val_num = len(val_dataset)  # 364

val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=0)

train_steps = len(train_loader)  # 104 3306/32
val_steps = len(val_loader)  # 12 364/32

8. Convert the flower dataset's class names into a dictionary whose keys are the class indices and whose values are the class names, and save it to a JSON file.

flower_list = train_dataset.class_to_idx
cla_dict = dict((val, key) for key, val in flower_list.items())
json_str = json.dumps(cla_dict, indent=4)

with open('class_indices.json', 'w') as f:
    f.write(json_str)


'''

{
    "0": "daisy",
    "1": "dandelion",
    "2": "roses",
    "3": "sunflowers",
    "4": "tulips"
}

'''

9. Instantiate the GoogLeNet network, and define the loss function and the optimizer.

net = GoogLeNet(num_classes=5, aux_logits=True, init_weights=True)
net.to(device)
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.0003)

10. Train on the flower dataset.

Note: during training, the losses of the main classifier and of both auxiliary classifiers are computed and combined with fixed weights to obtain the final loss.

best_acc = 0.0
save_path = './GoogLeNet.pth'
epochs = 30

for epoch in range(epochs):
    # train
    net.train()
    running_loss = 0.0
    train_bar = tqdm(train_loader, file=sys.stdout)
    for step, data in enumerate(train_bar):
        images, labels = data
        optimizer.zero_grad()
        logits, aux_logits2, aux_logits1 = net(images.to(device))
        loss0 = loss_function(logits, labels.to(device))
        loss1 = loss_function(aux_logits1, labels.to(device))
        loss2 = loss_function(aux_logits2, labels.to(device))
        loss = loss0 + loss1 * 0.3 + loss2 * 0.3
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        train_bar.desc = f'train epoch[{epoch + 1}/{epochs}] loss:{loss:.3f}'

    # validation uses only the main classifier; the auxiliary classifiers are ignored
    net.eval()
    acc = 0.0
    with torch.no_grad():
        val_bar = tqdm(val_loader, file=sys.stdout)
        for step, val_data in enumerate(val_bar):
            val_images, val_labels = val_data
            outputs = net(val_images.to(device))
            predict_y = torch.max(outputs, dim=1)[1]
            acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

        accurate = acc / val_num
        print(f'[epoch {epoch + 1}] train_loss: {running_loss / train_steps:.3f}  val_accuracy: {accurate:.3f}')

        if accurate > best_acc:
            best_acc = accurate
            torch.save(net.state_dict(), save_path)

print('Finished Training')

'''
train epoch[1/30] loss:1.448: 100%|██████████| 104/104 [00:21<00:00,  4.80it/s]
100%|██████████| 12/12 [00:01<00:00,  6.36it/s]
[epoch 1] train_loss: 1.490  val_accuracy: 0.626
train epoch[2/30] loss:1.987: 100%|██████████| 104/104 [00:22<00:00,  4.68it/s]
100%|██████████| 12/12 [00:01<00:00,  6.76it/s]
[epoch 2] train_loss: 1.493  val_accuracy: 0.604
train epoch[3/30] loss:0.985: 100%|██████████| 104/104 [00:21<00:00,  4.73it/s]
100%|██████████| 12/12 [00:01<00:00,  7.11it/s]
[epoch 3] train_loss: 1.384  val_accuracy: 0.679
train epoch[4/30] loss:1.274: 100%|██████████| 104/104 [00:21<00:00,  4.80it/s]
100%|██████████| 12/12 [00:01<00:00,  7.23it/s]
[epoch 4] train_loss: 1.380  val_accuracy: 0.676
train epoch[5/30] loss:1.055: 100%|██████████| 104/104 [00:22<00:00,  4.69it/s]
100%|██████████| 12/12 [00:01<00:00,  6.72it/s]
[epoch 5] train_loss: 1.339  val_accuracy: 0.692
train epoch[6/30] loss:1.568: 100%|██████████| 104/104 [00:21<00:00,  4.83it/s]
100%|██████████| 12/12 [00:01<00:00,  7.29it/s]
[epoch 6] train_loss: 1.264  val_accuracy: 0.706
train epoch[7/30] loss:1.550: 100%|██████████| 104/104 [00:21<00:00,  4.75it/s]
100%|██████████| 12/12 [00:01<00:00,  6.59it/s]
[epoch 7] train_loss: 1.224  val_accuracy: 0.720
train epoch[8/30] loss:0.771: 100%|██████████| 104/104 [00:21<00:00,  4.82it/s]
100%|██████████| 12/12 [00:01<00:00,  7.18it/s]
[epoch 8] train_loss: 1.144  val_accuracy: 0.698
train epoch[9/30] loss:2.318: 100%|██████████| 104/104 [00:21<00:00,  4.90it/s]
100%|██████████| 12/12 [00:01<00:00,  7.16it/s]
[epoch 9] train_loss: 1.189  val_accuracy: 0.717
train epoch[10/30] loss:0.495: 100%|██████████| 104/104 [00:21<00:00,  4.73it/s]
100%|██████████| 12/12 [00:01<00:00,  6.88it/s]
[epoch 10] train_loss: 1.137  val_accuracy: 0.690
train epoch[11/30] loss:0.274: 100%|██████████| 104/104 [00:21<00:00,  4.75it/s]
100%|██████████| 12/12 [00:01<00:00,  7.16it/s]
[epoch 11] train_loss: 1.108  val_accuracy: 0.695
train epoch[12/30] loss:0.913: 100%|██████████| 104/104 [00:21<00:00,  4.79it/s]
100%|██████████| 12/12 [00:01<00:00,  6.85it/s]
[epoch 12] train_loss: 1.120  val_accuracy: 0.698
train epoch[13/30] loss:1.103: 100%|██████████| 104/104 [00:21<00:00,  4.74it/s]
100%|██████████| 12/12 [00:01<00:00,  6.95it/s]
[epoch 13] train_loss: 1.037  val_accuracy: 0.670
train epoch[14/30] loss:1.682: 100%|██████████| 104/104 [00:21<00:00,  4.84it/s]
100%|██████████| 12/12 [00:01<00:00,  7.12it/s]
[epoch 14] train_loss: 1.081  val_accuracy: 0.736
train epoch[15/30] loss:1.607: 100%|██████████| 104/104 [00:22<00:00,  4.69it/s]
100%|██████████| 12/12 [00:01<00:00,  6.90it/s]
[epoch 15] train_loss: 0.998  val_accuracy: 0.736
train epoch[16/30] loss:0.204: 100%|██████████| 104/104 [00:21<00:00,  4.74it/s]
100%|██████████| 12/12 [00:01<00:00,  6.93it/s]
[epoch 16] train_loss: 0.981  val_accuracy: 0.750
train epoch[17/30] loss:0.499: 100%|██████████| 104/104 [00:21<00:00,  4.77it/s]
100%|██████████| 12/12 [00:01<00:00,  6.72it/s]
[epoch 17] train_loss: 0.958  val_accuracy: 0.736
train epoch[18/30] loss:0.666: 100%|██████████| 104/104 [00:22<00:00,  4.66it/s]
100%|██████████| 12/12 [00:01<00:00,  7.21it/s]
[epoch 18] train_loss: 0.949  val_accuracy: 0.777
train epoch[19/30] loss:1.036: 100%|██████████| 104/104 [00:21<00:00,  4.73it/s]
100%|██████████| 12/12 [00:01<00:00,  7.26it/s]
[epoch 19] train_loss: 0.954  val_accuracy: 0.761
train epoch[20/30] loss:1.162: 100%|██████████| 104/104 [00:22<00:00,  4.70it/s]
100%|██████████| 12/12 [00:01<00:00,  6.83it/s]
[epoch 20] train_loss: 0.896  val_accuracy: 0.772
train epoch[21/30] loss:0.682: 100%|██████████| 104/104 [00:21<00:00,  4.81it/s]
100%|██████████| 12/12 [00:01<00:00,  6.87it/s]
[epoch 21] train_loss: 0.924  val_accuracy: 0.755
train epoch[22/30] loss:1.488: 100%|██████████| 104/104 [00:21<00:00,  4.76it/s]
100%|██████████| 12/12 [00:01<00:00,  6.91it/s]
[epoch 22] train_loss: 0.880  val_accuracy: 0.758
train epoch[23/30] loss:1.137: 100%|██████████| 104/104 [00:21<00:00,  4.75it/s]
100%|██████████| 12/12 [00:01<00:00,  6.99it/s]
[epoch 23] train_loss: 0.866  val_accuracy: 0.766
train epoch[24/30] loss:0.498: 100%|██████████| 104/104 [00:21<00:00,  4.82it/s]
100%|██████████| 12/12 [00:01<00:00,  6.97it/s]
[epoch 24] train_loss: 0.872  val_accuracy: 0.753
train epoch[25/30] loss:0.650: 100%|██████████| 104/104 [00:21<00:00,  4.80it/s]
100%|██████████| 12/12 [00:01<00:00,  6.75it/s]
[epoch 25] train_loss: 0.798  val_accuracy: 0.786
train epoch[26/30] loss:1.176: 100%|██████████| 104/104 [00:21<00:00,  4.83it/s]
100%|██████████| 12/12 [00:01<00:00,  7.06it/s]
[epoch 26] train_loss: 0.801  val_accuracy: 0.780
train epoch[27/30] loss:0.439: 100%|██████████| 104/104 [00:21<00:00,  4.84it/s]
100%|██████████| 12/12 [00:01<00:00,  6.96it/s]
[epoch 27] train_loss: 0.874  val_accuracy: 0.720
train epoch[28/30] loss:0.958: 100%|██████████| 104/104 [00:21<00:00,  4.76it/s]
100%|██████████| 12/12 [00:01<00:00,  7.07it/s]
[epoch 28] train_loss: 0.834  val_accuracy: 0.819
train epoch[29/30] loss:0.478: 100%|██████████| 104/104 [00:22<00:00,  4.60it/s]
100%|██████████| 12/12 [00:01<00:00,  6.50it/s]
[epoch 29] train_loss: 0.803  val_accuracy: 0.775
train epoch[30/30] loss:0.976: 100%|██████████| 104/104 [00:22<00:00,  4.60it/s]
100%|██████████| 12/12 [00:01<00:00,  6.97it/s]
[epoch 30] train_loss: 0.754  val_accuracy: 0.780
Finished Training
'''

11. Once the model has been saved, predict a single image; first define the preprocessing transform for the image.

data_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])

img = Image.open('../Test2_alexnet/tulip.jpg')
img = data_transform(img)
# img = img.unsqueeze(0)
img = torch.unsqueeze(img, dim=0)

try:
    json_file = open('./class_indices.json','r')
    class_indices = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)

12. Instantiate the GoogLeNet model and load the saved weights.

model = GoogLeNet(num_classes=5,aux_logits=False)
model_weight_path = './GoogLeNet.pth'
# strict=False: the auxiliary-classifier weights in the checkpoint are not loaded, since this model was built without them
missing_keys,unexpected_keys = model.load_state_dict(torch.load(model_weight_path,map_location='cpu'), strict=False)
print(missing_keys)
print(unexpected_keys)
'''
[]
['aux1.conv.conv.weight', 'aux1.conv.conv.bias', 'aux1.fc1.weight', 'aux1.fc1.bias', 'aux1.fc2.weight', 'aux1.fc2.bias', 'aux2.conv.conv.weight', 'aux2.conv.conv.bias', 'aux2.fc1.weight', 'aux2.fc1.bias', 'aux2.fc2.weight', 'aux2.fc2.bias']
'''

13. Run the prediction.

model.eval()
with torch.no_grad():
    output = torch.squeeze(model(img))
    predict = torch.softmax(output,dim=-1)
    # predict = torch.max(predict_y,dim=1)[1]
    predict_cla = torch.argmax(predict).numpy()

print(class_indices[str(predict_cla)],predict[predict_cla].item())

'''

tulips 0.9999539852142334

'''

Image Classification with ResNet

ResNet (residual network) is a deep neural network architecture. Its residual (shortcut) connections let the signal bypass groups of layers, which alleviates the vanishing-gradient problem in very deep networks.

The ResNet architecture is as follows:

Highlights of the network:

In the Residual block, the output of the main branch is added element-wise to the input (not concatenated), so the two must have the same height, width, and number of channels.

Moreover, for a plain (non-residual) network, deeper is not always better: as the figure below shows, a 56-layer plain network can perform much worse than a 20-layer one. Plain networks suffer from vanishing/exploding gradients and from the degradation problem. ResNet largely avoids these issues; in its experiments, the loss keeps decreasing as more layers are added.

In addition, the bottleneck Residual structure uses 1×1 convolutions to reduce and then restore the channel dimension. The parameter count of a convolution is input depth × output depth × kernel size; summing the layers shows that the version with 1×1 reduction needs far fewer parameters, as illustrated below:

In the Residual structure, the shortcut adds the input to the main-branch output, so both tensors must have the same dimensions. When they differ, the dashed (projection) shortcut inserts a 1×1 convolution that adjusts the channel count and the height/width so that the main-branch output and the shortcut have matching dimensions.

Batch Normalization

Standardizing the images during preprocessing already speeds up convergence; Batch Normalization applies the same idea inside the network, normalizing each channel of a feature map to zero mean and unit variance (before a learnable scale and shift). In practice it is usually placed between a convolution layer and the ReLU.

Note: the distribution is adjusted over a whole batch of data, not a single sample.

As in the figure below with two samples and two channels, the mean and variance are computed per channel over the whole batch (all samples of that channel together), and the normalized feature map is then obtained from the BN formula.
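
A small sketch verifying this idea: the per-channel mean and variance are taken over the whole batch, and the result matches nn.BatchNorm2d in training mode (with its learnable scale and shift left at their default values of 1 and 0):

import torch
import torch.nn as nn

x = torch.randn(4, 2, 3, 3)                        # a batch of 4 samples with 2 channels

# manual BN: statistics over the batch and spatial dims, separately for each channel
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + 1e-5)

bn = nn.BatchNorm2d(2)                             # gamma=1, beta=0 by default
bn.train()                                         # training mode -> use batch statistics
print(torch.allclose(manual, bn(x), atol=1e-5))    # True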

Reference: Batch Normalization详解以及pytorch实验 (CSDN blog)

Transfer Learning

With transfer learning you can reach a good result quickly, and you can still train an acceptable model even when the dataset is small.

Transfer learning reuses the generic features and information a network has already learned. As shown below, the earlier layers capture general, low-level abilities, so transferring those weights lets a new task be trained to a good result quickly.

Common transfer-learning strategies are as follows:

Strategies 2 and 3 train faster, while strategy 1 usually gives the most accurate result; a rough sketch of the three options follows.
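
Below is my own sketch of what these three strategies typically look like; the exact options listed in the course slide are not reproduced here, so treat the numbering as an assumption. It reuses the ResNet34 class and the resnet34-pre.pth weight file that appear later in these notes, and the 5 classes of the flower dataset:

import torch
import torch.nn as nn

# Strategy 1: load the pretrained weights and fine-tune every layer
net1 = ResNet34()
net1.load_state_dict(torch.load('./resnet34-pre.pth'), strict=False)
net1.fc = nn.Linear(net1.fc.in_features, 5)        # everything stays trainable

# Strategy 2: freeze the backbone and train only a new fully connected layer
net2 = ResNet34()
net2.load_state_dict(torch.load('./resnet34-pre.pth'), strict=False)
for p in net2.parameters():
    p.requires_grad = False
net2.fc = nn.Linear(net2.fc.in_features, 5)        # a freshly created layer is trainable

# Strategy 3: keep the backbone frozen and attach a small new classifier head
net3 = ResNet34()
net3.load_state_dict(torch.load('./resnet34-pre.pth'), strict=False)
for p in net3.parameters():
    p.requires_grad = False
net3.fc = nn.Sequential(nn.Linear(net3.fc.in_features, 256), nn.ReLU(),
                        nn.Linear(256, 5))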

Code Implementation

The ResNet architecture is as follows:

Before writing the code, we need to understand the two forms the ResNet block can take.

For ResNet-18 and ResNet-34 the two forms are: in the first, the input and output have the same number of channels; in the second, they differ, the spatial size is halved, and the channel count is doubled.

For ResNet-50, ResNet-101, and ResNet-152 the two forms are: in the first, the input and output have the same number of channels; in the second they differ, and for the conv2 stage the first block only changes the channel count (not the height and width), whereas in all later stages the first block changes both the channel count and the spatial size.

1. With the two block forms understood, we first implement the block used by ResNet-18 and ResNet-34:

In the code, expansion is the factor by which the block's final output channels exceed the output channels of its first convolution; for these two networks every block ends with the same channel count it started with, so expansion is 1. downsample is the projection applied to the shortcut whenever the spatial size or channel count changes.

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=3, stride=stride,
                               padding=1, bias=False)  # no bias; it is followed by BatchNorm
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=1, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample  # shortcut projection (downsampling)

    def forward(self, x):
        identity = x
        if self.downsample is not None:  # solid-line shortcut when downsample is None; otherwise project the input
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

2. Next, implement the bottleneck block used by ResNet-50, ResNet-101, and ResNet-152:

Here expansion is 4: the block's final output has 4 times as many channels as its first convolution, i.e. the third layer has 4 times as many kernels as the first and second. Unlike the previous block, it first reduces the channels with a 1×1 convolution, extracts features with a 3×3 convolution, and then restores the channels with another 1×1 convolution. downsample again handles the shortcut whenever the spatial size or channel count changes.

class Bottleneck(nn.Module):
    expansion = 4  # channel expansion factor of the block

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)

        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)

        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion, kernel_size=1,
                               stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

3. With the blocks implemented, define the ResNet network itself:

In this code, include_top controls whether the classification head (global average pooling plus the fully connected layer) is built; it defaults to True. _make_layer builds each stage and implements the second block form; its channel argument is the number of kernels used by the convolutions inside the residual blocks of that stage, so the final output channel count differs between ResNet-18/34 and ResNet-50/101/152.

Note that for ResNet-50/101/152, the first block of layer1 only changes the channel count, not the height and width, whereas the first block of every other stage changes both: that is the second block form. The remaining blocks of a stage use the first form, whose output channel count equals its input channel count. For ResNet-18/34, layer1 changes neither the spatial size nor the channel count, while the first block of every other stage doubles the channels and halves the height and width.

class ResNet(nn.Module):

    def __init__(self, block, block_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64  # number of channels after the stem max pooling

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxPool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, block_num[0])
        self.layer2 = self._make_layer(block, 128, block_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, block_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, block_num[3], stride=2)
        if self.include_top:
            self.avgPool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channel, channel, stride, downsample))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxPool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgPool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x

4. Define the ResNet34 and ResNet101 factory functions, which return the ResNet defined above:

The numbers 3, 4, 6, 3 (and so on) are the block counts of each stage; num_classes is the number of output classes.

def ResNet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes, include_top)


def ResNet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes, include_top)
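
A quick shape check of these factory functions (illustration only; the include_top=False variant returns the layer4 feature map without the pooling/fc head):

import torch

net = ResNet34(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(net(x).shape)                       # torch.Size([1, 5])

backbone = ResNet34(include_top=False)    # no global pooling / fc head
print(backbone(x).shape)                  # torch.Size([1, 512, 7, 7])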

5. Train the model. The preprocessing, training set, and validation set are the same as before. The pretrained ResNet-34 weights are downloaded from the PyTorch site (https://download.pytorch.org/models/resnet34-b627a593.pth) and loaded into the network. Remember to change the number of output features of the final layer, because ResNet defaults to 1000 classes.

net = ResNet34()
# load the pretrained weights
model_weight_path = './resnet34-pre.pth'
missing_keys, unexpected_keys = net.load_state_dict(torch.load(model_weight_path), strict=False)
print(f'missing_keys: {missing_keys}, unexpected_keys: {unexpected_keys}')
# replace the classification head so it outputs 5 classes
in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)

6. The training loop is the same as before.

'''
train epoch [1/5], loss: 0.5164: 100%|██████████| 207/207 [00:33<00:00,  6.10it/s]
100%|██████████| 23/23 [00:03<00:00,  7.45it/s]
epoch: 1, train loss: 0.5003, val acc: 0.8901
save model to /kaggle/working/Resnet34.pth
train epoch [2/5], loss: 0.6781: 100%|██████████| 207/207 [00:24<00:00,  8.28it/s]
100%|██████████| 23/23 [00:02<00:00, 10.96it/s]
epoch: 2, train loss: 0.3391, val acc: 0.9258
save model to /kaggle/working/Resnet34.pth
train epoch [3/5], loss: 0.3169: 100%|██████████| 207/207 [00:24<00:00,  8.38it/s]
100%|██████████| 23/23 [00:02<00:00, 11.15it/s]
epoch: 3, train loss: 0.2870, val acc: 0.8984
train epoch [4/5], loss: 0.1521: 100%|██████████| 207/207 [00:24<00:00,  8.29it/s]
100%|██████████| 23/23 [00:01<00:00, 11.52it/s]
epoch: 4, train loss: 0.2592, val acc: 0.9203
train epoch [5/5], loss: 0.6599: 100%|██████████| 207/207 [00:24<00:00,  8.29it/s]
100%|██████████| 23/23 [00:02<00:00, 10.61it/s]
epoch: 5, train loss: 0.2376, val acc: 0.9093
Finished Training
'''

7. After training, load the saved weights and predict a single image.

model = ResNet34(num_classes=5)
model_weight_path = './ResNet34.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

model.eval()
with torch.no_grad():
    output = torch.squeeze(model(img))
    predict = torch.softmax(output, dim=-1)
    cla_indict = torch.argmax(predict).numpy()

print(class_indices[str(cla_indict)], predict[cla_indict].item())

'''
tulips 0.9997768998146057
'''

Image Classification with ResNeXt

ResNeXt is a deep convolutional network architecture that builds on ResNet with additional innovations.

Network highlights: compared with ResNet the block is redesigned, and ResNeXt introduces grouped convolution. The convolution layers split the input feature map into several groups, convolve each group separately, and then merge the results, which lowers the computational cost while keeping strong feature representations.

The figure below shows the ResNet block on the left and the ResNeXt block on the right.

With a 224×224 input, ResNeXt-101 outperforms both the original ResNet-101 and ResNet-200, so the ResNeXt design does improve results.

ResNeXt-50 achieves this while needing no more parameters than ResNet-50.

  • ResNeXt introduces grouped convolution, which lowers the computational cost and the number of parameters. The upper part of the figure below shows an ordinary convolution: with Cin input channels and n kernels (each kernel also has Cin channels), the output feature map has n channels. The parameter count is kernel height × kernel width × input channels × output channels, so with square kernels of size k an ordinary convolution has k·k·Cin·n parameters.
  • Grouped convolution works as follows: the Cin input channels are split into g groups; if the output has n channels in total, each group has n/g kernels, so each group costs k·k·(Cin/g)·(n/g) parameters (Cin/g input channels and n/g output channels per group). With g groups the total is k·k·Cin·n·(1/g).
  • As long as the number of groups is greater than 1, grouped convolution always needs fewer parameters than an ordinary convolution. If the number of groups equals the number of input channels and the input channel count equals the output channel count, then each input channel gets its own single-channel kernel, which is exactly depthwise (DW) convolution. A small comparison is sketched right below this list.
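
A small sketch of these parameter counts using PyTorch's groups argument (the 64 → 128 channel numbers and the 3×3 kernel are arbitrary choices for illustration):

import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

c_in, c_out, k = 64, 128, 3

ordinary = nn.Conv2d(c_in, c_out, kernel_size=k, bias=False)                # k*k*Cin*n
grouped = nn.Conv2d(c_in, c_out, kernel_size=k, groups=32, bias=False)      # k*k*Cin*n/g
depthwise = nn.Conv2d(c_in, c_in, kernel_size=k, groups=c_in, bias=False)   # g = Cin = Cout (DW conv)

print(n_params(ordinary), n_params(grouped), n_params(depthwise))   # 73728 2304 576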

  • The ResNeXt block has three equivalent forms, (a), (b), and (c), shown below; (c) is the most compact and (a) the most explicit.
  • In this design, if the input has 256 channels it is split into 32 groups; each group goes through a 1×1 convolution with 4 output channels (a reduction), so the combined output has 128 channels.
  • Each of the 32 groups, now 4 channels wide, then passes through a 3×3 convolution that keeps 4 output channels.
  • Finally each group passes through a 1×1 convolution with 256 output channels, and the group outputs are summed, which is equivalent to the concatenation in form (b). Adding this result to the block's input completes the block.

Why set the number of groups to 32? The paper's experiments show that with 32 groups, ResNeXt-50 and ResNeXt-101 reach the lowest error among the configurations tried.

Finally, the paper notes that when a block has fewer than three layers, grouping no longer reduces the complexity or the number of parameters, because the result is mathematically equivalent to an ordinary (non-grouped) block.

Code Implementation

ResNeXt introduces grouped convolution: the convolution layers split the input feature map into groups, convolve each group, and merge the results, lowering the computational cost while keeping strong feature representations.

ResNeXt is only applied on top of ResNet-50/101/152, because, as the paper points out, a block with fewer than three layers gains nothing from grouping.

The ResNet-50 and ResNeXt-50 architectures are compared below:

ResNet-50 and ResNeXt-50 have the same number of blocks per stage; what changes is the number of channels at the start of each stage and the grouped convolution in the middle of each block. So we only need to add the number of groups and the per-group width (width_per_group) to the original ResNet code.

The channel count at the start of each block is width = int(out_channel * (width_per_group / 64)) * groups; with 32 groups and a width of 4 per group, this equals 2 × out_channel. groups and width_per_group are ResNeXt's hyperparameters.

1. Define the Bottleneck class, which is ResNeXt's block.

It adds the number of groups and the per-group width. With the default values (groups=1, width_per_group=64) it behaves exactly like the ResNet block, i.e. no grouping.

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, groups=1, width_per_group=64):  # e.g. groups=32, width_per_group=4
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64)) * groups  # groups=32,width_per_group=4,width=out_channel*2

        self.conv1 = nn.Conv2d(in_channel, width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)

        self.conv2 = nn.Conv2d(width, width, groups=groups, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(width)

        self.conv3 = nn.Conv2d(width, out_channel * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

2. Define the ResNeXt class; it adds the group settings, and everything else is the same as the ResNet network.

class ResNeXt(nn.Module):
    def __init__(self, block, blocks_num, num_classes=1000, include_top=True, groups=1, width_per_group=64):
        super(ResNeXt, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)

        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion),
            )

        layers = []
        layers.append(block(in_channel=self.in_channel, out_channel=channel, stride=stride, downsample=downsample,
                            groups=self.groups, width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel, groups=self.groups, width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x

3. Define the resnext50_32x4d and resnext101_32x8d factories: they pass the number of groups and the per-group width to the class, instantiate the ResNeXt object, and return it.

def resnext50_32x4d(num_classes=1000, include_top=True):
    groups = 32
    width_per_group = 4
    return ResNeXt(Bottleneck, [3, 4, 6, 3], num_classes, include_top, groups, width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    groups = 32
    width_per_group = 8
    return ResNeXt(Bottleneck, [3, 4, 23, 3], num_classes, include_top, groups, width_per_group)

4. Train the model. The preprocessing, training set, and validation set are the same as before; the difference is that we instantiate the ResNeXt network and freeze all weights except the fully connected layer.

First download the ResNeXt-50 pretrained weights: https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth

net = resnext50_32x4d()
model_weight_path = './resnext50_pre.pth'
net.load_state_dict(torch.load(model_weight_path))
for param in net.parameters():
    param.requires_grad = False

in_channel = net.fc.in_features
net.fc = nn.Linear(in_features=in_channel, out_features=5)
net.to(device)

5. Train only the parameters of the fully connected layer.

params = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=0.0001)

6. The training loop is the same as before.

'''
train epoch [1/10], loss: 0.3461: 100%|██████████| 207/207 [00:21<00:00,  9.76it/s]
100%|██████████| 23/23 [00:02<00:00,  8.59it/s]
epoch: 1, train loss: 0.5362, val acc: 0.8709
save model to /kaggle/working/Resnext50.pth
train epoch [2/10], loss: 0.8194: 100%|██████████| 207/207 [00:21<00:00,  9.50it/s]
100%|██████████| 23/23 [00:02<00:00,  9.08it/s]
epoch: 2, train loss: 0.5200, val acc: 0.8846
save model to /kaggle/working/Resnext50.pth
train epoch [3/10], loss: 0.5423: 100%|██████████| 207/207 [00:21<00:00,  9.67it/s]
100%|██████████| 23/23 [00:02<00:00,  9.61it/s]
epoch: 3, train loss: 0.5047, val acc: 0.8929
save model to /kaggle/working/Resnext50.pth
train epoch [4/10], loss: 0.4425: 100%|██████████| 207/207 [00:21<00:00,  9.73it/s]
100%|██████████| 23/23 [00:02<00:00,  9.40it/s]
epoch: 4, train loss: 0.4673, val acc: 0.8901
train epoch [5/10], loss: 0.9804: 100%|██████████| 207/207 [00:21<00:00,  9.51it/s]
100%|██████████| 23/23 [00:02<00:00,  9.52it/s]
epoch: 5, train loss: 0.4588, val acc: 0.9011
save model to /kaggle/working/Resnext50.pth
train epoch [6/10], loss: 0.4052: 100%|██████████| 207/207 [00:21<00:00,  9.80it/s]
100%|██████████| 23/23 [00:02<00:00,  9.24it/s]
epoch: 6, train loss: 0.4590, val acc: 0.8874
train epoch [7/10], loss: 0.3596: 100%|██████████| 207/207 [00:21<00:00,  9.66it/s]
100%|██████████| 23/23 [00:02<00:00,  9.22it/s]
epoch: 7, train loss: 0.4366, val acc: 0.9038
save model to /kaggle/working/Resnext50.pth
train epoch [8/10], loss: 0.5366: 100%|██████████| 207/207 [00:21<00:00,  9.66it/s]
100%|██████████| 23/23 [00:02<00:00,  9.57it/s]
epoch: 8, train loss: 0.4463, val acc: 0.8874
train epoch [9/10], loss: 0.6280: 100%|██████████| 207/207 [00:21<00:00,  9.49it/s]
100%|██████████| 23/23 [00:02<00:00,  9.39it/s]
epoch: 9, train loss: 0.4193, val acc: 0.9121
save model to /kaggle/working/Resnext50.pth
train epoch [10/10], loss: 0.4213: 100%|██████████| 207/207 [00:21<00:00,  9.67it/s]
100%|██████████| 23/23 [00:02<00:00,  8.85it/s]
epoch: 10, train loss: 0.4153, val acc: 0.8984
Finished Training
'''

7. After training, load the saved weights and predict a single image.

model = resnext50_32x4d(num_classes=5)
model_weight_path = './ResNext50.pth'
model.load_state_dict(torch.load(model_weight_path, map_location='cpu'))

model.eval()
with torch.no_grad():
    output = model(img)
    output = torch.squeeze(output)
    predict = torch.softmax(output,dim=-1)
    idx = torch.argmax(predict,dim=-1).item()

print('img class: {}, predict class: {:.4f}'.format(class_indices[str(idx)],predict[idx]))

'''
img class: tulips, predict class: 0.9963
'''

8. To predict a batch of images, collect them into a list and stack them into a single batch.

img_path_list = ['tulip.jpg', 'rose.jpg']
img_list = []
for img_path in img_path_list:
    assert os.path.exists(img_path), f'file {img_path} does not exist'
    img = Image.open(img_path)
    img = data_transform(img)
    img_list.append(img)

batch_img = torch.stack(img_list, dim=0)
print('batch_img shape:', batch_img.shape)

'''
batch_img shape: torch.Size([2, 3, 224, 224])
'''

9. When predicting a batch, the weights are loaded as before; finally, loop over the results and print each image's predicted class and probability.

model.eval()

with torch.no_grad():
    outputs = model(batch_img)
    predict = torch.softmax(outputs, dim=-1)
    idx_list = torch.argmax(predict, dim=-1).numpy()

    for step, idx in enumerate(idx_list):
        print('image_path: {}, image_class: {}, image_predict: {:.4f}'.format(img_path_list[step], class_indices[str(idx)],
                                                                               predict[step][idx]))
'''
image_path: ../ResNext/tulip.jpg, image_class: tulips, image_predict: 0.9963
image_path: ../ResNext/rose.jpg, image_class: roses, image_predict: 0.9509
'''

MobileNet

MobileNet is an efficient convolutional neural network architecture designed for fast computation and inference on mobile and edge devices.

MobileNet v1

Highlights of the MobileNet v1 network:

The core of MobileNet is the depthwise separable convolution. In an ordinary convolution, each kernel has as many channels as the input feature map, and the number of output channels equals the number of kernels. Depthwise (DW) convolution decomposes this into a much lighter operation: it convolves each channel independently, every kernel has exactly one channel, and the number of input channels, kernels, and output channels are all equal. It is like an ordinary convolution in which each input channel gets its own single-channel kernel and the per-channel results are never summed together.

Pointwise (PW) convolution follows the DW convolution. Like an ordinary convolution, each kernel has as many channels as the input feature map, but every kernel is 1×1, and the number of kernels equals the final output channel count.

An ordinary convolution and a DW+PW pair produce the same number of output channels, but DW+PW needs far less computation.

The computation cost of a convolution is (kernel × kernel × map × map) × channel_input × channel_output,

where kernel is the kernel size, map is the size of the output feature map, channel_input is the depth of the input feature map, and channel_output is the depth of the output feature map.

Let DF be the height/width of the feature map, DK the kernel size, M the depth of the input feature map, and N the depth of the output feature map (the number of kernels).

So an ordinary convolution costs DK·DK·M·N·DF·DF (assuming stride 1, so the output feature-map size is unchanged).

A DW+PW convolution costs:

DK·DK·M·DF·DF + M·N·DF·DF (again assuming stride 1, so the output feature-map size is unchanged).

As shown below, an ordinary convolution theoretically costs about 8 to 9 times as much as DW+PW (the ratio of DW+PW to the ordinary cost is 1/N + 1/DK², roughly 1/9 for 3×3 kernels).
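
A minimal sketch of a depthwise separable convolution in PyTorch and the resulting savings (the 64 → 128 channel counts are arbitrary; only the 3×3 kernel size matters for the 8–9× figure):

import torch
import torch.nn as nn

m, n, k = 64, 128, 3                      # M input channels, N output channels, DK = 3

conv = nn.Conv2d(m, n, kernel_size=k, padding=1, bias=False)           # ordinary convolution

dw = nn.Conv2d(m, m, kernel_size=k, padding=1, groups=m, bias=False)   # depthwise: one kernel per channel
pw = nn.Conv2d(m, n, kernel_size=1, bias=False)                        # pointwise: 1x1 convolution

x = torch.randn(1, m, 56, 56)
print(conv(x).shape, pw(dw(x)).shape)     # both torch.Size([1, 128, 56, 56])

params = lambda mod: sum(p.numel() for p in mod.parameters())
print(params(conv) / (params(dw) + params(pw)))   # about 8.4, matching 1 / (1/N + 1/DK**2)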

The MobileNet v1 architecture is as follows:

Here α is the width multiplier (a factor on the number of kernels) and β is the resolution multiplier (the input size). Experiments show that although MobileNet is slightly less accurate than VGG-16, it needs far less computation and far fewer parameters.

MobileNet v2

Compared with MobileNet v1, MobileNet v2 is more accurate and the model is smaller. Its highlights are:

ResNet's residual (bottleneck) structure first reduces the channel dimension, performs the convolution, and then expands it again: wide at both ends and narrow in the middle, using ReLU as the activation.

MobileNet v2 introduces the inverted residual structure, which first expands the channels, applies a DW convolution, and then reduces the channels again: narrow at both ends and wide in the middle, using ReLU6 as the activation.

The ReLU6 activation is plotted below. Ordinary ReLU outputs 0 for negative inputs and passes positive inputs through unchanged; ReLU6 outputs 0 for negative inputs, passes values between 0 and 6 through unchanged, and clamps anything above 6 to 6.
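
In PyTorch this is nn.ReLU6, i.e. min(max(0, x), 6):

import torch
import torch.nn as nn

x = torch.tensor([-3.0, 2.0, 8.0])
print(nn.ReLU6()(x))                      # tensor([0., 2., 6.])
print(torch.clamp(x, min=0.0, max=6.0))   # the same result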

In each MobileNet v2 block, the final 1×1 convolution uses a linear activation rather than ReLU, because ReLU causes a large loss of information for low-dimensional features.

The MobileNet v2 block looks like this:

Suppose the input has height h, width w, and k channels. A 1×1 convolution first expands it, giving an output of shape h × w × tk, where t is the expansion factor. A 3×3 DW convolution follows; DW convolution does not change the number of channels, so the output is (h/s) × (w/s) × tk, where s is the stride. Finally another 1×1 convolution reduces the channels, giving (h/s) × (w/s) × k', where k' is the specified output channel count and the height and width are unchanged by this last step.

Note that the shortcut connection exists only when stride = 1 and the input and output feature maps have the same shape; only then can they be added. With stride = 2 the shapes differ, and since (unlike ResNet) there is no projection on the shortcut, no shortcut connection is used. A rough sketch of this block follows.
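
Below is my own simplified sketch of such an inverted residual block, written from the description above (the official MobileNet v2 implementation differs in details such as weight initialization and the handling of the first block):

import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride, expand_t):
        super().__init__()
        hidden = in_ch * expand_t                 # t is the expansion factor
        # shortcut only when stride is 1 and the input/output shapes match
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),               # 1x1 expansion
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),                  # 3x3 depthwise conv
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),              # 1x1 projection, linear activation
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32, 32, stride=1, expand_t=6)(x).shape)   # torch.Size([1, 32, 56, 56])
print(InvertedResidual(32, 64, stride=2, expand_t=6)(x).shape)   # torch.Size([1, 64, 28, 28])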

The MobileNet v2 architecture is as follows:

Experiments show that, compared with MobileNet v1 and other networks, MobileNet v2 achieves better accuracy with fewer parameters on both image-classification and object-detection benchmarks.

Personal Summary

This week I studied several image-processing methods and their theory, and used various networks for image classification. Next week I will continue with other models, algorithms, and theory, read the corresponding papers, and combine theory with practice.
