GoogLeNet Notes

1. The GoogLeNet Network

1.1 Overall architecture

  • Introduces the Inception module, which fuses feature information at different scales
  • Uses 1 x 1 convolution kernels for dimensionality reduction and feature mapping
  • Adds two auxiliary classifiers to help training
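To see why the 1 x 1 reduction matters, the sketch below compares the parameter count of a direct 3x3 convolution against a 1x1-reduce-then-3x3 pair; the channel counts here are illustrative, not taken from the paper's tables.

```python
import torch
from torch import nn

x = torch.randn(1, 192, 28, 28)

# direct 3x3 conv: 192 -> 128 channels
direct = nn.Conv2d(192, 128, kernel_size=3, padding=1)

# reduced: 1x1 down to 96 channels first, then 3x3 up to 128
reduced = nn.Sequential(
    nn.Conv2d(192, 96, kernel_size=1),
    nn.Conv2d(96, 128, kernel_size=3, padding=1),
)

def param_count(m):
    return sum(p.numel() for p in m.parameters())

# the reduced version has far fewer parameters but the same output shape
print(param_count(direct), param_count(reduced))
print(direct(x).shape, reduced(x).shape)
```

The 1x1 layer shrinks the depth before the expensive 3x3 convolution ever runs, which is exactly the trick the Inception module with dimensionality reduction relies on.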

1.2 The Inception module


(Figures: the original Inception module, and the version with dimensionality reduction)

  • The feature maps produced by the branches must have the same height and width so that they can be concatenated along the depth (channel) dimension
  • The 1x1 convolution layers perform dimensionality reduction, which cuts the amount of computation
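The depth-wise concatenation described in the first bullet can be sketched as follows; the branch channel counts here are made up for illustration.

```python
import torch

# four branch outputs: same batch size, height, and width; different channel depths
b1 = torch.randn(2, 64, 28, 28)
b2 = torch.randn(2, 128, 28, 28)
b3 = torch.randn(2, 32, 28, 28)
b4 = torch.randn(2, 32, 28, 28)

# concatenate along dim=1 (channels); dim=0 is the batch dimension
out = torch.cat([b1, b2, b3, b4], dim=1)
print(out.shape)  # the channel depths add up: 64 + 128 + 32 + 32 = 256
```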

1.3 Auxiliary classifiers

AlexNet and VGG have a single output; GoogLeNet has three.

Layer 1   AveragePool: average-pooling downsampling layer with a 5x5 kernel and stride 3
Fed from Inception (4a), the input feature map is 14x14x512 and the output is 4x4x512;
fed from Inception (4d), the input feature map is 14x14x528 and the output is 4x4x528.
Layer 2   Conv: 128 convolution kernels of size 1x1 (with ReLU activation), for dimensionality reduction
Layer 3   FC: fully connected layer with 1024 nodes (with ReLU activation)
Dropout is applied between the two fully connected layers (70% of the neurons dropped at random)
Layer 4   FC: fully connected layer; for the ImageNet dataset with its 1000 classes, the node count is 1000
Softmax activation
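The pooling shapes quoted above can be checked with a quick sketch of the 5x5, stride-3 average pool applied to the two feature-map sizes:

```python
import torch
from torch import nn

pool = nn.AvgPool2d(kernel_size=5, stride=3)
# output side length = floor((14 - 5) / 3) + 1 = 4
from_4a = torch.randn(1, 512, 14, 14)  # feature map out of Inception 4a
from_4d = torch.randn(1, 528, 14, 14)  # feature map out of Inception 4d
print(pool(from_4a).shape, pool(from_4d).shape)
```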

2. Building the network

2.1 BasicConv2d

Since a Conv2d layer is almost always followed by a ReLU activation, we wrap the two together into a BasicConv2d module and use it for all the convolutional layers.

import torch
from torch import nn

class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        # **kwargs forwards kernel_size, stride, padding, etc. to nn.Conv2d
        self.conv = nn.Conv2d(in_channels, out_channels, **kwargs)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

2.2 Inception

class Inception(nn.Module):
    def __init__(self, in_channels, conv1x1, conv3x3reduce, conv3x3, conv5x5reduce, conv5x5, pool_project):
        super(Inception, self).__init__()
        self.branch1 = BasicConv2d(in_channels, conv1x1, kernel_size=1)
        self.branch2 = nn.Sequential(
            BasicConv2d(in_channels, conv3x3reduce, kernel_size=1),        # 1x1 reduce
            BasicConv2d(conv3x3reduce, conv3x3, kernel_size=3, padding=1)  # padding=1 keeps the height and width
        )
        self.branch3 = nn.Sequential(
            BasicConv2d(in_channels, conv5x5reduce, kernel_size=1),        # 1x1 reduce
            BasicConv2d(conv5x5reduce, conv5x5, kernel_size=5, padding=2)  # padding=2 keeps the height and width
        )
        self.branch4 = nn.Sequential(
            # with kernel 3, stride 1, padding 1 the pooled output keeps the input size
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            BasicConv2d(in_channels, pool_project, kernel_size=1)
        )

    def forward(self, x):
        branch1 = self.branch1(x)
        branch2 = self.branch2(x)
        branch3 = self.branch3(x)
        branch4 = self.branch4(x)

        outputs = [branch1, branch2, branch3, branch4]
        # concatenate along dim=1 (channels), because dim=0 is the batch dimension
        return torch.cat(outputs, dim=1)

2.3 The auxiliary classifier

# the two auxiliary classifiers have the same structure
import torch
from torch import nn
import torch.nn.functional as F

class Aux(nn.Module):
    def __init__(self, in_channels, num_classes):
        super(Aux, self).__init__()
        self.averagePool = nn.AvgPool2d(kernel_size=5, stride=3)  # output[batch, 512, 4, 4] or [batch, 528, 4, 4]
        self.conv = BasicConv2d(in_channels, 128, kernel_size=1)  # output[batch, 128, 4, 4]
        # 128 kernels of size 1x1: the batch and the 4x4 spatial size are unchanged,
        # only the depth drops from in_channels to 128
        self.fc1 = nn.Linear(2048, 1024)  # 128 * 4 * 4 = 2048 flattened features
        self.fc2 = nn.Linear(1024, num_classes)


    def forward(self, x):

        x = self.averagePool(x)
        x = self.conv(x)
        x = torch.flatten(x, 1)
        x = F.dropout(x, 0.5, training=self.training)
        # N x 2048
        x = F.relu(self.fc1(x), inplace=True)
        x = F.dropout(x, 0.5, training=self.training)
        x = self.fc2(x)
        return x

Note the dropout usage here: the functional F.dropout must be told the current mode via training=self.training, otherwise it would stay active during evaluation. Alternatively, an nn.Dropout(p) module can be defined in __init__, which follows the model's train/eval state automatically.
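A small sketch contrasting the two dropout styles; both toy modules here are illustrative.

```python
import torch
from torch import nn
import torch.nn.functional as F

class WithModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)  # follows self.training automatically
    def forward(self, x):
        return self.dropout(x)

class WithFunctional(nn.Module):
    def forward(self, x):
        # F.dropout must be told the mode explicitly,
        # otherwise it keeps dropping neurons even in eval mode
        return F.dropout(x, 0.5, training=self.training)

x = torch.ones(1, 8)
m1, m2 = WithModule().eval(), WithFunctional().eval()
# in eval mode both behave as the identity mapping
print(torch.equal(m1(x), x), torch.equal(m2(x), x))
```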

2.4 Building the model

class GoogLeNet(nn.Module):
    def __init__(self, num_class=1000, aux_logits=True, init_weight=False):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits

        # (224 - 7 + 2*3) / 2 + 1 = 112.5, floored to an output size of 112
        self.conv1 = BasicConv2d(3, 64, kernel_size=7, stride=2, padding=3)

        self.maxpool1 = nn.MaxPool2d(3, stride=2, ceil_mode=True)  # ceil_mode=True rounds up, False rounds down

        self.conv2 = BasicConv2d(64, 64, kernel_size=1)
        self.conv3 = BasicConv2d(64, 192, kernel_size=3, padding=1)
        self.maxpool2 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception3a = Inception(192, 64, 96, 128, 16, 32, 32)
        self.inception3b = Inception(256, 128, 128, 192, 32, 96, 64)
        self.maxpool3 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception4a = Inception(480, 192, 96, 208, 16, 48, 64)
        self.inception4b = Inception(512, 160, 112, 224, 24, 64, 64)
        self.inception4c = Inception(512, 128, 128, 256, 24, 64, 64)
        self.inception4d = Inception(512, 112, 144, 288, 32, 64, 64)
        self.inception4e = Inception(528, 256, 160, 320, 32, 128, 128)
        self.maxpool4 = nn.MaxPool2d(3, stride=2, ceil_mode=True)

        self.inception5a = Inception(832, 256, 160, 320, 32, 128, 128)
        self.inception5b = Inception(832, 384, 192, 384, 48, 128, 128)
        if self.aux_logits:  # only build the auxiliary classifiers when requested via the boolean flag
            self.aux1 = Aux(512, num_class)  # depth of the input feature map (output of inception4a)
            self.aux2 = Aux(528, num_class)  # depth of the input feature map (output of inception4d)

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling; the argument is the target output height and width
        # benefit: the output is fixed at 1x1 regardless of the spatial size of the input
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(1024, num_class)
        # the flattened input vector has 1024 nodes; the output has num_class nodes

        if init_weight:  # apply weight initialization if requested (explained in detail in the AlexNet notes)
            self._initialize_weights()


    def forward(self, x):
        # N x 3 x 224 x 224
        x = self.conv1(x)
        # N x 64 x 112 x 112
        x = self.maxpool1(x)
        # N x 64 x 56 x 56
        x = self.conv2(x)
        # N x 64 x 56 x 56
        x = self.conv3(x)
        # N x 192 x 56 x 56
        x = self.maxpool2(x)

        # N x 192 x 28 x 28
        x = self.inception3a(x)
        # N x 256 x 28 x 28
        x = self.inception3b(x)
        # N x 480 x 28 x 28
        x = self.maxpool3(x)
        # N x 480 x 14 x 14
        x = self.inception4a(x)
        # N x 512 x 14 x 14


        if self.training and self.aux_logits:  # skipped in eval mode
            # only runs in training mode, and only if the auxiliary classifiers are enabled
            aux1 = self.aux1(x)

        x = self.inception4b(x)
        # N x 512 x 14 x 14
        x = self.inception4c(x)
        # N x 512 x 14 x 14
        x = self.inception4d(x)
        # N x 528 x 14 x 14
        if self.training and self.aux_logits:  # skipped in eval mode
            # only runs in training mode, and only if the auxiliary classifiers are enabled
            aux2 = self.aux2(x)

        x = self.inception4e(x)
        # N x 832 x 14 x 14
        x = self.maxpool4(x)
        # N x 832 x 7 x 7
        x = self.inception5a(x)
        # N x 832 x 7 x 7
        x = self.inception5b(x)
        # N x 1024 x 7 x 7

        x = self.avgpool(x)
        # N x 1024 x 1 x 1
        x = torch.flatten(x, 1)
        # N x 1024
        x = self.dropout(x)
        x = self.fc(x)
        # N x 1000 (num_classes)
        if self.training and self.aux_logits:  # skipped in eval mode
            return x, aux2, aux1
            # three return values: the main output, auxiliary classifier 2, auxiliary classifier 1
        return x


    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

3. Training

As with AlexNet and VGG, we first load the data.

import json
import torch
from torch import optim
from torch.utils.data import DataLoader
from torchvision import datasets,transforms
from model import GoogLeNet


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)
    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]),
        "val": transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])}

    train_path = "zhang/train"
    val_path = "zhang/val"

    train_data = datasets.ImageFolder(root=train_path,transform=data_transform["train"])
    val_data = datasets.ImageFolder(root=val_path,transform = data_transform["val"])
    train_dataset = DataLoader(train_data,batch_size=32,shuffle=True)
    val_dataset = DataLoader(val_data, batch_size=32, shuffle=True)
    val_num = len(val_data)


    flower_list = train_data.class_to_idx
    class_dict = dict((val, key) for key, val in flower_list.items())  # swap keys and values
    json_str = json.dumps(class_dict, indent=4)  # indent=4 adds line breaks for readability
    with open("class_indices.json", "w", encoding="utf-8") as f:
        f.write(json_str)
    net = GoogLeNet(num_class=5, aux_logits=True, init_weight=True)
    net.to(device)
    loss_function = torch.nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=0.0003)
    best_acc = 0.0  # tracks the best validation accuracy seen so far

    for epoch in range(30):
        print(f"----------Epoch {epoch + 1} training started--------")
        # dropout requires switching between .train() and .eval()
        net.train()
        for i, (images, labels) in enumerate(train_dataset):
            optimizer.zero_grad()
            output1, aux_output2, aux_output1 = net(images.to(device))
            loss0 = loss_function(output1, labels.to(device))
            loss1 = loss_function(aux_output1, labels.to(device))
            loss2 = loss_function(aux_output2, labels.to(device))
            loss = loss0 + loss1 * 0.3 + loss2 * 0.3

            # optimize the model
            loss.backward()
            optimizer.step()
        # val
        net.eval()
        accuracy = 0.0
        with torch.no_grad():
            for data in val_dataset:
                test_Img, target = data
                output = net(test_Img.to(device))
                # dim=1: take the argmax over the class dimension for each sample
                predict_y = torch.argmax(output, dim=1)
                accuracy += (predict_y == target.to(device)).sum().item()
            acc_rate = accuracy / val_num
            # save the model with the best validation accuracy
            if acc_rate > best_acc:
                best_acc = acc_rate
                torch.save(net.state_dict(), './GoogLeNet.pth')
            print(acc_rate)

if __name__ == '__main__':
    main()

Differences from AlexNet

  # earlier networks had a single output; with the two auxiliary classifiers,
  # feeding a batch of training images through the network yields three outputs:
  # one main output plus the outputs of the two auxiliary classifiers
        loss = loss0 + loss1 * 0.3 + loss2 * 0.3
        # the original paper weights each auxiliary loss by 0.3
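The weighted loss combination can be sketched in isolation; the logits and labels below are random, purely to show the weighting.

```python
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()
labels = torch.tensor([1, 0, 3])   # made-up targets for a 5-class problem
main_out = torch.randn(3, 5)       # main classifier logits
aux1_out = torch.randn(3, 5)       # auxiliary classifier 1 logits
aux2_out = torch.randn(3, 5)       # auxiliary classifier 2 logits

# the auxiliary losses contribute with a weight of 0.3 each, as in the paper
loss = (criterion(main_out, labels)
        + 0.3 * criterion(aux1_out, labels)
        + 0.3 * criterion(aux2_out, labels))
print(loss.item())  # a single scalar to backpropagate through all three heads
```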

4. Prediction

Note that when building the model for inference, the auxiliary classifiers are not needed, so pass aux_logits=False.

However, the saved weights do include the auxiliary-classifier parameters, so load_state_dict must be called with strict=False.
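A self-contained sketch of this strict=False behavior, using a toy module in place of GoogLeNet:

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, with_aux):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        if with_aux:
            self.aux = nn.Linear(4, 2)  # stand-in for an auxiliary classifier

# save a model that includes the auxiliary weights...
torch.save(Net(with_aux=True).state_dict(), "demo.pth")

# ...then load it into a model built without them
net = Net(with_aux=False)
missing, unexpected = net.load_state_dict(torch.load("demo.pth"), strict=False)
print(unexpected)  # the aux.* keys are reported instead of raising an error
```

With strict=True (the default), the extra aux.* keys would raise a RuntimeError; strict=False simply reports them in unexpected_keys.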

import json
import torch
from PIL import Image
from torchvision import transforms
from model import GoogLeNet

data_transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
img_path = "./th.jpg"
img = Image.open(img_path)
img = data_transform(img)
# add a batch dimension; the image tensor is otherwise 3-D
img = torch.unsqueeze(img,dim = 0)
try:
    json_file = open("class_indices.json","r")
    class_indict = json.load(json_file)
except Exception as e:
    print(e)
    exit(-1)
# build the network
model = GoogLeNet(num_class=5, aux_logits=False)
model_path = "GoogLeNet.pth"
missing_keys, unexpected_keys = model.load_state_dict(torch.load(model_path), strict=False)
# unexpected_keys will list a series of layers, all of which belong to the auxiliary classifiers

model.eval()
with torch.no_grad():
    output = torch.squeeze(model(img))          # squeeze out the batch dimension
    predict = torch.softmax(output, dim=0)      # convert logits to probabilities
    predict_cla = torch.argmax(predict).numpy()

print(class_indict[str(predict_cla)],predict[predict_cla].item())
