CNNs: A Supplement to AlexNet

jjjstephen

Category: CNN. Tags: deep learning, neural networks, artificial intelligence

Article link: https://blog.csdn.net/jjjstephen/article/details/130495528

Introduction
Adjusting the `AlexNet` model
Representations at different layers of the model
Other explorations
Summary

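Introduction

The previous two posts described the AlexNet architecture in detail and showed how different hyperparameters change the experimental results on the same dataset.

In this post we dissect a few other questions about AlexNet: how shrinking the model affects its results, and what the feature maps of different layers represent. (The features captured by a given layer differ from model to model, so this is only a simple exploration of one model.)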

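Adjusting the `AlexNet` model

In this section we run a simple experiment on AlexNet in which the number of channels in every convolutional layer is halved (this matches the part of the network that Alex originally placed on a single GPU).

The code is as follows: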

import torch
import torch.nn as nn
from torchsummary import summary

class AlexNet(nn.Module):
    def __init__(self, class_num = 5):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # input[3, 227, 227]  output[48, 55, 55]
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=0),
            nn.ReLU(inplace=True),
            # output[48, 27, 27]
            nn.MaxPool2d(kernel_size=3, stride=2),

            # output[128, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            # output[128, 13, 13]
            nn.MaxPool2d(kernel_size=3, stride=2),

            # output[192, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[192, 13, 13]
            nn.Conv2d(192, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[128, 13, 13]
            nn.Conv2d(192, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[128, 6, 6]
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, class_num),
        )
    
    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
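
To make the size difference concrete, the parameter counts of the two networks can be worked out by hand. The following is a minimal sketch in plain Python (no PyTorch required). The helper names `conv_params`, `linear_params` and `count_params` are made up for this illustration, and the "full" configuration assumes the 96/256/384/384/256 channel layout with 2048-unit fully connected layers that appears later in this post.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return c_out * c_in * k * k + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(channels, fc_width=2048, class_num=5):
    """channels: output channels of the five conv layers, in forward order."""
    kernels = [11, 5, 3, 3, 3]
    total, c_in = 0, 3
    for c_out, k in zip(channels, kernels):
        total += conv_params(c_in, c_out, k)
        c_in = c_out
    # After the last pooling layer the feature map is 6 x 6
    total += linear_params(c_in * 6 * 6, fc_width)
    total += linear_params(fc_width, fc_width)
    total += linear_params(fc_width, class_num)
    return total


halved = count_params([48, 128, 192, 192, 128])
full = count_params([96, 256, 384, 384, 256])
print(f"halved: {halved:,}  full: {full:,}  ratio: {halved / full:.2f}")
```

Halving the channels cuts the convolutional weights to roughly a quarter (both the input and output channel counts shrink), while only the first fully connected layer is halved, so the overall saving is smaller than the convolutional part alone suggests.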

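Below are the experimental results for the adjusted AlexNet model with batch sizes of 8 and 16.
[Figure: results of the adjusted AlexNet model, batch sizes 8 and 16]
Below are the experimental results for the unadjusted AlexNet model with batch sizes of 8 and 16.

The results show that the adjusted AlexNet model performs worse than the unadjusted one. This is not a claim that adjusting a model always hurts: any adjustment has to be tuned together with the dataset, the hyperparameters and other factors, and in other experiments an adjusted model may well do better, as with ZFNet (which we will cover in the next post of the CNNs series).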

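Representations at different layers of the model

To look inside the feature maps of AlexNet's convolutional layers, we restructure the AlexNet code slightly. The modified model is shown below.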

import torch
import torch.nn as nn
from torchsummary import summary

class AlexNet(nn.Module):
    def __init__(self, class_num = 5):
        super(AlexNet, self).__init__()
        self.conv1 = nn.Sequential(
            # input[3, 227, 227]  output[96, 55, 55]
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.ReLU(inplace=True),
            # output[96, 27, 27]
            nn.MaxPool2d(kernel_size=3, stride=2)
        )

        self.conv2 = nn.Sequential(
            # output[256, 27, 27]
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            # output[256, 13, 13]
            nn.MaxPool2d(kernel_size=3, stride=2)
        )

        self.conv3 = nn.Sequential(
            # output[384, 13, 13]
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[384, 13, 13]
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[256, 13, 13]
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # output[256, 6, 6]
            nn.MaxPool2d(kernel_size=3, stride=2),
        )

        self.fc = nn.Sequential(
            nn.Linear(256 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, class_num),
        )
    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        output = self.fc(x.view(-1, 256 * 6 * 6))
        return output
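
The shape comments in the model follow from the usual output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below traces the spatial size through the network in plain Python; `out_size` and `trace` are illustrative helper names, not part of the post's project code.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial size after a conv or max-pool layer (floor rounding, PyTorch's default)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for every conv and max-pool layer, in forward order
LAYERS = [
    (11, 4, 0),  # conv1
    (3, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (3, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # pool3
]


def trace(size):
    """Return the spatial size after each layer for a square input."""
    sizes = []
    for kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(trace(227))  # sizes for a 227 x 227 input
print(trace(224))  # sizes for a 224 x 224 input
```

With a 227 x 227 input the final feature map is 6 x 6, which is what the first fully connected layer expects. With a 224 x 224 input and no padding in the first convolution, the final map is only 5 x 5, so the input size matters for this variant of the network.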

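With this change the network is organized into three convolutional blocks. (To inspect the feature maps at a finer granularity, the blocks could be split further.)

First, we train the model with the settings concluded in the previous post and save it to the model directory.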

epoch <= 18, lr = 0.01
epoch > 18, lr = 0.003
epochs = 150
batch_size = 16
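
The two-stage learning rate above can be expressed as a small helper. This is only a sketch under assumptions: the function names `lr_for_epoch` and `apply_lr` are hypothetical, and epochs are assumed to be counted from 1, which the post does not state. `apply_lr` relies only on the `param_groups` attribute that PyTorch optimizers expose.

```python
def lr_for_epoch(epoch, base_lr=0.01, late_lr=0.003, switch_epoch=18):
    """Learning rate for a 1-based epoch index (the indexing is an assumption)."""
    return base_lr if epoch <= switch_epoch else late_lr


def apply_lr(optimizer, epoch):
    """Write the scheduled rate into every parameter group of the optimizer."""
    lr = lr_for_epoch(epoch)
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr
```

In a training loop this would be called once at the start of each epoch, for example `apply_lr(optimizer, epoch)` for `epoch` in `range(1, 151)`.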

Then we load the trained model and use it to run a prediction on an image. The full script is in the project source code.

import sys
sys.path.append('.')

import torch
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from PIL import Image
from utils_module import param_settings
from torchvision.models.feature_extraction import get_graph_node_names
from torchvision.models.feature_extraction import create_feature_extractor

transform = transforms.Compose(
    [transforms.Resize(256),
     transforms.CenterCrop(227),
     transforms.ToTensor(),
     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model_path = param_settings.SAVE_PATH
# The checkpoint holds the whole pickled model, so the AlexNet class must be importable here.
# weights_only=False is required for this on recent PyTorch releases.
load_model = torch.load(model_path, map_location=device, weights_only=False)
load_model.eval()  # switch Dropout to inference mode

# Print the node names that can be used as keys of return_nodes
nodes, _ = get_graph_node_names(load_model)
print(nodes)

feature_extractor = create_feature_extractor(load_model, return_nodes={"conv1":"output1","conv2":"output2","conv3":"output3","fc":"output4"})

test_img_path = 'G:/learning/05dataset/pokeman/test/4.jpg'

img = Image.open(test_img_path).convert('RGB')
# Add the batch dimension: [3, 227, 227] -> [1, 3, 227, 227]
img = transform(img).unsqueeze(0).to(device)

with torch.no_grad():
    out = feature_extractor(img)

blocks = ["output1", "output2", "output3"]

# Per-channel view: the first 16 channels of each block on a 4 x 4 grid
for name in blocks:
    fmap = out[name][0].cpu().numpy()  # [C, H, W]
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    fig.suptitle(name)
    for i, ax in enumerate(axes.flat):
        ax.imshow(fmap[i])
        ax.axis('off')

# Combined view: sum over the channel dimension, one panel per block
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, name in zip(axes, blocks):
    ax.imshow(out[name][0].sum(dim=0).cpu().numpy())
    ax.set_title(name)
plt.show()

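get_graph_node_names lists the node names of the network, and create_feature_extractor extracts the feature maps at the chosen nodes.
The test image is shown below:
[Figure: test image]
The output feature maps of convolutional block 1, with shape [96, 27, 27]:

The output feature maps of convolutional block 2, with shape [256, 13, 13]:

The results show that what convolutional block 2 represents is already very abstract and far removed from the object in the original image.

The output feature maps of convolutional block 3, with shape [256, 6, 6]:
[Figure: sampled feature maps of convolutional block 3]
The figures above sample individual channels from each convolutional block. From the second block onward the feature maps become increasingly abstract, and it is hard to relate a feature to the object in the original image.

Below are the feature maps of the three convolutional blocks without splitting them by channel (summed over the channel dimension):
[Figure: channel-summed feature maps of the three convolutional blocks]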

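Other explorations

In this section we verify a conclusion from the AlexNet paper: "If two images produce feature activation vectors with a small Euclidean separation, we can say that the higher levels of the neural network consider them to be similar."
[Figure: five ILSVRC-2010 test images and their nearest training images]
In the figure above, the first column contains five ILSVRC-2010 test images. The remaining columns show the six training images whose feature vectors in the last hidden layer have the smallest Euclidean distance from the feature vector of the test image. The principle is the same as in reverse image search.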
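
The retrieval step itself is simple once the feature vectors are available. Below is a minimal sketch in plain Python: `nearest_images` is a hypothetical helper, and the three-dimensional vectors are toy values invented for the example. In practice the vectors would be the activations of the last hidden layer (4096-dimensional in the original paper).

```python
import math


def nearest_images(query, gallery, k=6):
    """Return the indices of the k gallery vectors closest to query (Euclidean distance)."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(gallery)]
    distances.sort()
    return [idx for _, idx in distances[:k]]


# Toy feature vectors standing in for training-image activations
gallery = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.2],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]  # toy feature vector of a test image

print(nearest_images(query, gallery, k=2))
```

Sorting every distance is fine for a small gallery. For a large training set, the same idea is usually implemented with vectorized distance computations or an approximate nearest-neighbor index.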