Notes on Computing the Output Dimensions of Data Passed Through a Convolutional Layer

Smallngg

Tags: deep learning, image processing, classification

Article link: https://blog.csdn.net/NgfSIX/article/details/132911527

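I. When changing the kernel size and stride of the input layer, you usually need to consider the following factors:

1. Input image size: CIFAR-10 images are 32x32 pixels.

2. Desired degree of downsampling: the stride of the convolutional layer determines how much the image is downsampled. A larger stride shrinks the feature map faster.

3. Filter size: the kernel size determines the image region covered by each convolution operation.

Given these factors, you can determine the new kernel size and stride with the following steps:

1. Decide on the desired output size: first decide what output size you want from the first convolutional layer. This usually depends on your network architecture and task. For example, to keep the output the same size as the input, the output size should be 32x32.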

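2. Compute the stride from the output size: as a first approximation, the stride is the ratio of the input size to the output size:

stride ≈ input size / output size

Here, an output size of 32x32 gives a stride of 1, because 32 / 32 = 1. This ratio ignores the kernel size and the padding. The exact output size along each spatial dimension is:

output size = floor((input size + 2 * padding - kernel size) / stride) + 1

3. Choose the kernel size: the kernel size is chosen together with the stride and the padding. A larger kernel needs more padding, not a larger stride, to keep the output size unchanged: with a stride of 1, a k x k kernel with odd k needs a padding of (k - 1) / 2, that is, padding 1 for a 3x3 kernel and padding 2 for a 5x5 kernel. Pick the kernel size according to the stride and the needs of the architecture; 3x3 and 5x5 kernels are common in practice.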
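The steps above can be checked numerically. The following is a minimal sketch of the standard output-size rule used by convolution and pooling layers, floor((input + 2 * padding - kernel) / stride) + 1, applied to a 32x32 CIFAR-10 image. The function name `conv_output_size` is a hypothetical helper introduced for illustration, not part of any library.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    """Output size along one spatial dimension of a conv or pooling layer."""
    # Integer floor division implements the floor() in the formula.
    return (in_size + 2 * padding - kernel_size) // stride + 1


# CIFAR-10 images are 32x32; height and width follow the same rule.
same_3x3 = conv_output_size(32, kernel_size=3, stride=1, padding=1)    # size preserved
same_5x5 = conv_output_size(32, kernel_size=5, stride=1, padding=2)    # size preserved
halved = conv_output_size(32, kernel_size=3, stride=2, padding=1)      # stride 2 halves the size
no_padding = conv_output_size(32, kernel_size=5, stride=1, padding=0)  # shrinks by kernel_size - 1

print(same_3x3, same_5x5, halved, no_padding)
```

With stride 1, the output keeps the input size only when the padding compensates for the kernel; without padding, a 5x5 kernel already shrinks 32 to 28.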

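II. In classification tasks, a convolutional neural network (CNN) is commonly used to process image data.

Computing the input and output channels of each layer requires following the structure of the network, so that each layer's input matches the previous layer's output. A general approach is as follows:

1. Input layer: the number of input channels equals the number of channels of the input image, for example 3 for an RGB image. To find it for your data: for a PIL image, image.mode (an attribute, not a method) gives the mode string such as "RGB" or "L", and len(image.getbands()) gives the number of channels; for a NumPy array, use image.shape, and for an array of shape (height, width, channels) the number of channels is channels.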

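2. Convolutional layer: a convolutional layer consists of multiple kernels (filters), and each kernel produces one output channel. The number of output channels is a design choice, normally set as a hyperparameter of each convolutional layer. The input channels of a layer must equal the output channels of the layer before it.

- To increase the number of output channels, increase the number of kernels.

- To decrease the number of output channels, decrease the number of kernels.

3. Pooling layer: a pooling layer does not change the number of channels. It reduces the spatial dimensions and leaves the channel dimension untouched.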

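4. Fully connected layer: before the first fully connected layer, the output of the convolutional part is flattened into a one-dimensional vector per sample. The number of input features of that layer equals the length of the flattened vector (channels * height * width). The output size of hidden fully connected layers is a free hyperparameter.

5. Output layer: the output size of the last layer equals the number of classes in the classification task.

In general, you control the capacity and complexity of the network by adjusting the output sizes of the convolutional and hidden fully connected layers. More channels may increase representational power, but they also increase the computational cost. Only the final layer is fixed by the task: its output size must equal the number of classes. The other sizes are usually tuned during training and validation to find a good architecture.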
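The channel bookkeeping described above can be sketched as a small shape tracer. This is an illustrative sketch only: the layer list is a hypothetical small CIFAR-10 network (not one from this article), the helper names are made up, and the spatial sizes use the standard rule floor((in + 2 * padding - kernel) / stride) + 1.

```python
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    return (in_size + 2 * padding - kernel_size) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (channels, height, width) shape after every layer."""
    channels, height, width = input_shape
    shapes = []
    for layer in layers:
        if layer["type"] == "conv":
            # A convolution outputs one channel per kernel.
            channels = layer["out_channels"]
        # Pooling keeps the channel count; both layer types change H and W.
        height = conv_output_size(height, layer["kernel"], layer["stride"], layer["padding"])
        width = conv_output_size(width, layer["kernel"], layer["stride"], layer["padding"])
        shapes.append((channels, height, width))
    return shapes


# Hypothetical network for 3x32x32 inputs (RGB CIFAR-10 images).
layers = [
    {"type": "conv", "out_channels": 32, "kernel": 3, "stride": 1, "padding": 1},
    {"type": "pool", "kernel": 2, "stride": 2, "padding": 0},
    {"type": "conv", "out_channels": 64, "kernel": 3, "stride": 1, "padding": 1},
    {"type": "pool", "kernel": 2, "stride": 2, "padding": 0},
]
shapes = trace_shapes((3, 32, 32), layers)

# The first fully connected layer needs channels * height * width input features.
channels, height, width = shapes[-1]
flattened_features = channels * height * width
print(shapes, flattened_features)
```

The last shape gives the in_features value of the first linear layer; the out_features of the final linear layer would be 10 for CIFAR-10.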

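In the convolutional layers of a CNN, the number of output channels is a tunable hyperparameter that determines how many distinct feature maps each layer produces. It is chosen rather than derived from a formula. Common ways to choose it are:

1. Manual choice: set the number of output channels of each convolutional layer by hand. This takes some experience and domain knowledge. Adjust it according to the complexity of the task, the characteristics of the dataset, and the available compute. For example, start with few output channels and increase them gradually to raise the capacity of the network.

2. Automatic tuning: use hyperparameter optimization, such as grid search or random search evaluated with cross-validation, to find the number of output channels that performs best on the task.

3. Network architecture: established architectures (for example ResNet, VGG, Inception) provide reference values. You can follow their channel configurations when choosing your own.

4. Task requirements: a task that needs to capture more detail and more features may need more output channels. For a relatively simple task, fewer output channels reduce model complexity.

5. Compute resources: take GPU memory and training time into account. More output channels mean more computation and more memory.

In short, the number of output channels is a tunable hyperparameter determined by task requirements, network architecture, compute resources, and hyperparameter optimization. In practice, you try several values and evaluate model performance to find the best configuration.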
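To make the cost of extra channels concrete, the sketch below counts learnable parameters for a standard (ungrouped) 2D convolution and for a linear layer. The helper names are hypothetical; the formulas are the usual weight count plus one bias per output channel or feature.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a square-kernel, ungrouped 2D convolution."""
    weights = out_channels * in_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def linear_params(in_features, out_features, bias=True):
    """Learnable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# An 11x11 convolution on an RGB input with 96 versus 192 output channels.
params_96 = conv2d_params(3, 96, 11)
params_192 = conv2d_params(3, 192, 11)

# A fully connected layer fed by a flattened 256 x 6 x 6 feature map.
params_fc = linear_params(256 * 6 * 6, 4096)

print(params_96, params_192, params_fc)
```

Doubling the output channels doubles the parameters of that layer, and it also doubles the input channels, and therefore the weights, of the next convolutional layer.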

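An example:

To train the AlexNet model on the CIFAR10 dataset, the original model must be modified. Otherwise the input size it expects, its kernel sizes, and its strides do not match the 32x32 CIFAR10 images, and the model cannot be trained.

The original AlexNet model is as follows: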

import torch.nn as nn


class AlexNet(nn.Module):
    """
    Neural network model consisting of layers proposed by the AlexNet paper.
    """
    def __init__(self, num_classes=1000):
        """
        Define and allocate layers for this neural net.

        Args:
            num_classes (int): number of classes to predict with this model
        """
        super().__init__()
        # input size should be : (b x 3 x 227 x 227)
        # The image in the original paper states that width and height are 224 pixels, but
        # the dimensions after first convolution layer do not lead to 55 x 55.
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4),  # (b x 96 x 55 x 55)
            nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2),  # section 3.3
            nn.MaxPool2d(kernel_size=3, stride=2),  # (b x 96 x 27 x 27)
            nn.Conv2d(96, 256, 5, padding=2),  # (b x 256 x 27 x 27)
            nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),  # (b x 256 x 13 x 13)
            nn.Conv2d(256, 384, 3, padding=1),  # (b x 384 x 13 x 13)
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, padding=1),  # (b x 384 x 13 x 13)
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1),  # (b x 256 x 13 x 13)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # (b x 256 x 6 x 6)
        )
        # classifier is just a name for linear layers
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5, inplace=True),
            nn.Linear(in_features=(256 * 6 * 6), out_features=4096),
            nn.ReLU(),
            nn.Dropout(p=0.5, inplace=True),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=num_classes),
        )
        self.init_bias()  # initialize bias

    def init_bias(self):
        for layer in self.net:
            if isinstance(layer, nn.Conv2d):
                # For every convolutional layer, initialize the weights from a normal
                # distribution with mean 0 and standard deviation 0.01, and set the bias to 0.
                nn.init.normal_(layer.weight, mean=0, std=0.01)
                nn.init.constant_(layer.bias, 0)
        # The original paper sets the bias to 1 for the 2nd, 4th, and 5th conv layers,
        # which are at indices 4, 10, and 12 of self.net.
        nn.init.constant_(self.net[4].bias, 1)
        nn.init.constant_(self.net[10].bias, 1)
        nn.init.constant_(self.net[12].bias, 1)

    def forward(self, x):
        """
        Pass the input through the net.

        Args:
            x (Tensor): input tensor

        Returns:
            output (Tensor): output tensor
        """
        x = self.net(x)
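        # x.view() reshapes the 4D feature maps (b, 256, 6, 6) into a 2D matrix (b, 9216):
        # the first dimension is the number of samples and the second is the number of features
        # per sample, as the linear layers require. The -1 lets PyTorch infer the batch dimension.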
        x = x.view(-1, 256 * 6 * 6)  # reduce the dimensions for linear layer input
        return self.classifier(x)

AlexNet(
  (net): Sequential(
    (0): Conv2d(3, 96, kernel_size=(11, 11), stride=(4, 4))
    (1): ReLU()
    (2): LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=2)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (5): ReLU()
    (6): LocalResponseNorm(5, alpha=0.0001, beta=0.75, k=2)
    (7): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU()
    (10): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU()
    (12): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU()
    (14): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=True)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=True)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU()
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Process finished with exit code 0
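Why this model cannot process CIFAR10 images can be checked with plain arithmetic. The sketch below is an illustration, not part of the original code: it applies floor((in + 2 * padding - kernel) / stride) + 1 to the convolution and max-pooling layers of self.net, with the (kernel_size, stride, padding) values copied from the model above. LocalResponseNorm and ReLU are omitted because they do not change the spatial size. The names `out_size`, `trace`, and `ORIGINAL_ALEXNET` are hypothetical.

```python
def out_size(in_size, kernel_size, stride=1, padding=0):
    return (in_size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride, padding) of each conv / max-pool layer, in order.
ORIGINAL_ALEXNET = [
    (11, 4, 0),  # conv1
    (3, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (3, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # pool3
]


def trace(in_size, layers):
    """Spatial size after each layer; stops once the size is no longer positive."""
    sizes = []
    for kernel_size, stride, padding in layers:
        in_size = out_size(in_size, kernel_size, stride, padding)
        sizes.append(in_size)
        if in_size <= 0:
            break  # the layer has no valid output for this input
    return sizes


imagenet_sizes = trace(227, ORIGINAL_ALEXNET)  # matches the shape comments in the model
cifar_sizes = trace(32, ORIGINAL_ALEXNET)      # collapses at the second max pool
print(imagenet_sizes)
print(cifar_sizes)
```

A 227x227 input ends at 6x6, which gives the 256 * 6 * 6 features expected by the classifier. A 32x32 input is already down to 2x2 before the second 3x3 max pool, so the computed size drops to 0 and the forward pass cannot continue.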

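The modified AlexNet:

'''AlexNet in PyTorch.'''
# The AlexNet architecture from the original paper
# is not suitable for CIFAR10 classification:
# CIFAR10 images are 32 * 32, and after downsampling the feature map shrinks to size 0.
# The network is therefore adapted to the CIFAR10 dataset here.
# The final max pooling layer of the feature extractor is replaced by an adaptive average pooling layer.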
import torch.nn as nn
from torch.nn import init


class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        # Feature extraction layers
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11,
                      stride=4, padding=2, bias=False),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
            nn.Conv2d(in_channels=96, out_channels=192,
                      kernel_size=5, stride=1, padding=2, bias=False),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
            nn.Conv2d(in_channels=192, out_channels=384,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=384, out_channels=256,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels=256, out_channels=256,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.ReLU(inplace=True),
            # nn.MaxPool2d(kernel_size=3, stride=2, padding=0),
            nn.AdaptiveAvgPool2d((1, 1))
        )

        # Classifier (fully connected layers)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(in_features=256, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(inplace=True),
            nn.Linear(in_features=4096, out_features=num_classes),
        )

        # Initialize the weights
        self.initialize_weights()

    def initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
                # Xavier (Glorot) uniform initialization
                init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.feature_extraction(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

AlexNet(
  (feature_extraction): Sequential(
    (0): Conv2d(3, 96, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2), bias=False)
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(96, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (11): ReLU(inplace=True)
    (12): AdaptiveAvgPool2d(output_size=(1, 1))
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=256, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=10, bias=True)
  )
)

Process finished with exit code 0

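Changes: the first convolutional layer gains padding=2; the local response normalization after the convolutional layers is removed; the output channels of the second convolutional layer change from 256 to 192, and the channels of the following convolutional layers are adjusted accordingly (192 -> 384, 384 -> 256, 256 -> 256); the convolutional layers no longer use a bias; the final max pooling layer is replaced by AdaptiveAvgPool2d((1, 1)), so the input of the first fully connected layer changes from 256 * 6 * 6 = 9216 to 256; the default number of classes changes from 1000 to 10; and the weights are initialized with Xavier initialization.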
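The same arithmetic can be used to sanity-check the modified model on a 32x32 input. This is a sketch under the assumption that the standard rule floor((in + 2 * padding - kernel) / stride) + 1 applies to every conv and max-pool layer; the (kernel_size, stride, padding) values are copied from feature_extraction above, and the helper and variable names are hypothetical.

```python
def out_size(in_size, kernel_size, stride=1, padding=0):
    return (in_size + 2 * padding - kernel_size) // stride + 1


# Conv and max-pool layers of feature_extraction, before the adaptive average pool.
MODIFIED_ALEXNET = [
    (11, 4, 2),  # conv1, now with padding=2
    (3, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (3, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
]

size = 32
sizes = []
for kernel_size, stride, padding in MODIFIED_ALEXNET:
    size = out_size(size, kernel_size, stride, padding)
    sizes.append(size)

# Keeping the original third 3x3 / stride-2 max pool would give an empty output.
size_with_third_max_pool = out_size(size, 3, 2, 0)

# AdaptiveAvgPool2d((1, 1)) always yields a 1x1 map, so flattening gives 256 features.
flattened_features = 256 * 1 * 1
print(sizes, size_with_third_max_pool, flattened_features)
```

The feature map is already 1x1 after the second max pool, which is why the third max pool had to go and why the first linear layer takes 256 input features.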