An Analysis of convBnSiLU in tensorrtx and How It Is Used

lindsayshuo

Last modified: 2024-03-13 13:25:50


Tags: YOLO

First published: 2024-03-11 11:44:50

Article link: https://blog.csdn.net/weixin_43269994/article/details/136619178


For reference, see my modified project:

https://github.com/lindsayshuo/yolov8-cls-tensorrtx

First, look at the convBnSiLU code:

// Builds Conv -> BatchNorm -> SiLU and returns the final element-wise layer.
nvinfer1::IElementWiseLayer* convBnSiLU(nvinfer1::INetworkDefinition* network, std::map<std::string, nvinfer1::Weights> weightMap, 
nvinfer1::ITensor& input, int ch, int k, int s, int p, std::string lname){
    // The convolution is created without a bias term.
    nvinfer1::Weights bias_empty{nvinfer1::DataType::kFLOAT, nullptr, 0};
    // k x k convolution with ch output channels; weights are looked up as "<lname>.conv.weight".
    nvinfer1::IConvolutionLayer* conv = network->addConvolutionNd(input, ch, nvinfer1::DimsHW{k, k}, weightMap[lname+".conv.weight"], bias_empty);
    assert(conv);
    conv->setStrideNd(nvinfer1::DimsHW{s, s});
    conv->setPaddingNd(nvinfer1::DimsHW{p, p});

    // Batch normalization on the convolution output, using the "<lname>.bn" weights and eps = 1e-5.
    nvinfer1::IScaleLayer* bn = addBatchNorm2d(network, weightMap, *conv->getOutput(0), lname+".bn", 1e-5);

    // SiLU(x) = x * sigmoid(x): a sigmoid activation followed by an element-wise product.
    nvinfer1::IActivationLayer* sigmoid = network->addActivation(*bn->getOutput(0), nvinfer1::ActivationType::kSIGMOID);
    nvinfer1::IElementWiseLayer* ew = network->addElementWise(*bn->getOutput(0), *sigmoid->getOutput(0), nvinfer1::ElementWiseOperation::kPROD);
    assert(ew);
    return ew;
}

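In the code above, the convBnSiLU function creates a convolution layer (Convolution) and a batch normalization layer (BatchNorm), then applies the Sigmoid Linear Unit (SiLU, also known as Swish) activation. It builds this fixed layer pattern for the network and wires the layers together. The parameters are, in order:
(1) nvinfer1::INetworkDefinition* network: a pointer to the network definition, used to add layers to the network.
(2) std::map<std::string, nvinfer1::Weights> weightMap: a map from string keys (usually layer names) to weight structures (nvinfer1::Weights), each holding the weight data, its data type, and its element count.
(3) nvinfer1::ITensor& input: the input tensor, that is, the data fed into this block.
(4) int ch: the number of output channels, which equals the number of filters in the convolution layer.
(5) int k: the kernel size; the kernel is square, so its dimensions are k x k.
(6) int s: the stride, that is, how far the kernel moves across the input at each step.
(7) int p: the padding, that is, the number of extra pixels added around the border of the input, either to preserve the spatial size or to limit how much it shrinks.
(8) std::string lname: the layer name (prefix), used to look up this layer's entries in weightMap.
Overall, convBnSiLU builds a composite block of convolution, batch normalization, and SiLU activation, and returns an element-wise layer (IElementWiseLayer) that multiplies the batch normalization output by its sigmoid. That product, x * sigmoid(x), is the SiLU activation.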
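
To make the activation step concrete, here is a minimal CPU-side sketch of the same arithmetic in plain C++, with no TensorRT involved. The helper names sigmoid, silu, and siluTwoStep are hypothetical and do not come from tensorrtx; siluTwoStep only mirrors the order of operations in convBnSiLU (a sigmoid pass, then an element-wise product with the original values).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid: 1 / (1 + e^{-x}).
inline float sigmoid(float x) {
    return 1.0f / (1.0f + std::exp(-x));
}

// SiLU (Swish) on a single value: x * sigmoid(x).
inline float silu(float x) {
    return x * sigmoid(x);
}

// Hypothetical helper that mirrors the two layers used in convBnSiLU:
//   step 1: sigmoid of every BN output value (the kSIGMOID activation)
//   step 2: element-wise product of the BN output with step 1 (kPROD)
std::vector<float> siluTwoStep(const std::vector<float>& bnOut) {
    std::vector<float> sig(bnOut.size());
    for (std::size_t i = 0; i < bnOut.size(); ++i) {
        sig[i] = sigmoid(bnOut[i]);
    }
    std::vector<float> out(bnOut.size());
    for (std::size_t i = 0; i < bnOut.size(); ++i) {
        out[i] = bnOut[i] * sig[i];
    }
    return out;
}
```

Calling siluTwoStep on a buffer gives the same values as applying silu to each element, which is why the two-layer construction in convBnSiLU amounts to a SiLU activation.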
Now look at how it is called:

nvinfer1::IElementWiseLayer* conv0 = convBnSiLU(network, weightMap, *data, get_width(64, gw, max_channels), 3, 2, 1, "model.0");

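In the call convBnSiLU(network, weightMap, *data, get_width(64, gw, max_channels), 3, 2, 1, "model.0"), the arguments are:
(1) network: the network definition being built.
(2) weightMap: supplies the weights the layers need.
(3) *data: the input tensor for this layer.
(4) get_width(64, gw, max_channels): the number of output channels, obtained by scaling the base width 64 by gw, rounding up to a multiple of 8, and capping the result at max_channels.
(5) 3: the kernel size.
(6) 2: the stride.
(7) 1: the padding.
(8) "model.0": the key prefix for this layer's weights in weightMap.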
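
As a worked example of argument (4), the sketch below repeats the arithmetic of the get_width function shown later in this post, without the logging. The name getWidthNoLog is hypothetical. The values gw = 0.25 and max_channels = 1024 used in the note after the code are assumptions for illustration; the post does not state which values it ran with.

```cpp
#include <cmath>

// Same arithmetic as the post's get_width, minus the std::cout logging:
// scale x by gw, round up to the next multiple of divisor, cap at max_channels.
// The name getWidthNoLog is hypothetical.
static int getWidthNoLog(int x, float gw, int max_channels, int divisor = 8) {
    int channel = static_cast<int>(std::ceil((x * gw) / divisor)) * divisor;
    return channel >= max_channels ? max_channels : channel;
}
```

Under the assumed values, getWidthNoLog(64, 0.25f, 1024) computes ceil(16 / 8) * 8 = 16, which would be consistent with the 16 output channels reported for conv0 further down.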

Experimental analysis:
Use the following code to print the shapes:

void printDims(const std::string& layerName, const nvinfer1::Dims& dims) {
    std::cout << layerName << " Dimensions(" << dims.nbDims << "): [ ";
    for (int i = 0; i < dims.nbDims; ++i) {
        std::cout << dims.d[i] << " ";
    }
    std::cout << "]" << std::endl;
}
// Example usage (printDims takes the layer name first, then the dimensions)
// printDims("conv0", conv0->getOutput(0)->getDimensions());



// ****************************************** Print the computed channel width **********************************************
static int get_width(int x, float gw, int max_channels, int divisor = 8) {
    auto channel = int(ceil((x * gw) / divisor)) * divisor;
    std::cout << "Calculated channel width: " << channel << std::endl;
    std::cout << "Maximum channel limit: " << max_channels << std::endl;
    return channel >= max_channels ? max_channels : channel;
}


// ****************************************** Print the shape of nvinfer1::ITensor* data **********************************************
if (data) {
    // Get the dimensions object
    nvinfer1::Dims dims = data->getDimensions();
    // Print the dimension information
    std::cout << "Input shape: [";
    for (int i = 0; i < dims.nbDims; ++i) {
        std::cout << dims.d[i];
        if (i < dims.nbDims - 1) {
            std::cout << ", ";
        }
    }
    std::cout << "]" << std::endl;
}


// ****************************************** Print information about model.9 **********************************************
nvinfer1::Dims pool2Dims = pool2->getOutput(0)->getDimensions();
// Output dimension information
std::cout << "Dimensions of the output from pool2 layer: ";
for (int i = 0; i < pool2Dims.nbDims; ++i) {
    std::cout << pool2Dims.d[i] << " ";
}
std::cout << std::endl;

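// The channel index depends on the layout: 1 for 4-D NCHW, 0 for 3-D CHW.
// The 3-D shape printed for conv0 suggests CHW here, so pick the index from nbDims.
int channelIndex = (pool2Dims.nbDims == 4) ? 1 : 0;
int featureMapCount = pool2Dims.d[channelIndex];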
std::cout << "Number of feature maps: " << featureMapCount << std::endl;

auto& fcWeights = weightMap["model.9.linear.weight"];
std::cout << "model.9.linear.weight count: " << fcWeights.count << std::endl;

// Calculate the weight shape if needed
// Assuming the weight shape for fully connected layer is [number of output channels x number of input channels]
// The number of output nodes of the fully connected layer has already been defined as 1000 previously
int outputChannels = kClsNumClass; // Number of output nodes for the fully connected layer, according to the value defined in your network
// The following calculation assumes that the weights are stored with output channels prioritized
int inputChannels = fcWeights.count / outputChannels;
std::cout << "Shape of model.9.linear.weight: [" << outputChannels << " x " << inputChannels << "]" << std::endl;
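
The shape calculation above divides the flat weight count by the number of outputs and assumes the division is exact. A small sketch of that step with a divisibility check might look like the following. The helper name inferLinearInputs is hypothetical, and the numbers in the note after the code are illustrative, not taken from the post's logs.

```cpp
#include <cstdint>

// Hypothetical helper: given the flat element count of a linear layer's
// weight and the number of outputs, return the number of inputs per output.
// Returns -1 when count is not a positive multiple of outputs, which would
// point to a mismatch between the weight file and the assumed class count.
std::int64_t inferLinearInputs(std::int64_t count, std::int64_t outputs) {
    if (outputs <= 0 || count <= 0 || count % outputs != 0) {
        return -1;
    }
    return count / outputs;
}
```

For instance, a weight with 1,280,000 elements and 1000 outputs would give inferLinearInputs(1280000, 1000) = 1280, that is, a [1000 x 1280] matrix.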

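The output conv0 Dimensions(3): [ 16 112 112 ] gives the dimensions of the feature map produced by the first convolution layer. The values [ 16 112 112 ] are the number of channels, the height, and the width of the output feature map.
These dimensions follow from the size of the input feature map and the convolution parameters (kernel size, stride, padding). If the input feature map is [3, 224, 224] (3 channels, height and width both 224), the output size is computed as follows.
Assume:
Input size: [3, 224, 224] (channels, height, width)
Kernel size: 3x3 (a commonly used kernel size)
Stride: 2 (how far the kernel moves at each step)
Padding: 1 (used to preserve the feature map size or to limit how much the borders shrink)
The width and height of the output feature map after the convolution are given by:

$$O = \left\lfloor \frac{I - K + 2P}{S} \right\rfloor + 1$$

where I is the input size, K the kernel size, P the padding, and S the stride.

So for an input height and width of 224:

$$O = \left\lfloor \frac{224 - 3 + 2 \times 1}{2} \right\rfloor + 1 = \lfloor 111.5 \rfloor + 1 = 112$$

This gives an output feature map whose height and width are both 112. Since this is the convolution layer conv0, and assuming its output channel count is set to 16, the full output dimensions are [16, 112, 112].
This explains where the 112 in conv0 Dimensions(3): [ 16 112 112 ] comes from.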
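
The output-size calculation can also be checked with a few lines of C++. This is a sketch of the standard formula floor((in + 2 * pad - kernel) / stride) + 1 for one spatial axis, assuming a dilation of 1 and in + 2 * pad >= kernel. The function name convOutSize is hypothetical.

```cpp
// Output length along one spatial axis of a convolution:
// floor((in + 2 * pad - kernel) / stride) + 1.
// Assumes dilation 1 and in + 2 * pad >= kernel, so the integer
// division below acts as a floor on a non-negative value.
int convOutSize(int in, int kernel, int stride, int pad) {
    return (in + 2 * pad - kernel) / stride + 1;
}
```

With the parameters of conv0, convOutSize(224, 3, 2, 1) evaluates to (224 + 2 - 3) / 2 + 1 = 111 + 1 = 112, matching the printed shape.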
