[Inspecting Keras models] Parameter counts and multiplication counts for convolutional, pooling, fully connected, and BatchNorm layers

阑珊珊珊

Categories: TensorFlow, Keras, Python

Article link: https://blog.csdn.net/u010637291/article/details/112320280

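1. Convolutional layer

1.1 Input parameters

Input to the convolution: the image, audio features, or other data to be convolved. It must be a 4-D Tensor of shape [batch, in_height, in_width, in_channels], which for images means [number of images in a training batch, image height, image width, number of image channels]. Its dtype must be either float32 or float64.

From the source docstring: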

Arguments:
    filters: Integer, the dimensionality of the output space
      (i.e. the number of output filters in the convolution).
    kernel_size: An integer or tuple/list of 2 integers, specifying the
      height and width of the 2D convolution window.
      Can be a single integer to specify the same value for
      all spatial dimensions.
    strides: An integer or tuple/list of 2 integers,
      specifying the strides of the convolution along the height and width.
      Can be a single integer to specify the same value for
      all spatial dimensions.
      Specifying any stride value != 1 is incompatible with specifying
      any `dilation_rate` value != 1.
    padding: one of `"valid"` or `"same"` (case-insensitive).

filters (first argument): the number of convolution kernels, which is also the number of output channels.
kernel_size (second argument): the height and width of the 2D convolution window. A single integer means the height and width are equal.
strides (third argument): how far the window moves at each step along the height and width. A single integer means the same stride in both directions. Any stride value != 1 is incompatible with any dilation_rate value != 1.
padding (fourth argument): either "valid" or "same" (case-insensitive). The two modes differ as follows:
- same mode
  
  The convolution starts when the center (K) of the filter coincides with a corner of the image, which means the input is zero-padded around its border. The name "same" also reflects that, with a stride of 1, the output feature map has the same height and width as the input. The output size is not always identical to the input size, because it also depends on the stride. This is the most common mode: with stride 1 the feature map keeps its size through the forward pass, so there is no need to track size changes layer by layer.
- valid mode
  
  The convolution is computed only where the filter lies entirely inside the image, so the filter covers a smaller range of positions than in same mode.
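The amount of zero padding that same mode adds can be made concrete with a small calculation. The sketch below follows the rule TensorFlow documents for SAME padding (total padding chosen so that the output size is ceil(in / stride), with any odd pixel going to the bottom/right side). The function name `same_padding_1d` is hypothetical and not part of Keras.

```python
import math


def same_padding_1d(in_size, kernel, stride):
    """Return (pad_before, pad_after) along one spatial axis for 'same' padding.

    Assumption: TensorFlow-style SAME padding, where the extra pixel
    (when the total is odd) is added after the data, not before it.
    """
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = total // 2
    pad_after = total - pad_before
    return pad_before, pad_after


# Height axis: in_height=34, k_h=10, s_h=3
pad_top, pad_bottom = same_padding_1d(34, 10, 3)
# Width axis: in_width=13, k_w=4, s_w=2
pad_left, pad_right = same_padding_1d(13, 4, 2)
print(pad_top, pad_bottom, pad_left, pad_right)
```

In valid mode the corresponding padding is simply zero on every side.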

1.2 Output dimensions

ref: https://www.cnblogs.com/sddai/p/10512784.html

As the previous subsection shows, the output dimensions of a convolutional layer depend on the padding mode. The two cases, padding=valid and padding=same, are covered separately below.

1.2.1 padding=valid

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels],
filters,
kernel_size=[k_h, k_w],
stride_size=[s_h, s_w]
padding='valid'

the output is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters
out_height = ceil((in_height - k_h + 1) / s_h)
out_width = ceil((in_width - k_w + 1) / s_w)

Example 1:

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 34, 13, 1]
filters=128
kernel_size=[k_h, k_w]=[10, 4]
stride_size=[s_h, s_w]=[3, 2]
padding='valid'

the output is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters = 128
out_height = ceil((in_height - k_h + 1) / s_h) = ceil((34 - 10 + 1) / 3) = ceil(8.33) = 9
out_width = ceil((in_width - k_w + 1) / s_w) = ceil((13 - 4 + 1) / 2) = ceil(5) = 5

# that is:
output = [batch_size, out_height, out_width, out_channels] = [1, 9, 5, 128]
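The valid-padding formula is easy to check in a few lines of plain Python. This is a minimal sketch of the arithmetic only; the helper name `conv2d_output_shape_valid` is hypothetical and is not a Keras function.

```python
import math


def conv2d_output_shape_valid(input_shape, filters, kernel_size, strides):
    """Output shape of a 2D convolution with padding='valid' (NHWC layout)."""
    batch, in_h, in_w, _ = input_shape
    k_h, k_w = kernel_size
    s_h, s_w = strides
    out_h = math.ceil((in_h - k_h + 1) / s_h)
    out_w = math.ceil((in_w - k_w + 1) / s_w)
    return [batch, out_h, out_w, filters]


# Input [1, 34, 13, 1], 128 filters, 10 x 4 kernel, strides (3, 2)
shape = conv2d_output_shape_valid([1, 34, 13, 1], 128, (10, 4), (3, 2))
print(shape)
```

The same helper can be reused for any other valid-padding layer by changing the arguments.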

Example 2:

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 9, 7, 128]
filters=128
kernel_size=[k_h, k_w]=[3, 3]
stride_size=[s_h, s_w]=[1, 1]
padding='valid'

the output is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters = 128
out_height = ceil((in_height - k_h + 1) / s_h) = ceil((9 - 3 + 1) / 1) = ceil(7) = 7
out_width = ceil((in_width - k_w + 1) / s_w) = ceil((7 - 3 + 1) / 1) = ceil(5) = 5

# that is:
output = [batch_size, out_height, out_width, out_channels] = [1, 7, 5, 128]

1.2.2 padding=same

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels],
filters,
kernel_size=[k_h, k_w],
stride_size=[s_h, s_w]
padding='same'

the output is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters
out_height = ceil(in_height / s_h)
out_width = ceil(in_width / s_w)

Example 3:

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 34, 13, 1]
filters=128
kernel_size=[k_h, k_w]=[10, 4]
stride_size=[s_h, s_w]=[3, 2]
padding='same'

the output is:

output = [batch_size, out_height, out_width, out_channels]

# where:
out_channels = filters = 128
out_height = ceil(in_height / s_h) = ceil(34 / 3) = ceil(11.33) = 12
out_width = ceil(in_width / s_w) = ceil(13 / 2) = ceil(6.5) = 7

# that is:
output = [batch_size, out_height, out_width, out_channels] = [1, 12, 7, 128]
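With same padding the kernel size drops out of the shape calculation, which the following sketch makes explicit. The helper name `conv2d_output_shape_same` is hypothetical and only reproduces the arithmetic above.

```python
import math


def conv2d_output_shape_same(input_shape, filters, strides):
    """Output shape of a 2D convolution with padding='same' (NHWC layout).

    The kernel size is not needed: only the input size and the strides matter.
    """
    batch, in_h, in_w, _ = input_shape
    s_h, s_w = strides
    return [batch, math.ceil(in_h / s_h), math.ceil(in_w / s_w), filters]


# Input [1, 34, 13, 1], 128 filters, strides (3, 2)
print(conv2d_output_shape_same([1, 34, 13, 1], 128, (3, 2)))
# With stride 1 the spatial size is unchanged
print(conv2d_output_shape_same([1, 34, 13, 1], 128, (1, 1)))
```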

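1.3 Parameter count

The previous post ([Inspecting Keras models] Model structure, model parameters, and per-layer inputs/outputs: https://blog.csdn.net/u010637291/article/details/110677379) showed how to view a model's structure and parameters:

# View the model's layers and parameters
model.summary()

This lists the parameter count of every layer. In the model used here, for example, the final fully connected layer has 3999 parameters and the model has 85023 parameters in total.

The exact parameter count can also be computed by hand from each layer's input parameters. For a convolutional layer (ref: Computational details of convolutional layers in CNNs: https://zhuanlan.zhihu.com/p/29119239):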

Example 4.0:

Standard 2D convolution (Conv2D), given the input parameters (same as Example 1):

inputs=[batch_size, in_height, in_width, in_channels]=[1, 34, 13, 1]
filters=128
kernel_size=[k_h, k_w]=[10, 4]
stride_size=[s_h, s_w]=[3, 2]
padding='valid'

Parameter count:

Kernel size = k_h $\times$ k_w $\times$ in_channels = 10 * 4 * 1 = 40
Number of kernels = filters = 128
Number of bias terms = number of kernels = filters = 128
Parameter count = number of kernels $\times$ kernel size + number of bias terms = 128 * 40 + 128 = 5248
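The Conv2D parameter formula can be written as a one-line function. This is a sketch of the arithmetic, not a Keras API; the name `conv2d_param_count` and the `use_bias` flag on it are illustrative.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Number of weights in a standard 2D convolution.

    Each of the `filters` kernels has k_h * k_w * in_channels weights,
    plus one bias per kernel when use_bias is True.
    """
    k_h, k_w = kernel_size
    weights = filters * (k_h * k_w * in_channels)
    biases = filters if use_bias else 0
    return weights + biases


# 10 x 4 kernel, 1 input channel, 128 filters
print(conv2d_param_count((10, 4), in_channels=1, filters=128))
```

Note that the strides, the padding mode, and the spatial size of the input do not appear: they change the output shape but not the number of parameters.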

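Example 4.1:

DepthwiseConv2D, given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 7, 7, 128]
filters=128 (one kernel per input channel)
kernel_size=[k_h, k_w]=[3, 3]
stride_size=[s_h, s_w]=[1, 1]
padding='valid'

Parameter count (each depthwise kernel convolves a single input channel, so the channel factor in the kernel size is 1):

Kernel size = k_h $\times$ k_w $\times$ 1 = 3 * 3 * 1 = 9
Number of kernels = filters = 128
Number of bias terms = number of kernels = filters = 128
Parameter count = number of kernels $\times$ kernel size + number of bias terms = 128 * 9 + 128 = 1280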

Example 4.2:

Pointwise convolution (a Conv2D with a 1 x 1 kernel), given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 7, 7, 128]
filters=128
kernel_size=[k_h, k_w]=[1, 1]
stride_size=[s_h, s_w]=[1, 1]
padding='valid'

Parameter count:

Kernel size = k_h $\times$ k_w $\times$ in_channels = 1 * 1 * 128 = 128
Number of kernels = filters = 128
Number of bias terms = number of kernels = filters = 128
Parameter count = number of kernels $\times$ kernel size + number of bias terms = 128 * 128 + 128 = 16512
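The depthwise and pointwise counts can be placed side by side with a standard convolution of the same shape. The sketch below assumes one depthwise kernel per input channel (the `depth_multiplier` argument defaults to 1) and a bias on every layer; the function names are hypothetical helpers, not Keras functions.

```python
def depthwise_param_count(kernel_size, in_channels, depth_multiplier=1, use_bias=True):
    """Depthwise convolution: each kernel sees exactly one input channel."""
    k_h, k_w = kernel_size
    kernels = in_channels * depth_multiplier
    return kernels * (k_h * k_w) + (kernels if use_bias else 0)


def pointwise_param_count(in_channels, filters, use_bias=True):
    """Pointwise convolution: a standard convolution with a 1 x 1 kernel."""
    return filters * in_channels + (filters if use_bias else 0)


depthwise = depthwise_param_count((3, 3), in_channels=128)
pointwise = pointwise_param_count(in_channels=128, filters=128)
# A standard 3 x 3 convolution from 128 to 128 channels, for comparison
standard = 128 * (3 * 3 * 128) + 128
print(depthwise, pointwise, depthwise + pointwise, standard)
```

The depthwise plus pointwise pair needs far fewer parameters than the single standard convolution it replaces, which is the usual reason for using it.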

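1.4 Multiplication count

Counting multiplications only:

Computing the value at one position of the output feature map takes in_channels $\times$ k_h $\times$ k_w multiplications.

The output feature map has out_channels $\times$ out_height $\times$ out_width values in total.

The total computation (number of multiplications) is therefore:

multiply_times = in_channels * k_h * k_w * (out_channels * out_height * out_width)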

Example 5:

Given the input parameters (same as Example 1):

inputs=[batch_size, in_height, in_width, in_channels]=[1, 34, 13, 1]
filters=128
kernel_size=[k_h, k_w]=[10, 4]
stride_size=[s_h, s_w]=[3, 2]
padding='valid'

Multiplication count:

in_channels = 1
k_h, k_w = 10, 4
out_channels = 128
out_height = ceil((in_height - k_h + 1) / s_h) = ceil((34 - 10 + 1) / 3) = 9
out_width = ceil((in_width - k_w + 1) / s_w) = ceil((13 - 4 + 1) / 2) = 5
multiply_times = in_channels * k_h * k_w * (out_channels * out_height * out_width) = 1 * 10 * 4 * (128 * 9 * 5) = 230,400
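The multiplication count combines the output-shape formula with the per-position cost. A minimal sketch for the valid-padding case follows; the helper name `conv2d_multiplications` is hypothetical, and the count is per input sample (the batch dimension is ignored, as in the formula above).

```python
import math


def conv2d_multiplications(input_shape, filters, kernel_size, strides):
    """Multiplications for one sample of a padding='valid' 2D convolution."""
    _, in_h, in_w, in_c = input_shape
    k_h, k_w = kernel_size
    s_h, s_w = strides
    out_h = math.ceil((in_h - k_h + 1) / s_h)
    out_w = math.ceil((in_w - k_w + 1) / s_w)
    per_output_value = in_c * k_h * k_w
    output_values = filters * out_h * out_w
    return per_output_value * output_values


# Input [1, 34, 13, 1], 128 filters, 10 x 4 kernel, strides (3, 2)
print(conv2d_multiplications([1, 34, 13, 1], 128, (10, 4), (3, 2)))
```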

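2. Pooling layer

Pooling layers come as max pooling and average pooling, with max pooling currently the more common. Each output value is the maximum or the average of the inputs inside one pooling window.

A pooling layer compresses the input feature map. This shrinks the feature map, which reduces the computational cost of the network, and it condenses the features, keeping the dominant ones.

2.1 Input parameters

Input: a 4-D Tensor of shape [batch, in_height, in_width, in_channels], with dtype float32 or float64.

From the source docstring: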

"""Average pooling operation for spatial data.

  Arguments:
    pool_size: integer or tuple of 2 integers,
      factors by which to downscale (vertical, horizontal).
      `(2, 2)` will halve the input in both spatial dimension.
      If only one integer is specified, the same window length
      will be used for both dimensions.
    strides: Integer, tuple of 2 integers, or None.
      Strides values.
      If None, it will default to `pool_size`.
    padding: One of `"valid"` or `"same"` (case-insensitive).

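pool_size (first argument): the size of the pooling window.
strides (second argument): the stride. If None, it defaults to pool_size.
padding (third argument): same meaning as the padding argument of the convolutional layer.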

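2.2 Output dimensions

The calculation is the same as for convolution, except that no convolution is performed; the layer takes the average or the maximum of each window instead:

Example 6:

Given the input parameters:

inputs=[batch_size, in_height, in_width, in_channels]=[1, 1, 5, 128]
pool_size=[p_h, p_w]=[1, 5]
strides=[s_h, s_w]=[1, 1]
padding='valid'

Output dimensions:

output = [batch_size, out_height, out_width, out_channels], where:
out_channels = in_channels = 128
out_height = ceil((in_height - p_h + 1) / s_h) = ceil((1 - 1 + 1) / 1) = 1
out_width = ceil((in_width - p_w + 1) / s_w) = ceil((5 - 5 + 1) / 1) = 1
so output = [1, 1, 1, 128]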
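The pooling output shape follows the same rules as the convolution output shape, with the channel count passed through unchanged. In the sketch below, `pool2d_output_shape` is a hypothetical helper; its `strides=None` default mirrors the docstring quoted earlier, where strides falls back to pool_size.

```python
import math


def pool2d_output_shape(input_shape, pool_size, strides=None, padding="valid"):
    """Output shape of a 2D pooling layer (NHWC layout)."""
    batch, in_h, in_w, channels = input_shape
    p_h, p_w = pool_size
    # As in the docstring: strides defaults to pool_size when None
    s_h, s_w = strides if strides is not None else pool_size
    mode = padding.lower()
    if mode == "valid":
        out_h = math.ceil((in_h - p_h + 1) / s_h)
        out_w = math.ceil((in_w - p_w + 1) / s_w)
    elif mode == "same":
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    return [batch, out_h, out_w, channels]


# Input [1, 1, 5, 128], 1 x 5 window, strides (1, 1), valid padding
print(pool2d_output_shape([1, 1, 5, 128], (1, 5), (1, 1), "valid"))
```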

2.3 Parameter count

None. A pooling layer has no weights, so its parameter count is 0.

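2.4 Multiplication count

A max pooling layer needs no multiplications.
An average pooling layer needs one multiplication per output value (the window sum scaled by 1 / (p_h $\times$ p_w)), so multiplications = out_height $\times$ out_width $\times$ out_channels.

Example 7:

Given the input parameters (same as Example 6):

inputs=[batch_size, in_height, in_width, in_channels]=[1, 1, 5, 128]
pool_size=[p_h, p_w]=[1, 5]
strides=[s_h, s_w]=[1, 1]
padding='valid'

Multiplications = out_height * out_width * out_channels = 1 * 1 * 128 = 128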
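The rule for pooling multiplications can be expressed as a small function of the pooling kind and the output shape. This is a sketch under the counting convention used above (one multiplication per averaged output value, none for a maximum); `pooling_multiplications` is a hypothetical name.

```python
def pooling_multiplications(kind, output_shape):
    """Multiplications for one sample of a pooling layer.

    kind: 'max' or 'average'. output_shape: [batch, out_h, out_w, out_c].
    """
    _, out_h, out_w, out_c = output_shape
    if kind == "max":
        return 0  # comparisons only
    if kind == "average":
        return out_h * out_w * out_c  # one scaling per output value
    raise ValueError("kind must be 'max' or 'average'")


print(pooling_multiplications("average", [1, 1, 1, 128]))
print(pooling_multiplications("max", [1, 1, 1, 128]))
```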

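3. Fully connected layer

A fully connected layer can be viewed as an extreme case of a convolutional layer whose kernel size equals the size of the input, so the output height and width are both 1.

The difference is that a 1x1 convolution takes a linear combination of all channels at a single pixel position, whereas a fully connected layer first flattens all inputs (or reduces them with a pooling layer) into a one-dimensional vector and then takes a linear combination over individual pixel values. In addition, a fully connected layer in the broad sense usually includes an activation function.

In short, a 1x1 convolution operates at the channel level of the image, while a fully connected layer operates closer to the pixel level.

3.1 Input parameters

Input: a 2-D Tensor of shape [batch_size, in_channels], with dtype float32 or float64.

From the source docstring: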

Arguments:
    units: Positive integer, dimensionality of the output space.
    activation: Activation function to use.
      If you don't specify anything, no activation is applied
      (ie. "linear" activation: `a(x) = x`).
    use_bias: Boolean, whether the layer uses a bias vector.

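units (first argument): the output dimensionality.
activation (second argument): the activation function, for example softmax.
use_bias (third argument): whether to use a bias term.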

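3.2 Output dimensions

Treating the flattened input as a 1 x 1 feature map with in_channels channels, a fully connected layer has [in_height, in_width] = [1, 1], [k_h, k_w] = [1, 1], and [out_height, out_width] = [1, 1].

The output is therefore: outputs: [batch_size, units]

Example 8:

Given the input parameters:

inputs = [batch_size, in_channels] = [1, 128]
units = 31
activation = softmax

Output dimensions = [batch_size, units] = [1, 31]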

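3.3 Parameter count

Parameter count = units $\times$ in_channels + number of bias terms, where number of bias terms = units

Example 9:

Given the input parameters (same as Example 8):

inputs = [batch_size, in_channels] = [1, 128]
units = 31
activation = softmax

Parameter count = 31 * 128 + 31 = 3999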
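The fully connected parameter count has one weight per (input, unit) pair and one bias per unit. The sketch below reproduces that arithmetic; `dense_param_count` is a hypothetical helper name, not a Keras function.

```python
def dense_param_count(in_channels, units, use_bias=True):
    """Weights of a fully connected layer: a units x in_channels matrix,
    plus one bias per output unit when use_bias is True."""
    return units * in_channels + (units if use_bias else 0)


# 128 inputs, 31 units
print(dense_param_count(in_channels=128, units=31))
```

The activation function (softmax here) adds no parameters, so it does not appear in the count.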

3.4 Multiplication count

Counting multiplications only, and reusing the convolution formula:

multiply_times = in_channels * k_h * k_w * (out_channels * out_height * out_width)

Example 10:

Given the input parameters (same as Example 8):

inputs=[batch_size, in_channels]=[1, 128]
units = 31
activation = softmax

Multiplication count:

multiply_times = in_channels * k_h * k_w * (out_channels * out_height * out_width) = 128 * 1 * 1 * (31 * 1 * 1) = 3968
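One way to sanity-check this figure is to run the matrix-vector product of a fully connected layer with explicit loops and count the multiplications as they happen. The sketch below does this in plain Python for a single sample; the weight and input values are made up for illustration, and the softmax activation is left out because it is not part of the multiplication count used here.

```python
def dense_forward_counting(x, weights, bias):
    """Forward pass of a fully connected layer for one sample.

    x: list of in_channels floats.
    weights: list of `units` rows, each with in_channels floats.
    bias: list of `units` floats.
    Returns (outputs, number_of_multiplications).
    """
    outputs = []
    multiplications = 0
    for row, b in zip(weights, bias):
        acc = b
        for w, v in zip(row, x):
            acc += w * v
            multiplications += 1
        outputs.append(acc)
    return outputs, multiplications


in_channels, units = 128, 31
x = [1.0] * in_channels                                 # illustrative input
weights = [[0.5] * in_channels for _ in range(units)]   # illustrative weights
bias = [0.0] * units
outputs, multiplications = dense_forward_counting(x, weights, bias)
print(len(outputs), multiplications)
```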

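4. Batch normalization layer

Batch Normalization, abbreviated BatchNorm or BN, is a special kind of layer in neural networks and is now standard in most popular architectures.

4.1 Parameter count

For an input bn_in, the batchnorm layer computes:

bn_out = gamma * (bn_in - mean) / sqrt(variance) + beta

Four parameters are involved: gamma, mean, variance, and beta. (In practice a small epsilon is also added to the variance for numerical stability.)

With a batch size of m, each node in the network produces m outputs during the forward pass. Batch Normalization normalizes these m outputs of each node in the layer before passing them on.
Each of the four parameters is stored once per channel, not once per sample, so the parameter count of a batchnorm layer = 4 $\times$ in_channels. Of these, gamma and beta are trained, while mean and variance are running statistics that Keras reports as non-trainable parameters.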
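The normalization of the m outputs of one node can be written out step by step. The block below is the standard batch-normalization formulation, offered as a sketch: the symbols $x_i$ (the i-th of the m outputs), $\mu_B$, $\sigma_B^2$, and $\epsilon$ (a small stability constant) are notation introduced here, with $\mu_B$ and $\sigma_B^2$ playing the role of mean and variance in the formula above, and $\gamma$, $\beta$ the role of gamma and beta.

```latex
\begin{aligned}
\mu_B &= \frac{1}{m}\sum_{i=1}^{m} x_i \\
\sigma_B^2 &= \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_B\right)^2 \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \\
y_i &= \gamma\,\hat{x}_i + \beta
\end{aligned}
```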

Example 11:

Given inputs=[1, 9, 5, 128]

the output is outputs=[1, 9, 5, 128]

Parameter count = 4 * in_channels = 4 * 128 = 512
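The batchnorm parameter count depends only on the size of the normalized axis, which for an NHWC feature map is the channel axis. The sketch below splits the total into the trained part (gamma, beta) and the running statistics (mean, variance); `batchnorm_param_count` is a hypothetical helper name.

```python
def batchnorm_param_count(input_shape):
    """Return (total, trainable, non_trainable) for a batchnorm layer
    that normalizes over the last (channel) axis of input_shape."""
    channels = input_shape[-1]
    trainable = 2 * channels       # gamma and beta
    non_trainable = 2 * channels   # running mean and running variance
    return trainable + non_trainable, trainable, non_trainable


# Input [1, 9, 5, 128]
print(batchnorm_param_count([1, 9, 5, 128]))
```

Changing the batch size or the spatial size leaves all three numbers unchanged.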

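4.2 Multiplication count

From the batchnorm computation:

bn_out = gamma * (bn_in - mean) / sqrt(variance) + beta

each normalized value costs 2 additions (one subtraction and one addition) and 2 multiplications (one multiplication and one division, with the division counted as a multiplication).

So for each node, multiplications = 2 * batch_size, and the total for the layer is this figure times the number of nodes (out_height $\times$ out_width $\times$ out_channels).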
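The per-value cost can be seen by applying the formula to the values of one channel and counting as we go. This is a minimal inference-style sketch in plain Python: the mean and variance are passed in as fixed numbers, the `epsilon` argument is an assumption added for numerical stability (it does not appear in the formula above), and the function name is hypothetical.

```python
import math


def batchnorm_channel(values, gamma, beta, mean, variance, epsilon=1e-3):
    """Apply bn_out = gamma * (bn_in - mean) / sqrt(variance + epsilon) + beta
    to every value of one channel and count multiplications.

    The division is counted as a multiplication, as in the text.
    """
    outputs = []
    multiplications = 0
    scale_denominator = math.sqrt(variance + epsilon)  # shared by all values
    for v in values:
        normalized = (v - mean) / scale_denominator    # 1 division
        outputs.append(gamma * normalized + beta)      # 1 multiplication
        multiplications += 2
    return outputs, multiplications


# Two values of one channel; epsilon set to 0 so the numbers stay round
outputs, multiplications = batchnorm_channel(
    [1.0, 3.0], gamma=2.0, beta=1.0, mean=2.0, variance=1.0, epsilon=0.0
)
print(outputs, multiplications)
```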

Layers without parameters (InputLayer, ZeroPadding, ReLU, AveragePooling, Reshape)

In practice, the layers with zero parameters include InputLayer, ZeroPadding, ReLU, AveragePooling, and Reshape.
