How to calculate the number of parameters in CNNs?_how to calculate param size in cnn-CSDN博客

本文详细解析了VGGNet中各卷积层及全连接层的参数数量，并给出具体计算方法，展示了整个网络的参数总数如何达到138M。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

How to calculate the number of parameters in CNNs?

This post comes from https://stackoverflow.com/questions/28232235/how-to-calculate-the-number-of-parameters-of-convolutional-neural-networks.

If you refer to VGG Net with 16-layer (table 1, column D) then 138M refers to the total number of parameters of this network, i.e including all convolutional layers, but also the fully connected ones.

Looking at the 3rd convolutional stage composed of 3 x conv3-256 layers:

the first one has N=128 input planes and F=256 output planes,
the two other ones have N=256 input planes and F=256 output planes.

The convolution kernel is 3x3 for each of these layers. In terms of parameters this gives:

128x3x3x256 (weights) + 256 (biases) = 295,168 parameters for the 1st one,
256x3x3x256 (weights) + 256 (biases) = 590,080 parameters for the two other ones.

As explained above you have to do that for all layers, but also the fully-connected ones, and sum these values to obtain the final 138M number.

UPDATE: the breakdown among layers give:

conv3-64  x 2       : 38,720
conv3-128 x 2       : 221,440
conv3-256 x 3       : 1,475,328
conv3-512 x 3       : 5,899,776
conv3-512 x 3       : 7,079,424
fc1                 : 102,764,544
fc2                 : 16,781,312
fc3                 : 4,097,000
TOTAL               : 138,357,544

In particular for the fully-connected layers (fc):

fc1 (x): (512x7x7)x4,096 (weights) + 4,096 (biases)
 fc2    : 4,096x4,096     (weights) + 4,096 (biases)
 fc3    : 4,096x1,000     (weights) + 1,000 (biases)

(x) see section 3.2 of the article: the fully-connected layers are first converted to convolutional layers (the first FC layer to a 7 × 7 conv. layer, the last two FC layers to 1 × 1 conv. layers).

Details about fc1

As precised above the spatial resolution right before feeding the fully-connected layers is 7x7 pixels. This is because this VGG Net uses spatial padding before convolutions, as detailed within section 2.1 of the paper:

[…] the spatial padding of conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers.

With such a padding, and working with a 224x224 pixels input image, the resolution decreases as follow along the layers: 112x112, 56x56, 28x28, 14x14 and 7x7 after the last convolution/pooling stage which has 512 feature maps.

This gives a feature vector passed to fc1 with dimension: 512x7x7.