Why Padding?
The main benefits of padding are the following:
- It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the "same" convolution, in which the height/width is exactly preserved after one layer.
- It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image.
There are two common choices of padding:
- Valid convolutions: no padding, i.e. $p = 0$.
- Same convolutions: pad so that the output size is the same as the input size, which gives $p = \frac{f-1}{2}$ (for stride $= 1$).
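To see the difference concretely, here is a minimal sketch of the standard conv output-size formula, $\lfloor (n + 2p - f)/s \rfloor + 1$; the function name and example sizes are illustrative, not from any particular library.

```python
# Sketch: conv output size for "valid" vs "same" padding.
# n = input size, f = filter size, p = padding, s = stride (square inputs assumed).
def conv_output_size(n, f, p=0, s=1):
    # output dimension = floor((n + 2p - f) / s) + 1
    return (n + 2 * p - f) // s + 1

n, f = 6, 3
valid = conv_output_size(n, f, p=0)            # "valid": 6x6 shrinks to 4x4
same = conv_output_size(n, f, p=(f - 1) // 2)  # "same": p=(f-1)/2 preserves 6x6
print(valid, same)  # 4 6
```

With $p = (f-1)/2$ and stride 1, the $2p$ added pixels exactly cancel the $f-1$ lost to the filter, which is why the height/width is preserved.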
Why Pooling Layer?
Pooling allows features to shift slightly relative to each other, resulting in robust matching of features even in the presence of small distortions. Pooling has other benefits as well:
- It reduces the spatial dimension of the feature map.
- It therefore also reduces the number of parameters higher up the processing hierarchy, which simplifies the overall model.
Sum and max pooling are a bit dated, though; nowadays, strided convolution is mostly used instead. A strided convolution skips some positions during the convolution operation, resulting in:
- A more efficient convolution operation.
- A reduced spatial dimension of the output.
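The dimension-reduction effect can be sketched with a plain-numpy 2x2 max pool; the helper name and the tiny 4x4 input are illustrative assumptions, not any library's API.

```python
import numpy as np

# Sketch: 2x2 max pooling with stride 2 halves each spatial dimension.
def max_pool_2x2(x):
    h, w = x.shape
    # Reshape into non-overlapping 2x2 blocks, then take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
y = max_pool_2x2(x)
print(y.shape)  # (2, 2) -- the 4x4 feature map is reduced to 2x2
```

A strided convolution achieves a similar reduction by computing the convolution only at every s-th position, so no separate pooling pass is needed.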
Why Convolutions?
- "Shared weights": a feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image.
- "Sparsity of the connections": in each layer, each output value depends only on a small number of inputs.
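Both properties show up in a minimal hand-rolled convolution: the same 3x3 kernel (shared weights) is slid over every position, and each output value is computed from a single 3x3 patch (sparse connections). The loop-based helper below is a sketch for illustration, not an efficient implementation.

```python
import numpy as np

# A classic vertical-edge detector: responds where the left side is
# brighter than the right side. The SAME weights are reused everywhere.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

def conv2d_valid(img, k):
    h, w = img.shape
    f = k.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output depends only on one f x f patch of the input.
            out[i, j] = np.sum(img[i:i + f, j:j + f] * k)
    return out

# 6x6 image: bright left half, dark right half -> a strong vertical edge.
img = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])
out = conv2d_valid(img, kernel)
print(out)  # large values only in the columns straddling the edge
```

Because the 9 kernel weights are shared across all positions, the layer has far fewer parameters than a fully connected layer mapping a 36-value input to a 16-value output would.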