Large displacements and changes in scale still require fully connected layers: by learning from many objects at different positions and aspect ratios, the network gains the capacity to recognize objects across locations and scales.
References:
https://www.quora.com/How-is-a-convolutional-neural-network-able-to-learn-invariant-features
https://stats.stackexchange.com/questions/208936/what-is-translation-invariance-in-computer-vision-and-convolutional-netral-netwo
1. Pooling makes the convolution process invariant to small translations, rotations, and shifts. The most widely used form is max-pooling: within each region of interest (the receptive field), only the highest activation is propagated. Even when images are slightly shifted relative to each other, because we keep only the highest activation we can still capture the commonalities between them.
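The first point can be illustrated directly. Below is a minimal sketch (the helper name `max_pool_2x2` is invented here, not from any library) showing that an activation shifted by one pixel, as long as it stays inside the same pooling window, produces an identical pooled map:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on a 2D feature map (H and W even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A feature map with a single strong activation...
a = np.zeros((4, 4))
a[1, 1] = 1.0

# ...and the same activation shifted by one pixel, still within
# the same 2x2 pooling window.
b = np.zeros((4, 4))
b[0, 0] = 1.0

# Both pooled maps place the activation in the same top-left cell:
# the one-pixel shift is absorbed by the pooling.
print(max_pool_2x2(a))
print(max_pool_2x2(b))
```

If the shift crosses a pooling-window boundary, the pooled maps differ — which is why pooling gives only local, not global, translation invariance.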
2. For scale invariance, to my knowledge there is no option other than feeding the network images at different scales, or applying the learned filters at several scales.
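The second point — applying a learned filter at several scales — can be sketched with an image pyramid. This is a toy illustration, not any real detection pipeline; `downscale`, `max_response`, and the 2x2 filter are all made up for the demo:

```python
import numpy as np

def downscale(img, factor):
    """Downscale a 2D image by average pooling (H, W assumed divisible by factor)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def max_response(img, filt):
    """Maximum valid cross-correlation response of a small filter over an image."""
    fh, fw = filt.shape
    h, w = img.shape
    return max(
        float(np.sum(img[i:i + fh, j:j + fw] * filt))
        for i in range(h - fh + 1)
        for j in range(w - fw + 1)
    )

# A hypothetical learned 2x2 filter, slid over each level of a small pyramid.
filt = np.array([[1.0, -1.0], [-1.0, 1.0]])
img = np.random.default_rng(0).standard_normal((8, 8))

# The scale whose pyramid level yields the strongest response is the
# scale at which the filter "matches" the image content best.
responses = {s: max_response(downscale(img, s), filt) for s in (1, 2, 4)}
best_scale = max(responses, key=responses.get)
```

In a real network the same idea appears as multi-scale testing or image pyramids at inference time.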
3. Other forms of invariance are built in artificially by rotating, mirroring, and scaling the training examples, because seeing the training set from different points of view helps the network generalize better.
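The third point is plain data augmentation. A minimal NumPy sketch, where `augment` is a hypothetical helper that generates rotated, mirrored, and upscaled copies of one training image:

```python
import numpy as np

def augment(img):
    """Generate artificially transformed training copies of a 2D image:
    the original, three rotations, two mirrors, and a 2x upscale."""
    views = [img]
    for k in (1, 2, 3):                          # 90/180/270-degree rotations
        views.append(np.rot90(img, k))
    views.append(np.fliplr(img))                 # horizontal mirror
    views.append(np.flipud(img))                 # vertical mirror
    views.append(np.kron(img, np.ones((2, 2))))  # nearest-neighbour 2x upscale
    return views

img = np.arange(9, dtype=float).reshape(3, 3)
views = augment(img)
print(len(views))  # 7 training views generated from one example
```

Each transformed copy is fed to the network as an extra training example, so the learned filters see the same content from several "points of view".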