What a Max-Pooling Layer Is For

长飞哥

Article link: https://blog.csdn.net/tigerda/article/details/78800552

● How Theano explains max-pooling

Max-pooling is useful in vision for two reasons:

1、By eliminating non-maximal values, it reduces computation for upper layers.

2、It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.
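To make point 1 concrete, here is a minimal pure-Python sketch of non-overlapping max-pooling (window size equal to the stride). The function name `max_pool2d` is hypothetical and is not Theano's API. A 2x2 window keeps one value out of every four, so a 4x4 feature map shrinks to 2x2 before it reaches the next layer.

```python
def max_pool2d(image, k):
    """Non-overlapping k x k max-pooling (stride k) over a list-of-lists image.

    Assumes the height and width are multiples of k (no padding).
    """
    h, w = len(image), len(image[0])
    assert h % k == 0 and w % k == 0, "dimensions must be multiples of k"
    return [
        [
            # Keep only the largest value in each k x k window.
            max(image[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(0, w, k)
        ]
        for r in range(0, h, k)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled = max_pool2d(feature_map, 2)
print(pooled)  # [[4, 5], [6, 7]]
print(len(feature_map) * len(feature_map[0]), "->", len(pooled) * len(pooled[0]))  # 16 -> 4
```

The upper layer now sees 4 values instead of 16, which is the computation saving the quote refers to.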

Many posts online all translate this the same way:

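Max-pooling is used in vision problems for two reasons:

(1) By eliminating non-maximal values, it reduces the computational complexity of the upper layers.

(2) It provides a form of translation invariance. To understand this invariance, imagine cascading a max-pooling layer with a convolutional layer. A single pixel can be translated in 8 directions (up, down, left, right, upper-left, lower-left, upper-right, lower-right). If max-pooling is done over a 2x2 window, 3 of these 8 possible configurations produce exactly the same result as the convolutional layer. If the window becomes 3x3, the probability of producing exactly the same result becomes 5/8.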

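No explanation is given, and the translation alone is hard to follow. Moreover, the 5/8 figure is probably wrong. For an explanation, see: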

https://www.quora.com/How-can-I-understand-this-point-about-max-pooling-in-Theano

Samir’s answer is great, and confirms my suspicions that the authors are incorrect. Here’s my logic as to a more complete answer for the 3x3 case.

Let’s say we have a 3x3 grid with the maximum pixel in the middle. For simplicity, set e=1 and all other values to 0.

a b c
d e f
g h i

The image can be translated by 1 pixel in 8 different directions: up, down, left, right, and the diagonals. If we assume the 2x2 max pooling box contains the pixels {a, b, d, e}, then 3 of the 8 possible translations will keep the max pixel in the 2x2 box, hence the 3/8. In fact, no matter where this 2x2 pooling box is, the max value will always be in a corner of the box, so 3/8 of the translations will keep it in the box.

In 2×2 pooling, take for example the window

a b
d e

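If a is the largest of these four values, then whether it moves right, down, or down-right, it stays inside the window and the pooled result for this region is unchanged. The other 5 shifts (up, left, up-left, up-right, down-left) move it out of the window. Hence 3/8.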
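The counting argument can be checked by actually shifting an image and pooling it again. The sketch below is pure Python and the helper names are hypothetical. It follows the Quora answer's setup (a single pixel of value 1 on a zero background), places that pixel at every position inside one k x k pooling window, shifts it one pixel in each of the 8 directions, and counts how often the whole pooled output is exactly unchanged.

```python
from fractions import Fraction

# The 8 one-pixel shifts: up, down, left, right and the four diagonals.
SHIFTS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def max_pool2d(image, k):
    """Non-overlapping k x k max-pooling with stride k."""
    return [
        [max(image[r + i][c + j] for i in range(k) for j in range(k))
         for c in range(0, len(image[0]), k)]
        for r in range(0, len(image), k)
    ]


def single_pixel_image(size, row, col):
    """A size x size image of zeros with a single 1 (the maximum) at (row, col)."""
    image = [[0] * size for _ in range(size)]
    image[row][col] = 1
    return image


def invariance_probability(k):
    """Fraction of (position, shift) pairs that leave the pooled output unchanged.

    The max pixel is placed at every position of the centre k x k window of a
    3k x 3k image, so a one-pixel shift never pushes it off the image.
    """
    size = 3 * k
    unchanged = total = 0
    for r in range(k):
        for c in range(k):
            row, col = k + r, k + c
            before = max_pool2d(single_pixel_image(size, row, col), k)
            for dr, dc in SHIFTS:
                after = max_pool2d(single_pixel_image(size, row + dr, col + dc), k)
                if after == before:
                    unchanged += 1
                total += 1
    return Fraction(unchanged, total)


print(invariance_probability(2))  # 3/8
```

Calling `invariance_probability(3)` with the same code covers the 3x3 case discussed next.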

However, what happens when we have a 3x3 max pooling box? Well, there’s different places this max value can be found relative to the 3x3 box. If it’s found in the center, what Samir said applies, and any of the 8 translations will keep the max value in the same pooling box.

However, if it’s found in one of the corners of the 3x3 box (where a, c, g, and i are), then 3/8 of the translations will keep it in the same box. If it’s one of the edges (b, d, f, h), 5/8 of the translations will.

So if we assume that the positioning of the 3x3 box is totally random with respect to the max pixel, which it should, then the probability of the max pixel staying in the same box after translation should be P(same box | center_max) * P(center_max) + P(same box | center_corner) * P(center_corner) + P(same box | center_edge) * P(center_edge) = 1*(1/9) + (3/8)*(4/9) + (5/8)*(4/9) = 5/9.
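The same total-probability calculation, written out as a worked equation. The labels "centre", "corner" and "edge" are just names for where the maximum sits inside the 3x3 box, each position being equally likely:

```latex
\begin{aligned}
P(\text{same box})
  &= P(\text{same}\mid\text{centre})\,P(\text{centre})
   + P(\text{same}\mid\text{corner})\,P(\text{corner})
   + P(\text{same}\mid\text{edge})\,P(\text{edge}) \\
  &= 1\cdot\tfrac{1}{9} + \tfrac{3}{8}\cdot\tfrac{4}{9} + \tfrac{5}{8}\cdot\tfrac{4}{9}
   = \tfrac{8 + 12 + 20}{72} = \tfrac{40}{72} = \tfrac{5}{9}.
\end{aligned}
```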

The 3×3 case is different. For example:

a b c
d e f
g h i

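When the maximum is at a, c, g, or i (4 of the 9 positions), only 3 of the 8 shift directions keep it inside the window, giving 3/8. When it is at b, d, f, or h (also 4 of the 9 positions), 5 directions keep it inside, giving 5/8. When it is at e (1 of the 9 positions), all 8 directions keep it inside, giving 8/8. Weighting each case by the number of positions it covers gives 1·(1/9) + (3/8)·(4/9) + (5/8)·(4/9) = 5/9, not 5/8.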
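The per-position counts above can be tabulated with a short enumeration. This is a sketch and `stay_counts` / `stay_probability` are hypothetical helper names. For each position in a k x k box it counts the one-pixel shifts that keep the maximum inside the box, which reproduces the 3/5/8 pattern for k = 3, then averages over the equally likely positions.

```python
from fractions import Fraction

# The 8 one-pixel shifts: up, down, left, right and the four diagonals.
SHIFTS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def stay_counts(k):
    """For each (row, col) in a k x k box, count the shifts that stay inside it."""
    return [
        [sum(0 <= r + dr < k and 0 <= c + dc < k for dr, dc in SHIFTS)
         for c in range(k)]
        for r in range(k)
    ]


def stay_probability(k):
    """Average of count / 8 over the k * k equally likely positions of the maximum."""
    counts = stay_counts(k)
    return Fraction(sum(map(sum, counts)), 8 * k * k)


for row in stay_counts(3):
    print(row)
# [3, 5, 3]
# [5, 8, 5]
# [3, 5, 3]
print(stay_probability(3))  # 5/9
print(stay_probability(2))  # 3/8
```

For k = 2 every position is a corner, so the grid is all 3s and the average is 3/8, matching the 2×2 case.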