Generalized Mean (GeM)

__init__:

Last modified: 2022-12-01 22:32:47

First published: 2022-12-01 14:45:27

Article link: https://blog.csdn.net/qq_45802280/article/details/128131943

This operation comes from a paper on image retrieval. If you are interested, see: MultiGrain: a unified image embedding for classes and instances

The paper uses a generalized mean pooling (GeM) layer, defined as follows:
$e=\left[ \left( \frac{1}{|\Omega|} \sum_{u \in \Omega} { x_{cu}^p } \right)^{\frac{1}{p}} \right]_{c=1\dots C}$
Here $x\in \mathbb{R}^{C \times H \times W}$ is the feature tensor extracted by the network, $C$ is the number of feature channels, and $u\in \Omega = \{1, \dots, H\} \times \{1, \dots, W\}$ indexes one "pixel" (spatial location) of the feature map.
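
To make the formula concrete, here is a minimal NumPy sketch of GeM pooling over a single $C \times H \times W$ feature map. The function name `gem_pool`, the default `p=3.0`, and the `eps` clamp are illustrative assumptions and are not taken from the paper. The clamp assumes non-negative activations (for example, after a ReLU), so that $x^p$ is well defined for non-integer $p$.

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized mean pooling over the spatial axes of a (C, H, W) array.

    p and eps are illustrative defaults (assumptions, not values from the paper).
    """
    # Assumption: activations are non-negative; clamp to eps so that
    # x ** p is defined for fractional p and never exactly zero.
    x = np.clip(x, eps, None)
    # Average x^p over all spatial positions u in Omega (axes H and W),
    # then take the 1/p-th root, one value per channel c.
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

rng = np.random.default_rng(0)
features = rng.random((4, 7, 7))    # C=4, H=W=7, values in [0, 1)
pooled = gem_pool(features, p=3.0)
print(pooled.shape)                 # (4,)
```

For $p > 1$ each pooled value lies between the channel's average and its maximum, which is one way to read the "increased contrast" described in the paper.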

GeM computes the generalized mean of each feature channel. The paper notes: "Setting this exponent as p > 1 increases the contrast of the pooled feature map and focuses on the salient features of the image."

To see this concretely, look at the form of the expression. First, setting $p = 1$ gives
$e=\left[ \frac{1}{|\Omega|} \sum_{u \in \Omega} { x_{cu}} \right]_{c=1\dots C}$
which is exactly average pooling.

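Next, let $p \to \infty$. For each channel $c$, let $t_c = \max_{u \in \Omega} x_{cu}$ be the largest activation in that channel, and assume $x_{cu} \ge 0$ with $t_c > 0$ (as is the case for post-ReLU features). Then
$\begin{aligned} e &= \lim_{p \to \infty} { \left[ \left(\frac{1}{|\Omega|} \sum_{u \in \Omega} { x_{cu}^p } \right)^{\frac{1}{p}}\right]_{c=1\dots C} } \\ &= \left[ t_c \lim_{p \to \infty} \left(\frac{1}{|\Omega|} \sum_{u \in \Omega} { \left( \frac{x_{cu}}{t_c} \right)^p } \right)^{\frac{1}{p}}\right]_{c=1\dots C} \\ &= \left[ t_c \right]_{c=1\dots C} \end{aligned}$
The last step holds because every ratio $x_{cu}/t_c$ lies in $[0, 1]$ and at least one of them equals 1, so the averaged sum stays in $\left[\frac{1}{|\Omega|}, 1\right]$ and its $\frac{1}{p}$-th power tends to 1.
In this limit GeM becomes max pooling.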
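
The two limiting cases can be checked numerically. The sketch below is illustrative only: `power_mean` and the sample values in `channel` are made up for this example. It evaluates the generalized mean of one channel for increasing $p$, factoring out the channel maximum so that large exponents do not overflow.

```python
import numpy as np

def power_mean(values, p):
    """Generalized mean of positive values with exponent p.

    The maximum t is factored out first, so every ratio is in (0, 1]
    and ratios ** p cannot overflow even for very large p.
    """
    values = np.asarray(values, dtype=np.float64)
    t = values.max()
    ratios = values / t
    return t * np.mean(ratios ** p) ** (1.0 / p)

# Hypothetical activations of a single channel, flattened over H x W.
channel = np.array([0.2, 0.5, 1.0, 3.0])

results = {p: power_mean(channel, p) for p in (1, 2, 10, 100, 1000)}
for p, value in results.items():
    print(f"p={p:<5d} GeM={value:.4f}")

print("mean:", channel.mean(), "max:", channel.max())
```

At $p = 1$ the result matches the arithmetic mean, and as $p$ grows the values increase toward the channel maximum.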