Log-Sum-Exp Pooling
Papers
- From Image-level to Pixel-level Labeling with Convolutional Networks
- ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
LSE Pooling
Before reading these two papers, the pooling operations I knew were Max Pooling and Average Pooling. Both papers instead use Log-Sum-Exp (LSE) Pooling, defined as:
$$x_p=\frac{1}{r}\cdot\log\left[\frac{1}{S}\cdot\sum_{(i,j)\in\mathbf{S}}\exp(r\cdot x_{ij})\right]$$
Here $x_{ij}$ is the activation at position $(i,j)$, $(i,j)$ is a point in the pooling region $\mathbf{S}$, $S=s\times s$ is the total number of points in $\mathbf{S}$, and $r$ is a hyperparameter.
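As a minimal sketch of this definition, the NumPy snippet below applies LSE pooling over non-overlapping $s\times s$ regions of a single 2D feature map. The function name `lse_pool2d`, the non-overlapping layout, and the toy input are assumptions made for illustration; this is not the implementation used in either paper.

```python
import numpy as np

def lse_pool2d(feature_map, s, r):
    """LSE-pool a 2D map over non-overlapping s x s regions (hypothetical helper)."""
    h, w = feature_map.shape
    assert h % s == 0 and w % s == 0, "map size must be divisible by s"
    # Reshape so that each s x s region gets its own pair of axes (1 and 3).
    regions = feature_map.reshape(h // s, s, w // s, s)
    # (1/r) * log( (1/S) * sum exp(r * x_ij) ), with S = s * s points per region.
    return np.log(np.exp(r * regions).mean(axis=(1, 3))) / r

x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4 x 4 feature map
pooled = lse_pool2d(x, s=2, r=1.0)            # one value per 2 x 2 region
print(pooled)
```

Each pooled value should land between the mean and the maximum of its region. This direct form can overflow in `np.exp` when $r\cdot x_{ij}$ is large, so it is only suitable for moderate values.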
In the first paper, the authors describe the effect of LSE Pooling as follows:
The hyper-parameter r controls how smooth one wants the approximation to be: high r values implies having an effect similar to the max, very low values will have an effect similar to the score averaging. The advantage of this aggregation is that pixels having similar scores will have a similar weight in the training procedure, r controlling this notion of “similarity”.
In the second paper, the authors describe the effect of LSE Pooling as follows:
By controlling the hyper-parameter, r, the pooled value ranges from the maximum in S (when $r\to\infty$) to average ($r\to0$).
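The two limits described in these quotes can be checked numerically. The sketch below evaluates the pooled value of a small, arbitrarily chosen vector for several values of $r$; the helper name `lse` and the sample values are assumptions for illustration only.

```python
import numpy as np

def lse(x, r):
    # (1/r) * log( mean( exp(r * x) ) ) over all entries of x
    return np.log(np.mean(np.exp(r * x))) / r

x = np.array([0.1, 0.5, 2.0, 3.0])  # arbitrary activations in one region
for r in (0.001, 1.0, 10.0, 100.0):
    print(f"r={r:>7}: {lse(x, r):.4f}")
print("mean:", x.mean(), "max:", x.max())
```

For small $r$ the printed value should sit close to the mean, and for large $r$ close to the maximum, with the values increasing in between.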
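The figure below gives an intuitive picture:
Mathematical Proof
As a rigorous college student, I am certainly not going to stop at intuition, so let's do the math!
Before the proof, it helps to simplify the formula a little by indexing the $n$ activations in the region as $x_1,\dots,x_n$:
$$x_p=\frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right]$$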
Proof that $r\to0$ is equivalent to Average Pooling
First, we need the inequality of arithmetic and geometric means (AM-GM). For nonnegative $a_1,\dots,a_n$:
$$\frac{a_1+a_2+\dots+a_n}{n}\ge\sqrt[n]{a_1\cdot a_2\cdots a_n}$$
with equality if and only if $a_1=a_2=\dots=a_n$.
$$
\begin{aligned}
x_p &= \frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right] \\
&= \log\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}}
\end{aligned}
$$
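Applying the AM-GM inequality (with $r>0$, so that raising both sides to the power $\frac{1}{r}$ preserves the inequality):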
$$
\begin{aligned}
\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} &\ge \left(\prod_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{n}\cdot\frac{1}{r}} \\
&= \left(\prod_{i=1}^{n}e^{x_i}\right)^{\frac{1}{n}}
\end{aligned}
$$
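As $r\to0$, every $e^{r\cdot x_i}$ tends to 1, so the terms become equal and the inequality approaches equality. Substituting into the full expression: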
$$
\begin{aligned}
x_p &= \log\left(\frac{1}{n}\cdot\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} \\
&\ge \log\left(\prod_{i=1}^{n}e^{x_i}\right)^{\frac{1}{n}} \\
&= \frac{1}{n}\sum_{i=1}^{n}x_i
\end{aligned}
$$
Hence $r\to0$ is equivalent to Average Pooling, which completes the proof.
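The AM-GM argument gives a lower bound on $x_p$. One way to see that the bound is actually reached in the limit is a first-order Taylor expansion. This step is not part of the original argument; it is a sketch that introduces $\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$ as shorthand for the average and uses $e^{t}=1+t+O(t^2)$ and $\log(1+t)=t+O(t^2)$ for small $t$:

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} e^{r\cdot x_i}
  &= \frac{1}{n}\sum_{i=1}^{n}\left(1 + r\cdot x_i + O(r^2)\right)
   = 1 + r\cdot\bar{x} + O(r^2), \\
x_p &= \frac{1}{r}\cdot\log\left(1 + r\cdot\bar{x} + O(r^2)\right)
   = \frac{1}{r}\cdot\left(r\cdot\bar{x} + O(r^2)\right)
   = \bar{x} + O(r) \xrightarrow{\; r\to 0 \;} \bar{x}.
\end{aligned}
```

So the gap between $x_p$ and the average shrinks linearly in $r$, which matches the lower bound $x_p\ge\bar{x}$ obtained from AM-GM.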
Proof that $r\to\infty$ is equivalent to Max Pooling
$$
\begin{aligned}
x_p &= \frac{1}{r}\cdot\log\left[\frac{1}{n}\cdot\sum_{i=1}^{n}\exp(r\cdot x_i)\right] \\
&= \log\left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} - \frac{1}{r}\cdot\log(n)
\end{aligned}
$$
Since $r>0$, we have:
$$
\max(e^{r\cdot x_i})^{\frac{1}{r}} \le \left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} \le \left[n\cdot\max(e^{r\cdot x_i})\right]^{\frac{1}{r}}
$$
Taking the logarithm and substituting into the full expression gives:
$$
\max(x_i) \le \log\left(\sum_{i=1}^{n}e^{r\cdot x_i}\right)^{\frac{1}{r}} \le \frac{1}{r}\cdot\log(n)+\max(x_i)
$$
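As $r\to\infty$ we have $\frac{1}{r}\cdot\log(n)\to0$, so the upper bound tends to $\max(x_i)$ and the term $\frac{1}{r}\cdot\log(n)$ subtracted in $x_p$ vanishes as well. Hence $r\to\infty$ is equivalent to Max Pooling, which completes the proof.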
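The bounds in this proof imply $\max(x_i)-\frac{1}{r}\cdot\log(n)\le x_p\le\max(x_i)$, which can be checked numerically. Evaluating $\exp(r\cdot x_i)$ directly overflows for large $r$, so the sketch below factors the maximum out of the sum first. This is a standard trick for log-sum-exp expressions and is not taken from the original post; the helper name `lse_stable` and the sample values are assumptions for illustration.

```python
import numpy as np

def lse_stable(x, r):
    """LSE pooling with the max factored out so exp() cannot overflow (r > 0)."""
    m = np.max(x)
    # Every exponent r * (x - m) is <= 0, so each exp() term is at most 1.
    return m + np.log(np.mean(np.exp(r * (x - m)))) / r

x = np.array([0.1, 0.5, 2.0, 3.0])  # arbitrary activations in one region
n = x.size
for r in (1.0, 10.0, 1e3, 1e6):
    xp = lse_stable(x, r)
    lower = x.max() - np.log(n) / r
    # Sandwich from the proof: max - log(n)/r <= x_p <= max
    assert lower - 1e-12 <= xp <= x.max() + 1e-12
    print(f"r={r:g}: x_p={xp:.6f}, gap to max={x.max() - xp:.2e}")
```

The shifted form is algebraically identical to the definition, since $e^{r\cdot m}$ can be pulled out of the sum and becomes $m$ after the logarithm and the division by $r$. The printed gap to the maximum should shrink as $r$ grows and never exceed $\frac{1}{r}\cdot\log(n)$.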