Issue 1: How to understand the scale factor $\alpha$
In your paper, you propose a scale factor $\alpha$ that replaces the batch-calculated scaling parameters of the original Batch Normalization. I have two questions about the use of $\alpha$.

In your paper, $\alpha$ is calculated by the functions below:
$$\alpha = \max(\mathrm{Shift}(L_{min}/L),\ 1) \tag{1}$$
$$\mathrm{Shift}(x) = 2^{\mathrm{round}(\log_{2} x)} \tag{2}$$
$$L_{min} = \beta \sigma \tag{3}$$
where $\beta > 1$ and $\sigma(k) = 2^{1-k},\ k \in \mathbb{N}_{+}$
$$L = \max\!\left(\sqrt{6/n_{in}},\ L_{min}\right) \tag{4}$$
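To make my reading of Eqs. (1)–(4) concrete, here is a minimal Python sketch of how I currently understand the computation of $\alpha$; the function names (`shift`, `compute_alpha`) and the default $\beta = 2$ are my own assumptions, not taken from the paper:

```python
import math

def shift(x):
    """Eq. (2): Shift(x) = 2^round(log2(x))."""
    return 2.0 ** round(math.log2(x))

def compute_alpha(n_in, k, beta=2.0):
    """My reading of Eqs. (1), (3), (4); beta > 1 and the bit-width k are hyperparameters."""
    l_min = beta * 2.0 ** (1 - k)           # Eq. (3): L_min = beta * sigma(k), sigma(k) = 2^(1-k)
    l = max(math.sqrt(6.0 / n_in), l_min)   # Eq. (4): L = max(sqrt(6 / n_in), L_min)
    return max(shift(l_min / l), 1.0)       # Eq. (1): alpha = max(Shift(L_min / L), 1)
```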
Since $L \geq L_{min}$ by (4), we can get that

$$L_{min}/L \leq 1 \tag{5}$$
and hence $\mathrm{round}(\log_{2}(L_{min}/L)) \leq 0$, which gives

$$\mathrm{Shift}(L_{min}/L) \leq 1 \tag{6}$$
so

$$\alpha = \max(\mathrm{Shift}(L_{min}/L),\ 1) \equiv 1 \tag{7}$$
Obviously, $\alpha$ should not always equal 1; otherwise the quantized activation

$$a_{q} = Q_{A}(a) = Q(a/\alpha,\ k_{A}) \tag{8}$$

would never actually be scaled.
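As a quick numerical check of the chain (5)–(7), sweeping the sketch above over a few illustrative fan-ins and bit-widths (values chosen by me, not from the paper) always yields $\alpha = 1$:

```python
# Illustrative sweep; every combination gives alpha == 1 under the reading above.
for n_in in (64, 256, 1024, 4096):
    for k in (2, 4, 8):
        assert compute_alpha(n_in, k) == 1.0
print("alpha == 1 for every case tried")
```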
According to (3), (4), and (7), $\alpha$ does not depend on the current batch of data, so it is not straightforward to understand why $\alpha$ can take the place of the variance, which is highly dependent on the current batch.
Issue 2: Why can the mean of the activations be hypothesized to be 0
It’s written in the paper that:
Besides, we hypothesize that batch outputs of each hidden layer approximately have zero-mean, then …
But it seems that there is no further explanation of this hypothesis.
Issue 3: How to shift the curve
Clearly, Shift(·) can change the mean of the blue curve, but it is not straightforward why the red curve keeps exactly the same shape as the blue curve.