10.16meeting

Latest recommended article published on 2021-08-15 21:08:43

许多天的rua


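Category column: Quantization

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Reprints must include a link to the original source and this notice.

Article link: https://blog.csdn.net/qq_18053809/article/details/102595459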


1:NormProp

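Question 1:

"The analysis supporting the proposed algorithm can only be applied to the input layer of a network. The canonical error bound (Proposition 1) presumes that the input features are zero mean and have a scaled identity covariance matrix. It isn't at all clear that the inputs to later layers, which will be vectors of random variables having a scaled and shifted rectified Gaussian distribution, will have the proper covariance for the analysis to hold." In other words: the input to the whole network has zero mean and unit variance, but do the inputs to the later layers satisfy the same condition?
[Image: formula from the paper, not preserved]
According to the formula above, the two constant coefficients are the mean and variance of a half-wave rectified $N(0,1)$ (setting aside whether they are correct). How can we guarantee that $\mathrm{ReLU}\left(\frac{\gamma_i (W_i x)}{\|W_i\|_F} + \beta_i\right)$ is itself a rectified Gaussian?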
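The question can be probed numerically. The sketch below is not the paper's code: it assumes a single NormProp-style layer with hypothetical random weights `W`, feeds it inputs with identity covariance, and then inspects the statistics of what the next layer would receive. The centering and scaling constants are the closed-form mean and standard deviation of $\mathrm{ReLU}(z)$ for $z \sim N(0,1)$, which is an assumption about what the lost formula contained.

```python
import numpy as np

# Assumed constants: mean and std of ReLU(z) for z ~ N(0, 1).
RELU_MEAN = np.sqrt(1.0 / (2.0 * np.pi))
RELU_STD = np.sqrt(0.5 * (1.0 - 1.0 / np.pi))


def normprop_layer(x, W, gamma=1.0, beta=0.0):
    """One NormProp-style layer: row-normalized weights, ReLU, re-centering."""
    # Dividing each row W_i by its L2 norm gives a unit-variance
    # pre-activation when x has identity covariance.
    pre = gamma * (x @ W.T) / np.linalg.norm(W, axis=1) + beta
    return (np.maximum(pre, 0.0) - RELU_MEAN) / RELU_STD


rng = np.random.default_rng(0)
x = rng.standard_normal((100_000, 16))  # zero mean, identity covariance
W = rng.standard_normal((8, 16))        # hypothetical random weights

h = normprop_layer(x, W)                # what the second layer would see
cov = np.cov(h, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))

mean_abs = float(np.abs(h.mean(axis=0)).max())      # per-unit mean error
var_err = float(np.abs(np.diag(cov) - 1.0).max())   # per-unit variance error
max_off_diag = float(np.abs(off_diag).max())        # cross-unit covariance

# With beta != 0 the pre-activation is a shifted Gaussian, so the fixed
# constants no longer center the output.
shifted_mean = float(normprop_layer(x, W, gamma=1.0, beta=0.5).mean())

print(mean_abs, var_err, max_off_diag, shifted_mean)
```

In this toy setting each unit comes out with roughly zero mean and unit variance, but the off-diagonal covariance is clearly nonzero because the rows of `W` are not orthogonal, and a nonzero `beta` shifts the mean. That is consistent with the reviewer's concern, although a random-weight simulation says nothing about trained networks.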

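Question 2:

The mean and variance of the rectified Gaussian given in the paper appear to be wrong.
$E(WX) =$
[Image: derivation, not preserved]
$E((WX)^2) =$

This contradicts the result in the paper.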
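Since the derivation images are lost, a Monte Carlo estimate is one way to check whichever values are in dispute. The sketch below assumes the pre-activation is exactly $z \sim N(0,1)$ and compares sample moments of $\mathrm{ReLU}(z)$ with the standard closed forms $E[\mathrm{ReLU}(z)] = \sqrt{1/(2\pi)}$, $E[\mathrm{ReLU}(z)^2] = 1/2$ and $\mathrm{Var}[\mathrm{ReLU}(z)] = \frac{1}{2}\left(1 - \frac{1}{\pi}\right)$. If the pre-activation is not standard normal (for example, if the weights are not normalized), these values do not apply as-is.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # assumed pre-activation, z ~ N(0, 1)
r = np.maximum(z, 0.0)              # rectified (half-wave) Gaussian samples

# Empirical moments of ReLU(z).
emp_mean = float(r.mean())
emp_second = float((r ** 2).mean())
emp_var = float(r.var())

# Closed-form moments of ReLU(z) for z ~ N(0, 1).
closed_mean = float(np.sqrt(1.0 / (2.0 * np.pi)))
closed_second = 0.5
closed_var = 0.5 * (1.0 - 1.0 / np.pi)

print(emp_mean, closed_mean)
print(emp_second, closed_second)
print(emp_var, closed_var)
```

A common source of disagreement in this kind of derivation is mixing up the second moment $E[\mathrm{ReLU}(z)^2]$ with the variance, or the variance with the standard deviation, so it may be worth comparing all three.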

2:Online Norm

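Increasing the batch size reduces the error in the gradients

[Image: figure, not preserved]
The baseline in the figure above is the gradient computed with all of the data in a single batch. Bias is measured by cosine similarity to that gradient.
The key to raising the similarity is a better estimate of the mean and variance of the whole dataset.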
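The measurement itself is easy to sketch. The example below is only an illustration of the protocol: it uses a plain least-squares model on synthetic data (both assumptions, not the paper's setup) and has no normalization layer, so it shows how mini-batch gradients drift away from the full-batch gradient at small batch sizes, not the specific bias introduced by batch statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4096, 20
X = rng.standard_normal((n, d))              # synthetic inputs
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)
w = np.zeros(d)                              # point where gradients are compared


def grad(Xb, yb, w):
    """Gradient of 0.5 * mean((Xb @ w - yb) ** 2) with respect to w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)


def cosine(a, b):
    """Cosine similarity between two gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


full = grad(X, y, w)                         # baseline: all data in one batch
mean_cos = {}
for bs in (4, 32, 256):
    sims = []
    for _ in range(200):                     # average over random mini-batches
        idx = rng.choice(n, size=bs, replace=False)
        sims.append(cosine(grad(X[idx], y[idx], w), full))
    mean_cos[bs] = float(np.mean(sims))

print(mean_cos)
```

The average cosine similarity rises with the batch size, which is the same trend the notes describe for the normalized network.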

Layer scaling reduces error propagation

[Image: figure, not preserved]
Assume an error $\varepsilon$. With layer scaling, the error that propagates is smaller.
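One way to see the effect is with a deliberately simple error model. The sketch below assumes that "layer scaling" means dividing each sample by the root mean square of its features, and that the error is a pure multiplicative factor $(1 + \varepsilon)$ on every layer's output. Both are assumptions made for illustration; the lost figure may have used a different setup, and a pure scale error is the most favorable case for this kind of scaling.

```python
import numpy as np


def layer_scale(x, eps=1e-5):
    """Divide each sample by the root mean square of its features (assumed form)."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms


def forward(x, weights, scale_error, use_layer_scaling):
    """ReLU stack whose every layer output is off by a factor (1 + scale_error)."""
    h = x
    for W in weights:
        h = np.maximum(h @ W, 0.0) * (1.0 + scale_error)
        if use_layer_scaling:
            h = layer_scale(h)
    return h


rng = np.random.default_rng(0)
depth, width = 10, 64
weights = [rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
           for _ in range(depth)]
x = rng.standard_normal((32, width))


def relative_error(use_layer_scaling, scale_error=0.05):
    """Relative output difference between the perturbed and the clean network."""
    clean = forward(x, weights, 0.0, use_layer_scaling)
    noisy = forward(x, weights, scale_error, use_layer_scaling)
    return float(np.linalg.norm(noisy - clean) / np.linalg.norm(clean))


err_plain = relative_error(False)   # error compounds as (1 + eps) ** depth - 1
err_scaled = relative_error(True)   # each layer removes the scale error again

print(err_plain, err_scaled)
```

Without scaling the per-layer error compounds geometrically with depth, because ReLU passes a positive scale factor straight through. With per-layer scaling the factor is divided out before it reaches the next layer.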

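3: Summary across the papers

My hunch is that we are missing a division by something.

Online BN adds layer scaling
Weight Norm divides by the L2 norm of the weight
NormProp divides by the L2 norm of the weight
Weight Standardization normalizes the weight, then adds BN or GN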
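The three weight-side operations in this list can be written down side by side. This is a minimal NumPy sketch under stated assumptions: weights are 2-D with one row per output unit, the function names are hypothetical, and the BN or GN step that follows Weight Standardization is omitted. Layer scaling is an activation-side operation and is not included here.

```python
import numpy as np


def weight_norm(v, g):
    """Weight Normalization: w_i = g_i * v_i / ||v_i||_2 for each output row i."""
    return g[:, None] * v / np.linalg.norm(v, axis=1, keepdims=True)


def normprop_preactivation(x, W):
    """NormProp-style pre-activation: W_i x / ||W_i||_2 for each output row i."""
    return x @ (W / np.linalg.norm(W, axis=1, keepdims=True)).T


def weight_standardize(W, eps=1e-5):
    """Weight Standardization: zero mean and unit variance per output row."""
    mean = W.mean(axis=1, keepdims=True)
    std = W.std(axis=1, keepdims=True)
    return (W - mean) / (std + eps)


rng = np.random.default_rng(0)
V = rng.standard_normal((4, 10)) * 3.0   # hypothetical raw weights
g = np.array([0.5, 1.0, 2.0, 4.0])       # hypothetical learned gains
x = rng.standard_normal((5, 10))

W_wn = weight_norm(V, g)                 # row norms equal g
W_ws = weight_standardize(V)             # rows have mean 0, std close to 1
pre = normprop_preactivation(x, V)       # unchanged if V is rescaled

print(np.linalg.norm(W_wn, axis=1))
print(W_ws.mean(axis=1), W_ws.std(axis=1))
print(pre.shape)
```

All three remove the scale of the raw weight by a division, which is what makes the "missing division" hunch worth checking against our own implementation.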

4: Observing the actual data distributions
