Paper Reading [TPAMI-2022] A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Paper search (studyai.com)

Search for the paper: A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Search link: http://www.studyai.com/search/whole-site/?q=A+Comprehensive+and+Modularized+Statistical+Framework+for+Gradient+Norm+Equality+in+Deep+Neural+Networks

Keywords

Jacobian matrices; Explosions; Measurement; Biological neural networks; Probability; Libraries; Deep neural networks; free probability; gradient norm equality

Machine learning

Weight normalization; feature normalization

Abstract

The rapid development of deep neural networks (DNNs) in recent years can be attributed to the various techniques that address gradient explosion and vanishing.

To understand the principles behind these techniques and to develop new methods, many metrics have been proposed to identify networks that are free of gradient explosion and vanishing.

However, due to the diversity of network components and the complex serial-parallel hybrid connections in modern DNNs, evaluating existing metrics usually requires strong assumptions or complex statistical analysis, or applies only to limited settings, which constrains their spread in the community.

In this paper, inspired by Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in individual blocks.
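The idea of measuring how a single block changes the gradient norm can be illustrated with a minimal sketch. It is not the paper's definition of Block Dynamical Isometry, which the abstract does not spell out. As an assumption, the sketch uses the normalized trace tr(J Jᵀ)/n of one block's input-output Jacobian J as a stand-in for the block's average squared gradient gain, and the block is a hypothetical dense layer followed by ReLU at random initialization. The function name and parameters are illustrative only.

```python
import math
import random

def block_mean_sq_singular_value(n, weight_var_scale, rng):
    """Return tr(J J^T) / n for one dense + ReLU block at random init.

    Assumption: the block computes relu(W x) with W of shape n x n and
    i.i.d. Gaussian entries of variance weight_var_scale / n.
    Its Jacobian is J = diag(relu'(W x)) @ W, so tr(J J^T) is the sum of
    squared entries over the rows of W whose pre-activation is positive.
    """
    std = math.sqrt(weight_var_scale / n)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = 0.0
    for _ in range(n):
        row = [rng.gauss(0.0, std) for _ in range(n)]
        pre_activation = sum(w * xi for w, xi in zip(row, x))
        if pre_activation > 0.0:  # ReLU derivative is 1 here, 0 otherwise
            total += sum(w * w for w in row)
    return total / n

rng = random.Random(0)
he_like = block_mean_sq_singular_value(256, 2.0, rng)     # variance 2 / n
lecun_like = block_mean_sq_singular_value(256, 1.0, rng)  # variance 1 / n
print(f"variance 2/n: {he_like:.3f}, variance 1/n: {lecun_like:.3f}")
```

Under these assumptions the value should sit near 1 for weight variance 2/n (the block roughly preserves the squared gradient norm) and near 0.5 for variance 1/n (the block shrinks it), since ReLU zeroes about half of the rows.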

Because our Block Dynamical Isometry is norm-based, its evaluation needs weaker assumptions than the original dynamical isometry.

To ease the challenging derivations, we propose a highly modularized statistical framework based on free probability.

Our framework includes several key theorems to handle complex serial-parallel hybrid connections and a library to cover the diversity of network components.
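Why per-block behavior matters for serial connections can be seen in a small experiment. This is a sketch under simplifying assumptions and not the paper's theorems: the blocks are plain linear layers with independent Gaussian weights, and `gradient_norm_ratio` and its `gain` parameter are hypothetical names. Each block multiplies the expected squared gradient norm by roughly `gain`, so a chain of blocks compounds that factor geometrically with depth.

```python
import math
import random

def gradient_norm_ratio(depth, width, gain, rng):
    """Backpropagate a random gradient through `depth` linear blocks.

    Each block has a width x width weight matrix with i.i.d. Gaussian
    entries of variance gain / width. Returns the squared gradient norm
    at the input divided by the squared gradient norm at the output.
    """
    grad = [rng.gauss(0.0, 1.0) for _ in range(width)]
    start = sum(g * g for g in grad)
    std = math.sqrt(gain / width)
    for _ in range(depth):
        weights = [[rng.gauss(0.0, std) for _ in range(width)]
                   for _ in range(width)]
        # Gradient w.r.t. the block input is W^T times the incoming gradient.
        grad = [sum(weights[i][j] * grad[i] for i in range(width))
                for j in range(width)]
    return sum(g * g for g in grad) / start

rng = random.Random(0)
ratios = {gain: gradient_norm_ratio(20, 64, gain, rng)
          for gain in (0.5, 1.0, 2.0)}
for gain, ratio in ratios.items():
    print(f"gain {gain}: squared gradient norm ratio {ratio:.3e}")
```

With 20 blocks, a per-block gain of 0.5 should drive the ratio toward zero (vanishing), a gain of 2.0 should blow it up (explosion), and a gain of 1.0 should keep it within a few orders of magnitude of 1.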

In addition, several sufficient conditions for the prerequisites are provided.

Powered by our metric and framework, we analyze a wide range of initialization techniques, normalization methods, and network structures.

We find that our Block Dynamical Isometry is a universal principle behind them.

Then, we improve some existing methods based on our analysis, including an activation function selection strategy for initialization techniques, a new configuration for weight normalization, a depth-aware way to derive coefficients in SeLU, and initialization/weight normalization in DenseNet.

Moreover, we propose a novel normalization technique named second moment normalization, which has 30 percent less computation overhead than batch normalization without accuracy loss and performs better under micro batch sizes.
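The abstract only names second moment normalization and does not give its formulation. As a rough illustration of what the name suggests, the sketch below scales one feature's batch values by the square root of their second moment (mean of squares) without subtracting the mean, and contrasts that with a plain batch-normalization step. The per-feature batch statistic, the `eps` value, the absence of learnable affine parameters, and both function names are assumptions; the paper's full method may involve further components.

```python
import math

def second_moment_normalize(values, eps=1e-5):
    """Scale values so their mean square is about 1, without centering.

    Assumed reading of "second moment normalization": only one batch
    statistic, the mean of squares, is computed.
    """
    second_moment = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(second_moment + eps)
    return [v * scale for v in values]

def batch_normalize(values, eps=1e-5):
    """Plain batch normalization without affine parameters.

    Two batch statistics are computed: the mean and the variance.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(variance + eps)
    return [(v - mean) * scale for v in values]

batch = [1.0, 2.0, 3.0, 4.0]  # one feature across a batch of four samples
smn_out = second_moment_normalize(batch)
bn_out = batch_normalize(batch)

smn_mean_square = sum(v * v for v in smn_out) / len(smn_out)
bn_mean = sum(bn_out) / len(bn_out)
print(f"SMN mean square: {smn_mean_square:.4f}, BN mean: {bn_mean:.4f}")
```

In this sketch the second-moment version leaves the output mean nonzero and skips the centering step, which is one plausible source of the reduced overhead the abstract reports.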

Finally, our conclusions and methods are supported by extensive experiments on multiple models over CIFAR-10 and ImageNet.

Authors

Zhaodong Chen, Lei Deng, Bangyan Wang, Guoqi Li, Yuan Xie
