dwSun's Blog

A serious IT worker / not-so-famous AI performance artist / not-so-famous amateur photography enthusiast

1506.01186-Cyclical Learning Rates for Training Neural Networks

The paper proposes...

2018-07-30 21:13:00

1503.02531-Distilling the Knowledge in a Neural Network.md

It turns out that cross-entropy also has a temperature, and this temperature is defined as follows: \[ q_i=\frac{e^{z_i/T}}{\sum_j{e^{z_...

2018-07-11 23:06:00

1804.03235-Large scale distributed neural network training through online distillation.md

Existing modes of distributed model training: distributed SGD; parallel SGD: in large-scale training, the longest time of a single...

2018-07-05 23:40:00
