Title: A CLOSER LOOK AT DEEP LEARNING HEURISTICS: LEARNING RATE RESTARTS, WARMUP AND DISTILLATION
ABSTRACT:
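Vocabulary: 1. heuristics (rules of thumb) 2. knowledge distillation (transferring knowledge from a teacher network to a student network) 3. underpinnings (foundations) 4. aid (help, assist) 5. empirical (based on observation and experiment) 6. linear interpolation and visualizations with dimensionality reduction 7. mode connectivity and canonical correlation analysis 8. hypothesize (put forward as a hypothesis) 9. annealing (gradually lowering, here the learning rate) 10. viz. (namely)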
Passage: on mode connectivity and canonical correlation analysis
We explore knowledge distillation and the learning rate heuristics of (cosine) restarts and warmup using mode connectivity and canonical correlation analysis (CCA).
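CCA is used here to compare the representations that two networks (or two layers) learn. A minimal sketch of a mean-CCA similarity score, assuming activations are collected as matrices whose rows are the same input examples and whose columns are neurons, with more samples than neurons and full column rank. The function name mean_cca_similarity is hypothetical, and this is the plain textbook CCA, not necessarily the exact variant used in the paper.

```python
import numpy as np


def mean_cca_similarity(acts_a: np.ndarray, acts_b: np.ndarray) -> float:
    """Mean canonical correlation between two activation matrices.

    acts_a: (num_samples, num_neurons_a); acts_b: (num_samples, num_neurons_b).
    Row i of both matrices must come from the same input example.
    Assumes num_samples > num_neurons and full column rank (sketch only).
    """
    # Center each neuron's activations over the samples.
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    # Orthonormal bases of the two column spaces (thin QR acts as whitening).
    qa, _ = np.linalg.qr(a)
    qb, _ = np.linalg.qr(b)
    # The singular values of qa^T qb are the canonical correlations.
    rho = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.mean(np.clip(rho, 0.0, 1.0)))
```

Because CCA is invariant to invertible linear maps of either input, a layer compared with any invertible linear transform of itself scores close to 1.0, which is what makes it suitable when networks differ only by permutation and scaling of neurons.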
1 INTRODUCTION
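Vocabulary: 1. commonplace (ordinary, widespread) 2. buttressed (supported, reinforced) 3. intuitive (based on intuition) 4. ingredient (component) 5. step-decay (lowering the learning rate in discrete steps) 6. mimic (imitate) 7. piecewise (defined segment by segment) 8. invariances (properties that stay unchanged) 9. permutation and scaling (reordering and rescaling)
Phrases: 1. out of the need (because it is needed)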
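To make step-decay, cosine annealing with restarts, and warmup concrete, here is a minimal sketch of the three schedules as plain functions. All names and default values (base_lr, drop, every, cycle_len, warmup_steps) are hypothetical; the restart schedule uses a fixed cycle length, a simplification of SGDR-style restarts, and the paper's actual settings are not given in these notes.

```python
import math


def step_decay(epoch: int, base_lr: float = 0.1, drop: float = 0.1, every: int = 30) -> float:
    # Piecewise-constant: multiply the learning rate by `drop` every `every` epochs.
    return base_lr * drop ** (epoch // every)


def cosine_with_restarts(step: int, base_lr: float = 0.1, min_lr: float = 0.0,
                         cycle_len: int = 100) -> float:
    # Anneal from base_lr to min_lr along a half cosine, then jump back (restart).
    t = (step % cycle_len) / cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))


def with_linear_warmup(schedule, step: int, warmup_steps: int = 10, **kwargs) -> float:
    # Scale the wrapped schedule linearly from 1/warmup_steps up to 1
    # over the first warmup_steps steps, then leave it unchanged.
    lr = schedule(step, **kwargs)
    if step < warmup_steps:
        return lr * (step + 1) / warmup_steps
    return lr


for s in (0, 5, 10, 50, 99, 100):
    print(s, round(with_linear_warmup(cosine_with_restarts, s), 4))
```

The design choice to write warmup as a wrapper reflects that warmup is usually layered on top of whatever main schedule is in use.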
2 EMPIRICAL TOOLS
Formulas and proofs ...
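The formulas for this section are not copied into these notes. For reference, the standard forms of the three tools named in the abstract are sketched below; the notation (endpoints w_1 and w_2, bend point \theta, loss \mathcal{L}) is an assumption and may differ from the paper's. The quadratic Bézier curve is one common parameterization of a mode-connecting path, not necessarily the one used in the paper.

```latex
% Linear interpolation between two trained solutions w_1 and w_2
w(\alpha) = (1-\alpha)\, w_1 + \alpha\, w_2, \qquad \alpha \in [0, 1]

% Mode connectivity: a curve with a trainable bend point \theta joining w_1 and w_2
\phi_\theta(t) = (1-t)^2\, w_1 + 2t(1-t)\, \theta + t^2\, w_2, \qquad t \in [0, 1]
\min_\theta \; \mathbb{E}_{t \sim U(0,1)} \big[ \mathcal{L}(\phi_\theta(t)) \big]

% CCA between centered activations X (n \times p) and Y (n \times q)
\rho_1 = \max_{a,\, b} \frac{a^\top \Sigma_{XY}\, b}{\sqrt{a^\top \Sigma_{XX}\, a}\, \sqrt{b^\top \Sigma_{YY}\, b}}
% further \rho_i repeat the maximization, uncorrelated with earlier directions
\bar{\rho} = \frac{1}{k} \sum_{i=1}^{k} \rho_i, \qquad k = \min(p, q)
```

Setting \theta = (w_1 + w_2)/2 makes \phi_\theta(t) reduce to the straight line w(t), so linear interpolation is the special case of an untrained bend.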
RESULTS:
Our empirical analysis sheds light on these heuristics and suggests that (a) the reasons often quoted for the success of cosine annealing are not evidenced in practice; (b) the effect of learning rate warmup is to prevent the deeper layers from creating training instability; and (c) the latent knowledge shared by the teacher is primarily disbursed in the deeper layers.
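Finding (c) concerns what a student inherits from its teacher. As a reminder of what "knowledge shared by the teacher" means operationally, here is a minimal NumPy sketch of the common soft-target distillation loss: a temperature-softened KL term toward the teacher plus ordinary cross-entropy on the labels. The function names and the defaults temperature=4.0 and alpha=0.5 are hypothetical, and this is the widely used formulation, not necessarily the exact loss used in the paper.

```python
import numpy as np


def softmax(z: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits: np.ndarray, teacher_logits: np.ndarray,
                      labels: np.ndarray, temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """Soft-target distillation loss averaged over the batch (sketch)."""
    eps = 1e-12
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    # KL(teacher || student) at temperature T, scaled by T^2 so its gradient
    # magnitude stays comparable to the hard-label term.
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student_t + eps)), axis=-1)
    soft = (temperature ** 2) * kl.mean()
    # Ordinary cross-entropy against the hard labels at T = 1.
    p_student = softmax(student_logits)
    hard = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    return float(alpha * soft + (1.0 - alpha) * hard)
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains.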