Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien, “A Closer Look at Memorization in Deep Networks”, ICML, 2017
Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang, “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima”, ICLR, 2017
https://blog.csdn.net/zhangboshen/article/details/72853121
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina, “Entropy-SGD: Biasing Gradient Descent Into Wide Valleys”, ICLR, 2017
Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, Nathan Srebro, “Exploring Generalization in Deep Learning”, NIPS, 2017
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio, “Sharp Minima Can Generalize For Deep Nets”, ICML, 2017
Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein, “Sensitivity and Generalization in Neural Networks: an Empirical Study”, ICLR, 2018
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein, “Visualizing the Loss Landscape of Neural Nets”, NeurIPS, 2018
In Depth | Flatness and Robustness of Optima: How Should We Measure a Model’s Generalization Ability?
http://www.myzaker.com/article/5a66f6c2d1f1499f6b000077