1.1 Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
1.1.1 Paper: Improving predictive inference under covariate shift by weighting the log-likelihood function. Not finished, because I did not understand the non-singularity of the Hessian matrix.
2.1 Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
2.1.2 Neural learning in structured parameter spaces
2.1.2.1 An Introduction to Multivariate Statistical Analysis (3rd ed.), book
2.1.2.1.1 A more advanced probability theory book