Artificial Neural Network Notes (2): Gradient Explosion/Vanishing, Batch Normalization, and Overfitting
- Gradient exploding and vanishing
- Batch normalization
- Over-fitting issue
1. Gradient Explosion and Vanishing
1.1 The Model Training Process
STEP0: Preset the hyperparameters
STEP1: Initialize the model parameters
STEP2: Repeat the training process (for a fixed number of epochs)
STEP3: Save the model
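The four steps above can be sketched as a minimal training loop. This is an illustrative example on a hypothetical toy regression task (the data, hyperparameter values, and file name are assumptions, not from the original notes):

```python
import numpy as np

# STEP0: preset hyperparameters (illustrative values)
epochs, lr = 500, 0.1
rng = np.random.default_rng(0)

# toy regression data: y = 2x + 1 plus noise (hypothetical task)
x = rng.normal(size=(64, 1))
y = 2 * x + 1 + 0.1 * rng.normal(size=(64, 1))

# STEP1: initialize the model parameters
w, b = rng.normal(), 0.0

# STEP2: repeat the training process for `epochs` iterations
for _ in range(epochs):
    pred = w * x + b                       # forward pass
    grad_w = 2 * np.mean((pred - y) * x)   # MSE gradient w.r.t. w
    grad_b = 2 * np.mean(pred - y)         # MSE gradient w.r.t. b
    w -= lr * grad_w                       # gradient-descent update
    b -= lr * grad_b

# STEP3: save the model, e.g. np.save("params.npy", np.array([w, b]))
```

After training, `w` and `b` should be close to the true values 2 and 1.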
1.2 The Gradient Explosion and Vanishing Problem
For the simple neural network model in the figure above, with $h_i = g(u_i)$ and $u_i = w_i h_{i-1}$ (where $h_0 = x$), the chain rule gives

$$\frac{\partial l}{\partial w_1} = \frac{\partial l}{\partial h_l}\left(\frac{\partial h_l}{\partial u_l}\frac{\partial u_l}{\partial h_{l-1}}\right)\cdots\left(\frac{\partial h_1}{\partial u_1}\frac{\partial u_1}{\partial w_1}\right) = \frac{\partial l}{\partial h_l}\bigl(g'(u_l)\,w_l\bigr)\cdots\bigl(g'(u_1)\,x\bigr)$$
If $g'(u_i)\,w_i > 1$ holds at every layer $i$, then $\left|\frac{\partial l}{\partial w_1}\right| \gg 1$: the gradient explodes.
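A quick numeric illustration of why the product of per-layer factors $g'(u_i)\,w_i$ drives explosion or vanishing. As a simplifying assumption (not in the original derivation), the factor is taken constant across layers, with $g$ a sigmoid so that $g'(u) \le 0.25$:

```python
def grad_magnitude(w, depth):
    """Product of per-layer factors g'(u_i) * w_i across `depth` layers.

    Assumes a sigmoid activation evaluated at its steepest point u = 0,
    where g'(u) = 0.25, and the same weight w at every layer.
    """
    gprime = 0.25  # maximum of the sigmoid derivative
    return (gprime * w) ** depth

# each factor 0.25 * 2.0 = 0.5 < 1  ->  gradient vanishes with depth
print(grad_magnitude(w=2.0, depth=50))
# each factor 0.25 * 8.0 = 2.0 > 1  ->  gradient explodes with depth
print(grad_magnitude(w=8.0, depth=50))
```

Even though neither per-layer factor is extreme, raising it to the 50th power sends the gradient toward zero or toward infinity, which is exactly the problem the next sections (batch normalization, careful initialization) try to address.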