1. Initialization
A well chosen initialization can:
- Speed up the convergence of gradient descent
- Increase the odds of gradient descent converging to a lower training (and generalization) error
Key notes:
We use a three-layer neural network as the running example.
- Zeros initialization
This initializes every weight to zero. It fails to break symmetry: all units in a layer compute the same output and receive identical gradients, so gradient descent never makes them learn different features.
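A minimal sketch of what this looks like in code (the function name `initialize_parameters_zeros` and the `layers_dims` argument are assumptions, following the numpy style of the snippets below):

```python
import numpy as np

def initialize_parameters_zeros(layers_dims):
    """All-zeros initialization (shown only to illustrate why it fails)."""
    parameters = {}
    L = len(layers_dims)  # number of layers, including the input layer
    for l in range(1, L):
        # Every unit in layer l starts identical, so all units compute the
        # same output and receive the same gradient: symmetry is never broken.
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```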
- Random initialization
This initializes the weights to large random values:

```python
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
```
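To see what such large values do downstream, here is a tiny illustration (the shapes, the seed, and the sigmoid output layer are all made up for the demo): the pre-activations Z become huge, so the sigmoid saturates at 0 or 1 and the log terms of the cross-entropy cost blow up.

```python
import numpy as np

np.random.seed(0)                       # arbitrary seed, for reproducibility
n_prev, m = 100, 5                      # hypothetical layer width and batch size
W = np.random.randn(1, n_prev) * 10     # large random weights, as above
A_prev = np.random.randn(n_prev, m)     # hypothetical previous-layer activations
Z = W @ A_prev                          # entries on the order of +/-100
A = 1 / (1 + np.exp(-Z))                # sigmoid; may warn about overflow
print(A)                                # values pinned to ~0.0 or ~1.0: saturated
```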
When the random initial values are too large, the network performs poorly: the saturated outputs make the cost start out very high, and gradient descent converges slowly. So how do we decide how large the initial values should be? See He initialization below:
- He initialization
(This initializes the weights to random values scaled according to a paper by He et al., 2015.)
It is similar to "Xavier initialization", except that Xavier initialization scales by np.sqrt(1/layers_dims[l-1]) while He initialization uses np.sqrt(2/layers_dims[l-1]):
```python
parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2/layers_dims[l-1])
parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
```
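Wrapped into a complete function (a minimal sketch; the name `initialize_parameters_he` mirrors the snippet above, and `layers_dims` is assumed as before):

```python
import numpy as np

def initialize_parameters_he(layers_dims):
    """He initialization: scale each weight matrix by sqrt(2 / fan-in)."""
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # sqrt(2 / n^{[l-1]}) is the scaling recommended by He et al., 2015,
        # which works well with ReLU activations.
        parameters['W' + str(l)] = (np.random.randn(layers_dims[l], layers_dims[l-1])
                                    * np.sqrt(2 / layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
```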
W must be initialized randomly to break symmetry; b can safely start at zero, since a random W already breaks the symmetry.
2. Regularization
The standard way to avoid overfitting is L2 regularization. It modifies the cost function
from:
$$J = -\frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right) \right) \tag{1}$$
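For reference, equation (1) in numpy (a minimal sketch of the cost before regularization is added; the helper name `compute_cost` and the (1, m) shapes of `AL` and `Y` are assumptions):

```python
import numpy as np

def compute_cost(AL, Y):
    """Cross-entropy cost of equation (1).

    AL -- output-layer activations a^[L], shape (1, m)
    Y  -- true 0/1 labels, shape (1, m)
    """
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(np.squeeze(cost))
```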