Andrew Ng deeplearning Lesson 2: Improving Deep Neural Networks, Week 1

Latest recommended article published 2019-07-07 00:38:35

pu扑朔迷离

Category: Tensorflow. Tags: Andrew Ng, deep learning, L2W1

Article link: https://blog.csdn.net/bluehatihati/article/details/89679869

Andrew Ng deeplearning course: Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization, Week 1

Initialization
- Zero initialization
- He/Xavier initialization
Regularization
Gradient checking
Python functions used

Initialization

Zero initialization: setting initialization = "zeros" in the input argument.
Random initialization: setting initialization = "random" in the input argument. This initializes the weights to large random values.
He initialization: setting initialization = "he" in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015.

Zero initialization

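Zero initialization cannot be used here because it fails to break symmetry. The training result is as follows:
[Image: training result with zero initialization]
Performance is very poor: the cost does not really decrease, and the algorithm does little better than random guessing. Why? Let's look at the details of the predictions and the decision boundary: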

predictions_train = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0]]
predictions_test = [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
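Every prediction is 0!
In general, initializing all weights to zero means the network fails to break symmetry: the neurons in a layer start out identical, receive identical gradients, and therefore stay identical. Every neuron in each layer ends up learning the same thing.
Note: the biases b can safely be set to 0. Symmetry is still broken as long as W is not all zeros.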
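
The symmetry argument can be checked numerically. The following is a minimal sketch, not the course's code: the two-layer ReLU/sigmoid network, the toy data, and the helper names `forward` and `train_two_layer` are all assumptions made for illustration. With zero weights the hidden activations are zero, so the gradients of W1 and W2 vanish and only the output bias ever changes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, X):
    """Forward pass of a hypothetical 2-layer network (ReLU hidden, sigmoid output)."""
    W1, b1, W2, b2 = params
    Z1 = W1 @ X + b1
    A1 = np.maximum(0, Z1)
    A2 = sigmoid(W2 @ A1 + b2)
    return Z1, A1, A2

def train_two_layer(params, X, Y, lr=0.1, steps=200):
    """Plain gradient descent on the cross-entropy cost."""
    W1, b1, W2, b2 = params
    m = X.shape[1]
    for _ in range(steps):
        Z1, A1, A2 = forward((W1, b1, W2, b2), X)
        dZ2 = (A2 - Y) / m
        dW2 = dZ2 @ A1.T
        db2 = dZ2.sum(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * (Z1 > 0)
        dW1 = dZ1 @ X.T
        db1 = dZ1.sum(axis=1, keepdims=True)
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

# Toy data (an assumption for this sketch): 50 points in 2-D with a linear label rule.
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 50))
Y = (X[0:1] + X[1:2] > 0).astype(float)

zero_params = (np.zeros((4, 2)), np.zeros((4, 1)), np.zeros((1, 4)), np.zeros((1, 1)))
W1, b1, W2, b2 = train_two_layer(zero_params, X, Y)
_, _, probs = forward((W1, b1, W2, b2), X)

print("W1 still all zeros:", np.all(W1 == 0))
print("W2 still all zeros:", np.all(W2 == 0))
print("spread of predicted probabilities:", probs.max() - probs.min())
```

After training, both weight matrices are still exactly zero and every example gets the same predicted probability, which matches the all-identical predictions shown above.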

He/Xavier initialization

He initialization is similar to Xavier initialization, except that Xavier uses a scaling factor of sqrt(1./layers_dims[l-1]) for the weights W[l], whereas He uses sqrt(2./layers_dims[l-1]).
He: multiply the randomly initialized matrix by sqrt(2./layers_dims[l-1]).
Xavier: multiply it by sqrt(1./layers_dims[l-1]).

import numpy as np

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layers_dims -- python array (list) containing the size of each layer.
    
    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1 # integer representing the number of layers
     
    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l],1))
        ### END CODE HERE ###
        
    return parameters

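The code above implements He initialization.
Accuracy improves by nearly 9 percentage points compared with random initialization using large initial values.

Conclusion: He initialization works well. If you do not use He, at least avoid large values when initializing randomly.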
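
To make the difference between the two scaling factors concrete, here is a small sketch. The helper `initialize_parameters_scaled`, its `numerator` and `seed` parameters, and the layer sizes are hypothetical and chosen only for illustration; `numerator=2.0` corresponds to the He factor and `numerator=1.0` to the Xavier factor as described above.

```python
import numpy as np

def initialize_parameters_scaled(layers_dims, numerator, seed=3):
    """Hypothetical helper: W[l] = randn * sqrt(numerator / layers_dims[l-1]), b[l] = 0."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layers_dims)):
        scale = np.sqrt(numerator / layers_dims[l - 1])
        shape = (layers_dims[l], layers_dims[l - 1])
        parameters["W" + str(l)] = rng.standard_normal(shape) * scale
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters

layers_dims = [500, 400, 1]  # arbitrary sizes, large enough to estimate a standard deviation
he = initialize_parameters_scaled(layers_dims, numerator=2.0)
xavier = initialize_parameters_scaled(layers_dims, numerator=1.0)

print("He     std of W1:", he["W1"].std(), "target:", np.sqrt(2.0 / 500))
print("Xavier std of W1:", xavier["W1"].std(), "target:", np.sqrt(1.0 / 500))
```

Because both calls use the same seed, the two W1 matrices differ only by the constant factor sqrt(2), and in both cases the weights shrink as the previous layer gets wider.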

Regularization

L2 regularization and dropout are both very effective at overcoming overfitting; the details are not covered here.
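
For reference only, the two techniques can be sketched in a few lines. This is not taken from the post: it assumes the usual formulations, an L2 penalty of (lambd / (2m)) times the sum of squared weights, and inverted dropout that rescales the kept activations by 1/keep_prob. The function names are hypothetical.

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """Extra cost term: (lambd / (2m)) * sum of squared entries of every W."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def l2_grad_term(W, lambd, m):
    """Extra term added to dW when the L2 penalty is part of the cost."""
    return (lambd / m) * W

def inverted_dropout(A, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob, then rescale the rest."""
    D = rng.random(A.shape) < keep_prob
    return A * D / keep_prob, D

rng = np.random.default_rng(0)
penalty = l2_penalty([np.ones((2, 3))], lambd=0.5, m=10)
A_drop, mask = inverted_dropout(np.ones((100, 1000)), keep_prob=0.8, rng=rng)

print("L2 penalty:", penalty)
print("mean activation after dropout:", A_drop.mean())
```

The rescaling by 1/keep_prob keeps the expected activation unchanged, so no extra scaling is needed at test time when dropout is switched off.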

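Gradient checking

When implementing backprop, there is a test called gradient checking whose purpose is to verify that backprop has been implemented correctly. Even after writing down the equations, you sometimes cannot be 100% sure that every detail of your backprop implementation is right.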

$$\frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$
By analogy, the gradient with respect to the i-th parameter is approximated by: $\frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$
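
A quick numerical illustration of why the two-sided difference in formula (1) is preferred. The function f(θ) = θ³ and the step size 0.01 are assumptions picked for this example; the true derivative at θ = 1 is 3.

```python
def f(theta):
    return theta ** 3  # true derivative: 3 * theta**2

theta, eps = 1.0, 0.01
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)
one_sided = (f(theta + eps) - f(theta)) / eps

print("two-sided estimate:", two_sided)
print("one-sided estimate:", one_sided)
```

The two-sided estimate is about 3.0001 while the one-sided estimate is about 3.0301, so for the same ε the two-sided error is far smaller.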

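Procedure:
1. Flatten all parameters and gradients (W, b, dW, db) into vectors.
2. Add a small perturbation $\varepsilon$ (on the order of 10^-7) to the chosen component $\theta_i$.
3. Recompute the cost function and store the result in $J^{+}_i$.
4. Likewise, subtract $\varepsilon$ from $\theta_i$ and compute $J^{-}_i$.
5. Compute the difference quotient $\frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$.
6. Compute the relative gradient error: $$\frac {\| grad - gradapprox \|_2}{\| grad \|_2 + \| gradapprox \|_2 } \tag{3}$$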
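
The steps above can be sketched as follows. This is an illustrative sketch rather than the course's implementation: `gradient_check`, the toy `cost` function, and the deliberately wrong `buggy_grad` are assumptions, and the parameters are taken to be already flattened into a 1-D vector (step 1).

```python
import numpy as np

def gradient_check(cost_fn, theta, grad, epsilon=1e-7):
    """Return the relative error between grad and a two-sided numerical estimate."""
    gradapprox = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_plus[i] += epsilon            # step 2: perturb one component
        theta_minus = theta.copy()
        theta_minus[i] -= epsilon
        J_plus = cost_fn(theta_plus)        # step 3
        J_minus = cost_fn(theta_minus)      # step 4
        gradapprox[i] = (J_plus - J_minus) / (2 * epsilon)   # step 5
    numerator = np.linalg.norm(grad - gradapprox)            # step 6
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
    return numerator / denominator

def cost(theta):
    # Toy cost with a known gradient: 2 * theta + cos(theta)
    return np.sum(theta ** 2) + np.sum(np.sin(theta))

rng = np.random.default_rng(1)
theta = rng.standard_normal(10)
correct_grad = 2 * theta + np.cos(theta)
buggy_grad = 2 * theta                      # "forgets" the cos term

diff_ok = gradient_check(cost, theta, correct_grad)
diff_bug = gradient_check(cost, theta, buggy_grad)
print("relative error, correct gradient:", diff_ok)
print("relative error, buggy gradient:  ", diff_bug)
```

A correct gradient yields a tiny relative error, while the buggy one is larger by many orders of magnitude; that gap is what makes the check useful for catching backprop mistakes.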