Notes on the programming assignments from Andrew Ng's Deep Learning course

1. Fix the matrix dimensions

X = (number of features, number of samples m)
Y = (1, number of samples m)
W[l] = (n[l], n[l-1])
b[l] = (n[l], 1)
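As a quick check of these shapes, here is a minimal NumPy sketch. The helper name `initialize_shapes`, the `layer_dims` list and the 0.01 scale are illustrative assumptions, not taken from the assignment code.

```python
import numpy as np

def initialize_shapes(layer_dims, m, seed=0):
    """Build X, Y and per-layer W, b with the shapes listed above.

    layer_dims is a hypothetical list [n[0], n[1], ..., n[L]].
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((layer_dims[0], m))   # (number of features, m)
    Y = rng.integers(0, 2, size=(1, m))           # (1, m)
    params = {}
    for l in range(1, len(layer_dims)):
        # W[l] has one row per unit in layer l and one column per unit in layer l-1
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return X, Y, params

X, Y, params = initialize_shapes([4, 3, 1], m=5)
# (3, 4) @ (4, 5) gives (3, 5); b1 of shape (3, 1) broadcasts across the m columns
Z1 = params["W1"] @ X + params["b1"]
print(X.shape, Y.shape, Z1.shape)
```

Keeping the samples along the columns means every layer's Z and A have shape (n[l], m), which makes shape mismatches easy to spot with an assert.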

2. How to prevent vanishing or exploding gradients

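In assignment 2 of 1-4, the initialization of the deep neural network uses the method taught in 2-1: to prevent vanishing or exploding gradients, the weights are scaled down at initialization according to the number of units n[l-1] in the previous layer (in practice, divided by the square root of n[l-1]), so that the resulting z does not vary too much. For different activation functions, researchers have worked out the corresponding best scaling factor:
[Figure missing from source: recommended scaling factors for different activation functions]
The last one is also known as Xavier initialization.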
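Since the figure is missing, here is a minimal sketch of one such scheme. It assumes the standard He scaling for ReLU layers, a standard deviation of sqrt(2 / n[l-1]). The function name `initialize_he` and the layer sizes are illustrative only.

```python
import numpy as np

def initialize_he(layer_dims, seed=0):
    """He initialization: W[l] ~ N(0, 2 / n[l-1]), b[l] = 0."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]  # n[l-1], the number of inputs to layer l
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], fan_in)) * np.sqrt(2.0 / fan_in)
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params

params = initialize_he([500, 400, 1])
# The empirical spread of W1 should sit close to the target sqrt(2 / 500)
print(params["W1"].std(), np.sqrt(2.0 / 500))
```

Because each z sums n[l-1] terms, shrinking the weight variance in proportion to 1 / n[l-1] keeps the scale of z roughly independent of the layer width.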

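The programming exercise of 2-1 covers zero initialization, random initialization and He initialization (the second one in the figure above). The conclusions are: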

Model                                          Train accuracy   Problem/Comment
3-layer NN with zeros initialization           50%              fails to break symmetry
3-layer NN with large random initialization    83%              too large weights
3-layer NN with He initialization              99%              recommended method
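The "fails to break symmetry" entry in the first row can be reproduced on a toy problem. This is a minimal sketch with made-up data and a two-layer ReLU/sigmoid network, not the assignment's 3-layer model.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 6))        # toy inputs: 2 features, 6 samples
Y = np.array([[0, 1, 0, 1, 1, 0]])     # toy labels
m = X.shape[1]

# Zero initialization of every parameter
W1, b1 = np.zeros((3, 2)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))

# Forward pass: every hidden unit outputs 0, every prediction is 0.5
Z1 = W1 @ X + b1
A1 = relu(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass for the cross-entropy cost
dZ2 = A2 - Y
dW2 = dZ2 @ A1.T / m
dZ1 = (W2.T @ dZ2) * (Z1 > 0)
dW1 = dZ1 @ X.T / m

print(A2)
print(np.abs(dW1).max(), np.abs(dW2).max())
```

All hidden units compute the same value and both weight gradients are exactly zero, so gradient descent can never make the units differ. The prediction stays constant, which matches the 50% accuracy in the table.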

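He initialization is recommended.
To support this conclusion, however, the assignment multiplies the randomly initialized W by 10. If the factor of 10 is removed, random initialization starts from a fairly high cost but converges quickly and also reaches high accuracy. He initialization is still the recommended choice, because it effectively guards against vanishing and exploding gradients.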
He initialization:
[Figure missing from source]

Random initialization:
[Figure missing from source]

3. Regularization and dropout

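With L2 regularization, the final parameters end up smaller than those obtained without regularization. Smaller weights are taken to mean a simpler model, which is why L2 regularization helps prevent overfitting.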
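To make the effect on the weights concrete, here is a minimal sketch of the usual L2-regularized cross-entropy cost and the extra gradient term it adds. The function names and the toy values are illustrative assumptions, not taken from the assignment.

```python
import numpy as np

def cost_with_l2(A_out, Y, weights, lambd):
    """Cross-entropy cost plus (lambd / (2m)) * sum of squared weights."""
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(A_out) + (1 - Y) * np.log(1 - A_out)) / m
    l2_penalty = lambd / (2 * m) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + l2_penalty

def l2_gradient_term(W, lambd, m):
    """Extra term added to dW by the penalty; it pulls W toward zero."""
    return lambd / m * W

# Toy values, purely illustrative
A_out = np.array([[0.8, 0.3]])
Y = np.array([[1, 0]])
weights = [np.array([[1.0, -2.0]])]

plain = cost_with_l2(A_out, Y, weights, lambd=0.0)
regularized = cost_with_l2(A_out, Y, weights, lambd=0.1)
print(plain, regularized)
print(l2_gradient_term(weights[0], lambd=0.1, m=2))
```

Since the extra gradient term is proportional to W itself, each update shrinks the weights a little (weight decay), which is why the trained parameters come out smaller.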

Notes on dropout:
1. Dropout is a regularization technique.
2. You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
3. Apply dropout both during forward and backward propagation.
4. During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then on average half the nodes are shut down, so the output is scaled by 0.5 because only the remaining half contribute to the solution. Dividing by 0.5 is equivalent to multiplying by 2, so the output keeps the same expected value. You can check that this also works for values of keep_prob other than 0.5. In short, this step keeps the expected activation of every layer unchanged.
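Points 3 and 4 can be sketched as follows. This is a minimal illustration of inverted dropout on a single layer; the function names and the all-ones activations are assumptions made for the example.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Training-time forward step: mask units, then rescale by keep_prob."""
    D = rng.random(A.shape) < keep_prob   # True for the units that are kept
    A_drop = A * D / keep_prob            # rescaling keeps the expected value of A
    return A_drop, D

def dropout_backward(dA, D, keep_prob):
    """Backward step: reuse the same mask and the same scaling."""
    return dA * D / keep_prob

rng = np.random.default_rng(1)
A = np.ones((100, 1000))
A_drop, D = dropout_forward(A, keep_prob=0.5, rng=rng)
dA_drop = dropout_backward(np.ones_like(A), D, keep_prob=0.5)

# About half the entries are 0 and the rest are 2, so the mean stays near 1
print(A.mean(), A_drop.mean())
```

At test time neither function is called: the activations pass through unchanged, which is consistent precisely because the training-time rescaling already preserved their expected value.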

Model                                 Train accuracy   Test accuracy
3-layer NN without regularization     95%              91.5%
3-layer NN with L2-regularization     94%              93%
3-layer NN with dropout               93%              95%

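As the table shows, regularization lowers the training accuracy but raises the test accuracy. This is because regularization simplifies the model. Since test accuracy is what we care about more, performance improves after regularization.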

4. Optimization algorithms

1. Gradient descent with momentum

How do you choose β?
The larger the momentum β is, the smoother the update because the more we take the past gradients into account. But if β is too big, it could also smooth out the updates too much.
Common values for β range from 0.8 to 0.999. If you don’t feel inclined to tune this, β=0.9 is often a reasonable default.
Tuning the optimal β for your model might require trying several values to see what works best in terms of reducing the value of the cost function J.
The momentum update formula is actually:

vdW[l]=β
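The formula above is cut off in the source. As a minimal sketch, assuming the usual exponentially weighted form v = β·v + (1 − β)·dW followed by W = W − α·v (which may not be exactly what the author went on to write), one update step looks like this. The function name and the toy cost are illustrative.

```python
import numpy as np

def momentum_step(W, dW, v, beta=0.9, learning_rate=0.01):
    """One momentum update (assumed standard form, see the note above)."""
    v = beta * v + (1 - beta) * dW   # exponentially weighted average of past gradients
    W = W - learning_rate * v        # step along the smoothed gradient
    return W, v

W = np.array([1.0, -2.0])
v = np.zeros_like(W)                 # the velocity starts at zero
for _ in range(3):
    dW = 2 * W                       # gradient of the toy cost J = sum(W ** 2)
    W, v = momentum_step(W, dW, v)
print(W, v)
```

With β = 0.9 each new gradient contributes only 10% to v, which is why larger β values give smoother but slower-reacting updates.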