Notes on the Programming Assignments from Andrew Ng's Deep Learning Course

寻梦梦飞扬

Category: Deep Learning

Article link: https://blog.csdn.net/weixin_41043240/article/details/79560030

1. Fixing the dimensions of the data matrices

X = (number of features, number of samples m)
Y = (1, number of samples m)
W[l] = (n[l], n[l-1])
b[l] = (n[l], 1)
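
The shape conventions above can be checked with a few lines of NumPy. This is only a minimal sketch: the helper name `init_shapes`, the layer sizes, and the 0.01 scale are made up for illustration and are not taken from the course notebooks.

```python
import numpy as np

def init_shapes(layer_dims, seed=0):
    """Create W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_cur = layer_dims[l - 1], layer_dims[l]
        params[f"W{l}"] = rng.standard_normal((n_cur, n_prev)) * 0.01
        params[f"b{l}"] = np.zeros((n_cur, 1))
    return params

n_x, m = 4, 5                    # hypothetical sizes: 4 features, 5 samples
X = np.ones((n_x, m))            # one column per sample
Y = np.zeros((1, m))             # one label per sample
params = init_shapes([n_x, 3, 1])

# (3, 4) @ (4, 5) gives (3, 5); b1 with shape (3, 1) broadcasts across the m columns.
Z1 = params["W1"] @ X + params["b1"]
```

Keeping one sample per column means the same line of code works for any m, because the bias is broadcast over the sample axis.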

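2. How to prevent vanishing or exploding gradients

Assignment 2 of Course 1, Week 4, on initializing a deep neural network, uses the method taught in Course 2, Week 1: to prevent vanishing or exploding gradients, scale the weights at initialization according to the number of units n[l-1] in the previous layer, so that their variance is roughly 1/n[l-1] (that is, multiply standard-normal weights by sqrt(1/n[l-1])). The resulting z then does not vary too much. For different activation functions, researchers have worked out the corresponding best values:
[Figure: recommended initialization scale for each activation function]
The last one is also called Xavier initialization.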

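The Course 2, Week 1 programming exercise tries zero initialization, random initialization, and He initialization (the second one in the figure above). The conclusions are: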

Model	Train accuracy	Problem/Comment
3-layer NN with zeros initialization	50%	fails to break symmetry
3-layer NN with large random initialization	83%	too large weights
3-layer NN with He initialization	99%	recommended method

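He initialization is recommended.
However, to support this conclusion, the notebook multiplies the randomly initialized W by 10. If that multiplication by 10 is removed, random initialization starts with a higher cost but converges quickly and also reaches high accuracy. He initialization is still the recommended choice, because it effectively guards against vanishing and exploding gradients.
He initialization:
[Figure]

Random initialization:
[Figure]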
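
The three initialization schemes compared above can be sketched as follows. The function name `initialize` and its `method` argument are hypothetical, not the notebook's API; the He scale sqrt(2 / n[l-1]) and the factor of 10 for the "large random" case follow the description in the text.

```python
import numpy as np

def initialize(layer_dims, method="he", seed=3):
    """Return parameters for a fully connected network.

    method: "zeros", "large_random" (standard normal * 10), or "he".
    """
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_cur = layer_dims[l - 1], layer_dims[l]
        if method == "zeros":
            W = np.zeros((n_cur, n_prev))          # every unit computes the same thing
        elif method == "large_random":
            W = rng.standard_normal((n_cur, n_prev)) * 10
        elif method == "he":
            W = rng.standard_normal((n_cur, n_prev)) * np.sqrt(2.0 / n_prev)
        else:
            raise ValueError(f"unknown method: {method}")
        params[f"W{l}"] = W
        params[f"b{l}"] = np.zeros((n_cur, 1))
    return params

he_params = initialize([1000, 500, 1], method="he")
zero_params = initialize([1000, 500, 1], method="zeros")
```

With He initialization the standard deviation of W1 is close to sqrt(2 / 1000), so the pre-activations of the first hidden layer stay at a moderate scale regardless of how wide the input is.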

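3. Regularization and dropout

With L2 regularization, the final parameters are smaller than those obtained without regularization. Smaller weights are regarded as a simpler model, which is why L2 regularization helps prevent overfitting.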
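
To see why L2 regularization pushes the weights toward smaller values, here is a minimal sketch of the penalty added to the cost, (λ / 2m) · Σ‖W[l]‖², and the extra term it contributes to each gradient, (λ / m) · W[l]. The function names are hypothetical.

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """L2 term added to the cost: (lambd / (2 m)) * sum of squared weights."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def l2_grad_term(W, lambd, m):
    """Extra term added to dW; it is proportional to W, so it shrinks W each step."""
    return (lambd / m) * W

W_example = np.array([[1.0, 2.0], [3.0, 4.0]])   # made-up weights
penalty = l2_penalty([W_example], lambd=2.0, m=5)
extra_dW = l2_grad_term(W_example, lambd=2.0, m=5)
```

Because the extra gradient term always points in the direction of W itself, each update subtracts a small fraction of the current weights, which is why this technique is also called weight decay.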

Notes on dropout:
1. Dropout is a regularization technique.
2. You only use dropout during training. Don’t use dropout (randomly eliminate nodes) during test time.
3. Apply dropout both during forward and backward propagation.
4. During training time, divide each dropout layer by keep_prob to keep the same expected value for the activations. For example, if keep_prob is 0.5, then we will on average shut down half the nodes, so the output will be scaled by 0.5 since only the remaining half are contributing to the solution. Dividing by 0.5 is equivalent to multiplying by 2. Hence, the output now has the same expected value. You can check that this works even when keep_prob takes values other than 0.5. In short, this step keeps the expected value of each layer's activations unchanged.
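
Points 3 and 4 above can be sketched in NumPy as inverted dropout. The function names and array sizes are made up for illustration; the key details are that the same mask D is reused in the backward pass and that both passes divide by keep_prob.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Training-time forward pass: zero out units at random, then rescale."""
    D = rng.random(A.shape) < keep_prob   # True where the unit is kept
    A_drop = A * D / keep_prob            # rescale to preserve the expected value
    return A_drop, D

def dropout_backward(dA, D, keep_prob):
    """Backward pass: apply the same mask and the same scaling to the gradient."""
    return dA * D / keep_prob

rng = np.random.default_rng(1)
A = np.ones((50, 2000))                   # made-up activations, all equal to 1
A_drop, D = dropout_forward(A, keep_prob=0.5, rng=rng)
dA_drop = dropout_backward(np.ones_like(A), D, keep_prob=0.5)
```

About half the entries of A_drop are 0 and the rest are 2, so the mean stays close to 1. At test time neither function is called and the activations are used unchanged.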

model	train accuracy	test accuracy
3-layer NN without regularization	95%	91.5%
3-layer NN with L2-regularization	94%	93%
3-layer NN with dropout	93%	95%

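As the table shows, regularization lowers the training accuracy but raises the test accuracy. This is because regularization simplifies the model and so reduces overfitting. Since test accuracy is what we actually care about, performance improves after regularization.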

4. Optimization algorithms

1. Gradient descent with momentum

How do you choose β?
The larger the momentum β is, the smoother the update because the more we take the past gradients into account. But if β is too big, it could also smooth out the updates too much.
Common values for β range from 0.8 to 0.999. If you don’t feel inclined to tune this, β=0.9 is often a reasonable default.
Tuning the optimal β for your model might need trying several values to see what works best in terms of reducing the value of the cost function J.
The momentum update formula is actually:

v_dW[l] = β *
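
For reference, a minimal sketch of one momentum update step, assuming the commonly used exponentially weighted form v = β·v + (1 − β)·dW followed by W = W − α·v. This form is an assumption here and is not necessarily the exact variant the formula above was going to state; the function name and default values are hypothetical.

```python
import numpy as np

def momentum_step(W, dW, v, beta=0.9, learning_rate=0.01):
    """One update with momentum (assumed exponentially weighted form)."""
    v = beta * v + (1 - beta) * dW     # moving average of past gradients
    W = W - learning_rate * v          # step along the smoothed gradient
    return W, v

W0 = np.array([1.0])
v0 = np.zeros_like(W0)
W1, v1 = momentum_step(W0, np.array([2.0]), v0, beta=0.9, learning_rate=0.1)
```

A larger β gives the past gradients more weight in v, which is the smoothing effect described above; with β = 0 the update reduces to plain gradient descent.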
