深度学习算法调优

最新推荐文章于 2022-07-19 00:34:45 发布

weixin_48412526

最新推荐文章于 2022-07-19 00:34:45 发布

阅读量638

点赞数

分类专栏：深度学习

本文链接：https://blog.csdn.net/weixin_48412526/article/details/117284481

版权


	加速训练	权重：用较小的随机值进行赋值，可以加快梯度下降的收敛，也提高了模型获得更低训练错误的可能性。有效防止了梯度爆炸和梯度消失。随机的目的在于打破对称性，避免每一层的每个节点都在学习相同的东西。导致神经网络即使很深，其性能也不一定比线性分类器好如果用很大的随机值进行赋值，那么可能导致梯度下降或者爆炸 data vanishing and exploding gradients What that means is that when you're training a very deep network your derivatives or your slopes can sometimes get either very, very big or very, very small, maybe even exponentially small, and this makes training difficult, and we will have to spend a very long time to train our algorithme
	对数值型特征进行归一化	提升收敛速度梯度下降中，每次都会找到梯度最大的方向迭代更新模型参数。如果模型的特征的量纲不一致，那么寻求最优解的特征空间，就可看成椭圆形。

加速训练

权重：用较小的随机值进行赋值，可以加快梯度下降的收敛，也提高了模型获得更低训练错误的可能性。有效防止了梯度爆炸和梯度消失。

随机的目的在于打破对称性，避免每一层的每个节点都在学习相同的东西。导致神经网络即使很深，其性能也不一定比线性分类器好

如果用很大的随机值进行赋值，那么可能导致梯度下降或者爆炸

data vanishing and exploding gradients What that means is that when you're training a very deep network your derivatives or your slopes can sometimes get either very, very big or very, very small, maybe even exponentially small, and this makes training difficult, and we will have to spend a very long time to train our algorithme

对数值型特征进行归一化

提升收敛速度
梯度下降中，每次都会找到梯度最大的方向迭代更新模型参数。如果模型的特征的量纲不一致，那么寻求最优解的特征空间，就可看成椭圆形。

Train/Dev/Test - Cross Validation

Objectif: Pour évaluer la performance d’un modèle, pour sélectionner la meilleur modèle et les meilleurs hyperparamètres.

Training set: Train le modèle

Developpement set: Sélectionner le meilleur modèle avec meilleur performance

Test set: Tester le modèle sélectionné in order to get an unbiased estimate of how well your algorithm is doing.

High bias et High variance

High bias:

Utilise a plus profonde architecture de NN (#lager up up)

Entrainer NN plus longtement (#iteration up up)

Changer une autre architecture (ex: standard NN -> CNN)

Haigh variacne:

Plus de données d’apprentissage

Régularization

Optimization Methods - speed up training

Until now, you've always used Gradient Descent to update the parameters and minimize the cost. In this notebook, you will learn more advanced optimization methods that can speed up learning (accélérer l’apprentissage) and perhaps even get you to a better final value for the cost function (rendre un meilleur valeur pour la fonctionnalité de coût)Having a good optimization algorithm can be the difference between waiting days vs. just a few hours to get a good result.

Weight initialization

Initialisation aléatoire les poids à des valeurs petites nous permet d’accélérer l'apprentissage et augmenter les chances de descente de gradient convergeant vers une erreur petite, car initialisation aléatoire est pour rompre la symétrie et s'assurer que différentes unités cachées peuvent apprendre différentes choses, et les valeurs petites peuvent nous aident à éviter gradient disparition/explosion qui ralentit l'entraînement de l’algorithme

A well chosen initialization can:

Speed up the convergence of gradient descent
Increase the odds of gradient descent converging to a lower training (and generalization) error / Augmenter les chances de descente de gradient convergeant vers une erreur d’entraînement (et de généralisation) petite

Initialisation tous les poids à 0

In general, initializing all the weights to zero results in the network failing to break symmetry (briser la symétrie). This means that every neuron in each layer will learn the same thing, and the network is no more powerful than a linear classifier such as logistic regression.

What you should remember:

The weights W should be initialized randomly to break symmetry.
It is however okay to initialize the biases b[l] to zeros. Symmetry is still broken so long as W[l] is initialized randomly.

Initialisation tous les poids avec les valeurs grandes

The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets that example wrong it incurs a very high loss for that example. Indeed, when log(a[3])=log(0)log⁡(a[3])=log⁡(0), the loss goes to infinity.
Poor initialization can lead to vanishing/exploding gradients (gradients disparision / explosion), which also slows down the optimization algorithm. (ralentit l'algorithme d'optimisation) W变大意味着Z也增大，那么对于二分类问题，最后一层使用sigmoid函数的时候，Z很大的地方对应的导数很小，就会使得梯度下降特别慢
If you train this network longer you will see better results, but initializing with overly large random numbers slows down the optimization.

In summary:

Initializing weights to very large random values does not work well.
Hopefully intializing with small random values does better. The important question is: how small should be these random values be? Lets find out in the next part!

Gradients disparition / explosion

Gradients disparition / explosion est un problème de l'entraînement des réseaux de neurones, en particulier pour les réseaux de neurones très profonds,Cela signifie que lorsque vous vous entraînez sur un réseau très profond, vos dérivés peuvent parfois devenir très, très grands ou très très petits,et cela rend l'en

最低0.47元/天解锁文章

weixin_48412526

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
深度学习算法调优

加速训练权重：用较小的随机值进行赋值，可以加快梯度下降的收敛，也提高了模型获得更低训练错误的可能性。有效防止了梯度爆炸和梯度消失。随机的目的在于打破对称性，避免每一层的每个节点都在学习相同的东西。导致神经网络即使很深，其性能也不一定比线性分类器好如果用很大的随机值进行赋值，那么可能导致梯度下降或者爆炸 data vanishing and exploding gradients...
复制链接

扫一扫