Various Optimization Algorithms for Training Neural Networks

Many people use optimizers while training a neural network without knowing that the technique is called optimization. Optimizers are algorithms or methods that change the attributes of your neural network, such as its weights and learning rate, in order to reduce the loss.

How the weights or learning rate of your neural network should be changed to reduce the loss is defined by the optimizer you use. Optimization algorithms and strategies are responsible for reducing the loss and providing the most accurate results possible.

We’ll learn about the different types of optimizers and their advantages:

Gradient Descent

Gradient descent is the most basic yet most widely used optimization algorithm. It is used heavily in linear regression and classification algorithms. Backpropagation in neural networks also uses the gradient descent algorithm.

Gradient descent is a first-order optimization algorithm, meaning it depends only on the first-order derivative of the loss function. It calculates in which direction the weights should be altered so that the function can reach a minimum. Through backpropagation, the loss is propagated from one layer to the next, and the model's parameters (also known as weights) are modified according to the loss so that it can be minimized.

Update rule: θ = θ − α⋅∇J(θ), where α is the learning rate and ∇J(θ) is the gradient of the loss function with respect to the parameters θ.
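To make the update rule concrete, here is a minimal sketch of batch gradient descent on a toy linear-regression problem with a mean squared error loss, written in Python with NumPy. The data, loss function, learning rate, and number of iterations are illustrative assumptions, not part of the original article.

    import numpy as np

    # Toy data: fit y = w*x + b; the true relationship is y = 2x + 1.
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])

    theta = np.zeros(2)   # theta = [w, b]
    alpha = 0.1           # learning rate

    for epoch in range(200):
        w, b = theta
        error = (w * X + b) - y
        # Gradient of the mean squared error loss J(theta) with respect to [w, b],
        # computed over the whole dataset (this is what makes it "batch" gradient descent).
        grad = np.array([2.0 * np.mean(error * X), 2.0 * np.mean(error)])
        theta = theta - alpha * grad   # theta = theta - alpha * grad J(theta)

    print(theta)   # converges toward [2.0, 1.0]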

Advantages:

  1. Easy to compute.
  2. Easy to implement.
  3. Easy to understand.

Disadvantages:

  1. May get trapped in local minima.
  2. Weights are updated only after the gradient has been calculated on the whole dataset, so if the dataset is very large it may take a very long time to converge to the minimum.
  3. Requires a large amount of memory to calculate the gradient on the whole dataset.

Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a variant of gradient descent that updates the model's parameters more frequently: the parameters are altered after the loss is computed on each individual training example. So if the dataset contains 1000 rows, SGD updates the model parameters 1000 times in one pass over the dataset, instead of once as in gradient descent.

Update rule: θ = θ − α⋅∇J(θ; x(i); y(i)), where {x(i), y(i)} is a single training example.
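For contrast with the batch version above, the following is a minimal sketch of stochastic gradient descent on the same toy linear-regression problem, again in Python with NumPy; the data, loss, learning rate, and number of epochs are illustrative assumptions.

    import numpy as np

    # Same toy problem (y = 2x + 1), but the parameters are updated after every single example.
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])

    theta = np.zeros(2)   # theta = [w, b]
    alpha = 0.05          # learning rate

    rng = np.random.default_rng(0)
    for epoch in range(100):
        for i in rng.permutation(len(X)):      # visit the examples in random order
            w, b = theta
            error = (w * X[i] + b) - y[i]
            # Gradient of the squared error on the single example {x(i), y(i)}.
            grad = np.array([2.0 * error * X[i], 2.0 * error])
            theta = theta - alpha * grad       # one update per training example

    print(theta)   # approaches [2.0, 1.0], with noisier steps than batch gradient descent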

Because the model parameters are updated so frequently, they have high variance, and the loss function fluctuates with varying intensity.

Advantages:

  1. Frequent updates of the model parameters, hence it converges in less time.
  2. Requires less memory, since the gradient is computed on one example at a time instead of the whole dataset.