Various Optimization Algorithms for Training Neural Networks

Many people use optimizers while training a neural network without knowing that the technique is called optimization. Optimizers are algorithms or methods that change the attributes of your neural network, such as its weights and learning rate, in order to reduce the loss.

How the weights or learning rate of your neural network should be changed to reduce the loss is defined by the optimizer you use. Optimization algorithms and strategies are responsible for reducing the loss and providing the most accurate results possible.

We’ll learn about the different types of optimizers and their advantages:

Gradient Descent

Gradient descent is the most basic yet most widely used optimization algorithm. It is used heavily in linear regression and classification algorithms. Backpropagation in neural networks also uses the gradient descent algorithm.

Gradient descent is a first-order optimization algorithm, meaning it depends only on the first-order derivative of the loss function. It calculates in which direction the weights should be altered so that the function can reach a minimum. Through backpropagation, the loss is propagated from one layer to the next, and the model's parameters (also known as weights) are modified according to the loss so that it can be minimized.

Update rule: θ = θ − α⋅∇J(θ), where α is the learning rate and ∇J(θ) is the gradient of the loss function with respect to the parameters θ.
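To make the update rule concrete, here is a minimal sketch of batch gradient descent on a toy linear-regression problem with a mean squared error loss, written in Python with NumPy. The data, loss function, learning rate, and number of iterations are illustrative assumptions, not part of the original article.

    import numpy as np

    # Toy data: fit y = w*x + b; the true relationship is y = 2x + 1.
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])

    theta = np.zeros(2)   # theta = [w, b]
    alpha = 0.1           # learning rate

    for epoch in range(200):
        w, b = theta
        error = (w * X + b) - y
        # Gradient of the mean squared error loss J(theta) with respect to [w, b],
        # computed over the whole dataset (this is what makes it "batch" gradient descent).
        grad = np.array([2.0 * np.mean(error * X), 2.0 * np.mean(error)])
        theta = theta - alpha * grad   # theta = theta - alpha * grad J(theta)

    print(theta)   # converges toward [2.0, 1.0]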

Advantages:

  1. Easy to compute.
  2. Easy to implement.
  3. Easy to understand.

Disadvantages:

  1. May get trapped in local minima.
  2. Weights are updated only after the gradient has been calculated on the whole dataset, so if the dataset is very large it may take a very long time to converge to the minimum.
  3. Requires a large amount of memory to calculate the gradient on the whole dataset.

Stochastic Gradient Descent (SGD)

Stochastic gradient descent is a variant of gradient descent that updates the model's parameters more frequently: the parameters are altered after the loss is computed on each individual training example. So if the dataset contains 1000 rows, SGD updates the model parameters 1000 times in one pass over the dataset, instead of once as in gradient descent.

Update rule: θ = θ − α⋅∇J(θ; x(i); y(i)), where {x(i), y(i)} is a single training example.
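For contrast with the batch version above, the following is a minimal sketch of stochastic gradient descent on the same toy linear-regression problem, again in Python with NumPy; the data, loss, learning rate, and number of epochs are illustrative assumptions.

    import numpy as np

    # Same toy problem (y = 2x + 1), but the parameters are updated after every single example.
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])

    theta = np.zeros(2)   # theta = [w, b]
    alpha = 0.05          # learning rate

    rng = np.random.default_rng(0)
    for epoch in range(100):
        for i in rng.permutation(len(X)):      # visit the examples in random order
            w, b = theta
            error = (w * X[i] + b) - y[i]
            # Gradient of the squared error on the single example {x(i), y(i)}.
            grad = np.array([2.0 * error * X[i], 2.0 * error])
            theta = theta - alpha * grad       # one update per training example

    print(theta)   # approaches [2.0, 1.0], with noisier steps than batch gradient descent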

Because the model parameters are updated so frequently, they have high variance, and the loss function fluctuates with varying intensity.

Advantages:

  1. Frequent updates of the model parameters, hence it converges in less time.
  2. Requires less memory, since the gradient is computed on one example at a time instead of the whole dataset.