Hyperparameter Tuning with Keras and Ray Tune

In my previous article, I explained how to build a small and nimble image classifier and the advantages of having variable input dimensions in a convolutional neural network. However, after going through the model-building code and training routine, one might ask questions such as:


  1. How to choose the number of layers in a neural network?

  2. How to choose the optimal number of units/filters in each layer?

  3. What would be the best data augmentation strategy for my dataset?

  4. What batch size and learning rate would be appropriate?


Building or training a neural network involves figuring out the answers to the above questions. You may have an intuition for CNNs; for example, the number of filters in each layer should increase as we go deeper, since the network learns to extract more and more complex features built on the simpler features extracted in earlier layers. However, there might be a more optimal model for your dataset, with fewer parameters, that outperforms the model you designed based on that intuition.


In this article, I’ll explain what these parameters are and how they affect the training of a machine learning model. I’ll explain how machine learning engineers choose these parameters and how we can automate this process using a simple mathematical concept. I’ll start with the same model architecture from my previous article and modify it to make most of the training and architectural parameters tunable.

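To make this concrete, below is a minimal sketch of what a Keras model with tunable architectural and training parameters could look like. The function name, the `config` keys, and the layer choices are illustrative assumptions for this article, not the exact code from the previous post:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(config):
    """Build a small CNN whose architecture is driven by hyperparameters.

    `config` is a plain dict such as:
    {"num_blocks": 3, "base_filters": 16, "learning_rate": 1e-3, "num_classes": 10}
    (all names here are illustrative, not from the original article).
    """
    inputs = layers.Input(shape=(None, None, 3))     # variable input dimensions
    x = inputs
    for i in range(config["num_blocks"]):
        # Double the number of filters in each successive block.
        filters = config["base_filters"] * (2 ** i)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)           # works with any input size
    outputs = layers.Dense(config["num_classes"], activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(config["learning_rate"]),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```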

What is a hyperparameter?

A hyperparameter is a training parameter set by a machine learning engineer before training the model. These parameters are not learned by the machine learning model during the training process. Examples include batch size, learning rate, number of layers and corresponding units, etc. The parameters that are learned by the machine learning model, from the data, during the training process are called model parameters.

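As a quick illustration, using the hypothetical `build_model` sketch above (the data names in the commented-out call are placeholders), the dictionary below holds hyperparameters chosen before training, while the weights the network learns from data are the model parameters:

```python
# Hyperparameters: fixed by the engineer before training starts.
hyperparams = {
    "num_blocks": 3,
    "base_filters": 16,
    "learning_rate": 1e-3,
    "batch_size": 32,        # used by fit(), never learned from data
    "num_classes": 10,
}

model = build_model(hyperparams)

# Model parameters: the weights learned from data during training.
print("Trainable model parameters:", model.count_params())

# model.fit(x_train, y_train,
#           batch_size=hyperparams["batch_size"],
#           validation_data=(x_val, y_val),
#           epochs=10)
```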

Why are hyperparameters important?


When training a machine learning model, the main goal is to obtain the model that performs best on the validation set. We focus on the validation set because it represents how well the model generalizes (its performance on unseen data). Hyperparameters form the premise of the training process. For example, if the learning rate is set too high, the model may never converge to the minimum because it takes steps that are too large after every iteration. On the other hand, if the learning rate is set too low, the model will take a very long time to reach the minimum.

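A toy example (not from the article) makes the effect of the learning rate easy to see. Plain gradient descent on f(x) = x², whose minimum is at x = 0, either crawls, converges, or diverges depending on the step size:

```python
def gradient_descent(learning_rate, steps=20, x=5.0):
    """Minimize f(x) = x**2 starting from x = 5 and return the final x."""
    for _ in range(steps):
        grad = 2 * x                  # derivative of x**2
        x = x - learning_rate * grad
    return x

print(gradient_descent(0.01))   # too low: ~3.3, still far from the minimum
print(gradient_descent(0.1))    # reasonable: ~0.06, close to the minimum
print(gradient_descent(1.5))    # too high: overshoots every step and diverges
```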

[Figure: Machine learning pipelines before and after hyperparameter tuning]

Why is it difficult to choose hyperparameters?


Finding the right learning rate involves choosing a value, training a model, evaluating it, and trying again. Every dataset is unique, and with so many parameters to choose from, a beginner can easily get confused. Machine learning engineers who have been through many failed training attempts eventually develop an intuition for how a hyperparameter affects a given training process. However, that intuition doesn’t generalize to all datasets, and a new use case usually needs some experimentation before settling on convincing hyperparameters. Even then, it’s possible to miss the best or optimal parameters.


We want to choose the hyperparameters such that, after the training process is complete, we have a model that is both accurate and generalizes well. When dealing with neural networks, evaluating the objective function can be very expensive: training takes a long time, and trying out different hyperparameters manually may take days. This becomes a difficult task to do by hand.


Hyperparameter tuning/optimization

Hyperparameter tuning can be considered a black-box optimization problem, where we try to find the minimum of a function f(x) without knowing its analytical form. It is also called derivative-free optimization: since we do not know the analytical form of f(x), no derivatives can be computed to minimize it, and hence techniques like gradient descent cannot be used.

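In code, the black box is simply a function that maps a hyperparameter configuration to a validation score. The sketch below reuses the hypothetical `build_model` from above; `x_train`, `y_train`, `x_val`, and `y_val` are placeholders for your data:

```python
def objective(config):
    """Black-box objective: train with `config` and return validation accuracy."""
    model = build_model(config)
    model.fit(x_train, y_train,
              batch_size=config["batch_size"],
              epochs=5, verbose=0)
    _, val_accuracy = model.evaluate(x_val, y_val, verbose=0)
    # Maximizing validation accuracy is equivalent to minimizing f(x) = 1 - accuracy.
    return val_accuracy
```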

A few well-known techniques for hyperparameter tuning include grid search, random search, differential evolution, and Bayesian optimization. Grid search and random search perform slightly better than manual tuning: we set up a grid of hyperparameters and run training and evaluation cycles on parameter combinations chosen systematically (grid search) or randomly (random search) from that grid.

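Here is a minimal sketch of both strategies over the same search space, building on the hypothetical `objective` function above:

```python
import itertools
import random

search_space = {
    "base_filters": [8, 16, 32],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [16, 32, 64],
}

# Grid search: every combination in the grid (3 * 3 * 3 = 27 trials).
grid_trials = [dict(zip(search_space, values))
               for values in itertools.product(*search_space.values())]

# Random search: a fixed budget of combinations sampled from the same space.
random_trials = [{name: random.choice(choices) for name, choices in search_space.items()}
                 for _ in range(10)]

# Evaluate each candidate and keep the best one (grid search shown here).
results = [(trial, objective({**trial, "num_blocks": 3, "num_classes": 10}))
           for trial in grid_trials]
best_trial, best_score = max(results, key=lambda item: item[1])
```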

However, grid and random search are relatively inefficient because they do not choose the next set of hyperparameters based on previous results. Differential evolution, on the other hand, is a type of evolutionary algorithm: the best-performing hyperparameter configurations from a randomly initialized population (the individuals) are selected to produce new configurations (offspring). Each new generation of configurations is likely to perform better because it inherits good traits from its parents, and the population improves over time, generation after generation. You can read more about this concept in this beautiful and practical tutorial.

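As an illustration of the idea (not the tutorial’s implementation), SciPy ships a ready-made differential evolution optimizer that can drive the hypothetical `objective` function above; continuous values are rounded or rescaled into hyperparameters inside the wrapper:

```python
from scipy.optimize import differential_evolution

def de_objective(x):
    """Map a continuous vector to a hyperparameter config and return a loss."""
    config = {
        "num_blocks": int(round(x[0])),
        "base_filters": int(round(x[1])),
        "learning_rate": 10 ** x[2],      # search the exponent of the learning rate
        "batch_size": int(round(x[3])),
        "num_classes": 10,
    }
    return -objective(config)              # SciPy minimizes, so negate the accuracy

# Bounds: num_blocks, base_filters, log10(learning_rate), batch_size.
bounds = [(2, 5), (8, 64), (-4, -1), (16, 64)]
result = differential_evolution(de_objective, bounds, maxiter=10, popsize=5)
print(result.x, -result.fun)               # best configuration and its accuracy
```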

Though differential evolution works, it takes a long time and still doesn’t take informed steps; it isn’t aware of what we are trying to achieve or optimize. Bayesian optimization, by contrast, uses the results of previous evaluations to choose the next set of hyperparameters to try, which makes the search more informed.
