Bayesian Hyperparameter Optimization: Neural Networks, TensorFlow, Facies Prediction Example

The purpose of this work is to optimize the neural network model hyper-parameters to estimate facies classes from well logs. I will include some code in this article, but for the full Jupyter notebook file, you can visit my Github.

Note: if you are new to TensorFlow, its installation is elaborated by Jeff Heaton.

In machine learning, model parameters can be divided into two main categories:

  1. Trainable parameters: such as the weights in a neural network, which are learned by the training algorithm without user interference.

  2. Hyper-parameters: values the user sets before training, such as the learning rate or the number of dense layers in the model.

Selecting the best hyper-parameters by hand is a tedious task, and it is almost impossible to find the best ones if you are dealing with more than two parameters.

One way is to divide each parameter's valid range into evenly spaced values and then simply ask the computer to loop over every combination of parameters and calculate the results. This method is called Grid Search. Although it is done by machine, it is a time-consuming process. Suppose you have 3 hyper-parameters with 10 possible values each: in this approach you will run 10³ neural network models (even with a reasonably sized training dataset, this task is huge).

Another way is the Random Search approach: instead of an organized parameter sweep, it goes through random combinations of parameters and keeps the best-performing one. As the number of hyper-parameters grows, the chance that a fixed random budget lands on the best combination shrinks toward zero.
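To make the contrast concrete, here is a minimal sketch of both approaches in plain Python. The two hyper-parameters (learning rate and number of dense layers) are hypothetical, and the scoring function is a cheap stand-in for an expensive "train the network, return validation accuracy" run:

```python
import itertools
import random

learning_rates = [1e-4, 1e-3, 1e-2]
layer_counts = [1, 2, 3, 4]

def train_and_score(lr, n_layers):
    # Placeholder for model training; this toy score peaks at lr=1e-3, n_layers=2.
    return -((lr - 1e-3) ** 2) - ((n_layers - 2) ** 2) * 1e-7

# Grid search: evaluate every combination (3 * 4 = 12 runs here).
grid_results = {
    (lr, n): train_and_score(lr, n)
    for lr, n in itertools.product(learning_rates, layer_counts)
}

# Random search: evaluate only a fixed budget of random combinations.
random.seed(0)
random_results = {
    (lr, n): train_and_score(lr, n)
    for lr, n in [
        (random.choice(learning_rates), random.choice(layer_counts))
        for _ in range(5)
    ]
}

print("grid best:", max(grid_results, key=grid_results.get))
print("random best:", max(random_results, key=random_results.get))
```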

Scikit-Optimize (skopt), which we will use here for the facies estimation task, is a simple and efficient library for minimizing expensive, noisy black-box functions. Bayesian optimization constructs another model of the parameter search space; a Gaussian process is one kind of such model. It generates an estimate of how model performance varies with hyper-parameter changes.
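As a minimal, self-contained illustration of the library's interface (separate from the facies task), the sketch below minimizes a noisy one-dimensional function with gp_minimize. The objective function, acquisition choice, and call budget are illustrative assumptions, not the article's settings:

```python
# gp_minimize fits a Gaussian process to the points evaluated so far and
# uses an acquisition function to choose where to sample next.
import numpy as np
from skopt import gp_minimize

def objective(x):
    # Hypothetical noisy black-box; skopt only ever sees its return values.
    return np.sin(5 * x[0]) * (1 - np.tanh(x[0] ** 2)) + np.random.randn() * 0.1

result = gp_minimize(
    objective,
    dimensions=[(-2.0, 2.0)],  # one continuous search dimension
    acq_func="EI",             # Expected Improvement acquisition function
    n_calls=30,                # total number of objective evaluations
    n_random_starts=10,        # random exploration before the GP takes over
    random_state=42,
)
print("best x:", result.x, "best value:", result.fun)
```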

[Figure: a noisy objective function (red dashed line with red shaded noise band) approximated by a Gaussian process (green line with green uncertainty band), with red sample points along one hyper-parameter dimension]

As we see in the picture, the true objective function (red dashed line) is surrounded by noise (red shade). The red points show where scikit-optimize sampled the search space for the hyper-parameter (one dimension). Scikit-optimize fills the area between sample points with a Gaussian process (green line) and estimates the true fitness value. In areas with few or no samples (like the left side of the picture between two red samples), there is great uncertainty: the large difference between the red and green lines produces a wide green uncertainty band (such as a two-standard-deviation interval).

In this process, skopt then asks for a new set of hyper-parameters to explore more of the search space. In the initial steps it samples coarsely, but in later iterations it focuses on the regions where samples are denser and the fitness estimate agrees well with the true objective function (the trough area in the graph). For more, you may refer to the Scikit-Optimize documentation.
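In code, this loop amounts to declaring a search space and a fitness function for skopt to sample. The sketch below shows one way that could look for this kind of network; the dimension names, ranges, and the placeholder fitness body are illustrative assumptions, not the notebook's exact settings:

```python
# Declare one skopt dimension per hyper-parameter to be tuned.
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

dimensions = [
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(1, 5, name="num_dense_layers"),
    Integer(8, 256, name="num_dense_nodes"),
    Categorical(["relu", "sigmoid"], name="activation"),
]

@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers, num_dense_nodes, activation):
    # Here you would build, train, and evaluate a Keras model with these
    # settings and return a value for skopt to minimize, e.g. -accuracy.
    return 0.0  # placeholder so the sketch runs without TensorFlow

# The search itself is then a single call, e.g.:
# result = gp_minimize(fitness, dimensions, n_calls=40, random_state=42)
```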

Data Review

The Council Grove gas reservoir is located in Kansas. Nine wells are available from this carbonate reservoir. Facies were studied from core samples at every half foot and matched with the logging data at each well location. The feature variables include five wireline log measurements and two geologic constraining variables derived from geologic knowledge. For more detail refer here; you may download the dataset from here. The seven variables are listed below (a short data-loading sketch follows the list):

  1. GR: this wireline logging tool measures gamma emission

  2. ILD_log10: this is a resistivity measurement

  3. PE: photoelectric effect log

  4. DeltaPHI: Phi is a porosity index in petrophysics.

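As a starting point, the data can be read with pandas. In this sketch the file name facies_vectors.csv and the Facies label column are assumptions based on the description above, not confirmed by the article:

```python
# Load the well-log dataset and separate features from the facies labels.
import pandas as pd

df = pd.read_csv("facies_vectors.csv")  # assumed file name

# The wireline features named above; the remaining columns would include
# the other porosity measurement and the two geologic constraining variables.
feature_cols = ["GR", "ILD_log10", "PE", "DeltaPHI"]
X = df[feature_cols]
y = df["Facies"]  # assumed label column holding the facies classes

print(X.describe())
```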