A Detailed Introduction to NLopt

NLopt Introduction

In this chapter of the manual, we begin with an overview of the optimization problems that NLopt solves, the key differences between the different types of optimization algorithms, and comments on ways to cast various problems into the form required by NLopt. We also describe the background and goals of NLopt.

1 Optimization problems

  • NLopt solves general nonlinear optimization problems of the form
    $\min_{\mathbf{x}\in\mathbb{R}^n} f(\mathbf{x})$
    where f is the objective function and x represents the n optimization parameters (also called design variables or decision parameters).
  • This problem may optionally be subject to bound constraints (also called box constraints):
    $lb_i \leq x_i \leq ub_i$, $i=1,\ldots,n$
    given lower bounds lb and upper bounds ub (which may be −∞ and/or +∞, respectively, for partially or totally unconstrained problems). If $lb_i = ub_i$, that parameter is eliminated.
  • The problem may also optionally be subject to m nonlinear inequality constraints (in which case it is sometimes called a nonlinear-programming problem):
    $fc_i(\mathbf{x}) \leq 0$, $i=1,\ldots,m$
    for constraint functions fc_i(x).
  • Some NLopt algorithms also support p nonlinear equality constraints:
    $h_i(\mathbf{x}) = 0$, $i=1,\ldots,p$
  • More generally, several constraints at a time can be combined into a single function returning a vector-valued result.
  • A point x satisfying all of the bound, inequality, and equality constraints is called a feasible point, and the set of all feasible points is the feasible region.
  • Note: in this introduction, we follow the usual mathematical convention of letting our indices begin at 1. In the C programming language, however, NLopt follows C's zero-based convention (e.g., constraint i runs from 0 to m−1). A minimal C sketch of this problem setup follows this list.
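
The following sketch shows how such a problem maps onto NLopt's C API. It is a minimal setup assuming a two-dimensional problem with the illustrative objective f(x) = sqrt(x₂); the function name, bounds, and numeric values are placeholders, not part of the library:

```c
#include <math.h>
#include <nlopt.h>

/* Hypothetical objective f(x) = sqrt(x[1]) for n = 2; NLopt passes a
   non-NULL `grad` when a gradient-based algorithm needs derivatives. */
double myfunc(unsigned n, const double *x, double *grad, void *data)
{
    if (grad) {
        grad[0] = 0.0;
        grad[1] = 0.5 / sqrt(x[1]);
    }
    return sqrt(x[1]);
}

int main(void)
{
    nlopt_opt opt = nlopt_create(NLOPT_LD_MMA, 2);  /* algorithm, dimension n */
    double lb[2] = { -HUGE_VAL, 0.0 };              /* bound constraints lb_i */
    nlopt_set_lower_bounds(opt, lb);
    nlopt_set_min_objective(opt, myfunc, NULL);
    nlopt_set_xtol_rel(opt, 1e-4);                  /* a termination condition */

    double x[2] = { 1.234, 5.678 }, minf;           /* starting point, result */
    nlopt_optimize(opt, x, &minf);
    nlopt_destroy(opt);
    return 0;
}
```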

2 Global versus local optimization

  • NLopt includes algorithms that attempt either global or local optimization of the objective.

2.1 Global optimization

  • Global optimization is the problem of finding the feasible point x that minimizes the objective f(x) over the entire feasible region.
  • In general, this can be a very difficult problem, and the difficulty can grow exponentially with the number of parameters n.
  • In fact, unless special information about f is known, one cannot even be certain of having found the true global optimum, because there might be a sudden dip of f hidden in some part of the parameter space you haven't looked at yet.
  • However, NLopt includes several global optimization algorithms that work well on reasonably well-behaved problems, if the dimension n is not too large.

2.2 Local optimization

  • Local optimization is a much easier problem.
  • Its goal is to find a feasible point x that is only a local minimum: f(x) is less than or equal to the value of f at all nearby feasible points (the intersection of the feasible region with at least some small neighborhood of x).
  • In general, a nonlinear optimization problem may have many local minima, and which one is found by an algorithm typically depends upon the starting point supplied by the user.
  • On the other hand, local optimization algorithms can often quickly locate a local minimum, even in very high-dimensional problems (especially using gradient-based algorithms).
  • Somewhat confusingly, an algorithm that is guaranteed to find a local minimum from any feasible starting point is said to be globally convergent.

2.3 Convex optimization problems

  • In the special class of convex optimization problems, for which both the objective and the inequality constraint functions are convex (and the equality constraints are affine, or in any case have convex level sets), there is only one local minimum value of f, so a local optimization method finds the global optimum.
  • There may, however, be more than one point x yielding the same minimum f(x); the optimum points then form a convex subset of the (convex) feasible region.
  • Typically, convex problems arise from functions of special analytical forms, such as linear programming, semidefinite programming, quadratic programming, and so on, and specialized techniques are available to solve such problems very efficiently.
  • NLopt includes only general methods that do not assume convexity;
  • if you have a provably convex problem, you may be better off with a different software package, such as the CVX package from Stanford.

3 Gradient-based versus derivative-free algorithms

3.1 Gradient-based algorithms

  • Especially for local optimization, the most efficient algorithms typically require the user to supply the gradient ∇f in addition to the value f(x) at any given point x (and similarly for any nonlinear constraints).
  • This exploits the fact that, in principle, the gradient can almost always be computed at the same time as the value of f with little additional computational effort (at worst, about the same as a second evaluation of f).
  • If a quick way to compute the derivative of f is not obvious, one typically computes ∇f using an adjoint method, or possibly using automatic differentiation tools.
  • Gradient-based methods are critical for the efficient optimization of very high-dimensional parameter spaces (e.g., n in the thousands or more).

3.2 Derivative-free algorithms

  • On the other hand, computing the gradient is sometimes cumbersome and inconvenient if the objective function is supplied as a complicated program.
  • It may even be impossible to compute the gradient if f is non-differentiable (or, worse, discontinuous).
  • In such cases, it is often easier to use a derivative-free algorithm for optimization, which only requires the user to supply the function value f(x) at any given point x.
  • Such methods typically must evaluate f at at least several-times-n points, however, so they are best used when n is small to moderate (up to hundreds).

3.3 NLopt's handling of these algorithms

  • NLopt provides both derivative-free and gradient-based algorithms with a common interface, as illustrated in the sketch below.
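
Because of the common interface, switching between the two families is a one-constant change. A minimal sketch (the naming scheme is NLopt's own: G/L for global/local, D/N for with-derivative/no-derivative):

```c
#include <nlopt.h>

/* The same problem setup works with either family of algorithms; only the
   algorithm constant passed to nlopt_create changes. */
void pick_algorithm(unsigned n, int use_gradients)
{
    nlopt_opt opt = nlopt_create(
        use_gradients ? NLOPT_LD_MMA     /* local, gradient-based  */
                      : NLOPT_LN_COBYLA, /* local, derivative-free */
        n);
    /* ... identical objective/constraint/termination setup either way ... */
    nlopt_destroy(opt);
}
```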

3.4 Caveats

  • If you find yourself computing the gradient by a finite-difference approximation, such as (in one dimension)
    $\partial f/\partial x \approx [f(x+\Delta x) - f(x-\Delta x)]/2\Delta x$
    then you should probably use a derivative-free algorithm instead.
  • Finite-difference approximations are not only expensive (2n function evaluations for the gradient using center differences), they are also notoriously susceptible to rounding errors unless you are very careful.
  • On the other hand, finite-difference approximations are very useful to check that your analytical gradient computation is correct (a minimal check is sketched after this list).
  • This is always a good idea, because in my experience it is very easy to have bugs in your gradient code.
  • An incorrect gradient will cause weird problems for a gradient-based optimization algorithm.
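
A sketch of such a check, comparing an analytic gradient against a central finite difference; f, grad_f, and the step size h are illustrative stand-ins for your own code:

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical objective and its hand-coded gradient. */
static double f(const double *x) { return x[0] * x[0] + sin(x[1]); }
static void grad_f(const double *x, double *g) { g[0] = 2 * x[0]; g[1] = cos(x[1]); }

int main(void)
{
    double x[2] = { 1.0, 2.0 }, g[2], h = 1e-5;
    grad_f(x, g);
    for (int i = 0; i < 2; ++i) {
        double x0 = x[i], fp, fm;
        x[i] = x0 + h; fp = f(x);
        x[i] = x0 - h; fm = f(x);
        x[i] = x0;
        /* central difference [f(x+h) - f(x-h)] / 2h; should agree closely */
        printf("dim %d: analytic = %g, finite-difference = %g\n",
               i, g[i], (fp - fm) / (2 * h));
    }
    return 0;
}
```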

4 Equivalent formulations of optimization problems

There are many equivalent ways to formulate a given optimization problem, even within the framework defined above, and finding the best formulation can be something of an art.

  • To begin with a trivial example, suppose that you want to find the maximum of some function g(x). Equivalently, this is the minimum of f(x) = −g(x). Because of this, there is no need for NLopt to provide a separate maximization routine in addition to its minimization routine; a user can just flip the sign to do maximization. As a matter of convenience, however, NLopt provides a maximization interface (which performs the necessary sign flips for you internally).
  • A more interesting example is a minimax optimization problem, where the objective function f(x) is the maximum of N functions:
    $f(\mathbf{x}) = \max \{ g_1(\mathbf{x}), g_2(\mathbf{x}), \ldots, g_N(\mathbf{x}) \}$
  • You could, of course, pass this objective function directly to NLopt, but there is a problem: f(x) is not everywhere differentiable (assuming the gk are differentiable, f(x) is only piecewise differentiable). Not only does this mean that the most efficient gradient-based algorithms are inapplicable, but even derivative-free algorithms may be slowed down considerably. Instead, it is possible to formulate the same problem as a differentiable problem by adding a dummy variable t and N new nonlinear constraints (in addition to any other constraints):
    $\min_{x\in\mathbb{R}^n, t\in\mathbb{R}} t$
    $g_k(\mathbf{x}) - t \leq 0, \quad k=1,2,\ldots,N$
  • This solves exactly the same minimax problem, but now we have a differentiable objective and constraints, assuming that each gk is differentiable. Notice that the objective function by itself in this case is just the boring linear function t; all of the interesting stuff is in the constraints. This is typical of many nonlinear-programming problems. (A sketch of this reformulation in the NLopt C API follows this list.)
  • Another example would be minimizing the absolute value $|g(\mathbf{x})|$ of some function g(x). This is equivalent to minimizing $\max \{ g(\mathbf{x}), -g(\mathbf{x}) \}$, however, and can therefore be transformed into differentiable nonlinear constraints as in the minimax example above.
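
The following sketch wires up the minimax reformulation through NLopt's C API, optimizing over the augmented vector (x, t). The dimensions, the three example functions gk, and all numeric values are illustrative assumptions:

```c
#include <nlopt.h>

#define NX 2   /* dimension of x; the variables are (x[0..NX-1], t) */

/* example g_k(x) = (x[0] - k)^2 + x[1]^2, with gradient in gradx[0..NX-1] */
static double gk(int k, const double *x, double *gradx)
{
    if (gradx) {
        gradx[0] = 2 * (x[0] - k);
        gradx[1] = 2 * x[1];
    }
    return (x[0] - k) * (x[0] - k) + x[1] * x[1];
}

/* objective: simply t, the last optimization variable */
static double obj_t(unsigned n, const double *xt, double *grad, void *data)
{
    (void)data;
    if (grad) {
        for (unsigned i = 0; i < n; ++i) grad[i] = 0.0;
        grad[n - 1] = 1.0;
    }
    return xt[n - 1];
}

/* constraint k: g_k(x) - t <= 0 */
static double con_k(unsigned n, const double *xt, double *grad, void *data)
{
    int k = *(int *)data;
    double val = gk(k, xt, grad);   /* fills grad[0..NX-1] if non-NULL */
    if (grad) grad[n - 1] = -1.0;   /* derivative with respect to t */
    return val - xt[n - 1];
}

int main(void)
{
    static int ks[3] = { 0, 1, 2 };
    double xt[NX + 1] = { 0.5, 0.5, 10.0 }, minf;   /* start for (x, t) */
    nlopt_opt opt = nlopt_create(NLOPT_LD_MMA, NX + 1);
    nlopt_set_min_objective(opt, obj_t, NULL);
    for (int k = 0; k < 3; ++k)
        nlopt_add_inequality_constraint(opt, con_k, &ks[k], 1e-8);
    nlopt_set_xtol_rel(opt, 1e-6);
    nlopt_optimize(opt, xt, &minf);
    nlopt_destroy(opt);
    return 0;
}
```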

5 Equality constraints

  • Suppose that you have one or more nonlinear equality constraints
    $h_i(\mathbf{x}) = 0$.
  • In principle, each equality constraint can be expressed by two inequality constraints, $h_i(\mathbf{x}) \leq 0$ and $-h_i(\mathbf{x}) \leq 0$, so you might think that any code that can handle inequality constraints can automatically handle equality constraints. In practice, however, this is not true: if you try to express an equality constraint as a pair of nonlinear inequality constraints, some algorithms will fail to converge.
  • Equality constraints sometimes require special handling because they reduce the dimensionality of the feasible region, and not just its size as for an inequality constraint. Only some of the NLopt algorithms (AUGLAG, COBYLA, and ISRES) currently support nonlinear equality constraints.

5.1 Elimination

  • Sometimes it is possible to handle equality constraints by an elimination procedure: you use the equality constraint to explicitly solve for some parameters in terms of other unknown parameters, and then only pass the latter as optimization parameters to NLopt.

  • For example, suppose that you have a linear equality constraint:
    $A\mathbf{x} = \mathbf{b}$
  • for some constant matrix A. Given a particular solution ξ of these equations and a matrix N whose columns are a basis for the nullspace of A, one can express all possible solutions of these linear equations in the form:
    $\mathbf{x} = \boldsymbol{\xi} + N\mathbf{z}$
  • for an arbitrary vector z. You could then pass z as the optimization parameters to NLopt, rather than x, and thereby eliminate the equality constraint.
  • Note: one must be careful with the numerical computation of the nullspace matrix N, because rounding errors will tend to make the matrix A less singular than it should be. A standard technique is to compute the singular value decomposition (SVD) of A and set any singular values smaller than some threshold to zero. (A chain-rule wrapper for optimizing over z is sketched after this list.)
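
The following is a minimal sketch of the reduced problem in C, assuming ξ (xi) and N (Nmat) have already been computed, e.g. via an SVD routine from LAPACK; the dimensions, the quadratic f_x, and the particular xi/Nmat values are illustrative:

```c
#include <nlopt.h>

#define NX 3   /* dimension of x */
#define NZ 2   /* dimension of z = NX - rank(A) */

static const double xi[NX] = { 1.0, 0.0, 0.0 };              /* solves A xi = b */
static const double Nmat[NX][NZ] = { {0, 0}, {1, 0}, {0, 1} };

/* original objective f(x) = sum_i x_i^2, with gradient */
static double f_x(const double *x, double *gradx)
{
    double val = 0.0;
    for (int i = 0; i < NX; ++i) {
        if (gradx) gradx[i] = 2 * x[i];
        val += x[i] * x[i];
    }
    return val;
}

/* reduced objective f_z(z) = f(xi + N z); chain rule: grad_z = N^T grad_x */
static double f_z(unsigned nz, const double *z, double *grad, void *data)
{
    double x[NX], gx[NX];
    (void)data;
    for (int i = 0; i < NX; ++i) {
        x[i] = xi[i];
        for (unsigned j = 0; j < nz; ++j) x[i] += Nmat[i][j] * z[j];
    }
    double val = f_x(x, grad ? gx : NULL);
    if (grad)
        for (unsigned j = 0; j < nz; ++j) {
            grad[j] = 0.0;
            for (int i = 0; i < NX; ++i) grad[j] += Nmat[i][j] * gx[i];
        }
    return val;
}

int main(void)
{
    double z[NZ] = { 1.0, -1.0 }, minf;
    nlopt_opt opt = nlopt_create(NLOPT_LD_LBFGS, NZ);  /* unconstrained now */
    nlopt_set_min_objective(opt, f_z, NULL);
    nlopt_set_xtol_rel(opt, 1e-8);
    nlopt_optimize(opt, z, &minf);   /* A x = b holds automatically */
    nlopt_destroy(opt);
    return 0;
}
```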

5.2 Penalty functions

  • Another popular approach to equality constraints (and inequality constraints as well) is to include some sort of penalty function in the objective, which penalizes x values that violate the constraints. A standard technique of this sort is known as the augmented Lagrangian method, and a variant of this approach is implemented in NLopt's AUGLAG algorithm (a sketch of its use follows this list).
  • (For inequality constraints, a variant of the penalty idea is a barrier method: this is simply a penalty that diverges as you approach the constraint boundary, which forces the optimization to stay within the feasible region.)
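
A minimal sketch of AUGLAG in the C API, assuming a hypothetical two-dimensional problem min x₁² + x₂² subject to x₁ + x₂ = 1; the inner algorithm, tolerances, and starting point are illustrative choices:

```c
#include <nlopt.h>

static double f(unsigned n, const double *x, double *grad, void *data)
{
    (void)n; (void)data;
    if (grad) { grad[0] = 2 * x[0]; grad[1] = 2 * x[1]; }
    return x[0] * x[0] + x[1] * x[1];
}

/* equality constraint h(x) = x_0 + x_1 - 1 = 0 */
static double h(unsigned n, const double *x, double *grad, void *data)
{
    (void)n; (void)data;
    if (grad) { grad[0] = 1.0; grad[1] = 1.0; }
    return x[0] + x[1] - 1.0;
}

int main(void)
{
    nlopt_opt local = nlopt_create(NLOPT_LD_LBFGS, 2);  /* inner solver for the
                                                           penalized subproblems */
    nlopt_set_xtol_rel(local, 1e-8);

    nlopt_opt opt = nlopt_create(NLOPT_AUGLAG, 2);
    nlopt_set_local_optimizer(opt, local);              /* contents are copied */
    nlopt_set_min_objective(opt, f, NULL);
    nlopt_add_equality_constraint(opt, h, NULL, 1e-8);
    nlopt_set_xtol_rel(opt, 1e-8);

    double x[2] = { 0.0, 0.0 }, minf;
    nlopt_optimize(opt, x, &minf);
    nlopt_destroy(local);
    nlopt_destroy(opt);
    return 0;
}
```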

6 Termination conditions

  • For any optimization algorithm, one must supply some termination conditions specifying when the algorithm stops. Ideally, the algorithm should halt when the optimum has been found to within some desired tolerance. In practice, however, because the true optimum is not known in advance, one uses a heuristic estimate of the error in the solution rather than the actual error.
  • NLopt gives the user a choice of several different termination conditions. You do not need to specify all of these conditions for any given problem; you should only set the ones you want. NLopt will terminate when the first one of the specified termination conditions is met (i.e., the weakest condition you specify is what matters).
  • The termination conditions supported by NLopt are as follows:

6.1 Function value and parameter tolerances

  • First, you can specify a fractional tolerance ftol_rel and an absolute tolerance ftol_abs on the function values. Ideally, these would be the maximum fractional and absolute error compared to the exact minimum function value, but that is impossible because the minimum is not known. Instead, most algorithms implement these as a tolerance on the decrease Δf in the function value from one iteration to the next (or something similar): the algorithm stops if |Δf|/|f| is less than ftol_rel, or if |Δf| is less than ftol_abs.
  • Similarly, you can specify a fractional tolerance xtol_rel and absolute tolerances xtol_absi on the parameters x. Again, it is impossible to compare the actual error Δx against the (unknown) minimum point, so in practice Δx usually measures the change in x from one iteration to the next, or the diameter of the search region, or something like that. The algorithm then stops if |Δxi| < xtol_absi or if |Δxi|/|xi| < xtol_rel.
  • Note: generally, you can only ask for about half as many decimal places in the xtol as in the ftol. The reason is that, near the minimum, $\Delta f \approx f''(\Delta x)^2/2$ from the Taylor expansion, and so (assuming $f'' \approx 1$ for simplicity) a change in x by $10^{-7}$ gives a change in f by around $10^{-14}$. In particular, this means that it is generally hopeless to request an xtol_rel much smaller than the square root of machine precision.

In most cases, the fractional tolerance (tol_rel) is the most useful one to specify, because it is independent of any absolute scale factors or units. Absolute tolerance (tol_abs) is mainly useful if you think that the minimum function value or parameters might occur at or close to zero.

If you don’t want to use a particular tolerance termination, you can just set that tolerance to zero and it will be ignored.
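
As a concrete sketch, here is how these tolerances are set on an existing nlopt_opt handle in the C API; the values, and the assumption that n = 2 for the xtol_abs array, are illustrative:

```c
#include <nlopt.h>

/* Sketch: typical tolerance settings; a tolerance set to zero is ignored. */
void set_tolerances(nlopt_opt opt)
{
    nlopt_set_ftol_rel(opt, 1e-8);        /* stop when |df|/|f| < 1e-8 */
    nlopt_set_ftol_abs(opt, 0.0);         /* zero disables this test */
    nlopt_set_xtol_rel(opt, 1e-4);        /* note: roughly sqrt(ftol_rel) */
    double xtol_abs[2] = { 1e-6, 1e-6 };  /* per-parameter absolute tolerances */
    nlopt_set_xtol_abs(opt, xtol_abs);
}
```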

6.2 Stopping function value

Another termination test that NLopt supports is that you can tell the optimization to stop when the objective function value f(x) reaches some specified value, stopval, for any feasible point x.

This termination test is especially useful when comparing algorithms for a given problem. After running one algorithm for a long time to find the minimum to the desired accuracy, you can ask how long other algorithms take to obtain the optimum to the same accuracy, or to some better accuracy.
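
In the C API this is a single call on an existing handle; the target value 0.5 below is an arbitrary illustration:

```c
#include <nlopt.h>

/* Sketch: stop as soon as any feasible point with f(x) <= 0.5 is found. */
void set_target(nlopt_opt opt)
{
    nlopt_set_stopval(opt, 0.5);
}
```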

6.3 Bounds on function evaluations and wall-clock time

Finally, one can also set a termination condition by specifying a maximum number of function evaluations (maxeval) or a maximum wall-clock time (maxtime). That is, the optimization terminates when the number of function evaluations reaches maxeval, or when the total elapsed time exceeds some specified maxtime.

These termination conditions are useful if you want to ensure that the algorithm gives you some answer in a reasonable amount of time, even if it is not absolutely optimal, and are also useful ways to control global optimization.

Note that these are only rough maximums; a given algorithm may exceed the specified maximum time or number of function evaluations slightly.
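
A sketch of both limits in the C API; the particular values are illustrative:

```c
#include <nlopt.h>

/* Sketch: cap the work done by the optimization. Both limits are rough and
   may be slightly exceeded, as noted above. */
void set_budget(nlopt_opt opt)
{
    nlopt_set_maxeval(opt, 10000);  /* at most ~10000 function evaluations */
    nlopt_set_maxtime(opt, 60.0);   /* at most ~60 seconds of wall-clock time */
}
```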

6.4 Termination tests for global optimization

In general, deciding when to terminate a global optimization algorithm is a rather difficult problem, because there is no way to be certain (without special information about a particular f) that you have truly reached the global minimum, or even come close. You never know when there might be a much smaller value of the objective function lurking in some tiny corner of the feasible region.

Because of this, the most reasonable termination criterion for global optimization problems seems to be setting bounds on the run time. That is, set an upper bound on how long you are willing to wait for an answer, and use that as the maximum run time. Another strategy is to start with a shorter run time, and repeatedly double the run time until the answer stops changing to your satisfaction. (Although there can be no guarantee that increasing the time further won’t lead to a much better answer, there’s not much you can do about it.)

I would advise you not to use function-value (ftol) or parameter tolerances (xtol) in global optimization. I made a half-hearted attempt to implement these tests in the various global-optimization algorithms, but it doesn’t seem like there is any really satisfactory way to go about this, and I can’t claim that my choices were especially compelling.

For the MLSL algorithm, you need to set the ftol and xtol parameters of the local optimization algorithm; these control the tolerances of the local searches, not of the global search. You should definitely set these, lest the algorithm spend an excessive amount of time trying to run local searches to machine precision (see the sketch below).
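
A minimal sketch of this setup in the C API; the algorithm choices, tolerances, and time limit are illustrative assumptions:

```c
#include <nlopt.h>

/* Sketch: with MLSL, the tolerances belong to the *local* optimizer object,
   while the global search is bounded by, e.g., a time limit. */
void setup_mlsl(unsigned n)
{
    nlopt_opt local = nlopt_create(NLOPT_LD_LBFGS, n);
    nlopt_set_ftol_rel(local, 1e-8);   /* tolerances for each local search */
    nlopt_set_xtol_rel(local, 1e-4);

    nlopt_opt opt = nlopt_create(NLOPT_G_MLSL_LDS, n);
    nlopt_set_local_optimizer(opt, local);   /* contents are copied */
    nlopt_set_maxtime(opt, 60.0);            /* bound the global search */
    /* ... objective, bound constraints, nlopt_optimize ... */
    nlopt_destroy(local);
    nlopt_destroy(opt);
}
```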

7 Background and goals of NLopt

NLopt was started because some of the students in our group needed to use an optimization algorithm for a nonlinear problem, but it wasn’t clear which algorithm would work best (or work at all). One student started by downloading one implementation from the Web, figuring out how to plug it into her Matlab program, getting it to work, only to find that it didn’t converge very quickly so she needed another one, and so on… Then another student went through the same process, only his program was in C and he needed to get the algorithms to work with that language, and he obtained a different set of algorithms. It quickly became apparent that the duplication of effort was untenable, and the considerable labor required to decipher each new subroutine, figure out how to build it, figure out how to bridge the gap from one language (e.g. Fortran) to another (e.g. Matlab or C), and so on, was so substantial that it was hard to justify trying more than one or two. Even though the first two algorithms tried might converge poorly, or might be severely limited in the types of constraints they could handle, or have other limitations, effective experimentation was impractical.

Instead, since I had some experience in wrapping C and Fortran routines and making them callable from C and Matlab and other languages, it made sense to put together a common wrapper interface for a few of the more promising of the free/open-source subroutines I could find online. Soon, it became clear that I wanted at least one decent algorithm in each major category (constrained/unconstrained, global/local, gradient-based/derivative-free, bound/nonlinear constraints), but there wasn’t always free code available. Reading the literature turned up tantalizing hints of algorithms that were claimed to be very powerful, but again had no free code. And some algorithms had free code, but only in a language like Matlab that was impractical to use in stand-alone fashion from C. So, in addition to wrapping existing code, I began to write my own implementations of various algorithms that struck my interest or seemed to fill a need.

Initially, my plan was to handle only bound constraints, and leave general nonlinear constraints to others—who needs such things? That attitude lasted until we found that we needed to solve a 10,000-dimensional minimax-type problem, which seemed intractable unless gradient-based algorithms could be brought to bear…as discussed above, this requires nonlinear constraints to make the problem differentiable. After some reading, I came across the MMA algorithm, which turned out to be easy to implement (300 lines of C), and worked beautifully (at least for my problem), so I expanded the NLopt interface to support nonlinear constraints.

Overall, I’ve found that this has been surprisingly fun. Every once in a while, I come across a new algorithm to try, and now that I’ve implemented a few algorithms and built up a certain amount of infrastructure, it is relatively easy to add new ones (much more so than when I first started out). So, I expect that NLopt will continue to grow, albeit perhaps more slowly now that it seems to include decent algorithms for a wide variety of problems.
