[Book] The No Free Lunch Theorem


No Free Lunch Theorem: A Review


Abstract: The “No Free Lunch” theorem states that, averaged over all optimization problems, without re-sampling, all optimization algorithms perform equally well.

Optimization, search, and supervised learning are the areas that have benefited most from this important theoretical concept. The formulation of the initial No Free Lunch theorem soon gave rise to a number of research works which resulted in a suite of theorems that define an entire research field, with significant results in other scientific areas where successfully exploring a search space is an essential and critical task. The objective of this paper is to go through the main research efforts that contributed to this research field, reveal the main issues, and disclose those points that are helpful in understanding the hypotheses, the restrictions, or even the inability to apply No Free Lunch theorems.


1 Introduction


        Optimization problems occurring in various fields of science, computing, and engineering depend on the number of parameters, the size of the solution space and, mainly, on the objective function, whose definition is critical as it largely determines the level of difficulty of the problem. Hence, defining and solving an optimization problem is sometimes an extremely difficult and demanding task. Researchers from various fields have been involved in solving optimization problems, either because this constitutes part of their main research or because the problem they face can be recast as an optimization problem. The research efforts on this matter have permitted the elaboration of numerous methods and techniques, built on solid mathematical concepts, whose application has produced significantly good results.


        However, contrary to any claim otherwise, none of these methods has proven successful on all types of problems to which it has been applied. This observation was the subject of important theoretical work carried out by David Wolpert, which gave rise to the well-known No Free Lunch (NFL) theorem. Briefly, the NFL theorem states that: “averaged over all optimization problems, without re-sampling, all optimization algorithms perform equally well.” Besides optimization, the NFL theorem has been successfully used to tackle important theoretical issues pertaining to supervised learning in machine learning systems. In fact, the NFL theorem has grown into a suite of theorems that has produced significant results in various scientific fields where searching for some optimal solution is an important issue.

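For reference, the statement quoted above corresponds to the formal result of Wolpert and Macready (1997). In their notation, where d_m^y denotes the sequence of m cost values an algorithm has sampled, it reads:

```latex
% NFL theorem for optimization (Wolpert & Macready, 1997):
% for any pair of algorithms a_1 and a_2, summing over all
% objective functions f : X -> Y,
\sum_{f} P(d_m^y \mid f, m, a_1) = \sum_{f} P(d_m^y \mid f, m, a_2)
```

That is, once performance is averaged uniformly over every possible objective function, the choice of (non-resampling) algorithm makes no difference.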

        The NFL theorems constitute an important theoretical development which marked out the limits of successful application for a number of search, optimization, and supervised learning algorithms. At the same time, the formulation of these theorems has provoked controversial discussions [4, 36, 44, 45] regarding the possibility of inventing and effectively using general-purpose algorithms in various fields where only a limited view of the real-world problem exists.


        In this paper we aim to review the most sound research work published by several researchers on this matter, including its impact on the most important fields, that is, optimization and supervised learning. Other fields of interest, such as user interface design [24] and network calculus [8], are worthy of mention but out of the scope of this review. The emphasis of this review will be, mainly, on the critical questions which promoted the development of NFL theorems, as well as on the issues that proved to be important: namely for (a) optimization, (b) searching, and (c) supervised learning.


        The rest of this paper is structured as follows. Section 2 provides a review of the early concepts and constructs that underpinned the definition of the NFL theorems. Section 3 covers Wolpert's main research efforts establishing NFL for optimization and search. In Section 4 we survey the more recent work of Wolpert, which clarifies older concepts while offering some new results in this field. Next, Section 5 is dedicated to the main research carried out by several researchers on NFL for optimization and evolutionary algorithms. Part of the research surveyed concerns the cases where NFL theorems do not apply and researchers have proved the existence of “Free Lunches.” In Section 6 we describe the main research efforts on NFL theorems for supervised learning. The paper ends in Section 7 with a synopsis and some concluding remarks.


2 Early Developments


        As noted by David Wolpert [56], the first attempt to underline the limits of inductive inference was made by the Scottish philosopher David Hume in 1740 in his seminal work “A Treatise of Human Nature” [26, 27]. Hume wrote that:


        Even after the observation of the frequent conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience.


 In the machine learning context this can be stated as follows:


        Without injecting a priori information about the real world, it is not reasonable to believe that the generalization error of a classifier-generalizer on test data drawn from outside the training set correlates with its performance on the training set itself.


        Wolpert based his theoretical work on earlier developments elaborated in his paper “On the connection between in-sample testing and generalization error” [55].

        In this paper the generalization error is taken to be the off-training-set (OTS) error, and the question addressed concerns its correlation with the error produced using in-sample testing. Moreover, Wolpert tackles the question of how “. . . to take into account the probability distribution of target functions in the real world,” since any theory of generalization that does not tackle this problem is irrelevant to its applicability on real-world problems. Some, but not all, of the important issues arising in this paper are:


(a) “Can one prove inductive inference from first principles?” In other words, given the performance of a learning algorithm on the training data set, is it possible to obtain information on its ability to provide an exact representation of the target function for examples outside the data set?

(b) If one cannot answer the previous question, then what assumptions on the distribution of real-world data (the target function) can help with generalization for training algorithms, such as back-propagation, which aim to minimize the error on the training data?

(c) Is there a mathematical basis for estimating when over-training occurs, so that the learning algorithm can be modified to bound the effects of such over-training?

(d) Is it possible to express in mathematical terms the ability of a training set to faithfully represent the distribution over the entire data space?

(e) What are the hypotheses under which non-parametric statistical techniques such as cross-validation, which are designed to choose between learning algorithms, succeed in diminishing the generalization error? (A minimal sketch of this selection procedure follows the list.)

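To make question (e) concrete, here is a minimal sketch of choosing between two learning algorithms by cross-validation; the dataset, the two candidate models, and the 5-fold setup are illustrative assumptions, not taken from the reviewed paper.

```python
# Minimal sketch: choosing between two learners by cross-validation.
# The dataset, candidate models, and fold count are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

candidates = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    print(f"{name}: mean accuracy = {scores.mean():.3f}")

# NFL caveat: the winner reflects how well each learner's bias matches
# THIS data distribution; averaged over all target functions,
# cross-validation itself enjoys no a priori advantage.
```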

        In addressing these matters, the formalism proposed seems to extend the classical Bayesian formalism using the hypothesis function, i.e., the distribution of the data set as learned by the generalizer. The mathematical formalism adopted proposes a way to measure the degree to which the distribution derived by the learning algorithm matches the distribution of the training data, and it can be used to tackle various generalization issues such as over-training and the minimum number of parameters for the model. From another point of view, this formalism aims to express in mathematical terms the assumptions made by a generalizer, so that the model used best fits the training set representing the real world. As a result, the elaboration of important theoretical proofs provides a solid basis for tackling several issues in machine learning and gives rise to the development of concepts such as the NFL theorems.


The first and foremost contributions of Wolpert concerning NFL theorems were presented in the papers [56, 57]. In this set of two papers, namely: (i) “The lack of a priori distinctions between learning algorithms” and (ii) “The existence of a priori distinctions between learning algorithms,”


Wolpert develops his theory and formulates the NFL theorems. In the former, he discusses the hypothesis that, given any two learning algorithms, one cannot claim to have any prior information that these algorithms are distinct as far as their performance on a specific class of problems is concerned. In the latter paper, Wolpert unfolds the arguments concerning the inverse assumption, i.e., that there are prior distinctions regarding the performance of any two algorithms. These two papers deal with supervised learning, but the theoretical constructs were applied to multiple domains where two different algorithms compete over which performs better for a class of problems and associated error functions.


        Focusing on supervised learning, the first of the previously mentioned papers defines the concept of the “off-training set” (OTS) and proposes the associated performance measure for a supervised learning algorithm. The mathematical formalism used is based on the so-called extended Bayesian formalism and is refined in order to take into account the generalization error, the cost function, and their relation to the learning algorithm, while providing the necessary hypotheses for the training sets and the targets. In the sequel, the probability of some cost “c” of the learning algorithm associated with the loss function is proposed as follows:

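The excerpt breaks off before the formula itself. What follows is a hedged reconstruction from Wolpert's extended Bayesian formalism, not a verbatim quotation from [56]: the quantity of interest decomposes over the hypotheses h produced by the learning algorithm and the target functions f,

```latex
% Hedged reconstruction (extended Bayesian formalism), assuming h and f
% are conditionally independent given the training set d:
P(c \mid d) = \sum_{h,\, f} P(c \mid h, f, d)\, P(h \mid d)\, P(f \mid d)
% P(h | d) encodes the learning algorithm, P(f | d) the posterior over
% targets; c is determined by the loss between h and f on OTS points.
```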

3 No Free Lunch for Optimization and Search


        Another direction of research for applying the ideas of the NFL theorems, as presented above, concerns the domain of optimization. The work “No free lunch theorems for optimization” [62], published by Wolpert and Macready, deals with this matter, building on two technical reports produced by the authors at the Santa Fe Institute. The first technical report [35], entitled “What makes an optimization problem hard?”, raises the question: “Are some classes of combinatorial optimization problems intrinsically harder than others, without regard to the algorithm one uses, or can difficulty be assessed only relative to a particular algorithm?” The second technical report [61], entitled “No free lunch theorems for search,” focuses on proving that all algorithms searching for an optimum of an optimization problem, i.e., an extremum of an objective function, perform exactly the same, no matter which performance measure is used, when taking the average over all possible objective functions.


        The work of Wolpert and Macready, “No free lunch theorems for optimization” [62], sets up a formalism for investigating the relation between the effectiveness of optimization algorithms and the problems they solve. The NFL theorems developed in the paper establish that the successful performance of any optimization algorithm on one class of problems is counterbalanced by its degraded performance on another class of problems. A geometric interpretation is also provided of what it means for an algorithm to be well matched to a given optimization problem.

        Moreover, as mentioned in the previous technical reports, the authors examine applications of NFL theorems to information-theoretic aspects of optimization, as well as to defining measures of performance for optimization benchmarks.


Given the multitude of black-box optimization techniques available, the authors try to provide the formalism for tackling the following problem: “is there a relationship between how well an algorithm performs and the optimization problem on which it is run?” This problem can be cast in several other forms, such as the following (a toy demonstration follows the list):

(a) What are the mathematical constituents of optimization theory one needs to know before deciding on the necessary probability distributions to be applied?

(b) Are information theory and Bayesian analysis suitable for understanding the previous issues?

(c) Given the performance results of a certain algorithm on a certain class of problems, can one provide an a priori generalization of these results to other classes of problems?

(d) Is there a suitable measure of such generalization? Can one evaluate the performance of algorithms on problems in a way that allows those algorithms to be compared?

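As a toy illustration of the NFL result these questions revolve around, the following sketch enumerates every objective function on a four-point domain and shows that two different fixed visit orders, a special case of non-resampling search algorithms, obtain identical average performance. The domain size, cost values, and visit orders are illustrative assumptions, not taken from the paper.

```python
# Toy NFL demonstration: averaged over ALL objective functions on a tiny
# finite domain, two distinct non-resampling search strategies perform
# identically. Domain size, cost values, and visit orders are assumptions.
from itertools import product

X = range(4)                      # search space of 4 points
COST_VALUES = (0, 1)              # binary objective values

def best_after(order, f, m=2):
    """Best (minimum) cost found after m distinct evaluations."""
    return min(f[x] for x in order[:m])

order_a = [0, 1, 2, 3]            # algorithm A: left-to-right sweep
order_b = [3, 1, 0, 2]            # algorithm B: some other fixed order

# Enumerate all |Y|^|X| = 16 objective functions f : X -> Y.
all_functions = [dict(zip(X, ys)) for ys in product(COST_VALUES, repeat=4)]

avg_a = sum(best_after(order_a, f) for f in all_functions) / len(all_functions)
avg_b = sum(best_after(order_b, f) for f in all_functions) / len(all_functions)

print(avg_a, avg_b)               # both 0.25: neither is better on average
```

Any apparent superiority of one strategy therefore has to come from restricting the class of objective functions considered, which is exactly the point the NFL theorems make.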

 
