[Blog Reading / Translation] A Visual Guide to Evolution Strategies

This post explains how evolution strategies (ES) work through intuitive examples and contrasts the strengths of ES and backpropagation on hard problems. It introduces the simple evolution strategy, genetic algorithms, and CMA-ES, and discusses the role of fitness shaping in optimisation. It also looks at how ES performs on the MNIST image classification task, and compares performance on a 100-dimensional Rastrigin function optimisation problem.

A Visual Guide to Evolution Strategies

Brief

Blog post: link

[Figure: "Survival of the fittest."]

This opening figure alone is great fun!

The main text begins below:

In this post I explain how evolution strategies (ES) work with the aid of a few intuitive examples. I try to keep the equations light, and I provide links to the original articles for readers who want more detail. This is the first post in a series, in which I plan to show how to apply these algorithms to a range of tasks, from MNIST and OpenAI Gym to Roboschool and PyBullet environments.

Introduction

Neural network models are highly expressive and flexible, and if we are able to find a suitable set of model parameters, we can use neural nets to solve many challenging problems. Deep learning’s success largely comes from the ability to use the backpropagation algorithm to efficiently calculate the gradient of an objective function over each model parameter. With these gradients, we can efficiently search over the parameter space to find a solution that is often good enough for our neural net to accomplish difficult tasks.
However, there are many problems where the backpropagation algorithm cannot be used. For example, in reinforcement learning (RL) problems, we can also train a neural network to make decisions to perform a sequence of actions to accomplish some task in an environment. However, it is not trivial to estimate the gradient of reward signals given to the agent in the future with respect to an action performed by the agent right now, especially if the reward is realised many timesteps in the future. Even if we are able to calculate accurate gradients, there is also the issue of being stuck in a local optimum, of which there are many in RL tasks.
[Figure: stuck in a local optimum]
A whole area within RL is devoted to studying this credit-assignment problem, and great progress has been made in recent years. However, credit assignment is still difficult when the reward signals are sparse. In the real world, rewards can be sparse and noisy. Sometimes we are given just a single reward, like a bonus check at the end of the year, and depending on our employer, it may be difficult to figure out exactly why it is so low. For these problems, rather than rely on a very noisy and possibly meaningless gradient estimate of the future to our policy, we might as well just ignore any gradient information, and attempt to use black-box optimisation techniques such as genetic algorithms (GA) or ES.
OpenAI published a paper called Evolution Strategies as a Scalable Alternative to Reinforcement Learning where they showed that evolution strategies, while being less data efficient than RL, offer many benefits. The ability to abandon gradient calculation allows such algorithms to be evaluated more efficiently. It is also easy to distribute the computation for an ES algorithm to thousands of machines for parallel computation. By running the algorithm from scratch many times, they also showed that policies discovered using ES tend to be more diverse compared to policies discovered by RL algorithms.
I would like to point out that even the problem of identifying a machine learning model, such as designing a neural net's architecture, is one where we cannot directly compute gradients. While RL, evolution, GA, etc. can be applied to search in the space of model architectures, in this post I will focus only on applying these algorithms to search for the parameters of a pre-defined model.

What is an Evolution Strategy?

[Figure: Two-dimensional Rastrigin function has many local optima]

The diagrams below are top-down plots of shifted 2D Schaffer and Rastrigin functions, two of several simple toy problems used for testing continuous black-box optimisation algorithms. Lighter regions of the plots represent higher values of $F(x,y)$. As you can see, there are many local optima in these functions. Our job is to find a set of model parameters $(x,y)$, such that $F(x,y)$ is as close as possible to the global maximum.
[Figure: shifted 2D Schaffer function]
[Figure: shifted 2D Rastrigin function]
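As a concrete reference, here is a minimal sketch of the 2D Rastrigin function (the standard, unshifted form; the plots above use a shifted variant). Since this post treats higher $F(x,y)$ as better, the negated function serves as a fitness to maximise. The function names are illustrative, not from the original post:

import numpy as np

def rastrigin_2d(x, y, A=10.0):
    # standard 2D Rastrigin: global minimum of 0 at (0, 0), surrounded
    # by a regular grid of local minima created by the cosine terms
    return 2 * A + (x ** 2 - A * np.cos(2 * np.pi * x)) \
                 + (y ** 2 - A * np.cos(2 * np.pi * y))

def fitness(x, y):
    # negate so the global optimum becomes a maximum, matching the
    # convention in this post (lighter regions = higher F(x, y))
    return -rastrigin_2d(x, y)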
Although there are many definitions of evolution strategies, we can define an evolution strategy as an algorithm that provides the user a set of candidate solutions to evaluate a problem. The evaluation is based on an objective function that takes a given solution and returns a single fitness value. Based on the fitness results of the current solutions, the algorithm will then produce the next generation of candidate solutions that is more likely to produce even better results than the current generation. The iterative process will stop once the best known solution is satisfactory for the user.
Given an evolution strategy algorithm called EvolutionStrategy, we can use it in the following way:

import numpy as np

# EvolutionStrategy, evaluate(), and MY_REQUIRED_FITNESS are assumed to
# be defined elsewhere; this loop shows only the generic ask/tell interface.
solver = EvolutionStrategy()

while True:

  # ask the ES to give us a set of candidate solutions
  solutions = solver.ask()

  # create an array to hold the fitness results.
  fitness_list = np.zeros(solver.popsize)

  # evaluate the fitness for each given solution.
  for i in range(solver.popsize):
    fitness_list[i] = evaluate(solutions[i])

  # give list of fitness results back to ES
  solver.tell(fitness_list)

  # get best parameter, fitness from ES
  best_solution, best_fitness = solver.result()

  if best_fitness > MY_REQUIRED_FITNESS:
    break

Although the size of the population is usually held constant for each generation, it doesn't need to be. The ES can generate as many candidate solutions as we want, because the solutions produced by an ES are sampled from a distribution whose parameters are being updated by the ES at each generation. I will explain this sampling process with an example of a simple evolution strategy.

Simple Evolution Strategy

One of the simplest evolution strategies we can imagine will just sample a set of solutions from a Normal distribution, with a mean $\mu$ and a fixed standard deviation $\sigma$. In our 2D problem, $\mu=(\mu_x,\mu_y)$ and $\sigma=(\sigma_x,\sigma_y)$. Initially, $\mu$ is set at the origin. After the fitness results are evaluated, we set $\mu$ to the best solution in the population, and sample the next generation of solutions around this new mean. This is how the algorithm behaves over 20 generations on the two problems mentioned earlier:
[Figure: simple ES on the shifted 2D Schaffer function]
[Figure: simple ES on the shifted 2D Rastrigin function]
In the visualisation above, the green dot indicates the mean of the distribution at each generation, the blue dots are the sampled solutions, and the red dot is the best solution found so far by our algorithm.
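A minimal sketch of this simple ES, assuming a 2D evaluate function to maximise (e.g. lambda s: fitness(s[0], s[1]) with the negated Rastrigin above); the population size and $\sigma$ are illustrative choices:

import numpy as np

def simple_es(evaluate, num_generations=20, popsize=50, sigma=1.0):
    # keep a single mean; sample around it, then jump to the best sample
    mu = np.zeros(2)  # the mean starts at the origin
    best_solution, best_fitness = mu, -np.inf
    for _ in range(num_generations):
        # sample a population from an isotropic Normal N(mu, sigma^2 I)
        solutions = mu + sigma * np.random.randn(popsize, 2)
        fitness = np.array([evaluate(s) for s in solutions])
        mu = solutions[np.argmax(fitness)]  # greedy: keep only the best
        if fitness.max() > best_fitness:
            best_solution, best_fitness = mu.copy(), fitness.max()
    return best_solution, best_fitness

# example: best, f = simple_es(lambda s: fitness(s[0], s[1]))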
This simple algorithm will generally only work for simple problems. Given its greedy nature, it throws away all but the best solution, and is prone to getting stuck at a local optimum on more complicated problems. It would be beneficial to sample the next generation from a probability distribution that represents a more diverse set of ideas, rather than just from the best solution of the current generation.

Simple Genetic Algorithm

One of the oldest black-box optimisation algorithms is the genetic algorithm. There are many variations, with many degrees of sophistication, but I will illustrate only the simplest version here.
The idea is quite simple: keep only the 10% best performing solutions in the current generation, and let the rest of the population die. In the next generation, a new solution is sampled by randomly selecting two solutions from the survivors of the previous generation and recombining their parameters to form a new solution. This crossover recombination process uses a coin toss to determine which parent to take each parameter from. In the case of our 2D toy function, our new solution might inherit $x$ or $y$ from either parent with 50% chance. Gaussian noise with a fixed standard deviation will also be injected into each new solution after this recombination process.
[Figure: simple GA on the shifted 2D Schaffer function]
[Figure: simple GA on the shifted 2D Rastrigin function]
The figure above illustrates how the simple genetic algorithm works. The green dots represent members of the elite population from the previous generation, the blue dots are the offspring forming the set of candidate solutions, and the red dot is the best solution.
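A minimal sketch of this simple GA; the 10% elite fraction and the per-parameter coin toss follow the description above, while the population size and noise level are illustrative choices:

import numpy as np

def simple_ga(evaluate, num_generations=20, popsize=100,
              elite_frac=0.1, sigma=0.5):
    population = np.random.randn(popsize, 2)
    for _ in range(num_generations):
        fitness = np.array([evaluate(s) for s in population])
        # keep only the top 10%; the rest of the population dies
        n_elite = int(popsize * elite_frac)
        elite = population[np.argsort(fitness)[-n_elite:]]
        children = []
        for _ in range(popsize):
            # pick two random parents from the survivors
            a, b = elite[np.random.randint(n_elite, size=2)]
            mask = np.random.rand(2) < 0.5  # coin toss per parameter
            # crossover, then inject fixed-sigma Gaussian noise
            child = np.where(mask, a, b) + sigma * np.random.randn(2)
            children.append(child)
        population = np.array(children)
    fitness = np.array([evaluate(s) for s in population])
    return population[np.argmax(fitness)], fitness.max()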
Genetic algorithms help diversity by keeping track of a diverse set of candidate solutions to reproduce the next generation. However, in practice, most of the solutions in the elite surviving population tend to converge to a local optimum over time. There are more sophisticated variations of GA out there, such as CoSyNe, ESP, and NEAT, where the idea is to cluster similar solutions in the population together into different species, to maintain better diversity over time.

Covariance-Matrix Adaptation Evolution Strategy (CMA-ES)

A shortcoming of both the simple ES and the simple GA is that our standard deviation noise parameter is fixed. There are times when we want to explore more and increase the standard deviation of our search space, and there are times when we are confident we are close to a good optimum and just want to fine-tune the solution. We basically want our search process to behave like this:
[Figure: CMA-ES on the shifted 2D Schaffer function]
[Figure: CMA-ES on the shifted 2D Rastrigin function]
Amazing, isn't it? The search process shown in the figure above is produced by the Covariance-Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ES is an algorithm that can take the results of each generation and adaptively increase or decrease the search space for the next generation. It will not only adapt the mean $\mu$ and $\sigma$ parameters, but will calculate the entire covariance matrix of the parameter space. At each generation, CMA-ES provides the parameters of a multi-variate normal distribution to sample solutions from. So how does it know how to increase or decrease the search space?
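Before unpacking the methodology, it is worth noting that in practice one rarely implements CMA-ES from scratch. A common choice (not the author's code) is the pycma package, which exposes the same ask/tell interface as the loop earlier in this post; note that pycma minimises by default, so a fitness we want to maximise must be negated. The fitness function here is the illustrative negated Rastrigin from the sketch above:

import cma  # pip install cma

# start the search at the origin with an initial step size of 0.5
es = cma.CMAEvolutionStrategy([0.0, 0.0], 0.5, {'popsize': 25})

while not es.stop():
    solutions = es.ask()  # candidate (x, y) points
    # pycma minimises, so negate the fitness we want to maximise
    es.tell(solutions, [-fitness(x, y) for (x, y) in solutions])

best_solution = es.result.xbest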
Before we discuss its methodology, let's review how to estimate a covariance matrix. This will be important for understanding CMA-ES's methodology later on. If we want to estimate the covariance matrix of our entire sampled population of size $N$, we can do so using the set of equations below to calculate the maximum likelihood estimate of a covariance matrix $C$. We first calculate the means of each of the $x_i$ and $y_i$ in our population:
$$\mu_x=\frac{1}{N}\sum_{i=1}^N x_i,\qquad \mu_y=\frac{1}{N}\sum_{i=1}^N y_i$$
The terms of the $2\times2$ covariance matrix $C$ will be:
$$\sigma_x^2=\frac{1}{N}\sum_{i=1}^N (x_i-\mu_x)^2,\qquad \sigma_y^2=\frac{1}{N}\sum_{i=1}^N (y_i-\mu_y)^2,\qquad \sigma_{xy}=\frac{1}{N}\sum_{i=1}^N (x_i-\mu_x)(y_i-\mu_y)$$
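These maximum likelihood estimates are easy to check numerically; a small sketch (note that np.cov uses the N-1 divisor by default, so bias=True is needed to get the 1/N form above):

import numpy as np

samples = np.random.randn(1000, 2)  # N samples of (x, y)
mu = samples.mean(axis=0)           # (mu_x, mu_y)
diff = samples - mu
C = diff.T @ diff / len(samples)    # 2x2 MLE covariance matrix

# the same estimate via numpy:
assert np.allclose(C, np.cov(samples.T, bias=True))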
