Optimization in Machine Learning, Part 1

Optimization is one of the most important steps in Machine Learning, and possibly the hardest to learn. The optimizer is a function that optimizes a Machine Learning model using training data. It uses a Loss Function to calculate the loss of the model and then, based on that, tries to optimize it. So without an optimizer, a Machine Learning model can’t do anything amazing.

In this blog, my aim is to explain how optimization works, the logic behind it, and the math behind it. I’ll keep code to a few tiny illustrative sketches. Continue only if you’re looking for a mathematical/logical explanation.

This is the first part of a series of blogs on Optimization in Machine Learning. In this blog, I’ll explain optimization in an ultra-simple way with a stupid example. This is specifically helpful for absolute beginners who have no idea how optimization works.

As I mentioned earlier, the optimizer uses a Loss Function to calculate the loss of the model, and then, based on that, the optimizer updates the model to achieve a better score, so let’s understand the Loss Function first.

So what the hell is a Loss Function?

A Loss Function (also known as an Error Function, Cost Function, or Energy Function) is a function that calculates how good/bad a Machine Learning model is. So if you train a model, you can use a loss function to calculate its error rate. If the error is 0, then your model is perfect.

In real-world projects, it is impossible to achieve an error of 0, so the aim is always to get as close to 0 as possible.

How to calculate it?

There are several Loss Functions that can be used to calculate the loss of a model. As this blog is about optimizers, I don't want to spend too much time on Loss Functions, but basically a Loss Function takes the values predicted by the model and the actual values for the same inputs, and then performs some calculation to find the error rate.

Mean Squared Error

One popular way to calculate the error rate of the model is called MSE or Mean Squared Error. In Mean Squared Error, we calculate the mean of the squared differences between the predicted values and the actual values over all inputs. The mathematical formula is given below.

The formula for Mean Squared Error:

MSE = (1/n) · Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

where yᵢ is the actual value, ŷᵢ is the predicted value for input i, and n is the number of inputs.

MSE outputs a single number. The lowest value we can achieve is 0; the output is always greater than or equal to zero.
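To make the formula concrete, here is a minimal sketch of MSE in plain Python (the toy values are made up for the example):

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    n = len(actual)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n

# Toy data: the closer the predictions are to the actual values, the lower the loss.
actual = [3.0, 5.0, 2.5]
predicted = [2.8, 5.4, 2.1]
print(mean_squared_error(actual, predicted))  # ≈ 0.12, close to the perfect 0
```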

Now that we have some understanding of Loss Functions, let’s jump into optimization.

Optimization Begins

Once we have the Loss Function, we have a measure of how good/bad a model is, and the optimizer can now reduce the error rate reported by the Loss Function, and hence optimize the model.

This part of the blog is heavily inspired by a YouTube video by Brandon Rohrer. Please check out his video on YouTube for a better explanation.

Let’s introduce a refreshing problem

Let’s say we’re making tea. The recipe for making tea is very simple: we just need to boil some water, add some tea, some milk, and some sugar, and that’s it, we have tea.

Now let’s focus on the sugar. If we add too much sugar to the tea, it tastes bad; on the other hand, if we add very little sugar, it also tastes bad. Only if we add the perfect amount of sugar does it taste just right👌.

Based on this, if we plot a graph where the X-axis represents the amount of sugar and the Y-axis represents how bad the tea tastes, it looks something like this. The lowest point on the graph is the sweet spot.

[Figure] The X-axis represents the amount of sugar and the Y-axis represents how bad the taste is
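To make the later steps easy to follow, here is a hypothetical version of that curve as code; the quadratic shape and the sweet spot at 2.0 teaspoons are pure assumptions for illustration:

```python
def taste_badness(sugar):
    """Hypothetical loss curve: badness grows as we move away from the sweet spot.
    The sweet spot at 2.0 teaspoons is an assumption for this example."""
    return (sugar - 2.0) ** 2

print(taste_badness(0.5))  # too little sugar -> bad taste (2.25)
print(taste_badness(2.0))  # the sweet spot -> perfect (0.0)
print(taste_badness(4.0))  # too much sugar -> bad taste (4.0)
```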

Now, as I’m stupid, I’m not going to ask my mom how much sugar to add to tea; instead, I decided to write a Machine Learning based solution that can tell me how much sugar to add for that “perfect tea”.

Let’s optimize it

Here the goal is basically to find the amount of sugar needed to make the perfect tea. So the job of the optimizer is to find the optimal amount of sugar for the “perfect tea”.

There are different ways to optimize a problem. The first, and possibly the most inefficient, is something called “Exhaustive Search”.

So in the Exhaustive Search algorithm, we basically just look for the lowest point on the graph.

However, in real-world situations, we don’t have a graph like that. That means we need some data, or in other words, I need to make a lot of tea, measure the amount of sugar I added, ask my virtual girlfriend😭 to taste it, and then store her feedback.

For real-world problems, this is where Loss Functions come in. As I explained earlier, a Loss Function tells how good/bad our model is. So in this case, it can tell how good/bad the tea is.

So after doing this hundreds of times, I have the amount of sugar in one column and the feedback (some type of score for the tea; for some weird reason, a higher score means the tea is awful) in the second column.

Once I have the data, the Exhaustive Search algorithm can scan through it and find the optimal solution, or in simple words, how much sugar to add to the tea.

The Exhaustive Search algorithm is very simple and robust, but on the other hand it is computationally very expensive, as we have to check all possible solutions to find the optimal one. The complexity is so high that for many real-world problems we just cannot use this algorithm.
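As a sketch, here is what Exhaustive Search over the collected data could look like; the (sugar, score) pairs are made up, and remember that a higher score means worse tea:

```python
# Made-up measurements: (amount of sugar in teaspoons, badness score from feedback).
data = [
    (0.5, 8.2), (1.0, 5.1), (1.5, 2.3),
    (2.0, 0.4), (2.5, 1.9), (3.0, 4.8),
]

# Scan every recorded trial and keep the one with the lowest badness score.
best_sugar, best_score = min(data, key=lambda pair: pair[1])
print(best_sugar, best_score)  # 2.0 0.4 -> the sweet spot in this made-up data
```

With a single variable this scan is trivial, but the number of combinations explodes as more ingredients get involved, which is exactly why the approach breaks down for real-world problems.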

The other possible way to optimize is to use Gradient Descent. Gradient Descent uses partial derivatives to calculate the slope at any point on the curve, and then, based on that, it changes the amount of sugar to find the optimal solution (the best tea).

Currently, Gradient Descent is the technique used most often in the industry, so to understand it properly, let’s dive a little deeper.
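As a quick preview, here is a minimal sketch of the Gradient Descent update rule applied to the hypothetical taste_badness curve from above; the derivative, the starting point, and the learning rate are all assumptions for illustration:

```python
def taste_badness(sugar):
    """Hypothetical loss curve with its minimum (the sweet spot) at 2.0 teaspoons."""
    return (sugar - 2.0) ** 2

def taste_badness_slope(sugar):
    """Derivative of the curve: d/ds (s - 2)^2 = 2 * (s - 2)."""
    return 2.0 * (sugar - 2.0)

sugar = 5.0          # arbitrary starting guess
learning_rate = 0.1  # how big a step we take against the slope

for _ in range(50):
    # Step in the direction opposite to the slope, i.e. downhill.
    sugar -= learning_rate * taste_badness_slope(sugar)

print(round(sugar, 3))  # ~2.0, the sweet spot
```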

If you want to learn Gradient Descent with the mathematical details, please jump to the second part of this blog; here I’ll explain it in an ultra-simple way without too much math.

Gradient Descent in simple words

In the graph above, let’s pick a random point. You can see the point marked on the graph; to make the tea, I’ll add that amount of sugar (remember, the X-axis represents the amount of sugar).

[Figure: a random starting point on the curve] The X-axis represents the amount of sugar and the Y-axis represents how bad the taste is

Once I make the tea, I can ask my virtual girlfriend😭 to taste it, and based on that I can store the feedback.

Now let’s pick a few more random points, but around the old point. Again, I’ll make the tea, add the corresponding amount of sugar, and finally check the quality of the tea.

[Figure: additional points sampled around the first one] The X-axis represents the amount of sugar and the Y-axis represents how bad the taste is

Now, based on the data, I can check which tea is best, and I can infer in which direction, relative to my first selected point, I should move to make better tea.

Clearly, based on the graph, we need to go down (remember, the lowest point is the sweet spot).

Now we can repeat this several times to get very close to that sweet spot and hence make the perfect tea.
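Here is a minimal sketch of that simplified procedure: brew at the current point, brew at a nearby point, and move whenever the nearby tea tastes less bad. It uses the hypothetical taste_badness curve from earlier as a stand-in for my virtual girlfriend’s feedback, and the starting point and sampling range are arbitrary. Note that this is random local search rather than true Gradient Descent, but it captures the “move downhill” idea the graphs above describe:

```python
import random

def taste_badness(sugar):
    """Stand-in for the taste feedback; the sweet spot at 2.0 is an assumption."""
    return (sugar - 2.0) ** 2

sugar = 5.0  # random starting amount of sugar
step = 0.5   # how far around the current point we sample

for _ in range(100):
    # Pick a random nearby point and keep it only if the tea tastes less bad.
    candidate = sugar + random.uniform(-step, step)
    if taste_badness(candidate) < taste_badness(sugar):
        sugar = candidate

print(round(sugar, 2))  # close to 2.0 after enough brews
```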

Happy Ending

The explanation of Gradient Descent here is extremely simple. To make it extremely easy to understand, I’ve skipped several important concepts.

This is part one of the series of blogs on Optimization in Machine Learning. In the next part, I’ll explain Gradient Descent with a real-world example, with the complete derivation and all the important topics like Learning Rate, Momentum, etc.

Currently, there is no second part of this blog; I’m still writing it, so once it’s ready, I will put a link here.

Translated from: https://medium.com/swlh/optimization-in-machine-learning-part-1-e9da1aa1eedf
