Machine Learning Study Notes 1.1.1.6.6: Running Gradient Descent

Let's see what happens when you run gradient descent for linear regression. Let's go see the algorithm in action. Here's a plot of the model and data on the upper left, a contour plot of the cost function on the upper right, and at the bottom, a surface plot of the same cost function. Often w and b will both be initialized to 0, but for this demonstration, let's initialize w = -0.1 and b = 900, so this corresponds to f(x) = -0.1x + 900. Now, if we take one step using gradient descent, we end up going from this point on the cost function out here to this point just down and to the right, and notice that the straight-line fit has also changed a bit. Let's take another step.
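To make that single step concrete, here is a minimal sketch of one gradient descent update for this model. The tiny dataset, the learning rate, and the units (house sizes in 1000s of square feet, prices in $1000s) are made-up assumptions; only the initialization w = -0.1, b = 900 comes from the demonstration above.

```python
import numpy as np

# Hypothetical training set: sizes in 1000s of square feet, prices in $1000s.
x = np.array([1.0, 1.5, 2.0])
y = np.array([200.0, 300.0, 400.0])

w, b = -0.1, 900.0      # the initialization used in this demonstration
alpha = 0.1             # learning rate (an assumed, hand-picked value)

# One gradient descent step on the squared-error cost J(w, b).
err = (w * x + b) - y                          # f(x^(i)) - y^(i) for every example i
dj_dw = (err * x).mean()                       # dJ/dw = (1/m) * sum of err * x
dj_db = err.mean()                             # dJ/db = (1/m) * sum of err
w, b = w - alpha * dj_dw, b - alpha * dj_db    # simultaneous update of w and b
```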

The cost has now moved to this third point, and again the function f(x) has also changed a bit. As you take more of these steps, the cost is decreasing at each update, so the parameters w and b are following this trajectory. And if you look on the left, you get the corresponding straight-line fit that fits the data better and better, until we've reached the global minimum. The global minimum corresponds to this straight-line fit, which is a relatively good fit to the data. I mean, isn't that cool? So that's gradient descent, and we're going to use this to fit a model to the housing data. You can now use this f(x) model to predict the price of your client's house or anyone else's house. For instance, if your friend's house size is 1250 square feet, you can now read off the value and predict that maybe they could get, I don't know, $250,000 for the house.
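Continuing the sketch above with the same hypothetical data and variables, repeating the update step many times traces out the kind of trajectory described here, and the fitted line can then be read off for a 1250-square-foot house:

```python
# Repeat the update until the cost settles; with the made-up data above
# this converges near w = 200, b = 0, i.e. f(x) = 200x.
for _ in range(10000):
    err = (w * x + b) - y                 # prediction errors on all examples
    dj_dw = (err * x).mean()
    dj_db = err.mean()
    w, b = w - alpha * dj_dw, b - alpha * dj_db

# 1250 sq ft = 1.25 in our (assumed) units of 1000s of square feet.
print(f"predicted price: ${1000 * (w * 1.25 + b):,.0f}")   # roughly $250,000
```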

To be more precise, this gradient descent process is called batch gradient descent. The term batch gradient descent refers to the fact that on every step of gradient descent, we're looking at all of the training examples, instead of just a subset of the training data. So in gradient descent, when computing the derivatives, we're computing the sum from i = 1 to m, and batch gradient descent is looking at the entire batch of training examples at each update. I know that batch gradient descent may not be the most intuitive name, but this is what people in the machine learning community call it. If you've heard of the newsletter The Batch, published by DeepLearning.AI, the newsletter was also named for this concept in machine learning. It turns out that there are other versions of gradient descent that do not look at the entire training set, but instead look at smaller subsets of the training data at each update step. But we'll use batch gradient descent for linear regression. So that's it for linear regression.
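As a rough illustration of that distinction (not part of the course material), here is a sketch contrasting the full-batch derivative, which sums over all m examples as above, with a mini-batch variant that sums over a random subset; the function names and the subset size are hypothetical.

```python
import numpy as np

def batch_gradient(w, b, x, y):
    """Derivatives of the squared-error cost using ALL m examples."""
    err = (w * x + b) - y
    return (err * x).mean(), err.mean()

def minibatch_gradient(w, b, x, y, batch_size, rng=np.random.default_rng(0)):
    """Same derivatives, estimated from a random subset of the data."""
    idx = rng.choice(len(x), size=batch_size, replace=False)
    err = (w * x[idx] + b) - y[idx]
    return (err * x[idx]).mean(), err.mean()
```

Swapping minibatch_gradient into the update loop would give a cheaper but noisier step; batch gradient descent, which we use here, trades that cheapness for the exact derivative of the cost at every update.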

Congratulations on getting through your first machine learning model. I hope you go and celebrate, or, I don't know, maybe take a nap in your hammock. In the optional lab that follows this video, you'll see a review of the gradient descent algorithm as well as how to implement it in code. You'll also see a plot that shows how the cost decreases as you continue training for more iterations. And you'll also see a contour plot showing how the cost gets closer to the global minimum as gradient descent finds better and better values for the parameters w and b. So remember that to do the optional lab, you just need to read and run the code. You won't need to write any code yourself, and I hope you take a few moments to do that and become familiar with the gradient descent code, because this will help you to implement this and similar algorithms in the future yourself.

Thanks for sticking with me through the end of this last video for the first week, and congratulations for making it all the way here. You're on your way to becoming a machine learning person. In addition to the optional labs, if you haven't done so yet, I hope you also check out the practice quizzes, which are a nice way for you to double-check your own understanding of the concepts. It's also totally fine if you don't get them all right the first time, and you can take the quizzes multiple times until you get the score that you want. You now know how to implement linear regression with one variable, and that brings us to the close of this week. Next week, we'll learn to make linear regression much more powerful. Instead of one feature like the size of a house, you'll learn how to get it to work with lots of features. You'll also learn how to get it to fit nonlinear curves. These improvements will make the algorithm much more useful and valuable.

Lastly, we'll also go over some practical tips that will really help with getting linear regression to work on practical applications. I'm really happy to have you here with me in this class, and I look forward to seeing you next week.
