Machine Learning Study Notes: 1.1.1.4.6 Visualizing the cost function

预见未来to50

Published 2024-09-18 14:08:29

Category: Machine Learning, Deep Learning (ML/DL). Tags: machine learning study notes

Article link: https://blog.csdn.net/hpdlzu80100/article/details/142331852

In the last video, you saw one visualization of the cost function J of w or J of w, b. Let's look at some further richer visualizations so that you can get an even better intuition about what the cost function is doing. Here is what we've seen so far. There's the model, the model's parameters w and b, the cost function J of w and b, as well as the goal of linear regression, which is to minimize the cost function J of w and b over parameters w and b. In the last video, we had temporarily set b to zero in order to simplify the visualizations.
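The four items recapped above can be written out as equations. The formulas themselves do not appear in this transcript; the block below is a sketch that assumes the squared-error cost with the 1/(2m) scaling, where m (the number of training examples) and the superscript (i) (the index of a training example) are notation assumed here.

```latex
% Model
f_{w,b}(x) = w x + b

% Parameters
w,\; b

% Cost function (assumed: squared error with 1/(2m) scaling)
J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}\big(x^{(i)}\big) - y^{(i)} \right)^{2}

% Goal
\min_{w,\,b} \; J(w,b)
```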

Now, let's go back to the original model with both parameters w and b, without setting b to be equal to 0. Same as last time, we want to get a visual understanding of the model function, f of x, shown here on the left, and how it relates to the cost function J of w, b, shown here on the right. Here's a training set of house sizes and prices. Let's say you pick one possible function of x, like this one. Here, I've set w to 0.06 and b to 50, so f of x is 0.06 times x plus 50. Note that this is not a particularly good model for this training set; it's actually a pretty bad model. It seems to consistently underestimate housing prices. Given these values for w and b, let's look at what the cost function J of w and b may look like. Recall that what we saw last time was the case where you had only w, because we had temporarily set b to zero to simplify things, and we came up with a plot of the cost function that looked like this, as a function of w only. When we had only one parameter, w, the cost function had this U-shaped curve, shaped a bit like a soup bowl. That sounds delicious.
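To make the "pretty bad model" concrete, here is a minimal sketch that evaluates f(x) = 0.06x + 50 and its cost on a small training set. The three data points are made up for illustration (sizes in square feet, prices in thousands of dollars) and are not the lecture's data; the cost is assumed to be the squared-error cost with 1/(2m) scaling, and the names `predict` and `compute_cost` are hypothetical.

```python
# Hypothetical training set (not from the lecture): house sizes in square
# feet and prices in thousands of dollars.
sizes = [1000.0, 1500.0, 2000.0]
prices = [300.0, 400.0, 500.0]


def predict(x, w, b):
    """Linear model f(x) = w * x + b."""
    return w * x + b


def compute_cost(xs, ys, w, b):
    """Assumed squared-error cost J(w, b) = (1 / (2m)) * sum((f(x) - y)^2)."""
    m = len(xs)
    total = sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


w, b = 0.06, 50.0  # the parameter values picked in the lecture
predictions = [predict(x, w, b) for x in sizes]
cost = compute_cost(sizes, prices, w, b)

print("predictions:", predictions)
print("every prediction is below the actual price:",
      all(p < y for p, y in zip(predictions, prices)))
print("J(w, b) =", cost)
```

On this made-up data every prediction falls below the actual price, which is the "consistently underestimates" behavior described above, and the large value of J reflects that poor fit.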

Now, in this housing price example that we have on this slide, we have two parameters, w and b. The plot becomes a little more complex. It turns out that the cost function also has a similar shape like a soup bowl, except in three dimensions instead of two. In fact, depending on your training set, the cost function will look something like this. To me, this looks like a soup bowl, maybe because I'm a little bit hungry, or maybe to you it looks like a curved dinner plate or a hammock. Actually that sounds relaxing too, and there's your coconut drink. Maybe when you're done with this course, you should treat yourself to a vacation and relax in a hammock like this. What you see here is a 3D surface plot where the axes are labeled w and b. As you vary w and b, which are the two parameters of the model, you get different values for the cost function J of w and b. This is a lot like the U-shaped curve you saw in the last video, except instead of having one parameter w as input for J, you now have two parameters, w and b, as inputs into this soup bowl or this hammock-shaped function J. I just want to point out that any single point on this surface represents some particular choice of w and b.
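The surface described above can be sketched by evaluating J on a grid of (w, b) pairs. This is only an illustration under assumptions: the training data is made up, the cost is the assumed squared-error cost with 1/(2m) scaling, and `cost_grid` and `plot_cost_surface` are hypothetical helper names. The plotting helper needs NumPy and Matplotlib, which it imports only when called.

```python
# Hypothetical training set (not from the lecture).
sizes = [1000.0, 1500.0, 2000.0]
prices = [300.0, 400.0, 500.0]


def compute_cost(xs, ys, w, b):
    """Assumed squared-error cost J(w, b) with 1/(2m) scaling."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_grid(xs, ys, w_values, b_values):
    """Return J_grid with J_grid[i][j] = J(w_values[j], b_values[i])."""
    return [[compute_cost(xs, ys, w, b) for w in w_values] for b in b_values]


w_values = [i / 100 for i in range(41)]   # w from 0.00 to 0.40
b_values = [10.0 * i for i in range(21)]  # b from 0 to 200
J_grid = cost_grid(sizes, prices, w_values, b_values)

# Any single grid cell is one choice of (w, b); its value is the height
# of the surface above that point.
i, j = 5, 6  # b_values[5] = 50.0, w_values[6] = 0.06
print(f"w={w_values[j]}, b={b_values[i]}, height J={J_grid[i][j]:.2f}")


def plot_cost_surface(w_values, b_values, J_grid):
    """Draw J as a 3D surface over the (w, b) plane."""
    import numpy as np
    import matplotlib.pyplot as plt

    W, B = np.meshgrid(w_values, b_values)  # shapes match J_grid[i][j]
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(W, B, np.asarray(J_grid), cmap="viridis")
    ax.set_xlabel("w")
    ax.set_ylabel("b")
    ax.set_zlabel("J(w, b)")
    plt.show()
```

Calling `plot_cost_surface(w_values, b_values, J_grid)` should show a bowl; depending on the data and axis ranges it may look quite stretched.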

For example, if w was minus 10 and b was minus 15, then the height of the surface above this point is the value of J when w is minus 10 and b is minus 15. Now, in order to look even more closely at specific points, there's another way of plotting the cost function J that would be useful for visualization, which is, rather than using these 3D surface plots, I like to take this exact same function J (I'm not changing the function J at all) and plot it using something called a contour plot. If you've ever seen a topographical map showing how high different mountains are, the contours in a topographical map are basically horizontal slices of the landscape of, say, a mountain. This image is of Mount Fuji in Japan. I still remember my family visiting Mount Fuji when I was a teenager. It's a beautiful sight. If you fly directly above the mountain, that's what this contour map looks like. It shows all the points that are at the same height, for different heights. At the bottom of this slide is a 3D surface plot of the cost function J. I know it doesn't look very bowl-shaped, but it is actually a bowl, just very stretched out, which is why it looks like that. In an optional lab that is shortly to follow, you will be able to see this in 3D and spin around the surface yourself, and it'll look more obviously bowl-shaped there.

Next, here on the upper right is a contour plot of this exact same cost function as that shown at the bottom. The two axes on this contour plot are b, on the vertical axis, and w, on the horizontal axis. What each of these ovals, also called ellipses, shows is the set of points on the 3D surface which are at the exact same height. In other words, the set of points which have the same value for the cost function J. To get the contour plot, you take the 3D surface at the bottom and you use a knife to slice it horizontally. You take horizontal slices of that 3D surface and get all the points that are at the same height. Therefore, each horizontal slice ends up being shown as one of these ellipses or one of these ovals. Concretely, if you take that point, and that point, and that point, all three of these points have the same value for the cost function J, even though they have different values for w and b. In the figure on the upper left, you see also that these three points correspond to different functions f, all three of which are actually pretty bad for predicting housing prices in this case.
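The "horizontal slice" idea can be checked numerically: pick a height, then find several (w, b) pairs whose cost equals it. The sketch below is one way to do that, assuming the squared-error cost with 1/(2m) scaling and a made-up training set. For a fixed b, J is a quadratic in w, so the hypothetical helper `w_on_contour` solves that quadratic for a chosen level; this is an illustration, not how the lecture's plots were produced.

```python
import math

# Hypothetical training set (not from the lecture).
sizes = [1000.0, 1500.0, 2000.0]
prices = [300.0, 400.0, 500.0]


def compute_cost(xs, ys, w, b):
    """Assumed squared-error cost J(w, b) with 1/(2m) scaling."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def w_on_contour(xs, ys, b, level):
    """Return the w values where J(w, b) == level for a fixed b.

    For fixed b, J(w, b) = A*w^2 + B*w + C, so this solves
    A*w^2 + B*w + (C - level) = 0. Assumes at least one x is nonzero.
    """
    m = len(xs)
    A = sum(x * x for x in xs) / (2 * m)
    B = sum(x * (b - y) for x, y in zip(xs, ys)) / m
    C = sum((b - y) ** 2 for y in ys) / (2 * m)
    disc = B * B - 4 * A * (C - level)
    if disc < 0:
        return []  # the slice at this height does not reach this b
    root = math.sqrt(disc)
    return [(-B - root) / (2 * A), (-B + root) / (2 * A)]


level = 1000.0  # the height at which the surface is sliced
w1, w2 = w_on_contour(sizes, prices, 100.0, level)
w3 = w_on_contour(sizes, prices, 90.0, level)[0]
points = [(w1, 100.0), (w2, 100.0), (w3, 90.0)]

for w, b in points:
    print(f"w={w:.5f}, b={b:.1f}, J={compute_cost(sizes, prices, w, b):.3f}")
```

The three printed points have different w and b but the same J, so they would sit on the same oval of a contour plot.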

Now, the bottom of the bowl, where the cost function J is at a minimum, is this point right here, at the center of these concentric ovals. If you haven't seen contour plots much before, I'd like you to imagine, if you will, that you are flying high up above the bowl in an airplane or in a rocket ship, and you're looking straight down at it. That is as if you set your computer monitor flat on your desk facing up and the bowl shape is coming directly out of your screen, rising above your desk. Imagine that the bowl shape grows out of your computer screen lying flat like that, so that each of these ovals has the same height above your screen and the minimum of the bowl is right down there in the center of the smallest oval. It turns out that contour plots are a convenient way to visualize the 3D cost function J, but in a way that's plotted in just 2D. In this video, you saw how the 3D bowl-shaped surface plot can also be visualized as a contour plot. Using this visualization tool, in the next video, let's visualize some specific choices of w and b in the linear regression model so that you can see how these different choices affect the straight line you're fitting to the data. Let's go on to the next video.
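As a rough illustration of "the minimum sits at the center of the smallest oval", the sketch below evaluates J on a grid, picks the grid cell with the lowest cost, and offers a helper that draws the contours with that point marked. The data is made up, the cost is the assumed squared-error cost with 1/(2m) scaling, and a brute-force grid search is used only because it mirrors the picture; it is not how the course goes on to minimize J. `grid_minimum` and `plot_contours` are hypothetical names, and the plotting helper imports NumPy and Matplotlib only when called.

```python
# Hypothetical training set (not from the lecture).
sizes = [1000.0, 1500.0, 2000.0]
prices = [300.0, 400.0, 500.0]


def compute_cost(xs, ys, w, b):
    """Assumed squared-error cost J(w, b) with 1/(2m) scaling."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


w_values = [i / 100 for i in range(41)]   # w from 0.00 to 0.40
b_values = [10.0 * i for i in range(21)]  # b from 0 to 200
# J_grid[i][j] = J(w_values[j], b_values[i])
J_grid = [[compute_cost(sizes, prices, w, b) for w in w_values]
          for b in b_values]


def grid_minimum(w_values, b_values, J_grid):
    """Return (J, w, b) for the grid cell with the smallest cost."""
    cells = (
        (J_grid[i][j], w_values[j], b_values[i])
        for i in range(len(b_values))
        for j in range(len(w_values))
    )
    return min(cells, key=lambda cell: cell[0])


J_min, w_min, b_min = grid_minimum(w_values, b_values, J_grid)
print(f"lowest grid cost J={J_min:.4f} at w={w_min}, b={b_min}")


def plot_contours(w_values, b_values, J_grid, w_min, b_min):
    """Draw contours of J with w horizontal, b vertical, minimum marked."""
    import numpy as np
    import matplotlib.pyplot as plt

    W, B = np.meshgrid(w_values, b_values)
    fig, ax = plt.subplots()
    ax.contour(W, B, np.asarray(J_grid), levels=20)
    ax.plot(w_min, b_min, "rx")  # lowest point found on the grid
    ax.set_xlabel("w")
    ax.set_ylabel("b")
    plt.show()
```

Calling `plot_contours(w_values, b_values, J_grid, w_min, b_min)` draws the ovals as seen from directly above the bowl. The grid search only finds the best cell on the grid, so its answer is as coarse as the grid spacing.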
