机器学习学习笔记——1.1.2.1.3 Vectorization part 2(向量化——第2部分)

I remember when I first learned about vectorization, I spent many hours on my computer taking an un-vectorized version of an algorithm, running it, seeing how long it ran, and then running a vectorized version of the code and seeing how much faster that ran, and I just spent hours playing with that. And it frankly blew my mind that the same algorithm, vectorized, would run so much faster. It felt almost like a magic trick to me. In this video, let's figure out how this magic trick really works.

Let's take a deeper look at how a vectorized implementation may work on your computer behind the scenes. Let's look at this for loop. A for loop like this runs without vectorization. If j ranges from 0 to, say, 15, this piece of code performs its operations one after another. At the first time-step, which I'm going to write as t0, it first operates on the values at index 0. At the next time-step, it calculates the values corresponding to index 1, and so on until the 15th step, where it computes the final values. In other words, it carries out these computations one step at a time, one step after another. In contrast, the dot function in NumPy is implemented in the computer hardware with vectorization. The computer can get all the values of the vectors w and x, and in a single step, it multiplies each pair of w and x with each other all at the same time in parallel.
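To make the contrast concrete, here is a minimal sketch of the step-by-step loop versus NumPy's dot function. The random values for w and x are illustrative placeholders; the 16-element size comes from this video's example.

```python
import numpy as np

# Illustrative values for the 16-feature example in this video.
w = np.random.rand(16)
x = np.random.rand(16)

# Without vectorization: one multiply-add per time-step,
# t0 handles index 0, t1 handles index 1, and so on.
f = 0.0
for j in range(16):  # j = 0, 1, ..., 15
    f += w[j] * x[j]

# With vectorization: NumPy multiplies all 16 pairs in parallel
# and then sums them efficiently in hardware.
f_vec = np.dot(w, x)

print(np.isclose(f, f_vec))  # both compute the same dot product
```

Both versions compute the same number; the difference is only in how the hardware carries out the work.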

Then after that, the computer takes these 16 numbers and uses specialized hardware to add them all together very efficiently, rather than needing to carry out 16 distinct additions one after another to add up these 16 numbers. This means that code with vectorization can perform calculations in much less time than code without vectorization. This matters more when you're running algorithms on large data sets or trying to train large models, which is often the case with machine learning. That's why being able to vectorize implementations of learning algorithms has been a key step to getting learning algorithms to run efficiently, and therefore scale well to the large datasets that many modern machine learning algorithms now have to operate on.

Now, let's take a look at a concrete example of how this helps with implementing multiple linear regression, that is, linear regression with multiple input features. Say you have a problem with 16 features and 16 parameters, w1 through w16, in addition to the parameter b. You calculate 16 derivative terms for these 16 weights, and in code, maybe you store the values of w and d in two np.arrays, with d storing the values of the derivatives. For this example, I'm just going to ignore the parameter b. Now, you want to compute an update for each of these 16 parameters: w_j is updated to w_j minus the learning rate, say 0.1, times d_j, for j from 1 through 16. In code without vectorization, you would be doing something like this: update w1 to be w1 minus the learning rate 0.1 times d1; next, update w2 similarly; and so on through w16, updated as w16 minus 0.1 times d16. In code without vectorization, you can use a for loop like this: for j in range(0, 16), which again goes from 0 to 15, set w[j] equals w[j] minus 0.1 times d[j]. In contrast, with vectorization, you can imagine the computer's parallel processing hardware like this: it takes all 16 values in the vector w and subtracts, in parallel, 0.1 times all 16 values in the vector d, and assigns all 16 results back to w all at the same time and all in one step. In code, you can implement this as follows: w is assigned w minus 0.1 times d.
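The two update styles described above can be sketched like this. The values in w and d are illustrative placeholders, since the video doesn't give concrete derivative values; the learning rate 0.1 is the one used in the example.

```python
import numpy as np

# 16 weights and their 16 derivative terms (illustrative values).
w = np.random.rand(16)
d = np.random.rand(16)
alpha = 0.1  # the learning rate used in this video's example

# Without vectorization: update one parameter per loop iteration.
w_loop = w.copy()
for j in range(16):
    w_loop[j] = w_loop[j] - alpha * d[j]

# With vectorization: all 16 updates happen in a single step.
w_vec = w - alpha * d

print(np.allclose(w_loop, w_vec))  # the updated weights agree
```

Note that `w - alpha * d` produces the same 16 updated values as the loop; NumPy just computes and assigns them all at once.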

Behind the scenes, the computer takes these NumPy arrays, w and d, and uses parallel processing hardware to carry out all 16 computations efficiently. Using a vectorized implementation, you should get a much more efficient implementation of linear regression. Maybe the speed difference won't be huge if you have 16 features, but if you have thousands of features and perhaps very large training sets, this type of vectorized implementation will make a huge difference in the running time of your learning algorithm. It could be the difference between code finishing in one or two minutes versus taking many hours to do the same thing. In the optional lab that follows this video, you'll see an introduction to one of the most used Python libraries in machine learning, which we've already touched on in this video, called NumPy. You'll see how to create vectors in code; these vectors, or lists of numbers, are called NumPy arrays. You'll also see how to take the dot product of two vectors using a NumPy function called dot, and you'll get to see how vectorized code, such as using the dot function, can run much faster than a for loop. In fact, you'll get to time this code yourself, and hopefully see it run much faster.
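A rough version of the timing experiment described above might look like this. The vector length of one million is an assumption chosen just to make the gap visible on a typical machine; the exact timings will vary with your hardware.

```python
import time
import numpy as np

n = 1_000_000  # a large vector length (assumed), where the gap shows up
a = np.random.rand(n)
b = np.random.rand(n)

# Time the for-loop version of the dot product.
tic = time.time()
s = 0.0
for j in range(n):
    s += a[j] * b[j]
loop_time = time.time() - tic

# Time the vectorized version.
tic = time.time()
s_vec = np.dot(a, b)
vec_time = time.time() - tic

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

On most machines the vectorized call finishes orders of magnitude faster, even though both lines compute the same sum.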

This optional lab introduces a fair amount of new NumPy syntax, so when you read through the optional lab, please don't feel like you have to understand all the code right away. You can save this notebook and use it as a reference to look at when you're working with data stored in NumPy arrays. Congrats on finishing this video on vectorization. You've learned one of the most important and useful techniques in implementing machine learning algorithms. In the next video, we'll put the math of multiple linear regression together with vectorization, so that you will implement gradient descent for multiple linear regression with vectorization. Let's go on to the next video.

我记得我第一次了解向量化时,我在电脑上花了好几个小时,运行一个未经向量化版本的算法,看看它运行多久,然后运行一个向量化版本的代码,看它运行得有多快。我只是花了几个小时玩这个。坦白说,同一个算法经过向量化后运行速度会如此之快,这让我大吃一惊。对我来说,这几乎就像是一个魔术把戏。在这段视频中,让我们弄清楚这个魔术把戏究竟是如何实现的。

让我们更深入地看看向量化实现在幕后是如何在你的计算机上工作的。来看这个for循环。这样的for循环是在没有向量化的情况下运行的。如果j从0取到15,这段代码会依次执行操作。在第一个时间步t0,它首先处理索引0处的值。在下一个时间步,它计算对应于索引1的值,以此类推,直到第15步计算出最后的值。换句话说,它一步一步地进行这些计算,一个接一个。相比之下,NumPy中的dot函数是在计算机硬件中以向量化方式实现的。计算机可以获取向量w和x的所有值,并在单个步骤中同时并行地将每对w和x相乘。

接下来,计算机将这16个数字用专门的硬件非常高效地加在一起,而不需要逐个进行一次次的加法来累加这16个数字。这意味着使用向量化的代码可以在比未向量化代码少得多的时间内完成计算。当你在大型数据集上运行算法或尝试训练大型模型时,这一点尤其重要,而这在机器学习中是常见的情况。这就是为什么将学习算法的实现向量化,一直是让学习算法高效运行、从而很好地扩展到许多现代机器学习算法如今必须处理的大型数据集的关键一步。

现在,让我们来看一个具体的例子,看看这如何帮助实现多元线性回归,也就是具有多个输入特征的线性回归。假设你有一个具有16个特征和16个参数(w1到w16)的问题,此外还有参数b。你为这16个权重计算了16个导数项,在代码中,也许你将w和d的值存储在两个np.array中,其中d存储导数值。在这个例子中,我将忽略参数b。现在,你想为这16个参数中的每一个计算一次更新:w_j更新为w_j减去学习率(假设为0.1)乘以d_j,j从1到16。在未向量化的代码中,你会像这样操作:更新w1为w1减去学习率0.1乘以d1;接下来类似地更新w2;一直到w16,更新为w16减去0.1乘以d16。在未向量化的代码中,你可以像这样使用一个for循环:for j in range(0, 16),即j从0到15,设置w[j]等于w[j]减去0.1乘以d[j]。相反,使用向量化,你可以想象计算机的并行处理硬件是这样工作的:它获取向量w中的所有16个值,并行地减去0.1乘以向量d中的所有16个值,并将所有16个计算结果在同一步中一次性全部赋值回w。在代码中,你可以这样实现:w被赋值为w减去0.1乘以d。

在幕后,计算机获取这些NumPy数组w和d,并使用并行处理硬件高效地执行所有16次计算。使用向量化实现,你应该会得到一个高效得多的线性回归实现。如果你只有16个特征,速度差异也许不会很大,但如果你有数千个特征,可能还有非常大的训练集,这种向量化实现将在你的学习算法的运行时间上产生巨大的差异。这可能是代码在一两分钟内完成与花费数小时做同样事情之间的差别。在这个视频之后的可选实验中,你会看到对机器学习中最常用的Python库之一的介绍,我们在这段视频中已经提到过它,它叫做NumPy。你会看到如何在代码中创建向量,这些向量(即数字列表)被称为NumPy数组;你还会看到如何使用一个名为dot的NumPy函数来计算两个向量的点积。你还会看到像使用dot函数这样的向量化代码如何比for循环运行得快得多。实际上,你将亲自为这段代码计时,希望能看到它运行得快得多。

这个可选实验介绍了相当多的NumPy新语法,所以当你阅读可选实验时,请不要觉得必须立即理解所有代码,你可以保存这个笔记本,在处理存储于NumPy数组中的数据时作为参考来查看。恭喜你完成了关于向量化的视频。你已经学会了实现机器学习算法时最重要、最有用的技术之一。在接下来的视频中,我们将把多元线性回归的数学与向量化结合起来,以便你能用向量化实现多元线性回归的梯度下降。让我们继续观看下一个视频。
