深入了解NumPy

什么是NumPy? (What is NumPy?)

NumPy is an open-source Python library which is at the core of several of the science, engineering, and technology efforts you’ve heard of.

NumPy是一个开源Python库,它是您听说过的一些科学,工程和技术工作的核心。

Don’t believe me?

不相信我吗

Remember that black hole image? NumPy. Remember whenever scientists confirmed Einstein’s century-old prediction by detecting gravitational waves? NumPy. Machine Learning? You already know.

还记得那个黑洞形象吗? NumPy。 还记得每当科学家通过检测引力波来证实爱因斯坦的百年预测时吗? NumPy。 机器学习 ? 你已经知道了。

Image for post
NASA’s blackhole image
美国宇航局的黑洞图像

It’s a powerful thing. But what does it actually do?

这是很强大的事情。 但是它实际上是做什么的呢?

NumPy allows you to create multidimensional homogeneous arrays in Python, and do a whole collection of different mathematical operations with them.

NumPy允许您在Py​​thon中创建多维齐次数组,并使用它们进行一整套不同的数学运算。

An array is essentially just a list, and usually in our case, of numbers. “Multidimensional” in this sense means that you can have an array that is made up of an arrays of numbers. Which is also called an array of arrays or a 2D array. Or you could have an array which is made of arrays which are made up of arrays of numbers, or an array of arrays of arrays. (Real pretty, I know).

数组本质上只是一个列表,通常在我们的例子中是数字。 从这个意义上说,“多维”意味着您可以拥有一个由数字数组组成的数组。 也称为数组数组或2D数组。 或者,您可能有一个由数字数组或数组数组组成的数组,该数组由数字数组或数组组成。 (真漂亮,我知道)。

Also known as 3D. And so on forever.

也称为3D 。 以此类推。

All of the code for this blog post can be found on my GitHub.

可以在我的GitHub上找到此博客文章的所有代码。

Image for post
Photo by Scott Graham on Unsplash
Scott GrahamUnsplash拍摄的照片

为什么要使用它? (Why should you use it?)

1.功能 (1. Functionality)

One of the greatest advantages of the NumPy array is that there are many types of operations you can do with them. Anything from linear algebra, random number generation, to Fourier transforms. All of these operations are a part of the NumPy library and can easily be used without having to implement them yourself.

NumPy数组的最大优点之一是可以使用它们进行多种操作。 从线性代数,随机数生成到傅立叶变换,一应俱全。 所有这些操作都是NumPy库的一部分,可以轻松使用,而无需自己实现。

Just to list a few:

仅列举一些:

All of these and more can be found in the NumPy documentation.

所有这些以及更多内容都可以在NumPy文档中找到。

2.速度(运行时和开发) (2. Speed (of Runtime and Development))

Let’s first image a world where you just created a collection of numbers, from 0 to 999,999. But alas! You actually wanted a collection from 3 to 1,000,002! You’ve decided that the fastest way to go about this would be to just add 3 to every term in your current collection.

让我们首先想象一个您刚刚创建了一个从0到999,999的数字集合的世界。 可惜! 您实际上想要从3到1,000,002的收藏! 您已经决定,解决此问题的最快方法是在当前馆藏的每个术语中加3。

But should you use a list or an array? To find out, first you import the time module to compare times against lists and arrays, and then you define the size they will be.

但是,您应该使用列表还是数组? 为了找出答案,首先导入时间模块以将时间与列表和数组进行比较,然后定义它们的大小。

Then you run your experiment on the list. You note that it took around 138 ms and required looping through each element to add 3 to it.

然后,您在列表上运行实验。 您注意到,这花费了大约138毫秒,并且需要循环遍历每个元素才能向其中添加3。

Finally you run your experiment with a NumPy array. Not only did it take under 1 millisecond, it was also much easier to implement!

最后,您可以使用NumPy数组运行实验。 它不仅花费了不到1毫秒的时间,而且实施起来也容易得多!

Not only can NumPy arrays speed up the actual execution of your code in some scenarios, they are also much easier to program. Rather than having to loop through every element, NumPy was magically able to add an array (or vector) and a number (or scalar).

在某些情况下,NumPy数组不仅可以加快代码的实际执行速度,而且编程起来也容易得多。 不必遍历每个元素,NumPy就能神奇地添加一个数组(或向量)和一个数字(或标量)。

How did it accomplish this magic?

它是如何实现这一魔力的?

NumPy utilizes a concept called Array Broadcasting. This allows NumPy to deal with arithmetic operations between arrays of different shapes. In our case, the scalar value, 3, was “stretched” to an array of the same length as our array. That way, when they were added it could easily be done component-wise like two matrices.

NumPy利用称为数组广播的概念。 这使NumPy可以处理不同形状的数组之间的算术运算。 在我们的例子中,标量值3被“拉伸”到与数组长度相同的数组。 这样,当添加它们时,可以像两个矩阵一样轻松地逐个组件地完成。

Also, because NumPy arrays have to have homogeneous elements, this operation can be “Vectorized.” What this means is that the operation can be carried out with optimized, pre-compiled C code, which is much faster. This side-steps Python’s usual (and slow) requirement that it check the data type of each element in the list for each loop of the operation.

另外,由于NumPy数组必须具有齐次元素,因此该操作可以“向量化” 。 这意味着该操作可以使用优化的,预编译的C代码来执行,这要快得多。 这回避了Python的通常(缓慢)要求,即它针对操作的每个循环检查列表中每个元素的数据类型。

A similar process occurs if for example you had two arrays like so:

例如,如果您有两个像这样的数组,则会发生类似的过程:

The way to conceptualize this addition is much the same as before. Our 1 by 3 array was “stretched” so that it was the same size as our 3 by 3 array, meaning the computation actually looked like:

概念化此添加项的方法与以前几乎相同。 我们的1 x 3数组是“拉伸的”,因此它的大小与3 x 3数组相同,这意味着计算实际上如下所示:

3.储存 (3. Storage)

Another key area in which NumPy significantly beats out Python lists is in terms of storage.

在存储方面,NumPy明显胜过Python列表的另一个关键领域。

In Python, integers are objects with some added functionality over being just numbers. While this can be convenient, it also comes with the drawback of requiring more space in memory. Additionally, whenever they are stored in a list, rather than storing the numbers themselves, the list stores “pointers” which basically just tell Python where the numbers are in memory.

在Python中, 整数是对象,除了数字以外,还具有一些附加功能。 尽管这很方便,但它也具有需要更多内存空间的缺点。 另外,每当它们存储在列表中时,列表都会存储“ 指针 ”,而不是自己存储数字,它们基本上只是告诉Python数字在内存中的位置。

This comes with the drawback of requiring more space, for both the pointers and the extra functionality of the integers.

这样做的缺点是指针和整数的额外功能都需要更多空间。

NumPy arrays on the other hand allow you to specify the size of the numbers you are working with. That way, if you know that you are working with smaller numbers, you can save space by using less memory per element in the array. This can be specified with the “dtype” parameter, with types such as “int8” and “int32”, corresponding to 1 byte or 8 bits and 4 bytes or 32 bits.

另一方面,NumPy数组允许您指定要处理的数字的大小。 这样,如果您知道要使用较小的数字,则可以通过减少数组中每个元素的内存来节省空间。 可以使用“ dtype”参数指定类型,例如“ int8”和“ int32”,对应于1 字节或8位和4字节或32位。

Additionally, whenever a NumPy array is created, all of its elements are stored side-by-side in memory. This is known as contiguous memory. By contrast, the elements of a list are not stored contiguously, and the list’s pointers tell Python where they are in memory.

此外,无论何时创建NumPy数组,其所有元素都并排存储在内存中。 这就是所谓的连续内存 。 相比之下,列表的元素不是连续存储的,并且列表的指针告诉Python它们在内存中的位置。

Image for post
Contiguous vs non-contiguous memory
连续与非连续内存

This has two major implications.

这有两个主要含义。

  1. The extra lookup time to find elements of a list in memory decreases performance

    在内存中查找列表元素的额外查找时间会降低性能
  2. Contiguous memory can be easily stored in a smaller local memory, known as your cache, to make it even easier to access

    连续内存可以轻松存储在较小的本地内存(称为缓存)中,以使其更易于访问

你如何使用它? (How do you use it?)

The core object of NumPy is the ndarray, or N-dimensional array. In addition, you can optionally specify a datatype for the array,

NumPy的核心对象是ndarray或N维数组。 此外,您可以选择为数组指定数据类型

One of the key characteristic of these arrays are their shape, which specify the number of elements in each dimension.

这些数组的主要特征之一是它们的形状 它指定了每个维度中的元素数量。

There are several other ways that an array can be constructed:

几种其他的方式,一个阵列可以构造:

Additionally, an important and somewhat strange concept of the ndarray is the axis. Essentially, an axis is one of the dimensions of the array, just like the axes of a coordinate plane.

另外,ndarray的一个重要且有点奇怪的概念是axis 。 本质上,轴是数组的维度之一,就像坐标平面的轴一样。

Image for post
A Coordinate Plane
座标平面

These axes are numbered starting from 0, and go up to n — 1, where n is the number of dimensions of the array.

这些轴的编号从0开始,一直到n_1,其中n是数组的维数。

So for example, I can find the maximum along the 0th axis of an array with a shape of (2, 3, 3). The 0th axis in this case is the one which houses the two 3 by 3 arrays. The best way to visualize is it is stacking up both of the 3 by 3 arrays and looking at the elements that line up and seeing which is the largest.

因此,例如,我可以沿着形状为(2,3,3)的数组的第0轴找到最大值。 在这种情况下,第0轴是容纳两个3×3阵列的轴。 可视化的最佳方法是将3 x 3阵列都堆叠在一起,然后查看排列的元素并查看最大的元素。

结论 (Conclusion)

NumPy arrays are the foundation of modern Python data science. Not only do they allow for large data to be stored in a more efficient manner, they also have a plethora of functionality for any sort of project you are working on.

NumPy数组是现代Python数据科学的基础。 它们不仅允许以更有效的方式存储大数据,而且还为您正在从事的任何类型的项目提供了众多功能。

So the next time you open up a square bracket, consider an array. It may just make your job a whole lot easier.

因此,下次您打开方括号时,请考虑使用数组。 这可能会使您的工作变得容易得多。

And it sounds way cooler. Come on, list? Array is a much more sci-fi sounding name.

听起来很酷。 来吧, 清单? 数组是一个听起来更科幻的名字。

Thanks

谢谢

-Jordan

-约旦

Originally published at jordantwells.com

最初发布于 jordantwells.com

翻译自: https://medium.com/swlh/a-deep-dive-into-numpy-dbd032d6a1be

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值