Five Forms of the Southern School: NumPy from Getting Started to Muddling Through

Article link: https://blog.csdn.net/bigbro/article/details/106456346

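This article introduces the importance of the numpy library in data science and walks through its use via the five southern-school forms, covering array creation, array attributes and methods, and indexing and selection. Readers will pick up basic numpy usage and a solid foundation for data processing.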

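0. Preface

As the foundation of many Python libraries, numpy is as important to data science as rice is to the southern Chinese diet. It is a linear algebra library whose array operations run in compiled C code, so it is far faster than equivalent pure-Python loops. As the master says: a code monkey who wants to do data sci must first play with numpy.
This blog briefly lists some common moves. After a read-through, players should be able to glimpse the whole from one spot and extend each trick to similar cases.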

1. First Form: Opening Stance

To use numpy, import it first:

import numpy as np

2. Second Form: Arrays

2.1 Create arrays from a python list

A Python list:

my_list = [1,2,3]

Create an array from a Python list.
One-dimensional (a list of numbers):

array_1d = np.array(my_list)

Two-dimensional (a list of lists of numbers):

array_2d = np.array([my_list, my_list])
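
As a quick check, the sketch below builds both arrays from the same list and inspects how many axes each one has. It only relies on the standard `ndim` and `shape` attributes of an ndarray.

```python
import numpy as np

my_list = [1, 2, 3]
array_1d = np.array(my_list)             # one axis, three elements
array_2d = np.array([my_list, my_list])  # two rows, three columns

print(array_1d.ndim, array_1d.shape)  # 1 (3,)
print(array_2d.ndim, array_2d.shape)  # 2 (2, 3)
```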

2.2 Create arrays from Built-in methods

2.2.1 The arange function

Create array([0, 2, 4, 6, 8, 10]):

np.arange(0,11,2)
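
Note that the function is spelled `arange`, with a single "r". A minimal sketch of how its start, stop and step arguments behave (the variable names are only for illustration):

```python
import numpy as np

evens = np.arange(0, 11, 2)   # start=0, stop=11 (excluded), step=2
print(evens)                  # [ 0  2  4  6  8 10]

first_five = np.arange(5)     # a single argument is treated as stop
print(first_five)             # [0 1 2 3 4]
```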

2.2.2 The zeros and ones functions

1d array:

np.zeros(3)
np.ones(3)

2D array:

np.zeros((5,4))
np.ones((5,4))
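
For a 2D array the shape has to be passed as one tuple, because the second positional argument of `zeros` and `ones` is the dtype, not a second dimension. A small sketch:

```python
import numpy as np

zeros_2d = np.zeros((5, 4))   # shape passed as a single tuple
ones_2d = np.ones((5, 4))

print(zeros_2d.shape, zeros_2d.dtype)  # (5, 4) float64
print(ones_2d.sum())                   # 20.0

# Passing the dimensions separately fails: 4 is read as a dtype.
try:
    np.zeros(5, 4)
except TypeError as err:
    print("TypeError:", err)
```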

2.2.3 The linspace function

Return evenly spaced numbers over a specified interval

np.linspace(0,12,4)

--> array([ 0.,  4.,  8., 12.])
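
Unlike arange, linspace takes the number of samples rather than a step, includes both endpoints by default, and returns floats. A minimal sketch:

```python
import numpy as np

points = np.linspace(0, 12, 4)   # 4 samples, both endpoints included
print(points)                    # [ 0.  4.  8. 12.]

step = points[1] - points[0]     # spacing is (12 - 0) / (4 - 1)
print(step)                      # 4.0
```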

2.2.4 The eye function

Return identity matrices:

np.eye(5)
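
To see why this is called an identity matrix, the sketch below multiplies it with an arbitrary matrix (made up here purely for illustration) and checks that nothing changes.

```python
import numpy as np

identity = np.eye(3)                # 3x3, ones on the diagonal, zeros elsewhere
m = np.arange(9.0).reshape(3, 3)    # arbitrary 3x3 matrix for illustration

product = identity @ m              # matrix product
print(np.array_equal(product, m))   # True
```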

2.2.5 The rand and randn functions

Create random number arrays.

2.2.5.1 rand

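Random numbers in [0, 1)

numpy.random.rand(d0,d1,…,dn)
The rand function generates values in [0, 1) for the given dimensions, including 0 and excluding 1.
Each dn is the size of one dimension.
The return value is an array of the specified shape.
Author: leenard
Link: https://www.jianshu.com/p/214798dd8f93
Source: Jianshu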

np.random.rand(2)
np.random.rand(5,5)
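
A quick sanity check of the shape and the half-open range. The values themselves differ on every run, so only properties are printed:

```python
import numpy as np

samples = np.random.rand(5, 5)      # dimensions are passed as separate arguments
print(samples.shape)                # (5, 5)

in_range = bool(samples.min() >= 0.0 and samples.max() < 1.0)
print(in_range)                     # True
```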

2.2.5.2 randn

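Random numbers drawn from a normal distribution

numpy.random.randn(d0,d1,…,dn)
The randn function returns one sample or a set of samples from the standard normal distribution.
Each dn is the size of one dimension.
The return value is an array of the specified shape.
Author: leenard
Link: https://www.jianshu.com/p/214798dd8f93
Source: Jianshu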

np.random.randn(2)
np.random.randn(5,5)
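
"Standard normal" means mean 0 and standard deviation 1. A rough sketch that checks this on a large sample; the seed value and the 0.05 tolerance are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)                       # fixed seed so the run is repeatable
samples = np.random.randn(100_000)

print(samples.shape)                    # (100000,)
mean_ok = bool(abs(samples.mean()) < 0.05)       # sample mean is close to 0
std_ok = bool(abs(samples.std() - 1.0) < 0.05)   # sample std is close to 1
print(mean_ok, std_ok)
```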

2.2.5.3 randint

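numpy.random.randint(low, high=None, size=None, dtype=int)
Returns random integers in the interval [low, high), including low and excluding high.
Parameters: low is the minimum value, high is the upper bound, size is the shape of the output array, and dtype is the data type, which defaults to numpy's default integer type (the np.int alias named in older documentation was removed in NumPy 1.24).
When high is omitted, the random numbers are drawn from [0, low).
Author: leenard
Link: https://www.jianshu.com/p/214798dd8f93
Source: Jianshu

Generate a single number: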

np.random.randint(1,100)

Generate ten numbers:

np.random.randint(1,100,10)
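
The sketch below checks the half-open interval and shows that size can also be a tuple. The bounds and shapes are arbitrary examples:

```python
import numpy as np

draws = np.random.randint(1, 100, 10)         # ten integers in [1, 100)
print(draws.shape)                            # (10,)

in_range = bool(draws.min() >= 1 and draws.max() < 100)
print(in_range)                               # True

grid = np.random.randint(0, 10, size=(2, 3))  # size may be a tuple
print(grid.shape)                             # (2, 3)
```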

2.2.5.4 random_integers

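numpy.random.random_integers(low, high=None, size=None)
Returns random integers in the interval [low, high], including both low and high.
Parameters: low is the minimum value, high is the maximum value, and size is the shape of the output array.
When high is omitted, the random numbers are drawn from [1, low].
This function has been deprecated in recent numpy versions; use randint instead.
Author: leenard
Link: https://www.jianshu.com/p/214798dd8f93
Source: Jianshu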

2.2.5.5 numpy.random.seed()

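np.random.seed() makes random data reproducible.
When we set the same seed, the same random numbers are generated each time. Without a seed, every run produces different random numbers.
Author: leenard
Link: https://www.jianshu.com/p/214798dd8f93
Source: Jianshu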
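
A minimal sketch of that behaviour: resetting the same seed (42 is an arbitrary choice) replays the same sequence.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)        # same seed again
second = np.random.rand(3)

print(np.array_equal(first, second))  # True
```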

For more on random, see https://www.jianshu.com/p/214798dd8f93, where the author covers it in detail.

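3. Third Form: Array Attributes and Methods

3.1 The reshape method

Much like MATLAB's reshape function, so not much to say, except that numpy fills elements in row-major (C) order by default whereas MATLAB uses column-major order.

array.reshape(num_rows, num_cols)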
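
A short sketch of reshape on a 12-element array. The total number of elements has to stay the same, and one dimension may be given as -1 to let numpy infer it:

```python
import numpy as np

arr = np.arange(12)

grid = arr.reshape(3, 4)     # 3 rows, 4 columns, filled row by row
print(grid[0])               # [0 1 2 3]

auto = arr.reshape(2, -1)    # -1 asks numpy to infer the column count
print(auto.shape)            # (2, 6)
```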

3.2 The max, min, argmax, and argmin methods

These are useful methods for finding the max or min values, or for finding their index locations using argmin or argmax.

array.max()
array.min()
array.argmax()
array.argmin()
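
A small worked example on a made-up array, showing the value methods next to their index counterparts:

```python
import numpy as np

values = np.array([10, 12, 8, 15, 79, 20])   # sample data for illustration

print(values.max(), values.argmax())  # 79 4
print(values.min(), values.argmin())  # 8 2
```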

3.3 The shape attribute

array.shape

3.4 The dtype attribute

array.dtype
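
Both are attributes, not methods, so there are no parentheses. A quick sketch with two illustrative arrays; the exact integer width depends on the platform, so only its kind is printed:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([[1.0, 2.0], [3.0, 4.0]])

print(ints.shape)                  # (3,)
print(floats.shape, floats.dtype)  # (2, 2) float64
print(ints.dtype.kind)             # i  (a signed integer type)
```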

4. Fourth Form: Indexing and Selection

4.1 Basic indexing and selection

For the 1-D array arr = array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), here is the output of the following commands:

arr[8] --> 8
arr[1:5]-->  array([1, 2, 3, 4])

This is basically the same as indexing a Python list.

For a 2-D array:

arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))

arr_2d[1] returns the second row, array([20, 25, 30])
arr_2d[1][0] returns 20, i.e. second row, first column
arr_2d[:2,1:] returns array([[10, 15], [25, 30]])
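
The same lookups can be written with a single pair of brackets and a comma, which is the more common numpy style. A runnable sketch on the array above:

```python
import numpy as np

arr_2d = np.array(([5, 10, 15], [20, 25, 30], [35, 40, 45]))

print(arr_2d[1, 0])    # 20, same element as arr_2d[1][0]
print(arr_2d[:, 0])    # [ 5 20 35], the whole first column

top_right = arr_2d[:2, 1:]   # rows 0-1, columns 1-2
print(top_right)
# [[10 15]
#  [25 30]]
```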

4.2 broadcasting

4.2.1 What does broadcasting mean in NumPy?

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
— Broadcasting, SciPy.org

In the context of deep learning, we also use some less conventional notation. We allow the addition of matrix and a vector, yielding another matrix: C = A + b, where C_{i,j} = A_{i,j} + b_j. In other words, the vector b is added to each row of the matrix. This shorthand eliminates the need to define a matrix with b copied into each row before doing the addition. This implicit copying of b to many locations is called broadcasting.
— Page 34, Deep Learning, 2016.
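
The matrix-plus-vector case from the quote can be tried directly. In this sketch the values of A and b are made up; b has one entry per column of A and is added to every row:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
b = np.array([10, 20, 30])       # shape (3,), one entry per column of A

C = A + b                        # b is broadcast across both rows
print(C)
# [[11 22 33]
#  [14 25 36]]
```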

4.2.2 Why does NumPy use broadcasting?

arrays with different sizes cannot be added, subtracted, or generally be used in arithmetic.
A way to overcome this is to duplicate the smaller array so that it is the dimensionality and size as the larger array. This is called array broadcasting and is available in NumPy when performing array arithmetic, which can greatly reduce and simplify your code.
-----https://machinelearningmastery.com/broadcasting-with-numpy-arrays/

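Roughly speaking, arrays of different shapes cannot be combined element by element as they are, so numpy conceptually stretches the smaller array to match the larger one, for example np.array([1,2,3])+np.array([2]) = array([3, 4, 5]). The same rule applies to assignment: assigning a single value to a slice broadcasts that value to every element of the slice. Separately, and for efficiency, a slice taken out of a larger array is a view that references the larger array's memory rather than a copy, so assigning into the slice directly changes the data at the corresponding positions of the larger array.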

4.2.3 Example:

arr = np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
slice_of_arr = arr[0:6]
slice_of_arr[:]=99
arr

At this point arr has changed along with the assignment to slice_of_arr:
arr --> array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])

If we change the values of the slice again:

slice_of_arr[:]=88

arr changes again to:
array([88, 88, 88, 88, 88, 88, 6, 7, 8, 9, 10])

Likewise, modifying arr changes slice_of_arr:

arr[0:3] = 55
slice_of_arr

slice_of_arr --> array([55, 55, 55, 88, 88, 88])

But once arr is rebound to a new array, the link between the two is broken:

arr = np.arange(0,11)
slice_of_arr[:]=22
arr

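arr --> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
This is because rebinding the name arr makes it point to a newly created array, while slice_of_arr still refers to the memory of the old one.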
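
When the original array should stay untouched, the usual approach is to take an explicit copy of the slice. A minimal sketch contrasting a view with a copy; the names view and copied are illustrative only:

```python
import numpy as np

arr = np.arange(0, 11)

view = arr[0:6]            # shares memory with arr
copied = arr[0:6].copy()   # independent buffer

view[:] = 99               # writes through to arr
copied[:] = -1             # leaves arr alone

print(arr)                 # [99 99 99 99 99 99  6  7  8  9 10]
print(np.shares_memory(arr, view), np.shares_memory(arr, copied))  # True False
```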

4.3 fancy indexing

Fancy selection

arr2d = np.array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.,  5.],
       [ 6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.,  6.],
       [ 7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.,  7.],
       [ 8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.,  8.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.]])

arr2d[[2,4,6,8]]:
array([[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 8., 8., 8., 8., 8., 8., 8., 8., 8., 8.]])
arr2d[[6,4,2,7]]
array([[ 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.],
[ 4., 4., 4., 4., 4., 4., 4., 4., 4., 4.],
[ 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[ 7., 7., 7., 7., 7., 7., 7., 7., 7., 7.]])

arr2d[arr2d>2] returns a 1-D array that keeps only the elements greater than 2 (boolean-mask indexing):
array([3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 4., 4., 4., 4., 4., 4., 4.,
4., 4., 4., 5., 5., 5., 5., 5., 5., 5., 5., 5., 5., 6., 6., 6., 6.,
6., 6., 6., 6., 6., 6., 7., 7., 7., 7., 7., 7., 7., 7., 7., 7., 8.,
8., 8., 8., 8., 8., 8., 8., 8., 8., 9., 9., 9., 9., 9., 9., 9., 9.,
9., 9.])
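
The comparison inside the brackets is itself an array of booleans, which can be stored and combined before it is used as a mask. A small sketch on a 1-D array chosen for illustration:

```python
import numpy as np

arr = np.arange(1, 11)

mask = arr > 4                     # boolean array, same shape as arr
selected = arr[mask]
print(selected)                    # [ 5  6  7  8  9 10]

# Combine conditions with & and |, wrapping each one in parentheses.
even_above_two = arr[(arr > 2) & (arr % 2 == 0)]
print(even_above_two)              # [ 4  6  8 10]
```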