My NumPy Learning Journey

Latest recommended article published on 2024-08-23 17:06:02

lzq.Vector


Tags: numpy

Article link: https://blog.csdn.net/qq_43462118/article/details/84399374

Learning resources

The resources I used are the [official documentation](https://docs.scipy.org/doc/numpy-1.15.1/user/quickstart.html) and the [NumPy official quickstart tutorial (translated)](https://juejin.im/post/5a76d2c56fb9a063557d8357).

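About NumPy

NumPy's main object is the homogeneous multidimensional array, whose dimensions are called **axes**. The number of **axes** is called the **rank**. NumPy's array class is **ndarray**, also known by the alias **array**. It is not the same as **array.array** from Python's standard library, which only handles one-dimensional arrays and offers less functionality.

Some ndarray attributes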

>>> import numpy as np
>>> array = np.array([[1,2,3],[2,3,4]])
>>> array
array([[1, 2, 3],
       [2, 3, 4]])
>>> type(array)
<class 'numpy.ndarray'>

ndarray.ndim: the number of axes (dimensions) of the array.

>>> array.ndim
2

ndarray.shape: the size of the array along each axis. For a matrix with n rows and m columns, shape is (n, m).

>>> array.shape
(2, 3)

ndarray.size: the total number of elements in the array (not their sum), i.e. n*m for an n-by-m matrix.

>>> array.size
6

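ndarray.dtype: an object describing the type of the elements in the array. The default integer dtype depends on the platform and NumPy version, so you may see int64 instead of the int32 shown here.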

>>> array.dtype
dtype('int32')

ndarray.itemsize: the size in bytes of each element of the array.

>>> array.itemsize
4

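ndarray.data: the buffer containing the actual elements of the array. Normally we do not need this attribute, because we access the elements through indexing.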

>>> array.data
<memory at 0x00000195F3CC6990>

Creating arrays

>>> import numpy as np
>>> a = np.array([(1.5,2,3), (4,5,6)])
>>> a
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

The type of the array can be specified at creation time:

>>> b = np.array([[1,2],[3,4]],dtype = complex)
>>> b
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])
>>> c = np.array([[1,2],[3,4]],dtype = np.int16)
>>> c
array([[1, 2],
       [3, 4]], dtype=int16)

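zeros: creates an array full of zeros.
ones: creates an array full of ones.
empty: creates an array without initializing its entries, so its initial content is whatever happened to be in that memory (it is not a random-number generator).
By default, the dtype of the array created by all three is float64.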

>>> np.zeros((3,4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> np.ones((3,4))
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> np.empty((2,3))
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

arange function: creates a sequence of numbers, analogous to Python's range but returning an array.

>>> np.arange(10,30,5)
array([10, 15, 20, 25])

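linspace function: prefer this when the arguments are floating point, because with arange and a float step the number of elements obtained is hard to predict. Its third argument is the number of elements, not the step.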

>>> np.linspace(0,2,9)
array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
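
As a small illustration of the difference between a step and a count, the sketch below builds the same grid with both functions. The variable names are made up for this example.

```python
import numpy as np

# arange takes a step; the stop value is excluded.
stepped = np.arange(0, 2, 0.25)                    # 0.0, 0.25, ..., 1.75

# linspace takes a count; the stop value is included by default.
spaced = np.linspace(0, 2, 9)                      # 0.0, 0.25, ..., 2.0

# endpoint=False makes linspace produce the same half-open grid as arange.
half_open = np.linspace(0, 2, 8, endpoint=False)

print(len(stepped), len(spaced), len(half_open))   # 8 9 8
print(np.allclose(stepped, half_open))             # True
```

With linspace the length of the result is known in advance, which is usually what matters when sampling a function at many points.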

There are many more functions, such as zeros_like, ones_like, empty, empty_like, numpy.random.rand, numpy.random.randn, fromfunction, fromfile.

>>> y = np.arange(3, dtype=float)
>>> y
array([ 0.,  1.,  2.])
>>> np.zeros_like(y)
array([ 0.,  0.,  0.])
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],
       [ 0.37601032,  0.25528411], 
       [ 0.49313049,  0.94909878]])
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])
>>> np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int)
array([[ True, False, False],
       [False,  True, False],
       [False, False,  True]])

Printing arrays

**reshape**: changes the shape of an array.

>>> a = np.arange(6)  
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3)       
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4)
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]
 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

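If an array is too large to be printed, NumPy automatically skips the central part and prints only the corners. To disable this behaviour and force NumPy to print the entire array, change the printing options with set_printoptions.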

>>> print(np.arange(10000).reshape(100,100))
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]
>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)   # threshold=np.nan is rejected by newer NumPy releases
>>> print(np.arange(10000).reshape(100,100))
[[   0    1    2    3    4    5    6    7    8    9   10   11   12   13
    14   15   16   17   18   19   20   21   22   23   24   25   26   27
    28   29   30   31   32   33   34   35   36   37   38   39   40   41
    42   43   44   45   46   47   48   49   50   51   52   53   54   55
    56   57   58   59   60   61   62   63   64   65   66   67   68   69
    70   71   72   73   74   75   76   77   78   79   80   81   82   83
    84   85   86   87   88   89   90   91   92   93   94   95   96   97
    98   99]
 [ 100  101  102  103  104  105  106  107  108  109  110  111  112  113
   114  115  116  117  118  119  120  121  122  123  124  125  126  127
   128  129  130  131  132  133  134  135  136  137  138  139  140  141
   142  143  144  145  146  147  148  149  150  151  152  153  154  155
   156  157  158  159  160  161  162  163  164  165  166  167  168  169
   170  171  172  173  174  175  176  177  178  179  180  181  182  183
   184  185  186  187  188  189  190  191  192  193  194  195  196  197
   198  199]
 [ 200  201  202  203  204  205  206  207  208  209  210  211  212  213
   214  215  216  217  218  219  220  221  222  223  224  225  226  227
   228  229  230  231  232  233  234  235  236  237  238  239  240  241
   242  243  244  245  246  247  248  249  250  251  252  253  254  255
   256  257  258  259  260  261  262  263  264  265  266  267  268  269
   270  271  272  273  274  275  276  277  278  279  280  281  282  283
   284  285  286  287  288  289  290  291  292  293  294  295  296  297
   298  299]
   ...
# (the rest of the output is omitted by hand)
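
If the full printout is only needed temporarily, a context manager avoids changing the global setting. The sketch below assumes NumPy 1.15 or later, where np.printoptions is available; the array and variable names are illustrative.

```python
import sys
import numpy as np

big = np.arange(2000)

# With the default threshold of 1000, larger arrays are summarized with "...".
with np.printoptions(threshold=1000):
    summarized = str(big)

# Inside this block every element is printed; the old options are restored afterwards.
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print("..." in summarized)   # True
print("..." in full)         # False
```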

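Basic operations

The arithmetic operators +, -, * and / apply elementwise to arrays; a new array is created and filled with the result. The operators += and *= modify the existing array in place and do not create a new one.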

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a<35
array([ True,  True, False, False])
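
The difference between creating a new array and modifying one in place can be checked with object identity. This is a minimal sketch with made-up names; the last part assumes a reasonably recent NumPy, where an in-place operation that would need an unsafe cast raises a TypeError.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = a                          # second name for the same array object
a = a * 3                      # binary operator: builds a new array and rebinds `a`
rebound_same_object = a is b   # False: b still refers to the old array of ones

c = np.ones((2, 3), dtype=int)
d = c
c *= 3                         # augmented assignment: modifies the array in place
inplace_same_object = c is d   # True: d sees the change as well

# An in-place operation cannot change the dtype of its target.
try:
    c += 0.5                   # the float result does not fit into an int array
    cast_failed = False
except TypeError:
    cast_failed = True

print(rebound_same_object, inplace_same_object, cast_failed)
```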

In NumPy the * operator multiplies elementwise; the matrix product is computed with the dot function.

>>> A = np.array( [[1,1],
...             [0,1]] )
>>> B = np.array( [[2,0],
...             [3,4]] )
>>> A*B                         # elementwise product
array([[2, 0],
       [0, 4]])
>>> A.dot(B)                    # matrix product
array([[5, 4],
       [3, 4]])
>>> np.dot(A, B)                # another way to write the matrix product
array([[5, 4],
       [3, 4]])

Here A.dot(B) and np.dot(A, B) are equivalent.
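
On Python 3.5 and later the @ operator is a third spelling of the same matrix product. The sketch below repeats the matrices from the example above and is only meant to show that the three forms agree.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B    # multiplies matching entries
product = A @ B        # matrix product, same result as A.dot(B)

print(elementwise)
print(product)
print(np.array_equal(product, np.dot(A, B)))   # True
```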

sum, min, cumsum

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)                            # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)                            # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)                         # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

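By specifying the axis parameter you can apply an operation along a particular axis. For a two-dimensional array, axis=0 works down each column and axis=1 works along each row.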
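
For arrays with more than two axes it can help to think of axis=k as the axis that disappears from the shape. The sketch below uses an arbitrary 3-D array to illustrate this.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape.
print(c.sum(axis=0).shape)                  # (3, 4)
print(c.sum(axis=1).shape)                  # (2, 4)
print(c.sum(axis=2).shape)                  # (2, 3)

# keepdims=True keeps the reduced axis with length 1.
print(c.sum(axis=1, keepdims=True).shape)   # (2, 1, 4)

# A tuple of axes reduces over several axes at once.
per_block = c.sum(axis=(1, 2))
print(per_block)                            # [ 66 210]
```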

Mathematical functions

[all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.all.html#numpy.all)
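
A few of the functions listed above, applied to a small made-up array, as a quick sketch of how they are typically called:

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5, -9])

order = np.argsort(x)               # indices that would sort x
sorted_x = x[order]                 # same values as np.sort(x)

clipped = np.clip(x, 0, 4)          # limit values to the range [0, 4]
non_negative = np.where(x < 0, 0, x)  # replace negative entries with 0
peak = x.argmax()                   # position of the largest value

print(sorted_x)       # [-9 -1 -1  3  4  5]
print(clipped)        # [3 0 4 0 4 0]
print(non_negative)   # [3 0 4 0 5 0]
print(peak)           # 4
```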

Indexing, slicing and iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
>>> a[ : :-1]                                 # reversed a
array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0

Multidimensional arrays can have one index per axis. These indices are separated by commas.

>>> def f(x,y):
...     return 10*x+y
...
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are treated as complete slices.

>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

Iterating over a multidimensional array is done with respect to the first axis.

>>> for row in b:
...     print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

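However, if you want to perform an operation on every element of the array, you can use the flat attribute, which is an iterator over all the elements of the array.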

>>> for element in b.flat:
...     print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

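Changing the shape of an array

**numpy.ndarray.shape**: returns the current shape of the array; assigning a new tuple to it reshapes the array in place.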

>>> x = np.array([1, 2, 3, 4])
>>> x.shape
(4,)
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
>>> y.shape = (3, 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged
>>> np.zeros((4,2))[::2].shape = (-1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: incompatible shape for a non-contiguous array

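reshape: already covered above.
numpy.resize(a, new_shape): returns a new array with the specified shape. If the new array is larger than the original, it is filled with repeated copies of a. Note that this differs from a.resize(new_shape), which fills with zeros instead of repeated copies of a.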

>>> a=np.array([[0,1],[2,3]])
>>> np.resize(a,(2,3))
array([[0, 1, 2],
       [3, 0, 1]])
>>> np.resize(a,(1,4))
array([[0, 1, 2, 3]])
>>> np.resize(a,(2,4))
array([[0, 1, 2, 3],
       [0, 1, 2, 3]])

numpy.ravel(a, order='C'): returns a contiguous flattened (1-D) array; see the official documentation for details.
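
A minimal sketch of ravel on a small array, with illustrative names. It shows the effect of the order argument and that ravel returns a view when it can, whereas flatten always copies.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

row_major = a.ravel()                # C order: rows are read one after another
col_major = a.ravel(order='F')       # Fortran order: columns are read one after another
print(row_major)                     # [0 1 2 3 4 5]
print(col_major)                     # [0 3 1 4 2 5]

# For a contiguous array, ravel shares memory with the original.
shares = np.shares_memory(row_major, a)
row_major[0] = 99                    # also changes a[0, 0]

# flatten always returns an independent copy.
independent = a.flatten()
independent[1] = -1                  # a is not affected

print(shares, a[0, 0], a[0, 1])      # True 99 1
```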

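Stacking together different arrays

numpy.hstack: stacks arrays in sequence horizontally (column-wise).
stack: joins a sequence of arrays along a new axis.
vstack: stacks arrays in sequence vertically (row-wise).
dstack: stacks arrays in sequence depth-wise (along the third axis).
concatenate: joins a sequence of arrays along an existing axis.
hsplit: splits an array along the second axis.
block: assembles an array from nested lists of blocks.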
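
The functions listed above can be compared by looking at the shapes they produce. The two small input arrays below are made up for this sketch.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

v = np.vstack((a, b))                 # rows of b below rows of a   -> (4, 2)
h = np.hstack((a, b))                 # columns of b next to a      -> (2, 4)
d = np.dstack((a, b))                 # paired along a third axis   -> (2, 2, 2)
s = np.stack((a, b))                  # new leading axis            -> (2, 2, 2)
cat = np.concatenate((a, b), axis=1)  # existing axis 1, same as hstack here
blk = np.block([[a, b],
                [b, a]])              # 2x2 grid of blocks          -> (4, 4)

print(v.shape, h.shape, d.shape, s.shape, blk.shape)
print(np.array_equal(cat, h))         # True
```

Note that stack and dstack both give shape (2, 2, 2) here but place the inputs differently: s[0] is a, while d[:, :, 0] is a.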

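Splitting arrays

numpy.split(ary, indices_or_sections, axis=0): splits an array into multiple sub-arrays.
array_split: splits an array into multiple sub-arrays of equal or near-equal size; unlike split, it does not raise an exception when an equal division cannot be made.
hsplit: splits an array into multiple sub-arrays horizontally (column-wise).
vsplit: splits an array into multiple sub-arrays vertically (row-wise).
dsplit: splits an array into multiple sub-arrays along the third axis (depth).
concatenate: joins a sequence of arrays along an existing axis.
stack: joins a sequence of arrays along a new axis.
hstack: stacks arrays in sequence horizontally (column-wise).
vstack: stacks arrays in sequence vertically (row-wise).
dstack: stacks arrays in sequence depth-wise (along the third dimension).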

>>> x = np.arange(16.0).reshape(4, 4)
>>> x
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])
>>> np.hsplit(x, 2)
[array([[  0.,   1.],
       [  4.,   5.],
       [  8.,   9.],
       [ 12.,  13.]]),
 array([[  2.,   3.],
       [  6.,   7.],
       [ 10.,  11.],
       [ 14.,  15.]])]
>>> np.hsplit(x, np.array([3, 6]))
[array([[  0.,   1.,   2.],
       [  4.,   5.,   6.],
       [  8.,   9.,  10.],
       [ 12.,  13.,  14.]]),
 array([[  3.],
       [  7.],
       [ 11.],
       [ 15.]]),
 array([], shape=(4, 0), dtype=float64)]
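
The difference between split and array_split only shows up when the array cannot be divided evenly. A small sketch with illustrative names:

```python
import numpy as np

x = np.arange(10)

# array_split tolerates an uneven division: the first pieces get the extra elements.
parts = np.array_split(x, 3)
sizes = [len(p) for p in parts]
print(sizes)                         # [4, 3, 3]

# split insists on equal pieces and raises ValueError otherwise.
try:
    np.split(x, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)                  # True

# vsplit cuts along the first axis (rows).
top, bottom = np.vsplit(np.arange(16).reshape(4, 4), 2)
print(top.shape, bottom.shape)       # (2, 4) (2, 4)
```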

Copies and Views

Cases where no copy is made:
Simple assignment
Function calls

>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

>>> def f(x):
...     print(id(x))
...
>>> id(a)                           # id is a unique identifier of an object
148293216
>>> f(a)
148293216

View: creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view.
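
A minimal sketch of what this means in practice, using a fresh array `m` (a made-up name, so the arrays in the surrounding examples are not affected):

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

s = m[:, 1:3]                 # a slice: a view onto columns 1 and 2 of m
s[:] = 10                     # writes through to m
print(np.shares_memory(s, m)) # True
print(m)

# Taking a copy of the slice breaks the link.
independent = m[:, 1:3].copy()
independent[:] = -1           # m is unchanged by this assignment
print(m[0])                   # [ 0 10 10  3]
```

Note that `s = 10` would only rebind the name s, whereas `s[:] = 10` assigns into the existing data.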

copy: makes a complete copy of the array and its data.

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Functions and methods overview

Less Basic

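Broadcasting: https://docs.scipy.org/doc/numpy-1.15.1/user/basics.broadcasting.html
1. If the input arrays do not all have the same number of dimensions, a 1 is repeatedly prepended to the shapes of the smaller arrays until all arrays have the same number of dimensions.
2. An array with size 1 along a particular dimension acts as if it had the size of the largest array along that dimension; its element values are assumed to be the same along that dimension of the broadcast array. After the broadcasting rules are applied, the sizes of all arrays must match.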
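
The two rules can be seen at work in a short sketch. The arrays and names below are made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)        # shape (3, 4)
row = np.array([10, 20, 30, 40])       # shape (4,)
col = np.array([[100], [200], [300]])  # shape (3, 1)

# Rule 1: (4,) is treated as (1, 4). Rule 2: the length-1 axis is stretched to 3.
plus_row = a + row
# Rule 2 only: the length-1 second axis of col is stretched to 4.
plus_col = a + col

print(np.broadcast(a, row).shape)      # (3, 4)
print(plus_row[0])                     # [10 21 32 43]
print(plus_col[2])                     # [308 309 310 311]

# Shapes that still disagree after both rules cannot be broadcast.
try:
    a + np.arange(3)                   # (3,) -> (1, 3), but 3 does not match 4
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)                        # True
```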

Fancy indexing and index tricks

1. Indexing with arrays of indices

>>> a = np.arange(12)**2                       # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] )              # an array of indices
>>> a[i]                                       # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>>
>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] )      # a bidimensional array of indices
>>> a[j]                                       # the same shape as j
array([[ 9, 16],
       [81, 49]])

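When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.
We can also give indices for more than one dimension. The index arrays for each dimension must have the same shape.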

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first dim of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second dim
...                 [3,3] ] )
>>>
>>> a[i,j]                                     # i and j must have equal shape
array([[ 2,  5],
       [ 7, 11]])
>>>
>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])
>>>
>>> a[:,j]                                     # i.e., a[ : , j]
array([[[ 2,  1],
        [ 3,  3]],
       [[ 6,  5],
        [ 7,  7]],
       [[10,  9],
        [11, 11]]])

Another common use of indexing with arrays is finding the maximum value of time-dependent series.

>>> time = np.linspace(20, 145, 5)                 # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4)      # 4 time-dependent series
>>> time
array([  20.  ,   51.25,   82.5 ,  113.75,  145.  ])
>>> data
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
>>>
>>> ind = data.argmax(axis=0)                   # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ ind]                       # times corresponding to the maxima
>>>
>>> data_max = data[ind, range(data.shape[1])]  # => data[ind[0],0], data[ind[1],1]...
>>>
>>> time_max
array([  82.5 ,   20.  ,  113.75,   51.25])
>>> data_max
array([ 0.98935825,  0.84147098,  0.99060736,  0.6569866 ])
>>>
>>> np.all(data_max == data.max(axis=0))
True

Indexing with arrays can also be used as the target of an assignment:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])

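Tip: when the list of indices contains repetitions, an in-place update such as a[[0,0,2]] += 1 increments element 0 only once, because Python requires "a += 1" to be equivalent to "a = a + 1".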
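
A short sketch of this pitfall. When every repeated index should count, np.add.at is one option (it is not mentioned in the original notes and is shown here only as a possible workaround).

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # evaluated as a[idx] = a[idx] + 1
print(a)                     # [1 1 3 3 4]  (element 0 incremented once)

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered: every occurrence of an index is applied
print(b)                     # [2 1 3 3 4]  (element 0 incremented twice)
```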

Indexing with boolean arrays
A simple example:

>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b                                          # b is a boolean with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[b]                                       # 1d array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])

Setting every element greater than 4 to zero:

>>> a[b] = 0                                   # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

Using boolean indexing to generate an image of the Mandelbrot set:

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> def mandelbrot( h,w, maxit=20 ):
...     """Returns an image of the Mandelbrot fractal of size (h,w)."""
...     y,x = np.ogrid[ -1.4:1.4:h*1j, -2:0.8:w*1j ]
...     c = x+y*1j
...     z = c
...     divtime = maxit + np.zeros(z.shape, dtype=int)
...
...     for i in range(maxit):
...         z = z**2 + c
...         diverge = z*np.conj(z) > 2**2            # who is diverging
...         div_now = diverge & (divtime==maxit)  # who is diverging now
...         divtime[div_now] = i                  # note when
...         z[diverge] = 2                        # avoid diverging too much
...
...     return divtime
>>> plt.imshow(mandelbrot(400,400))
>>> plt.show()

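ix_() function: ix_ combines different vectors so that a result can be computed for every n-uplet (every combination of one element from each vector).
Indexing with strings: https://docs.scipy.org/doc/numpy-1.15.1/user/basics.rec.html#structured-arrays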
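
As a sketch of what ix_ does, the example below computes a + b*c for every triplet taken from three small vectors. The vectors themselves are arbitrary.

```python
import numpy as np

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

# ix_ reshapes the vectors to (4,1,1), (1,3,1) and (1,1,5) so that they broadcast.
ax, bx, cx = np.ix_(a, b, c)
print(ax.shape, bx.shape, cx.shape)

result = ax + bx * cx         # shape (4, 3, 5): one entry per (a[i], b[j], c[k])
print(result.shape)
print(result[3, 2, 4])        # 17
print(a[3] + b[2] * c[4])     # 17, the same triplet computed directly
```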

Linear algebra

This covers operations such as transposing, inverting and solving linear systems.

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1.  2.]
 [ 3.  4.]]

>>> a.transpose()
array([[ 1.,  3.],
       [ 2.,  4.]])

>>> np.linalg.inv(a)
array([[-2. ,  1. ],
       [ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1.,  0.],
       [ 0.,  1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> np.dot (j, j) # matrix product
array([[-1.,  0.],
       [ 0., -1.]])

>>> np.trace(u)  # trace
2.0

>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
       [ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j,  0.-1.j]), array([[ 0.70710678+0.j        ,  0.70710678-0.j        ],
       [ 0.00000000-0.70710678j,  0.00000000+0.70710678j]]))
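
The results above can be sanity-checked numerically. The sketch below reuses the same matrices and verifies each result against its defining equation, using np.allclose because floating-point results are rarely exact.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])
j = np.array([[0.0, -1.0], [1.0, 0.0]])

# solve: x should satisfy a @ x == y.
x = np.linalg.solve(a, y)
solve_ok = np.allclose(a @ x, y)

# inv: a times its inverse should be the identity.
a_inv = np.linalg.inv(a)
inverse_ok = np.allclose(a @ a_inv, np.eye(2))

# eig: each column v of eigvecs satisfies j @ v == lambda * v.
eigvals, eigvecs = np.linalg.eig(j)
eig_ok = np.allclose(j @ eigvecs, eigvecs * eigvals)

print(solve_ok, inverse_ok, eig_ok)   # True True True
```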