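NumPy Usage Summary
1. NumPy overview
NumPy (Numerical Python) is an open-source extension library for numerical computing.
Its main strength is that it saves computation time and memory.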
https://numpy.org/doc/stable/reference/[Official documentation]
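The memory claim above can be checked with a small sketch. The array length and the names `py_list` and `arr` are chosen only for illustration, and the exact list size depends on the Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Each int64 element occupies exactly 8 bytes in the array's data buffer.
print(arr.itemsize)
print(arr.nbytes)

# A list holds pointers to separate Python int objects, so the total is larger.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
print(list_bytes > arr.nbytes)

# One vectorized expression replaces an explicit Python loop.
squares = arr * arr
```

The time saving comes from the same place: the multiplication runs in compiled code over a contiguous buffer instead of element by element in the interpreter.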
2. NumPy array objects
The most important feature of NumPy is its N-dimensional array object, ndarray: a multidimensional array that stores elements of a single type, indexed starting from 0.
import numpy as np
ndarr = np.arange(24).reshape(2,3,4) # create a three-dimensional array
ndarr
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
type(ndarr) # check the type
numpy.ndarray
ndarr.ndim # number of dimensions
3
ndarr.shape # size along each dimension
(2, 3, 4)
ndarr.size # total number of elements
24
ndarr.dtype # element type
dtype('int32')
np.ones((3,4)) # create an array filled with ones
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
np.empty((3,4)) # create an uninitialized array; its contents are arbitrary
array([[8.46775816e-312, 2.81617418e-322, 0.00000000e+000,
0.00000000e+000],
[7.56587583e-307, 6.82116729e-043, 5.59150575e-091,
6.40179205e+170],
[1.00567909e-047, 5.15973668e-066, 6.48224660e+170,
4.93432906e+257]])
np.arange(1,20,5) # create an array starting at 1 with step 5, stopping before 20
array([ 1, 6, 11, 16])
np.array([1,2,3,4],float) # specify a floating-point type
array([1., 2., 3., 4.])
np.ones((2,3),'float64') # specify 64-bit floating point
array([[1., 1., 1.],
[1., 1., 1.]])
df1 = np.arange(0,1,0.1)
df1
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
b = np.linspace(0,1,10)
b
array([0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
c = np.linspace(0,1,10,endpoint = False)
c
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
d = np.logspace(0,1,5)
d
array([ 1. , 1.77827941, 3.16227766, 5.62341325, 10. ])
np.empty((2,3),int) # allocate a 2x3 integer array without initializing it (np.int was removed in NumPy 1.24)
array([[ 179652464, 399, 0],
[ 0, 1, -2147483648]])
np.zeros(4,int) # create a length-4 array of zeros
array([0, 0, 0, 0])
np.full(4,np.pi) # create a length-4 array filled with π
array([3.14159265, 3.14159265, 3.14159265, 3.14159265])
def func(i):
    print(i)
    return i % 4 + 1 # % takes the remainder
np.fromfunction(func,(10,)) # the first argument is the function to evaluate, the second is the shape of the array
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
array([1., 2., 3., 4., 1., 2., 3., 4., 1., 2.])
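The printed value above shows that `fromfunction` calls the function once with whole index arrays rather than once per element. A minimal two-dimensional sketch of the same idea follows; `mul_table` is a hypothetical helper introduced only for this example.

```python
import numpy as np

def mul_table(i, j):
    # i and j arrive as complete index arrays, not as single numbers
    return (i + 1) * (j + 1)

# dtype=int makes the index arrays integers instead of the default floats
table = np.fromfunction(mul_table, (3, 3), dtype=int)
print(table)
```

Because the function receives arrays, its body has to be written with operations that work elementwise on arrays.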
All elements of an ndarray share the same element type. Common types are int (integer), float (floating point), and complex (complex numbers).
a = np.array([1,2,3,4],dtype = float)
a.dtype
dtype('float64')
b = np.array([[1,2,3,3],[4,5,6,6],[7,8,9,9]])
b.shape
(3, 4)
b.reshape((2,6)) # reshape
array([[1, 2, 3, 3, 4, 5],
[6, 6, 7, 8, 9, 9]])
c = np.arange(10)
c
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
c[5] # the sixth element
5
c[0:2] # the first two elements
array([0, 1])
c[:-1] # all but the last element
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
c[1:-1:2] # from the second element up to (not including) the last, taking every second element
array([1, 3, 5, 7])
c[::-1] # reversed order
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
c[2:4] = 100,1001 # modify elements
c
array([ 0, 1, 100, 1001, 4, 5, 6, 7, 8, 9])
Slicing an ndarray produces a new array b that is a view of c: b and c share the same block of data storage.
b = c[3:7]
np.shares_memory(b, c) # True when the two arrays overlap in memory
True
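A short sketch of what sharing storage means in practice: writing through a slice changes the original array, while an explicit copy does not. The names `view` and `independent` are illustrative only.

```python
import numpy as np

c = np.arange(10)
view = c[3:7]                  # basic slicing returns a view
view[0] = -1                   # writing through the view also changes c
print(c[3])

independent = c[3:7].copy()    # an explicit copy owns its own data
independent[1] = 99
print(c[4])                    # still the original value

print(np.shares_memory(c, view), np.shares_memory(c, independent))
```

Calling `.copy()` is the usual way to keep a slice from being affected by later changes to the source array.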
a = np.arange(0,60,10).reshape(-1,1)+np.arange(0,6)
a
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
a1 = np.arange(0,60,10).reshape(-1,1)
a1.shape
(6, 1)
a[0,3:5]
array([3, 4])
a[2,2:4]
array([22, 23])
a[2::2,::2] # rows starting at row 2 with step 2, then columns starting at column 0 with step 2
array([[20, 22, 24],
[40, 42, 44]])
persontype = np.dtype({
    'names':['name', 'age', 'weight'],
    'formats':['S30','i', 'f']})
a = np.array([("Zhang", 32, 75.5), ("Wang", 24, 65.2)],
             dtype=persontype)
print(a[0])
(b'Zhang', 32, 75.5)
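Fields of a structured array can also be read by name, which returns one column at a time. The sketch below reuses the same dtype; the variable names `people`, `ages`, and `oldest` are assumptions made for the example.

```python
import numpy as np

persontype = np.dtype({
    'names': ['name', 'age', 'weight'],
    'formats': ['S30', 'i', 'f']})
people = np.array([("Zhang", 32, 75.5), ("Wang", 24, 65.2)], dtype=persontype)

ages = people['age']                  # one field as an integer array
oldest = people[np.argmax(ages)]      # the record with the largest age
print(ages)
print(oldest['name'].decode())        # 'S30' stores bytes, so decode to get a str
```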
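3. ufunc functions
ufunc is short for universal function: a function that operates on every element of an array. Many of NumPy's ufuncs are implemented in C, so they run very fast.
1) Arithmetic operations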
x = np.linspace(0,2*np.pi,10)
x
array([0. , 0.6981317 , 1.3962634 , 2.0943951 , 2.7925268 ,
3.4906585 , 4.1887902 , 4.88692191, 5.58505361, 6.28318531])
y = np.sin(x)
y
array([ 0.00000000e+00, 6.42787610e-01, 9.84807753e-01, 8.66025404e-01,
3.42020143e-01, -3.42020143e-01, -8.66025404e-01, -9.84807753e-01,
-6.42787610e-01, -2.44929360e-16])
Note that for an ndarray, np.sin() is faster than calling math.sin() on each element of a sequence of the same length, but for a single number math.sin() is faster.
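The two approaches compared above can be sketched as follows. This only checks that they agree numerically; actual timings depend on the machine and array size, so none are shown here.

```python
import math
import numpy as np

x = np.linspace(0, 2 * np.pi, 10)

vectorized = np.sin(x)                         # one call over the whole array
looped = np.array([math.sin(t) for t in x])    # one Python-level call per element

print(np.allclose(vectorized, looped))
```

To measure the difference, `timeit` from the standard library can be run on each expression with a larger array.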
a = np.arange(0,4)
b = np.arange(1,5)
np.add(a,b) # addition
array([1, 3, 5, 7])
a+b
array([1, 3, 5, 7])
np.subtract(a,b) # subtraction
array([-1, -1, -1, -1])
np.multiply(a,b) # multiplication
array([ 0, 2, 6, 12])
np.divide(a,b) # division
array([0. , 0.5 , 0.66666667, 0.75 ])
np.power(a,b) # exponentiation
array([ 0, 1, 8, 81], dtype=int32)
np.array([1,2,3]) < np.array([3,2,1])
array([ True, False, False])
2) Custom ufunc functions
def num_judge(x, a): # return 0 if x is a multiple of 3 or 5,
    if x%3 == 0:     # otherwise return a
        r = 0
    elif x%5 == 0:
        r = 0
    else:
        r = a
    return r
x = np.linspace(0,10,11)
print(x)
y = np.array([num_judge(t,2) for t in x]) # list comprehension
y
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]
array([0, 2, 2, 0, 2, 0, 0, 2, 2, 0, 0])
numb_judge = np.frompyfunc(num_judge, 2, 1)
numb_judge
<ufunc '? (vectorized)'>
y = numb_judge(x,2) # the result has dtype object, so it still needs to be converted to an integer type
y
array([0, 2, 2, 0, 2, 0, 0, 2, 2, 0, 0], dtype=object)
y.astype(int)
array([0, 2, 2, 0, 2, 0, 0, 2, 2, 0, 0])
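For a rule this simple, the same result can be obtained without wrapping a Python function at all. The sketch below is an alternative technique, not the method used above: it builds a boolean mask and selects values with `np.where`.

```python
import numpy as np

x = np.arange(11)
is_multiple = (x % 3 == 0) | (x % 5 == 0)   # True where x is a multiple of 3 or 5
y = np.where(is_multiple, 0, 2)             # 0 for multiples, 2 otherwise
print(y)
```

This version stays in compiled code and returns an integer array directly, so no `astype` step is needed.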
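3) Broadcasting
What is broadcasting
When a ufunc operates on two arrays, it operates on their corresponding elements. If the arrays have different shapes, they are broadcast first.
In short, both arrays are stretched to the larger size along each dimension; a dimension can only be stretched if its size is 1 or it is missing.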
a = np.arange(0, 60, 10).reshape(-1, 1)
b = np.arange(0, 5)
c = a + b
print(c.shape)
c
(6, 5)
array([[ 0, 1, 2, 3, 4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44],
[50, 51, 52, 53, 54]])
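To make the stretching explicit, the sketch below expands both operands to the common shape with `np.broadcast_to` and checks that the sum matches the implicit result. The names `a_full` and `b_full` are illustrative.

```python
import numpy as np

a = np.arange(0, 60, 10).reshape(-1, 1)   # shape (6, 1)
b = np.arange(0, 5)                        # shape (5,), treated as (1, 5)

a_full = np.broadcast_to(a, (6, 5))        # each row repeats one value of a
b_full = np.broadcast_to(b, (6, 5))        # each row is the whole of b

print(np.array_equal(a_full + b_full, a + b))

# Sizes that differ and are not 1 cannot be broadcast against each other.
try:
    np.arange(3) + np.arange(4)
except ValueError as err:
    print("incompatible shapes:", err)
```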
ogrid generates arrays intended for broadcast operations.
x,y = np.ogrid[:5,:5]
x
array([[0],
[1],
[2],
[3],
[4]])
y
array([[0, 1, 2, 3, 4]])
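A minimal sketch of how the two `ogrid` outputs are typically combined: because one is a column and the other a row, any elementwise expression on them fills a full grid. The expression `x * 10 + y` is just an example.

```python
import numpy as np

x, y = np.ogrid[:5, :5]     # x has shape (5, 1), y has shape (1, 5)
z = x * 10 + y              # broadcasting produces the full 5x5 grid
print(z.shape)
print(z[2, 3])
```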
4. NumPy's function library
1) Random number methods
from numpy import random as nr
np.set_printoptions(precision = 2) # display only two decimal places
r1 = nr.rand(4,3)
r1
array([[0.23, 0.71, 0.41],
[0.74, 0.68, 0.2 ],
[0.62, 0.58, 0.35],
[0.61, 0.58, 0.08]])
r2 = nr.poisson(2,(3,4))
r2
array([[3, 1, 2, 2],
[2, 3, 0, 1],
[0, 1, 1, 2]])
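Recent NumPy versions also provide a `Generator` interface through `np.random.default_rng`. The sketch below draws the same kinds of samples with it; the seed value is arbitrary and used only so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen only for repeatability
u = rng.random((4, 3))                # uniform samples on [0, 1)
p = rng.poisson(2, size=(3, 4))       # Poisson samples with mean 2

# A second generator with the same seed reproduces the same stream.
same = np.random.default_rng(seed=0).random((4, 3))
print(np.array_equal(u, same))
```

Keeping the generator in a variable avoids relying on global random state.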
2) Computing common statistics
np.random.seed(1) # set the random seed
a = np.random.randint(0,10,size=(4,5))
a
array([[5, 8, 9, 5, 0],
[0, 1, 7, 6, 9],
[2, 4, 5, 2, 4],
[2, 4, 7, 7, 9]])
print(np.sum(a,axis = 1)) # sum of each row (reduces along the column axis)
np.sum(a)
[27 23 17 29]
96
print(np.sum(a,axis = 1,keepdims = True)) # keep the number of dimensions unchanged
[[27]
[23]
[17]
[29]]
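Other reductions take the same `axis` argument as `np.sum`. The sketch below writes the array out literally, using the values shown above, so that it does not depend on the random seed.

```python
import numpy as np

a = np.array([[5, 8, 9, 5, 0],
              [0, 1, 7, 6, 9],
              [2, 4, 5, 2, 4],
              [2, 4, 7, 7, 9]])

col_mean = np.mean(a, axis=0)      # mean of each column
row_max = np.max(a, axis=1)        # largest value in each row
row_argmax = np.argmax(a, axis=1)  # column index of the first largest value
overall_std = np.std(a)            # standard deviation over all elements

print(col_mean)
print(row_max, row_argmax)
```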
3) Sorting
np.sort(a)
array([[0, 5, 5, 8, 9],
[0, 1, 6, 7, 9],
[2, 2, 4, 4, 5],
[2, 4, 7, 7, 9]])
np.sort(a,axis=0) # sort each column; ascending order by default
array([[0, 1, 5, 2, 0],
[2, 4, 7, 5, 4],
[2, 4, 7, 6, 9],
[5, 8, 9, 7, 9]])
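When one array should be reordered according to another, `np.argsort` returns the sorting indices instead of the sorted values. The data in this sketch are made up for illustration.

```python
import numpy as np

scores = np.array([70, 95, 82])              # hypothetical values
names = np.array(["Wang", "Zhang", "Li"])    # hypothetical labels

order = np.argsort(scores)       # indices that sort scores in ascending order
print(order)
print(names[order])              # labels reordered by score
print(scores[order][::-1])       # descending order by reversing the result
```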
help(np.sort)
Help on function sort in module numpy:
sort(a, axis=-1, kind=None, order=None)
Return a sorted copy of an array.
Parameters
----------
a : array_like
Array to be sorted.
axis : int or None, optional
Axis along which to sort. If None, the array is flattened before
sorting. The default is -1, which sorts along the last axis.
kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
Sorting algorithm. The default is 'quicksort'. Note that both 'stable'
and 'mergesort' use timsort or radix sort under the covers and, in general,
the actual implementation will vary with data type. The 'mergesort' option
is retained for backwards compatibility.
.. versionchanged:: 1.15.0.
The 'stable' option was added.
order : str or list of str, optional
When `a` is an array with fields defined, this argument specifies
which fields to compare first, second, etc. A single field can
be specified as a string, and not all fields need be specified,
but unspecified fields will still be used, in the order in which
they come up in the dtype, to break ties.
Returns
-------
sorted_array : ndarray
Array of the same type and shape as `a`.
See Also
--------
ndarray.sort : Method to sort an array in-place.
argsort : Indirect sort.
lexsort : Indirect stable sort on multiple keys.
searchsorted : Find elements in a sorted array.
partition : Partial sort.
Notes
-----
The various sorting algorithms are characterized by their average speed,
worst case performance, work space size, and whether they are stable. A
stable sort keeps items with the same key in the same relative
order. The four algorithms implemented in NumPy have the following
properties:
=========== ======= ============= ============ ========
kind speed worst case work space stable
=========== ======= ============= ============ ========
'quicksort' 1 O(n^2) 0 no
'heapsort' 3 O(n*log(n)) 0 no
'mergesort' 2 O(n*log(n)) ~n/2 yes
'timsort' 2 O(n*log(n)) ~n/2 yes
=========== ======= ============= ============ ========
.. note:: The datatype determines which of 'mergesort' or 'timsort'
is actually used, even if 'mergesort' is specified. User selection
at a finer scale is not currently available.
All the sort algorithms make temporary copies of the data when
sorting along any but the last axis. Consequently, sorting along
the last axis is faster and uses less space than sorting along
any other axis.
The sort order for complex numbers is lexicographic. If both the real
and imaginary parts are non-nan then the order is determined by the
real parts except when they are equal, in which case the order is
determined by the imaginary parts.
Previous to numpy 1.4.0 sorting real and complex arrays containing nan
values led to undefined behaviour. In numpy versions >= 1.4.0 nan
values are sorted to the end. The extended sort order is:
* Real: [R, nan]
* Complex: [R + Rj, R + nanj, nan + Rj, nan + nanj]
where R is a non-nan real value. Complex values with the same nan
placements are sorted according to the non-nan part if it exists.
Non-nan values are sorted as before.
.. versionadded:: 1.12.0
quicksort has been changed to `introsort <https://en.wikipedia.org/wiki/Introsort>`_.
When sorting does not make enough progress it switches to
`heapsort <https://en.wikipedia.org/wiki/Heapsort>`_.
This implementation makes quicksort O(n*log(n)) in the worst case.
'stable' automatically chooses the best stable sorting algorithm
for the data type being sorted.
It, along with 'mergesort' is currently mapped to
`timsort <https://en.wikipedia.org/wiki/Timsort>`_
or `radix sort <https://en.wikipedia.org/wiki/Radix_sort>`_
depending on the data type.
API forward compatibility currently limits the
ability to select the implementation and it is hardwired for the different
data types.
.. versionadded:: 1.17.0
Timsort is added for better performance on already or nearly
sorted data. On random data timsort is almost identical to
mergesort. It is now used for stable sort while quicksort is still the
default sort if none is chosen. For timsort details, refer to
`CPython listsort.txt <https://github.com/python/cpython/blob/3.7/Objects/listsort.txt>`_.
'mergesort' and 'stable' are mapped to radix sort for integer data types. Radix sort is an
O(n) sort instead of O(n log n).
.. versionchanged:: 1.17.0
NaT now sorts to the end of arrays for consistency with NaN.
Examples
--------
>>> a = np.array([[1,4],[3,1]])
>>> np.sort(a) # sort along the last axis
array([[1, 4],
[1, 3]])
>>> np.sort(a, axis=None) # sort the flattened array
array([1, 1, 3, 4])
>>> np.sort(a, axis=0) # sort along the first axis
array([[1, 1],
[3, 4]])
Use the `order` keyword to specify a field to use when sorting a
structured array:
>>> dtype = [('name', 'S10'), ('height', float), ('age', int)]
>>> values = [('Arthur', 1.8, 41), ('Lancelot', 1.9, 38),
... ('Galahad', 1.7, 38)]
>>> a = np.array(values, dtype=dtype) # create a structured array
>>> np.sort(a, order='height') # doctest: +SKIP
array([('Galahad', 1.7, 38), ('Arthur', 1.8, 41),
('Lancelot', 1.8999999999999999, 38)],
dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
Sort by age, then height if ages are equal:
>>> np.sort(a, order=['age', 'height']) # doctest: +SKIP
array([('Galahad', 1.7, 38), ('Lancelot', 1.8999999999999999, 38),
('Arthur', 1.8, 41)],
dtype=[('name', '|S10'), ('height', '<f8'), ('age', '<i4')])
r = np.abs(np.random.randn(100000))
np.percentile(r, [68.3, 95.4, 99.7])
array([1.01, 2. , 2.97])
4) Statistical functions
np.random.seed(42)
a = np.random.randint(0, 8, 10)
a
array([2, 6, 2, 2, 7, 4, 3, 7, 7, 2])
np.unique(a) # return the sorted unique elements
array([2, 3, 4, 6, 7])
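unique accepts two optional parameters: return_index=True also returns the index of each unique value's first occurrence in the original array, and return_inverse=True also returns, for each original element, its index in the array of unique values.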
x,index = np.unique(a,return_index=True)
print(x)
index
[2 3 4 6 7]
array([0, 6, 5, 1, 4], dtype=int64)
a[index]
array([2, 3, 4, 6, 7])
x, rindex = np.unique(a, return_inverse=True)
print(x)
rindex
[2 3 4 6 7]
array([0, 3, 0, 0, 4, 2, 1, 4, 4, 0], dtype=int64)
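The inverse indices can be used to rebuild the original array from the unique values, as this sketch checks. It also uses `return_counts=True`, a further option of `np.unique` that is not covered in the text above.

```python
import numpy as np

a = np.array([2, 6, 2, 2, 7, 4, 3, 7, 7, 2])
x, rindex, counts = np.unique(a, return_inverse=True, return_counts=True)

print(np.array_equal(x[rindex], a))              # the inverse indices rebuild a
print(dict(zip(x.tolist(), counts.tolist())))    # value -> number of occurrences
```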
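bincount() counts how many times each value occurs in an array of non-negative integers; element i of the returned array is the number of times the integer i occurs.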
a = np.array([6, 3, 4, 6, 2, 7, 4, 4, 6, 1])
a
array([6, 3, 4, 6, 2, 7, 4, 4, 6, 1])
np.bincount(a) # values range over 0-7, so the result has 8 entries: 0 occurs 0 times, 1 occurs once, and 6 (index 6) occurs 3 times
array([0, 1, 1, 1, 3, 0, 3, 1], dtype=int64)
np.arange(5)
array([0, 1, 2, 3, 4])
help(np.bincount)
Help on function bincount in module numpy:
bincount(...)
bincount(x, weights=None, minlength=0)
Count number of occurrences of each value in array of non-negative ints.
The number of bins (of size 1) is one larger than the largest value in
`x`. If `minlength` is specified, there will be at least this number
of bins in the output array (though it will be longer if necessary,
depending on the contents of `x`).
Each bin gives the number of occurrences of its index value in `x`.
If `weights` is specified the input array is weighted by it, i.e. if a
value ``n`` is found at position ``i``, ``out[n] += weight[i]`` instead
of ``out[n] += 1``.
Parameters
----------
x : array_like, 1 dimension, nonnegative ints
Input array.
weights : array_like, optional
Weights, array of the same shape as `x`.
minlength : int, optional
A minimum number of bins for the output array.
.. versionadded:: 1.6.0
Returns
-------
out : ndarray of ints
The result of binning the input array.
The length of `out` is equal to ``np.amax(x)+1``.
Raises
------
ValueError
If the input is not 1-dimensional, or contains elements with negative
values, or if `minlength` is negative.
TypeError
If the type of the input is float or complex.
See Also
--------
histogram, digitize, unique
Examples
--------
>>> np.bincount(np.arange(5))
array([1, 1, 1, 1, 1])
>>> np.bincount(np.array([0, 1, 1, 3, 2, 1, 7]))
array([1, 3, 1, 1, 0, 0, 0, 1])
>>> x = np.array([0, 1, 1, 3, 2, 1, 7, 23])
>>> np.bincount(x).size == np.amax(x)+1
True
The input array needs to be of integer dtype, otherwise a
TypeError is raised:
>>> np.bincount(np.arange(5, dtype=float))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: array cannot be safely cast to required type
A possible use of ``bincount`` is to perform sums over
variable-size chunks of an array, using the ``weights`` keyword.
>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
>>> x = np.array([0, 1, 1, 2, 2, 2])
>>> np.bincount(x, weights=w)
array([ 0.3, 0.7, 1.1])
x = np.array([0 , 1, 2, 2, 1, 1, 0])
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])
np.bincount(x, w) # 0 occurs twice, so its weighted count is 0.1 + 1.2 = 1.3
array([1.3, 1.6, 0.6])
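One use of the weighted form is a per-group average: divide the weighted sums by the plain counts. The sketch below reuses the arrays above, reading `x` as a group label and `w` as a value, which is an interpretation made for this example.

```python
import numpy as np

x = np.array([0, 1, 2, 2, 1, 1, 0])                  # group label of each sample
w = np.array([0.1, 0.3, 0.2, 0.4, 0.5, 0.8, 1.2])    # value of each sample

sums = np.bincount(x, weights=w)    # sum of values per group
counts = np.bincount(x)             # number of samples per group
means = sums / counts
print(means)

# minlength pads the result with zero counts for labels that never occur.
padded = np.bincount(x, minlength=5)
print(padded)
```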
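histogram() computes a histogram of a one-dimensional array. Its parameters are:
histogram(a, bins=10, range=None, weights=None)
The function returns two one-dimensional arrays: hist holds the count for each interval, and bin_edges holds the interval boundaries.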
a = np.random.rand(100)
np.histogram(a, bins=5, range=(0, 1))
(array([27, 20, 18, 18, 17], dtype=int64),
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]))
a
array([0.02, 0.97, 0.83, 0.21, 0.18, 0.18, 0.3 , 0.52, 0.43, 0.29, 0.61,
0.14, 0.29, 0.37, 0.46, 0.79, 0.2 , 0.51, 0.59, 0.05, 0.61, 0.17,
0.07, 0.95, 0.97, 0.81, 0.3 , 0.1 , 0.68, 0.44, 0.12, 0.5 , 0.03,
0.91, 0.26, 0.66, 0.31, 0.52, 0.55, 0.18, 0.97, 0.78, 0.94, 0.89,
0.6 , 0.92, 0.09, 0.2 , 0.05, 0.33, 0.39, 0.27, 0.83, 0.36, 0.28,
0.54, 0.14, 0.8 , 0.07, 0.99, 0.77, 0.2 , 0.01, 0.82, 0.71, 0.73,
0.77, 0.07, 0.36, 0.12, 0.86, 0.62, 0.33, 0.06, 0.31, 0.33, 0.73,
0.64, 0.89, 0.47, 0.12, 0.71, 0.76, 0.56, 0.77, 0.49, 0.52, 0.43,
0.03, 0.11, 0.03, 0.64, 0.31, 0.51, 0.91, 0.25, 0.41, 0.76, 0.23,
0.08])
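A small deterministic sketch of how the two return values relate: with n bins there are n + 1 edges, and the counts add up to the number of samples inside the range. The sample values are made up so that none falls on an interior edge.

```python
import numpy as np

data = np.array([0.05, 0.15, 0.25, 0.45, 0.55, 0.65, 0.95, 1.0])
hist, bin_edges = np.histogram(data, bins=5, range=(0, 1))

print(hist)                      # count per interval
print(bin_edges)                 # 6 edges for 5 intervals
print(hist.sum() == len(data))   # every sample lands in exactly one interval
```

Each interval includes its left edge and excludes its right edge, except the last one, which includes both; that is why the sample 1.0 is counted.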
5) Manipulating multidimensional arrays
a = np.arange(3)
b = np.arange(10, 13)
a
array([0, 1, 2])
b
array([10, 11, 12])
v = np.vstack((a,b))
v # stacked vertically, along the first axis (axis 0)
array([[ 0, 1, 2],
[10, 11, 12]])
h = np.hstack((a, b)) # stacked horizontally; for 1-D inputs this concatenates along axis 0
h
array([ 0, 1, 2, 10, 11, 12])
c = np.column_stack((a, b)) # join several 1-D arrays as columns
c
array([[ 0, 10],
[ 1, 11],
[ 2, 12]])
a = np.array([6, 3, 7, 4, 6, 9, 2, 6, 7, 4, 3, 7])
b = np.array([ 1, 3, 6, 9, 10])
np.split(a, b) # split at the given index positions
[array([6]),
array([3, 7]),
array([4, 6, 9]),
array([2, 6, 7]),
array([4]),
array([3, 7])]
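Splitting and concatenating are inverse operations, which the following sketch checks. It also shows the integer form of the second argument, which asks for equal-sized pieces; the name `thirds` is illustrative.

```python
import numpy as np

a = np.array([6, 3, 7, 4, 6, 9, 2, 6, 7, 4, 3, 7])
parts = np.split(a, [1, 3, 6, 9, 10])    # cut before indices 1, 3, 6, 9 and 10

print([len(part) for part in parts])
print(np.array_equal(np.concatenate(parts), a))   # joining the parts restores a

thirds = np.split(a, 3)    # an integer requests that many equal-sized pieces
print([len(part) for part in thirds])
```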
a = np.array([1.0, 0, -2, 1])
p = np.poly1d(a)
print (type(p))
<class 'numpy.poly1d'>
p(np.array([1,1,1]))
array([0., 0., 0.])
Polynomial objects support arithmetic operations; a list used as an operand is automatically converted to a polynomial.
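A brief sketch of that arithmetic, using the same polynomial x^3 - 2x + 1. The list `[-2, 1]` is an arbitrary second operand chosen for the example.

```python
import numpy as np

p = np.poly1d([1.0, 0, -2, 1])    # x^3 - 2x + 1

q = p + [-2, 1]                   # the list is read as the polynomial -2x + 1
print(q.coeffs)                   # coefficients of x^3 - 4x + 2

product = p * p                   # polynomial multiplication
print(product.order)              # degree of the product

print(p.deriv().coeffs)           # derivative: 3x^2 - 2
```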