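Basics
- NumPy's main data object is a homogeneous multidimensional array: a table of elements that all have the same type
- Elements are indexed by a tuple of non-negative integers
The array class: ndarray
- It is not the same as array.array from the Python standard library, which is one-dimensional only and offers less functionality
- ndarray.ndim: the number of axes (dimensions) of the array
- ndarray.shape: the size of the array along each axis, where len(ndarray.shape) == ndarray.ndim
- ndarray.size: the total number of elements in the array
- ndarray.dtype: an object describing the type of the elements
- ndarray.itemsize: the size of a single element in bytes
- ndarray.data: the buffer that holds the actual elements
Example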
In [1]:
import numpy as np
In [2]:
a = np.arange(15).reshape(3, 5)
a
Out[2]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
In [3]:
a.shape
Out[3]:
(3, 5)
In [4]:
a.ndim
Out[4]:
2
In [5]:
a.dtype.name, a.itemsize
Out[5]:
('int32', 4)
In [6]:
a.size
Out[6]:
15
In [7]:
type(a)
Out[7]:
numpy.ndarray
In [8]:
b = np.array([6, 7, 8])
b
Out[8]:
array([6, 7, 8])
In [9]:
type(b)
Out[9]:
numpy.ndarray
Creating arrays
In [10]:
a = np.array([2, 3, 4])
a
Out[10]:
array([2, 3, 4])
In [11]:
a.dtype
Out[11]:
dtype('int32')
In [12]:
b = np.array([1.2, 3.0, 5.8])
b.dtype
Out[12]:
dtype('float64')
In [13]:
len(b)
Out[13]:
3
In [14]:
len(b.shape)
Out[14]:
1
- np.array() takes a single sequence, such as a list or a tuple
- It returns an ndarray
In [15]:
a = np.array(1, 2, 3, 4)  # wrong: the numbers must be passed as one sequence
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-b0e1f2fde9e3> in <module>()
----> 1 a = np.array(1, 2, 3, 4)  # wrong: the numbers must be passed as one sequence
ValueError: only 2 non-keyword arguments accepted
In [16]:
a = np.array([1, 2, 3, 4])
a
Out[16]:
array([1, 2, 3, 4])
In [17]:
a = np.array((1, 2, 3, 4))
a
Out[17]:
array([1, 2, 3, 4])
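- array converts a sequence of sequences into a two-dimensional array, and n levels of nesting into an n-dimensional array
In [18]:
b = np.array([(1.5, 2, 3), (4, 5, 6)])
b
Out[18]:
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
- The element type can be specified explicitly when the array is created
In [19]:
c = np.array([[1, 2], [3, 4]], dtype=complex)
c
Out[19]:
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])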
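Important in practice
- Avoid growing arrays dynamically; create a placeholder array of the final size first and then fill it in
- zeros(): every element is 0
- ones(): every element is 1
- empty(): the elements are uninitialized; they hold whatever was already in memory and must not be relied on
- The default dtype is float64
Example
In [20]:
np.zeros((3, 4))  # the argument is the shape
Out[20]:
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
In [21]:
np.ones((3, 4))
Out[21]:
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
In [22]:
np.empty((3, 4))  # uninitialized: the values shown are leftover memory contents and may differ on every run
Out[22]:
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])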
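The advice about placeholders can be sketched as follows. This is a minimal illustration, not the only way to do it; the names n, squares and cubes are made up for the example.

```python
import numpy as np

n = 5  # hypothetical number of results to store

# Allocate the full result array once; zeros() gives a known initial state.
squares = np.zeros(n)
for k in range(n):
    squares[k] = k ** 2  # fill the placeholder in place

# empty() only reserves memory, so every element must be overwritten before use.
cubes = np.empty(n)
cubes[:] = np.arange(n) ** 3

print(squares)
print(cubes)
```

Both arrays have dtype float64 because no dtype was given.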
- arange creates a sequence of numbers as an array
In [23]:
np.arange(10, 30, 5)  # arguments: start, stop (exclusive), step
Out[23]:
array([10, 15, 20, 25])
In [24]:
np.arange(0, 2, 0.3)
Out[24]:
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
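- With a floating-point step, the number of elements arange returns is hard to predict because of finite floating-point precision; linspace(), which takes the number of elements instead of the step, is usually the better choice
In [25]:
np.linspace(0, 2, 9)  # arguments: start, stop (inclusive), number of evenly spaced points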
Out[25]:
array([0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
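The difference between the two functions can be tried out with a sketch like the one below. The start, stop and step values passed to arange are chosen only for illustration, and no particular element count is claimed for that call, since it depends on floating-point rounding.

```python
import numpy as np

# With a float step, the element count of arange depends on rounding in
# (stop - start) / step, so it is not always what one expects.
stepped = np.arange(1, 1.3, 0.1)
print(len(stepped), stepped)

# linspace takes the number of points directly and includes both end points.
spaced = np.linspace(0, 2, 9)
print(len(spaced), spaced[0], spaced[-1])  # 9 0.0 2.0
```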
In [26]:
from numpy import pi
x = np.linspace(0, 2*pi, 100)  # useful for evaluating a function at many points
f = np.sin(x)
f
Out[26]:
array([ 0.00000000e+00, 6.34239197e-02, 1.26592454e-01, 1.89251244e-01, 2.51147987e-01, 3.12033446e-01, 3.71662456e-01, 4.29794912e-01, 4.86196736e-01, 5.40640817e-01, 5.92907929e-01, 6.42787610e-01, 6.90079011e-01, 7.34591709e-01, 7.76146464e-01, 8.14575952e-01, 8.49725430e-01, 8.81453363e-01, 9.09631995e-01, 9.34147860e-01, 9.54902241e-01, 9.71811568e-01, 9.84807753e-01, 9.93838464e-01, 9.98867339e-01, 9.99874128e-01, 9.96854776e-01, 9.89821442e-01, 9.78802446e-01, 9.63842159e-01, 9.45000819e-01, 9.22354294e-01, 8.95993774e-01, 8.66025404e-01, 8.32569855e-01, 7.95761841e-01, 7.55749574e-01, 7.12694171e-01, 6.66769001e-01, 6.18158986e-01, 5.67059864e-01, 5.13677392e-01, 4.58226522e-01, 4.00930535e-01, 3.42020143e-01, 2.81732557e-01, 2.20310533e-01, 1.58001396e-01, 9.50560433e-02, 3.17279335e-02, -3.17279335e-02, -9.50560433e-02, -1.58001396e-01, -2.20310533e-01, -2.81732557e-01, -3.42020143e-01, -4.00930535e-01, -4.58226522e-01, -5.13677392e-01, -5.67059864e-01, -6.18158986e-01, -6.66769001e-01, -7.12694171e-01, -7.55749574e-01, -7.95761841e-01, -8.32569855e-01, -8.66025404e-01, -8.95993774e-01, -9.22354294e-01, -9.45000819e-01, -9.63842159e-01, -9.78802446e-01, -9.89821442e-01, -9.96854776e-01, -9.99874128e-01, -9.98867339e-01, -9.93838464e-01, -9.84807753e-01, -9.71811568e-01, -9.54902241e-01, -9.34147860e-01, -9.09631995e-01, -8.81453363e-01, -8.49725430e-01, -8.14575952e-01, -7.76146464e-01, -7.34591709e-01, -6.90079011e-01, -6.42787610e-01, -5.92907929e-01, -5.40640817e-01, -4.86196736e-01, -4.29794912e-01, -3.71662456e-01, -3.12033446e-01, -2.51147987e-01, -1.89251244e-01, -1.26592454e-01, -6.34239197e-02, -2.44929360e-16])
Printing arrays
In [27]:
a = np.arange(6)
print(a)
[0 1 2 3 4 5]
In [28]:
b = np.arange(12).reshape(3, 4)
print(b)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [29]:
c = np.arange(24).reshape(2, 3, 4)
print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
In [30]:
print(np.arange(10000))
[ 0 1 2 ... 9997 9998 9999]
In [31]:
print(np.arange(10000).reshape(100, 100))
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]
Elementwise operations
- Arithmetic operators on arrays apply elementwise
- In other words, the operation is carried out separately on each element
In [32]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
c = a - b
c
Out[32]:
array([20, 29, 38, 47])
In [33]:
b ** 2
Out[33]:
array([0, 1, 4, 9], dtype=int32)
In [34]:
10 * np.sin(a)
Out[34]:
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
In [35]:
a < 35
Out[35]:
array([ True, True, False, False])
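- In linear algebra, * usually denotes the matrix product; in NumPy, * and the other arithmetic operators act elementwise, and the matrix product is computed with the dot function
In [36]:
# In NumPy, * and the other arithmetic operators act on matching elements;
# the matrix product needs dot()
A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])
A * B
Out[36]:
array([[2, 0],
       [0, 4]])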
In [37]:
A.dot(B)
Out[37]:
array([[5, 4],
       [3, 4]])
In [38]:
np.dot(A, B)
Out[38]:
array([[5, 4],
       [3, 4]])
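Besides dot, Python 3.5 and later also offer the @ operator for the matrix product. The sketch below reuses the matrices A and B from above and only adds the names elementwise and matrix_product for illustration.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B     # multiplies matching entries
matrix_product = A @ B  # same result as A.dot(B) and np.dot(A, B)

print(elementwise)
print(matrix_product)
```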
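- The operations above all create a new array for the result; += and *= instead modify the existing array in place
In [39]:
a = np.ones((2, 3), dtype=int)
b = np.random.random((2, 3))
a *= 3
a
Out[39]:
array([[3, 3, 3],
       [3, 3, 3]])
In [40]:
b += a  # works: the int values of a are upcast to the more precise float type of b (the reverse, a += b, would fail)
b
Out[40]:
array([[3.05399059, 3.53474061, 3.04725233],
       [3.29667869, 3.76725366, 3.94770914]])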
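The casting rule for in-place operators can be checked with a small sketch. Fixed values created with np.full stand in for the random numbers so that the result is repeatable; the flag failed is a name introduced here.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)
b = np.full((2, 3), 0.5)  # fixed float values instead of random ones

b += a  # fine: int is safely upcast to float
try:
    a += b  # float results cannot be stored back into an int array
except TypeError as err:
    failed = True
    print("in-place add rejected:", err)
else:
    failed = False

print(a.dtype, b.dtype)
```

When the int array really should receive the float result, an explicit conversion such as a = a + b creates a new float array instead.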
In [41]:
a = np.random.random((2, 3))
a
Out[41]:
array([[0.99457762, 0.072168  , 0.31688902],
       [0.58258851, 0.87938576, 0.99625231]])
In [42]:
print(a.sum(), a.min(), a.max())
3.8418612204967566 0.07216800177955685 0.996252312303801
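- By default these operations treat the array as a flat list of numbers; passing the axis parameter applies them along one axis instead
In [43]:
b = np.arange(12).reshape(3, 4)
b
Out[43]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])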
In [44]:
np.ndim(b)
Out[44]:
2
In [45]:
b.sum(axis=0)  # sum of each column
Out[45]:
array([12, 15, 18, 21])
In [46]:
b.min(axis=1)  # minimum of each row
Out[46]:
array([0, 4, 8])
In [47]:
b.cumsum(axis=1)  # cumulative sum along each row
Out[47]:
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]], dtype=int32)
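One way to remember what axis means is that the reduced axis disappears from the shape. The following sketch uses a three-dimensional array for illustration; the names c and same are not from the notes above.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# Reducing along an axis removes that axis from the shape.
print(c.sum(axis=0).shape)  # (3, 4)
print(c.sum(axis=1).shape)  # (2, 4)
print(c.sum(axis=2).shape)  # (2, 3)

# axis=0 adds the two blocks c[0] and c[1] entry by entry.
same = np.array_equal(c.sum(axis=0), c[0] + c[1])
print(same)  # True
```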
Universal functions
In [48]:
a = np.arange(10)
a
Out[48]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [49]:
np.exp(a)
Out[49]:
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])
In [50]:
np.sqrt(a)
Out[50]:
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
In [51]:
a = np.arange(3)
b = np.array([2., -1., 4.])
np.add(a, b)
Out[51]:
array([2., 0., 6.])
- Many more functions are available, for example: all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where
Slicing
In [52]:
a = np.arange(10) ** 3
a
Out[52]:
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729], dtype=int32)
In [53]:
a[2]
Out[53]:
8
In [54]:
a[2: 5]
Out[54]:
array([ 8, 27, 64], dtype=int32)
In [55]:
a[:6:2] = -100  # from the start up to position 6 (exclusive), set every second element to -100
a
Out[55]:
array([-100, 1, -100, 27, -100, 125, 216, 343, 512, 729], dtype=int32)
In [56]:
a = np.arange(10)
a[::-1]  # a reversed
Out[56]:
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
In [57]:
for i in a[::-1]:
    print(i)
9
8
7
6
5
4
3
2
1
0
- A multidimensional array has one index per axis; the indices are separated by commas
In [58]:
def foo(x, y):
    return x + y

b = np.fromfunction(foo, (5, 4), dtype=int)
b
Out[58]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6],
       [4, 5, 6, 7]])
In [59]:
b[2, 3]
Out[59]:
5
In [60]:
b[:5, 1]
Out[60]:
array([1, 2, 3, 4, 5])
In [61]:
b[0:2, 1]
Out[61]:
array([1, 2])
In [62]:
b[1, :]
Out[62]:
array([1, 2, 3, 4])
In [63]:
b[-1]  # when fewer indices than axes are given, the missing ones are taken as complete slices; same as b[-1, :]
Out[63]:
array([4, 5, 6, 7])
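- The dots (...) stand for as many colons as are needed to complete the index
- For example, if x is an array with five axes:
- x[1,2,...] is equivalent to x[1,2,:,:,:]
- x[...,3] is equivalent to x[:,:,:,:,3]
- x[4,...,5,:] is equivalent to x[4,:,:,5,:]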
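These equivalences can be checked on a concrete array. In the sketch below the shape (2, 3, 4, 5, 6) is an arbitrary choice, and the indices in the last pair are adjusted to fit inside that shape.

```python
import numpy as np

x = np.arange(2 * 3 * 4 * 5 * 6).reshape(2, 3, 4, 5, 6)  # five axes

# Each pair holds the dotted form and the fully spelled-out form.
pairs = [
    (x[1, 2, ...], x[1, 2, :, :, :]),
    (x[..., 3], x[:, :, :, :, 3]),
    (x[1, ..., 4, :], x[1, :, :, 4, :]),
]
for dotted, spelled_out in pairs:
    print(dotted.shape, np.array_equal(dotted, spelled_out))
```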
In [64]:
c = np.array([[[0, 1, 2],
               [5, 6, 7]],
              [[10, 11, 13],
               [16, 18, 19]]])
c.shape
Out[64]:
(2, 2, 3)
In [65]:
np.ndim(c)
Out[65]:
3
In [66]:
c[..., 2]
Out[66]:
array([[ 2,  7],
       [13, 19]])
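- Iterating over a multidimensional array goes along the first axis, from the outermost level inwards
In [67]:
for row in b:
    print(row)
[0 1 2 3]
[1 2 3 4]
[2 3 4 5]
[3 4 5 6]
[4 5 6 7]
- The flat attribute is an iterator over all elements of the array, as if it had been flattened
In [68]:
for element in b.flat:
    print(element)
0
1
2
3
1
2
3
4
2
3
4
5
3
4
5
6
4
5
6
7
- See also: Indexing, Indexing (reference), newaxis, ndenumerate, indices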
Shape manipulation
In [69]:
a = np.floor(10 * np.random.random((3, 4)))
a
Out[69]:
array([[0., 1., 7., 2.],
       [5., 8., 0., 9.],
       [3., 2., 7., 0.]])
In [70]:
a.shape
Out[70]:
(3, 4)
In [71]:
a.ravel()  # returns the array flattened
Out[71]:
array([0., 1., 7., 2., 5., 8., 0., 9., 3., 2., 7., 0.])
In [72]:
a.reshape(6, 2)
Out[72]:
array([[0., 1.],
       [7., 2.],
       [5., 8.],
       [0., 9.],
       [3., 2.],
       [7., 0.]])
In [73]:
a.T  # transpose
Out[73]:
array([[0., 5., 3.],
       [1., 8., 2.],
       [7., 0., 7.],
       [2., 9., 0.]])
In [74]:
a.T.shape, a.shape
Out[74]:
((4, 3), (3, 4))
- reshape returns a new array and leaves the original unchanged, whereas the ndarray.resize method modifies the array itself
In [75]:
a
Out[75]:
array([[0., 1., 7., 2.],
       [5., 8., 0., 9.],
       [3., 2., 7., 0.]])
In [76]:
a.resize((6, 2))
a
Out[76]:
array([[0., 1.],
       [7., 2.],
       [5., 8.],
       [0., 9.],
       [3., 2.],
       [7., 0.]])
- If one dimension is given as -1 in a reshape, NumPy calculates it from the array size and the other dimensions
In [77]:
a.reshape(3, -1)
Out[77]:
array([[0., 1., 7., 2.],
       [5., 8., 0., 9.],
       [3., 2., 7., 0.]])
- See also: ndarray.shape, reshape, resize, ravel
Stacking arrays
In [78]:
a = np.floor(10 * np.random.random((2, 2)))
a
Out[78]:
array([[3., 3.],
       [5., 9.]])
In [79]:
b = np.floor(10 * np.random.random((2, 2)))
b
Out[79]:
array([[5., 2.],
       [2., 6.]])
In [80]:
np.vstack((a, b))
Out[80]:
array([[3., 3.],
       [5., 9.],
       [5., 2.],
       [2., 6.]])
In [81]:
np.hstack((a, b))
Out[81]:
array([[3., 3., 5., 2.],
       [5., 9., 2., 6.]])
- column_stack stacks one-dimensional arrays as columns, so a 1-D array can be appended to a 2-D array as a new column
In [82]:
c = np.array([3, 3])
np.column_stack((a, c))
Out[82]:
array([[3., 3., 3.],
       [5., 9., 3.]])
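For two-dimensional inputs, vstack and hstack give the same result as np.concatenate along axis 0 and axis 1. The sketch below uses the values of a and b shown above as fixed arrays; v and h are names introduced for the example.

```python
import numpy as np

a = np.array([[3., 3.],
              [5., 9.]])
b = np.array([[5., 2.],
              [2., 6.]])

v = np.concatenate((a, b), axis=0)  # rows of b below the rows of a, like vstack
h = np.concatenate((a, b), axis=1)  # columns of b to the right of a, like hstack

print(v.shape, h.shape)  # (4, 2) (2, 4)
```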
Splitting arrays
In [83]:
a = np.floor(10 * np.random.random((2, 12)))
a
Out[83]:
array([[0., 6., 2., 9., 2., 2., 2., 0., 8., 4., 3., 1.],
       [9., 9., 3., 1., 4., 6., 7., 1., 5., 9., 9., 0.]])
In [84]:
np.hsplit(a, 3)  # split into three equally sized arrays
Out[84]:
[array([[0., 6., 2., 9.],
        [9., 9., 3., 1.]]),
 array([[2., 2., 2., 0.],
        [4., 6., 7., 1.]]),
 array([[8., 4., 3., 1.],
        [5., 9., 9., 0.]])]
In [85]:
np.hsplit(a, (3, 4))  # split after the third and after the fourth column
Out[85]:
[array([[0., 6., 2.],
        [9., 9., 3.]]),
 array([[9.],
        [1.]]),
 array([[2., 2., 2., 0., 8., 4., 3., 1.],
        [4., 6., 7., 1., 5., 9., 9., 0.]])]
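Copies and views
- Simple assignment makes no copy of the array or of its data
In [86]:
a = np.arange(12)
b = a  # no new object is created
b is a  # a and b are two names for the same ndarray object
Out[86]:
True
In [87]:
b.shape = 3, 4  # a and b are the same object, so this also changes the shape of a
a.shape
Out[87]:
(3, 4)
- Function calls make no copy either
In [88]:
def foo(x):  # inside the function, x refers to the same object as a
    print(id(x))

print(id(a))
foo(a)
90393600
90393600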
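- Views and shallow copies
- A view is a new array object that looks at the same data, so different array objects can share one set of data
In [89]:
c = a.view()  # c is a new array object that shares the data of a
c is a
Out[89]:
False
In [90]:
c.base is a
Out[90]:
True
In [91]:
c.flags.owndata  # c does not own its data
Out[91]:
False
In [92]:
c.shape = 2, 6  # reshaping the view does not change the shape of a
a.shape
Out[92]:
(3, 4)
In [93]:
c[0, 4] = 1234  # but the data of a does change
a
Out[93]:
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])
In [94]:
c
Out[94]:
array([[   0,    1,    2,    3, 1234,    5],
       [   6,    7,    8,    9,   10,   11]])
- So a view behaves like a pointer to the data that carries its own shape: reshaping it leaves the original array alone, but writing through it changes the shared data
- Slicing an array returns a view of it
In [95]:
s = a[:, 1:3]
s[:] = 10  # writes into the data shared with a
a
Out[95]:
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])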
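- Deep copy: the copy method makes a complete copy of the array and its data
In [96]:
d = a.copy()  # a new array object with its own data
d is a
Out[96]:
False
In [97]:
d.base is a  # d shares nothing with a
Out[97]:
False
In [98]:
d[0, 0] = 999
d
Out[98]:
array([[ 999,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
In [99]:
a
Out[99]:
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])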
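The three cases (plain assignment, view, deep copy) can be told apart with np.shares_memory. The sketch below is a compact summary with made-up names alias, view and copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a         # plain assignment: the same object
view = a[:, 1:3]  # slice: a new object that shares the data
copy = a.copy()   # deep copy: a new object with its own data

print(alias is a, np.shares_memory(a, view), np.shares_memory(a, copy))

view[:] = 10      # visible through a
copy[0, 0] = 999  # not visible through a
print(a)
```

A deep copy is also useful when only a small slice of a large array is needed, because the copy does not keep the large array alive.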
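Broadcasting rules
- Broadcasting applies when arrays of different shapes are combined in one operation
- If the arrays do not have the same number of dimensions, a 1 is prepended to the shape of the smaller one until they match
- An axis of length 1 is then treated as if its value were repeated along that axis to match the other array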
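A small example may make the two rules concrete. The arrays col and row and their values are chosen only for illustration; np.broadcast_to is used to show what each operand looks like after stretching.

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,), treated as (1, 4)

total = col + row                  # both are stretched to shape (3, 4)
print(total.shape)  # (3, 4)
print(total)

# broadcast_to shows the stretched operands without doing any arithmetic.
col_stretched = np.broadcast_to(col, (3, 4))
row_stretched = np.broadcast_to(row, (3, 4))
print(col_stretched)
print(row_stretched)
```

Shapes that differ on an axis where neither length is 1, such as (3,) and (4,), cannot be broadcast and raise an error.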
Indexing with index arrays
In [100]:
a = np.arange(12)**2  # the first 12 square numbers
i = np.array([1, 1, 3, 8, 5])  # an array of indices
a[i]  # the elements of a at the positions in i
Out[100]:
array([ 1,  1,  9, 64, 25], dtype=int32)
In [101]:
j = np.array([[3, 4], [9, 7]])  # a two-dimensional array of indices
a[j]  # the result has the same shape as j
Out[101]:
array([[ 9, 16],
       [81, 49]], dtype=int32)
In [102]:
palette = np.array([[0, 0, 0],          # black
                    [255, 0, 0],        # red
                    [0, 255, 0],        # green
                    [0, 0, 255],        # blue
                    [255, 255, 255]])   # white
image = np.array([[0, 1, 2, 0],         # each value is an index into the palette
                  [0, 3, 4, 0]])
palette[image]
Out[102]:
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])
In [103]:
a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1],  # indices for the first axis of a
              [1, 2]])
j = np.array([[2, 1],  # indices for the second axis of a
              [3, 3]])
a
Out[103]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [104]:
a[i, j]
Out[104]:
array([[ 2,  5],
       [ 7, 11]])
In [105]:
a[i, 2]
Out[105]:
array([[ 2,  6],
       [ 6, 10]])
In [106]:
a[:, j]
Out[106]:
array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])
- Finding the maximum of each series in time-dependent data
In [107]:
time = np.linspace(20, 145, 5)  # time scale
data = np.sin(np.arange(20)).reshape(5, 4)  # four time-dependent series, one per column
time
Out[107]:
array([ 20.  ,  51.25,  82.5 , 113.75, 145.  ])
In [108]:
data
Out[108]:
array([[ 0.        ,  0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849, -0.54402111, -0.99999021],
       [-0.53657292,  0.42016704,  0.99060736,  0.65028784],
       [-0.28790332, -0.96139749, -0.75098725,  0.14987721]])
In [109]:
ind = data.argmax(axis=0)  # row index of the maximum in each column
ind
Out[109]:
array([2, 0, 3, 1], dtype=int32)
In [110]:
time_max = time[ind]  # the times at which each series reaches its maximum
time_max
Out[110]:
array([ 82.5 ,  20.  , 113.75,  51.25])
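The same index array can also be used to pull out the maximum values themselves, by pairing each row index with its column number. The following sketch repeats the setup so that it runs on its own; data_max is a name added for the example.

```python
import numpy as np

time = np.linspace(20, 145, 5)
data = np.sin(np.arange(20)).reshape(5, 4)

ind = data.argmax(axis=0)                       # row of the maximum, per column
data_max = data[ind, np.arange(data.shape[1])]  # picks data[ind[k], k] for each column k
time_max = time[ind]

print(time_max)
print(data_max)
print(np.array_equal(data_max, data.max(axis=0)))  # True
```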
In [111]:
a = np.arange(5)
a
Out[111]:
array([0, 1, 2, 3, 4])
Basic linear algebra
In [112]:
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a)
[[1. 2.]
 [3. 4.]]
In [113]:
a.transpose()  # transpose
Out[113]:
array([[1., 3.],
       [2., 4.]])
In [114]:
np.linalg.inv(a)  # matrix inverse
Out[114]:
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
In [115]:
u = np.eye(2)  # 2x2 identity matrix
u
Out[115]:
array([[1., 0.],
       [0., 1.]])
In [116]:
j = np.array([[0.0, -1.0], [1.0, 0.0]])
In [117]:
np.dot(i, j)  # matrix product; i is still the index array [[0, 1], [1, 2]] from In [103]
Out[117]:
array([[ 1.,  0.],
       [ 2., -1.]])
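As a quick check of the inverse, multiplying a by its inverse should give the identity matrix up to rounding. The sketch below also solves a small linear system with np.linalg.solve; the right-hand side y is a made-up example value.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
y = np.array([5.0, 7.0])  # hypothetical right-hand side

a_inv = np.linalg.inv(a)
print(np.allclose(a @ a_inv, np.eye(2)))  # the product is the identity up to rounding

x = np.linalg.solve(a, y)  # solves a @ x = y without forming the inverse
print(x)
print(np.allclose(a @ x, y))
```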
In [118]:
np.trace(u)
Out[118]:
2.0
- Automatic reshaping
In [119]:
a = np.arange(30)
a.shape = 2, -1, 3  # -1 means "whatever is needed"; NumPy works out that size itself
a.shape
Out[119]:
(2, 5, 3)
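Histograms
In [120]:
import matplotlib.pyplot as plt
mu, sigma = 2, 0.5
v = np.random.normal(mu, sigma, 10000)  # 10000 samples from a normal distribution with mean 2 and standard deviation 0.5
plt.hist(v, bins=50, density=True)  # normalized histogram with 50 bins; density replaces the deprecated normed argument
plt.show()
In [121]:
(n, bins) = np.histogram(v, bins=50, density=True)  # compute the histogram with NumPy, then plot it
plt.plot(.5 * (bins[1:] + bins[:-1]), n)  # plot against the bin centers
plt.show()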
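What np.histogram returns can be inspected without plotting. The sketch below assumes a seeded generator from np.random.default_rng (NumPy 1.17 or later) so that runs are repeatable; the seed and the names centers and area are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, so runs are repeatable
v = rng.normal(2, 0.5, 10000)

n, bins = np.histogram(v, bins=50, density=True)
centers = 0.5 * (bins[1:] + bins[:-1])  # x positions used for plotting
area = np.sum(n * np.diff(bins))        # total area under the histogram

print(len(n), len(bins))  # 50 51
print(area)
```

There is one more bin edge than there are bins, which is why the plot uses the midpoints of neighboring edges. With density=True the bar areas sum to 1.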