NumPy Basics: Arrays and Vectorized Computation
Creating N-dimensional arrays
The simplest way to create an array is with the array function.
import numpy as np
data1=[6,22,3.3,2]
arr1=np.array(data1)
data2=[[1,2,3,4],[5,6,7,8]]
arr2=np.array(data2)
In [8]: arr1
Out[8]: array([ 6. , 22. , 3.3, 2. ])
In [9]: arr2
Out[9]: array([[1, 2, 3, 4],
[5, 6, 7, 8]]) # a nested sequence is converted into a multidimensional array
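Unless a type is specified explicitly, np.array infers a suitable data type for the new array. The type is stored in the array's dtype object. The int32 shown below is what this machine chose for integers; the default integer width depends on the platform and NumPy version.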
In [11]: arr1.dtype
Out[11]: dtype('float64')
In [12]: arr2.dtype
Out[12]: dtype('int32')
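To make the inference rule concrete, here is a minimal sketch of overriding it with the dtype argument. The variable names are made up for illustration and do not appear in the session above.

```python
import numpy as np

# Inferred: all inputs are Python ints, so the result is an integer array
# (the exact width, int32 or int64, depends on the platform).
arr_int = np.array([1, 2, 3])

# Explicit: the dtype argument overrides the inferred type.
arr_float = np.array([1, 2, 3], dtype=np.float64)

# Inferred: a single float in the input promotes the whole array to float64.
arr_mixed = np.array([1, 2.5, 3])

print(arr_int.dtype, arr_float.dtype, arr_mixed.dtype)
```

Stating the dtype explicitly is useful when the same code has to behave identically on machines with different default integer widths.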
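np.zeros and np.ones create arrays filled with 0s or 1s. np.empty allocates an array without initializing its entries, so the contents are arbitrary; they may happen to be zeros, as in the output below, but this should not be relied on.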
In [13]: np.zeros(5)
Out[13]: array([ 0., 0., 0., 0., 0.])
In [15]: np.ones((3,3))
Out[15]:
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
In [16]: np.empty((2,3,2))
Out[16]:
array([[[ 0., 0.],
[ 0., 0.],
[ 0., 0.]],
[[ 0., 0.],
[ 0., 0.],
[ 0., 0.]]])
An array's dtype can be converted explicitly with astype:
In [17]: arr1.dtype
Out[17]: dtype('float64')
In [18]: int_arr1=arr1.astype(np.int64)
In [19]: int_arr1
Out[19]: array([ 6, 22, 3, 2], dtype=int64) # the fractional part is truncated
astype can also convert an array of numeric strings to a numeric array; the code is omitted here.
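For the string case, a minimal sketch might look like the following. The sample strings are invented for illustration.

```python
import numpy as np

# An array of strings that each hold a number
numeric_strings = np.array(['1.25', '-9.6', '42'])

# astype parses each string and returns a new float64 array;
# a string that cannot be parsed as a number raises ValueError.
as_floats = numeric_strings.astype(np.float64)

print(as_floats.dtype)
print(as_floats)
```

Like every astype call, this returns a new array and leaves numeric_strings unchanged.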
Operations between arrays and scalars
Arrays let us run batch operations on data without writing loops. This is usually called vectorization.
In [20]: arr2
Out[20]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
In [21]: arr2 * arr2
Out[21]:
array([[ 1, 4, 9, 16],
[25, 36, 49, 64]])
In [22]: arr2 +1
Out[22]:
array([[2, 3, 4, 5],
[6, 7, 8, 9]])
In [23]: 1 / arr2
Out[23]:
array([[ 1. , 0.5 , 0.33333333, 0.25 ],
[ 0.2 , 0.16666667, 0.14285714, 0.125 ]])
In [24]: arr2 ** 0.5
Out[24]:
array([[ 1. , 1.41421356, 1.73205081, 2. ],
[ 2.23606798, 2.44948974, 2.64575131, 2.82842712]])
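As a rough illustration of what vectorization replaces, the sketch below computes the same result twice, once with explicit Python loops and once with array arithmetic. The names looped and vectorized are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# Loop version: visit every element by index
looped = np.empty(arr.shape, dtype=arr.dtype)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * arr[i, j] + 1

# Vectorized version: the same arithmetic applied element-wise,
# with the scalar 1 broadcast to every position
vectorized = arr * arr + 1

print(np.array_equal(looped, vectorized))
```

Both versions give the same array; the vectorized form is shorter and runs the loop in compiled code rather than in the Python interpreter.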
Basic indexing and slicing
Indexing is straightforward and works much like indexing in Python itself.
Slicing
In [25]: arr=np.arange(10)
In [26]: arr
Out[26]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [27]: arr[5:8]=12
In [28]: arr
Out[28]: array([ 0, 1, 2, 3, 4, 12, 12, 12, 8, 9])
An array slice is a view on the original array, so any modification made through the slice changes the source data directly.
In [29]: arr_slice=arr[5:8]
In [30]: arr_slice[1]=111
In [31]: arr
Out[31]: array([ 0, 1, 2, 3, 4, 12, 111, 12, 8, 9])
In [32]: arr_slice[:]=4444
In [33]: arr
Out[33]: array([ 0, 1, 2, 3, 4, 4444, 4444, 4444, 8, 9])
If you want a copy rather than a view, copy the slice explicitly, for example:
arr[5:8].copy()
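One way to check whether two arrays share data is np.shares_memory. The sketch below contrasts a view with a copy; the names base, view, and copied are chosen here for illustration.

```python
import numpy as np

base = np.arange(10)

view = base[5:8]           # a slice: shares memory with base
copied = base[5:8].copy()  # an independent copy of the same values

view[0] = 99     # writes through to base
copied[:] = -1   # affects only the copy

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copied))  # False
print(base)
```

After running this, base[5] is 99 while base[6] and base[7] keep their original values, because only the view writes through.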
Indexing multidimensional arrays, for example a two-dimensional array
In [34]: arr2d=np.array([[1,2,3],[4,5,6],[7,8,9]])
In [35]: arr2d[2]
Out[35]: array([7, 8, 9])
In [36]: arr2d[0][2]
Out[36]: 3
In [37]: arr2d[0,2]
Out[37]: 3
Three-dimensional array
In [38]: arr3d=np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
In [39]: arr3d[0]
Out[39]:
array([[1, 2, 3],
[4, 5, 6]])
Both scalar values and arrays can be assigned to arr3d[0]
In [40]: old_values=arr3d[0].copy()
In [41]: arr3d[0]=45
In [42]: arr3d
Out[42]:
array([[[45, 45, 45],
[45, 45, 45]],
[[ 7, 8, 9],
[10, 11, 12]]])
In [43]: arr3d[0]=old_values
In [44]: arr3d
Out[44]:
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 7, 8, 9],
[10, 11, 12]]])
Slice indexing: higher-dimensional objects can be sliced along one or more axes, and slices can be combined with integer indexes.
In [45]: arr2d
Out[45]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [46]: arr2d[:2]
Out[46]:
array([[1, 2, 3],
[4, 5, 6]])
In [47]: arr2d[:2,1:]
Out[47]:
array([[2, 3],
[5, 6]])
In [48]: arr2d[1,:2]
Out[48]: array([4, 5])
In [49]: arr2d[2,:1]
Out[49]: array([7])
In [50]: arr2d[:,:1] # a bare colon selects the entire axis
Out[50]:
array([[1],
[4],
[7]])
In [51]: arr2d[:2,1:]=0 # assigning to a slice expression
In [52]: arr2d
Out[52]:
array([[1, 0, 0],
[4, 0, 0],
[7, 8, 9]])
Boolean indexing
In [54]: data=np.random.randn(5,5)
In [55]: data
Out[55]:
array([[ 0.26224992, 0.97018499, 0.22580213, -1.21175716, -1.41655148],
[-0.91801291, 0.9588066 , -1.4228044 , -0.93916245, 0.50487793],
[ 1.26572253, -0.31677449, -0.04173863, 0.28175939, 0.36777067],
[-0.85381682, 0.39739235, 0.23002012, -0.08400604, -0.61019238],
[-0.06159692, -0.67428044, 0.2520452 , -0.52615204, -0.26562721]])
In [56]: data[data<0]=0
In [57]: data
Out[57]:
array([[ 0.26224992, 0.97018499, 0.22580213, 0. , 0. ],
[ 0. , 0.9588066 , 0. , 0. , 0.50487793],
[ 1.26572253, 0. , 0. , 0.28175939, 0.36777067],
[ 0. , 0.39739235, 0.23002012, 0. , 0. ],
[ 0. , 0. , 0.2520452 , 0. , 0. ]])
A boolean array used as an index must have the same length as the axis it indexes, so names has one entry per row of data.
In [58]: names = np.array(['Bob','Joe','Will','Bob','Will'])
In [59]: names == 'Bob'
Out[59]: array([ True, False, False, True, False], dtype=bool)
In [61]: data[names == 'Bob']
Out[61]:
array([[ 0.26224992, 0.97018499, 0.22580213, 0. , 0. ],
[ 0. , 0.39739235, 0.23002012, 0. , 0. ]])
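Boolean masks can also be negated and combined. The sketch below uses a small integer array in place of the random data so the selected rows are easy to follow; the array contents are invented for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will'])
data = np.arange(15).reshape((5, 3))  # row i holds 3*i, 3*i+1, 3*i+2

# Rows whose name is not 'Bob' (rows 1, 2, 4)
not_bob = data[names != 'Bob']

# Combine masks with | (or) and & (and). Python's `or` / `and` keywords
# do not work element-wise on arrays, and each comparison needs parentheses.
bob_or_will = data[(names == 'Bob') | (names == 'Will')]

print(not_bob)
print(bob_or_will)
```

Selecting with a boolean mask always produces a copy of the data, even when the mask selects every row.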
Fancy indexing
Each value in the index list selects a row.
In [63]: arr = np.empty((8,4))
In [64]: for i in range(8):
    ...:     arr[i]=i
    ...:
In [65]: arr
Out[65]:
array([[ 0., 0., 0., 0.],
[ 1., 1., 1., 1.],
[ 2., 2., 2., 2.],
[ 3., 3., 3., 3.],
[ 4., 4., 4., 4.],
[ 5., 5., 5., 5.],
[ 6., 6., 6., 6.],
[ 7., 7., 7., 7.]])
In [66]: arr[[4,3,0,4]]
Out[66]:
array([[ 4., 4., 4., 4.],
[ 3., 3., 3., 3.],
[ 0., 0., 0., 0.],
[ 4., 4., 4., 4.]])
In [67]: arr[[-3,-4,-1,-7]]
Out[67]:
array([[ 5., 5., 5., 5.],
[ 4., 4., 4., 4.],
[ 7., 7., 7., 7.],
[ 1., 1., 1., 1.]])
In [68]: arr = np.arange(32).reshape((8,4))
In [69]: arr
Out[69]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
In [70]: arr[[1,4,5,5],[2,2,1,0]] # selects the elements at (1,2), (4,2), (5,1), (5,0)
Out[70]: array([ 6, 18, 21, 20])
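Passing two index lists pairs them up element by element, which is not always what one expects. If the goal is the rectangular block formed by a set of rows and a set of columns, np.ix_ is one way to get it. A minimal sketch, with index lists chosen for illustration:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Paired: picks the elements (1,0), (5,3), (7,1), (2,2)
pairs = arr[rows, cols]

# Cross product: every listed row combined with every listed column
block = arr[np.ix_(rows, cols)]

# Fancy indexing copies the data, so writing to the result
# does not touch the original array.
block[0, 0] = -1

print(pairs)
print(block.shape)
print(arr[1, 0])
```

Here pairs is a 1-D array of four elements, while block is a 4x4 array whose rows and columns follow the order of the index lists.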
Transposing arrays and swapping axes
In [71]: arr
Out[71]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]])
In [72]: arr.T # simple transpose of the array
Out[72]:
array([[ 0, 4, 8, 12, 16, 20, 24, 28],
[ 1, 5, 9, 13, 17, 21, 25, 29],
[ 2, 6, 10, 14, 18, 22, 26, 30],
[ 3, 7, 11, 15, 19, 23, 27, 31]])
In [73]: np.dot(arr.T,arr)
Out[73]:
array([[2240, 2352, 2464, 2576],
[2352, 2472, 2592, 2712],
[2464, 2592, 2720, 2848],
[2576, 2712, 2848, 2984]])
In [74]: np.dot(arr,arr.T)
Out[74]:
array([[ 14, 38, 62, 86, 110, 134, 158, 182],
[ 38, 126, 214, 302, 390, 478, 566, 654],
[ 62, 214, 366, 518, 670, 822, 974, 1126],
[ 86, 302, 518, 734, 950, 1166, 1382, 1598],
[ 110, 390, 670, 950, 1230, 1510, 1790, 2070],
[ 134, 478, 822, 1166, 1510, 1854, 2198, 2542],
[ 158, 566, 974, 1382, 1790, 2198, 2606, 3014],
[ 182, 654, 1126, 1598, 2070, 2542, 3014, 3486]])
For higher-dimensional arrays, transpose takes a tuple of axis numbers that specifies how the axes are permuted.
In [76]: arr = np.arange(12).reshape((2,2,3))
In [77]: arr
Out[77]:
array([[[ 0, 1, 2],
[ 3, 4, 5]],
[[ 6, 7, 8],
[ 9, 10, 11]]])
In [78]: arr.transpose((1,0,2))
Out[78]:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]]])
In [79]: arr.swapaxes(1,2)
Out[79]:
array([[[ 0, 3],
[ 1, 4],
[ 2, 5]],
[[ 6, 9],
[ 7, 10],
[ 8, 11]]])
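The effect of an axis permutation is easier to see when every axis has a different length. The sketch below uses a (2, 3, 4) array, which is an illustrative choice rather than the array from the session above.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

t = arr.transpose((1, 0, 2))  # axes 0 and 1 trade places
s = arr.swapaxes(1, 2)        # axes 1 and 2 trade places

print(arr.T.shape)  # .T reverses all axes
print(t.shape)
print(s.shape)

# Element lookup follows the permuted axes: t[j, i, k] is arr[i, j, k]
print(t[2, 1, 3] == arr[1, 2, 3])

# Both results are views on the same data, not copies
print(np.shares_memory(arr, t), np.shares_memory(arr, s))
```

Because transpose and swapaxes return views, modifying t or s also modifies arr.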
Universal functions: fast element-wise array functions
In [80]: arr = np.arange(10)
In [81]: np.sqrt(arr)
Out[81]:
array([ 0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])
In [82]: np.exp(arr)
Out[82]:
array([ 1.00000000e+00, 2.71828183e+00, 7.38905610e+00,
2.00855369e+01, 5.45981500e+01, 1.48413159e+02,
4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
8.10308393e+03])
In [83]: x = np.random.randn(8)
In [84]: y = np.random.randn(8)
In [85]: x
Out[85]:
array([-0.64050216, 0.4058439 , 0.53655964, -0.76862822, -0.16882124,
0.52559669, 0.38989637, -0.43821311])
In [86]: y
Out[86]:
array([-0.18182022, 1.74568738, 0.70178628, -1.01851544, 0.73568589,
-0.2059226 , -0.16270816, 1.057713 ])
In [87]: np.maximum(x,y) # element-wise maximum of the two arrays
Out[87]:
array([-0.18182022, 1.74568738, 0.70178628, -0.76862822, 0.73568589,
0.52559669, 0.38989637, 1.057713 ])
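Universal functions come in unary and binary forms, and a few return more than one array. The following sketch shows one of each on a small hand-picked input so the results can be checked by eye.

```python
import numpy as np

values = np.array([1.5, -2.25, 3.0])

# Unary ufunc: one input array, one output array
absolute = np.abs(values)

# Binary ufunc: the scalar 0.0 is broadcast against every element
clipped = np.maximum(values, 0.0)

# np.modf returns two arrays: the fractional and the integral parts,
# each carrying the sign of the input
frac, whole = np.modf(values)

print(absolute)
print(clipped)
print(frac, whole)
```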
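Data processing using arrays
In [93]: points = np.arange(-5,5,0.01) # points from -5 up to, but not including, 5, spaced 0.01 apart
In [94]: xs,ys = np.meshgrid(points,points) # meshgrid expands two 1-D arrays into a pair of 2-D coordinate arrays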
In [95]: xs
Out[95]:
array([[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
...,
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99],
[-5. , -4.99, -4.98, ..., 4.97, 4.98, 4.99]])
In [96]: import matplotlib.pyplot as plt
In [97]: z = np.sqrt(xs ** 2 + ys **2)
In [98]: z
Out[98]:
array([[ 7.07106781, 7.06400028, 7.05693985, ..., 7.04988652,
7.05693985, 7.06400028],
[ 7.06400028, 7.05692568, 7.04985815, ..., 7.04279774,
7.04985815, 7.05692568],
[ 7.05693985, 7.04985815, 7.04278354, ..., 7.03571603,
7.04278354, 7.04985815],
...,
[ 7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 ,
7.03571603, 7.04279774],
[ 7.05693985, 7.04985815, 7.04278354, ..., 7.03571603,
7.04278354, 7.04985815],
[ 7.06400028, 7.05692568, 7.04985815, ..., 7.04279774,
7.04985815, 7.05692568]])
In [99]: plt.imshow(z,cmap=plt.cm.gray); plt.colorbar()
Out[99]: <matplotlib.colorbar.Colorbar at 0x28d7dab65c0>
In [101]: plt.title(r'Image plot of $\sqrt{x^2 + y^2}$ for a grid of values') # raw string so that \s is not treated as an escape sequence
Out[101]: <matplotlib.text.Text at 0x28d7d26eeb8>
In [102]: plt.show()
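With 1000 points per axis the structure of xs and ys is hard to see, so here is the same computation on a three-point grid. This is a reduced sketch of the session above, not a different method.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# xs repeats points along each row; ys repeats points down each column.
# Together, (xs[i, j], ys[i, j]) enumerates every (x, y) pair on the grid.
print(xs)
print(ys)

# Distance of every grid point from the origin, computed without loops
z = np.sqrt(xs ** 2 + ys ** 2)
print(z)
```

The center entry of z is 0 (the origin itself) and the four corners are the square root of 2.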
Expressing conditional logic as array operations
In [103]: xarr = np.array([1.1,1.2,1.3,1.4,1.5])
In [104]: yarr = np.array([2.1,2.2,2.3,2.4,2.5])
In [105]: cond = np.array([True,False,True,True,False])
np.where takes the value from xarr wherever cond is True and from yarr wherever it is False.
In [106]: result = np.where(cond,xarr,yarr)
In [107]: result
Out[107]: array([ 1.1, 2.2, 1.3, 1.4, 2.5])
The second and third arguments to np.where need not be arrays; either can be a scalar.
In [108]: arr = np.random.randn(4,4)
In [109]: arr
Out[109]:
array([[-0.43521077, 1.41782551, -0.97362101, 1.08447685],
[ 2.68892549, -1.30362208, -1.08288557, 0.35985212],
[ 1.10480412, -0.80542523, 0.48892358, -1.07925725],
[-1.34552789, 1.132726 , -2.3198594 , -0.51442034]])
In [110]: np.where(arr>0,2,-2)
Out[110]:
array([[-2, 2, -2, 2],
[ 2, -2, -2, 2],
[ 2, -2, 2, -2],
[-2, 2, -2, -2]])
In [111]: np.where(arr>0,2,arr)
Out[111]:
array([[-0.43521077, 2. , -0.97362101, 2. ],
[ 2. , -1.30362208, -1.08288557, 2. ],
[ 2. , -0.80542523, 2. , -1.07925725],
[-1.34552789, 2. , -2.3198594 , -0.51442034]])
np.where calls can also be nested to express more complex logic.
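A nested call behaves like an if / elif / else chain evaluated for every element. A minimal sketch, with thresholds and category codes made up for illustration:

```python
import numpy as np

arr = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])

# Equivalent per-element logic:
#   if x > 1:   2
#   elif x > 0: 1
#   else:       0
codes = np.where(arr > 1, 2, np.where(arr > 0, 1, 0))

print(codes)
```

Note that both branches are computed for the whole array before np.where chooses between them, so this is a selection, not short-circuit evaluation.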
Methods for boolean arrays
In [112]: arr = np.random.randn(100)
In [113]: (arr > 0).sum() # number of elements greater than 0
Out[113]: 54
In [114]: bools = np.array([False,False,True,False])
In [115]: bools.any() # True if one or more values are True
Out[115]: True
In [116]: bools.all() # True only if every value is True
Out[116]: False
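The same methods accept an axis argument, which is handy for per-row or per-column checks on a 2-D mask. The mask below is a small hand-written example.

```python
import numpy as np

mask = np.array([[True, False, True],
                 [False, False, False]])

# True counts as 1 and False as 0 when summed
total_true = mask.sum()         # over the whole array
true_per_row = mask.sum(axis=1)

# Reduce along an axis instead of over the whole array
any_per_row = mask.any(axis=1)
all_per_col = mask.all(axis=0)

print(total_true, true_per_row)
print(any_per_row, all_per_col)
```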
Sorting
In [118]: arr = np.random.randn(8)
In [119]: arr
Out[119]:
array([-0.50050381, -0.47721016, -0.30869937, -1.43030168, 0.00459887,
-1.65491773, 1.22161368, -0.77993317])
In [120]: arr.sort() # sorts in place, in ascending order
In [121]: arr
Out[121]:
array([-1.65491773, -1.43030168, -0.77993317, -0.50050381, -0.47721016,
-0.30869937, 0.00459887, 1.22161368])
In [122]: arr = np.random.randn(5,3)
In [123]: arr
Out[123]:
array([[ 2.57065162, -0.96012742, -1.11512802],
[-0.89160886, 0.06382505, -0.8871275 ],
[-0.71819144, -1.57579496, -0.27975377],
[ 0.09348711, -0.01115059, 0.18504493],
[-0.75319897, 0.46313174, -2.02176903]])
In [125]: arr.sort(1) # sort along the given axis (here, within each row)
In [126]: arr
Out[126]:
array([[-1.11512802, -0.96012742, 2.57065162],
[-0.89160886, -0.8871275 , 0.06382505],
[-1.57579496, -0.71819144, -0.27975377],
[-0.01115059, 0.09348711, 0.18504493],
[-2.02176903, -0.75319897, 0.46313174]])
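The sort method changes the array in place. When the original order must be kept, the top-level np.sort function returns a sorted copy instead. A short sketch of the difference, using a small integer array for readability:

```python
import numpy as np

values = np.array([3, 1, 2])

# np.sort returns a new sorted array and leaves the input alone
sorted_copy = np.sort(values)
unchanged = values.copy()  # snapshot to show values was not modified

# The method sorts in place and returns None
result = values.sort()

# There is no descending flag; reversing a sorted array is a common idiom
descending = np.sort(values)[::-1]

print(sorted_copy, unchanged, result, values, descending)
```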
Unique and other set logic
In [127]: ints = np.array([3,3,3,3,2,2,1,1,4])
In [128]: np.unique(ints) # the unique elements, returned in sorted order
Out[128]: array([1, 2, 3, 4])
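For the other set operations mentioned in the heading, NumPy provides a handful of functions for 1-D arrays. The sketch below shows a few of them; the second array b is invented for illustration.

```python
import numpy as np

ints = np.array([3, 3, 3, 3, 2, 2, 1, 1, 4])
b = np.array([2, 4, 6])

# Unique values together with how often each occurs
uniq, counts = np.unique(ints, return_counts=True)

common = np.intersect1d(ints, b)  # values present in both, sorted
either = np.union1d(ints, b)      # values present in either, sorted
only_a = np.setdiff1d(ints, b)    # values in ints that are not in b

# Element-wise membership test: one boolean per element of ints
member = np.isin(ints, b)

print(uniq, counts)
print(common, either, only_a)
print(member)
```

The first three functions return sorted arrays of unique values, while np.isin keeps the shape and order of its first argument.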