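This post contains my personal study notes on the Morvan Python (莫烦 python) tutorial "Numpy & Pandas" for data processing!
For an introduction to pandas, see 【python】pandas
Last revised: 2020-10-19
Preface
Why Numpy & Pandas?
- Fast: the performance-critical parts of numpy and pandas are implemented in C, and pandas is built on top of numpy, extending it with labeled data structures.
- Low resource usage: both operate on whole arrays (matrix operations), which is much faster than looping over Python's built-in dicts or lists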
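To make the speed claim concrete, here is a minimal sketch that computes the same total with a pure-Python loop and with a vectorized numpy expression. The array size `n` and the variable names are my own choices for illustration, and the measured times will vary from machine to machine.

```python
import time
import numpy as np

n = 200_000  # illustrative size, not from the tutorial
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

t0 = time.perf_counter()
list_total = sum(v * 2 for v in py_list)   # element-by-element Python loop
t1 = time.perf_counter()
arr_total = int((np_arr * 2).sum())        # one vectorized expression
t2 = time.perf_counter()

print("list :", list_total, f"{t1 - t0:.4f}s")
print("numpy:", arr_total, f"{t2 - t1:.4f}s")
```

Both approaches give the same result; on typical hardware the numpy version tends to be noticeably faster.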
1 Attributes: ndim / shape / size
I rarely use ndim and size, so this was new to me
import numpy as np
a = np.array([[1,2,3],
[2,3,4]])
print(a)
print('number of dimension:',a.ndim) # number of dimensions (axes) of the array
print('shape:',a.shape)
print('size:',a.size)
output
[[1 2 3]
[2 3 4]]
number of dimension: 2
shape: (2, 3)
size: 6
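2 Creating arrays
- array: creates an array; dtype specifies the element type
import numpy as np
a = np.array([2,3,4],dtype = int) # np.int was removed in NumPy 1.24; the builtin int gives the platform default integer
print(a.dtype)
b = np.array([2,3,4],dtype = np.int64)
print(b.dtype)
c = np.array([2,3,4],dtype = float) # np.float was removed in NumPy 1.24; the builtin float gives float64
print(c.dtype)
d = np.array([2,3,4],dtype = np.float32)
print(d.dtype)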
output (the first dtype depends on the platform: int32 here, int64 on most Linux and macOS installs)
int32
int64
float64
float32
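The aliases `np.int` and `np.float` were deprecated in NumPy 1.20 and removed in 1.24, so on a recent install it is safer to pass the builtin types or an explicitly sized dtype. The following is a small sketch of that; the variable names are only illustrative.

```python
import numpy as np

a = np.array([2, 3, 4], dtype=int)         # platform default integer
b = np.array([2, 3, 4], dtype=np.int32)    # explicitly 32-bit
c = np.array([2, 3, 4], dtype=float)       # float64
d = a.astype(np.float32)                   # convert an existing array

for name, arr in (("a", a), ("b", b), ("c", c), ("d", d)):
    # itemsize is the number of bytes used by one element
    print(name, arr.dtype, arr.itemsize)
```

Using a sized dtype such as `np.int32` or `np.int64` keeps the result identical across platforms.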
2.1 zeros / ones / empty / full
See numpy.full
- zeros:Return a new array of given shape and type, filled with zeros
- ones:Return a new array of given shape and type, filled with ones
import numpy as np
a = np.array([[1,2,3],
[4,5,6]])
print(a,'\n')
b = np.zeros((3,4))
print(b,'\n')
c = np.ones((3,4),dtype=np.int16)
print(c,'\n')
output
[[1 2 3]
[4 5 6]]
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
- empty:Return a new array of given shape and type, without initializing entries
d = np.empty((3,4))
print(d,'\n')
output
[[6.9351693e-310 2.1158484e-316 0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000 0.0000000e+000 0.0000000e+000]
[0.0000000e+000 0.0000000e+000 0.0000000e+000 0.0000000e+000]]
- full:Return a new array of given shape and type, filled with fill_value.
e = np.full((3,4),5)
print(e,'\n')
f = np.full((3,4),(1,2,3,4))
print(f)
output
[[5 5 5 5]
[5 5 5 5]
[5 5 5 5]]
[[1 2 3 4]
[1 2 3 4]
[1 2 3 4]]
2.2 zeros_like / ones_like / empty_like / full_like
import numpy as np
x = np.arange(6)
x = x.reshape((2, 3))
print(x, "\n")
print(np.zeros_like(x), "\n")
print(np.ones_like(x))
output
[[0 1 2]
[3 4 5]]
[[0 0 0]
[0 0 0]]
[[1 1 1]
[1 1 1]]
import numpy as np
x = np.arange(6)
x = x.reshape((2, 3))
print(x, "\n")
print(np.empty_like(x), "\n")
print(np.full_like(x, 1, dtype=np.double))
output
[[0 1 2]
[3 4 5]]
[[140240500624592 33819824 140240520185592]
[140240520222512 140240441229232 140240520473936]]
[[1. 1. 1.]
[1. 1. 1.]]
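2.3 arange / linspace
- arange: creates values over a given range with a fixed step (the stop value is excluded)
- linspace: creates a given number of evenly spaced values over an interval (both endpoints included by default)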
import numpy as np
a = np.arange(12).reshape(3,4) # numpy.ndarray
print(a,'\n')
b = np.linspace(1,10,20).reshape(4,5)
print(b)
output
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[ 1. 1.47368421 1.94736842 2.42105263 2.89473684]
[ 3.36842105 3.84210526 4.31578947 4.78947368 5.26315789]
[ 5.73684211 6.21052632 6.68421053 7.15789474 7.63157895]
[ 8.10526316 8.57894737 9.05263158 9.52631579 10. ]]
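The main practical difference between the two is how the end of the range is handled. A minimal sketch, with values chosen by me for illustration:

```python
import numpy as np

# arange takes a step and excludes the stop value
a = np.arange(0, 1, 0.25)
# linspace takes a count and includes the stop value by default
b = np.linspace(0, 1, 5)
# endpoint=False makes linspace exclude the stop value as well
c = np.linspace(0, 1, 4, endpoint=False)

print(a)
print(b)
print(c)
```

With a non-integer step, linspace is usually the safer choice because the number of elements is stated explicitly.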
3 基本运算
+ / ** / sin / < / ==
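When you need math functions, don't reach for import math by reflex.
numpy's own functions, such as np.sin, work well and apply element-wise.
Note that <, > and == can be applied directly to arrays and compare them element by element.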
import numpy as np
x = np.array([10,20,30,40])
y = np.arange(4)
print(x,y,'\n')
c = x+y
print(c,'\n')
d = x**2
print(d,'\n')
e = 10*np.sin(x) # np.cos works the same way
print(e,'\n')
print(y<3,'\n')
print(y==3)
output
[10 20 30 40] [0 1 2 3]
[10 21 32 43]
[ 100 400 900 1600]
[-5.44021111 9.12945251 -9.88031624 7.4511316 ]
[ True True True False]
[False False False True]
* / dot / transpose / swapaxes / T / broadcast_to
np.dot(a,b)
is the form I see most often; a.dot(b)
was new to me
import numpy as np
x = np.array([[1,1],
[0,1]])
y = np.arange(4).reshape(2,2)
print(x,'\n')
print(y,'\n')
c = x*y # element-wise
print(c,'\n')
d1 = np.dot(x,y) # matrix product
print(d1,'\n')
d2 = x.dot(y) # matrix product
print(d2)
output
[[1 1]
[0 1]]
[[0 1]
[2 3]]
[[0 1]
[0 3]]
[[2 4]
[2 3]]
[[2 4]
[2 3]]
transpose / T
import numpy as np
A = np.arange(11,-1,-1).reshape((3,4))
print(A,'\n')
print(np.transpose(A),'\n')
print(A.T,'\n')
print((A.T).dot(A))
output
[[11 10 9 8]
[ 7 6 5 4]
[ 3 2 1 0]]
[[11 7 3]
[10 6 2]
[ 9 5 1]
[ 8 4 0]]
[[11 7 3]
[10 6 2]
[ 9 5 1]
[ 8 4 0]]
[[179 158 137 116]
[158 140 122 104]
[137 122 107 92]
[116 104 92 80]]
np.transpose
For more ways to use it, see 【python】axis 的形象化理解
np.swapaxes
swaps a pair of axes, while np.transpose
can rearrange the array into any given axis order, which is more flexible
import numpy as np
a = np.arange(1, 25, 1).reshape((2, 3, 4))
print(a, '\n')
print("transpose:\n", np.transpose(a, (0, 2, 1)), '\n') # swap rows and columns (the last two axes)
print("swapaxes:\n", np.swapaxes(a, axis1=1, axis2=2)) # swap rows and columns (the last two axes)
output
[[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
[[13 14 15 16]
[17 18 19 20]
[21 22 23 24]]]
transpose:
[[[ 1 5 9]
[ 2 6 10]
[ 3 7 11]
[ 4 8 12]]
[[13 17 21]
[14 18 22]
[15 19 23]
[16 20 24]]]
swapaxes:
[[[ 1 5 9]
[ 2 6 10]
[ 3 7 11]
[ 4 8 12]]
[[13 17 21]
[14 18 22]
[15 19 23]
[16 20 24]]]
The code above shows that, for swapping rows and columns, np.swapaxes(a, 1, 2)
and np.transpose(a, (0, 2, 1))
perform the same operation
numpy.broadcast_to(array, shape, subok=False)
subok : bool, optional
If True, then sub-classes will be passed-through, otherwise
the returned array will be forced to be a base-class array (default).
import numpy as np
a = np.arange(4).reshape(1,4)
print("ori:\n", a)
print("bro:\n", np.broadcast_to(a,(4,4)))
print("-"*30)
b = a.reshape(4,1)
print("ori:\n", b)
print("bro:\n", np.broadcast_to(b,(4,4)))
output
ori:
[[0 1 2 3]]
bro:
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
------------------------------
ori:
[[0]
[1]
[2]
[3]]
bro:
[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
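np.broadcast_to makes explicit what numpy does implicitly during arithmetic on arrays of different shapes. The sketch below, with names of my own choosing, compares the two and also shows that the broadcast result is a read-only view rather than a copy.

```python
import numpy as np

row = np.arange(4).reshape(1, 4)
col = np.arange(4).reshape(4, 1)

# Implicit broadcasting: (1, 4) + (4, 1) -> (4, 4)
implicit = row + col

# The same result with the broadcasting spelled out
explicit = np.broadcast_to(row, (4, 4)) + np.broadcast_to(col, (4, 4))

view = np.broadcast_to(row, (4, 4))
print(implicit)
print(view.flags.writeable)  # the broadcast view cannot be written to
```

If a writable array is needed, calling `.copy()` on the broadcast view is one option.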
sum / min / max / amax
An introduction to axis is available in a separate blog post.
import numpy as np
a = np.random.random((2,4))
print(a,'\n')
print(np.sum(a),'\n')
print(np.min(a),'\n')
print(np.max(a),'\n')
output
[[0.11554764 0.29960549 0.86234135 0.68197679]
[0.6658813 0.50246088 0.61024788 0.48163003]]
4.219691351931921
0.11554764097686954
0.8623413474608111
amax does the same thing as max
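These reductions also accept an axis argument. Here is a small sketch on a fixed array (my own example values, so the results are reproducible, unlike the random array above):

```python
import numpy as np

a = np.array([[1, 5, 3],
              [4, 2, 6]])

col_sum = np.sum(a, axis=0)   # reduce over rows: one value per column
row_min = np.min(a, axis=1)   # reduce over columns: one value per row
row_max = np.amax(a, axis=1)  # amax behaves like max

print(col_sum)  # [5 7 9]
print(row_min)  # [1 2]
print(row_max)  # [5 6]
```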
argmin / argmax / mean / average / median / argsort
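Note that median
returns the middle value: with an odd number of elements it is the single middle element of the sorted data, and with an even number it is the sum of the two middle elements divided by 2
mean
and average
differ as follows
- np.mean computes the arithmetic mean directly
- np.average computes a weighted average when weights are given (and the plain mean otherwise)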
https://numpy.org/doc/stable/reference/generated/numpy.average.html
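A short sketch of the difference, using made-up scores and weights:

```python
import numpy as np

scores = np.array([80.0, 90.0, 100.0])
weights = np.array([1, 1, 2])  # illustrative weights

plain = np.mean(scores)                          # (80 + 90 + 100) / 3
unweighted = np.average(scores)                  # same as mean without weights
weighted = np.average(scores, weights=weights)   # (80 + 90 + 2 * 100) / 4

print(plain, unweighted, weighted)  # 90.0 90.0 92.5
```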
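argmin()
and argmax()
return the index of the smallest and the largest element of the array, respectively (by default an index into the flattened array)
All of these functions also accept an axis argument.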
import numpy as np
A = np.arange(11,-1,-1).reshape((3,4))
print(A,'\n')
print(np.argmin(A))
print(A.argmin(),'\n')
print(np.argmax(A))
print(A.argmax(),'\n')
print(np.mean(A))
print(A.mean(),'\n')
print(np.average(A),'\n')
print(np.median(A),'\n') # median
output
[[11 10 9 8]
[ 7 6 5 4]
[ 3 2 1 0]]
11
11
0
0
5.5
5.5
5.5
5.5
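Since argmin and argmax return a flat index by default, it can help to convert it back to a (row, column) pair or to pass axis. The following sketch reuses the same A; using np.unravel_index here is my own addition, not part of the tutorial.

```python
import numpy as np

A = np.arange(11, -1, -1).reshape((3, 4))

flat_idx = np.argmin(A)                          # index into the flattened array
row, col = np.unravel_index(flat_idx, A.shape)   # back to 2-D coordinates

per_col = np.argmax(A, axis=0)  # row index of the maximum in each column
per_row = np.argmin(A, axis=1)  # column index of the minimum in each row

median_odd = np.median([1, 3, 2])      # middle element of the sorted data
median_even = np.median([1, 2, 3, 4])  # mean of the two middle elements

print(flat_idx, (row, col))
print(per_col, per_row)
print(median_odd, median_even)
```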
argsort
Returns the indices that would sort an array
import numpy as np
x = [9,7,5,3,1]
print(np.argsort(x))
print(x[np.argsort(x)[-1]])
output
[4 3 2 1 0]
9
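The index array returned by argsort is mostly useful for reordering, for example to sort in descending order or to pick the largest few elements. A minimal sketch on the same data, stored as an ndarray so it can be indexed with an array:

```python
import numpy as np

x = np.array([9, 7, 5, 3, 1])
order = np.argsort(x)          # indices that sort x in ascending order

ascending = x[order]           # [1 3 5 7 9]
descending = x[order[::-1]]    # reverse the index order for descending
top2 = x[order[-2:]]           # the two largest values, in ascending order

print(order)
print(ascending, descending, top2)
```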
cumsum / diff / nonzero / sort / clip
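- cumsum: each output element is the running total of the input from the first element up to that position (the array is flattened when no axis is given)
- diff: by default (axis = -1, the last axis) it computes the difference between each element and the one before it along every row
Note that both also accept an axis argument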
import numpy as np
A = np.arange(11,-1,-1).reshape((3,4))
print(A,'\n')
print(np.cumsum(A),'\n') # cumulative sum
print(np.diff(A),'\n') # difference between adjacent elements
output
[[11 10 9 8]
[ 7 6 5 4]
[ 3 2 1 0]]
[11 21 30 38 45 51 56 60 63 65 66 66]
[[-1 -1 -1]
[-1 -1 -1]
[-1 -1 -1]]
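As a sketch of the axis argument for these two functions, using the same A:

```python
import numpy as np

A = np.arange(11, -1, -1).reshape((3, 4))

cum_rows = np.cumsum(A, axis=1)   # running total along each row
cum_cols = np.cumsum(A, axis=0)   # running total down each column
diff_cols = np.diff(A, axis=0)    # row i+1 minus row i

print(cum_rows)
print(cum_cols)
print(diff_cols)
```

Without axis, cumsum works on the flattened array, while diff works along the last axis.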
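- nonzero: returns the indices of the non-zero elements (see the post "python numpy中nonzero()的用法")
- sort: sorts the array (along the last axis by default)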
import numpy as np
A = np.arange(11,-1,-1).reshape((3,4))
print(A)
print(np.nonzero(A)) # indices of the non-zero elements of A
print(np.sort(A))
output
[[11 10 9 8]
[ 7 6 5 4]
[ 3 2 1 0]]
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2], dtype=int64), array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2], dtype=int64))
[[ 8 9 10 11]
[ 4 5 6 7]
[ 0 1 2 3]]
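For a 2-D array, nonzero returns a tuple of length 2: the first array holds the row indices and the second the column indices, paired position by position (the row and column of each non-zero element).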
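The index tuple can be used directly to pull out the non-zero values, and np.argwhere gives the same positions as (row, column) pairs. A small sketch with an array of my own choosing:

```python
import numpy as np

A = np.array([[3, 0, 1],
              [0, 0, 2]])

rows, cols = np.nonzero(A)   # one index array per dimension
values = A[rows, cols]       # the non-zero values themselves
pairs = np.argwhere(A)       # the same positions, one (row, col) pair per line

print(rows, cols)   # [0 0 1] [0 2 2]
print(values)       # [3 1 2]
print(pairs)
```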
- clip: limits the values to a given range
import numpy as np
A = np.arange(11,-1,-1).reshape((3,4))
print(A,'\n')
print(np.clip(A,5,9)) # values above 9 become 9, values below 5 become 5
output
[[11 10 9 8]
[ 7 6 5 4]
[ 3 2 1 0]]
[[9 9 9 8]
[7 6 5 5]
[5 5 5 5]]
intersect1d / union1d
Intersection
import numpy as np
list1 = [1,2,3,4,5]
list2 = [2,3,4,5,8]
np.intersect1d(list1,list2)
output
array([2, 3, 4, 5])
Union
list1 = [1,2,3]
list2 = [2,3,4]
np.union1d(list1,list2)
output
array([1, 2, 3, 4])
np.greater
See https://numpy.org/doc/stable/reference/generated/numpy.greater.html
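np.greater is the function form of the > operator and compares element by element. A minimal sketch with illustrative values:

```python
import numpy as np

a = np.array([4, 2, 6])
b = np.array([2, 2, 8])

g = np.greater(a, b)         # element-wise a > b
same = a > b                 # the operator gives the same result
vs_scalar = np.greater(a, 3) # a scalar is broadcast against the array

print(g)          # [ True False False]
print(vs_scalar)  # [ True False  True]
```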
np.sign
Implements the sign function: returns -1, 0 or 1 depending on the sign of each element
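A quick sketch of np.sign on a few example values:

```python
import numpy as np

x = np.array([-2.5, 0.0, 3.0])
s = np.sign(x)                 # -1 for negative, 0 for zero, 1 for positive
rebuilt = s * np.abs(x)        # sign times magnitude gives back x

print(s)        # [-1.  0.  1.]
print(rebuilt)  # [-2.5  0.   3. ]
```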
4 Numpy indexing
import numpy as np
A = np.arange(3,15)
print(A)
print(A[3],'\n')
B = np.arange(3,15).reshape(3,4)
print(B,'\n')
print(B[1]) # row 1 (rows are numbered 0, 1, 2)
print(B[1,:],'\n') # row 1, written another way
print(B[1,1]) # row 1, column 1
print(B[1][1],'\n') # row 1, column 1, written another way
print(B[:,0]) # column 0
print(B[0,1:3],'\n') # row 0, columns 1 and 2
for row in B: # print every row
    print(row)
print('\n')
for column in B.T: # print every column
    print(column)
print('\n')
print(A.flatten())
for item in A.flatten(): # print every element
    print(item,end = ' ')
output
[ 3 4 5 6 7 8 9 10 11 12 13 14]
6
[[ 3 4 5 6]
[ 7 8 9 10]
[11 12 13 14]]
[ 7 8 9 10]
[ 7 8 9 10]
8
8
[ 3 7 11]
[4 5]
[3 4 5 6]
[ 7 8 9 10]
[11 12 13 14]
[ 3 7 11]
[ 4 8 12]
[ 5 9 13]
[ 6 10 14]
[ 3 4 5 6 7 8 9 10 11 12 13 14]
3 4 5 6 7 8 9 10 11 12 13 14
Also note the ... (Ellipsis) notation below,
a shorthand that stands for as many full slices (:) as needed
import numpy as np
a = np.arange(12).reshape(3, 4)
print(a, "\n")
print(a[1, ...], "\n")
print(a[...,1])
output
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[4 5 6 7]
[1 5 9]
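The shorthand pays off most with more than two dimensions. The following sketch on a 3-D array (my own example) shows which explicit slices each form stands for:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

last = t[..., 0]     # same as t[:, :, 0]
first = t[0, ...]    # same as t[0, :, :]
mixed = t[1, ..., 2] # same as t[1, :, 2]

print(last.shape, first.shape, mixed.shape)  # (2, 3) (3, 4) (3,)
```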
5 array
5.1 Merging: vstack / hstack / concatenate
import numpy as np
A = np.array([1,1,1])
B = np.array([2,2,2])
C = np.vstack((A,B)) # vertical stack
print(C,'\n')
print(A.shape,C.shape,'\n')
D = np.hstack((A,B)) # horizontal stack
print(D,'\n')
print(A[np.newaxis,:],'\n') # add a new axis in front: shape (1, 3)
print(A[:,np.newaxis]) # add a new axis at the end: shape (3, 1)
output
[[1 1 1]
[2 2 2]]
(3,) (2, 3)
[1 1 1 2 2 2]
[[1 1 1]]
[[1]
[1]
[1]]
A = np.array([1,1,1])[:,np.newaxis]
B = np.array([2,2,2])[:,np.newaxis]
C = np.vstack((A,B))
print(C,'\n')
D = np.hstack((A,B)) # horizontal stack
print(D)
output
[[1]
[1]
[1]
[2]
[2]
[2]]
[[1 2]
[1 2]
[1 2]]
The more flexible concatenate
A = np.array([1,1,1])[:,np.newaxis]
B = np.array([2,2,2])[:,np.newaxis]
C = np.concatenate((A,B,A),axis = 0)
print(C,'\n')
D = np.concatenate((A,B,A),axis = 1)
print(D)
output
[[1]
[1]
[1]
[2]
[2]
[2]
[1]
[1]
[1]]
[[1 2 1]
[1 2 1]
[1 2 1]]
5.2 Splitting: vsplit / hsplit / split / array_split
import numpy as np
A = np.arange(12).reshape((3,4))
print(A,'\n')
# equal split
print(np.split(A,2,axis = 1),'\n') # 4 columns into 2 blocks
print(np.split(A,3,axis = 0),'\n') # 3 rows into 3 blocks
# unequal split
print(np.array_split(A,3,axis = 1),'\n') # 4 columns into blocks of 2, 1, 1
# vertical and horizontal split
print(np.vsplit(A,3),'\n')
print(np.hsplit(A,2))
output
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2, 3],
[ 6, 7],
[10, 11]])]
[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2],
[ 6],
[10]]), array([[ 3],
[ 7],
[11]])]
[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8, 9, 10, 11]])]
[array([[0, 1],
[4, 5],
[8, 9]]), array([[ 2, 3],
[ 6, 7],
[10, 11]])]
array_split allows indices_or_sections to be an integer that does not equally divide the axis.
x = np.arange(8.0)
np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
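5.3 Difference between np.array() and np.asarray()
When the input x is already an array:
np.asarray(x) returns x itself without copying (as long as no dtype conversion is needed), so when x changes, np.asarray(x)
changes too (no new memory is allocated)
np.array(x) copies x by default, so when x changes, np.array(x)
does not change (new memory is allocated)
When the input x is a list, the two behave the same: both build a new array
Examples
1) Input is a list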
import numpy as np
x = [[1,2],
[3,4],
[5,6]]
y = np.array(x)
z = np.asarray(x)
x[2][1] = 0
print(x)
print(y)
print(z)
output
[[1, 2], [3, 4], [5, 0]]
[[1 2]
[3 4]
[5 6]]
[[1 2]
[3 4]
[5 6]]
2) Input is an array
x = np.array([[1,2],
[3,4],
[5,6]])
y = np.array(x)
z = np.asarray(x)
x[2][1] = 0
print(x)
print(y)
print(z)
output
[[1 2]
[3 4]
[5 0]]
[[1 2]
[3 4]
[5 6]]
[[1 2]
[3 4]
[5 0]]
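Instead of modifying x and inspecting the results, the sharing can also be checked directly. This sketch uses `is` and np.shares_memory, which are my additions rather than part of the tutorial:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

copied = np.array(x)      # copies by default
aliased = np.asarray(x)   # returns the same object when no conversion is needed
converted = np.asarray(x, dtype=np.float64)  # a dtype change forces a new array

print(aliased is x)                       # True
print(np.shares_memory(x, copied))        # False
print(np.shares_memory(x, converted))     # False
```

So np.asarray avoids a copy only when the input is already an array of the requested dtype.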
5.4 Expanding dimensions
import numpy as np
a = np.array(range(12)).reshape(3, 4)
b = np.expand_dims(a,0).repeat(2,axis=0)
print(b.shape)
output
(2, 3, 4)
expand_dims
adds a new axis; here it is inserted at position 0 of a.
repeat
takes the number of repetitions, and axis selects the axis along which to repeat.
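To see the two steps separately, here is a minimal sketch that prints the intermediate shapes and compares expand_dims with np.newaxis indexing (variable names are illustrative):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

e1 = np.expand_dims(a, 0)       # new axis in front: (1, 3, 4)
e2 = a[np.newaxis, ...]         # the same thing written with indexing
e_last = np.expand_dims(a, -1)  # new axis at the end: (3, 4, 1)
b = e1.repeat(2, axis=0)        # repeat along the new axis: (2, 3, 4)

print(e1.shape, e2.shape, e_last.shape, b.shape)
```

Each slice b[0] and b[1] is a copy of the original a.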
5.5 stack
numpy.stack(arrays, axis=0)
joins a sequence of arrays along a new axis
import numpy as np
a = np.array([[1,2],[3,4]])
print("a:\n", a,"\n")
b = np.array([[5,6],[7,8]])
print("b:\n",b,"\n")
print("stack 0:\n",np.stack((a,b),0),"\n")
print("stack 1:\n",np.stack((a,b),1),"\n")
print("stack 2:\n",np.stack((a,b),2),"\n")
output
a:
[[1 2]
[3 4]]
b:
[[5 6]
[7 8]]
stack 0:
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
stack 1:
[[[1 2]
[5 6]]
[[3 4]
[7 8]]]
stack 2:
[[[1 5]
[2 6]]
[[3 7]
[4 8]]]
Here is another example
import numpy as np
from pprint import pprint
arry0 = [np.arange(0, 8).reshape(2, 4) for i in range(3)]
arry1 = np.stack(arry0, axis=0)
arry2 = np.stack(arry0, axis=1)
arry3 = np.stack(arry0, axis=2)
print(np.shape(arry0))
pprint(arry0)
print(np.shape(arry1))
pprint(arry1)
print(np.shape(arry2))
pprint(arry2)
print(np.shape(arry3))
pprint(arry3)
output
(3, 2, 4)
[array([[0, 1, 2, 3],
[4, 5, 6, 7]]),
array([[0, 1, 2, 3],
[4, 5, 6, 7]]),
array([[0, 1, 2, 3],
[4, 5, 6, 7]])]
(3, 2, 4)
array([[[0, 1, 2, 3],
[4, 5, 6, 7]],
[[0, 1, 2, 3],
[4, 5, 6, 7]],
[[0, 1, 2, 3],
[4, 5, 6, 7]]])
(2, 3, 4)
array([[[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]],
[[4, 5, 6, 7],
[4, 5, 6, 7],
[4, 5, 6, 7]]])
(2, 4, 3)
array([[[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3]],
[[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7]]])
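With axis=1, each array is cut into rows and the corresponding rows are stacked together; the new axis is inserted at position 1 of the original dimensions.
With axis=2, each array is cut into columns and the corresponding elements are stacked together; the new axis is inserted at position 2 of the original dimensions.
With axis=0, nothing is cut and the whole arrays are stacked together; the new axis is the outermost one, position 0.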
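One way to read the axis argument is that stack first gives every input a new axis at that position and then concatenates along it. The sketch below checks this reading with np.expand_dims and np.concatenate; the input arrays are my own, made distinct so the comparison is meaningful.

```python
import numpy as np

# Three (2, 4) arrays with different values
arrays = [np.arange(8).reshape(2, 4) + 10 * i for i in range(3)]

shapes = []
for axis in (0, 1, 2):
    stacked = np.stack(arrays, axis=axis)
    # Same result: add the new axis to each input, then concatenate along it
    manual = np.concatenate([np.expand_dims(arr, axis) for arr in arrays], axis=axis)
    assert np.array_equal(stacked, manual)
    shapes.append(stacked.shape)

print(shapes)  # [(3, 2, 4), (2, 3, 4), (2, 4, 3)]
```

The length of the new axis always equals the number of stacked arrays, here 3.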
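5.6 resize
numpy.resize(a, new_shape)
returns a new array with the given shape. If the new size is larger than the original, the result is filled with repeated copies of the original elements; if it is smaller, the trailing elements are dropped.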
import numpy as np
a = np.array([[1,2,3],[4,5,6]])
print("shape a:", a.shape)
print(a,"\n")
b = np.resize(a, (3,2))
print("shape b:", b.shape)
print(b,"\n")
print('Resize b to 3x3:')
b = np.resize(a,(3,3))
print(b,"\n")
print('Resize b to 2x2:')
b = np.resize(a,(2,2))
print(b,"\n")
output
shape a: (2, 3)
[[1 2 3]
[4 5 6]]
shape b: (3, 2)
[[1 2]
[3 4]
[5 6]]
Resize b to 3x3:
[[1 2 3]
[4 5 6]
[1 2 3]]
Resize b to 2x2:
[[1 2]
[3 4]]
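5.7 rollaxis
numpy.rollaxis(a, axis, start=0)
• a: input array
• axis: the axis to roll backwards; the relative order of the other axes does not change
• start: defaults to 0, which means a complete roll; the axis is rolled until it lies before this position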
import numpy as np
a = np.arange(8).reshape(2,2,2)
print(a)
print(np.rollaxis(a,2)) # axis 2 rolls to position 0; axes 0 and 1 keep their relative order, [0,1,2]->[2,0,1]
print(np.rollaxis(a,2,1)) # axis 2 rolls to position 1; axes 0 and 1 keep their relative order, [0,1,2]->[0,2,1]
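With a 2x2x2 array every axis has the same length, so the effect is hard to see from the shape. The sketch below uses a (2, 3, 4) array instead and compares rollaxis with np.moveaxis and np.transpose; the comparison is my own addition.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

r0 = np.rollaxis(a, 2)      # axis 2 moves to the front: [0,1,2] -> [2,0,1]
r1 = np.rollaxis(a, 2, 1)   # axis 2 moves to position 1: [0,1,2] -> [0,2,1]

print(r0.shape)  # (4, 2, 3)
print(r1.shape)  # (2, 4, 3)

# The same results written with moveaxis and transpose
m0 = np.moveaxis(a, 2, 0)
t1 = np.transpose(a, (0, 2, 1))
```

np.moveaxis states the source and destination positions directly, which many people find easier to read than the start argument of rollaxis.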