NumPy Study Notes

Basic operations on multidimensional arrays

NumPy is a foundational library for scientific computing.

Creating an ndarray

In [6]: import numpy as np
# Initialize a list
In [7]: list_data=[3,6,6.7,9,1,0]
# Build an array from the list with np.array
In [8]: array1 = np.array(list_data)

In [9]: array1
Out[9]: array([3. , 6. , 6.7, 9. , 1. , 0. ])

In [10]: list_data=[[1,2,3,4],[5,6,7,8]]

In [11]: array2 = np.array(list_data)

In [12]: array2
Out[12]:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
       

Inspecting array attributes

# Number of dimensions
In [13]: array1.ndim
Out[13]: 1

In [14]: array2.ndim
Out[14]: 2
# Shape
In [15]: array2.shape
Out[15]: (2, 4)

In [16]: array1.shape
Out[16]: (6,)
# Data type
In [17]: array1.dtype
Out[17]: dtype('float64')

In [18]: array2.dtype
Out[18]: dtype('int32')

Basic array creation functions

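A few commonly used creation functions can be sketched as follows. The selection and the values are illustrative assumptions, not a complete list.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones(4, dtype=np.int32)   # four 1s stored as int32
filled = np.full((2, 2), 7)         # 2x2 array filled with 7
identity = np.eye(3)                # 3x3 identity matrix
steps = np.arange(0, 10, 2)         # like range(): 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced points, endpoints included
like = np.zeros_like(steps)         # zeros with the shape and dtype of steps

print(zeros.shape, ones.dtype, steps, grid)
```

np.empty() allocates without initializing, so its contents are arbitrary until they are assigned.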

Specifying a data type

The astype() method converts an array to another data type and returns a new array.

In [19]: arr1 = np.array([1,2,3], dtype=np.float64)

In [20]: arr1.dtype
Out[20]: dtype('float64')
# Or use a type code: int_arr1 = arr1.astype('i4')
In [21]: int_arr1 = arr1.astype(np.int32)
In [22]: int_arr1.dtype
Out[22]: dtype('int32')


Array arithmetic

Unlike ordinary scalar arithmetic, arithmetic operators on arrays are applied to every element.

In [29]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [30]: arr*arr
Out[30]:
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [31]: arr-arr
Out[31]:
array([[0., 0., 0.],
       [0., 0., 0.]])

Array indexing and slicing are similar to those of lists

In [33]: arr3x3=np.array([[1, 2, 3], [4, 5, 6],[7, 8, 9]])

In [34]: arr3x3[:2,1:]
Out[34]:
array([[2, 3],
       [5, 6]])
# First two columns of the second row: [which row, which columns]
In [35]: arr3x3[1,:2]
Out[35]: array([4, 5])
# First two rows of the third column: [which rows, which column]
In [36]: arr3x3[:2,2]
Out[36]: array([3, 6])
# Slicing that keeps both dimensions: [every row, first column]
In [40]: arr3x3[:,:1]
Out[40]:
array([[1],
       [4],
       [7]])

A value can be assigned to a sliced region

In [44]: arr3x3[:,2:] = 0

In [45]: arr3x3
Out[45]:
array([[1, 2, 0],
       [4, 5, 0],
       [7, 8, 0]])

Indexing with boolean arrays

In [49]: names = np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])

In [50]: data = np.random.randn(7,4)

In [51]: names
Out[51]: array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [52]: data
Out[52]:
array([[-0.04231319, -0.52927321,  1.01094362,  0.28275029],
       [-0.5517945 ,  0.15585963, -0.95589379, -1.94704089],
       [ 0.271432  ,  0.14834617, -0.41995274,  0.69163998],
       [ 0.53585224,  0.45810894, -0.1749757 ,  0.82619417],
       [-0.03514554, -0.34414769,  1.14273879,  0.85434894],
       [ 1.57857093,  0.84154066, -0.18796459,  0.6071804 ],
       [ 0.95654895, -0.12625911, -2.09689247,  0.30145266]])

In [54]: names=='Bob'
Out[54]: array([ True, False, False,  True, False, False, False])

In [55]: data[names=='Bob']
Out[55]:
array([[-0.04231319, -0.52927321,  1.01094362,  0.28275029],
       [ 0.53585224,  0.45810894, -0.1749757 ,  0.82619417]])

# First three columns of the rows where names == 'Bob'
In [56]: data[names=='Bob',:3]
Out[56]:
array([[-0.04231319, -0.52927321,  1.01094362],
       [ 0.53585224,  0.45810894, -0.1749757 ]])

Advanced (fancy) indexing

In [65]: arr = np.empty((8,4))

In [67]: for i in range(8):
    ...:     arr[i] = i
    
In [68]: arr
Out[68]:
array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])
# Select the given rows
In [71]: arr[[4,2,5,-7]]
Out[71]:
array([[4., 4., 4., 4.],
       [2., 2., 2., 2.],
       [5., 5., 5., 5.],
       [1., 1., 1., 1.]])
# reshape sets the shape of the array
In [77]: arr = np.arange(32).reshape((8,4))

In [78]: arr
Out[78]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
# Select the values at the given coordinates: (1,3), (5,2), (4,1), (3,0)
In [79]: arr[[1,5,4,3],[3,2,1,0]]
Out[79]: array([ 7, 22, 17, 12])

# Select the given rows, then the given columns, e.g. columns 4, 3, 2, 1 of row 2
In [80]: arr[[1,5,4,3]][:,[3,2,1,0]]
Out[80]:
array([[ 7,  6,  5,  4],
       [23, 22, 21, 20],
       [19, 18, 17, 16],
       [15, 14, 13, 12]])

Transposing arrays and swapping axes

Transposition and matrix operations

numpy.linalg provides many functions for matrix operations.

# Transpose
In [82]: arr.T
Out[82]:
array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])

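# dot() computes the matrix product; if both arguments are one-dimensional it computes the inner product, i.e. the sum of the elementwise products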
In [83]: np.dot(arr.T,arr)
Out[83]:
array([[2240, 2352, 2464, 2576],
       [2352, 2472, 2592, 2712],
       [2464, 2592, 2720, 2848],
       [2576, 2712, 2848, 2984]])
       
# inv has to be imported first: from numpy.linalg import inv
In [60]: arr = np.random.randn(4,4)

In [61]: mat = arr.T.dot(arr)

In [62]: mat.dot(inv(mat))
Out[62]:
array([[ 1.00000000e+00, -1.77635684e-15, -4.44089210e-16,
        -2.22044605e-16],
       [ 0.00000000e+00,  1.00000000e+00, -1.33226763e-15,
        -2.22044605e-16],
       [ 2.66453526e-15,  0.00000000e+00,  1.00000000e+00,
        -2.22044605e-16],
       [ 6.66133815e-16,  4.44089210e-16,  0.00000000e+00,
         1.00000000e+00]])

Commonly used functions in numpy.linalg

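A minimal sketch of a few numpy.linalg functions on a small system. The matrix and right-hand side are made-up values for illustration.

```python
import numpy as np
from numpy.linalg import inv, det, solve

# Illustrative 2x2 system A @ x = b; the values are assumptions for this sketch.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

A_inv = inv(A)          # matrix inverse
d = det(A)              # determinant: 2*3 - 1*1 = 5
x = solve(A, b)         # solves A @ x = b without forming the inverse
trace = np.trace(A)     # sum of the diagonal: 2 + 3

print(np.allclose(A @ A_inv, np.eye(2)))  # equal up to rounding error
print(np.allclose(A @ x, b))
```

For solving linear systems, solve() is usually preferable to multiplying by inv(A), since it avoids computing the inverse explicitly.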

Swapping axes

In [88]: arr = np.arange(24).reshape((2,3,4))

In [89]: arr
Out[89]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])


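Swapping the X and Y axes (axes 0 and 1)

An array A of shape 2x3x4 becomes shape 3x2x4.

Method 1

Move each value to its swapped coordinate. For example, when the X and Y axes are swapped, the value at coordinate (0,1,0) goes to coordinate (1,0,0). (The same works for swapping other axes.)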
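The coordinate rule can be checked with an explicit loop; this is a sketch for illustration, and in practice transpose() or swapaxes() does the work.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))
swapped = arr.transpose(1, 0, 2)   # swap axis 0 and axis 1

# The coordinate rule spelled out: the value at (i, j, k) moves to (j, i, k).
manual = np.empty((3, 2, 4), dtype=arr.dtype)
for i in range(2):
    for j in range(3):
        for k in range(4):
            manual[j, i, k] = arr[i, j, k]

print(arr[0, 1, 0], swapped[1, 0, 0])   # both are 4
print(np.array_equal(manual, swapped))
```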

Method 2

Transpose one face at a time.
Viewed from the bottom:

# Or arr.swapaxes(0,1)
In [90]: arr.transpose(1,0,2)
Out[90]:
array([[[ 0,  1,  2,  3],
        [12, 13, 14, 15]],

       [[ 4,  5,  6,  7],
        [16, 17, 18, 19]],

       [[ 8,  9, 10, 11],
        [20, 21, 22, 23]]])

Swapping the X and Z axes (axes 0 and 2)

An array A of shape 2x3x4 becomes shape 4x3x2.
Viewed from the front:

# Or arr.swapaxes(0,2)
In [91]: arr.transpose(2,1,0)
Out[91]:
array([[[ 0, 12],
        [ 4, 16],
        [ 8, 20]],

       [[ 1, 13],
        [ 5, 17],
        [ 9, 21]],

       [[ 2, 14],
        [ 6, 18],
        [10, 22]],

       [[ 3, 15],
        [ 7, 19],
        [11, 23]]])

Swapping the Y and Z axes (axes 1 and 2)

An array A of shape 2x3x4 becomes shape 2x4x3.
Viewed from the side:

# Or arr.swapaxes(1,2)
In [92]: arr.transpose(0,2,1)
Out[92]:
array([[[ 0,  4,  8],
        [ 1,  5,  9],
        [ 2,  6, 10],
        [ 3,  7, 11]],

       [[12, 16, 20],
        [13, 17, 21],
        [14, 18, 22],
        [15, 19, 23]]])

Universal functions (ufuncs)

In [2]: arr = np.arange(10)

In [3]: arr
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Compute the square root
In [4]: np.sqrt(arr)
Out[4]:
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])
# Compute e to the power x
In [5]: np.exp(arr)
Out[5]:
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [6]: arr
Out[6]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [8]: arr = arr.astype('f8')
# With matching dtypes the result can be written into the original array (the second argument is the output array)
In [9]: np.sqrt(arr,arr)
Out[9]:
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [10]: arr
Out[10]:
array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

More universal functions


Processing data with arrays

In [11]: points = np.arange(-5,5,0.01)

In [12]: xs,ys=np.meshgrid(points,points)

In [13]: ys
Out[13]:
array([[-5.  , -5.  , -5.  , ..., -5.  , -5.  , -5.  ],
       [-4.99, -4.99, -4.99, ..., -4.99, -4.99, -4.99],
       [-4.98, -4.98, -4.98, ..., -4.98, -4.98, -4.98],
       ...,
       [ 4.97,  4.97,  4.97, ...,  4.97,  4.97,  4.97],
       [ 4.98,  4.98,  4.98, ...,  4.98,  4.98,  4.98],
       [ 4.99,  4.99,  4.99, ...,  4.99,  4.99,  4.99]])

In [14]: xs
Out[14]:
array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])

In [15]: z = np.sqrt(xs**2+ys**2)

In [16]: z
Out[16]:
array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])
In [18]: import matplotlib.pyplot as plt
In [27]: plt.imshow(z,cmap=plt.cm.Blues);plt.colorbar();plt.title(r"image $\sqrt{x^2+y^2}$")
Out[27]: Text(0.5, 1, 'image $\\sqrt{x^2+y^2}$')

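The where() function: np.where(condition, x, y) takes the value from x where the condition is True and from y where it is False, much like an elementwise if/else.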
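A minimal sketch of np.where() on small arrays. The array names and values are illustrative assumptions, not taken from the session above.

```python
import numpy as np

# Illustrative values, not taken from the session above.
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Take from xarr where cond is True, otherwise from yarr.
picked = np.where(cond, xarr, yarr)

# Scalars work too: replace negatives with 0 and keep the rest.
values = np.array([-2.0, 0.5, -0.1, 3.0])
clipped = np.where(values < 0, 0.0, values)

print(picked)
print(clipped)
```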

Statistical methods

# Generate an array of normally distributed values
In [28]: arr=np.random.randn(5,4)

In [29]: arr
Out[29]:
array([[ 1.05174756,  0.94627221, -1.47924905, -0.0400526 ],
       [ 0.37094514, -0.27207601,  1.31544288,  0.02487793],
       [ 1.13919491, -0.81736487,  0.21520435,  0.28251311],
       [-1.10817169,  0.22068492, -0.95416473,  0.06779754],
       [-0.02334601,  0.16765813, -1.30903895,  0.56600925]])
# Mean: np.mean(arr) or
In [30]: arr.mean()
Out[30]: 0.01824420114243558
# Sum
In [32]: arr.sum()
Out[32]: 0.3648840228487116
# Mean of each column; axis is the axis that gets reduced
# axis=0 (x axis) runs down the rows, axis=1 (y axis) runs across the columns
In [34]: arr.mean(axis=0)
Out[34]: array([ 0.28607398,  0.04903488, -0.4423611 ,  0.18022904])
# Count the positive values
In [37]: (arr>0).sum()
Out[37]: 12
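
Function          Description
mean              Arithmetic mean; NaN for a zero-length array
sum               Sum of all elements in the array
std, var          Standard deviation and variance; degrees of freedom adjustable (default denominator n)
min, max          Minimum and maximum
argmin, argmax    Indices of the minimum and maximum elements
cumsum            Cumulative sum of the elements
cumprod           Cumulative product of the elements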
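A short sketch of several of these methods on a small array; the values are illustrative assumptions. The ddof argument is how the degrees of freedom of std and var are adjusted.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])

col_sum = arr.sum(axis=0)        # one sum per column
row_mean = arr.mean(axis=1)      # one mean per row
pop_std = arr.std()              # divides by n (ddof=0, the default)
sample_std = arr.std(ddof=1)     # divides by n - 1
flat_argmax = arr.argmax()       # index into the flattened array
running = arr.cumsum(axis=1)     # cumulative sum along each row

print(col_sum, row_mean, flat_argmax)
print(running)
```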

Methods for boolean arrays

any() tests whether at least one value in the array is True;
all() checks whether every value in the array is True.
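A minimal sketch of both methods, with and without an axis; the arrays are illustrative assumptions.

```python
import numpy as np

bools = np.array([False, False, True, False])
has_true = bools.any()   # True: at least one element is True
all_true = bools.all()   # False: not every element is True

# Both also accept an axis, and non-zero numbers count as True.
grid = np.array([[1, 0, 3],
                 [4, 5, 6]])
rows_all_nonzero = grid.all(axis=1)   # one answer per row
cols_any_nonzero = grid.any(axis=0)   # one answer per column

print(has_true, all_true)
print(rows_all_nonzero, cols_any_nonzero)
```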

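Sorting arrays

# Sort along the given axis: 0 is the x axis, 1 is the y axis
# Note: the sort() method works in place and changes the original array, whichever axis is given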
In [39]: arr.sort(1)

In [40]: arr
Out[40]:
array([[-1.47924905, -0.0400526 ,  0.94627221,  1.05174756],
       [-0.27207601,  0.02487793,  0.37094514,  1.31544288],
       [-0.81736487,  0.21520435,  0.28251311,  1.13919491],
       [-1.10817169, -0.95416473,  0.06779754,  0.22068492],
       [-1.30903895, -0.02334601,  0.16765813,  0.56600925]])
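The difference between the in-place sort() method and the copying np.sort() function can be sketched as follows; the array is an illustrative assumption.

```python
import numpy as np

original = np.array([[9, 1, 8],
                     [3, 7, 2]])

sorted_copy = np.sort(original, axis=1)   # returns a new array
print(original[0])                        # still [9 1 8]

in_place = original.copy()
result = in_place.sort(axis=0)            # the method sorts in place for any axis
print(result)                             # None: nothing is returned
print(in_place)
```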

Set operations

In [47]: ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])

In [48]: np.unique(ints)
Out[48]: array([1, 2, 3, 4])
# np.unique(ints) is equivalent to the pure-Python sorted(set(ints))
In [49]: sorted(set(ints))
Out[49]: [1, 2, 3, 4]


Array file input and output


In [4]: arr1 = np.arange(10)

In [5]: arr2 = np.random.randn(10)
# Write an array to a file (the .npy extension is appended to the name)
In [6]: np.save('arr1_array',arr1)
# Read an array from a file
In [7]: np.load('arr1_array.npy')
Out[7]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Write several arrays to one file, stored under keyword names
In [8]: np.savez('arr1+2_array.npz',a=arr1,b=arr2)
# Read several arrays back from the file
In [9]: arr1ANDarr2 = np.load('arr1+2_array.npz')
In [10]: arr1ANDarr2['b']
Out[10]:
array([-2.4996369 , -1.88216122, -0.75614113, -1.3758831 ,  1.00322028,
        0.04897909, -0.69571353, -1.82350073, -0.54210798,  1.28929306])
# Write the file in compressed form
In [11]: np.savez_compressed('arrays_compressed.npz',a=arr1,b=arr2)

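Generating random arrays from a given distribution

# Set the global random seed
In [74]: np.random.seed(1245)
# A separate random number generator, isolated from the others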
In [75]: rng = np.random.RandomState(1234)
In [76]: rng.randn(10)
Out[76]:
array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])

In [77]: rng.normal(size=(3,3))
Out[77]:
array([[ 1.15003572,  0.99194602,  0.95332413],
       [-2.02125482, -0.33407737,  0.00211836],
       [ 0.40545341,  0.28909194,  1.32115819]])


Simple example 1: a random walk

Plain Python version (explicit loop)
In [78]: import random

In [79]: position = 0

In [80]: walk = [position]

In [81]: steps =1000

In [82]: for i in range(steps):
    ...:     step = 1 if random.randint(0,1) else -1
    ...:     position += step
    ...:     walk.append(position)
    ...:

In [83]: import matplotlib.pyplot as plt

In [84]: plt.plot(walk[:100])
Out[84]: [<matplotlib.lines.Line2D at 0x14880408dd8>]


Array version
In [111]: nsteps = 1000

In [112]: draws=np.random.randint(0,2,size=nsteps)

In [113]: steps=np.where(draws>0,1,-1)
# cumsum computes the cumulative sum; it can also run along an axis
In [115]: walk=steps.cumsum()

In [116]: walk.max()
Out[116]: 62
# On a boolean array, argmax() returns the index of the first True
# and argmin() returns the index of the first False
In [119]: (np.abs(walk) >=10).argmax()
Out[119]: 47

Two-dimensional array version
In [125]: nwalks=5000

In [126]: nsteps=1000

In [127]: draws = np.random.randint(0,2,size=(nwalks,nsteps))

In [128]: draws
Out[128]:
array([[0, 0, 0, ..., 0, 1, 0],
       [0, 1, 1, ..., 0, 0, 1],
       [0, 0, 0, ..., 1, 1, 0],
       ...,
       [1, 1, 1, ..., 1, 0, 0],
       [1, 1, 0, ..., 1, 1, 1],
       [0, 1, 1, ..., 1, 1, 1]])

In [130]: steps=np.where(draws>0,1,-1)

In [131]: walks=steps.cumsum(1)

In [132]: walks
Out[132]:
array([[ -1,  -2,  -3, ..., -16, -15, -16],
       [ -1,   0,   1, ...,   0,  -1,   0],
       [ -1,  -2,  -3, ...,   2,   3,   2],
       ...,
       [  1,   2,   3, ...,   2,   1,   0],
       [  1,   2,   1, ..., -38, -37, -36],
       [ -1,   0,   1, ...,  12,  13,  14]], dtype=int32)

In [133]: steps.cumsum(0)
Out[133]:
array([[ -1,  -1,  -1, ...,  -1,   1,  -1],
       [ -2,   0,   0, ...,  -2,   0,   0],
       [ -3,  -1,  -1, ...,  -1,   1,  -1],
       ...,
       [ 84,  52, -62, ..., -72,  84, -78],
       [ 85,  53, -63, ..., -71,  85, -77],
       [ 84,  54, -62, ..., -70,  86, -76]], dtype=int32)

In [134]: steps.cumsum()
Out[134]: array([  -1,   -2,   -3, ..., 1916, 1917, 1918], dtype=int32)

In [135]: walks.max()
Out[135]: 126
# any(1): for each walk (row), whether any value along axis 1 (the y axis) is True
In [136]: hits30 = (np.abs(walks) >= 30).any(1)

In [137]: hits30
Out[137]: array([ True, False, False, ..., False,  True, False])

In [138]: crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)

In [140]: crossing_times.mean()
Out[140]: 499.10712183637435

Advanced operations on multidimensional arrays

Reshaping and flattening arrays

Reshaping

In [152]:arr = np.arange(15)
# -1 lets NumPy infer the number of columns; 'F' fills in column-major order
In [153]: arr.reshape((5,-1),order='F')
Out[153]:
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
# The default is order='C', which fills in row-major order
In [155]: arr.reshape((5,3))
Out[155]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

Flattening

# flatten() is similar to ravel(), except that flatten() always returns a copy
In [161]: arr.reshape((5,3),order='F').ravel()
Out[161]: array([ 0,  5, 10,  1,  6, 11,  2,  7, 12,  3,  8, 13,  4,  9, 14])

In [162]: arr
Out[162]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
# Flattening can follow row order ('C') or column order ('F')
In [169]: arr.reshape((5,3),order='F').ravel('F')
Out[169]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
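A sketch of the main practical difference between the two: flatten() always copies, while ravel() returns a view when the memory layout allows it. The array is an illustrative assumption.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))

flat_copy = arr.flatten()   # always a copy
flat_view = arr.ravel()     # a view here, because arr is C-contiguous

flat_copy[0] = 100          # does not touch arr
flat_view[1] = 200          # writes through to arr

print(arr)
print(np.shares_memory(arr, flat_view), np.shares_memory(arr, flat_copy))
```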

Concatenating and splitting arrays

In [180]: arr1=np.arange(1,7).reshape(-1,3)

In [181]: arr2=np.arange(7,13).reshape(-1,3)
# Stack row-wise (concatenate along axis 0, the x axis)
# np.vstack((arr1,arr2)) or
In [182]: np.concatenate([arr1,arr2],axis=0)
Out[182]:
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
# Stack column-wise (concatenate along axis 1, the y axis)
# np.hstack((arr1,arr2)) or
In [183]: np.concatenate([arr1,arr2],axis=1)
Out[183]:
array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
# Splitting
# Split at the given indices: split(array, indices)
In [186]: np.split(np.vstack((arr1,arr2)),[1,3])
Out[186]:
[array([[1, 2, 3]]), array([[4, 5, 6],
        [7, 8, 9]]), array([[10, 11, 12]])]

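More functions

Repeating elements

repeat() repeats each element of a one-dimensional array; for a multidimensional array it repeats elements along the given axis.
tile(array, reps) repeats the whole array, like laying tiles.

One-dimensional arrays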

In [195]: arr = np.arange(4)
# Repeat every element twice
In [196]: arr.repeat(2)
Out[196]: array([0, 0, 1, 1, 2, 2, 3, 3])
# Give a separate repeat count for each element
In [198]: arr.repeat([2,3,1,2])
Out[198]: array([0, 0, 1, 1, 1, 2, 3, 3])

Multidimensional arrays

In [211]: arr = np.random.randn(2,3)

In [212]: arr
Out[212]:
array([[ 0.69330678, -1.56588234,  0.27861828],
       [-1.63229888,  0.40831778, -0.32360164]])

In [213]: arr.repeat([1,2],axis=0)
Out[213]:
array([[ 0.69330678, -1.56588234,  0.27861828],
       [-1.63229888,  0.40831778, -0.32360164],
       [-1.63229888,  0.40831778, -0.32360164]])

In [214]: arr.repeat([1,2,1],axis=1)
Out[214]:
array([[ 0.69330678, -1.56588234, -1.56588234,  0.27861828],
       [-1.63229888,  0.40831778,  0.40831778, -0.32360164]])
# Tile the array twice side by side
In [215]: np.tile(arr,2)
Out[215]:
array([[ 0.69330678, -1.56588234,  0.27861828,  0.69330678, -1.56588234,
         0.27861828],
       [-1.63229888,  0.40831778, -0.32360164, -1.63229888,  0.40831778,
        -0.32360164]])
# Tile 2 times vertically and 3 times horizontally
In [216]: np.tile(arr,(2,3))
Out[216]:
array([[ 0.69330678, -1.56588234,  0.27861828,  0.69330678, -1.56588234,
         0.27861828,  0.69330678, -1.56588234,  0.27861828],
       [-1.63229888,  0.40831778, -0.32360164, -1.63229888,  0.40831778,
        -0.32360164, -1.63229888,  0.40831778, -0.32360164],
       [ 0.69330678, -1.56588234,  0.27861828,  0.69330678, -1.56588234,
         0.27861828,  0.69330678, -1.56588234,  0.27861828],
       [-1.63229888,  0.40831778, -0.32360164, -1.63229888,  0.40831778,
        -0.32360164, -1.63229888,  0.40831778, -0.32360164]])

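Broadcasting

Broadcasting depends on the shapes of the arrays: a dimension can be stretched only if its length is 1 (it cannot be empty). For example, to broadcast along axis 0 (axis 1), the smaller array needs 1 row (1 column), and its number of columns (rows) must match the target array.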
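The shape rule can be sketched by subtracting means along each axis. The array is an illustrative assumption; the point is which shapes line up and which do not.

```python
import numpy as np

arr = np.arange(12.0).reshape((3, 4))

# Axis 0: the (4,) column means line up with the trailing dimension of (3, 4).
col_demeaned = arr - arr.mean(axis=0)

# Axis 1: the (3,) row means must be reshaped to (3, 1) before subtracting.
row_means = arr.mean(axis=1)
row_demeaned = arr - row_means[:, np.newaxis]

# Without the reshape the shapes (3, 4) and (3,) are incompatible.
try:
    arr - row_means
except ValueError as exc:
    print("broadcast failed:", exc)
```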

Broadcasting along an axis

In [20]: arr = np.arange(15).reshape(3,5)

In [21]: arr.mean(0)
Out[21]: array([5., 6., 7., 8., 9.])

In [22]: list_mean = arr.mean(0)

In [23]: list_mean.shape
Out[23]: (5,)
# Reshape the array
# list_mean.reshape(1,5) or

In [26]: list_mean[np.newaxis,:]
Out[26]: array([[5., 6., 7., 8., 9.]])

In [27]: arr - list_mean[np.newaxis,:]
Out[27]:
array([[-5., -5., -5., -5., -5.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 5.,  5.,  5.,  5.,  5.]])

In [28]: (arr - list_mean[np.newaxis,:]).mean(0)
Out[28]: array([0., 0., 0., 0., 0.])

Setting array values by broadcasting

In [47]: arr = np.arange(16).reshape(4,4)

In [48]: col = np.array([2,3,5,8])

In [51]: arr[:] = col[np.newaxis,:]

In [52]: arr
Out[52]:
array([[2, 3, 5, 8],
       [2, 3, 5, 8],
       [2, 3, 5, 8],
       [2, 3, 5, 8]])
     
In [57]: arr[1:3] = [[8],[9]]

In [58]: arr
Out[58]:
array([[2, 3, 5, 8],
       [8, 8, 8, 8],
       [9, 9, 9, 9],
       [2, 3, 5, 8]])

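Structured arrays

A structured array is an array whose elements are records made of fields that can have different data types.

# Define the data type of the structured array
In [89]: UserDef_dtype = [('F_X',np.float64),('I_Y',np.int32)]
# Equivalent: sarr = np.array([(1.6,3),(np.pi,-2)],dtype=[('F_X',np.float64),('I_Y',np.int32)])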
In [90]: sarr = np.array([(1.6,3),(np.pi,-2)],dtype=UserDef_dtype)

In [92]: sarr
Out[92]:
array([(1.6       ,  3), (3.14159265, -2)],
      dtype=[('F_X', '<f8'), ('I_Y', '<i4')])

In [93]: sarr['F_X']
Out[93]: array([1.6       , 3.14159265])

Structured arrays with multidimensional fields

Multidimensional fields
In [97]: UserDef_dtype = [('x',np.int64,3),('y',np.int32)]

In [98]: arr = np.zeros(4,dtype=UserDef_dtype)

In [99]: arr
Out[99]:
array([([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0), ([0, 0, 0], 0)],
      dtype=[('x', '<i8', (3,)), ('y', '<i4')])

In [101]: arr[0][0][1]
Out[101]: 0

In [102]: arr[0]['x']
Out[102]: array([0, 0, 0], dtype=int64)

In [103]: arr['x']
Out[103]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=int64)

Nested dtype
In [104]: UserDef_dtype = [('x',[('a','f8'),('b','f4')]),('y',np.int32)]

In [105]: data = np.array([((1,2),5),((3,4),6)],dtype=UserDef_dtype)

In [106]: data['x']
Out[106]: array([(1., 2.), (3., 4.)], dtype=[('a', '<f8'), ('b', '<f4')])

In [107]: data['x']['a']
Out[107]: array([1., 3.])

Indirect sorting

argsort() returns the array of indices that would sort the array.

In [109]: value
Out[109]: array([-0.38962844, -2.14157925,  1.7248753 ,  0.84273655, -0.30907011])

In [111]: indexer = value.argsort()

In [112]: indexer
Out[112]: array([1, 0, 4, 3, 2], dtype=int64)

In [113]: value[indexer]
Out[113]: array([-2.14157925, -0.38962844, -0.30907011,  0.84273655,  1.7248753 ])

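lexsort() sorts by several keys at once. The keys are passed as a single sequence and the last key is the primary one: lexsort((arr1, arr2, arr3)) sorts by arr3 first, then arr2, then arr1.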

In [116]: first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])

In [117]: last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])

In [118]: indexer = np.lexsort((first_name,last_name))

In [119]: indexer
Out[119]: array([1, 2, 3, 0, 4], dtype=int64)

In [126]: for i in indexer:
     ...:     print(first_name[i] + " " + last_name[i])
     ...:
Jane Arnold
Steve Arnold
Bill Jones
Bob Jones
Barbara Walters

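Other sorting options

mergesort, quicksort and heapsort are available; choose one through the kind parameter of argsort(), e.g. argsort(kind='mergesort').

partition(array, num) returns an array whose first num elements are the num smallest elements of the input, in no particular order.
argpartition(array, num) returns the index array corresponding to partition().

searchsorted() performs a binary search on a sorted array and returns the index at which each value would be inserted to keep the array sorted.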
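A minimal sketch of partition(), argpartition() and searchsorted(); the arrays are illustrative assumptions. The order inside the partitioned prefix is not guaranteed, so it is sorted before printing.

```python
import numpy as np

values = np.array([7, 2, 9, 4, 1, 8, 3])

# After partition(values, 3) the first three entries are the three smallest,
# in no guaranteed order.
part = np.partition(values, 3)
smallest3 = np.sort(part[:3])

# argpartition returns indices instead of values.
idx = np.argpartition(values, 3)
smallest3_by_index = np.sort(values[idx[:3]])

# searchsorted needs a sorted array and returns insertion points.
sorted_arr = np.array([0, 1, 7, 12, 15])
pos = sorted_arr.searchsorted(9)
pos_many = sorted_arr.searchsorted([0, 8, 11, 16])
pos_right = sorted_arr.searchsorted([0, 7], side='right')

print(smallest3, smallest3_by_index)
print(pos, pos_many, pos_right)
```

With side='right', a value equal to an existing element is placed after it instead of before it.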

Binning data into groups

In [136]: data = np.floor(np.random.uniform(0,10000,size=50))

In [137]: group = np.array([0,100,1000,5000,10000])

In [138]: data
Out[138]:
array([4343., 8770., 7665., 9782., 8358., 9207., 4158., 6187., 9370.,
       5887., 7608., 5661., 4172., 5920.,  474., 9362., 4400., 9747.,
        985., 2734., 2967., 1455., 7431., 1031., 2083., 7761., 3859.,
       3220., 3252., 6401.,  745., 9711., 7131.,  928., 3021., 3815.,
       1662., 5869., 4557., 1048., 7624., 6371., 6308., 1509., 4979.,
       6245., 4965., 1116., 8241., 2490.])

In [139]: group.searchsorted(data)
Out[139]:
array([3, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4, 3, 4, 2, 4, 3, 4, 2, 3, 3, 3,
       4, 3, 3, 4, 3, 3, 3, 4, 2, 4, 4, 2, 3, 3, 3, 4, 3, 3, 4, 4, 4, 3,
       3, 4, 3, 3, 4, 3], dtype=int64)
# pandas can use this result with groupby() to analyze the data per group
In [140]: import pandas as pd;

In [141]: pd.Series(data).groupby(group.searchsorted(data))
Out[141]: <pandas.core.groupby.generic.SeriesGroupBy object at 0x0000021C2C5F6B38>

In [142]: pd.Series(data).groupby(group.searchsorted(data)).mean()
Out[142]:
2     783.000000
3    3038.000000
4    7609.041667
dtype: float64

Writing fast functions with Numba

Numba can compile Python code to machine code.

Using the jit() function

In [4]: import numpy as np,  numba as nb

# Mean of x - y
In [5]: def mean_distance(x, y):
   ...:     nx = len(x)
   ...:     result = 0.0
   ...:     count = 0
   ...:     for i in range(nx):
   ...:         result += x[i] - y[i]
   ...:         count += 1
   ...:     return result / count
   ...:

In [6]: x = np.random.randn(10000000)

In [7]: y = np.random.randn(10000000)

In [8]: %timeit mean_distance(x,y)
7.55 s ± 622 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [9]: %timeit (x-y).mean()
59.1 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: numba_mean_distance = nb.jit(mean_distance)

In [11]: %timeit numba_mean_distance(x,y)
22 ms ± 3.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using the decorator

In [12]: @nb.jit
    ...: def mean_distance(x,y):
    ...:     nx = len(x)
    ...:     result = 0.0
    ...:     count =0
    ...:     for i in range(nx):
    ...:         result += x[i] -y[i]
    ...:         count += 1
    ...:     return result/count
In [13]: %timeit mean_distance(x,y)
22.8 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Defining custom numpy.ufunc objects

In [17]: from numba import vectorize

In [18]: @vectorize
    ...: def my_add(x,y):
    ...:     return x+y
    ...:

In [19]: x = np.arange(10)

In [21]: my_add(x,x)
Out[21]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18], dtype=int64)

Advanced array input and output

Reading and writing disk data with memory-mapped files

# memmap() takes the file path, data type, shape and file mode
In [39]: mmap = np.memmap('mymmap', dtype='float64',mode='w+',shape=(10000,10000))

In [40]: mmap
Out[40]:
memmap([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]])
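# Slicing returns a view onto the data on disk
In [43]: section=mmap[:5]
# Assign values to the view; at this point the data is buffered in memory
In [44]: section[:]=np.random.randn(5,10000)
# flush() writes the buffered changes to disk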
In [45]: mmap.flush()

In [46]: mmap
Out[46]:
memmap([[-0.69302268, -1.16286652,  1.01013265, ...,  0.73541862,
         -0.52401082,  0.79853806],
        [-0.1564463 ,  0.56630705,  1.01664617, ...,  0.52664146,
          0.61622246,  0.29153165],
        [-0.17849793, -0.66249437, -0.44119979, ...,  0.07165869,
         -0.09034793, -2.51524144],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]])

In [47]: del mmap
# Even after the object is deleted, the data remains on disk
In [53]: mmap = np.memmap('mymmap', dtype='float64',mode='r',shape=(10000,10000))

In [54]: mmap
Out[54]:
memmap([[-0.69302268, -1.16286652,  1.01013265, ...,  0.73541862,
         -0.52401082,  0.79853806],
        [-0.1564463 ,  0.56630705,  1.01664617, ...,  0.52664146,
          0.61622246,  0.29153165],
        [-0.17849793, -0.66249437, -0.44119979, ...,  0.07165869,
         -0.09034793, -2.51524144],
        ...,
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        , ...,  0.        ,
          0.        ,  0.        ]])

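Improving performance

Operating on arrays that are stored contiguously in memory is generally fastest. A contiguous array is laid out either in Fortran order (column-major) or in C order (row-major); the flags attribute shows which.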

In [55]: arr_c = np.ones((1000,1000),order='C')

In [56]: arr_f = np.ones((1000,1000),order='F')

In [57]: arr_c.flags
Out[57]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
# copy() can change the memory layout
In [58]: arr_c.copy('F').flags
Out[58]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
# A row-major array stays contiguous only under row slices; a column-major array only under column slices
In [60]: arr_c[:50].flags
Out[60]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
In [62]: arr_c[:,:5].flags
Out[62]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

In [63]: arr_c.copy('F')[:,:5].flags
Out[63]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
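
When a non-contiguous slice is used repeatedly, one option is to copy it into contiguous memory with np.ascontiguousarray(). The sketch below only shows the layout change; whether the copy pays off depends on the workload, which is not measured here.

```python
import numpy as np

arr_c = np.ones((1000, 1000), order='C')
col_block = arr_c[:, :5]                      # non-contiguous view
packed = np.ascontiguousarray(col_block)      # C-contiguous copy

print(col_block.flags['C_CONTIGUOUS'])   # False
print(packed.flags['C_CONTIGUOUS'])      # True
print(col_block.strides, packed.strides)
```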