The NumPy library for Python

yichudu

Last modified 2022-12-08 15:18:38

First published 2017-03-23 15:47:10

Article link: https://blog.csdn.net/chuchus/article/details/65444580

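Introduction

NumPy is a library for scientific computing on matrices and other multi-dimensional arrays.
Install it with pip install numpy and import it with import numpy as np.
Matrices are usually represented by ndarray, the N-dimensional array object.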

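Constructing ndarray objects

From a Python list
np.array([[1,2],[3,4]])
Sequence of natural numbers
np.arange(x)
Returns a 1-D array [0, 1, 2, …, x-1], equivalent to np.array(list(range(x))).
Evenly spaced values
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
Given an interval and a number of samples, returns an array of evenly spaced values.
Specific elements
Assemble the values by hand and pass them to np.array().
Identity matrix (square)
np.eye(3): the diagonal elements are all 1 and every other element is 0.
All-ones matrix
np.ones((2,3))
All-zeros matrix
np.zeros(shape, dtype=float, order='C')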
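
The constructors listed above can be tried side by side. This is a minimal sketch; the variable names are only for illustration and are not from the original article.

```python
import numpy as np

from_list = np.array([[1, 2], [3, 4]])   # from a nested Python list
naturals = np.arange(5)                  # 0, 1, 2, 3, 4
grid = np.linspace(0.0, 1.0, num=5)      # 0, 0.25, 0.5, 0.75, 1 (endpoint included)
identity = np.eye(3)                     # 3x3 identity matrix
ones = np.ones((2, 3))                   # 2x3 array of ones (float by default)
zeros = np.zeros((2, 3), dtype=int)      # 2x3 array of integer zeros

print(from_list.shape, naturals.shape, grid.shape)
print(identity.shape, ones.shape, zeros.shape)
```

Note that np.arange excludes its upper bound, while np.linspace includes stop unless endpoint=False is passed.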

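By a repetition rule
np.tile(A, reps) treats the array A as a single element and repeats it according to reps to build a larger composite array. For example:

arr=np.tile([[1,2],[3,4]], [2,3])
print(arr)
"""
[[1 2 1 2 1 2]
 [3 4 3 4 3 4]
 [1 2 1 2 1 2]
 [3 4 3 4 3 4]]
"""
arr=np.tile([1.23], 3)
# [1.23 1.23 1.23]
print(arr)

Random values
See the np.random section later in this article.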

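dtype

Python has a single integer type, int, whereas NumPy has several, such as np.int32 and np.int64.
When an array holds Chinese strings, its dtype shows up as <U3, which is not intuitive at first: it denotes a little-endian Unicode string of at most 3 characters.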
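
A quick way to see these dtypes is to print them. The sketch below uses made-up sample values; the default integer type depends on the platform, so it is not asserted here.

```python
import numpy as np

ints = np.array([1, 2, 3])                    # default integer dtype (platform-dependent)
small = np.array([1, 2, 3], dtype=np.int32)   # explicitly 32-bit
words = np.array(['新年好', '好'])             # longest string has 3 characters

print(ints.dtype)
print(small.dtype)    # int32
print(words.dtype)    # a Unicode dtype sized for the longest string
```

The number after U is the maximum string length in the array, so assigning a longer string into an existing element silently truncates it.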

The shape of an ndarray

print(np.array([1,2]).shape)    # (2,), type(shape) is tuple
print(np.array([[1,2]]).shape)  # (1, 2)
# Note the difference: the former is a 1-D array, the latter a 2-D array.

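np.reshape(a, newshape, order='C')
Changes the shape of an ndarray. If one component of the shape is -1, the size of that dimension is computed automatically.
For example, a = np.arange(6).reshape((3, 2))
or a = np.arange(6).reshape((3, -1)) gives
array([[0, 1],
       [2, 3],
       [4, 5]])
np.expand_dims(a, axis)
Adds a new axis of length 1 at position axis.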

 Examples
    --------
    >>> x = np.array([1,2])
    >>> x.shape
    (2,)

    >>> y = np.expand_dims(x, axis=0)
    >>> y
    array([[1, 2]])
    >>> y.shape
    (1, 2)

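The axis of an ndarray

An ndarray is multi-dimensional, and axis=i selects its i-th dimension. Many functions take an axis parameter, and different values give different results.
For a 2-D array, axis=0 operates along the vertical axis (down the rows, one result per column), and axis=1 operates along the horizontal axis (across the columns, one result per row).
What about more dimensions? In one sentence: with axis=i, NumPy operates along the direction in which the i-th index varies.
With axis=-1, the operation runs along the last dimension.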

# Effect of the axis parameter on the result
a
Out[178]: 
array([[1, 2],
       [3, 4],
       [5, 6]])

np.mean(a,axis=0)
Out[179]: array([ 3.,  4.])

np.mean(a,axis=1)
Out[180]: array([ 1.5,  3.5,  5.5])

np.mean(a)
Out[181]: 3.5
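
For arrays with more than two dimensions, looking at the result shape makes the rule concrete: the axis that is reduced disappears from the shape. A small sketch with an illustrative 3-D array:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

s0 = a.sum(axis=0)                    # axis 0 is collapsed
s1 = a.sum(axis=1)                    # axis 1 is collapsed
s_last = a.sum(axis=-1)               # axis=-1 is the last axis
s_keep = a.sum(axis=1, keepdims=True) # reduced axis kept with length 1

print(s0.shape)      # (3, 4)
print(s1.shape)      # (2, 4)
print(s_last.shape)  # (2, 3)
print(s_keep.shape)  # (2, 1, 4)
```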

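Comparing ndarrays

Comparing with is checks whether the two names refer to the same object in memory.
For two arrays of the same shape, == returns a bool array of that shape holding the element-wise comparison results. How do you compare the arrays as a whole?
Use (a==b).all(), which returns True only if every element is True.
When floating-point values are hard to compare directly, round them to a chosen precision with np.round() before comparing.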
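
The comparison options can be sketched as follows. np.array_equal and np.allclose are not mentioned in the original text; they are standard NumPy functions added here as alternatives to (a==b).all() and to rounding by hand.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])   # 0.1 + 0.2 is not exactly 0.3 in floating point
b = np.array([0.3, 1.0])

elementwise = a == b                          # bool array, one entry per element
exact = bool((a == b).all())                  # whole-array exact comparison
exact_too = np.array_equal(a, b)              # same idea, also checks the shapes
rounded = bool((np.round(a, 6) == np.round(b, 6)).all())
close = np.allclose(a, b)                     # equal within a tolerance

print(elementwise, exact, exact_too, rounded, close)
```

Here the exact comparisons report False because of floating-point rounding error, while the rounded and tolerance-based comparisons report True.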

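Reading and writing array slices

Slicing means extracting part of an array according to some rule.

Flexible slice expression syntax

arr[slice expression]. The most common usages are shown in the figure below.
This is Python's slice and __getitem__() syntax.
Figure 3-1: ndarray slicing illustration

For 2-D data, arr[i,j] is the scalar at row i, column j, equivalent to arr[i][j].
arr(i), by contrast, raises an error because an ndarray is not a callable object.

Some additional slice syntax:

When the index is an array, it reads a batch of elements at the given indices. It can also appear on the left-hand side of an assignment, where broadcasting applies as well.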

Python 3.9.12 (main, Apr  4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]

import numpy as np
a=np.arange(10)
a
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b=np.array([2,3,5])
a[b]=88
a
Out[7]: array([ 0,  1, 88, 88,  4, 88,  6,  7,  8,  9])
a[b]
Out[8]: array([88, 88, 88])

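When the index has fewer dimensions than the array, the omitted trailing axes default to the : symbol.
To select a range along an axis, use the half-open from:to notation (from included, to excluded).
To select non-contiguous positions along an axis, use a tuple such as (1,3,5).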

import numpy as np

a = np.arange(0, 2 * 3 * 4).reshape((2, 3, 4))

b1 = a[:, 1]
b2 = a[:, 1, :]
# True
print('(b1 == b2).all()', (b1 == b2).all())

c1 = a[:, :2]
c2 = a[:, (0, 1)]
# True
print('(c1 == c2).all()', (c1 == c2).all())

Keeping elements at specified positions

import numpy as np
a=np.array(['新','年','快','乐'])
# Boolean mask: array(['新', '快', '乐'], dtype='<U1')
b=a[[True,False,True,True]]
# Integer indices: array(['新', '快', '乐'], dtype='<U1')
c=a[[0,2,3]]

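Common operations

On a single array

# These operations work on a single number and also element-wise on every element of an array or matrix
np.tanh(x) # hyperbolic tangent
np.sqrt(x) # square root
np.log(x) # natural logarithm (base e)
np.log2(x) # base-2 logarithm
np.log10(x) # base-10 logarithm
np.power(a,b) # a^b
np.abs(x) # absolute value of each element of x; returns an ndarray of the same shape
arr.fill(x) # ndarray method (there is no np.fill): sets every element of arr to x in place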

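Operations between arrays
- Subtraction
  a.shape=(m,n) b.shape=(n,)
  a-b splits the 2-D array a into m 1-D arrays of shape (n,), subtracts b from each, and stacks the results back together.
- Multiplication
  Element-wise multiplication.
  With a.shape=(m,n) and b.shape=(m,n), a*b is equivalent to np.multiply(a,b), and the result has shape (m,n).
Dot product np.dot(a,b)
Computes the dot product of two arrays. Specifically,
- when a and b are 1-D arrays, it is the inner product of vectors;
- when a and b are 2-D matrices, it is matrix multiplication.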
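
The rules above can be checked on small arrays. This is a sketch with illustrative values; row_by_row spells out the "split, subtract, stack" description explicitly.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([1, 1, 1])     # shape (3,)

diff = a - b                                      # b is subtracted from every row of a
row_by_row = np.stack([row - b for row in a])     # the same thing, done by hand

prod = a * a                # element-wise product, same as np.multiply(a, a)
inner = np.dot(b, b)        # 1-D inputs: inner product
matmul = np.dot(a, a.T)     # 2-D inputs: matrix product, shape (2, 2)

print(diff)
print(inner)
print(matmul)
```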

Degrees & radians

np.sin()
"""An angle can be expressed in two units, degrees and radians.
radians : degrees = pi : 180 ≈ 3.1416 : 180
np.sin() takes radians.
So the sine of 30° is np.sin(30*np.pi/180)."""
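
The conversion can be written by hand or with np.deg2rad, a NumPy helper that the original text does not mention. A minimal sketch:

```python
import numpy as np

deg = 30
by_hand = np.sin(deg * np.pi / 180)     # manual degree-to-radian conversion
with_helper = np.sin(np.deg2rad(deg))   # same conversion via np.deg2rad

print(by_hand, with_helper)   # both approximately 0.5
```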

np.argmax(a,axis=None)
Returns the index of the largest element. For details, see: scipy-numpy.argmax
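
With axis=None the returned index refers to the flattened array, which is easy to overlook. The sketch below uses an illustrative 2-D array; np.unravel_index (not in the original) converts the flat index back to a row and column.

```python
import numpy as np

a = np.array([[1, 9, 3],
              [7, 2, 8]])

flat_idx = np.argmax(a)                          # index into the flattened array
row, col = np.unravel_index(flat_idx, a.shape)   # back to (row, column)
per_col = np.argmax(a, axis=0)                   # row index of the maximum in each column
per_row = np.argmax(a, axis=1)                   # column index of the maximum in each row

print(flat_idx, (row, col))   # 1 and position (0, 1)
print(per_col)                # [1 0 1]
print(per_row)                # [1 2]
```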

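y=np.argsort(a,axis=-1)
Returns the original positions of the elements in sorted order. How should the return value y be read? The i-th smallest element (counting from 0) was originally at position y[i].
The sort is ascending by default; for descending order, multiply the values by -1.
It can also be used to compute rankings of student scores.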

import numpy as np
score=np.array([9,7,8])
rank_arg=np.argsort(score*-1)
rank=np.argsort(rank_arg)+1
#  ['score=9,rank=1', 'score=7,rank=3', 'score=8,rank=2']
result=['score={},rank={}'.format(k,v) for k,v in zip(score,rank)]

Common statistics of an array

x = (0,1,5)
np.var(x)  # variance ≈ 4.67
np.mean(x)  # mean = 2
np.median(x) # median = 1
np.std(x)  # standard deviation, i.e. np.sqrt(np.var(x))
np.percentile(x, (30,50)) # the 30th and 50th percentiles of the array, in ascending order
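
One detail worth knowing: np.var and np.std divide by n by default (population statistics). The ddof parameter, which the original text does not cover, switches to the sample version. A sketch on the same data:

```python
import numpy as np

x = (0, 1, 5)

pop_var = np.var(x)              # divides by n:     14 / 3
sample_var = np.var(x, ddof=1)   # divides by n - 1: 14 / 2
median = np.median(x)
p30, p50 = np.percentile(x, (30, 50))   # linear interpolation by default

print(pop_var, sample_var, median, p30, p50)
```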

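Matrix attributes

import numpy as np
line1=(1,2,3)
line2=(4,5,6)
# Build and print the matrix
arr=np.array([line1,line2])
print(arr)
print(arr.ndim)         # number of dimensions (2), not the linear-algebra rank
# An ndarray has no .I attribute or .inverse() method. For a square array use
# np.linalg.inv(); this 2x3 array has no inverse in any case.
print(arr.transpose())  # transpose
print(arr.T)            # transpose (shorthand)
print(arr.size)         # number of elements, 6
print(arr.shape)        # matrix size, (2, 3)
print(type(arr))        # <class 'numpy.ndarray'>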

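Matrix operations

import numpy as np
a=np.array([[1,2],
            [3,4]])

b=np.array([[5,6],
            [7,8]])
print(a+b)   # element-wise addition
print(a-b)   # element-wise subtraction
print(np.dot(a,b))   # matrix multiplication
print(a*b)   # element-wise multiplication
print(a**2)  # square of each element

print(a.sum())   # sum of all elements, a NumPy integer scalar (e.g. numpy.int64; platform-dependent)
print(a.sum(axis=0)) # sum of each column
print(a.sum(axis=1)) # sum of each row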

Cosine similarity

import numpy as np

# One leg and the hypotenuse of a 3-4-5 right triangle
a = np.array([3, 0])
b = np.array([3, 4])

sim = np.dot(a.T, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print('cosine <a,b> : {0}'.format(sim))
"""cosine <a,b> : 0.6"""

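Vector computation

import numpy as np

np.cross(a,b) # cross product; a zero result means the two vectors are collinear (angle of 0° or 180°)
np.dot(a,b) # dot product; 0 means the vectors are orthogonal
np.linalg.norm(a) # norm (length) of vector a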
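
The three calls above need concrete vectors to run. A sketch with illustrative 3-D vectors chosen so that each property is easy to verify by hand:

```python
import numpy as np

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 0.0])   # perpendicular to a
c = np.array([3.0, 0.0, 0.0])   # same direction as a

dot_ab = np.dot(a, b)        # 0 -> a and b are orthogonal
cross_ac = np.cross(a, c)    # zero vector -> a and c are collinear
cross_ab = np.cross(a, b)    # perpendicular to both a and b
norm_b = np.linalg.norm(b)   # length of b

print(dot_ab, cross_ac, cross_ab, norm_b)
```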

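Array broadcasting

Broadcasting: when arrays (or scalars) of different shapes are combined in arithmetic, NumPy automatically tiles the operand along dimensions that are missing or of length 1, which reduces the amount of code to write.

Broadcasting conditions
Not every pair of shapes can be broadcast; they must be compatible.
Right-align the two shapes. If in any column the two dimensions differ and neither is 1, NumPy raises ValueError: operands could not be broadcast together with shapes ${shape1} ${shape2} .
When broadcasting applies, each column of the result shape is the larger of the two dimensions. Examples: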

A      (3d array): 256 x 256 x 3
B      (1d array):             3
Result (3d array): 256 x 256 x 3

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

A      (2d array):  5 x 4
B      (1d array):      1
Result (2d array):  5 x 4

A      (2d array):  5 x 4
B      (1d array):      4
Result (2d array):  5 x 4

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 5
Result (3d array):  15 x 3 x 5

A      (3d array):  15 x 3 x 5
B      (2d array):       3 x 1
Result (3d array):  15 x 3 x 5
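
The shape table can be checked without allocating any data. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; that function is not part of the original text.

```python
import numpy as np

# Result shapes computed from the input shapes alone (NumPy >= 1.20)
s1 = np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5))
s2 = np.broadcast_shapes((15, 3, 5), (3, 1))
print(s1)   # (8, 7, 6, 5)
print(s2)   # (15, 3, 5)

# An incompatible pair: the trailing dimensions are 3 and 4, and neither is 1
message = None
try:
    np.ones((2, 3)) + np.ones((2, 4))
except ValueError as err:
    message = str(err)
print(message)
```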

Example

import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6]]])
b = np.array([[[1, 2, 3]]])
print('a.shape={},b.shape={},a+b={}'.format(a.shape, b.shape, a + b))

c = np.tile(b, (1, 2, 1))
print('a.shape={},c.shape={},a+c={}'.format(a.shape, c.shape, a + c))
"""
a.shape=(1, 2, 3),b.shape=(1, 1, 3),a+b=[[[2 4 6]
  [5 7 9]]]
a.shape=(1, 2, 3),c.shape=(1, 2, 3),a+c=[[[2 4 6]
  [5 7 9]]]
"""

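Concatenating arrays

np.concatenate(arrays, axis=0, dtype=None)
The most widely used function. All arrays must have the same number of dimensions, and the same shape except along the concatenation axis.
It covers every scenario handled by the functions below.
Deep learning frameworks also use this function name.
np.stack(arrays, axis=0)
Joins several array_like objects of identical shape along a new axis.
Vertical concatenation
- np.vstack(tup), vertical stack.
- np.row_stack = vstack (an alias; deprecated since NumPy 2.0 in favor of vstack)
  For 1-D inputs of equal length, both give the same result as np.stack(tup, axis=0). Unlike stack, they do not require the inputs to have the same size along the first axis, which makes them more practical.
Horizontal concatenation
- np.hstack(tup), horizontal stack.
  For example, with a.shape=(3,4) and b.shape=(3,1), the result has shape (3,5).
- np.column_stack(tup)
  For 2-D inputs it behaves like hstack, but 1-D inputs are first turned into columns, so the two are not aliases.
Example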

import numpy as np

a = [1, 2, 3]
b = [5, 6, 7]
c = [a, b]

x = np.stack((a, b))
y = np.row_stack((a, b))
"""
array([[1, 2, 3],
       [5, 6, 7]])
"""
# ValueError: all input arrays must have the same shape
# z = np.stack((a,c))

z = np.row_stack((a, c))
"""
array([[1, 2, 3],
       [1, 2, 3],
       [5, 6, 7]])
"""

d = np.column_stack((a, b))
f = np.stack((a, b), axis=1)
"""
array([[1, 5],
       [2, 6],
       [3, 7]])
"""
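
Two more cases are worth seeing: the (3,4) plus (3,1) horizontal concatenation, and how hstack and column_stack treat 1-D inputs. A sketch with illustrative arrays:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
b = np.ones((3, 1), dtype=int)

h = np.hstack((a, b))                 # shape (3, 5)
c = np.concatenate((a, b), axis=1)    # same result via concatenate

u = np.array([1, 2, 3])
v = np.array([5, 6, 7])
flat = np.hstack((u, v))              # 1-D inputs stay 1-D: shape (6,)
cols = np.column_stack((u, v))        # 1-D inputs become columns: shape (3, 2)

print(h.shape, c.shape, flat.shape, cols.shape)
```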

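np.random

from numpy import random

Common functions

random.rand(d0, d1, …, dn)
Generates a random array of shape (d0, d1, …, dn) with values in [0,1).
random.standard_normal(size)
Draws samples from a standard normal distribution (mean=0, stdev=1).
random.uniform(low=0.0, high=1.0, size=None)
Draws random numbers from the range [low, high), which defaults to [0,1).
random.randint(low, high=None, size=None, dtype=int)
Generates random integers in the range [low,high). Returns an int or an ndarray.
random.choice(a, size=None, replace=True, p=None)
Draws a random sample from the 1-D array a with the given probabilities.
- a, the population to sample from
- size, the number of samples to draw
- replace, whether to sample with replacement
- p, the probabilities; if omitted, samples are drawn uniformly
np.random.permutation(x) -> array
Returns a randomly shuffled arrangement of the array, like shuffle but returning a copy instead of modifying the array in place.
Passing an array x is the typical usage.
When x is an int, it is equivalent to generating the array range(0,x) and then shuffling it.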
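
The functions above can be exercised together as follows. The call to random.seed is an addition not covered in the original text; it only makes repeated runs produce the same numbers. The sample values and probabilities are made up for illustration.

```python
from numpy import random

random.seed(0)                          # fix the seed so runs are repeatable

samples = random.rand(2, 3)             # shape (2, 3), values in [0, 1)
dice = random.randint(1, 7, size=10)    # integers in [1, 7), i.e. 1 to 6
picks = random.choice(['a', 'b', 'c'], size=5, p=[0.5, 0.3, 0.2])
unique = random.choice(10, size=3, replace=False)   # without replacement
perm = random.permutation(5)            # a shuffled copy of [0, 1, 2, 3, 4]

print(samples.shape, dice, picks, unique, perm)
```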

Example

from numpy import random
arr=random.standard_normal((2,3))
print(arr.__class__,'\n',arr)
num = random.uniform(0,1)
print(num.__class__,num)
"""
<class 'numpy.ndarray'> 
 [[ 0.27243397 -0.83744812 -0.33860031]
 [-1.36952477 -1.06641186 -0.4565501 ]]
<class 'float'> 0.06643459053873624
"""

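ndarray vs. matrix

Anything that can be done with matrix can also be done with ndarray.
An ndarray can have any number of dimensions, whereas a matrix is always 2-D, corresponding strictly to the matrices of math textbooks.
matrix.I is the inverse of the matrix.
mat1*mat2 is matrix multiplication.

Example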

import numpy as np

arr1=np.array([[1,2],
          [3,4]])
arr2=np.array([[1,2],
          [3,4]])

mat1=np.matrix(arr1)
mat2=np.matrix(arr2)
print(mat1*mat2) # matrix multiplication, equivalent to np.dot(arr1,arr2)
print('Inverse',mat1.I)    # inverse of the matrix

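Converting between ndarray and list

np.array(object, dtype=None)  # list -> ndarray
ndarray.tolist()  # ndarray -> list (tolist is an ndarray method; there is no np.tolist)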
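
A round trip shows both directions. In this sketch (values are illustrative), note that tolist() returns plain Python numbers rather than NumPy scalars.

```python
import numpy as np

nested = [[1, 2], [3, 4]]
arr = np.array(nested, dtype=np.float64)   # list -> ndarray
back = arr.tolist()                        # ndarray -> nested list

print(back)               # [[1.0, 2.0], [3.0, 4.0]]
print(type(back[0][0]))   # <class 'float'>
```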

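Formatted printing

When an array is large, printing it directly elides part of it with '...'. What if, when debugging, you need to see every element?

1. np.`set_printoptions`(edgeitems,linewidth)

This API has many parameters; these two are the most commonly used.

edgeitems=n means that in each row of the array the first n and last n items are shown, with an ellipsis in between. Since an ndarray is multi-dimensional, an array at one level can be seen as an element of the level above it, and the edgeitems rule applies at every level.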

# edgeitems=2
[[1. 0. ... 0. 0.]
[0. 1. ... 1. 1.]
...
[1. 1. ... 0. 1.]
[0. 0. ... 1. 0.]]

# edgeitems=3
[[1. 0. 0. ... 0. 0. 0.]
[0. 1. 1. ... 1. 1. 1.]
[0. 1. 0. ... 0. 0. 0.]
...
[1. 1. 1. ... 1. 1. 0.]
[1. 1. 1. ... 0. 0. 1.]
[0. 0. 0. ... 1. 1. 0.]]

linewidth is the width of a line, measured in characters, not in number of elements.

2. ndarray.tolist()

Straightforward; no further explanation needed.
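
To print every element without converting to a list, the threshold option is the relevant one; it is not covered in the text above and is added here as a suggestion. The sketch uses the np.printoptions context manager so that the settings apply only inside the with block.

```python
import sys
import numpy as np

big = np.arange(2000)   # longer than the default summarization threshold

with np.printoptions(edgeitems=2):
    short = str(big)    # summarized: 2 items at each end, '...' in between

with np.printoptions(threshold=sys.maxsize, linewidth=120):
    full = str(big)     # every element is printed, no '...'

print(short)
print(len(full.splitlines()), 'lines in the full output')
```

np.set_printoptions takes the same keyword arguments but changes the settings globally until they are reset.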

Serializing in-memory arrays to files and back

String format, highly readable

The following method is intuitive.
np.ndarray -> list -> json_str (persistent and human-readable) -> list -> np.array(list_obj)
Code sample:

import json
import numpy as np

np_arr = np.array([[1, 2], [7, 8]])
json_str = json.dumps(np_arr.tolist())
arr = json.loads(json_str)
restore_np = np.array(arr)
# True
print((np_arr == restore_np).all())

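.npy files

np.load and np.save are the two main functions for reading and writing array data on disk. By default, arrays are stored in an uncompressed raw binary format in files with the .npy extension.

.npz files

To save several arrays into a single file, use the numpy.savez function.

savez(): the first argument is the file name, and the remaining arguments are the arrays to save.
np.load(npzFile) returns an NpzFile object.
NpzFile.files lists the names of the saved arrays.
NpzFile[arrName] returns the named array.

Code sample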

import numpy as np

# Save a single NumPy array
a = np.arange(5)
np.save('test.npy', a)

# Load a multi-array file in .npz format
d = np.load('C:/Users/yichu.dyc/Downloads/ihdp_npci_1-100.train.npz')  # type: np.lib.npyio.NpzFile
# ['ate', 'mu1', 'mu0', 'yadd', 'yf', 'ycf', 't', 'x', 'ymul']
d.files
# (672, 25, 100)
d['x'].shape
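
The sample above depends on a file that exists only on the author's machine. A self-contained round trip might look like the sketch below; the file names and the keyword names first and second are hypothetical, chosen only for this illustration.

```python
import os
import tempfile
import numpy as np

a = np.arange(5)
b = np.eye(2)

with tempfile.TemporaryDirectory() as tmp:
    # Single array: .npy
    npy_path = os.path.join(tmp, 'a.npy')
    np.save(npy_path, a)
    a_back = np.load(npy_path)

    # Several arrays in one file: .npz (keyword names become the array names)
    npz_path = os.path.join(tmp, 'arrays.npz')
    np.savez(npz_path, first=a, second=b)
    with np.load(npz_path) as data:
        names = sorted(data.files)
        b_back = data['second']

print(a_back, names, b_back.shape)
```

Arrays passed to savez positionally instead of by keyword are stored under the default names arr_0, arr_1, and so on.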