Python Beginner Tips (11): The Scientific Computing Library NumPy

三耳01

Last modified: 2023-01-25 01:30:04


First published: 2021-11-21 13:00:38

Article link: https://blog.csdn.net/niexinyu0026/article/details/121452333


Part 15. NumPy


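NumPy is a high-performance foundation package for scientific computing and data analysis. It provides a multidimensional array object together with linear algebra, Fourier transform, and random number facilities.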

1. Multidimensional arrays

1.1 Reading a txt file with NumPy

import numpy as np

word = np.genfromtxt('word.txt', delimiter='\n', dtype=str)
print(word)
>>> ['23412' 'dafg']

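delimiter='\n' sets the delimiter to the newline character, so each line of the file becomes one element.
That said, pandas makes reading files simpler and is used more often for this.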

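1.2 Creating multidimensional arrays

At the core of NumPy is the ndarray. Its two most common forms are the one-dimensional vector and the two-dimensional matrix.

1.2.1 One dimension: a vector

import numpy as np  # by convention, numpy is imported under the alias np
print(np.array([1, 2, 3]))
>>> [1 2 3]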

1.2.2 Two dimensions: a matrix

print(np.array([[1, 2, 3], [4, 5, 6]]))
>>> [[1 2 3]
	 [4 5 6]]

1.2.3 Matrices of zeros and ones

np.zeros([2, 3]), np.ones([2, 3]), np.ones((2, 3))  # the shape can be given as a list or as a tuple
>>> (array([[0., 0., 0.],
            [0., 0., 0.]]),
     array([[1., 1., 1.],
  	        [1., 1., 1.]]),
     array([[1., 1., 1.],
  	        [1., 1., 1.]]))

a = np.ones([2, 3])
a[1, 2] = 2
a
>>> array([[1., 1., 1.],
     	   [1., 1., 2.]])

1.2.4 arange and linspace

np.arange(2000)
>>> array([   0,    1,    2, ..., 1997, 1998, 1999])

x = np.arange(15).reshape(3,5)
print(x)
>>> [[ 0  1  2  3  4]
	 [ 5  6  7  8  9]
	 [10 11 12 13 14]]

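linspace differs from arange: you give it a start value and an end value, and it returns n evenly spaced numbers between them, with both ends included.

x = np.linspace(0, 10, 11)  # 11 evenly spaced numbers from 0 to 10, both ends included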
print(x)
>>> [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
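
To make the difference concrete, here is a minimal sketch that builds similar sequences both ways. The variable names are only for illustration.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
by_step = np.arange(0, 10, 2)
print(by_step)        # [0 2 4 6 8]

# linspace: you choose the number of points; the stop value is included by default
by_count = np.linspace(0, 10, 6)
print(by_count)       # [ 0.  2.  4.  6.  8. 10.]

# endpoint=False excludes the stop value, matching arange's convention
half_open = np.linspace(0, 10, 5, endpoint=False)
print(half_open)      # [0. 2. 4. 6. 8.]
```

arange is usually the natural choice when the step matters, linspace when the number of points matters.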

1.3 Common attributes of multidimensional arrays

1.3.1 ndim returns the number of dimensions

a = np.array([1, 2])
b = np.array([[1, 2, 3], [4, 5, 6]])
c = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [0, 0, 0]]])
print(a.ndim, b.ndim, c.ndim)  # in effect, the depth of bracket nesting
>>> 1 2 3

1.3.2 shape returns the size along each dimension

print(c)
print(a.shape, b.shape, c.shape)
>>> [[[1 2 3]
      [4 5 6]]

    [[7 8 9]
     [0 0 0]]]
    (2,) (2, 3) (2, 2, 3)

Printing shape is especially useful when debugging.

1.3.3 ndim again, on a reshaped array

x = np.arange(15).reshape(3,5)
print(x)
>>> [[ 0  1  2  3  4]
	 [ 5  6  7  8  9]
	 [10 11 12 13 14]]
print(x.ndim)
>>> 2

1.3.4 size returns the total number of elements

print(a.size, b.size, c.size)
>>> 2 6 12
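
The three attributes are linked: size is the product of the entries of shape, and ndim is the length of shape. A small sketch, using a fresh array rather than the ones above:

```python
import numpy as np

c = np.arange(12).reshape(2, 2, 3)

# size is the product of the entries of shape; ndim is len(shape)
print(c.ndim, c.shape, c.size)   # 3 (2, 2, 3) 12

# -1 asks reshape to work out that dimension from size
rows_of_three = c.reshape(-1, 3)
print(rows_of_three.shape)       # (4, 3)
```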

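1.3.5 dtype returns the data type

d = np.ones((2, 3))
a.dtype, d.dtype  # the default integer type depends on the platform: int32 here, int64 on most 64-bit Linux/macOS builds
>>> (dtype('int32'), dtype('float64'))

d = np.ones((2, 3), dtype=int)
e = np.ones((2, 3), dtype=np.int32)
d.dtype, e.dtype  # two ways to request an integer dtype: plain int maps to the platform default (int32 here), np.int32 is always 32-bit
>>> (dtype('int32'), dtype('int32'))

1.3.6 astype converts the data type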

x = np.array(['1', '2', '3'])
print(x, x.dtype)
>>> ['1' '2' '3'] <U1

x = x.astype(float)
print(x, x.dtype)
>>> [1. 2. 3.] float64
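
Two details are worth knowing about astype: it returns a new array rather than changing the original, and converting floats to integers drops the fractional part. A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5])
as_int = values.astype(int)   # the fractional part is dropped (truncation toward zero)

print(as_int)          # [ 1 -1  2]
print(values.dtype)    # float64, the original array is unchanged
```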

1.3.7 itemsize returns the size of one element in bytes

e.itemsize  # int32: 32 bits / 8 = 4 bytes
>>> 4

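2. Basic operations on multidimensional arrays

2.1 Arithmetic

2.1.1 Addition, subtraction, multiplication, division, remainder, power

a = np.array([4, 5, 6])
b = np.array([1, 2, 3])
a + b, a - b, a * b, a / b, a % b, a ** b
# note: a * b is the element-wise product (values at matching positions are multiplied), not the inner product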

(array([5, 7, 9]),
 array([3, 3, 3]),
 array([ 4, 10, 18]),
 array([4. , 2.5, 2. ]),
 array([0, 1, 0], dtype=int32),
 array([  4,  25, 216], dtype=int32))

2.1.2 Matrix operations

a.dot(b), np.dot(a, b)
>>> (32, 32)

Transpose:

a = np.array([[4, 5, 6], [1, 2, 3]])
print(a)
>>> [[4 5 6]
	 [1 2 3]]

print(a.T)
>>> [[4 1]
	 [5 2]
	 [6 3]]
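
The sketch below puts the element-wise product and the inner product side by side, and uses the transpose to form a matrix product. The @ operator is Python's matrix-multiplication operator; for these inputs it gives the same result as np.dot.

```python
import numpy as np

a = np.array([4, 5, 6])
b = np.array([1, 2, 3])

elementwise = a * b   # multiplies matching positions
inner = a @ b         # inner product, same as np.dot(a, b)
print(elementwise, inner)   # [ 4 10 18] 32

m = np.array([[4, 5, 6], [1, 2, 3]])
product = m @ m.T     # (2, 3) times (3, 2) gives (2, 2)
print(product)
# [[77 32]
#  [32 14]]
```

The inner product is simply the sum of the element-wise product: 4 + 10 + 18 = 32.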

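2.1.3 Operations between an array and a scalar

a = np.array([4, 5, 6])  # back to the 1-D array; a was reassigned to a matrix above
a + 2, a - 2, a * 2, a / 2, a % 2, a ** 2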

(array([6, 7, 8]),
 array([2, 3, 4]),
 array([ 8, 10, 12]),
 array([2. , 2.5, 3. ]),
 array([0, 1, 0], dtype=int32),
 array([16, 25, 36], dtype=int32))

2.2 Operations on a single array

2.2.1 Minimum, maximum, their indices, and the sum

x = np.array([1, 2, 3])
x.min(), x.max(), x.sum()
>>> (1, 3, 6)

Computing by row or by column:

# axis=0 reduces down the rows, giving one result per column; axis=1 gives one result per row
x = np.array([[1, 5, 6], [4, 2, 3]])
print(x.min(), x.min(axis=0), x.min(axis=1))
>>> 1 [1 2 3] [1 2]

print(x.max(), x.max(axis=0), x.max(axis=1))
>>> 6 [4 5 6] [6 4]

print(x.sum(), x.sum(axis=0), x.sum(axis=1))
>>> 21 [5 7 9] [12  9]
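
One way to remember the axis argument: the axis you name is the one that disappears from the result's shape. The sketch below checks this on the same matrix; keepdims=True is a standard option that keeps the reduced axis with length 1.

```python
import numpy as np

x = np.array([[1, 5, 6], [4, 2, 3]])   # shape (2, 3)

col_sums = x.sum(axis=0)                     # axis 0 disappears: shape (3,)
row_sums = x.sum(axis=1)                     # axis 1 disappears: shape (2,)
row_sums_2d = x.sum(axis=1, keepdims=True)   # axis 1 kept with length 1: shape (2, 1)

print(col_sums.shape, row_sums.shape, row_sums_2d.shape)   # (3,) (2,) (2, 1)
```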

Getting the corresponding indices:

x = np.array([[1, 5, 6, 8], [4, 2, 3, 10], [9, 0, 7, 2]])
print(x)
>>> [[ 1  5  6  8]
	 [ 4  2  3 10]
	 [ 9  0  7  2]]

print(x.argmin(), x.argmin(axis=0), x.argmin(axis=1))
>>> 9 [0 2 1 2] [0 1 1]

print(x.argmax(), x.argmax(axis=0), x.argmax(axis=1))
>>> 7 [2 0 2 1] [3 3 0]
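
Without an axis, argmin and argmax return an index into the flattened array (9 and 7 above). If a (row, column) position is needed, np.unravel_index converts it, as in this short sketch:

```python
import numpy as np

x = np.array([[1, 5, 6, 8], [4, 2, 3, 10], [9, 0, 7, 2]])

flat_index = x.argmin()                             # index into the flattened array
row, col = np.unravel_index(flat_index, x.shape)    # convert to (row, column)

print(flat_index, row, col)   # 9 2 1
print(x[row, col])            # 0
```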

2.2.2 Exponential, square root, and square

x = np.array([1, 2, 3])
np.exp(x), np.sqrt(x), np.square(x)

(array([ 2.71828183,  7.3890561 , 20.08553692]),
 array([1.        , 1.41421356, 1.73205081]),
 array([1, 4, 9], dtype=int32))

2.2.3 Testing equality with a value

a = np.array([1,3,4])
print(a == 3)  # [False  True False]; the result has dtype=bool

if_equal = (a == 3)  # [False  True False]
print(a[if_equal])  # [3]
# indexing with a boolean array returns the elements where the mask is True

2.2.4 AND, OR

a = np.array([5,10,15,20])
print((a==10) & (a==5))  # AND
>>> [False False False False]

print((a==10) | (a==5))  # OR
>>> [ True  True False False]

# a boolean array can be used as an index
b = (a==10) | (a==5)
print(a[b])
>>> [ 5 10]

a[b] = 6
print(a)
>>> [ 6  6 15 20]
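
The same pattern works for range conditions. The sketch below is an illustration with a fresh array; it also shows ~ for NOT and np.where, which picks between two values depending on the mask.

```python
import numpy as np

a = np.array([5, 10, 15, 20])

# parentheses are required: & binds more tightly than > and <
in_range = (a > 5) & (a < 20)

print(in_range)                   # [False  True  True False]
print(a[in_range])                # [10 15]
print(a[~in_range])               # [ 5 20]
print(np.where(in_range, a, 0))   # [ 0 10 15  0]
```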

2.2.5 floor rounds down

a = np.array([1.2, 2.3, 3.4])
print(a, np.floor(a))
>>> [1.2 2.3 3.4] [1. 2. 3.]

2.2.6 ravel flattens a matrix into a vector

a = np.arange(1,16,1).reshape(3,5)
print(a)
>>> [[ 1  2  3  4  5]
	 [ 6  7  8  9 10]
	 [11 12 13 14 15]]
print(a.ravel())  # np.ravel(a) works too
>>> [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
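
One caveat: ravel returns a view of the original data whenever the memory layout allows it, while the related flatten method always returns a copy. The sketch below checks this with np.shares_memory.

```python
import numpy as np

a = np.arange(1, 16).reshape(3, 5)

r = a.ravel()      # a view here, because a is contiguous in memory
f = a.flatten()    # always a copy

print(np.shares_memory(a, r), np.shares_memory(a, f))   # True False

r[0] = 100         # writing through the view changes a as well
print(a[0, 0], f[0])   # 100 1
```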

2.2.7 hstack and vstack join arrays of the same shape

a = np.arange(1,7,1).reshape(2,3)
b = np.arange(8,14,1).reshape(2,3)
print(a)
>>> [[1 2 3]
	 [4 5 6]]
print(b)
>>> [[ 8  9 10]
	 [11 12 13]]

Joining side by side (along the columns):

print(np.hstack((a,b)))
>>> [[ 1  2  3  8  9 10]
	 [ 4  5  6 11 12 13]]

Joining top to bottom (along the rows):

print(np.vstack((a,b)))
>>> [[ 1  2  3]
	 [ 4  5  6]
	 [ 8  9 10]
	 [11 12 13]]
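
For 2-D arrays, both functions can also be written with np.concatenate and an explicit axis, which some readers find easier to remember. A short sketch of the equivalence:

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)
b = np.arange(8, 14).reshape(2, 3)

side_by_side = np.concatenate((a, b), axis=1)   # same as np.hstack((a, b)) for 2-D input
stacked = np.concatenate((a, b), axis=0)        # same as np.vstack((a, b)) for 2-D input

print(side_by_side.shape, stacked.shape)        # (2, 6) (4, 3)
```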

2.2.8 hsplit and vsplit split arrays

Splitting by columns:

a = np.arange(1,25,1).reshape(2,12)
print(a)
>>> [[ 1  2  3  4  5  6  7  8  9 10 11 12]
	 [13 14 15 16 17 18 19 20 21 22 23 24]]

Split evenly into 3 parts by columns:

print(np.hsplit(a,3))
>>> [array([[ 1,  2,  3,  4],
	       [13, 14, 15, 16]]), array([[ 5,  6,  7,  8],
	       [17, 18, 19, 20]]), array([[ 9, 10, 11, 12],
	       [21, 22, 23, 24]])]

Cut before column indices 1, 2, and 6:

print(np.hsplit(a, (1,2,6)))
>>> [array([[ 1],
	       [13]]), array([[ 2],
	       [14]]), array([[ 3,  4,  5,  6],
	       [15, 16, 17, 18]]), array([[ 7,  8,  9, 10, 11, 12],
	       [19, 20, 21, 22, 23, 24]])]

Splitting by rows:

b = a.reshape(12,2)
print(b)
>>> [[ 1  2]
	 [ 3  4]
	 [ 5  6]
	 [ 7  8]
	 [ 9 10]
	 [11 12]
	 [13 14]
	 [15 16]
	 [17 18]
	 [19 20]
	 [21 22]
	 [23 24]]

Split evenly into 6 parts by rows:

print(np.vsplit(b, 6))
>>> [array([[1, 2],
	       [3, 4]]), array([[5, 6],
	       [7, 8]]), array([[ 9, 10],
	       [11, 12]]), array([[13, 14],
	       [15, 16]]), array([[17, 18],
	       [19, 20]]), array([[21, 22],
	       [23, 24]])]

Split by rows, cutting before row indices 1 and 6:

print(np.vsplit(b, (1,6)))
>>> [array([[1, 2]]), array([[ 3,  4],
	       [ 5,  6],
	       [ 7,  8],
	       [ 9, 10],
	       [11, 12]]), array([[13, 14],
	       [15, 16],
	       [17, 18],
	       [19, 20],
	       [21, 22],
	       [23, 24]])]
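
An even split only works when the length divides exactly; otherwise NumPy raises a ValueError. When uneven parts are acceptable, np.array_split can be used instead. A minimal sketch on a 1-D array chosen for illustration:

```python
import numpy as np

a = np.arange(10)

# array_split allows parts of unequal length
parts = np.array_split(a, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# split insists on equal parts and raises ValueError otherwise
try:
    np.split(a, 3)
    uneven_allowed = True
except ValueError:
    uneven_allowed = False
print(uneven_allowed)                # False
```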

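2.2.9 tile repeats an array

Repeat the array a given number of times along the rows and along the columns (here 2 times and 3 times):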

a = np.arange(4)
print(np.tile(a, (2,3)))
>>> [[0 1 2 3 0 1 2 3 0 1 2 3]
	 [0 1 2 3 0 1 2 3 0 1 2 3]]

2.2.10 sort

x = np.array([[1, 5, 6, 8], [4, 2, 3, 10], [9, 0, 7, 2]])
print(x)
>>> [[ 1  5  6  8]
	 [ 4  2  3 10]
	 [ 9  0  7  2]]

print(np.sort(x))
>>> [[ 1  5  6  8]
	 [ 2  3  4 10]
	 [ 0  2  7  9]]

print(np.sort(x, axis=0))
>>> [[ 1  0  3  2]
	 [ 4  2  6  8]
	 [ 9  5  7 10]]

print(np.sort(x, axis=1))
>>> [[ 1  5  6  8]
	 [ 2  3  4 10]
	 [ 0  2  7  9]]

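As you can see, np.sort(x) and np.sort(x, axis=1) give the same result. That is expected: sort defaults to axis=-1, the last axis, which is axis 1 for a 2-D array.

Indices of the sorted elements: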

x = np.array([[1, 5, 6, 8], [4, 2, 3, 10], [9, 0, 7, 2]])
print(x)
>>> [[ 1  5  6  8]
	 [ 4  2  3 10]
	 [ 9  0  7  2]]

print(np.argsort(x))
>>> [[0 1 2 3]
	 [1 2 0 3]
	 [1 3 2 0]]
# argsort gives, for each position in np.sort(x), the index that element had in its original row
print(np.sort(x))
>>> [[ 1  5  6  8]
	 [ 2  3  4 10]
	 [ 0  2  7  9]]
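
The indices from argsort can be used to rebuild the sorted array, or to reorder a second array in the same way. The sketch below shows both; the names and scores arrays are made up for illustration.

```python
import numpy as np

x = np.array([[1, 5, 6, 8], [4, 2, 3, 10], [9, 0, 7, 2]])

# applying the argsort indices row by row reproduces np.sort(x)
order = np.argsort(x, axis=1)
rebuilt = np.take_along_axis(x, order, axis=1)
print(np.array_equal(rebuilt, np.sort(x, axis=1)))   # True

# hypothetical example: rank names by score, highest first
names = np.array(['ann', 'bob', 'cy'])
scores = np.array([70, 95, 82])
ranking = names[np.argsort(scores)[::-1]]
print(ranking)   # ['bob' 'cy' 'ann']
```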

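2.3 Random arrays

2.3.1 np.empty

Returns a new array of the given shape and dtype without initializing its entries. The dtype defaults to numpy.float64. The values are not random numbers; they are whatever happened to be in that memory.

a = np.empty((2,3,4))  # uninitialized array; the contents are arbitrary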
print(a, a.dtype)
>>> [[[6.23042070e-307 4.67296746e-307 1.69121096e-306 1.33511562e-306]
	  [1.89146896e-307 1.37961302e-306 1.05699242e-307 8.01097889e-307]
	  [1.78020169e-306 7.56601165e-307 1.02359984e-306 1.29060531e-306]]

	 [[1.24611741e-306 1.11261027e-306 1.78019761e-306 1.33511969e-306]
	  [1.42418172e-306 2.04712906e-306 7.56589622e-307 1.11258277e-307]
	  [8.90111708e-307 2.11389826e-307 1.11260619e-306 9.79107192e-307]]] float64  # leftover memory contents, can differ on each run

print(np.empty((2, 3), dtype=int))
>>> [[16843009 16843009 16843009]
	 [16843009 16843009 16843009]]  # the same on every run here, but this is not guaranteed

print(np.empty((2, 3), dtype=np.int8))
>>> [[1 1 1]
	 [1 1 1]]  # likewise the same on every run here, but not guaranteed

print(np.empty((2, 3), dtype=np.float64))
>>> [[0.50490027 0.05494886 0.78237593]
	 [0.1434046  0.67016263 0.77145421]]  # arbitrary leftover values

print(np.empty((2, 3), dtype=list))
>>> [[None None None]
	 [None None None]]  # with an object dtype, every element is initialized to None
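
Because the contents of np.empty are undefined, the usual pattern is to write every element before reading any. If a known starting value is needed, np.zeros or np.ones is the safer choice. A minimal sketch:

```python
import numpy as np

buf = np.empty((2, 3))   # contents are undefined at this point
buf[:] = 7.0             # write every element before reading any
print(buf)
# [[7. 7. 7.]
#  [7. 7. 7.]]

zeros = np.zeros((2, 3))  # guaranteed to start at 0.0
print(zeros.sum())        # 0.0
```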

2.3.2 np.random.rand

np.random.seed(42)  # seed the random number generator so that every run of the program produces the same numbers

'''Generate random samples'''
np.random.rand(2, 3)  # uniform distribution over [0, 1)
>>> array([[0.37454012, 0.95071431, 0.73199394],
           [0.59865848, 0.15601864, 0.15599452]])
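
The effect of the seed can be checked directly: reseeding with the same value replays the same numbers. The sketch also shows np.random.default_rng, the generator interface that newer NumPy versions recommend; it is not used elsewhere in this article and produces different numbers from np.random.seed for the same seed.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(2, 3)

np.random.seed(42)              # same seed again
second = np.random.rand(2, 3)
print(np.array_equal(first, second))   # True

# newer interface: a generator object with its own state
rng = np.random.default_rng(42)
sample = rng.random((2, 3))     # uniform over [0, 1), like rand
print(sample.shape)             # (2, 3)
```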

2.3.3 np.random.randn

np.random.randn(2, 3)  # standard normal distribution: mean 0, variance 1
>>> array([[ 1.57921282,  0.76743473, -0.46947439],
           [ 0.54256004, -0.46341769, -0.46572975]])

2.3.4 np.random.randint

np.random.randint(1, 60)  # one random integer from 1 up to but not including 60
>>> 25

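2.3.5 np.random.binomial

np.random.binomial(10, 0.6, 100)  # 10 trials per experiment, success probability 0.6 per trial, 100 experiments; each value is a count of successes
# Draws samples from a binomial distribution with n trials and success probability p, where n is an integer >= 0 and p lies in [0, 1].
# (n may be passed as a float, but it is truncated to an integer in use)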

array([8, 4, 7, 3, 7, 7, 5, 5, 6, 6, 5, 7, 2, 3, 5, 7, 5, 8, 9, 7, 6, 4,
       5, 5, 6, 7, 7, 3, 9, 4, 6, 6, 5, 6, 4, 4, 6, 7, 6, 4, 7, 6, 6, 5,
       8, 5, 4, 7, 7, 3, 7, 4, 6, 7, 4, 6, 6, 7, 6, 8, 6, 3, 7, 5, 7, 5,
       8, 3, 6, 7, 8, 7, 6, 7, 5, 6, 7, 6, 5, 8, 7, 9, 5, 5, 6, 4, 5, 6,
       4, 7, 8, 7, 8, 6, 4, 9, 8, 6, 4, 7])
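
As a sanity check on the parameters: each drawn value is a count between 0 and n, and the average over many draws should be close to n * p = 6. A sketch with a larger sample, where the seed value 0 is an arbitrary choice:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, only for repeatability
draws = np.random.binomial(10, 0.6, 100_000)

print(draws.min() >= 0, draws.max() <= 10)   # True True
print(abs(draws.mean() - 10 * 0.6) < 0.1)    # True: the mean is close to n * p
```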

2.3.6 np.random.beta

np.random.beta(1, 10)  # one sample from a beta distribution with a=1, b=10; pass size=... to get an array
>>> 0.03929334089516656

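2.3.7 np.random.normal

np.random.normal(1, 10)  # one sample from a normal (Gaussian) distribution with mean 1 and standard deviation 10; pass size=... to get an array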
>>> -0.3958962815517375

2.4 Indexing, slicing, and iteration

x = np.arange(10)
x, x[:5]
>>> (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4]))

x = np.array([[1,2,3], 
              [4,5,6], 
              [7,8,9]])

x[1], x[0:2, 1:3]  # x[1] is row 1; x[0:2, 1:3] is rows 0 and 1, columns 1 and 2

(array([4, 5, 6]),
 array([[2, 3],
        [5, 6]]))

# iterating over the different dimensions
for i in x:
    print(i)

[1 2 3]
[4 5 6]
[7 8 9]

for i in x:
    for j in i:
        print(j)

# the nested loop above can be flattened
for i in x.flat:
    print(i)
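
A few more indexing forms that come up often, sketched on the same 3x3 matrix. np.ndenumerate is a standard helper that yields each position together with its value, which is useful when the loop needs the index as well.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_col = x[:, 0]                  # every row, column 0
last_row = x[-1]                     # negative indices count from the end
diagonal = x[[0, 1, 2], [0, 1, 2]]   # pairs (0,0), (1,1), (2,2)
print(first_col, last_row, diagonal)   # [1 4 7] [7 8 9] [1 5 9]

# iterate with positions
pairs = [(idx, int(v)) for idx, v in np.ndenumerate(x)]
print(pairs[:2])   # [((0, 0), 1), ((0, 1), 2)]
```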

3. Object identity, shallow copy, deep copy

Assigning with b = a makes b and a two names for the same object; they are identical:

a = np.arange(12)
b = a
print(b is a, b == a)
>>> True [ True  True  True  True  True  True  True  True  True  True  True  True]

b.shape = 3,4
print(a.shape)
>>> (3, 4)

print(id(a) == id(b))
>>> True

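To get a different object, make a shallow copy (a view):

a = np.arange(12)  # start again from a fresh 1-D array, since a was reshaped to (3, 4) above
c = a.view()
print(c is a)
>>> False

c.shape = 2,6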
print(a.shape)
>>> (12,)

print(id(c) == id(a))
>>> False

However, when a value in c is changed, the value in a changes too:

c[0,4] = 99
print(c)
>>> [[ 0  1  2  3 99  5]
	 [ 6  7  8  9 10 11]]

print(a)
>>> [ 0  1  2  3 99  5  6  7  8  9 10 11]

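So a shallow copy shares one set of data with the original. Avoid it when you need an independent array.

For that, use copy() to make a deep copy.
Here d starts out with a's current values and is independent of a afterwards: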

d = a.copy()
print(d is a)
>>> False

d[3] = 8888
print(d)
>>> [   0    1    2 8888   99    5    6    7    8    9   10   11]

print(a)
>>> [ 0  1  2  3 99  5  6  7  8  9 10 11]
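
The three cases can be told apart with `is` (same object?) and np.shares_memory (same data?). The sketch below also includes a slice, since basic slicing returns a view as well; the variable names are for illustration only.

```python
import numpy as np

a = np.arange(12)
alias = a            # same object
view = a.view()      # new object, shared data
sliced = a[:3]       # basic slicing also returns a view
dup = a.copy()       # new object, own data

print(alias is a, view is a, dup is a)   # True False False
print(np.shares_memory(a, view),
      np.shares_memory(a, sliced),
      np.shares_memory(a, dup))          # True True False

sliced[0] = -1       # writes through to a, but not to the copy
print(a[0], dup[0])  # -1 0
```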