c. Deduplication
arr = np.array([21,2,21, 1, 0 ,1])
np.unique(arr, axis=None)
array([ 0, 1, 2, 21])
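Deduplicating whole rows and whole columns:
axis=None (the default) flattens the array and deduplicates individual elements;
axis=0 compares entire rows: rows whose values are all identical are collapsed into one;
axis=1 compares entire columns: columns whose values are all identical are collapsed into one;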
arr2 = np.array([[2,1,1,2],[2,1,1,2],[1,3,2,5]])
np.unique(arr2, axis=None)
array([1, 2, 3, 5])
arr2 = np.array([[2,1,1,2],[2,1,1,2],[1,3,2,5]])
np.unique(arr2, axis=0)
array([[1, 3, 2, 5],
[2, 1, 1, 2]])
print(arr2)
np.unique(arr2, axis=1)
[[2 1 1 2]
[2 1 1 2]
[1 3 2 5]]
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[2, 3, 1, 5]])
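One way to see what each axis setting does is to compare the shapes of the results. The following is a minimal sketch that reuses arr2 from above; the names rows and cols are only illustrative. Note that np.unique also returns its result sorted, so with axis=1 the columns come back reordered even though none of them is removed.

```python
import numpy as np

arr2 = np.array([[2, 1, 1, 2], [2, 1, 1, 2], [1, 3, 2, 5]])

# axis=0: rows are compared as whole units, so the duplicate row is dropped
rows = np.unique(arr2, axis=0)
print(rows.shape)   # (2, 4)

# axis=1: columns are compared as whole units; no two columns are identical
# here, so all four are kept, but they are returned in sorted order
cols = np.unique(arr2, axis=1)
print(cols.shape)   # (3, 4)
```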
(6) File operations
a. Binary files; b. Text files
.npy / .npz: NumPy binary files, read and written through NumPy;
a. Creating a binary file:
arr3 = np.random.random(100).reshape((10,10))
np.save('arr.npy', arr3)
b. Reading a binary file:
np.load('arr.npy')
array([[0.94166009, 0.08707178, 0.24738943, 0.63504731, 0.6243815 ,
0.47913674, 0.16587397, 0.05695113, 0.51010674, 0.90441856],
[0.32740982, 0.86440694, 0.89363497, 0.68414786, 0.17839399,
0.05469072, 0.11002265, 0.4023877 , 0.88905758, 0.72660147],
[0.13412419, 0.64377726, 0.3068447 , 0.74123468, 0.33092074,
0.71307152, 0.67174501, 0.78521112, 0.90050852, 0.50450854],
[0.88072315, 0.54935894, 0.75141671, 0.08125172, 0.22635412,
0.70076483, 0.42758755, 0.17078626, 0.2248808 , 0.44911113],
[0.59483549, 0.65632765, 0.31902911, 0.8290965 , 0.03132135,
0.23629831, 0.12177865, 0.60715191, 0.04982158, 0.36634437],
[0.6571213 , 0.08188189, 0.25241281, 0.89119618, 0.3374529 ,
0.62371096, 0.75715895, 0.85582739, 0.76904704, 0.22617414],
[0.66571791, 0.60909379, 0.83446684, 0.40617087, 0.21977122,
0.08921085, 0.91250163, 0.22541365, 0.87649993, 0.87391666],
[0.76833715, 0.19886483, 0.49752762, 0.11773357, 0.84235592,
0.65317118, 0.36145973, 0.96081981, 0.03722321, 0.72876165],
[0.27106625, 0.72437395, 0.21744038, 0.31800489, 0.92305494,
0.26010492, 0.97360296, 0.98171742, 0.41868223, 0.91821043],
[0.06910287, 0.76526378, 0.88465488, 0.57540593, 0.2064483 ,
0.96813549, 0.51524908, 0.72942177, 0.70780959, 0.6597308 ]])
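The save and load steps above can be checked with a round trip. This is a minimal sketch, assuming a temporary directory is acceptable for the file (the tempfile usage and the name loaded are not part of the original notes); it confirms that the .npy format keeps the shape, dtype, and values intact.

```python
import os
import tempfile

import numpy as np

arr3 = np.random.random(100).reshape((10, 10))

# Write into a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr.npy')
    np.save(path, arr3)
    loaded = np.load(path)

# The binary format preserves shape, dtype, and the exact values
print(loaded.shape, loaded.dtype)
print(np.array_equal(arr3, loaded))
```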
c. Saving and reading multiple arrays:
arr1 = np.random.random(1000).reshape((10, 100))
np.savez('arrz', arr, arr1)
data = np.load('arrz.npz')
print(data)
<numpy.lib.npyio.NpzFile object at 0x0000000012BCFBA8>
# List the arrays stored inside the .npz archive: arr_0.npy, arr_1.npy
print(list(data))
['arr_0', 'arr_1']
print(data['arr_0'])
[21 2 21 1 0 1]
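The default keys arr_0 and arr_1 come from passing the arrays positionally. np.savez also accepts keyword arguments, which become the keys in the archive. The sketch below assumes the hypothetical key names small and big and writes to a temporary directory; neither choice comes from the original notes.

```python
import os
import tempfile

import numpy as np

arr = np.array([21, 2, 21, 1, 0, 1])
arr1 = np.random.random(1000).reshape((10, 100))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arrz.npz')
    # Keyword names (hypothetical here) are used as the keys in the archive
    np.savez(path, small=arr, big=arr1)
    # NpzFile works as a context manager, which closes the file when done
    with np.load(path) as data:
        keys = sorted(data.files)
        small = data['small']
        big_shape = data['big'].shape

print(keys)        # ['big', 'small']
print(small)       # [21  2 21  1  0  1]
print(big_shape)   # (10, 100)
```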
d. Saving and reading .txt files
np.savetxt('arr2', arr2, fmt='%d', delimiter=',' )
arr = np.loadtxt('arr2', delimiter=',',dtype='str')
print(arr)
[['2' '1' '1' '2']
['2' '1' '1' '2']
['1' '3' '2' '5']]
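Loading with dtype='str' returns every field as a string, as the output above shows. If numbers are wanted back, a different dtype can be passed to np.loadtxt. The following sketch assumes a temporary file path and the illustrative names as_int and as_float.

```python
import os
import tempfile

import numpy as np

arr2 = np.array([[2, 1, 1, 2], [2, 1, 1, 2], [1, 3, 2, 5]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr2.txt')
    np.savetxt(path, arr2, fmt='%d', delimiter=',')
    # dtype=int parses each field as an integer instead of a string
    as_int = np.loadtxt(path, delimiter=',', dtype=int)
    # With no dtype given, loadtxt falls back to float
    as_float = np.loadtxt(path, delimiter=',')

print(as_int.dtype.kind)              # i
print(as_float.dtype)                 # float64
print(np.array_equal(as_int, arr2))   # True
```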
(7) Statistical methods
a. Minimum and maximum
arr = np.array([[1, 2, 4, 6], [2, 2, 4, 20], [3, 9, 20, 5]])
print(arr)
[[ 1 2 4 6]
[ 2 2 4 20]
[ 3 9 20 5]]
(1) Minimum
print('Minimum of the whole array:\n', arr.min())
print('Minimum of each row:\n', arr.min(axis=1))
print('Minimum of each column:\n', arr.min(axis=0))
Minimum of the whole array:
1
Minimum of each row:
[1 2 3]
Minimum of each column:
[1 2 4 5]
(2) Maximum
print(arr.max())
print(arr.max(axis=1))
print(arr.max(axis=0))
20
[ 6 20 20]
[ 3 9 20 20]
(3) Index of the maximum
print(arr.argmax())
print(arr.argmax(axis=1))
print(arr.argmax(axis=0))
7
[3 3 2]
[2 2 2 1]
(4) Index of the minimum
print(arr.argmin())
0
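When no axis is given, argmax and argmin return a position in the flattened array, which is why argmax() printed 7 above. A minimal sketch of converting that flat position back to a (row, column) pair with np.unravel_index follows; the names flat_idx, row, and col are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, 4, 6], [2, 2, 4, 20], [3, 9, 20, 5]])

# Without an axis, argmax indexes into the flattened array
flat_idx = arr.argmax()

# Convert the flat position into (row, column) for the array's shape
row, col = np.unravel_index(flat_idx, arr.shape)
print(flat_idx, int(row), int(col))   # 7 1 3
print(arr[row, col])                  # 20
```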
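(5-6) Standard deviation and variance
Variance and standard deviation describe how spread out the data is:
the larger they are, the more dispersed the data;
the smaller they are, the more tightly the data clusters around the mean.
print('Standard deviation:\n', arr.std(axis=1))
Standard deviation:
[1.92028644 7.54983444 6.57171971]
print('Variance:\n', arr.var())
Variance:
40.75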
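To make the relationship between the two measures concrete, the variance can be recomputed by hand and compared with the built-in result. This is a sketch under the assumption that NumPy's default (population variance, ddof=0) is the intended definition; manual_var is an illustrative name.

```python
import numpy as np

arr = np.array([[1, 2, 4, 6], [2, 2, 4, 20], [3, 9, 20, 5]])

# Population variance: the mean of squared deviations from the mean
manual_var = ((arr - arr.mean()) ** 2).mean()
print(manual_var)                                  # 40.75
print(np.isclose(manual_var, arr.var()))           # True

# The standard deviation is the square root of the variance
print(np.isclose(np.sqrt(arr.var()), arr.std()))   # True
```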
(7-8) Sum and mean
print(arr.sum())
print('Sum of each row:\n', arr.sum(axis=1))
print('Sum of each column:\n', arr.sum(axis=0))
78
Sum of each row:
[13 28 37]
Sum of each column:
[ 6 13 28 31]
print('Mean:\n', arr.mean())
Mean:
6.5
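Like sum, mean accepts an axis argument. The sketch below, using the illustrative names row_means and col_means, shows the per-row and per-column means and checks them against the corresponding sums divided by the number of elements along that axis.

```python
import numpy as np

arr = np.array([[1, 2, 4, 6], [2, 2, 4, 20], [3, 9, 20, 5]])

row_means = arr.mean(axis=1)   # one mean per row
col_means = arr.mean(axis=0)   # one mean per column
print(row_means)
print(col_means)

# Each mean equals the matching sum divided by the element count on that axis
print(np.allclose(row_means, arr.sum(axis=1) / arr.shape[1]))   # True
print(np.allclose(col_means, arr.sum(axis=0) / arr.shape[0]))   # True
```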
(9) Cumulative sum
print(arr.cumsum())
[ 1 3 7 13 15 17 21 41 44 53 73 78]
(10) Cumulative product
print(arr)
[[ 1 2 4 6]
[ 2 2 4 20]
[ 3 9 20 5]]
arr_2 = arr.cumprod()
print('Cumulative product:\n', arr_2)
Cumulative product:
[ 1 2 8 48 96 192 768 15360
46080 414720 8294400 41472000]
print(arr_2[-1])
41472000
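Without an axis, cumsum and cumprod run over the flattened array, so the last element equals the overall sum or product. Both methods also take an axis argument. A minimal sketch, with the illustrative names row_cumsum and col_cumprod:

```python
import numpy as np

arr = np.array([[1, 2, 4, 6], [2, 2, 4, 20], [3, 9, 20, 5]])

# Running totals within each row
row_cumsum = arr.cumsum(axis=1)
print(row_cumsum)

# Running products down each column
col_cumprod = arr.cumprod(axis=0)
print(col_cumprod)

# The last element of the flattened cumulative result is the overall total
print(arr.cumsum()[-1] == arr.sum())     # True
print(arr.cumprod()[-1] == arr.prod())   # True
```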