Python Data Analysis and Presentation: NumPy Data I/O and Functions

I am Paul Plus Plus

Article link: https://blog.csdn.net/weixin_40793311/article/details/89530803

Study notes for the MOOC "Python Data Analysis and Presentation" taught by Song Tian of Beijing Institute of Technology.

1. Reading and writing data as CSV files

CSV (Comma-Separated Values) is a common file format.

1.1 Writing a file:

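np.savetxt(fname, X, fmt='%.18e', delimiter=' ')

fname: a filename or file handle; if the filename ends in .gz, the file is saved in compressed gzip format
X: the 1-D or 2-D array to write to the file
fmt: format of each value written, e.g. %d, %.2f, %.18e
delimiter: string separating the columns; the default is a single space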

>>> import numpy as np
>>> a = np.arange(100).reshape((5,20))
>>> np.savetxt('a.csv',a,fmt='%d',delimiter=',')

1.2 Reading a file:

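np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)

fname: a file, filename, or generator to read; files with a .gz or .bz2 extension are decompressed first
dtype: data type of the resulting array, optional (default float)
delimiter: string separating the values; by default any whitespace
unpack: default False, the data is read into a single array; if True, the result is transposed so that each column can be assigned to a separate variable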

>>> b = np.loadtxt('a.csv',dtype=np.int32, delimiter=',')
>>> b
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
        56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
        76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
        96, 97, 98, 99]])
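
The unpack parameter described above is easier to see on a small file. The following is a minimal sketch, not part of the course notes: the file name xy.csv and the temporary directory are arbitrary choices made for illustration.

```python
import os
import tempfile

import numpy as np

# Write a small 3x2 table to a temporary CSV file (the path is arbitrary).
data = np.array([[1, 10], [2, 20], [3, 30]])
path = os.path.join(tempfile.mkdtemp(), "xy.csv")
np.savetxt(path, data, fmt="%d", delimiter=",")

# unpack=True transposes the result, so each column goes to its own variable.
x, y = np.loadtxt(path, dtype=int, delimiter=",", unpack=True)
print(x)  # [1 2 3]
print(y)  # [10 20 30]
```

With unpack=False the same call would return a single array of shape (3, 2).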

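2. Reading and writing multi-dimensional data

This part covers storing and loading arrays of any number of dimensions, which the CSV functions above, limited to 1-D and 2-D arrays, cannot handle.

2.1 Writing a file: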

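a.tofile(fid, sep='', format='%s')

fid: an open file object or a filename string
sep: separator between items; if it is an empty string, the file is written as binary
format: format of each item written (used for text output)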

>>> a = np.arange(100).reshape((5,10,2))
>>> a.tofile('b.txt',sep=',', format='%d')

2.2 Reading a file:

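np.fromfile(file, dtype=float, count=-1, sep='')

file: an open file object or a filename string
dtype: data type of the elements to read
count: number of elements to read; -1 reads the whole file
sep: separator between items; if it is an empty string, the file is read as binary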

>>> a = np.arange(100).reshape((5,10,2))
>>> a.tofile('b.txt',sep=',', format='%d')
>>> c = np.fromfile("b.txt",dtype=int, sep=',')
>>> c
array([ 0,  1,  2, ..., 96, 97, 98, 99])
>>> c = np.fromfile("b.txt",dtype=int, sep=',').reshape(5,10,2)
>>> c
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],
       ...
       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

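As the example shows, reading the file back requires knowing the shape and element type the array had when it was written, so a.tofile() and np.fromfile() have to be used as a pair. The extra information can be stored in a separate metadata file.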
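
One way to keep that extra information is a small JSON file saved next to the data. The sketch below is only an illustration: the file names b.dat and b.json and the two JSON keys are hypothetical choices, not something the course prescribes.

```python
import json
import os
import tempfile

import numpy as np

a = np.arange(100).reshape((5, 10, 2))
folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "b.dat")   # hypothetical data file name
meta_path = os.path.join(folder, "b.json")  # hypothetical metadata file name

# sep="" (the default) writes raw binary, which drops shape and dtype.
a.tofile(data_path)
with open(meta_path, "w") as f:
    json.dump({"shape": list(a.shape), "dtype": a.dtype.str}, f)

# Reading back: recover dtype and shape from the metadata file.
with open(meta_path) as f:
    meta = json.load(f)
c = np.fromfile(data_path, dtype=np.dtype(meta["dtype"])).reshape(meta["shape"])
print(np.array_equal(a, c))  # True
```

Storing a.dtype.str rather than a hard-coded type keeps the round trip correct on platforms where the default integer size differs.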

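2.3 NumPy's convenient file I/O:

np.save(fname, array) or np.savez(fname, array)

fname: filename; np.save writes a single array with the .npy extension, np.savez writes an .npz archive (uncompressed; np.savez_compressed writes a compressed one)
array: the array variable to save

np.load(fname)

fname: filename with the .npy or .npz extension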

>>> a = np.arange(100).reshape(5,10,2)
>>> np.save("a.npy",a)
>>> b = np.load("a.npy")
>>> b
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],
       ...
       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])
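
The example above only covers np.save. As a rough sketch of the .npz case, np.savez can store several arrays in one archive; the array names x and y and the file name arrays.npz are made up for illustration.

```python
import os
import tempfile

import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.linspace(0.0, 1.0, 5)
path = os.path.join(tempfile.mkdtemp(), "arrays.npz")

# Keyword names become the keys of the archive.
np.savez(path, x=x, y=y)

# np.load returns a dict-like object for .npz files; shape and dtype
# are restored automatically, unlike with tofile()/fromfile().
with np.load(path) as archive:
    names = sorted(archive.files)
    x2 = archive["x"]
    y2 = archive["y"]

print(names)  # ['x', 'y']
```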

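3. NumPy random number functions

The random submodule of NumPy is called as np.random.*

Function	Description
rand(d0, d1, ..., dn)	Creates an array of random floats of shape (d0, ..., dn), uniformly distributed over [0, 1)
randn(d0, d1, ..., dn)	Creates an array of random floats of shape (d0, ..., dn) drawn from the standard normal distribution
randint(low[, high, size])	Creates a random integer or an integer array of shape size, with values in [low, high)
seed(s)	Seeds the random number generator; s is the given seed value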

>>> a = np.random.rand(2,3,4)
>>> a
array([[[0.7159914 , 0.67879048, 0.66354717, 0.62109466],
        [0.83696041, 0.55870606, 0.80749695, 0.65830685],
        [0.02791269, 0.41589142, 0.81384098, 0.88096189]],

       [[0.2120255 , 0.23210762, 0.1713055 , 0.46703512],
        [0.41053682, 0.39566548, 0.2020188 , 0.04423246],
        [0.05319782, 0.34943791, 0.22082916, 0.3082998 ]]])
>>> b = np.random.randn(2,3,4)
>>> b
array([[[ 1.91189919,  0.89440312, -0.0065782 ,  0.15076632],
        [ 1.03070899, -0.49241246,  1.27212925, -0.44477393],
        [ 1.0811208 , -0.67856507,  0.5050502 , -0.4688582 ]],

       [[ 0.18175709, -0.63418146, -1.07830838,  0.76980784],
        [-0.63507712,  1.19886136,  0.18953848,  0.04486784],
        [-0.29572939,  0.10893423, -0.1560887 , -1.11338885]]])
>>> c = np.random.randint(100,200,(3,4))
>>> c
array([[178, 161, 136, 132],
       [159, 124, 182, 106],
       [196, 195, 146, 135]])
>>>
>>> np.random.seed(10)
>>> np.random.randint(100,200,(3,4))
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
>>> np.random.seed(10) # the same seed reproduces the same random array
>>> np.random.randint(100,200,(3,4))
array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])
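
The same reproducibility can also be had without touching the global seed. This is a side note rather than part of the course material: the sketch assumes NumPy 1.17 or later, where np.random.default_rng returns an independent generator object.

```python
import numpy as np

# Two independent generators created from the same seed.
rng1 = np.random.default_rng(10)
rng2 = np.random.default_rng(10)

# integers() draws from [low, high), like randint.
first = rng1.integers(100, 200, size=(3, 4))
second = rng2.integers(100, 200, size=(3, 4))
print(np.array_equal(first, second))  # True
```

The values differ from those produced by np.random.seed(10) above, because the generator object uses a different algorithm.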

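Function	Description
shuffle(a)	Randomly permutes array a along its first axis, modifying a in place
permutation(a)	Returns a new array shuffled along the first axis of a, leaving a unchanged
choice(a[, size, replace, p])	Draws elements from the 1-D array a with probabilities p to form a new array of shape size; replace controls whether an element may be drawn more than once, default True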

>>> a = np.random.randint(100, 200, (3,4))
>>> a
array([[139, 136, 129, 186],
       [170, 130, 101, 182],
       [181, 102, 104, 113]])
>>> # shuffle modifies the original array in place
>>> np.random.shuffle(a)
>>> a
array([[181, 102, 104, 113],
       [170, 130, 101, 182],
       [139, 136, 129, 186]])
>>> np.random.shuffle(a)
>>> a
array([[139, 136, 129, 186],
       [170, 130, 101, 182],
       [181, 102, 104, 113]])
>>> # permutation leaves the original array unchanged
>>> np.random.permutation(a)
array([[181, 102, 104, 113],
       [139, 136, 129, 186],
       [170, 130, 101, 182]])
>>> a
array([[139, 136, 129, 186],
       [170, 130, 101, 182],
       [181, 102, 104, 113]])
>>>
>>> b = np.random.randint(100,200,(8))
>>> b
array([100, 127, 160, 194, 186, 160, 188, 110])
>>> np.random.choice(b, (3,2), replace=False)
array([[188, 194],
       [110, 160],
       [127, 186]])
>>> np.random.choice(b, (3,2), p=b/np.sum(b))
array([[194, 188],
       [186, 127],
       [160, 160]])
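
The effect of replace and p can be checked directly. The following is a small sketch with made-up values; the fixed seed is there only to keep the run repeatable.

```python
import numpy as np

np.random.seed(0)
b = np.arange(100, 108)  # eight distinct values

# replace=False: no element can be drawn twice, so all six results differ.
no_repeat = np.random.choice(b, (3, 2), replace=False)
print(len(np.unique(no_repeat)))  # 6

# p must sum to 1; larger values of b are slightly more likely here.
p = b / b.sum()
weighted = np.random.choice(b, (3, 2), p=p)
print(weighted.shape)  # (3, 2)
```

With replace left at its default of True, the same value may appear more than once, as in the last output of the transcript above.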

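Function	Description
uniform(low, high, size)	Creates an array drawn from a uniform distribution; low is the lower bound, high the upper bound, size the shape
normal(loc, scale, size)	Creates an array drawn from a normal distribution; loc is the mean, scale the standard deviation, size the shape
poisson(lam, size)	Creates an array drawn from a Poisson distribution; lam is the expected rate of events, size the shape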

>>> u = np.random.uniform(100,200,(3,4))
>>> u
array([[146.02845602, 122.78942046, 112.13939091, 185.72304815],
       [199.04671529, 178.95682172, 110.28043003, 117.57380544],
       [124.82472279, 100.0461196 , 150.4064258 , 156.10574794]])
>>> n = np.random.normal(10,5,(3,4))
>>> n
array([[10.80331365,  4.24328615,  5.65297955,  9.0247167 ],
       [17.22963437,  6.26369092,  4.06187665, 10.80390977],
       [ 4.38306056,  7.04111773, 10.66619143,  3.64565378]])
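
The notes show uniform and normal but not poisson. A minimal sketch follows, with lam=4.0 and the sample sizes chosen arbitrarily for illustration.

```python
import numpy as np

np.random.seed(0)

# Each entry is a non-negative integer count with expected value lam.
p = np.random.poisson(4.0, (3, 4))
print(p.shape)  # (3, 4)

# With many samples the sample mean should land close to lam.
big = np.random.poisson(4.0, 100000)
print(big.mean())
```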

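4. NumPy statistical functions

Function	Description
sum(a, axis=None)	Sums the elements of array a along the given axis; the default axis=None sums all elements
mean(a, axis=None)	Computes the arithmetic mean of the elements of a along the given axis
average(a, axis=None, weights=None)	Computes the weighted average of the elements of a along the given axis
std(a, axis=None)	Computes the standard deviation of the elements of a along the given axis
var(a, axis=None)	Computes the variance of the elements of a along the given axis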

>>> a = np.arange(15).reshape(3,5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> np.sum(a)
105
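>>> np.mean(a, axis=1)  # note: axis=1 computes along the second dimension, giving one mean per row
array([ 2.,  7., 12.])
>>> # axis=0 weights the rows, e.g. for the third column: 4.1875 = (2*10 + 7*5 + 12*1) / (10 + 5 + 1)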
>>> np.average(a,axis=0,weights=[10,5,1])
array([2.1875, 3.1875, 4.1875, 5.1875, 6.1875])
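
To make the weighted average concrete, the same result can be rebuilt by hand. This sketch assumes nothing beyond the array and weights used in the transcript above.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Multiply each row by its weight, sum down the rows, divide by total weight.
manual = (a * w[:, np.newaxis]).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

print(manual)                        # [2.1875 3.1875 4.1875 5.1875 6.1875]
print(np.allclose(manual, builtin))  # True
```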

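Function	Description
min(a) max(a)	Computes the minimum and maximum of the elements of a
argmin(a) argmax(a)	Computes the index of the minimum and maximum element in the flattened (1-D) array
unravel_index(index, shape)	Converts the flat index into a multi-dimensional index for the given shape
ptp(a)	Computes the difference between the maximum and minimum of the elements of a
median(a)	Computes the median of the elements of a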

>>> b = np.arange(15).reshape(3,5)
>>> b
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> np.max(b)
14
>>> np.argmax(b) # index into the flattened array
14
>>> np.unravel_index(np.argmax(b),b.shape) # convert to a multi-dimensional index
(2, 4)
>>> np.median(b)
7.0
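
ptp and argmin from the table are not exercised above. A short sketch on the same array b, written with the function form np.ptp:

```python
import numpy as np

b = np.arange(15).reshape(3, 5)

# ptp is "peak to peak": maximum minus minimum.
print(np.ptp(b))          # 14
print(np.ptp(b, axis=1))  # [4 4 4]

# argmin gives a flat index; unravel_index maps it back to (row, column).
flat = np.argmin(b)
row, col = (int(i) for i in np.unravel_index(flat, b.shape))
print(flat, row, col)     # 0 0 0
```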

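5. NumPy gradient function

Gradient: the rate of change between consecutive values, i.e. the slope.

Function	Description
np.gradient(f)	Computes the gradient of the elements of array f; when f is multi-dimensional, returns one gradient array per dimension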

>>> a = np.random.randint(0,50,(3,5))
>>> a
array([[ 9, 44, 43,  1,  6],
       [29, 32, 22, 29, 39],
       [38,  3, 49, 44, 20]])
>>> np.gradient(a)
[array([[ 20. , -12. , -21. ,  28. ,  33. ],   # gradient along the first dimension (axis 0)
       [ 14.5, -20.5,   3. ,  21.5,   7. ],
       [  9. , -29. ,  27. ,  15. , -19. ]]),
 array([[ 35. ,  17. , -21.5, -18.5,   5. ],   # gradient along the second dimension (axis 1)
       [  3. ,  -3.5,  -1.5,   8.5,  10. ],
       [-35. ,   5.5,  20.5, -14.5, -24. ]])]
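
The numbers above follow a simple rule under np.gradient's default settings (unit spacing): interior points use half the difference between their two neighbours, and the two end points use the difference to their single neighbour. A sketch that rebuilds the first row of the second result by hand:

```python
import numpy as np

f = np.array([9.0, 44.0, 43.0, 1.0, 6.0])  # first row of a in the transcript

manual = np.empty_like(f)
manual[0] = f[1] - f[0]                # one-sided difference at the left edge
manual[-1] = f[-1] - f[-2]             # one-sided difference at the right edge
manual[1:-1] = (f[2:] - f[:-2]) / 2.0  # central difference in the interior

print(manual)                               # [ 35.   17.  -21.5 -18.5   5. ]
print(np.allclose(manual, np.gradient(f)))  # True
```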
