Compare the running times of the two code snippets below.
Code 1:
import numpy as np
import time
# 1,000,000 values
n_samples=1000000
# Write random floats to a local TXT file as comma-separated strings
with open('fdata.txt', 'w') as fdata:
    for _ in range(n_samples):
        fdata.write(str(10*np.random.random())+',')
# Read the TXT file back, convert it to a 1000x1000 ndarray, and time it
t1=time.time()
with open('fdata.txt','r') as fdata:
    datastr=fdata.read()
lst = datastr.split(',')
lst.pop()  # drop the empty string left after the trailing comma
array_lst=np.array(lst,dtype=float).reshape(1000,1000)
t2=time.time()
print(array_lst)
print('\nShape: ',array_lst.shape)
print(f"Time took to read: {t2-t1} seconds.")
# Output:
>> [[0.32614787 6.84798256 2.59321025 ... 5.02387324 1.04806225 2.80646522]
[0.42535168 3.77882315 0.91426996 ... 8.43664343 5.50435042 1.17847223]
[1.79458482 5.82172793 5.29433626 ... 3.10556071 2.90960252 7.8021901 ]
...
[3.04453929 1.0270109 8.04185826 ... 2.21814825 3.56490017 3.72934854]
[7.11767505 7.59239626 5.60733328 ... 8.33572855 3.29231441 8.67716649]
[4.2606672 0.08492747 1.40436949 ... 5.6204355 4.47407948 9.50940101]]
>> Shape: (1000, 1000)
>> Time took to read: 1.018733024597168 seconds.
Code 2:
import numpy as np
import time
# Save the array built in Code 1 in NumPy's binary .npy format
np.save('fnumpy.npy', array_lst)
t1=time.time()
array_reloaded = np.load('fnumpy.npy')
t2=time.time()
print(array_reloaded)
print('\nShape: ',array_reloaded.shape)
print(f"Time took to load: {t2-t1} seconds.")
# Output:
>> [[0.32614787 6.84798256 2.59321025 ... 5.02387324 1.04806225 2.80646522]
[0.42535168 3.77882315 0.91426996 ... 8.43664343 5.50435042 1.17847223]
[1.79458482 5.82172793 5.29433626 ... 3.10556071 2.90960252 7.8021901 ]
...
[3.04453929 1.0270109 8.04185826 ... 2.21814825 3.56490017 3.72934854]
[7.11767505 7.59239626 5.60733328 ... 8.33572855 3.29231441 8.67716649]
[4.2606672 0.08492747 1.40436949 ... 5.6204355 4.47407948 9.50940101]]
>> Shape: (1000, 1000)
>> Time took to load: 0.009010076522827148 seconds.
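The printed arrays above look identical, but the printout is truncated. A minimal sketch of a full round-trip check follows. It assumes a small stand-in array and a hypothetical file name (`roundtrip.npy`) in a temporary directory, not the files used above.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the 1000x1000 array used above (hypothetical data).
rng = np.random.default_rng(0)
original = 10 * rng.random((4, 5))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "roundtrip.npy")  # hypothetical file name
    np.save(path, original)
    reloaded = np.load(path)

# The .npy file stores the binary values together with shape and dtype,
# so the reloaded array should match the original exactly.
same_values = np.array_equal(original, reloaded)
print(same_values, reloaded.shape, reloaded.dtype)
```

Because no text conversion is involved, the comparison is exact rather than approximate.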
The array can also be read back in a different shape:
t1=time.time()
array_reloaded = np.load('fnumpy.npy').reshape(10000,100)
t2=time.time()
print(array_reloaded)
print('\nShape: ',array_reloaded.shape)
print(f"Time took to load: {t2-t1} seconds.")
# Output:
>> [[0.32614787 6.84798256 2.59321025 ... 3.01180325 2.39479796 0.72345778]
[3.69505384 4.53401889 8.36879084 ... 9.9009631 7.33501957 2.50186053]
[4.35664074 4.07578682 1.71320519 ... 8.33236349 7.2902005 5.27535724]
...
[1.11051629 5.43382324 3.86440843 ... 4.38217095 0.23810232 1.27995629]
[2.56255361 7.8052843 6.67015391 ... 3.02916997 4.76569949 0.95855667]
[6.06043577 5.8964256 4.57181929 ... 5.6204355 4.47407948 9.50940101]]
>> Shape: (10000, 100)
>> Time took to load: 0.010006189346313477 seconds.
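Reshaping only works when the new shape holds the same number of elements. The sketch below illustrates this on a hypothetical 12-element array standing in for the loaded data, and also shows `-1`, which lets NumPy infer one dimension.

```python
import numpy as np

# Stand-in for the loaded array (hypothetical small example: 12 values).
arr = np.arange(12.0).reshape(3, 4)

# Any shape works as long as the total element count stays the same.
tall = arr.reshape(6, 2)

# -1 asks NumPy to infer that dimension from the element count.
inferred = arr.reshape(-1, 3)

print(tall.shape)
print(inferred.shape)

# A shape with a different element count raises ValueError.
mismatch_error = None
try:
    arr.reshape(5, 5)
except ValueError as exc:
    mismatch_error = str(exc)
    print("reshape failed:", mismatch_error)
```

The element order is unchanged by the reshape; only the way the values are grouped into rows and columns differs.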
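Summary: Reading 1000*1000 values directly from a TXT file (or another text format) took about 1 s, because every value has to be parsed from a string. After converting the data to .npy, loading took only about 0.01 s, roughly 100 times faster. This is a useful trick when the same data file has to be read many times, and, as the figure above shows, the larger the dataset, the more pronounced the speedup.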
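To check how the gap behaves on your own machine, the comparison can be wrapped in a small helper and run at several sizes. This is a rough sketch, not a rigorous benchmark: the function name `time_text_vs_npy`, the file names, and the sizes are all assumptions chosen for illustration, and single-run timings are noisy.

```python
import os
import tempfile
import time

import numpy as np


def time_text_vs_npy(n_values, workdir):
    """Return (text_seconds, npy_seconds) for reading n_values floats."""
    rng = np.random.default_rng(0)
    data = 10 * rng.random(n_values)

    txt_path = os.path.join(workdir, "data.txt")  # hypothetical file names
    npy_path = os.path.join(workdir, "data.npy")

    # Write the same values in both formats.
    with open(txt_path, "w") as f:
        f.write(",".join(repr(float(x)) for x in data))
    np.save(npy_path, data)

    # Text path: read the file, split it, and parse every string to a float.
    t0 = time.perf_counter()
    with open(txt_path) as f:
        from_text = np.array(f.read().split(","), dtype=float)
    text_seconds = time.perf_counter() - t0

    # Binary path: load the array directly.
    t0 = time.perf_counter()
    from_npy = np.load(npy_path)
    npy_seconds = time.perf_counter() - t0

    # Both paths should recover the same values.
    assert np.array_equal(from_text, from_npy)
    return text_seconds, npy_seconds


with tempfile.TemporaryDirectory() as workdir:
    for n in (10_000, 100_000):
        text_s, npy_s = time_text_vs_npy(n, workdir)
        print(f"n={n}: text {text_s:.4f}s, npy {npy_s:.4f}s")
```

`time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short intervals. Repeating each measurement and taking the best run would give steadier numbers.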