I ran a comparison against CSV.
import time
import pandas as pd
print('Testing CSV vs. HDF5 read/write speed.')
print('1. Testing read_csv speed')
t0 = time.time()
df = pd.read_csv('~/stock-info/caibao.csv', header=None, encoding='gb18030')
t1 = time.time()
t = t1 - t0
print('read_csv time:', t)
print('2. Testing to_csv speed')
t0 = time.time()
df.to_csv('caibao.csv')
t1 = time.time()
t = t1 - t0
print('to_csv time:', t)
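# Text columns must be plain str before writing to HDF5; otherwise the write warns and the read fails.
# .str.decode() only applies to bytes values, so the object columns are cast with astype(str).
for s in df.select_dtypes(include='object').columns:
    df[s] = df[s].astype(str)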
print('3. Testing HDFStore write speed')
t0 = time.time()
store = pd.HDFStore('test/caibao.h5')
store['df'] = df
store.close()  # flush and close the file
t1 = time.time()
t = t1 - t0
print('HDFStore write time:', t)
print('4. Testing HDFStore read speed')
t0 = time.time()
store = pd.HDFStore('test/caibao.h5')
df = store['df']
store.close()
t1 = time.time()
t = t1 - t0
df.head()
print('HDFStore read time:', t)
print('5. Testing to_hdf speed (fixed format)')
t0 = time.time()
df.to_hdf('test/caibao2.h5', key='df', format='fixed', mode='w')
t1 = time.time()
t = t1 - t0
print('fixed-format to_hdf time:', t)
print('6. Testing read_hdf speed (fixed format)')
t0 = time.time()
df = pd.read_hdf('test/caibao2.h5', key='df')
t1 = time.time()
t = t1 - t0
df.head()
print('fixed-format read_hdf time:', t)
print('7. Testing to_hdf speed (table format)')
t0 = time.time()
df.to_hdf('test/caibao3.h5', key='df', format='table', mode='w')
t1 = time.time()
t = t1 - t0
print('table-format to_hdf time:', t)
print('8. Testing read_hdf speed (table format)')
t0 = time.time()
df = pd.read_hdf('test/caibao3.h5', key='df')
t1 = time.time()
t = t1 - t0
df.head()
print('table-format read_hdf time:', t)
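The t0/t1 bookkeeping repeated in every step above could be folded into a small helper. This is only a sketch and not part of the script I actually ran: the names `timed` and `results` are made up for illustration, and it uses `time.perf_counter()`, a monotonic clock meant for measuring short intervals, where the script uses `time.time()`.

```python
import time
from contextlib import contextmanager

# Hypothetical helper (not in the original script): times the body of a
# `with` block and stores the elapsed seconds under `label`.
results = {}

@contextmanager
def timed(label):
    t0 = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - t0
        results[label] = elapsed
        print(f'{label} time: {elapsed:.6f}')

# Stand-in workload; in the benchmark this would be e.g. pd.read_csv(...).
with timed('sum'):
    total = sum(range(100_000))
```

Each benchmark step would then shrink to a single `with timed('read_csv'):` block, and `results` holds all timings for later comparison.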
Test results
Testing CSV vs. HDF5 read/write speed.
1. Testing read_csv speed
read_csv time: 0.20053529739379883
2. Testing to_csv speed
/home/zhangyl/anaconda3/envs/zz-tensorflow-gpu/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3146: DtypeWarning: Columns (0,7,23,30,33,34) have mixed types.Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
to_csv time: 0.19930219650268555
3. Testing HDFStore write speed
HDFStore write time: 0.08034491539001465
4. Testing HDFStore read speed
HDFStore read time: 0.008101701736450195
5. Testing to_hdf speed (fixed format)
fixed-format to_hdf time: 0.023907184600830078
6. Testing read_hdf speed (fixed format)
fixed-format read_hdf time: 0.008603096008300781
7. Testing to_hdf speed (table format)
table-format to_hdf time: 0.043287038803100586
8. Testing read_hdf speed (table format)
table-format read_hdf time: 0.00782632827758789
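To put these numbers side by side, here is a small sketch that divides each CSV timing by the matching HDF5 timing. The values are copied from the output above; the dictionary keys and the `speedup` helper are names made up for this illustration. These are single-run timings on one file, so the ratios are rough indications only.

```python
# Timings in seconds, copied from the results above.
timings = {
    'read_csv': 0.20053529739379883,
    'to_csv': 0.19930219650268555,
    'HDFStore write': 0.08034491539001465,
    'HDFStore read': 0.008101701736450195,
    'to_hdf (fixed)': 0.023907184600830078,
    'read_hdf (fixed)': 0.008603096008300781,
    'to_hdf (table)': 0.043287038803100586,
    'read_hdf (table)': 0.00782632827758789,
}

# Each HDF5 step paired with the CSV step of the same direction.
pairs = [
    ('HDFStore write', 'to_csv'),
    ('to_hdf (fixed)', 'to_csv'),
    ('to_hdf (table)', 'to_csv'),
    ('HDFStore read', 'read_csv'),
    ('read_hdf (fixed)', 'read_csv'),
    ('read_hdf (table)', 'read_csv'),
]

def speedup(hdf_label, csv_label):
    """How many times faster the HDF5 step ran than the CSV step."""
    return timings[csv_label] / timings[hdf_label]

for hdf_label, csv_label in pairs:
    print(f'{hdf_label}: {speedup(hdf_label, csv_label):.1f}x vs {csv_label}')
```

In this run the HDF5 reads come out at a bit over 20 times faster than `read_csv`, while the writes are between roughly 2 and 9 times faster than `to_csv`.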
I don't know what the warning that shows up in the middle, at the to_csv step, means. Could someone more experienced explain it?
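A tentative reading of it: `DtypeWarning` is raised by `read_csv`, not by `to_csv`, so its position in the output is most likely just the warning stream being printed late. It says that the listed columns contained values of more than one type while the file was parsed in chunks, and the message itself names two remedies: pass `dtype`, or set `low_memory=False`. Below is a minimal sketch of the first remedy on made-up inline data (the real `caibao.csv` is not reproduced here), reading every column as text.

```python
import io
import pandas as pd

# Made-up sample: column 0 mixes numbers and text, like the columns
# named in the warning.
raw = "1,a\nabc,b\n3,c\n"

# dtype=str reads every column as text, so the parser does not have to
# guess a type per chunk.
df = pd.read_csv(io.StringIO(raw), header=None, dtype=str)
print(df[0].tolist())
```

For the real file this would presumably become `pd.read_csv('~/stock-info/caibao.csv', header=None, encoding='gb18030', dtype=str)`, or `dtype` could be given as a dict covering only the six columns listed in the warning.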