Reading the CSV with pandas:

import time
import pandas as pd
# file_path (the CSV path) and test (a pymongo collection) are defined elsewhere.

def pd_read_csv():
    start_time = time.time()
    data = pd.read_csv(file_path)
    keys = data.keys()
    result = []
    for value in data.values:
        result.append({k: v for k, v in zip(keys, value)})
    test.insert_many(result)
    end_time = time.time()
    print('pd.read_csv took:', end_time - start_time)
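One likely speedup for the pandas version: `DataFrame.to_dict('records')` builds the same list of row dicts as the manual `zip()` loop, but inside pandas. A minimal sketch, using a tiny in-memory CSV in place of the real `file_path` (the data is made up):

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the real file_path (hypothetical data).
csv_text = "name,age\nalice,30\nbob,25\n"

data = pd.read_csv(io.StringIO(csv_text))
# to_dict('records') produces the list of row dicts directly,
# replacing the manual zip() loop over data.values.
result = data.to_dict("records")
print(result)
```

The resulting `result` can be passed straight to `insert_many`.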
Reading the CSV with csv.DictReader:

import csv
# file_path and test are defined elsewhere, as above.

def csv_read_csv():
    start_time = time.time()
    with open(file_path) as f:  # context manager so the file handle is closed
        datas = csv.DictReader(f)
        result = [data for data in datas]
    test.insert_many(result)
    end_time = time.time()
    print('csv.DictReader took:', end_time - start_time)
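The `DictReader` version still accumulates every row in `result` before inserting, which is what blows up memory on large files. Since `DictReader` is already an iterator, the rows can be inserted in fixed-size batches instead. A sketch with a made-up in-memory CSV and a tiny batch size:

```python
import csv
import io
from itertools import islice

# In-memory CSV standing in for open(file_path); the data is made up.
csv_text = "name,age\nalice,30\nbob,25\ncarol,41\n"

BATCH = 2  # a real run might use a few thousand rows per batch

reader = csv.DictReader(io.StringIO(csv_text))
batch_sizes = []
while True:
    batch = list(islice(reader, BATCH))
    if not batch:
        break
    # Real code would call test.insert_many(batch) here,
    # so at most BATCH rows are held in memory at once.
    batch_sizes.append(len(batch))
print(batch_sizes)
```

Memory use is then bounded by the batch size rather than the file size.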
Because the CSV files are large (over 1 GB each) and reading a whole file at once uses too much memory, I had to split each CSV into smaller files before reading it.
I tried three CSV files of different sizes: 175 MB, 150 MB, and 9 MB. pd.read_csv took 43 s, 24 s, and 1 s respectively,
while csv.DictReader took 33 s, 15 s, and 0.6 s.
So for now csv.DictReader looks like the more efficient of the two.
I'm still a beginner and this is the best I can do so far. If any expert passes by, please share a faster way to write this!
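One suggestion that avoids splitting the files by hand: `pd.read_csv` accepts a `chunksize` parameter, which turns it into an iterator of DataFrames, so a 1 GB file never has to fit in memory at once. A sketch with a made-up in-memory CSV standing in for the real file:

```python
import io
import pandas as pd

# In-memory CSV standing in for the real 1 GB file (hypothetical data).
csv_text = "name,age\nalice,30\nbob,25\ncarol,41\n"

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so there is no need to split the CSV into smaller files beforehand.
chunk_sizes = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    rows = chunk.to_dict("records")
    # Real code would call test.insert_many(rows) here.
    chunk_sizes.append(len(rows))
print(chunk_sizes)
```

Each chunk is converted and inserted independently, so peak memory is bounded by `chunksize` rather than by the file size.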