Processing CSV data with Python: how to process CSV data with Python

weixin_39567222


Answered 2017-03-07

Python handles JSON conveniently, so first let's test which is faster to load, CSV or JSON.

First, generate the test data:

    import csv
    import json
    import random
    from string import ascii_letters

    low = 10**2    # keys are integers with 3 to 11 digits
    hi = 10**11
    cnt = 100000   # 100,000 entries

    total = {}
    for _ in range(cnt):
        total[str(random.randrange(low, hi))] = "".join(random.sample(ascii_letters, 10))

    with open("data.json", "w") as f:
        f.write(json.dumps(total, ensure_ascii=False))

    with open("data.csv", "w", newline="") as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(total.items())

Then compare how quickly a dict can be built from each of the two files:

    import csv
    import json
    from time import perf_counter

    t0 = perf_counter()
    with open("data.json") as f:
        total1 = json.load(f)
    t1 = perf_counter()

    total2 = {}
    with open("data.csv", newline="") as f:
        reader = csv.reader(f)
        for k, v in reader:
            total2[k] = v
    t2 = perf_counter()

    print("json: %fs" % (t1 - t0))
    print("csv: %fs" % (t2 - t1))

The output was:

json: 0.109953s

csv: 0.066411s

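As expected, CSV is quite fast, so let's use it.

Next comes the update problem. I don't know how the asker wants duplicate keys handled, so I have written both options.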
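A single timing run can be noisy. Below is a minimal sketch of repeating the measurement with the standard `timeit` module, assuming a smaller data set written to a temporary directory. The helper names `make_data`, `load_json`, `load_csv`, and `best_of` are hypothetical, and the numbers it prints will differ from the ones quoted above.

```python
import csv
import json
import os
import random
import tempfile
import timeit
from string import ascii_letters


def make_data(cnt, seed=0):
    """Build a dict mapping random numeric-string keys to 10-letter values."""
    rng = random.Random(seed)
    return {
        str(rng.randrange(10**2, 10**11)): "".join(rng.sample(ascii_letters, 10))
        for _ in range(cnt)
    }


def load_json(path):
    with open(path) as f:
        return json.load(f)


def load_csv(path):
    # Each CSV row is a [key, value] pair, so dict() can consume the reader.
    with open(path, newline="") as f:
        return dict(csv.reader(f))


def best_of(func, path, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    return min(timeit.repeat(lambda: func(path), number=1, repeat=repeat))


# Assumption: 10,000 entries in a temp directory, not the 100,000 used above.
data = make_data(10000)
tmpdir = tempfile.mkdtemp()
json_path = os.path.join(tmpdir, "data.json")
csv_path = os.path.join(tmpdir, "data.csv")

with open(json_path, "w") as f:
    json.dump(data, f)
with open(csv_path, "w", newline="") as f:
    csv.writer(f).writerows(data.items())

print("json: %fs" % best_of(load_json, json_path))
print("csv: %fs" % best_of(load_csv, csv_path))
```

Taking the minimum over several runs reduces the influence of disk caching and other processes on the comparison.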

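    import csv
    import json
    import random
    from string import ascii_letters

    # Generate the new data the same way as before.
    # `total` is the dict built by the generation script above.
    low = 10**2
    hi = 10**11
    cnt = 100000
    new = {}
    for _ in range(cnt):
        new[str(random.randrange(low, hi))] = "".join(random.sample(ascii_letters, 10))

    # Find the duplicates. The data is random, so there happen to be none.
    duplicate = {k: v for k, v in new.items() if k in total}

    # Print the duplicates.
    print(json.dumps(duplicate, ensure_ascii=False, indent=4))

    # Use ONE of the two options below.
    # 1. Duplicates are overwritten by the values in new:
    total.update(new)

    # 2. Duplicates keep the values already in total:
    # new.update(total)
    # total = new

    # Then write the result back to the CSV file.
    with open("data.csv", "w", newline="") as f:
        writer = csv.writer(f, delimiter=',')
        writer.writerows(total.items())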

As for the running time: excluding the time spent printing the duplicates, it takes under 0.5 s. Including it, about 0.8 s.
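The two duplicate-handling policies can also be wrapped in one function that leaves its inputs unchanged. This is only a sketch on tiny hand-made dicts: the function name `merge` and its `keep_existing` flag are hypothetical, and the result is written to an in-memory buffer rather than to `data.csv`.

```python
import csv
import io


def merge(total, new, keep_existing=False):
    """Return a merged dict plus the sorted keys present in both inputs.

    keep_existing=False: values from `new` overwrite those in `total`.
    keep_existing=True:  values already in `total` are kept.
    """
    duplicates = sorted(k for k in new if k in total)
    if keep_existing:
        merged = {**new, **total}   # later dict wins, so total's values survive
    else:
        merged = {**total, **new}   # later dict wins, so new's values survive
    return merged, duplicates


total = {"100": "aaa", "200": "bbb"}
new = {"200": "BBB", "300": "ccc"}

overwritten, dups = merge(total, new)
kept, _ = merge(total, new, keep_existing=True)

print(dups)
print(overwritten["200"])
print(kept["200"])

# Write the merged result in CSV form (to memory here, instead of data.csv).
buf = io.StringIO()
csv.writer(buf).writerows(overwritten.items())
```

Building a new dict instead of calling `update` in place means both policies can be tried on the same inputs without one affecting the other.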
