Python: compressing a series of JSON objects while maintaining serial reading?

I have a bunch of JSON objects that I need to compress because they're eating too much disk space: approximately 20 GB for a few million of them.

Ideally, I'd like to compress each object individually and then, when I need to read them, just iteratively load and decompress each one. I tried doing this by creating a text file with each line being a zlib-compressed JSON object, but reading it back fails with a decompress error about a truncated stream, which I believe is because the compressed byte strings can themselves contain newline bytes.
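
That suspicion is easy to check: zlib output is raw binary, so nothing prevents it from containing the newline byte 0x0A. A quick sketch over arbitrary made-up payloads:

import zlib

# compress a batch of sample payloads; whether any one blob contains
# b'\n' depends on the data, but across many records it is very likely
blobs = [zlib.compress(('{"id": %d, "value": "some text"}' % i).encode('utf-8'))
         for i in range(1000)]
print(any(b'\n' in blob for blob in blobs))  # almost certainly True

Splitting such a file on newlines therefore hands the decompressor a partial stream, which is exactly the truncation error seen.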

Anyone know of a good method to do this?

Solution

Just use a gzip.GzipFile() object and treat it like a regular file: write JSON objects line by line, and read them back line by line.

The object takes care of compression transparently and buffers reads, decompressing chunks as needed.

import gzip
import json

# writing: one JSON document per line ("JSON Lines" layout)
with gzip.GzipFile(jsonfilename, 'w') as outfile:
    for obj in objects:
        # GzipFile in binary mode expects bytes, so encode the text
        outfile.write((json.dumps(obj) + '\n').encode('utf-8'))

# reading: iterating the file yields one decompressed line at a time
with gzip.GzipFile(jsonfilename, 'r') as infile:
    for line in infile:
        obj = json.loads(line)
        # process obj

This has the added advantage that the compression algorithm can make use of repetition across objects for better compression ratios.
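
If you do want each object compressed individually, as originally attempted, the newline problem can be avoided by length-prefixing each zlib record instead of splitting on lines. This is a minimal sketch, not part of the answer above; filename and objects are placeholder names:

import json
import struct
import zlib

# writing: each record is a 4-byte big-endian length header followed by
# the zlib-compressed JSON payload, so embedded 0x0A bytes are harmless
with open(filename, 'wb') as outfile:
    for obj in objects:
        blob = zlib.compress(json.dumps(obj).encode('utf-8'))
        outfile.write(struct.pack('>I', len(blob)))
        outfile.write(blob)

# reading: consume the length header, then exactly that many payload bytes
with open(filename, 'rb') as infile:
    while True:
        header = infile.read(4)
        if not header:
            break  # clean end of file
        (size,) = struct.unpack('>I', header)
        obj = json.loads(zlib.decompress(infile.read(size)))
        # process obj

The trade-off is that compressing each object separately forfeits the cross-object redundancy mentioned above, so the gzip-stream approach will usually produce a noticeably smaller file.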
