Python – how to write a JSON file incrementally

I know this is over a year late, but the issue still stands, and I'm surprised that json.iterencode() hasn't been mentioned.

The potential problem with iterencode() in this example is that you would want to process a large data set iteratively by using a generator, but the json encoder does not serialize generators.
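
A quick sketch of that limitation (assuming Python 3 and the standard-library json module; the generator expression is only illustrative): handing a generator straight to the encoder raises a TypeError.

import json

# Generators are not among the types the default encoder knows how to serialize.
gen = ({'hello_world': i} for i in range(5))
try:
    json.dumps(gen)
except TypeError as exc:
    print(exc)  # e.g. "Object of type generator is not JSON serializable"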

The way around this is to subclass the list type and override the __iter__ magic method so that it yields the generator's output.

Here is an example of such a list subclass.

class StreamArray(list):
    """
    Converts a generator into a list object that can be JSON serialised
    while still retaining the iterative nature of a generator.

    i.e. it behaves like a list without having to exhaust the generator
    and keep its contents in memory.
    """

    def __init__(self, generator):
        self.generator = generator
        # Report a non-zero length up front so the encoder does not treat
        # the (internally empty) list as empty and emit "[]".
        self._len = 1

    def __iter__(self):
        # Stream items straight from the generator, counting them as we go.
        self._len = 0
        for item in self.generator:
            yield item
            self._len += 1

    def __len__(self):
        """
        The json encoder calls this (via the truth test on the sequence)
        to decide whether there is anything to iterate over.
        """
        return self._len
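
To see why __len__ is overridden and _len starts at 1: in CPython, the pure-Python encoder path used by iterencode() short-circuits an empty sequence to "[]" before ever calling __iter__, and the subclass's internal list storage is always empty. A quick check (the IterOnly name here is purely illustrative):

import json

class IterOnly(list):
    """Illustrative variant that overrides only __iter__, not __len__."""
    def __init__(self, generator):
        self.generator = generator
    def __iter__(self):
        return iter(self.generator)

chunks = json.JSONEncoder().iterencode(IterOnly(iter([1, 2, 3])))
print(''.join(chunks))  # prints: []  -- the encoder never reaches __iter__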

From here, usage is quite simple: get a handle on the generator, pass it to the StreamArray class, pass the StreamArray object to iterencode(), and iterate over the chunks. Each chunk is JSON-formatted output that can be written directly to a file.

Example usage:

import json

# Function that will iteratively generate a large set of data.
def large_list_generator_func():
    for i in range(5):
        chunk = {'hello_world': i}
        print('Yielding chunk: ', chunk)
        yield chunk

# Write the contents to file:
with open('/tmp/streamed_write.json', 'w') as outfile:
    large_generator_handle = large_list_generator_func()
    stream_array = StreamArray(large_generator_handle)
    for chunk in json.JSONEncoder().iterencode(stream_array):
        print('Writing chunk: ', chunk)
        outfile.write(chunk)

The output shows that the yields and the writes happen back to back:

Yielding chunk: {'hello_world': 0}
Writing chunk: [
Writing chunk: {
Writing chunk: "hello_world"
Writing chunk: :
Writing chunk: 0
Writing chunk: }
Yielding chunk: {'hello_world': 1}
Writing chunk: ,
Writing chunk: {
Writing chunk: "hello_world"
Writing chunk: :
Writing chunk: 1
Writing chunk: }
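
The same StreamArray object can also be handed to json.dump(), which in CPython pulls chunks from iterencode() internally and writes each one to the file object, so the write should remain incremental there too. A minimal sketch, reusing the class and generator above (the output path is just an example):

import json

# Let json.dump drive iterencode() and write the chunks for us.
with open('/tmp/streamed_write_dump.json', 'w') as outfile:
    json.dump(StreamArray(large_list_generator_func()), outfile)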
