Can QPython use JSON? Converting JSON to JSONL in Python

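This post discusses how to convert a standard JSON object into JSON Lines format so that it can be read in Spark. The author's approach is to read the JSON file as a text file and remove the leading [ and the trailing ], producing a file in which each line is a separate, valid JSON object. Although this works, the author wonders whether a more elegant solution exists. The post gives a code example that uses Python's json module to do the conversion, and stresses the problems that string manipulation on the file can cause.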

I want to convert a standard JSON object into a form in which each line contains a separate, self-contained, valid JSON object. See JSON Lines.

JSON_file = [
    {u'index': 1, u'no': 'A', u'met': u'1043205'},
    {u'index': 2, u'no': 'B', u'met': u'000031043206'},
    {u'index': 3, u'no': 'C', u'met': u'0031043207'},
]

To JSONL:

{u'index': 1, u'no': 'A', u'met': u'1043205'}
{u'index': 2, u'no': 'B', u'met': u'000031043206'}
{u'index': 3, u'no': 'C', u'met': u'0031043207'}

My current solution is to read the JSON file as a text file and remove the [ from the beginning and the ] from the end, so that each line holds a valid JSON object rather than the file holding one nested object that spans many lines.

Is there a more elegant solution? I suspect something could go wrong when using string manipulation on the file.
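To make that suspicion concrete, here is a minimal sketch of how the bracket-stripping approach can break. The sample text is hypothetical: it assumes the JSON array was pretty-printed across several lines, as many tools write it. The helper name is_valid_json is also just for illustration.

```python
import json

# Hypothetical pretty-printed JSON array; objects span several lines
# and are separated by commas.
text = '[{"index": 1,\n  "no": "A"},\n {"index": 2,\n  "no": "B"}]'

# Naive approach: drop the leading "[" and the trailing "]".
stripped = text.strip()[1:-1]


def is_valid_json(line):
    """Return True if the line parses as a complete JSON document."""
    try:
        json.loads(line)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Collect every resulting line that is not a self-contained JSON object.
bad_lines = [line for line in stripped.splitlines() if not is_valid_json(line)]
print(len(bad_lines), "of", len(stripped.splitlines()), "lines are not valid JSON")
```

With this input, none of the remaining lines parses on its own, because the objects are split across lines and the separating commas are still present. The approach only works if the file happens to contain exactly one object per line with no trailing commas.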

The motivation is to read JSON files into an RDD on Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`

Solution

Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.

If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:

import json

with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)
        outfile.write('\n')

The default configuration for the json module is to output JSON without newlines embedded.
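As a quick check of that claim, the following sketch serializes an entry whose string value itself contains a newline. The entry and its note field are made-up sample data, not part of the question.

```python
import json

# Hypothetical entry whose string value contains a literal newline.
entry = {"index": 1, "note": "first line\nsecond line"}

# With default settings (no indent argument), json.dumps emits one line.
line = json.dumps(entry)

# The newline inside the string is escaped as the two characters
# backslash and "n", so no raw newline appears in the output.
print("\n" in line)

# Parsing the line restores the original value, newline included.
print(json.loads(line) == entry)
```

So each entry occupies exactly one physical line of the output file, even when the data contains line breaks. Passing indent to json.dump() would change this and should be avoided for JSON Lines output.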

Assuming your A, B and C names are really strings, that would produce the following (key order may differ depending on the Python version):

{"index": 1, "met": "1043205", "no": "A"}

{"index": 2, "met": "000031043206", "no": "B"}

{"index": 3, "met": "0031043207", "no": "C"}

If you started with a JSON document containing a list of entries, just parse that document first with json.load()/json.loads().
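That last case might look like the sketch below. The function name json_array_to_jsonl, the file names, and the sample data are assumptions for illustration. The demo writes to a temporary directory so that it can run on its own.

```python
import json
import os
import tempfile


def json_array_to_jsonl(src_path, dst_path):
    """Convert a JSON file holding a top-level array into a JSON Lines file.

    Returns the number of entries written.
    """
    with open(src_path) as infile:
        entries = json.load(infile)  # parse the whole document first
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w") as outfile:
        for entry in entries:
            json.dump(entry, outfile)  # one compact object per line
            outfile.write("\n")
    return len(entries)


# Demo with hypothetical sample data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.json")
    dst = os.path.join(tmp, "output.jsonl")
    with open(src, "w") as f:
        f.write('[{"index": 1, "no": "A", "met": "1043205"},\n'
                ' {"index": 2, "no": "B", "met": "000031043206"}]')
    count = json_array_to_jsonl(src, dst)
    with open(dst) as f:
        lines = f.read().splitlines()

print(count, "entries written")
```

Parsing first means the layout of the source file (pretty-printed, single-line, extra whitespace) does not matter. Note that json.load() reads the whole document into memory, so this sketch assumes the input fits in RAM.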
