Can qpython use JSONL? Converting from JSON to JSONL in Python

Latest recommended article published 2024-09-18 05:20:31

硅谷IT胖子

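This post discusses how to convert a standard JSON object into the JSON Lines format so that it can be read in Spark. The author proposes reading the JSON file as a text file and removing the opening [ and the closing ] so that every line of the file is a separate, valid JSON object. This approach works, but the author wonders whether a more elegant solution exists. The post provides a code example that performs the conversion with Python's json module, and it notes the problems that string manipulation on the file could cause.

Summary generated by CSDN using AI.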

I want to convert a standard JSON array into a format where each line contains a separate, self-contained, valid JSON object. See JSON Lines.

JSON_file = [
    {u'index': 1,
     u'no': 'A',
     u'met': u'1043205'},
    {u'index': 2,
     u'no': 'B',
     u'met': u'000031043206'},
    {u'index': 3,
     u'no': 'C',
     u'met': u'0031043207'}]

To JSONL:

{u'index': 1, u'no': 'A', u'met': u'1043205'}
{u'index': 2, u'no': 'B', u'met': u'000031043206'}
{u'index': 3, u'no': 'C', u'met': u'0031043207'}

My current solution is to read the JSON file as a text file and remove the [ from the beginning and the ] from the end. This leaves a valid JSON object on each line, rather than a single array containing all of the objects.

I wonder if there is a more elegant solution? I suspect something could go wrong using string manipulation on the file.
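To illustrate how the string approach can go wrong, here is a minimal sketch. The file contents in `text` are a hypothetical example, not the actual data: a valid JSON array whose elements are separated by commas at the ends of lines.

```python
import json

# Hypothetical file contents: a valid JSON array, one object per line.
text = '[{"index": 1, "no": "A"},\n {"index": 2, "no": "B"}]'

# The string-manipulation approach: drop the leading "[" and trailing "]".
stripped = text.strip()[1:-1]

# Collect every line that does not parse as a standalone JSON document.
bad_lines = []
for line in stripped.splitlines():
    try:
        json.loads(line)
    except json.JSONDecodeError:
        bad_lines.append(line)
```

In this sketch the first line keeps its trailing comma, so it is rejected by `json.loads`, while the last line parses. Pretty-printed objects that span several lines would fail in a similar way.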

The motivation is to read JSON files into an RDD on Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`

Solution

Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.
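A quick way to check this claim, as a rough sketch: the string `python_repr` below is a shortened copy of the posted data, and using `ast.literal_eval` to recover the objects is an assumption about how one might proceed, not something the question describes.

```python
import ast
import json

# Shortened copy of the posted text: single quotes and u'' prefixes are
# Python literal syntax, not JSON.
python_repr = "[{u'index': 1, u'no': 'A', u'met': u'1043205'}]"

try:
    json.loads(python_repr)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False

# ast.literal_eval evaluates Python literals only, without running code.
entries = ast.literal_eval(python_repr)

# Serializing a parsed entry yields real JSON with double-quoted strings.
as_json = json.dumps(entries[0])
```

The JSON parser rejects the text, while the Python literal parser accepts it, which is consistent with the data being a printed Python list rather than the contents of a JSON file.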

If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:

import json

# Write one JSON document per line (the JSON Lines format).
with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        json.dump(entry, outfile)  # compact output, no embedded newlines
        outfile.write('\n')

The default configuration for the json module is to output JSON without newlines embedded.
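If the data really does live in a JSON file containing an array, the same idea applies after parsing it with `json.load`. The following is a minimal sketch under that assumption; the helper name `write_jsonl` and the sample records are hypothetical, and in-memory buffers stand in for real files.

```python
import io
import json


def write_jsonl(records, outfile):
    """Write each record as one compact JSON document per line."""
    for record in records:
        # json.dumps emits no newlines unless indent= is passed; a newline
        # inside a string value is escaped, so each record stays on one line.
        outfile.write(json.dumps(record))
        outfile.write("\n")


# Hypothetical input: a JSON array as it would appear in a real .json file.
source = io.StringIO('[{"index": 1, "no": "A", "met": "1043205"},'
                     ' {"index": 2, "no": "B", "met": "multi\\nline"}]')
records = json.load(source)  # parse the whole array into a list of dicts

buffer = io.StringIO()
write_jsonl(records, buffer)
lines = buffer.getvalue().splitlines()

# Every line parses on its own, which is what JSON Lines requires.
round_tripped = [json.loads(line) for line in lines]
```

Parsing first and serializing each element avoids any dependence on how the source file happens to be laid out, at the cost of holding the whole array in memory.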

Assuming your A, B and C names are really strings, that would produce: