Python Data Processing (17): JSON

Latest recommended article published on 2024-06-17 09:43:08

weixin_39524834


Article link: https://blog.csdn.net/weixin_39524834/article/details/114923937


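This article explains how to process JSON data with Python's pandas library: reading and writing JSON files, using the different orient options, handling dates, and normalizing JSON data. It focuses on the parameters of the to_json and read_json functions with usage examples, showing how to convert a DataFrame or Series to JSON and how to restore a DataFrame from JSON data.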


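Preface

In the first part of this series, we covered how to read and write JSON files with Python's standard json library.

In this part, we look at the read and write operations that pandas provides for JSON files and strings.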

Introduction

1 Writing JSON

A Series or DataFrame can be converted to a valid JSON string with the to_json method.
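As a minimal sketch of the basic call (the small DataFrame and the file name data.json are made up for illustration), to_json returns the JSON string when no path is given, and writes to the target and returns None when a path is given:

```python
import json
import os
import tempfile

import pandas as pd

# Illustrative data, not taken from the article
df = pd.DataFrame({"A": [1, 2], "B": [3.5, 4.5]})

# No path given: the JSON text is returned as a str
json_str = df.to_json()
print(json_str)

# Path given: the JSON text is written to the file and None is returned
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.json")  # hypothetical file name
    result = df.to_json(path)
    with open(path, encoding="utf-8") as fh:
        on_disk = fh.read()

print(result is None)
print(on_disk == json_str)
```

The standard json module can parse the returned string back into plain Python objects, which is a convenient way to inspect what pandas produced.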

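The optional parameters are:

path_or_buf : the path or buffer to write the output to. If None, the JSON string is returned.

orient : the layout of the generated JSON.

Series: defaults to index; allowed values are [split, records, index, table]

DataFrame: defaults to columns; allowed values are [split, records, index, columns, values, table]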


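date_format : how dates are encoded; epoch produces a timestamp, iso produces ISO8601.

double_precision : the number of decimal places to use when encoding floating point values, default 10

force_ascii : force encoded strings to be ASCII, default True.

date_unit : the time unit to encode to; it governs timestamp and ISO8601 precision. 's', 'ms', 'us' and 'ns' stand for seconds, milliseconds, microseconds and nanoseconds respectively. Default 'ms'

default_handler : the handler to call if an object cannot otherwise be converted to a format suitable for JSON. It takes a single argument, the object to convert, and returns a serializable object

lines : if orient=records, write each record on its own line as JSON

Note that NaN, NaT and None are converted to null, and datetime objects are converted according to the date_format and date_unit parameters.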

In [197]: json = dfj.to_json()

In [198]: json

Out[198]: '{"A":{"0":-1.2945235903,"1":0.2766617129,"2":-0.0139597524,"3":-0.0061535699,"4":0.8957173022},"B":{"0":0.4137381054,"1":-0.472034511,"2":-0.3625429925,"3":-0.923060654,"4":0.8052440254}}'
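The following sketch exercises several of the parameters listed above on a small made-up DataFrame (the column names value, label and when are assumptions for illustration). The exact ISO8601 text can differ slightly between pandas versions, so the output is parsed rather than compared character by character:

```python
import json

import numpy as np
import pandas as pd

# Illustrative data containing a float, a missing value, non-ASCII text and dates
df = pd.DataFrame(
    {
        "value": [1.123456789, np.nan],
        "label": ["café", None],
        "when": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    }
)

# ISO8601 dates at second precision, floats rounded to 3 decimal places
iso_json = df.to_json(
    orient="records", date_format="iso", date_unit="s", double_precision=3
)
records = json.loads(iso_json)
print(records[0]["value"])  # rounded float
print(records[1]["value"])  # NaN was written as null, parsed back as None

# force_ascii=False keeps non-ASCII characters instead of \u escapes
raw_json = df.to_json(orient="records", date_format="iso", force_ascii=False)
print("café" in raw_json)

# lines=True writes one JSON object per line (requires orient="records")
lines_json = df.to_json(orient="records", lines=True, date_format="iso")
for line in lines_json.splitlines():
    if line.strip():
        print(line)
```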

1.1 The orient option

There are several options for the format of the resulting JSON file or string. Consider the following DataFrame and Series:

In [199]: dfjo = pd.DataFrame(
   .....:     dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
   .....:     columns=list("ABC"),
   .....:     index=list("xyz"),
   .....: )
   .....:

In [200]: dfjo
Out[200]:
   A  B  C
x  1  4  7
y  2  5  8
z  3  6  9

In [201]: sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

In [202]: sjo
Out[202]:
x    15
y    16
z    17
Name: D, dtype: int64

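columns: the default for a DataFrame; the data is serialized column by column as nested JSON objects, with the column labels as the outer keys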

In [203]: dfjo.to_json(orient="columns")

Out[203]: '{"A":{"x":1,"y":2,"z":3},"B":{"x":4,"y":5,"z":6},"C":{"x":7,"y":8,"z":9}}'

# Not available for Series

index: the default for a Series; similar to the column-oriented format, but the index labels are the outer keys

In [204]: dfjo.to_json(orient="index")

Out[204]: '{"x":{"A":1,"B":4,"C":7},"y":{"A":2,"B":5,"C":8},"z":{"A":3,"B":6,"C":9}}'

In [205]: sjo.to_json(orient="index")

Out[205]: '{"x":15,"y":16,"z":17}'
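To make the relationship between the columns and index orients concrete, the sketch below rebuilds the article's dfjo and parses both outputs with the standard json module. It suggests that the index orient of a DataFrame carries the same content as the columns orient of its transpose:

```python
import json

import pandas as pd

# Same DataFrame as dfjo in the article
dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

by_columns = json.loads(dfjo.to_json(orient="columns"))
by_index = json.loads(dfjo.to_json(orient="index"))

# columns orient: outer keys are column labels, inner keys are index labels
print(by_columns["A"]["x"])

# index orient: outer keys are index labels, inner keys are column labels
print(by_index["x"]["A"])

# The index orient matches the columns orient of the transposed frame
transposed = json.loads(dfjo.T.to_json(orient="columns"))
print(by_index == transposed)
```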

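records: serializes the data as a JSON array of column -> value records; index labels are not included. This is convenient for passing data to JavaScript plotting libraries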

In [206]: dfjo.to_json(orient="records")

Out[206]: '[{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]'

In [207]: sjo.to_json(orient="records")

Out[207]: '[15,16,17]'
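Because the records orient drops the index labels, one possible way to keep them is to move the index into an ordinary column with reset_index before serializing. This is a sketch of that workaround, not something the article prescribes; the column name index is simply what reset_index produces for an unnamed index:

```python
import json

import pandas as pd

# Same DataFrame as dfjo in the article
dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

# Plain records: the labels x, y, z are lost
records = json.loads(dfjo.to_json(orient="records"))
print(records[0])

# reset_index() turns the index into a column named "index", so it survives
with_index = json.loads(dfjo.reset_index().to_json(orient="records"))
print(with_index[0])
```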

values: a bare-bones option that serializes only the values as nested JSON arrays; column and index labels are not included

In [208]: dfjo.to_json(orient="values")

Out[208]: '[[1,4,7],[2,5,8],[3,6,9]]'

# Not available for Series

split: serializes to a JSON object with separate entries for the values, the index and the columns. For a Series, the name is included as well

In [209]: dfjo.to_json(orient="split")

Out[209]: '{"columns":["A","B","C"],"index":["x","y","z"],"data":[[1,4,7],[2,5,8],[3,6,9]]}'

In [210]: sjo.to_json(orient="split")

Out[210]: '{"name":"D","index":["x","y","z"],"data":[15,16,17]}'
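Since the keys of the split layout (columns, index, data) happen to match keyword arguments of the DataFrame constructor, the parsed payload can be passed straight back to it. The sketch below uses only the standard json module and the constructor, as one illustrative way to rebuild the frame by hand:

```python
import json

import pandas as pd

# Same DataFrame as dfjo in the article
dfjo = pd.DataFrame(
    dict(A=range(1, 4), B=range(4, 7), C=range(7, 10)),
    columns=list("ABC"),
    index=list("xyz"),
)

payload = json.loads(dfjo.to_json(orient="split"))
print(sorted(payload))  # the three top-level keys

# Unpack the payload as columns=..., index=..., data=...
restored = pd.DataFrame(**payload)
print(restored)
```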

table: serializes to the JSON Table Schema format, which preserves metadata including, but not limited to, dtypes and index names

>>> sjo.to_json(orient='table')

'{"schema":{"fields":[{"name":"index","type":"string"},{"name":"D","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":"x","D":15},{"index":"y","D":16},{"index":"z","D":17}]}'

>>> dfjo.to_json(orient='table')

'{"schema":{"fields":[{"name":"index","type":"string"},{"name":"A","type":"integer"},{"name":"B","type":"integer"},{"name&#
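The table output is easier to read once parsed. This sketch rebuilds the article's sjo and pulls out the schema and data parts with the standard json module; the reported pandas_version and type names may vary with the installed pandas, so only the field names, primary key and rows are inspected here:

```python
import json

import pandas as pd

# Same Series as sjo in the article
sjo = pd.Series(dict(x=15, y=16, z=17), name="D")

table = json.loads(sjo.to_json(orient="table"))

# The schema describes each field; the unnamed index is reported as "index"
field_names = [field["name"] for field in table["schema"]["fields"]]
print(field_names)
print(table["schema"]["primaryKey"])

# The data part is a list of records that includes the index value
for row in table["data"]:
    print(row)
```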