python 百度脑图json数据转Excel
最近接了个需求是将百度脑图导出的json格式数据保存到Excel中,众所周知,不管是kityminder还是xmind这些脑图,导出的数据必定是有一定规律的。首先想到的是借用xmind2excel,但是发现有些是不支持的,因此打算自己写一个。
kityminder-editor JSON TO EXCEL
首先我们从百度脑图将脑图另存为.km格式的数据,将.km格式直接修改为.json格式。
(.km格式是kityminder独有的格式,与.json格式相同,可以直接修改后缀)
为此,我特意创建了一个测试脑图,将所有元素囊括其中,数据大致手工整理如下:
{
"root":{
"data":{"id":"e497989a3e9b","created":1613635615,"text":"新建脑图"},
"children":[
{
"data":{"id":"c9lnfu4a9540","created":1614564757710,"text":"1","resource":["Hello","Hi","Yes"]},
"children":[
{"data":{"id":"c9lnhflj9ug0","created":1614564882830,"text":"分支主题"},
"children":[]}]},
{"data":{"id":"c9lnfu4ahjs0","created":1614564757711,"text":"2","resource":["Hello"]},"children":[]},
{"data":{"id":"c9lnfu4ag4g0","created":1614564757711,"text":"3"},"children":[]},
{"data":{"id":"c9lnfyhlra80","created":1614564767223,"text":"4"},"children":[]},
{"data":{"id":"c9lnfyhmf8g0","created":1614564767224,"text":"5","hyperlink":"http://www.baidu.com","hyperlinkTitle":""},"children":[]},
{"data":{"id":"c9lnfyhme740","created":1614564767224,"text":"6","note":"123\n"},"children":[]},
{"data":{"id":"c9lng7qlo000","created":1614564787358,"text":"7","progress":1},
"children":[
{"data":{"id":"c9lng7qlkg80","created":1614564787358,"text":"8"},"children":[]},
{"data":{"id":"c9lng7qmfps0","created":1614564787359,"text":"9"},"children":[]}]},
{"data":{"id":"c9lngej4nm80","created":1614564802143,"text":"1","priority":1},
"children":[
{"data":{"id":"c9lngej4alc0","created":1614564802143,"text":"2"},
"children":[
{"data":{"id":"c9lngej4xtk0","created":1614564802144,"text":"3"},
"children":[
{"data":{"id":"c9lngej4zdc0","created":1614564802144,"text":"4"},"children":[]}]}]}]}
]
},
"template":"default","theme":"fresh-blue","version":"1.4.43"
}
由上述格式我们发现,最外层的字典有两个key,一个是root,一个是template,其中template所对应的value保存的是脑图的模板样式、主题样式以及当前版本。
在root所对应的value中最外层也只有两个key,一个是data,所保存的是脑图的初始节点的内容;而children所对应的value是一个列表,在列表中又包含字典,字典下面再包含data和children。
这么一描述,问题也就清晰了,很容易让人联想到递归函数。
因此我打算使用递归函数对数据进行处理。
那么,一步步来。
首先是利用oepn打开并加载json格式文件:
file_obj = open('All.json', encoding='utf-8')
file_content = json.load(file_obj)
接着对格式进行处理,首先获取json数据中的data数据:
for k, v in file_content.items():
if k == 'root':
file_content = v
for k, v in file_content.items():
if k == "data":
for key, value in v.items():
if key=="text":
filename = value
else:
final_content = [[] for i in range(len(v))]
for i in range(len(v)):
final_content_list = []
final_content[i] = extract_dict(v[i])
然后需要对获取到的数据进行解析:
def extract_dict(d):
if isinstance(d, dict):
for k, v in d.items():
if k == "data":
for key, value in v.items():
if key == "text":
final_content_list.append("text:" + value)
elif key == "resource":
final_content_list.append("resource:" + ' '.join(value))
elif key == "image":
final_content_list.append("image:" + value)
elif key == "hyperlink":
final_content_list.append("hyperlink:" + value)
elif key == "note":
final_content_list.append("note:" + value)
elif key == "progress":
final_content_list.append("progress:" + str(value))
elif key == "priority":
final_content_list.append("priority:" + str(value))
elif k == "children":
if len(v) != 0:
for j in range(len(v)):
extract_dict(v[j])
## else:
## final_content_list.append(d)
return final_content_list
最后是将解析后的数据保存到Excel中,这边我们使用了openpyxl库:
## 将list数据写入Excel
workbook = Workbook()
save_file = filename + ".xlsx"
worksheet = workbook.active
worksheet.title = filename
for row in final_content:
worksheet.append(row)
workbook.save(filename=save_file)
完整的代码如下:
import json
from openpyxl import Workbook
### 解析kityminder-editor的json数据到list
def extract_dict(d):
if isinstance(d, dict):
for k, v in d.items():
if k == "data":
for key, value in v.items():
if key == "text":
final_content_list.append("text:" + value)
elif key == "resource":
final_content_list.append("resource:" + ' '.join(value))
elif key == "image":
final_content_list.append("image:" + value)
elif key == "hyperlink":
final_content_list.append("hyperlink:" + value)
elif key == "note":
final_content_list.append("note:" + value)
elif key == "progress":
final_content_list.append("progress:" + str(value))
elif key == "priority":
final_content_list.append("priority:" + str(value))
elif k == "children":
if len(v) != 0:
for j in range(len(v)):
extract_dict(v[j])
## else:
## final_content_list.append(d)
return final_content_list
file_obj = open('All.json', encoding='utf-8')
file_content = json.load(file_obj)
for k, v in file_content.items():
if k == 'root':
file_content = v
for k, v in file_content.items():
if k == "data":
for key, value in v.items():
if key=="text":
filename = value
else:
final_content = [[] for i in range(len(v))]
for i in range(len(v)):
final_content_list = []
final_content[i] = extract_dict(v[i])
## 将list数据写入Excel
workbook = Workbook()
save_file = filename + ".xlsx"
worksheet = workbook.active
worksheet.title = filename
for row in final_content:
worksheet.append(row)
workbook.save(filename=save_file)