Two ways to read a JSON file
The sample below is an annotated transcript of a conversation between two people.
[
    {
        "start_time": {
            "original": "0:00:00.611000"
        },
        "end_time": {
            "original": "0:00:05.760000"
        },
        "words": "+",
        "speaker": "",
        "location": "",
        "session_id": "MDT_F2F_223"
    },
    {
        "start_time": {
            "original": "0:00:05.760000"
        },
        "end_time": {
            "original": "0:00:09.755000"
        },
        "words": "不想上了,你知道吗?我都烦死了",
        "speaker": "SPK506",
        "location": "",
        "session_id": "MDT_F2F_223"
    },
    ...
]
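json_file = '/data/magic_data/transcription/train/MDT_F2F_223.json'
cnt = 0
with open(json_file, 'r') as f:
    # readlines() returns the whole file as a list of strings, one per line
    lines = f.readlines()
    print(lines)
    # print only the first four lines
    for line in lines:
        print(line)
        if cnt == 3:
            break
        else:
            cnt += 1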
['[\n', ' {\n', ' "start_time": {\n', ' "original": "0:00:00.611000"\n', ' },\n', ' "end_time": {\n', ' "original": "0:00:05.760000"\n', ' },\n', ' "words": "+",\n', ' "speaker": "",\n', ' "location": "",\n', ' "session_id": "MDT_F2F_223"\n', ' },\n', ' {\n', ' "start_time": {\n', ' "original": "0:00:05.760000"\n', ' },\n', ' "end_time": {\n', ' "original": "0:00:09.755000"\n', ' },\n', ' "words": "不想上了,你知道吗?我都烦死了",\n', ' "speaker": "SPK506",\n', ' "location": "",\n', ' "session_id": "MDT_F2F_223"\n', ' },\n', ' {\n', ' "start_time": {\n', ' "original": "0:00:09.755000"\n', ' },\n', ' "end_time": {\n', ' "original": "0:00:10.376000"\n', ' },\n', ' "words": "[*]",\n', ' "speaker": "",\n', ' "location": "",\n', ' "session_id": "MDT_F2F_223"\n', ' },\n', ' {\n', ' "start_time": {\n', ' "original": "0:00:10.376000"\n', ' },\n', ' "end_time": {\n', ' "original": "0:00:11.995000"\n', ' },\n', ' "words": "诶,那你考研想考哪儿",\n', ' ']']
[
{
"start_time": {
"original": "0:00:00.611000"
This is what the printed output looks like.
Each line of the original file becomes one item in the list.
How to read a JSON file
Method 1 (common)
import json

json_file = '/data/magic_data/transcription/train/MDT_F2F_223.json'
with open(json_file, 'r') as f:
    # json.load parses directly from the open file object
    data1 = json.load(f)
This uses the load method of the json module; its argument is the file object of the JSON file.
Method 2
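import json

json_file = '/data/magic_data/transcription/train/MDT_F2F_223.json'
with open(json_file, 'r') as f:
    lines = f.readlines()
# join the lines into one string, stripping newlines and spaces
json_str = ''.join(lines).replace('\n', '').replace(' ', '').replace(',}', '}')
# json.loads parses from a string rather than a file object
data2 = json.loads(json_str)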
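lines is a list of strings, each holding one line of the JSON file. The loads method of the json module takes a string as its argument. The statement can be shortened to json_str = ''.join(lines): there is no need to strip newlines and spaces, because the parser ignores whitespace between JSON tokens. Note that replace(' ', '') also removes spaces inside string values.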
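Both methods work. The first is the usual choice; the second also uses more memory, because it keeps the list of lines and builds intermediate copies of the string.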
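The comparison can be checked with a small sketch. It assumes a hypothetical in-memory sample (sample_text, wrapped in io.StringIO) in place of the real file, and the English value "hello world" is made up so that the effect of stripping spaces is visible.

```python
import io
import json

# Hypothetical in-memory sample standing in for the real file.
sample_text = '''[
    {
        "words": "hello world",
        "speaker": "SPK506"
    }
]
'''

# Method 1: json.load takes a file-like object.
data1 = json.load(io.StringIO(sample_text))

# Method 2, shortened: join the lines and pass the string to json.loads.
lines = io.StringIO(sample_text).readlines()
data2 = json.loads(''.join(lines))

# Method 2 with the replace calls: it still parses, but the value changes.
data3 = json.loads(''.join(lines).replace('\n', '').replace(' ', ''))

print(data1 == data2)      # True: both methods give the same result
print(data1[0]['words'])   # hello world
print(data3[0]['words'])   # helloworld
```

The last line shows why the shortened form is the safer one whenever the values may contain spaces.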
Returned data
Both methods return a list of dicts, which looks much like the raw JSON file when printed. Printing data1 or data2 shows the following:
[{'start_time': {'original': '0:00:00.611000'},
'end_time': {'original': '0:00:05.760000'},
'words': '+',
'speaker': '',
'location': '',
'session_id': 'MDT_F2F_223'},
{'start_time': {'original': '0:00:05.760000'},
'end_time': {'original': '0:00:09.755000'},
'words': '不想上了,你知道吗?我都烦死了',
'speaker': 'SPK506',
'location': '',
'session_id': 'MDT_F2F_223'},
{'start_time': {'original': '0:00:09.755000'},
'end_time': {'original': '0:00:10.376000'},
'words': '[*]',
'speaker': '',
'location': '',
'session_id': 'MDT_F2F_223'}]
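Since the result is an ordinary list of dicts, it can be processed with normal Python indexing. The sketch below is only an illustration: the helper duration_seconds and the TIME_FORMAT string are hypothetical names, the time format is inferred from the sample values, and treating records with an empty speaker field as non-speech segments is an assumption.

```python
from datetime import datetime

# Two records copied from the printed output above.
data = [
    {'start_time': {'original': '0:00:00.611000'},
     'end_time': {'original': '0:00:05.760000'},
     'words': '+',
     'speaker': '',
     'location': '',
     'session_id': 'MDT_F2F_223'},
    {'start_time': {'original': '0:00:05.760000'},
     'end_time': {'original': '0:00:09.755000'},
     'words': '不想上了,你知道吗?我都烦死了',
     'speaker': 'SPK506',
     'location': '',
     'session_id': 'MDT_F2F_223'},
]

# Assumed format of the "original" timestamps, inferred from the sample.
TIME_FORMAT = '%H:%M:%S.%f'


def duration_seconds(record):
    """Return the length of one segment in seconds (hypothetical helper)."""
    start = datetime.strptime(record['start_time']['original'], TIME_FORMAT)
    end = datetime.strptime(record['end_time']['original'], TIME_FORMAT)
    return (end - start).total_seconds()


# Keep only records with a speaker label (assumption: empty means non-speech).
spoken = [record for record in data if record['speaker']]

for record in spoken:
    print(record['speaker'], record['words'], round(duration_seconds(record), 3))
```

Nested fields are reached by chaining keys, for example data[0]['start_time']['original'].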