python汇总json数据_python整理包含中文的json格式数据

最新推荐文章于 2024-07-06 02:39:47 发布

weixin_39937312

最新推荐文章于 2024-07-06 02:39:47 发布

阅读量169

点赞数 1

文章标签： python汇总json数据

json解析关键词提取内容摘要前端开发后端开发

关键词由CSDN通过智能技术生成

import json

import jieba,jieba.analyse

import re

#一、读取文件

f=open("D:\\文件名.do",encoding="utf-8")

text=f.read()

print(text[0:10])

print(text[-1])

print(type(text))

#二、用json解析

json=json.loads(text)#有时显示错误，重启编程窗口就好了

type(json)#dict

print(len(json),json.keys())

json['bond']

type(json['queryPage']['list'])==type(list())#True

#三、探索json结构

def select_json(json):#定义展示大小的函数来筛选主要内容

content_count_0={}

#if

type(json)=='list':

if

type(json)==type(list()):

keys=json

for i,key in enumerate(keys):

try:

content_count_0[str(i)+str(type(key))]=len(str(key))

except:

content_count_0[str(i)+type(key)]=0

else:

keys=json.keys()

for key in keys:

try:

content_count_0[key]=len(str(json[key]))

except:

content_count_0[key]=0

print(content_count_0)#展示每一层的字符串长度、以定位主键key

return

content_count_0

content_count_0=select_json(json)

content_count_1=select_json(json['queryPage'])

content_count_2=select_json(json['queryPage']['list'])

contents=json['queryPage']['list']#列表

[len(content) for content in contents]#看看列表内容格式是否固定

contents[0]['lksFields'][4]['value']#题目

contents[0]['lksFields'][0]['value']#内容

[len(content['lksFields']) for content in

contents]#看看列表内容格式是否固定

#四、摘取合并主要内容并去重复【目前还不完整，因为同一层的长度不一致、不能用固定位置法来提取数据】

title_abstract={"标题":["关键词周围摘要","出现次数"]}

for content in contents:

title=content['lksFields'][4]['value']

if title in

title_abstract:

title_abstract[title][1] += 1

else:

title_abstract[title]=[content['lksFields'][0]['value'],1]

#五、全部获取

#s="abcde"

#("a" in s) or ("r" in s) or ("f" in s) #True

#s.replace("c","")

title_abstracts=[]

contain_zh = re.compile(u'[\u4e00-\u9fa5]+')

for content in contents:

for elm in

content['lksFields']:

s =elm['value']

if contain_zh.search(s):#判断字符串中是否包含汉字

#if ("kms" in s) or ("gateway" in s) or ("news"

in s) or ("knowledge" in s):

#

pass

#

else:

title_abstracts.append(s)

contents_str=str(title_abstracts).replace("","").replace("","").replace("","")

print(len(title_abstracts),len(contents_str))

text_file=open("D:\\文件名.txt","w",encoding="utf-8")

text_file.write(contents_str)

text_file.close()

tf_idf_keywords=jieba.analyse.extract_tags(contents_str,topK=500,

withWeight=True,allowPOS=(),

withFlag=True)

#去除换行 o:p以及空格等无用字符

#分句、去重更有效的提取关键信息

weixin_39937312

关注

1
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。