json文件截取部分如下所示:
{"total":4658,"p":10,"ps":500,"paging":{"pageIndex":10,"pageSize":500,"total":4658},"effortTotal":46580,"debtTotal":46580,"issues":[{"key":"AX1wsTp4vsycaNfv1YJD","rule":"c:FunctionName","severity":"MAJOR","component":"device_kernel:linux/drivers/pci/dwc/pci-dra7xx.c","project":"device_kernel","line":175,"hash":"d449ab7318aeabdf32810d2752cbfb9f","textRange":{"startLine":175,"endLine":175,"startOffset":0,"endOffset":73},"flows":[],"status":"OPEN","message":"Rename function \"dra7xx_pcie_enable_msi_interrupts\" to match the regular expression ^[a-z_][a-z0-9_]{2,30}$.","effort":"10min","debt":"10min","author":"","tags":["convention"],"transitions":["confirm","resolve","falsepositive","wontfix"],"actions":["set_type","set_tags","comment","set_severity","assign"],"comments":[],"creationDate":"2021-11-30T19:46:07+0800","updateDate":"2021-11-30T19:46:07+0800","type":"CODE_SMELL","scope":"MAIN"},{"key":"AX1wsTp4vsycaNfv1YJE","rule":"c:FunctionName","severity":"MAJOR","component":"device_kernel:linux/drivers/pci/dwc/pci-dra7xx.c","project":"device_kernel","line":185,"hash":"59133fa599aef543ad94a8834779fc69","textRange":{"startLine":185,"endLine":185,"startOffset":0,"endOffset":77},"flows":[],"status":"OPEN","message":"Rename function \"dra7xx_pcie_enable_wrapper_interrupts\" to match the regular expression ^[a-z_][a-z0-9_]{2,30}$.","effort":"10min","debt":"10min","author":"","tags":["convention"],"transitions":["confirm","resolve","falsepositive","wontfix"],"actions":["set_type","set_tags","comment","set_severity","assign"],"comments":[],"creationDate":"2021-11-30T19:46:07+0800","updateDate":"2021-11-30T19:46:07+0800","type":"CODE_SMELL","scope":"MAIN"}]
- 实现目标:
要求提取里面得字段component的值,并输出到excel;如果有相同的值,统计相同值的个数,并把值和个数对应输出到excel中
- 本脚本实现逻辑:
1.使用jsonpath组件将json文件多层嵌套中的component值截取出来
2.将获取的值写入xsl文件中
3.使用value_counts()方法统计值得个数
4.再次将计数结果写入csv文件中
import json, xlwt
import jsonpath
import pandas as pd
# jsonpath组件为用来解析多层嵌套的json数据,本处理json文件取component值正是多层嵌套下的值
# 读json文件:从json文件中读取内容存入python对象,不再用loads而是要用load
file = open('test.json', 'r', encoding='utf-8')
test = json.load(file) # 使用load方法将json数据解码,返回python字段的数据类型
s1 = jsonpath.jsonpath(test, "$..component") # 使用jsonpath组件逐层获取component值,此处component可替换成所要获取的值
# print(s1) # 返回的是一个列表
s2 = ("\n".join(s1)) # 列表转为字符串
#将component值写入xsl文件中
workbook = xlwt.Workbook(encoding='utf-8')
# 创建一个worksheet
worksheet = workbook.add_sheet('sheet')
x = 1 # 在第二行开始写
y = 0 # 在第一列开始写
xls = xlwt.Workbook()
sheet = xls.add_sheet('sheet', cell_overwrite_ok=True) # 生成excel的方法,声明excel
for z in s2.split("\n"):
sheet.write(0, 0, 'component') # 在第一行第一列单元格写"component"
sheet.write(x, y, z) # x代表行,y代表列
x += 1
xls.save('test.xls') # 保存
#计数
data = 'test.xls'
data = pd.read_excel('test.xls') # 导入信息
d = data['component'].value_counts()
# print(d)
#再次将计数结果写入test去重.csv表中
list=d
name = ['component']
test = pd.DataFrame(columns=name,data=list)
# print(test)
test.to_csv('test增加去重计数.csv',encoding='gbk')
- 实现效果
ps:
python脚本实现了目标要求,但也有部分逻辑语法上的问题,还需要优化,思路和方法可以参考下哈哈~~