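Task description:
Three large language models answered a set of questions, and the answer records are saved in JSON files. The task is to compute, for each model, the total number of answers, the number of correct answers, and the accuracy, save these statistics to a target JSON file, and save the records that could not be processed into separate files, one per model, for later manual review.
Implementation
1. Writing the configuration file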
{
    "base_loc": "./files/",
    "source": [
        "origin_llm",
        "origin_llm_kb",
        "zhipu_ai"
    ],
    "target": "./output.json",
    "unresolved_loc": "./unresolved/"
}
All settings used in this run are kept in the configuration file so that they can be reused later.
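As an illustration of how these settings are consumed, here is a minimal sketch that parses the configuration and builds the input file paths. The `load_config` helper and the `REQUIRED_KEYS` check are assumptions added for this example and are not part of the original script; the config is read from an inline string so the snippet runs on its own.

```python
import json

# Keys the statistics script expects to find (taken from the config shown above).
REQUIRED_KEYS = ("base_loc", "source", "target", "unresolved_loc")


def load_config(text):
    """Parse the config JSON text and check that the expected keys are present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {missing}")
    return config


sample = """
{
    "base_loc": "./files/",
    "source": ["origin_llm", "origin_llm_kb", "zhipu_ai"],
    "target": "./output.json",
    "unresolved_loc": "./unresolved/"
}
"""

config = load_config(sample)
# Each name in "source" maps to one input file under base_loc.
paths = [config["base_loc"] + name + ".json" for name in config["source"]]
print(paths)
```

Failing early on a missing key gives a clearer error than a `KeyError` raised halfway through the run.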
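2. Writing the script
The answer records of each model are stored in a separate file, and the same counting logic applies to every file.
Function implementation: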
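import json
import re


def contains_letters(text):
    """Return True if the text contains an option letter (A-D, either case)."""
    return bool(re.search(r'[a-dA-D]', text))


def count_up(jsons, llm_name):
    """Count total and correct answers for one model and collect unresolved records."""
    total_num = len(jsons)
    solved_num = 0
    unsolved = []
    for j in jsons:
        correct_answer = str(j['correct_answer'])
        kb_answer = str(j['kb_answer'])
        # Correct if the answer contains the expected option letter in either case.
        if correct_answer in kb_answer or correct_answer.lower() in kb_answer:
            solved_num += 1
        # No option letter at all: set the record aside for manual review.
        if not contains_letters(kb_answer):
            unsolved.append(j)
    # Guard against an empty record list to avoid ZeroDivisionError.
    rate = format(solved_num / total_num, '.4f') if total_num else '0.0000'
    return {
        "Tot_problem_nums": total_num,
        "Solved_problem_nums": solved_num,
        "Solve_rate": rate,
        "llm_name": llm_name
    }, unsolved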
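The function takes two arguments: the parsed JSON records and the model name.
Each record file stores the information for every question as one object, so the top level of the file is an array.
For each object in the array, the function counts whether the answer is correct; records that cannot be judged (the answer contains no option letter) are collected and returned alongside the counts.
The return value is the statistics record for one model together with the array of unresolved records.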
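To make the return value concrete, the following sketch runs the same counting logic on three made-up records. The field names `correct_answer` and `kb_answer` come from the script; the record contents are hypothetical, and the function is restated in compact form so the snippet runs on its own.

```python
import re


def contains_letters(text):
    """Return True if the text contains an option letter (A-D, either case)."""
    return bool(re.search(r'[a-dA-D]', text))


def count_up(records, llm_name):
    """Compact restatement of the counting logic described above."""
    solved, unresolved = 0, []
    for rec in records:
        correct = str(rec['correct_answer'])
        answer = str(rec['kb_answer'])
        if correct in answer or correct.lower() in answer:
            solved += 1
        if not contains_letters(answer):
            unresolved.append(rec)
    total = len(records)
    rate = format(solved / total, '.4f') if total else '0.0000'
    return {
        "Tot_problem_nums": total,
        "Solved_problem_nums": solved,
        "Solve_rate": rate,
        "llm_name": llm_name,
    }, unresolved


# Hypothetical sample records, for illustration only.
records = [
    {"correct_answer": "A", "kb_answer": "The answer is A."},  # counted as correct
    {"correct_answer": "B", "kb_answer": "C"},                 # wrong, but has an option letter
    {"correct_answer": "C", "kb_answer": "It is not known"},   # no option letter: unresolved
]

stats, unresolved = count_up(records, "zhipu_ai")
print(stats)
print(unresolved)
```

Note that the substring check is loose: a free-text answer such as "The answer is B" also contains a lowercase "a", so it would be counted as correct for an expected answer of A. The check is most reliable when `kb_answer` holds only the chosen option letter.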
main:
if __name__ == '__main__':
    with open('config/config.json', 'r', encoding='utf-8') as f:
        config = json.load(f)
    files = config['source']
    base = config['base_loc']
    target = config['target']
    unresolved_loc = config['unresolved_loc']
    res = []
    for name in files:
        loc = base + name + '.json'
        with open(loc, 'r', encoding='utf-8') as cf:
            jsonlist = json.load(cf)
        ok, unresolved_list = count_up(jsonlist, name)
        res.append(ok)
        # Save the unresolved records for manual review.
        loc = unresolved_loc + name + '_unresolved.json'
        with open(loc, 'w', encoding='utf-8') as uf:
            json.dump(unresolved_list, uf, ensure_ascii=False, indent=4)
    # Save the statistics for all models to the target file.
    with open(target, 'w', encoding='utf-8') as f:
        json.dump(res, f, ensure_ascii=False, indent=4)
    print('done')
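The script reads the configuration file, iterates over the source array, saves the unresolved records for each file, and finally saves the statistics for all models.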
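One practical detail: opening a file for writing fails if its parent directory does not exist, so the folder named by `unresolved_loc` has to be present before the script runs. A possible variant, sketched here with `pathlib`, creates the folder on first use. The `save_unresolved` helper is a hypothetical name introduced for this example, and the snippet writes into a temporary directory so it leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path


def save_unresolved(unresolved_dir, name, records):
    """Write unresolved records to <unresolved_dir>/<name>_unresolved.json."""
    out_dir = Path(unresolved_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    out_path = out_dir / f"{name}_unresolved.json"
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=4)
    return out_path


# Hypothetical record, for illustration only.
sample_records = [{"correct_answer": "C", "kb_answer": "It is not known"}]

with tempfile.TemporaryDirectory() as tmp:
    path = save_unresolved(Path(tmp) / "unresolved", "zhipu_ai", sample_records)
    file_name = path.name
    saved = json.loads(path.read_text(encoding="utf-8"))

print(file_name, len(saved))
```

The file naming follows the `<name>_unresolved.json` pattern used in the script, so the output stays grouped by model.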
3. Results