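This project is based on the source code published by Liu Huanyong (Chinese Academy of Sciences):
https://github.com/liuhuanyong/QASystemOnMedicalKG
Steps to run the code after downloading it: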
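(1) Install the Neo4j database and the related packages. Neo4j needs a JDK (Java Development Kit), so install that first. Pay attention to versions: this setup used Neo4j 4 with Java 1.8 and py2neo==4.3.0; newer py2neo releases did not run this code. (Neo4j 4.x officially requires Java 11, so if Neo4j fails to start under Java 1.8, check the Java requirement of your Neo4j release.)
Links on installing Neo4j and learning the basics:
https://so.csdn.net/so/search?q=neo4j&spm=1001.2101.3001.7020
(2) Install the py2neo and pyahocorasick packages for Python. Installing pyahocorasick failed with an error asking for Visual Studio Build Tools:
Install Microsoft Visual C++ first: download Build Tools from https://visualstudio.microsoft.com/downloads/ and, when the installer asks which components to install, check C++ Build Tools under Visual Studio Build Tools.
Some people say that installing pyahocorasick through Anaconda does not require Visual C++; I have not tried this myself.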
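(3) Run the program:
1) In build_medicalgraph and answer_search, change user and password to your own Neo4j username and password.
2) Add the following two lines at the end of build_medicalgraph:
    handler.create_graphnodes()
    handler.create_graphrels()
3) Run build_medicalgraph. It may fail with:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 81: illegal multibyte sequence
If it does, add encoding='utf-8' to every open() call.
4) The dataset is large, so the import runs for several hours. When it finishes, open the Neo4j Browser to see the nodes and the graph.
5) Then run chatbot_graph.py and type in a question to get an answer.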
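The UnicodeDecodeError in step 3) appears because open() without an encoding argument falls back to the platform default, which is GBK on a Chinese-locale Windows system. Below is a minimal sketch of the fix. The helper name load_lines and the temporary file are made up for illustration and only stand in for the project's real data files:

```python
import os
import tempfile


def load_lines(path):
    """Read non-empty, stripped lines, always decoding the file as UTF-8."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Write a small UTF-8 file that stands in for a dictionary file such as dict/disease.txt.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample_dict.txt')
with open(sample_path, 'w', encoding='utf-8') as f:
    f.write('感冒\n\n乙肝\n')

words = load_lines(sample_path)
print(words)
```

Passing the encoding explicitly makes the script behave the same on Windows, Linux, and macOS, whatever the system locale is.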
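Walkthrough of the model code:
(1) Building the knowledge graph starts with data acquisition. The data was collected with a web crawler and is already structured; even for the semi-structured parts, no knowledge extraction from sentences or articles is needed. The data is saved in JSON format and used from there. The graph-building code mainly defines the entity types, attributes, and relations. The source code is already annotated, so that part is not reproduced here. The rest of the code covers question classification, question parsing, querying with the parsed result, and returning the answer. The comments below include my own understanding and represent only my personal view; corrections and other opinions are welcome.
(2) Selected code
Question classification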
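To make the "entity types, attributes, and relations" step concrete, here is a minimal sketch that splits one JSON record into nodes and relations. The record and its field names (name, symptom, acompany) are assumptions for illustration and may not match the real data file; the labels Disease and Symptom and the relation names has_symptom and acompany_with are the ones used by the queries later in this post:

```python
import json

# One hypothetical record; the real data file may use different field names.
raw = '{"name": "感冒", "symptom": ["咳嗽", "发热"], "acompany": ["鼻炎"]}'


def record_to_graph(record):
    """Split one record into (label, name) nodes and (start, relation, end) edges."""
    disease = record['name']
    nodes = {('Disease', disease)}
    rels = []
    for symptom in record.get('symptom', []):
        nodes.add(('Symptom', symptom))
        rels.append((disease, 'has_symptom', symptom))
    for other in record.get('acompany', []):
        nodes.add(('Disease', other))
        rels.append((disease, 'acompany_with', other))
    return nodes, rels


nodes, rels = record_to_graph(json.loads(raw))
print(sorted(nodes))
print(rels)
```

In the real project these nodes and edges would then be written to Neo4j; the sketch stops before that step so it can run without a database.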
import os
import ahocorasick

# Aho-Corasick automaton:
# matches many keywords in a single pass and returns every keyword that occurs in the string.
class QuestionClassifier:
    def __init__(self):
        # cur_dir is the directory that contains this file.
        # os.path.abspath(__file__) is the absolute path of this file; os.path.dirname drops the
        # file name and, unlike splitting on '/', also works with Windows path separators.
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        # Paths of the feature-word dictionaries
        self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
        self.department_path = os.path.join(cur_dir, 'dict/department.txt')
        self.check_path = os.path.join(cur_dir, 'dict/check.txt')
        self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
        self.food_path = os.path.join(cur_dir, 'dict/food.txt')
        self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
        self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
        self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
        # Load the feature words: the seven entity dictionaries, the combined domain vocabulary, and the negation words
        self.disease_wds= [i.strip() for i in open(self.disease_path,encoding='utf-8') if i.strip()]
        self.department_wds= [i.strip() for i in open(self.department_path,encoding='utf-8') if i.strip()]
        self.check_wds= [i.strip() for i in open(self.check_path,encoding='utf-8') if i.strip()]
        self.drug_wds= [i.strip() for i in open(self.drug_path,encoding='utf-8') if i.strip()]
        self.food_wds= [i.strip() for i in open(self.food_path,encoding='utf-8') if i.strip()]
        self.producer_wds= [i.strip() for i in open(self.producer_path,encoding='utf-8') if i.strip()]
        self.symptom_wds= [i.strip() for i in open(self.symptom_path,encoding='utf-8') if i.strip()]
        self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
        self.deny_words = [i.strip() for i in open(self.deny_path,encoding='utf-8') if i.strip()]
        # Build the Aho-Corasick tree over the domain vocabulary
        self.region_tree = self.build_actree(list(self.region_words))
        # Build the word-to-type dictionary, e.g. {'感冒': ['disease'], ...}
        self.wdtype_dict = self.build_wdtype_dict()
        # Question words: keywords that signal which disease attribute or relation the question asks about
        self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
        self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
        self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
        self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
        self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
        self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
                             '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
                             '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
                             '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
                             '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
        self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
        self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
        self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
        self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
        self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
        self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
        self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                          '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']
        print('model init finished ......')
        return
    '''Main classification function'''
    def classify(self, question):
        data = {}
        # check_medical is defined further below.
        # It returns the entities found in the question, e.g. {'感冒': ['disease'], ...}
        medical_dict = self.check_medical(question)
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types that appear in the question
        types = []
        for type_ in medical_dict.values():
            types += type_
        question_type = 'others'
        question_types = []
        # Symptoms
        if self.check_words(self.symptom_qwds, question) and ('disease' in types):
            question_type = 'disease_symptom'
            question_types.append(question_type)
        if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
            question_type = 'symptom_disease'
            question_types.append(question_type)
        # Causes
        if self.check_words(self.cause_qwds, question) and ('disease' in types):
            question_type = 'disease_cause'
            question_types.append(question_type)
        # Complications
        if self.check_words(self.acompany_qwds, question) and ('disease' in types):
            question_type = 'disease_acompany'
            question_types.append(question_type)
        # Recommended or forbidden food for a disease
        if self.check_words(self.food_qwds, question) and 'disease' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'disease_not_food'
            else:
                question_type = 'disease_do_food'
            question_types.append(question_type)
        # Given a food, find the related diseases
        if self.check_words(self.food_qwds+self.cure_qwds, question) and 'food' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'food_not_disease'
            else:
                question_type = 'food_do_disease'
            question_types.append(question_type)
        # Recommended drugs
        if self.check_words(self.drug_qwds, question) and 'disease' in types:
            question_type = 'disease_drug'
            question_types.append(question_type)
        # Which diseases a drug treats
        if self.check_words(self.cure_qwds, question) and 'drug' in types:
            question_type = 'drug_disease'
            question_types.append(question_type)
        # Examinations needed for a disease
        if self.check_words(self.check_qwds, question) and 'disease' in types:
            question_type = 'disease_check'
            question_types.append(question_type)
        # Given an examination, find the related diseases
        if self.check_words(self.check_qwds+self.cure_qwds, question) and 'check' in types:
            question_type = 'check_disease'
            question_types.append(question_type)
        # Disease prevention
        if self.check_words(self.prevent_qwds, question) and 'disease' in types:
            question_type = 'disease_prevent'
            question_types.append(question_type)
        # Treatment duration
        if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
            question_type = 'disease_lasttime'
            question_types.append(question_type)
        # Treatment methods
        if self.check_words(self.cureway_qwds, question) and 'disease' in types:
            question_type = 'disease_cureway'
            question_types.append(question_type)
        # Probability of cure
        if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
            question_type = 'disease_cureprob'
            question_types.append(question_type)
        # Groups susceptible to the disease
        if self.check_words(self.easyget_qwds, question) and 'disease' in types :
            question_type = 'disease_easyget'
            question_types.append(question_type)
        # If no question type matched but a disease was mentioned, return the disease description
        if question_types == [] and 'disease' in types:
            question_types = ['disease_desc']
        # If no question type matched but a symptom was mentioned, return the diseases with that symptom
        if question_types == [] and 'symptom' in types:
            question_types = ['symptom_disease']
        # Store all matched question types in the result dictionary
        data['question_types'] = question_types
        return data
    '''Build the word-to-type mapping.
    From the seven entity dictionaries, build {feature word: [entity types of that word]}.
    It stores the types (disease, department, ...) of every word in region_words.
    '''
    def build_wdtype_dict(self):
        wd_dict = dict()
        # region_words is the union of all entity dictionaries
        for wd in self.region_words:
            wd_dict[wd] = []
            # Append every entity type whose dictionary contains the word; a word can have several types
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.department_wds:
                wd_dict[wd].append('department')
            if wd in self.check_wds:
                wd_dict[wd].append('check')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.food_wds:
                wd_dict[wd].append('food')
            if wd in self.symptom_wds:
                wd_dict[wd].append('symptom')
            if wd in self.producer_wds:
                wd_dict[wd].append('producer')
        return wd_dict
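    # Build the domain actree to speed up filtering, using Python's ahocorasick library.
    # Aho-Corasick is a string-matching algorithm; the library provides two data structures: a trie and an Aho-Corasick automaton.
    # A trie is a dictionary indexed by strings; looking up an entry takes time proportional to the length of the string.
    # An AC automaton finds every string of a given set in a single pass over the text.
    # It is essentially KMP on top of a trie, which makes multi-pattern matching possible.
    # Details of the ahocorasick API are outside the scope of this post;
    # see for example https://blog.csdn.net/pirage/article/details/51657178
    def build_actree(self, wordlist):
        actree = ahocorasick.Automaton()  # create an empty trie
        for index, word in enumerate(wordlist):
            actree.add_word(word, (index, word))  # add the word to the trie
        actree.make_automaton()  # turn the trie into an Aho-Corasick automaton
        return actree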
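    # Question filtering
    # Match domain words with the iter() method of the ahocorasick automaton. When one matched word
    # is contained in another, drop the shorter one and keep the longest.
    # Returns {domain word found in the question: entity types of that word}.
    # Overall approach
    # 1. Initialisation
    #    Dictionaries: diseases, departments, examinations, drugs, foods, branded drugs, symptoms, negation words, plus region_words, which holds all domain words.
    #    Build an actree (region_tree) from every word in region_words to speed up later searches.
    #    Build wdtype_dict, which stores the types (disease, department, ...) of each word in region_words.
    #    Build the synonym lists of question words, so that different phrasings of the same intent are recognised.
    # 2. Analysing the user's question
    #    Question filtering (find the domain entities the user mentioned): use region_tree to find every keyword from region_words,
    #    drop the more general keywords that are contained in longer ones, and look up each keyword's types in wdtype_dict.
    #    Question classification (what the user knows and what they ask for): decided from the question-word lists and wdtype_dict.
    # Original link: https://blog.csdn.net/floracuu/article/details/113574130
    def check_medical(self, question):
        region_wds = []
        # region_tree is the actree built from region_words; it quickly finds the entities that occur in the question.
        # The raw matches are not always what we want: "瓜烧白菜" and "白菜" both match but are different entities.
        # iter() yields tuples of the form (end_index, value), e.g. (3, (23192, '乙肝'))
        for i in self.region_tree.iter(question):
            wd = i[1][1]  # the matched word
            region_wds.append(wd)
        # Collect the words that should be filtered out
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
                # For each pair of different words, if one is contained in the other, the shorter one is discarded.
                # Example: "内科" is contained in "消化内科" and the two are not equal.
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)  # the shorter word
        # Keep only the longest matches
        final_wds = [i for i in region_wds if i not in stop_wds]
        # Build the result dictionary, e.g. {'感冒': ['disease'], ...}
        final_dict = {i:self.wdtype_dict.get(i) for i in final_wds}
        return final_dict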
    # Classification by feature words:
    # checks whether the question contains any of the given feature words.
    def check_words(self, wds, sent):
        for wd in wds:
            if wd in sent:
                return True
        return False

if __name__ == '__main__':
    handler = QuestionClassifier()
    # Read questions and print the classification result
    while 1:
        question = input('input a question:')
        data = handler.classify(question)
        print(data)
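The longest-match filtering done by check_medical can be tried without installing pyahocorasick. The sketch below is a simplified stand-in: it replaces the Aho-Corasick automaton with a plain substring scan (fine for a toy dictionary, much slower for a real one), and the function name find_entities and the three-word dictionary are invented for illustration:

```python
def find_entities(question, wdtype_dict):
    """Return {matched word: entity types}, dropping words contained in a longer match."""
    # Naive scan over the dictionary; the original code uses an Aho-Corasick automaton here.
    matched = [wd for wd in wdtype_dict if wd in question]
    # A word is discarded when it is a proper substring of another matched word.
    shorter = {a for a in matched for b in matched if a != b and a in b}
    return {wd: wdtype_dict[wd] for wd in matched if wd not in shorter}


toy_dict = {
    '内科': ['department'],
    '消化内科': ['department'],
    '感冒': ['disease'],
}
result = find_entities('感冒应该挂消化内科吗', toy_dict)
print(result)
```

Here "内科" is matched as well, but it is dropped because it is contained in "消化内科", which is the behaviour the stop_wds loop implements in the original code.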
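Question parsing
# Convert the user's question into Neo4j query statements:
# 1. Merge the extracted keywords by entity type.
# 2. Loop over the question types and translate each one into a Neo4j query.
"""
parser_main
This is the main question-parsing function.
It first takes the classification result, which holds the domain words found in the question and their entity types.
It then calls build_entitydict, which returns entity_dict in the form {'entity type': ['domain word'], ...}.
Next, for each question_type in ['question_types'] of the classification result,
it calls sql_transfer to produce the corresponding Neo4j Cypher statements.
Finally it collects the queries generated for every question_type.
Original link: https://blog.csdn.net/vivian_ll/article/details/89840281
"""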
class QuestionPaser:
    # Example: args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
    # After merging: entity_dict = {'disease': ['青光眼', '肺气肿'], 'department': ['消化内科']}
    # Original link: https://blog.csdn.net/floracuu/article/details/113828998
    '''Group the entities by type'''
    def build_entitydict(self, args):
        # args is the dictionary {entity word: [entity types]} produced by the classifier
        entity_dict = {}
        # Iterate over each word and its types
        for arg, types in args.items():
            for type in types:
                if type not in entity_dict:
                    entity_dict[type] = [arg]
                else:
                    entity_dict[type].append(arg)
        return entity_dict
    '''Main parsing function'''
    def parser_main(self, res_classify):
        # Entities found in the question
        args = res_classify['args']
        # Group the entities by type
        entity_dict = self.build_entitydict(args)
        question_types = res_classify['question_types']
        sqls = []
        # For every question type, build the matching queries and store them as a dictionary sql_ in sqls.
        # Each sql_ has two fields: question_type and sql.
        for question_type in question_types:
            sql_ = {}  # the trailing underscore distinguishes this dictionary from the list sql below
            sql_['question_type'] = question_type
            sql = []
            if question_type == 'disease_symptom':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'symptom_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('symptom'))
            elif question_type == 'disease_cause':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_acompany':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_not_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_do_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'food_not_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'food_do_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'disease_drug':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'drug_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('drug'))
            elif question_type == 'disease_check':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'check_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('check'))
            elif question_type == 'disease_prevent':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_lasttime':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureway':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureprob':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_easyget':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_desc':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            if sql:
                sql_['sql'] = sql
                sqls.append(sql_)
        return sqls
    '''Translate each question type into Neo4j Cypher queries'''
    def sql_transfer(self, question_type, entities):
        if not entities:
            return []
        # Query statements
        sql = []
        # Causes of a disease
        if question_type == 'disease_cause':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(i) for i in entities]
        # Prevention measures for a disease
        elif question_type == 'disease_prevent':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.prevent".format(i) for i in entities]
        # Treatment duration of a disease
        elif question_type == 'disease_lasttime':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_lasttime".format(i) for i in entities]
        # Probability of cure
        elif question_type == 'disease_cureprob':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cured_prob".format(i) for i in entities]
        # Treatment methods
        elif question_type == 'disease_cureway':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_way".format(i) for i in entities]
        # Susceptible groups
        elif question_type == 'disease_easyget':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.easy_get".format(i) for i in entities]
        # Description of a disease
        elif question_type == 'disease_desc':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.desc".format(i) for i in entities]
        # Symptoms of a disease
        elif question_type == 'disease_symptom':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Diseases that show a given symptom
        elif question_type == 'symptom_disease':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Complications of a disease (searched in both directions of the relation)
        elif question_type == 'disease_acompany':
            sql1 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Foods to avoid with a disease
        elif question_type == 'disease_not_food':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Foods recommended for a disease
        elif question_type == 'disease_do_food':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Given a food to avoid, find the diseases
        elif question_type == 'food_not_disease':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Given a recommended food, find the diseases
        elif question_type == 'food_do_disease':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Common drugs for a disease (drug aliases still need to be added)
        elif question_type == 'disease_drug':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Given a drug, find the diseases it treats
        elif question_type == 'drug_disease':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Examinations needed for a disease
        elif question_type == 'disease_check':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Given an examination, find the diseases
        elif question_type == 'check_disease':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        return sql

# The Cypher statements built here are run later, and the rows they return are formatted into answers in Python.
if __name__ == '__main__':
    handler = QuestionPaser()
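The two ideas in QuestionPaser, grouping entities by type and generating one query per entity, can be sketched compactly. The version below is an alternative illustration, not the original implementation: the function names are invented, and it passes the entity name as a Cypher parameter ($name) instead of formatting it into the query string, assuming the driver accepts query parameters (py2neo's Graph.run takes a parameters dictionary):

```python
from collections import defaultdict


def build_entity_dict(args):
    """Invert {word: [types]} into {type: [words]}."""
    entity_dict = defaultdict(list)
    for word, types in args.items():
        for type_ in types:
            entity_dict[type_].append(word)
    return dict(entity_dict)


def disease_symptom_queries(diseases):
    """Build (query, parameters) pairs, one per disease name."""
    query = ("MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) "
             "WHERE m.name = $name RETURN m.name, r.name, n.name")
    return [(query, {'name': disease}) for disease in diseases]


args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
entity_dict = build_entity_dict(args)
queries = disease_symptom_queries(entity_dict.get('disease', []))
print(entity_dict)
print(queries[0])
```

Parameters keep quotes in entity names from breaking the query, at the cost of changing what search_main has to pass to the database; the original code formats the name directly into the string.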
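Querying with the parsed result
"""
After the question is parsed, the result has to be run against the database.
This script defines the AnswerSearcher class. As in build_medicalgraph.py,
the class holds a Graph instance g and num_limit, the maximum number of items listed in an answer.
The class has two methods: the main search function and the reply module.
search_main
Takes sqls, the result of question parsing, and reads ['question_type'] and ['sql'] from each entry.
It runs every query in ['sql'] with self.g.run(query).data() to get the rows,
then calls answer_prettify, which combines the rows with the reply template chosen by ['question_type'].
Finally it returns the answers.
answer_prettify
Picks the reply template that matches the question_type.
Original link: https://blog.csdn.net/vivian_ll/article/details/89840281
"""
"""
Run the Neo4j queries and assemble natural-language answers
"""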
from py2neo import Graph

class AnswerSearcher:
    # Connect to the database (replace user and password with your own Neo4j credentials)
    def __init__(self):
        self.g = Graph(
            host="127.0.0.1",
            http_port=7474,
            user="neo4j",
            password="101827bdx")
        self.num_limit = 20

    '''Run the Cypher queries and return the answers'''
    def search_main(self, sqls):
        final_answers = []
        for sql_ in sqls:
            question_type = sql_['question_type']
            queries = sql_['sql']
            answers = []
            for query in queries:
                # Run the Cypher query
                ress = self.g.run(query).data()
                answers += ress
            # Pass the question type and all rows returned for it
            final_answer = self.answer_prettify(question_type, answers)
            if final_answer:
                final_answers.append(final_answer)
        return final_answers
    '''Pick the reply template that matches the question_type'''
    def answer_prettify(self, question_type, answers):
        final_answer = []
        if not answers:
            return ''
        if question_type == 'disease_symptom':
            # As in the queries above, m is the disease and n is the node at the other end, here the symptom
            desc = [i['n.name'] for i in answers]
            # {0} and {1} are the positional placeholders filled in by format().
            # set() removes duplicates (the result is a set with no guaranteed order); list() turns it back into a list.
            # The deduplicated symptoms are then joined with ';' into one string.
            subject = answers[0]['m.name']
            final_answer = '{0}的症状包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'symptom_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '症状{0}可能染上的疾病有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cause':
            desc = [i['m.cause'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可能的成因有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_prevent':
            desc = [i['m.prevent'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的预防措施包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_lasttime':
            desc = [i['m.cure_lasttime'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治疗可能持续的周期为:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureway':
            desc = [';'.join(i['m.cure_way']) for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可以尝试如下治疗:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureprob':
            desc = [i['m.cured_prob'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治愈的概率为(仅供参考):{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_easyget':
            desc = [i['m.easy_get'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的易感人群包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_desc':
            desc = [i['m.desc'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0},熟悉一下:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_acompany':
            # Complications are collected from both ends of the relation, excluding the disease itself
            desc1 = [i['n.name'] for i in answers]
            desc2 = [i['m.name'] for i in answers]
            subject = answers[0]['m.name']
            desc = [i for i in desc1 + desc2 if i != subject]
            final_answer = '{0}的并发症包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_not_food':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}忌食的食物包括有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_do_food':
            do_desc = [i['n.name'] for i in answers if i['r.name'] == '宜吃']
            recommand_desc = [i['n.name'] for i in answers if i['r.name'] == '推荐食谱']
            subject = answers[0]['m.name']
            final_answer = '{0}宜食的食物包括有:{1}\n推荐食谱包括有:{2}'.format(subject, ';'.join(list(set(do_desc))[:self.num_limit]), ';'.join(list(set(recommand_desc))[:self.num_limit]))
        elif question_type == 'food_not_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人最好不要吃{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'food_do_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人建议多试试{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'disease_drug':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常的使用的药品包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'drug_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '{0}主治的疾病有{1},可以试试'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_check':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常可以通过以下方式检查出来:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'check_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '通常可以通过{0}检查出来的疾病有{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        return final_answer

if __name__ == '__main__':
    searcher = AnswerSearcher()
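The formatting step of answer_prettify can be tested without a database by feeding it hand-written rows. The sketch below covers only the disease_symptom template; the function name and the sample rows are invented, and it swaps set() for dict.fromkeys so that the output order is reproducible, which is a deliberate deviation from the original code:

```python
def format_symptom_answer(rows, num_limit=20):
    """Turn rows shaped like the disease_symptom query result into one sentence."""
    if not rows:
        return ''
    subject = rows[0]['m.name']
    # dict.fromkeys removes duplicates but, unlike set(), keeps first-seen order.
    symptoms = list(dict.fromkeys(row['n.name'] for row in rows))
    return '{0}的症状包括:{1}'.format(subject, ';'.join(symptoms[:num_limit]))


# Hand-written rows in the shape returned by Graph.run(query).data()
rows = [
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '咳嗽'},
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '发热'},
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '咳嗽'},
]
answer = format_symptom_answer(rows)
print(answer)
```

With set(), the first num_limit items may differ between runs when there are more results than the limit; keeping insertion order makes the truncation stable.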
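(3) The question-answering system in this project is entirely rule-based. It classifies the question by keyword matching (medical questions are a closed-domain scenario, so the question types can be enumerated and classified), then queries Neo4j with Cypher MATCH statements and assembles the answer from the returned data. The framework is implemented by the scripts chatbot_graph.py, answer_search.py, question_classifier.py, and question_parser.py.
Reference links: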
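The flow of classify, parse, search, and reply can be sketched with stand-in components. Everything below is hypothetical: the stub classes only imitate the input and output shapes of QuestionClassifier, QuestionPaser, and AnswerSearcher shown above, and the chat function and fallback text are not taken from chatbot_graph.py:

```python
class StubClassifier:
    def classify(self, question):
        # Pretend that only questions mentioning 感冒 are recognised.
        if '感冒' not in question:
            return {}
        return {'args': {'感冒': ['disease']}, 'question_types': ['disease_symptom']}


class StubParser:
    def parser_main(self, res_classify):
        # One placeholder query per question type instead of real Cypher.
        return [{'question_type': qt, 'sql': ['<cypher for %s>' % qt]}
                for qt in res_classify['question_types']]


class StubSearcher:
    def search_main(self, sqls):
        # A real searcher would run the queries against Neo4j here.
        return ['answer for ' + item['question_type'] for item in sqls]


def chat(question, classifier, parser, searcher, fallback='Sorry, no answer found.'):
    """Run the rule-based pipeline and fall back when any stage yields nothing."""
    res_classify = classifier.classify(question)
    if not res_classify:
        return fallback
    sqls = parser.parser_main(res_classify)
    answers = searcher.search_main(sqls)
    return '\n'.join(answers) if answers else fallback


reply = chat('感冒有什么症状', StubClassifier(), StubParser(), StubSearcher())
miss = chat('今天天气如何', StubClassifier(), StubParser(), StubSearcher())
print(reply)
print(miss)
```

Because each stage only depends on the shape of the previous stage's output, the real classes can be swapped in one at a time while the others remain stubs.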