Innovation Training: Building and Applying a Traditional Chinese Medicine Knowledge Graph (02)

赫兹H

Last modified 2022-06-03 16:27:22


Column: Innovation Training | Tags: knowledge graph

First published 2022-03-18 21:34:02

Article link: https://blog.csdn.net/weixin_45897586/article/details/123582225


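Supported question types
Entity extraction with an AC automaton
Building the domain dictionary
Question filtering
Question classification
Results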

This part is based on this GitHub project.

Supported question types

Because our dataset covers traditional Chinese medicine (TCM), the question types we support all revolve around TCM as well.
(Figure: overview of the supported question types)

Entity extraction with an AC automaton

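We use dictionary matching for recognition: the user's query is matched against the entries of a domain dictionary.
By adding the names of all entities in the domain to an Aho-Corasick (AC) automaton, we can find every entity name that occurs in a question in a single pass over the text, which gives us entity extraction.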

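    import ahocorasick  # provided by the pyahocorasick package

    def build_actree(self, wordlist):
        # Build an Aho-Corasick automaton over all domain words
        actree = ahocorasick.Automaton()
        for index, word in enumerate(wordlist):
            # Store (index, word) as the payload returned on each match
            actree.add_word(word, (index, word))
        # Compute the failure links; required before the automaton can be searched
        actree.make_automaton()
        return actree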
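
To show what `ahocorasick.Automaton` does internally, here is a minimal pure-Python sketch of the Aho-Corasick algorithm: a trie of the dictionary words, failure links computed by breadth-first search, and one left-to-right scan of the text. The class name `AhoCorasick` and the toy dictionary are our own illustration, not part of pyahocorasick or the referenced project. Unlike the automaton above, whose payload is `(index, word)` (hence `i[1][1]` later on), this sketch yields plain `(end_index, word)` pairs.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton (illustrative sketch, not pyahocorasick)."""

    def __init__(self, words):
        # Node 0 is the root. For each node: child transitions, failure link,
        # and the dictionary words that end at this node.
        self.goto = [{}]
        self.fail = [0]
        self.out = [[]]
        for word in words:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(word)

    def _build_failure_links(self):
        # BFS from the root so shallower nodes get their links first;
        # depth-1 nodes keep the default failure link to the root.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                # Follow failure links until some state has a transition on ch
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Also report words that are suffixes of the current match
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def iter(self, text):
        """Yield (end_index, word) for every dictionary word found in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for word in self.out[node]:
                yield i, word


if __name__ == "__main__":
    ac = AhoCorasick(["乌鸡", "乌鸡白凤丸", "月经不调"])
    print(list(ac.iter("乌鸡白凤丸能治疗月经不调吗")))
    # [(1, '乌鸡'), (4, '乌鸡白凤丸'), (11, '月经不调')]
```

The scan reports both "乌鸡" and "乌鸡白凤丸" for the same question, which is exactly why the matches need the filtering step described under "Question filtering".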

Building the domain dictionary

    def build_wdtype_dict(self):
        # Map every domain word to the list of entity types it belongs to,
        # e.g. {'乌鸡白凤丸': ['drug']}; one word may carry several types
        wd_dict = dict()
        for wd in self.region_words:
            wd_dict[wd] = []
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.ingredient_wds:
                wd_dict[wd].append('ingredient')
            if wd in self.taste_wds:
                wd_dict[wd].append('taste')
            if wd in self.people_wds:
                wd_dict[wd].append('people')
        return wd_dict
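
`build_wdtype_dict` tests each word against five lists in turn. If the `self.*_wds` attributes are plain lists, every `in` check is a linear scan, so the loop costs roughly (number of words) × (total dictionary size). Below is a sketch of the same word-to-types mapping built in a single pass over the dictionaries, assuming (hypothetically) that they are available as one dict from type label to word list; the toy words are for illustration only.

```python
from collections import defaultdict


def build_wdtype_dict(typed_words):
    """Map each domain word to the entity types it belongs to.

    typed_words: dict of type label -> iterable of words (hypothetical format).
    """
    wd_dict = defaultdict(list)
    for label, words in typed_words.items():
        for wd in words:
            if label not in wd_dict[wd]:  # avoid repeating a label for one word
                wd_dict[wd].append(label)
    return dict(wd_dict)


if __name__ == "__main__":
    # Toy dictionaries for illustration only; not the project's data
    typed_words = {
        "disease": ["月经不调"],
        "drug": ["乌鸡白凤丸", "人参"],
        "ingredient": ["乌鸡", "人参"],
        "taste": ["甘"],
        "people": ["孕妇"],
    }
    wd_dict = build_wdtype_dict(typed_words)
    print(wd_dict["人参"])  # ['drug', 'ingredient']
```

Each word is touched once per list that contains it, and a word listed under several types (人参 here) receives all of them, just as in the method above.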

Question filtering

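When the AC automaton scans a question, a single entity can produce several matches. For 乌鸡白凤丸 (Wuji Baifeng Wan), both "乌鸡" and "乌鸡白凤丸" match, so the matches have to be filtered and the shorter word "乌鸡" discarded.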

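    def check_medical(self, question):
        # Domain words found in the question
        region_wds = []
        # iter() yields (end_index, value); value is the (index, word) payload
        for i in self.region_tree.iter(question):
            wd = i[1][1]
            region_wds.append(wd)
        # Filter: drop any word that is contained in a longer matched word
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)
        # All remaining domain words in the question
        final_wds = [i for i in region_wds if i not in stop_wds]
        # Map each entity to its type labels, e.g. {'乌鸡白凤丸': ['drug']}
        final_dict = {i: self.wdtype_dict.get(i) for i in final_wds}

        return final_dict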
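
The containment rule in `check_medical` compares words only, not positions, so a short word is dropped even when it also appears on its own elsewhere in the question (in "乌鸡和乌鸡白凤丸", the standalone "乌鸡" is lost). The sketch below reproduces that rule as `filter_nested` (with duplicates removed) and, for comparison, a span-based variant `filter_nested_spans` of our own, not taken from the original project. It uses the end index reported by an Aho-Corasick scan and discards a match only when its character span lies inside a longer match. Both function names are hypothetical.

```python
def filter_nested(region_wds):
    """Word-level rule from check_medical: drop words contained in another match."""
    unique = list(dict.fromkeys(region_wds))  # dedupe, keep first-seen order
    return [w for w in unique
            if not any(w != other and w in other for other in unique)]


def filter_nested_spans(matches):
    """Span-level variant: drop a match only if a longer match covers its span.

    matches: iterable of (end_index, word), end_index = index of the last char.
    """
    spans = [(end - len(word) + 1, end, word) for end, word in matches]
    kept = []
    for start, end, word in spans:
        covered = any(s <= start and end <= e and e - s > end - start
                      for s, e, _ in spans)
        if not covered and word not in kept:
            kept.append(word)
    return kept


if __name__ == "__main__":
    # Matches for "乌鸡和乌鸡白凤丸": 乌鸡 at 0-1 and 3-4, 乌鸡白凤丸 at 3-7
    matches = [(1, "乌鸡"), (4, "乌鸡"), (7, "乌鸡白凤丸")]
    print(filter_nested([w for _, w in matches]))  # ['乌鸡白凤丸']
    print(filter_nested_spans(matches))            # ['乌鸡', '乌鸡白凤丸']
```

For the common case, such as "乌鸡白凤丸能治疗月经不调吗", both versions give the same result; they differ only when the shorter word also occurs outside the longer match.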

Question classification

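The question type is determined from two signals: the question words (cue words) that appear in the question and the types of the entities it mentions.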

    def classify(self, question):
        '''Main classification function: takes a question, returns its entities and question types.'''
        data = {}
        medical_dict = self.check_medical(question)
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types mentioned in the question
        types = []
        for type_ in medical_dict.values():
            types += type_
        question_type = 'others'
        # Question types detected so far
        question_types = []

        # What symptoms can the drug treat
        if self.check_words(self.disease_wds, question) and ('drug' in types):
            question_type = 'drug_disease'
            question_types.append(question_type)
        # Nature and flavor (性味) of the drug
        if self.check_words(self.taste_qwds, question) and ('drug' in types):
            question_type = 'drug_taste'
            question_types.append(question_type)

        # Who should not take the drug
        if self.check_words(self.people_qwds, question) and ('drug' in types):
            question_type = 'drug_people'
            question_types.append(question_type)
        # Which drug to take for a disease
        if self.check_words(self.eat_qwds, question) and ('disease' in types):
            question_type = 'disease_drug'
            question_types.append(question_type)

        # Which ingredients the drug contains
        if self.check_words(self.ingredient_qwds, question) and ('drug' in types):
            question_type = 'drug_ingredient'
            question_types.append(question_type)

        # No specific question type matched but a drug was mentioned:
        # fall back to returning the drug's description
        if question_types == [] and 'drug' in types:
            question_types = ['drug_desc']

        # Combine all matched question types into the result dict
        data['question_types'] = question_types

        return data
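
`check_words` and the question-word lists (`taste_qwds`, `people_qwds`, and so on) are not shown in this post. The standalone sketch below assumes that `check_words` simply reports whether any cue word occurs in the question, and uses small made-up cue-word lists to show how cue words plus entity types yield the question types. The rules follow the order of `classify`, including its use of disease names for `drug_disease`; all list contents and the function name `classify_types` are hypothetical.

```python
# Hypothetical cue-word lists for illustration; the project's real lists are not shown.
DISEASE_WDS = ["月经不调"]  # disease names, as classify() uses disease_wds
TASTE_QWDS = ["性味", "味道"]
PEOPLE_QWDS = ["不能吃", "禁忌", "慎用"]
EAT_QWDS = ["吃什么", "吃啥药", "用什么药"]
INGREDIENT_QWDS = ["成分", "组成"]


def check_words(wds, sent):
    """Assumed behavior: True if any word in wds occurs in the sentence."""
    return any(wd in sent for wd in wds)


def classify_types(question, entity_types):
    """Cue words + entity types -> list of question types (rule order as in classify)."""
    types = [t for labels in entity_types.values() for t in labels]
    rules = [
        (DISEASE_WDS, "drug", "drug_disease"),
        (TASTE_QWDS, "drug", "drug_taste"),
        (PEOPLE_QWDS, "drug", "drug_people"),
        (EAT_QWDS, "disease", "disease_drug"),
        (INGREDIENT_QWDS, "drug", "drug_ingredient"),
    ]
    question_types = [qtype for cue_words, needed, qtype in rules
                      if check_words(cue_words, question) and needed in types]
    # Same fallback as classify(): a drug with no specific intent -> description
    if not question_types and "drug" in types:
        question_types = ["drug_desc"]
    return question_types


if __name__ == "__main__":
    print(classify_types("乌鸡白凤丸有什么成分", {"乌鸡白凤丸": ["drug"]}))
    # ['drug_ingredient']
    print(classify_types("月经不调吃什么", {"月经不调": ["disease"]}))
    # ['disease_drug']
    print(classify_types("乌鸡白凤丸", {"乌鸡白凤丸": ["drug"]}))
    # ['drug_desc']
```

Because each rule is checked independently, a question such as "乌鸡白凤丸的成分和性味" matches both `drug_taste` and `drug_ingredient`, which is why `classify` collects a list of types rather than a single one.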

Results

Entity recognition and intent recognition for questions are now working!
(Figure: sample run output)
