Neo4j Knowledge Graph: A Knowledge Graph of Medical Diseases

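I spent a long time looking for data and could not find a good finance-related dataset for building a knowledge graph (mainly because real data is full of personal information that cannot be published here). I did find a medical disease dataset shared by another developer, so I borrowed it to learn the basics of knowledge graphs by imitation.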

First, a short introduction to what a knowledge graph is.

Introduction to Knowledge Graphs

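A knowledge graph (Knowledge Graph / Vault), also called a scientific knowledge graph, is essentially a semantic network: a graph-based data structure made up of nodes (points) and edges. Each node represents an "entity" that exists in the real world, and each edge represents a "relationship" between two entities. Knowledge graphs are among the most effective ways to represent relationships.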

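Put simply, a knowledge graph is a relationship network obtained by connecting many different kinds of information (heterogeneous information). It makes it possible to analyze a problem from the perspective of relationships.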
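
To make the idea concrete, here is a minimal sketch that holds a tiny graph in plain Python as (entity, relationship, entity) triples. The entity names and the build_index and neighbors helpers are made up for illustration; they are not taken from the dataset used later in this post.

```python
from collections import defaultdict

# Illustrative triples only (hypothetical data): (head entity, relationship, tail entity)
triples = [
    ("Cold", "HAS_SYMPTOM", "Cough"),
    ("Cold", "HAS_SYMPTOM", "Fever"),
    ("Cold", "DEPARTMENT_IS", "Internal Medicine"),
    ("Pneumonia", "HAS_SYMPTOM", "Cough"),
]


def build_index(triples):
    """Index outgoing edges by head entity: head -> list of (relationship, tail)."""
    index = defaultdict(list)
    for head, rel, tail in triples:
        index[head].append((rel, tail))
    return index


def neighbors(index, entity, rel=None):
    """Return the entities reachable from `entity`, optionally filtered by relationship type."""
    return [tail for r, tail in index.get(entity, []) if rel is None or r == rel]


index = build_index(triples)
print(neighbors(index, "Cold", "HAS_SYMPTOM"))  # ['Cough', 'Fever']
```

A graph database such as Neo4j stores the same kind of structure, but persists it and lets you query it with a dedicated language instead of hand-written loops.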

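The graph below is centered on Andy Lau (刘德华). Following the edges outward from that node leads to the information related to him, which the graph presents visually.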

fabf89a0e18ffe04b7d8cbac47a22cbe.png

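The Tool for Building the Knowledge Graph: Neo4j

Put simply, Neo4j is a graph database that stores nodes and relationships and can display them visually. The Python package py2neo makes it easy to work with, so even a newcomer to graphs like me can quickly create Neo4j nodes with the graph.create(node) function, as the figure below shows.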

75938fdda8f9bb70c57cd6b46327439c.png

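To create a relationship between two nodes, use py2neo's Relationship class as follows:

r = Relationship(node_a, "name of the relationship between A and B", node_b)

Connecting the two nodes with this relationship gives the graph shown below.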

7084510cf4ee482336c9c3936b00d0c6.png

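The Medical Data

I learned from the code published by the developer below. If you are interested, you can try modifying that code as well.

https://github.com/zhihao-chen/QASystemOnMedicalGraph

The author provides 19 categories of data, listed below.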

7325b5b941b705955c100286b84e1659.png

Here is the first record printed out, so you can see which fields it contains.

72bb607f8c9849c9345aef95655f68c5.png

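Loading the Medical Data

Read all the medical data from the CSV file with pandas, then extract the entities and the relationships between them.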

from py2neo import Graph, Node, Relationship
import pandas as pd
import re
import os

cur_dir = os.path.dirname(os.path.abspath(__file__))  # directory of this script, on any OS
data_path = os.path.join(cur_dir, 'DATA/disease.csv')
graph = Graph("http://localhost:7474", username="neo4j", password="123456789")


"""
Read the file and extract the entities and the relationships between them.
"""
# cols = ["name", "alias", "part", "age", "infection", "insurance", "department", "checklist", "symptom",
#         "complication", "treatment", "drug", "period", "rate", "money"]
# Entities
diseases = []  # diseases
aliases = []  # aliases
symptoms = []  # symptoms
parts = []  # body parts
departments = []  # hospital departments
complications = []  # complications
drugs = []  # drugs

# Disease properties: age, infection, insurance, checklist, treatment, period, rate, money
diseases_infos = []
# Relationships
disease_to_symptom = []  # disease -> symptom
disease_to_alias = []  # disease -> alias
diseases_to_part = []  # disease -> body part
disease_to_department = []  # disease -> department
disease_to_complication = []  # disease -> complication
disease_to_drug = []  # disease -> drug

# pandas reads empty cells as NaN, and str(NaN) is the non-empty string "nan",
# so missing values are detected with pd.isna() rather than a truthiness check.
# "未知" means "unknown".
all_data = pd.read_csv(data_path, encoding='gb18030').values
for data in all_data:
    disease_dict = {}  # properties of this disease
    # Disease name
    disease = str(data[0]).replace("...", " ").strip()
    diseases.append(disease)
    disease_dict["name"] = disease
    # Aliases: split on full-width and half-width separators
    line = "未知" if pd.isna(data[1]) else re.sub("[,、;,.;]", " ", str(data[1]))
    for alias in line.strip().split():
        aliases.append(alias)
        disease_to_alias.append([disease, alias])
    # Body parts (the fallback is a list, so the loop sees one item, not two characters)
    part_list = ["未知"] if pd.isna(data[2]) else str(data[2]).strip().split()
    for part in part_list:
        parts.append(part)
        diseases_to_part.append([disease, part])
    # Age group
    age = str(data[3]).strip()
    disease_dict["age"] = age
    # Infectiousness
    infect = str(data[4]).strip()
    disease_dict["infection"] = infect
    # Medical insurance coverage
    insurance = str(data[5]).strip()
    disease_dict["insurance"] = insurance
    # Departments
    department_list = str(data[6]).strip().split()
    for department in department_list:
        departments.append(department)
        disease_to_department.append([disease, department])
    # Examination items
    check = str(data[7]).strip()
    disease_dict["checklist"] = check
    # Symptoms (the last token is dropped, as in the original code)
    symptom_list = str(data[8]).replace("...", " ").strip().split()[:-1]
    for symptom in symptom_list:
        symptoms.append(symptom)
        disease_to_symptom.append([disease, symptom])
    # Complications (the last token is dropped, as in the original code)
    complication_list = ["未知"] if pd.isna(data[9]) else str(data[9]).strip().split()[:-1]
    for complication in complication_list:
        complications.append(complication)
        disease_to_complication.append([disease, complication])
    # Treatment (the last four characters are dropped, as in the original code)
    treat = str(data[10]).strip()[:-4]
    disease_dict["treatment"] = treat
    # Drugs (the last token is dropped, as in the original code)
    drug_string = str(data[11]).replace("...", " ").strip()
    for drug in drug_string.split()[:-1]:
        drugs.append(drug)
        disease_to_drug.append([disease, drug])
    # Recovery period
    period = str(data[12]).strip()
    disease_dict["period"] = period
    # Cure rate
    rate = str(data[13]).strip()
    disease_dict["rate"] = rate
    # Cost
    money = "未知" if pd.isna(data[14]) else str(data[14]).strip()
    disease_dict["money"] = money

    diseases_infos.append(disease_dict)

# Deduplicate the entity names. The relationship lists and diseases_infos
# are left as lists; duplicate edges are removed when the relationships are created.
diseases = set(diseases)
symptoms = set(symptoms)
aliases = set(aliases)
parts = set(parts)
departments = set(departments)
complications = set(complications)
drugs = set(drugs)
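
The parsing above repeats the same "clean the cell, then split it" step for several columns. As a rough sketch, that step could be pulled into one helper. The names is_missing, split_cell and SEPARATORS are my own and do not appear in the original repository, and the separator set is an assumption that should be checked against the real data.

```python
import math
import re

# Assumed separators for multi-valued cells: full-width and half-width
# punctuation plus whitespace. Adjust to the actual data.
SEPARATORS = re.compile(r"[,、;,.;\s]+")


def is_missing(value):
    """True for None, blank strings, and the float NaN that pandas uses for empty cells."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() == ""


def split_cell(value, default="未知"):
    """Split one multi-valued cell into a list of clean, non-empty tokens."""
    if is_missing(value):
        return [default]
    tokens = [t for t in SEPARATORS.split(str(value).strip()) if t]
    return tokens or [default]


print(split_cell("头痛、发热, 咳嗽"))  # ['头痛', '发热', '咳嗽']
print(split_cell(float("nan")))        # ['未知']
```

Because the helper always returns a list, the calling loop never iterates over the characters of a fallback string by accident.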

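Creating Nodes

We use create_graphNodes to create the nodes. The functions below are methods of a class (hence self), and self.graph is the Graph connection created earlier. Nodes are built in two steps:

  1. Create the disease nodes with their properties: each disease node stores the detailed information for that disease, as the figure below shows.
  2. Create the remaining nodes: one node per name under each label (Symptom, Part, and so on).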

ee6cecfd66c5ca8740faa87448a3bcba.png
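def create_node(self, label, nodes):
    """
    Create one node per name under the given label.
    :param label: node label, e.g. "Symptom"
    :param nodes: collection of node names
    :return:
    """
    count = 0
    for node_name in nodes:
        node = Node(label, name=node_name)
        self.graph.create(node)
        count += 1
        print(count, len(nodes))  # progress: created so far, total
    return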

def create_diseases_nodes(self, disease_info):
    """
    Create the disease nodes together with their properties.
    :param disease_info: list(Dict), one dict of properties per disease
    :return:
    """
    count = 0
    for disease_dict in disease_info:
        node = Node("Disease", name=disease_dict['name'], age=disease_dict['age'],
                    infection=disease_dict['infection'], insurance=disease_dict['insurance'],
                    treatment=disease_dict['treatment'], checklist=disease_dict['checklist'],
                    period=disease_dict['period'], rate=disease_dict['rate'],
                    money=disease_dict['money'])
        self.graph.create(node)
        count += 1
        print(count)  # progress
    return

def create_graphNodes(self):
    """
    Create the entity nodes of the knowledge graph.
    :return:
    """
    # read_file() is not shown in this post; it is assumed to wrap the parsing
    # code above and return its results in this order.
    (disease, symptom, alias, part, department, complication, drug, rel_alias, rel_symptom, rel_part,
     rel_department, rel_complication, rel_drug, rel_infos) = self.read_file()
    self.create_diseases_nodes(rel_infos)
    self.create_node("Symptom", symptom)
    self.create_node("Alias", alias)
    self.create_node("Part", part)
    self.create_node("Department", department)
    self.create_node("Complication", complication)
    self.create_node("Drug", drug)

    return
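
Creating nodes one at a time means one round trip to the database per node, and running the script twice creates every node twice. A possible alternative, sketched below, sends the names in batches through Cypher's UNWIND and uses MERGE so that existing nodes are reused. merge_nodes is a hypothetical helper of my own, not part of the original code; it assumes a Neo4j version that accepts the $param syntax and a graph object with a py2neo-style run(cypher, **parameters) method.

```python
import re


def merge_nodes(graph, label, names, batch_size=1000):
    """MERGE one node per distinct name under `label`, sending names in batches.

    `graph` is anything with a py2neo-style run(cypher, **parameters) method.
    Returns the number of distinct names sent.
    """
    # Labels cannot be passed as Cypher parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label):
        raise ValueError(f"unsafe label: {label!r}")
    query = f"UNWIND $names AS name MERGE (n:{label} {{name: name}})"
    unique_names = sorted(set(names))
    for start in range(0, len(unique_names), batch_size):
        graph.run(query, names=unique_names[start:start + batch_size])
    return len(unique_names)
```

With the sets built earlier, a call would look like merge_nodes(graph, "Symptom", symptoms).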

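Creating Relationships Between Nodes

Using the disease-to-category pairs prepared earlier, we now create the relationships between nodes. There are two key steps:

  1. Get the pairs that link each disease to each of the other categories.
  2. Look up the two nodes with a Cypher query and create the relationship between them.

The figure below shows the relationships between diseases and their aliases, which come from the disease_to_alias list.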

b9aede0abf2eb13eadc47768de856f81.png
def create_graphRels(self):
    # read_file() is not shown in this post; it is assumed to wrap the parsing
    # code above and return its results in this order.
    (disease, symptom, alias, part, department, complication, drug, rel_alias, rel_symptom, rel_part,
     rel_department, rel_complication, rel_drug, rel_infos) = self.read_file()

    # The last argument is stored as the relationship's name property
    # (alias, symptom, affected part, department, complication, drug).
    self.create_relationship("Disease", "Alias", rel_alias, "ALIAS_IS", "别名")
    self.create_relationship("Disease", "Symptom", rel_symptom, "HAS_SYMPTOM", "症状")
    self.create_relationship("Disease", "Part", rel_part, "PART_IS", "发病部位")
    self.create_relationship("Disease", "Department", rel_department, "DEPARTMENT_IS", "所属科室")
    self.create_relationship("Disease", "Complication", rel_complication, "HAS_COMPLICATION", "并发症")
    self.create_relationship("Disease", "Drug", rel_drug, "HAS_DRUG", "药品")

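def create_relationship(self, start_node, end_node, edges, rel_type, rel_name):
    """
    Create the relationship edges between entities.
    :param start_node: label of the start nodes, e.g. "Disease"
    :param end_node: label of the end nodes, e.g. "Alias"
    :param edges: list of [start name, end name] pairs
    :param rel_type: relationship type, e.g. "ALIAS_IS"
    :param rel_name: value stored in the relationship's name property
    :return:
    """
    count = 0
    # Deduplicate: lists are not hashable, so join each pair into one string first
    set_edges = set('###'.join(edge) for edge in edges)
    total = len(set_edges)  # avoids shadowing the built-in all()
    for edge in set_edges:
        edge = edge.split('###')
        p = edge[0]
        q = edge[1]

        # Look up both nodes with Cypher and create the edge between them.
        # The names are pasted into the query text, so a name containing a
        # single quote yields an invalid query, which the except below prints.
        query = "match(p:%s),(q:%s) where p.name='%s'and q.name='%s' create (p)-[rel:%s{name:'%s'}]->(q)" % (
            start_node, end_node, p, q, rel_type, rel_name)
        try:
            self.graph.run(query)
            count += 1
            print(rel_type, count, total)  # progress
        except Exception as e:
            print(e)
    return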
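
Building the query by pasting names into the text breaks as soon as a name contains a single quote. One way around that, sketched below, is to pass the names as Cypher parameters. build_rel_query and create_relationships are hypothetical helpers of my own, not the original author's code. Note two deliberate differences: the sketch uses MERGE instead of CREATE so that re-running it does not duplicate edges, and it deduplicates with tuples instead of '###'-joined strings. It assumes a Neo4j version that accepts the $param syntax and a graph object with a py2neo-style run(cypher, **parameters) method.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_rel_query(start_label, end_label, rel_type):
    """Build a Cypher statement that takes the node names and edge name as parameters."""
    # Labels and relationship types cannot be parameters, so validate them first.
    for ident in (start_label, end_label, rel_type):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return (
        f"MATCH (p:{start_label} {{name: $p_name}}), (q:{end_label} {{name: $q_name}}) "
        f"MERGE (p)-[rel:{rel_type} {{name: $rel_name}}]->(q)"
    )


def create_relationships(graph, start_label, end_label, edges, rel_type, rel_name):
    """Send one statement per distinct (start, end) pair; return how many were sent."""
    query = build_rel_query(start_label, end_label, rel_type)
    unique_edges = sorted({(p, q) for p, q in edges})  # tuples are hashable
    for p_name, q_name in unique_edges:
        graph.run(query, p_name=p_name, q_name=q_name, rel_name=rel_name)
    return len(unique_edges)
```

A call mirroring the original would be create_relationships(graph, "Disease", "Alias", disease_to_alias, "ALIAS_IS", "别名").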

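Viewing the Knowledge Graph

Once the graph is built, open http://localhost:7474 in a browser to launch the Neo4j browser and explore it.

The figure below shows the disease-to-department relationships: the Dermatology department (皮肤科) and the diseases linked to it.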

c97628e1a1a9257dd802c03fa9251f03.png
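
The same view can also be fetched from Python. The sketch below assumes the graph was built with the Disease and Department labels and the DEPARTMENT_IS relationship type used above. diseases_in_department is a hypothetical helper of my own, and it expects a graph object whose run(cypher, **parameters) result can be iterated into records indexable by column name, as py2neo's is.

```python
def diseases_in_department(graph, department):
    """Return the names of diseases linked to `department` by a DEPARTMENT_IS edge."""
    query = (
        "MATCH (d:Disease)-[:DEPARTMENT_IS]->(dep:Department {name: $department}) "
        "RETURN d.name AS name ORDER BY name"
    )
    return [record["name"] for record in graph.run(query, department=department)]
```

For the figure above, the call would be diseases_in_department(graph, "皮肤科").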

That completes the relationships of the knowledge graph. If you are interested, try building a knowledge graph of your own!
