From Text to Knowledge Base: A Guide to Building Your Knowledge Graph

Article link: https://blog.csdn.net/tt_jishu/article/details/144406776

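## Introduction

In today's data-driven world, knowledge graphs have become a powerful tool for organizing and analyzing information. This article walks you through building a knowledge graph from unstructured text; the resulting graph can then serve as the knowledge base of a RAG (retrieval-augmented generation) application. We use the Neo4j graph database to store and manage the graph data, and along the way we discuss likely challenges and how to address them.

⚠️ Security note: Building a knowledge graph requires write access to the database, which carries inherent risk. Validate and verify the data before importing it. For general security best practices, see [here](#).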
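
One way to act on the security note is to check extracted items before any write reaches the database. The sketch below is illustrative only: `is_safe_label`, `validate_triples`, and the label pattern are hypothetical names and rules, not part of Neo4j or LangChain, and the triples are plain tuples rather than the graph objects used later in this guide.

```python
import re

# Assumption: labels and relationship types are restricted to plain
# identifiers (letters, digits, underscore; not starting with a digit).
# This rule is illustrative and is not imposed by Neo4j.
LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_label(label):
    """Return True if the label is a plain identifier."""
    return isinstance(label, str) and LABEL_PATTERN.fullmatch(label) is not None


def validate_triples(triples):
    """Split (source, relation, target) tuples into accepted and rejected lists."""
    accepted, rejected = [], []
    for source, relation, target in triples:
        has_ids = all(isinstance(v, str) and v.strip() for v in (source, target))
        if has_ids and is_safe_label(relation):
            accepted.append((source, relation, target))
        else:
            rejected.append((source, relation, target))
    return accepted, rejected


# Hand-written sample input, not real model output.
sample = [
    ("Marie Curie", "SPOUSE", "Pierre Curie"),
    ("Marie Curie", "WORKED AT; DROP", "University of Paris"),
    ("", "BORN_IN", "1867"),
]
accepted, rejected = validate_triples(sample)
print(f"accepted={len(accepted)} rejected={len(rejected)}")
```

Rejected items are returned rather than silently dropped, so they can be logged or reviewed by hand before deciding what to import.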

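## Main content

### 1. Steps for building a knowledge graph

1. Extract structured information from text: use a model to extract structured graph information from the text.
2. Store it in a graph database: persist the extracted graph information in a graph database for downstream RAG applications.

### 2. Setting up the environment

First, install the required packages and set the environment variables. This example uses the Neo4j graph database.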

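```bash
%pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental neo4j
```

Note: You may need to restart the kernel to use the updated packages.

This guide uses OpenAI models by default.

```python
import getpass
import os

# Prompt for the key so it is never hard-coded in the notebook.
os.environ["OPENAI_API_KEY"] = getpass.getpass()
```

Next, define the Neo4j credentials and connection. If you do not have a database yet, set one up by following the Neo4j installation guide.

```python
from langchain_community.graphs import Neo4jGraph

# Neo4jGraph reads these variables when it is called with no arguments.
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

graph = Neo4jGraph()
```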
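
Because `Neo4jGraph()` is called here with no arguments and picks up its settings from the environment, a missing variable tends to surface only as a connection error. A small pre-flight check can make that failure clearer. This is a sketch using only the standard library; `missing_neo4j_settings` is a hypothetical helper, not a LangChain function.

```python
import os

# The three variables this guide sets before creating the graph object.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_neo4j_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping so the check can be tried without
# touching the real environment.
example_env = {"NEO4J_URI": "bolt://localhost:7687", "NEO4J_USERNAME": "neo4j"}
print(missing_neo4j_settings(example_env))
```

Calling `missing_neo4j_settings()` with no argument checks the real environment; an empty list means all three variables are set.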

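### 3. Using an LLM for graph transformation

Using a large language model (LLM) to extract graph data turns unstructured information into a structured format. LLMGraphTransformer converts text documents into structured graph documents by parsing and classifying entities and their relationships.

```python
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

# temperature=0 keeps the extraction as repeatable as the model allows.
llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")

llm_transformer = LLMGraphTransformer(llm=llm)
```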

### 4. Graph construction example

The following Python code builds a knowledge graph from a short passage:

```python
from langchain_core.documents import Document

text = """
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
# Each input document yields one graph document with nodes and relationships.
graph_documents = llm_transformer.convert_to_graph_documents(documents)
print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")
```
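
The printed node and relationship lists can be hard to scan. As a rough sketch, the helper below renders relationships as one line each. It relies only on the attributes `source`, `target`, `type`, and `id`, which is how graph documents expose them in the LangChain version this was written against (treat that as an assumption and check your installed version). `format_relationships` is a hypothetical helper, and the stand-in objects are hand-written for illustration, not actual model output.

```python
from types import SimpleNamespace


def format_relationships(graph_document):
    """Render each relationship as 'source -[TYPE]-> target'."""
    return [
        f"{rel.source.id} -[{rel.type}]-> {rel.target.id}"
        for rel in graph_document.relationships
    ]


# Hand-written stand-ins that mimic the attribute layout; a real run would
# pass graph_documents[0] from the example above instead.
marie = SimpleNamespace(id="Marie Curie", type="Person")
pierre = SimpleNamespace(id="Pierre Curie", type="Person")
demo_document = SimpleNamespace(
    nodes=[marie, pierre],
    relationships=[SimpleNamespace(source=marie, target=pierre, type="SPOUSE")],
)

for line in format_relationships(demo_document):
    print(line)
```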

### 5. Storing to the graph database

```python
graph.add_graph_documents(graph_documents)
```
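
After the write, it is worth confirming that something actually landed in the database. The sketch below assumes the graph object has a `query` method that takes a Cypher string and returns a list of dictionaries, as `Neo4jGraph` does. `count_nodes` and `FakeGraph` are hypothetical names, and the fake exists only so the helper can be exercised without a running database.

```python
COUNT_NODES_QUERY = "MATCH (n) RETURN count(n) AS node_count"


def count_nodes(graph):
    """Return the number of nodes reported by the database."""
    rows = graph.query(COUNT_NODES_QUERY)
    return rows[0]["node_count"] if rows else 0


class FakeGraph:
    """Stand-in for a database connection; returns a canned row."""

    def __init__(self, node_count):
        self.node_count = node_count
        self.seen_queries = []

    def query(self, cypher):
        self.seen_queries.append(cypher)
        return [{"node_count": self.node_count}]


fake = FakeGraph(node_count=4)
print(count_nodes(fake))
```

With a live connection, `count_nodes(graph)` would be called on the `graph` object created during setup.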

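## Common issues and solutions

1. Node and relationship definitions: You can define which node and relationship types to extract to suit your needs.
2. Non-deterministic output: Because an LLM does the extraction, repeated runs over the same text may produce different graphs. Post-process and validate the output where needed.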
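
Both points can be sketched together: constrain the schema before extraction, then normalize what comes back. `allowed_nodes` and `allowed_relationships` are constructor parameters of `LLMGraphTransformer`; the specific type lists below are example choices for the Marie Curie passage, not a recommendation. `build_constrained_transformer` and `dedupe_triples` are hypothetical helpers, and the triples are hand-written tuples rather than real model output.

```python
# Example schema for the Marie Curie passage; adjust to your own domain.
ALLOWED_NODES = ["Person", "Country", "Organization"]
ALLOWED_RELATIONSHIPS = ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"]


def build_constrained_transformer(llm):
    """Create a transformer limited to the types listed above."""
    # Imported lazily so the rest of this sketch runs without LangChain.
    from langchain_experimental.graph_transformers import LLMGraphTransformer

    return LLMGraphTransformer(
        llm=llm,
        allowed_nodes=ALLOWED_NODES,
        allowed_relationships=ALLOWED_RELATIONSHIPS,
    )


def dedupe_triples(triples):
    """Drop repeats that differ only in whitespace or letter case.

    The first spelling seen for each triple is the one that is kept.
    """
    seen = set()
    unique = []
    for source, relation, target in triples:
        key = (
            " ".join(source.split()).casefold(),
            relation.strip().upper(),
            " ".join(target.split()).casefold(),
        )
        if key not in seen:
            seen.add(key)
            unique.append((source, relation, target))
    return unique


# Hand-written triples imitating two runs that disagree on spelling.
runs = [
    ("Marie Curie", "SPOUSE", "Pierre Curie"),
    ("marie curie", "spouse", "Pierre  Curie"),
    ("Marie Curie", "WORKED_AT", "University of Paris"),
]
print(dedupe_triples(runs))
```

Constraining the types narrows what the model may emit, while the deduplication step handles the variation that remains; neither removes the need to review results for accuracy.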

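## Summary and further learning resources

Likely challenges include extracting accurate relationships from complex text and keeping storage efficient at scale. Where network restrictions in some regions make direct API calls unreliable, routing requests through an API proxy service can improve access stability.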


## References

1. Neo4j installation guide
2. OpenAI API reference
3. LangChain documentation
