langchain+neo4j 实现图谱GraphRAG

整体思路参考如下
![外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传](https://img-home.csdnimg.cn/images/20230724024159.png?origin_url=attachment%3A

1.首先单独conda create 一个环境,用来安装依赖

pip install --upgrade --quiet  langchain langchain-community langchain-openai langchain-experimental neo4j

2.首先是基础代码,这一步先查看当前模型的能力是否能够提取出文本中的实体和关系。(在这之前,笔者尝试了deepseek,glm4等api)其中glm-4 api10次大概有2次能够识别正确结果。
后切换为qwen api,初步代码如下,参考的是langchain + neo4j

#!/usr/bin/python
# -*- coding: <utf-8> -*-
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.llms import Tongyi
import os

# 设置api_key
os.environ["DASHSCOPE_API_KEY"] = "sk-key"
llm = Tongyi()

#链接neo4j 账号密码
username = "user"
password = "password"
url = "your-url"
database = "neo4j"

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)

# LLMGraphTransformer模块 构建图谱
llm_transformer = LLMGraphTransformer(llm=llm)
# 导入文档————参考链接中居里夫人那段文本
raw_documents = TextLoader(r"E:\task1\data\1.txt").load()
# 将文本分割成每个包含20个tokens的块,并且这些块之间没有重叠
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=20, chunk_overlap=0
)
# Chunk the document
documents = text_splitter.split_documents(raw_documents)
# 打印原始文档内容
print(f"Raw Document Content: {raw_documents[0].page_content}")
# 打印分块后的文档内容
for i, doc in enumerate(documents):
    print(f"Chunk {i}: {doc.page_content}")
# 打印转换过程中的中间结果
graph_documents = llm_transformer.convert_to_graph_documents(documents)
for i, graph_doc in enumerate(graph_documents):
    print(f"Graph Document {i}:")
    print(f"Nodes: {graph_doc.nodes}")
    print(f"Relationships: {graph_doc.relationships}")

运行结果
以上是运行结果,可以看到提取的节点大多都囊括了。
但是对于一个较长的语句,是我目前不需要的。

3.后续需要调整代码,限制节点的类型,以及关系的类型。(根据参考链接进行修改)

#!/usr/bin/python
# -*- coding: <utf-8> -*-
from langchain.docstore.document import Document
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.llms import Tongyi
import os

# 设置api_key
os.environ["DASHSCOPE_API_KEY"] = "sk-key"
llm = Tongyi()

#链接neo4j 账号密码
username = "user"
password = "password"
url = "your-url"
database = "neo4j"

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)

# 导入文档 
raw_documents = TextLoader(r"E:\task1\data\1.txt").load()
# 将文本分割成每个包含20个tokens的块,并且这些块之间没有重叠
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=20, chunk_overlap=0
)
# Chunk the document
documents = text_splitter.split_documents(raw_documents)
# 打印原始文档内容
print(f"Raw Document Content: {raw_documents[0].page_content}")
# 打印分块后的文档内容
for i, doc in enumerate(documents):
    print(f"Chunk {i}: {doc.page_content}")
# 打印转换过程中的中间结果
llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Country", "Organization"],
    allowed_relationships=["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
)
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(
    documents
)
for i, graph_doc in enumerate(graph_documents_filtered):
    print(f"Graph Document {i}:")
    print(f"Nodes: {graph_doc.nodes}")
    print(f"Relationships: {graph_doc.relationships}")

限定的结果输出
以上是限制好了节点的类型,关系有哪些。

4.最后添加到你链接的Neo4j图谱中。

graph.add_graph_documents(
  graph_documents_filtered, 
  baseEntityLabel=True, 
  include_source=True
)

neo4j图谱

5.构建非结构化检索(verctor_index)

# 文本向量化 混合检索 文本和关键词检索器构建
vector_index = Neo4jVector.from_existing_graph(
    embeddings,
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)

6.结构化检索器(图谱)

① 提取question中的实体,后续用于图谱中去提取结构化的那些数据

from langchain.prompts import (
    PromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field
from typing import List
from langchain.llms import OpenAI  # 假设使用OpenAI的LLM
import json

# Extract entities from text 定义抽取实体的类
class Entities(BaseModel):
    """Identifying information about entities."""
    names: List[str] = Field(
        ...,
        description="All the person, organization, or business entities that "
        "appear in the text",
    )

    @classmethod
    def from_json(cls, json_data: dict):
        names = []
        if 'entities' in json_data:
            for entity in json_data['entities']:
                names.append(entity['entity'])
        return cls(names=names)

entity_query = "Where did Marie Curie work?"
# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Entities)

# 更新 ChatPromptTemplate 以包含新的提示词
chat_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting organization and person entities from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}. Please ensure the output is in valid JSON format.",
        ),
    ]
)

# 创建提示模板
prompt_template = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# 格式化输入
_input = chat_prompt.format_prompt(question=entity_query)

# 获取 LLM 输出
output = llm(_input.to_string())

# 解析输出
parsed_output = json.loads(output)

# 使用自定义的 from_json 方法解析输出
entities = Entities.from_json(parsed_output)

# 获取 names 列表
names = entities.names
print(names)

② 构建结构化的检索器

def generate_full_text_query(input: str) -> str:
    """
    Generate a full-text search query for a given input string.

    This function constructs a query string suitable for a full-text search.
    It processes the input string by splitting it into words and appending a
    similarity threshold (~2 changed characters) to each word, then combines
    them using the AND operator. Useful for mapping entities from user questions
    to database values, and allows for some misspelings.
    """
    full_text_query = ""
    words = [el for el in remove_lucene_chars(input).split() if el]
    for word in words[:-1]:
        full_text_query += f" {word}~2 AND"
    full_text_query += f" {words[-1]}~2"
    return full_text_query.strip()

# Fulltext index query
def structured_retriever(names: List) -> str:
    """
    Collects the neighborhood of entities mentioned
    in the question
    """
    result = ""
    # # 前面提取到的实体
    # entities = entity_chain.invoke({"question": question})
    # entities = names
    # for entity in entities.names:
    for entity in names:
        response = graph.query(
            """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
            YIELD node,score
            CALL {
              WITH node
              MATCH (node)-[r:!MENTIONS]->(neighbor)
              RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
              UNION ALL
              WITH node
              MATCH (node)<-[r:!MENTIONS]-(neighbor)
              RETURN neighbor.id + ' - ' + type(r) + ' -> ' +  node.id AS output
            }
            RETURN output LIMIT 50
            """,
            {"query": generate_full_text_query(entity)},
        )
        result += "\n".join([el['output'] for el in response])
    return result

7.将非结构化检索器和结构化检索结合在一起

def retriever(question: str,names:List):
    print(f"Search query: {question}")
    structured_data = structured_retriever(names)
    unstructured_data = [el.page_content for el in vector_index.similarity_search(question)]
    final_data = f"""Structured data:
{structured_data}
Unstructured data:
{"#Document ". join(unstructured_data)}
    """
    return final_data

8.最后构建RAG链

# Condense a chat history and follow-up question into a standalone question
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question,
in its original language.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""  # noqa: E501
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

def _format_chat_history(chat_history: List[Tuple[str, str]]) -> List:
    buffer = []
    for human, ai in chat_history:
        buffer.append(HumanMessage(content=human))
        buffer.append(AIMessage(content=ai))
    return buffer

_search_query = RunnableBranch(
    # If input includes chat_history, we condense it with the follow-up question
    (
        RunnableLambda(lambda x: bool(x.get("chat_history"))).with_config(
            run_name="HasChatHistoryCheck"
        ),  # Condense follow-up question and chat into a standalone_question
        RunnablePassthrough.assign(
            chat_history=lambda x: _format_chat_history(x["chat_history"])
        )
        | CONDENSE_QUESTION_PROMPT
        | llm
        | StrOutputParser(),
    ),
    # Else, we have no chat history, so just pass through the question
    RunnableLambda(lambda x : x["question"]),
)


template = """Answer the question based only on the following context:
{context}

Question: {question}
Use natural language and be concise.
Answer:"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
    RunnableParallel(
        {
            "context": _search_query | retriever,
            "question": RunnablePassthrough(),
        }
    )
    | prompt
    | llm
    | StrOutputParser()
)

提问

chain.invoke({"question": "Where did Marie Curie work?"})

在这里插入图片描述
最后实现了整个graphRAG的链路。

参考文档:https://blog.langchain.dev/enhancing-rag-based-applications-accuracy-by-constructing-and-leveraging-knowledge-graphs/
https://blog.csdn.net/KQe397773106/article/details/138051927
源代码参考复现:https://github.com/tomasonjo/blogs/blob/master/llm/enhancing_rag_with_graph.ipynb?ref=blog.langchain.dev

  • 5
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Vue是一种用于构建用户界面的JavaScript框架,而Spring Boot是一个用于构建企业级Java应用的框架,Neo4j是一种图形数据库。想要将它们结合起来实现视图可视化,可以按照以下步骤进行: 1. 使用Vue来创建前端界面:使用Vue的组件和路由功能,可以方便地创建用户界面。可以使用Vue的模板语法和数据绑定来实现动态更新界面的功能。 2. 使用Spring Boot来创建后端服务:Spring Boot提供了丰富的功能和库,可以轻松地构建和管理后端服务。可以使用Spring Boot的依赖注入和控制反转来管理组件之间的依赖关系。 3. 使用Neo4j来存储和管理数据:Neo4j是一种图形数据库,可以存储关系型数据,并提供灵活的查询功能。可以使用Neo4j的图形查询语言来查询和操作数据库的数据。 4. 将Vue和Spring Boot连接起来:可以使用Vue的Ajax或Axios等工具来发送和接收数据。可以使用Spring Boot的REST API来处理前端请求,并与Neo4j进行交互。 5. 实现数据的可视化:根据需要,在前端界面使用合适的图表库或可视化工具来展示从后端获取的数据。可以根据数据类型选择合适的图表类型,如柱状图、折线图或饼图等。 通过以上步骤,就可以实现将Vue、Spring Boot和Neo4j结合起来,实现视图可视化的功能。前端界面使用Vue进行开发,后端服务使用Spring Boot进行构建和管理,而数据存储和查询则使用Neo4j进行处理。最终,可以通过前端界面展示从后端获取的数据,并以图表或其他形式进行可视化展示。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值