153 PropertyGraphIndex: A Deep Dive into `__init__`

Article link: https://blog.csdn.net/xycxycooo/article/details/142550097

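The Property Graph Index in LlamaIndex: A Deep Dive into the `__init__` Method

In modern data science and AI, property graphs have become an important tool for handling complex information. A property graph represents entities and their relationships in a structured form, which makes information easier to retrieve and reason about. This article walks through the `__init__` method of the `PropertyGraphIndex` class in LlamaIndex, so that programmers can understand both how it works and how to use it.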

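Prerequisites

Before starting, make sure you have the following:

Python basics: familiarity with Python programming.
OpenAI API key: you need an OpenAI API key to use the OpenAI models.
LlamaIndex: install the library with `pip install llama-index`.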

Environment Setup

First, set up the environment by installing the required package and configuring the OpenAI API key.

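# Install LlamaIndex (notebook magic; run `pip install llama-index` in a shell instead)
%pip install llama-index

# Set the OpenAI API key (replace the placeholder with your own key)
import os
os.environ["OPENAI_API_KEY"] = "sk-..."

# Configure logging so that INFO-level messages are printed to stdout
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.INFO)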
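
Hardcoding the key in a notebook is convenient but easy to leak. As an alternative, here is a minimal sketch, using only the standard library, that reads the key from the environment and fails early when it is missing. The helper name `require_api_key` is hypothetical and not part of LlamaIndex or the OpenAI SDK.

```python
import os
from typing import Mapping


def require_api_key(
    env: Mapping[str, str] = os.environ,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"OPENAI_API_KEY": "sk-demo"}
print(require_api_key(demo_env))
```

In a real session you would call `require_api_key()` with no arguments once, near the top of the notebook, so that a missing key surfaces before any LLM call is made.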

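The `__init__` Method of `PropertyGraphIndex`

The `__init__` method of `PropertyGraphIndex` initializes a property graph index object. It accepts a number of parameters, including the nodes, a language model, a property graph store, and a vector store, and configures the index according to them.

Parameters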

Parameter	Type	Description	Default
nodes	Optional[Sequence[BaseNode]]	Nodes to insert into the index.	None
llm	Optional[LLM]	Language model used to extract triplets. Falls back to Settings.llm.	None
kg_extractors	Optional[List[TransformComponent]]	Transform components used to extract triplets. Falls back to [SimpleLLMPathExtractor(llm=llm), ImplicitPathExtractor()].	None
property_graph_store	Optional[PropertyGraphStore]	Property graph store to use. If omitted and the storage context has none, a new SimplePropertyGraphStore is created.	None
vector_store	Optional[BasePydanticVectorStore]	Vector store to use when the graph store does not support vector queries.	None
use_async	bool	Whether to run the transformations asynchronously.	True
embed_model	Optional[EmbedType]	Embedding model used to embed nodes. If omitted and embed_kg_nodes=True, Settings.embed_model is used.	None
embed_kg_nodes	bool	Whether to embed the KG nodes.	True
callback_manager	Optional[CallbackManager]	Callback manager to use.	None
transformations	Optional[List[TransformComponent]]	Transformations applied to the nodes before insertion. They run before kg_extractors.	None
storage_context	Optional[StorageContext]	Storage context to use.	None
show_progress	bool	Whether to show a progress bar for the transformations.	False
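
Several of these defaults are resolved inside the constructor with Python's `or` idiom, for example `kg_extractors or [...]`. The sketch below imitates that idiom with placeholder strings instead of real extractor objects; the function name `resolve_extractors` and the constant `DEFAULT_EXTRACTORS` are hypothetical and exist only for illustration. It highlights one consequence worth knowing: an empty list is falsy, so it falls back to the defaults exactly as `None` does.

```python
from typing import List, Optional

# Placeholder names standing in for the real default extractor objects.
DEFAULT_EXTRACTORS = ["SimpleLLMPathExtractor", "ImplicitPathExtractor"]


def resolve_extractors(kg_extractors: Optional[List[str]] = None) -> List[str]:
    """Mimic `kg_extractors or [defaults]`: any falsy value selects the defaults."""
    return kg_extractors or list(DEFAULT_EXTRACTORS)


print(resolve_extractors(None))             # defaults
print(resolve_extractors([]))               # also defaults, because [] is falsy
print(resolve_extractors(["MyExtractor"]))  # the caller's list is kept
```

If this reading is right, passing `kg_extractors=[]` does not disable triplet extraction; it selects the two default extractors.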

Code Walkthrough

def __init__(
    self,
    nodes: Optional[Sequence[BaseNode]] = None,
    llm: Optional[LLM] = None,
    kg_extractors: Optional[List[TransformComponent]] = None,
    property_graph_store: Optional[PropertyGraphStore] = None,
    # vector related params
    vector_store: Optional[BasePydanticVectorStore] = None,
    use_async: bool = True,
    embed_model: Optional[EmbedType] = None,
    embed_kg_nodes: bool = True,
    # parent class params
    callback_manager: Optional[CallbackManager] = None,
    transformations: Optional[List[TransformComponent]] = None,
    storage_context: Optional[StorageContext] = None,
    show_progress: bool = False,
    **kwargs: Any,
) -> None:
    """Init params."""
    storage_context = storage_context or StorageContext.from_defaults(
        property_graph_store=property_graph_store
    )

    # lazily initialize the graph store on the storage context
    if property_graph_store is not None:
        storage_context.property_graph_store = property_graph_store
    elif storage_context.property_graph_store is None:
        storage_context.property_graph_store = SimplePropertyGraphStore()

    if vector_store is not None:
        storage_context.vector_stores[DEFAULT_VECTOR_STORE] = vector_store

    if embed_kg_nodes and (
        storage_context.property_graph_store.supports_vector_queries
        or embed_kg_nodes
    ):
        self._embed_model = (
            resolve_embed_model(embed_model)
            if embed_model
            else Settings.embed_model
        )
    else:
        self._embed_model = None  # type: ignore

    self._kg_extractors = kg_extractors or [
        SimpleLLMPathExtractor(llm=llm or Settings.llm),
        ImplicitPathExtractor(),
    ]
    self._use_async = use_async
    self._llm = llm
    self._embed_kg_nodes = embed_kg_nodes
    self._override_vector_store = (
        vector_store is not None
        or not storage_context.property_graph_store.supports_vector_queries
    )

    super().__init__(
        nodes=nodes,
        callback_manager=callback_manager,
        storage_context=storage_context,
        transformations=transformations,
        show_progress=show_progress,
        **kwargs,
    )
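
Two boolean expressions in the constructor are easy to misread: the condition that decides whether an embedding model is kept, and the `_override_vector_store` flag. The following sketch copies both expressions into a plain function (the name `resolve_flags` and its parameter `vector_store_given` are hypothetical) and enumerates every input combination. It uses only the standard library and does not touch LlamaIndex.

```python
from itertools import product
from typing import Tuple


def resolve_flags(
    embed_kg_nodes: bool,
    supports_vector_queries: bool,
    vector_store_given: bool,
) -> Tuple[bool, bool]:
    """Return (uses_embed_model, override_vector_store) as computed in __init__."""
    # Same shape as the condition in the constructor.
    uses_embed_model = embed_kg_nodes and (supports_vector_queries or embed_kg_nodes)
    # True when a vector store was passed or the graph store cannot do vector queries.
    override_vector_store = vector_store_given or not supports_vector_queries
    return uses_embed_model, override_vector_store


for embed, supports, given in product([False, True], repeat=3):
    uses, override = resolve_flags(embed, supports, given)
    print(f"embed_kg_nodes={embed!s:5} supports={supports!s:5} "
          f"vector_store_given={given!s:5} -> embed_model={uses!s:5} override={override}")
```

The table it prints shows that the first flag always equals `embed_kg_nodes`: because the inner check is OR-ed with `embed_kg_nodes` itself, the graph store's `supports_vector_queries` value does not change the outcome.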

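Design Rationale

Initialize the storage context:
- If storage_context is not provided, a default one is created with StorageContext.from_defaults, passing along property_graph_store.
- An explicitly passed property_graph_store overrides the one on the storage context. If neither is available, a new SimplePropertyGraphStore is created.
Configure the vector store:
- If vector_store is provided, it is registered in the storage context's vector stores under the default key.
Configure the embedding model:
- If embed_kg_nodes is True, the provided embedding model is used, or Settings.embed_model when none was given; otherwise the embedding model is set to None. The condition in the code also checks supports_vector_queries, but since that check is OR-ed with embed_kg_nodes, the result depends only on embed_kg_nodes.
Configure the triplet extractors:
- Use the extractors supplied by the user, or fall back to the defaults (SimpleLLMPathExtractor and ImplicitPathExtractor).
Configure async mode, the language model, and the vector store override:
- Store whether transformations run asynchronously.
- Store the language model.
- Record in _override_vector_store whether a separate vector store is used: this is the case when vector_store was passed or the graph store does not support vector queries.
Call the parent class initializer:
- Call the parent class's __init__, passing the nodes, callback manager, storage context, transformations, and progress bar setting.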
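
The precedence between the explicit store, the store already on the storage context, and the fallback store can be sketched with stand-in classes. Everything here is a simplified assumption for illustration: `StubGraphStore`, `StubContext`, and `resolve_context` are hypothetical names, and the `"default"` key stands in for the `DEFAULT_VECTOR_STORE` constant. The control flow mirrors the first part of the constructor shown above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class StubGraphStore:
    """Hypothetical stand-in for a property graph store."""

    def __init__(self, name: str) -> None:
        self.name = name


@dataclass
class StubContext:
    """Hypothetical stand-in for StorageContext with only the fields used here."""

    property_graph_store: Optional[StubGraphStore] = None
    vector_stores: Dict[str, Any] = field(default_factory=dict)


def resolve_context(
    storage_context: Optional[StubContext] = None,
    property_graph_store: Optional[StubGraphStore] = None,
    vector_store: Optional[Any] = None,
) -> StubContext:
    # Reuse the caller's context, or create one around the given store.
    ctx = storage_context or StubContext(property_graph_store=property_graph_store)
    if property_graph_store is not None:
        # An explicit store always wins over the one on the context.
        ctx.property_graph_store = property_graph_store
    elif ctx.property_graph_store is None:
        # Nothing supplied anywhere: fall back to a fresh in-memory store.
        ctx.property_graph_store = StubGraphStore("simple-default")
    if vector_store is not None:
        ctx.vector_stores["default"] = vector_store
    return ctx


explicit = StubGraphStore("explicit")
existing = StubGraphStore("existing")
print(resolve_context(StubContext(existing), explicit).property_graph_store.name)
print(resolve_context(StubContext(existing)).property_graph_store.name)
print(resolve_context().property_graph_store.name)
```

The three calls cover the three branches: explicit store given, store found on the context, and neither available.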

Code Example

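from llama_index.core import PropertyGraphIndex, StorageContext
from llama_index.core.graph_stores import SimplePropertyGraphStore
from llama_index.llms.openai import OpenAI

# Define the LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")

# Create the property graph store and attach it to a storage context.
# The keyword is property_graph_store, matching the call inside __init__ above.
property_graph_store = SimplePropertyGraphStore()
storage_context = StorageContext.from_defaults(
    property_graph_store=property_graph_store
)

# Create the PropertyGraphIndex
pg_index = PropertyGraphIndex(
    nodes=[],  # empty list: build an empty index (the base index class rejects nodes=None)
    storage_context=storage_context,
    llm=llm,
    embed_kg_nodes=True,
    show_progress=True,
)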