Getting Started with GraphRAG

CarlowZJ

Last modified: 2025-05-13 21:46:33

First published: 2025-05-09 22:51:20

Original link: https://microsoft.github.io/graphrag/get_started/

Getting Started

Requirements

Install GraphRAG

Running the Indexer

Set Up Your Workspace Variables

Using OpenAI

Using Azure OpenAI

Using Managed Auth on Azure

Running the Indexing pipeline

Using the Query Engine

Going Deeper

Getting Started

Requirements

Python 3.10-3.12
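
Before installing, it can help to confirm that the active interpreter falls inside this range. The following is a minimal sketch; the helper name `is_supported_python` is an illustrative assumption and not part of GraphRAG.

```python
import sys

# Supported range taken from the requirement above: Python 3.10 through 3.12.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) lies within the supported range.

    `version` is any sequence starting with (major, minor); it defaults
    to the running interpreter's version.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor} "
    f"supported: {is_supported_python()}"
)
```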

To get started with the GraphRAG system, you have a few options:

👉 Use the GraphRAG Accelerator solution
👉 Install from PyPI
👉 Use it from source

The following is a simple end-to-end example of using the GraphRAG system with the install-from-PyPI option.

It shows how to use the system to index some text, and then use the indexed data to answer questions about the documents.

Install GraphRAG

pip install graphrag

Running the Indexer

We need to set up a data project and some initial configuration. First, let's get a sample dataset ready:

mkdir -p ./ragtest/input

Get a copy of A Christmas Carol by Charles Dickens from a trusted source:

curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt
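
On systems without curl, the same two steps (create the input directory, fetch the text) could be done from Python with the standard library. This is only a sketch: `download_text` is a hypothetical helper name, and the example call is left commented out because it performs a network request.

```python
import shutil
import urllib.request
from pathlib import Path


def download_text(url, dest):
    """Download `url` to `dest`, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream the body to disk
    return dest


# Example call, using the URL and path from the commands above:
# download_text(
#     "https://www.gutenberg.org/cache/epub/24022/pg24022.txt",
#     "./ragtest/input/book.txt",
# )
```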

Set Up Your Workspace Variables

To initialize your workspace, first run the graphrag init command. Since we already created a directory named ./ragtest in the previous step, run the following command:

graphrag init --root ./ragtest

This creates two files in the ./ragtest directory: .env and settings.yaml.

.env contains the environment variables required to run the GraphRAG pipeline. If you inspect the file, you'll see a single environment variable defined, GRAPHRAG_API_KEY=<API_KEY>. Replace <API_KEY> with your own OpenAI or Azure API key.
settings.yaml contains the settings for the pipeline. Edit this file to change how the pipeline runs.
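
A quick sanity check that the placeholder in .env has actually been replaced might look like the sketch below. The parser handles only plain KEY=VALUE lines, and both function names are illustrative assumptions rather than anything GraphRAG provides.

```python
from pathlib import Path

PLACEHOLDER = "<API_KEY>"  # placeholder value described in the text above


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def api_key_is_set(env):
    """True if GRAPHRAG_API_KEY holds something other than the placeholder."""
    key = env.get("GRAPHRAG_API_KEY", "")
    return bool(key) and key != PLACEHOLDER
```

For example, `api_key_is_set(read_env_file("./ragtest/.env"))` would report whether the key still needs to be filled in.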

Using OpenAI

If running in OpenAI mode, you only need to update the value of GRAPHRAG_API_KEY in the .env file with your OpenAI API key.

Using Azure OpenAI

In addition to setting your API key, Azure OpenAI users should set the variables below in the settings.yaml file. To find the right place, search for the models: root key; under it you should see two sections, one for the default chat endpoint and one for the default embeddings endpoint. Here is an example of what to add to the chat model config:

type: azure_openai_chat # Or azure_openai_embedding for embeddings
api_base: https://<instance>.openai.azure.com
api_version: 2024-02-15-preview # You can customize this for other versions
deployment_name: <azure_model_deployment_name>
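
Unreplaced placeholders such as `<instance>` are an easy mistake here. The sketch below checks a single model entry for the three Azure keys shown above. It assumes the entry has already been loaded into a plain dict (for instance by a YAML parser of your choice); the function name `missing_azure_settings` is hypothetical.

```python
# Keys named in the Azure OpenAI example above.
REQUIRED_AZURE_KEYS = ("api_base", "api_version", "deployment_name")


def missing_azure_settings(model_config):
    """Return the Azure-specific keys that are absent or still placeholders."""
    if not str(model_config.get("type", "")).startswith("azure_openai"):
        return []  # not an Azure model entry, nothing to check
    missing = []
    for key in REQUIRED_AZURE_KEYS:
        value = str(model_config.get(key, "")).strip()
        if not value or "<" in value:  # e.g. "<instance>" was never replaced
            missing.append(key)
    return missing
```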

Using Managed Auth on Azure

To use managed auth, add an additional value to your model config and comment out or remove the api_key line:

auth_type: azure_managed_identity # Default auth_type is api_key
# api_key: ${GRAPHRAG_API_KEY}

You will also need to log in with az login and select the subscription that contains your endpoint.
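
The two auth modes are mutually exclusive in the way described above: managed identity goes without an api_key line, while the default api_key mode needs one. A small consistency check over a model entry, again assumed to be a plain dict and using the hypothetical name `auth_problems`, could be sketched as:

```python
def auth_problems(model_config):
    """Return a list of human-readable problems with the auth settings."""
    # Per the config comment above, auth_type defaults to api_key.
    auth_type = model_config.get("auth_type", "api_key")
    has_key = bool(model_config.get("api_key"))
    if auth_type == "azure_managed_identity" and has_key:
        return ["api_key should be commented out or removed with managed identity"]
    if auth_type == "api_key" and not has_key:
        return ["api_key is required when auth_type is api_key"]
    return []
```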

Running the Indexing pipeline

Finally, we'll run the pipeline!

graphrag index --root ./ragtest

[Figure: pipeline executing from the CLI]

This process takes a while to run. How long depends on the size of your input data, the model you're using, and the text chunk size (both configurable in your settings.yaml file). Once the pipeline completes, you should see a new folder called ./ragtest/output containing a series of parquet files.
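
To confirm the run produced output, you could list the parquet files in that folder. The sketch below uses only the standard library and assumes the files sit directly under the output directory, as described above; `list_parquet_outputs` is an illustrative name.

```python
from pathlib import Path


def list_parquet_outputs(root):
    """Return the sorted names of parquet files directly under <root>/output."""
    output_dir = Path(root) / "output"
    if not output_dir.is_dir():
        return []  # the pipeline has not produced an output folder yet
    return sorted(p.name for p in output_dir.glob("*.parquet"))


for name in list_parquet_outputs("./ragtest"):
    print(name)
```

An empty result after a finished run would suggest checking the pipeline logs or the output location configured in settings.yaml.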

Using the Query Engine

Now let's ask some questions using this dataset.

Here is an example using Global search to ask a high-level question:

graphrag query \
--root ./ragtest \
--method global \
--query "What are the top themes in this story?"

Here is an example using Local search to ask a more specific question about a particular character:

graphrag query \
--root ./ragtest \
--method local \
--query "Who is Scrooge and what are his main relationships?"
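
If you want to drive these queries from a script, one option is to wrap the CLI with `subprocess`. This is a sketch that relies only on the flags shown above (`--root`, `--method`, `--query`); the helper names are assumptions, and `run_query` requires the graphrag CLI to be installed and an index to exist.

```python
import subprocess


def build_query_command(root, method, query):
    """Build the argument list for `graphrag query` with the flags shown above."""
    return [
        "graphrag", "query",
        "--root", str(root),
        "--method", method,
        "--query", query,
    ]


def run_query(root, method, query):
    """Run the CLI and return its standard output (raises on a non-zero exit)."""
    result = subprocess.run(
        build_query_command(root, method, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example call (needs an indexed ./ragtest workspace, so it is commented out):
# print(run_query("./ragtest", "global", "What are the top themes in this story?"))
```

Passing the arguments as a list avoids shell quoting issues with questions that contain spaces or quotation marks.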

Refer to the Query Engine docs for details on how to use the Local and Global search mechanisms to extract meaningful insights from your data once the Indexer has finished.

Going Deeper

For more details about configuring GraphRAG, see the configuration documentation.
To learn more about Initialization, refer to the Initialization documentation.
For more details about using the CLI, refer to the CLI documentation.
Check out our visualization guide for a more interactive experience in debugging and exploring the knowledge graph.