Background
GraphRAG has been generating a lot of buzz in the RAG space lately: it comes from a major vendor, and it is open source and free. Out of curiosity, I spent a weekend building a local test environment for running GraphRAG. The setup had quite a few pitfalls, so I am recording them here for anyone who needs them.
Local Environment
- Hardware: 2020 x86_64 MacBook Pro, 4 cores / 8 GB RAM, integrated graphics
- Software: Python 3.11, Ollama
- LLM: mistral:7b
Setup Steps
- >pip3 install graphrag # version 0.3.1
- >mkdir -p ./graphrag/input # create the input folder
- >curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./graphrag/input/book.txt # fetch the data file: Charles Dickens' A Christmas Carol
- >cd ./graphrag
- >python -m graphrag.index --init # initialize the workspace; equivalently: python -m graphrag.index --init --root ./graphrag
- Edit settings.yaml # see the configuration file reference at the end of this article
- Edit the .env file and add the setting below. (Optional; after I enabled it, indexing got stuck on claim extraction.) A quick Ollama sanity check follows this list.
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True
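Before building the index, it is worth confirming that the Ollama server is up and mistral:7b has been pulled. A minimal sanity check with the ollama Python client (my addition, not part of the original steps; it assumes the default localhost:11434 install):

import ollama

print(ollama.list())  # mistral:7b should appear among the pulled models
vec = ollama.embeddings(model="mistral:7b", prompt="ping")["embedding"]
print(len(vec))  # mistral:7b should return a 4096-dimensional vector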
Code Modifications
1. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py
Comment out:
'''
embedding = await self.client.embeddings.create(
    input=input,
    **args,
)
return [d.embedding for d in embedding.data]
'''
Add:
import ollama

embedding_list = []
for inp in input:
    # embed each input text via the local Ollama server instead of the OpenAI API
    embedding = ollama.embeddings(model="mistral:7b", prompt=inp)
    embedding_list.append(embedding["embedding"])
return embedding_list
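For orientation, this snippet replaces the body of the async _execute_llm method of OpenAIEmbeddingsLLM. After the patch, the method looks roughly like this (a sketch from memory of the 0.3.x source with type annotations omitted, not a verbatim copy of the file):

async def _execute_llm(self, input, **kwargs):
    import ollama

    embedding_list = []
    for inp in input:
        # bypass self.client (the OpenAI SDK) and call the local Ollama server
        embedding = ollama.embeddings(model="mistral:7b", prompt=inp)
        embedding_list.append(embedding["embedding"])
    return embedding_list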
2. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/llm/oai/embedding.py
Comment out:
# embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
# chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
# chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
# return chunk_embeddings.tolist()
Add:
import ollama

# inside the per-chunk loop: embed each chunk via the local Ollama server
embedding = ollama.embeddings(model="mistral:7b", prompt=chunk)["embedding"]
chunk_embeddings.append(embedding)
chunk_lens.append(len(chunk))
# after the loop, return the per-chunk vectors (the averaging above stays commented out)
return chunk_embeddings
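Note that this returns a list of per-chunk vectors, whereas the original code averaged them into a single flat vector. That shape change can bite later (see the pyarrow error in the Error Log below). A variant that keeps the original weighted-average step, relying on the numpy import (np) already present in this module, would be:

# inside the loop, as above
embedding = ollama.embeddings(model="mistral:7b", prompt=chunk)["embedding"]
chunk_embeddings.append(embedding)
chunk_lens.append(len(chunk))
# after the loop: keep the original averaging/normalization so the result stays one 1-D vector
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
return chunk_embeddings.tolist()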
3. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/llm
Add:
tokens = token_encoder.decode(tokens) # decode the tokens back into a string
4. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/prompt_tune/prompt/entity_relationship.py
Change line 25 to:
Use {{record_delimiter}} as the list delimiter.
Note: Microsoft's latest code has already fixed this bug.
5. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/structured_search/local_search/search.py
and /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/structured_search/global_search/search.py
Change every search_messages definition from:
"""
search_messages = [
{"role": "system", "content": search_prompt},
{"role": "user", "content": query},
]
"""
to:
search_messages = [
    {"role": "user", "content": search_prompt + "\n\n ### USER QUESTION ### \n\n" + query}
]
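The motivation: some small models served through Ollama's OpenAI-compatible endpoint seem to follow instructions better when the system prompt and the question are merged into a single user message. The merged format can be tried outside GraphRAG with a few lines (a standalone check of my own, assuming the default endpoint, where any non-empty api_key is accepted):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
resp = client.chat.completions.create(
    model="mistral:7b",
    messages=[{
        "role": "user",
        "content": "Answer briefly.\n\n ### USER QUESTION ### \n\n What is a knowledge graph?",
    }],
)
print(resp.choices[0].message.content)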
Build & Test
Test data: the primary-school text 《吃水不忘挖井人》 ("Never Forget the Well Diggers When Drinking Water"). Indexing the original A Christmas Carol took too long, so I switched to a shorter text.
>python -m graphrag.index # build the graph index; note the directory you run this from
(Screenshot of the indexing run omitted.)
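When the run finishes, the artifacts are written under output/<timestamp>/artifacts, per the storage setting in the configuration reference below. A quick way to peek at what was extracted (the parquet file name assumes graphrag 0.3.x's workflow naming; the timestamp directory here is hypothetical, substitute the one from your run):

import pandas as pd

# hypothetical timestamp directory; use the one from your own run
df = pd.read_parquet("output/20240801-120000/artifacts/create_final_entities.parquet")
print(df.head())  # sample of the extracted entities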
>python -m graphrag.query --method local "这篇文章的主题是什么?" # "What is the theme of this text?"
This errored out. My guess is it relates to model fit and the very short training corpus.
>python -m graphrag.query --method global "毛主席与水井有啥关系?" # "What is the connection between Chairman Mao and the well?"
This ran successfully, but the result was underwhelming :( My guess is that the corpus was too short and the model is primarily English-oriented, so not enough information was extracted. If you have the means, try a model with better Chinese support together with a better test corpus.
Error Log
Note: the errors below occurred even after the code changes were applied correctly and the model service was up.
1. FilePath: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graspologic/partition/leiden.py
Error:
hierarchical_clusters_native = gn.hierarchical_leiden(
                               ^^^^^^^^^^^^^^^^^^^^^^^
leiden.EmptyNetworkError: EmptyNetworkError
Fix: switching from qwen2:0.5b to mistral:7b resolved it. (EmptyNetworkError most likely means the entity-extraction stage produced an empty graph, leaving Leiden nothing to cluster; a weak model that fails extraction triggers exactly this.)
Reference: Issue #562
2. FilePath: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/frame.py
Error:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Cause: the LLM does not seem to understand what the prompt asks. There can be various reasons, such as the LLM's max context window, or the service simply not working as expected.
Fix: lower the chunk size limit in settings.yaml from 1200 to 300 or 200 (see the chunks section in the configuration reference below).
Reference: Issue #362
3. Error:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1568, in _request
    raise APITimeoutError(request=request) from err
openai.APITimeoutError: Request timed out.
Fix: increase the timeout in settings.yaml (request_timeout); I set it to 1800 seconds.
4. Error during a local query:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/lance/dataset.py", line 2704, in _coerce_query_vector
query = pa.FloatingPointArray.from_pandas(query, type=pa.float32())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
Fix: switching to a more capable model may resolve it.
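A plausible mechanism (my reading, not confirmed upstream): lance expects the query embedding as a flat 1-D float vector, while the patched _embed from Code Modifications step 2 can return a list of per-chunk vectors, i.e. a 2-D array. The error reproduces in isolation:

import numpy as np
import pyarrow as pa

pa.array(np.array([0.1, 0.2, 0.3], dtype=np.float32))    # OK: 1-D vector
pa.array(np.array([[0.1, 0.2, 0.3]], dtype=np.float32))  # ArrowInvalid: only handle 1-dimensional arrays

If that is the cause, the averaging variant shown under Code Modifications step 2 keeps the result 1-D and may avoid the error.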
Configuration File Reference
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  # api_key: ollama
  type: openai_chat # or azure_openai_chat
  # model: qwen2:0.5b
  model: mistral:7b
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  request_timeout: 1800.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    # api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    # model: qwen2:0.5b
    model: mistral:7b
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    request_timeout: 1800.0
    tokens_per_minute: 150_000 # set a leaky bucket throttle
    requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request

chunks:
  size: 200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization, person, geo, event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
References
- Official documentation: Configuration Template; Prompt Tuning ⚙️
- Data file source: 使用查询引擎 | GraphRAG 中文文档教程
- Usage walkthroughs (CSDN): ollama轻松部署本地GraphRAG(避雷篇); 傻瓜操作:GraphRAG、Ollama 本地部署及踩坑记录; 【个人经验】GraphRAG+Ollama 本地部署 已跑通!; GraphRAG本地运行(Ollama的LLM接口+Xinference的embedding模型)无需gpt的api
- Embedding model notes (CSDN): 六、OpenAI之嵌入式(Embedding)
- Indexing internals: 深入Microsoft GraphRAG之索引阶段:原理、测试及如何集成到Neo4j图数据库 (火山引擎开发者社区)
- Prompt reference: Community Reports 提示词中文版 | GraphRAG 中文文档教程
Summary
In the spirit of keeping things simple, I did not use package managers such as conda or poetry, and I used the official GraphRAG package rather than a third-party fork.
No VPN or paid OpenAI API service was involved; everything ran locally on Ollama, which provided both inference and embeddings. I tried quite a few models along the way (qwen2:0.5b, qwen2:1.5b, gemma2:2b) without success; switching to mistral:7b finally got me out of the pit. The best engineering advice here is simply: keep experimenting.
Constrained by hardware, I did not use a dedicated third-party embedding model; the same LLM served embeddings as well, to save resources.
Because of my laptop's limited performance, the whole test was very time-consuming, and it took several late nights to get a successful run. If you get it working on the first try, congratulations on your natural talent :)