Ollama + GraphRAG: A Local Deployment Guide

Background

GraphRAG has been getting a lot of attention in the RAG space lately: it comes from a major vendor, and it is free and open source. Out of curiosity, I spent a weekend standing up a local environment to run the GraphRAG project. The setup had plenty of pitfalls, so I am writing them down here for anyone who needs them.

Local Environment

  • Hardware: 2020 x86_64 MacBook Pro, 4 cores / 8 GB RAM, integrated graphics
  • Software: Python 3.11, Ollama
  • LLM: mistral:7b

Setup Steps

  1. >pip3 install graphrag             # version 0.3.1
  2. >mkdir -p ./graphrag/input         # create the input folder
  3. >curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt > ./graphrag/input/book.txt   # fetch the source text: Charles Dickens's A Christmas Carol
  4. >cd ./graphrag
  5. >python -m graphrag.index --init   # initialize the workspace; python -m graphrag.index --init --root ./graphrag works too
  6. Edit settings.yaml                 # see the reference configuration at the end of this post
  7. Edit the .env file and add the setting below. (Optional; after I enabled it, indexing got stuck at the claim/covariate extraction step.)
    GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True
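
Before indexing, it is worth confirming that Ollama is actually serving mistral:7b, i.e. you have run `ollama pull mistral:7b` and the server is listening on its default port 11434. A minimal sanity check using the ollama Python package (dict-style response access as in its early 0.x API; the 4096-dimension figure is an assumption based on mistral:7b's hidden size):

       import ollama

       # One chat round-trip and one embedding call against the local server.
       reply = ollama.chat(
           model="mistral:7b",
           messages=[{"role": "user", "content": "Say hi in one word."}],
       )
       print(reply["message"]["content"])

       emb = ollama.embeddings(model="mistral:7b", prompt="hello world")
       print(len(emb["embedding"]))  # expected: 4096 for mistral:7b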

Code Changes

1. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/llm/openai/openai_embeddings_llm.py

       Comment out:
       '''
       embedding = await self.client.embeddings.create(
           input=input,
           **args,
       )
       return [d.embedding for d in embedding.data]
       '''

       Add:
       import ollama
       # Call the local Ollama embedding endpoint once per input string
       # instead of making a single batched OpenAI request.
       embedding_list = []
       for inp in input:
           embedding = ollama.embeddings(model="mistral:7b", prompt=inp)
           embedding_list.append(embedding["embedding"])
       return embedding_list
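
Since ollama.embeddings takes a single prompt, this loop issues one HTTP call per input string, noticeably slower than the batched OpenAI call it replaces. If re-runs keep embedding the same texts, a small cache can soften the cost; a sketch (the helper name is mine, not part of GraphRAG):

       import functools
       import ollama

       @functools.lru_cache(maxsize=4096)
       def embed_cached(text: str, model: str = "mistral:7b") -> tuple[float, ...]:
           # Tuples are immutable and hashable, so results can live in the LRU cache.
           return tuple(ollama.embeddings(model=model, prompt=text)["embedding"])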


      
2. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/llm/oai/embedding.py

       Comment out:
       # embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
       # chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
       # chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
       # return chunk_embeddings.tolist()

       Add:
       import ollama
       # Embed each chunk through the local Ollama endpoint; the surrounding
       # loop still appends embedding/chunk_len to chunk_embeddings/chunk_lens.
       embedding = ollama.embeddings(model='mistral:7b', prompt=chunk)['embedding']
       chunk_len = len(chunk)
       # Return the collected list instead of the normalized weighted average.
       return chunk_embeddings
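
Note that this patch drops the original length-weighted averaging, so the function now returns the raw list of per-chunk vectors. If you want to keep the original behaviour of collapsing multiple chunks into a single vector, a sketch of doing the same on top of Ollama (my variant, not the author's patch; numpy assumed available):

       import numpy as np
       import ollama

       def embed_chunks(chunks):
           # Embed each chunk locally, then combine with a length-weighted
           # average and L2-normalize, mirroring the original OpenAI code path.
           embs = [ollama.embeddings(model="mistral:7b", prompt=c)["embedding"] for c in chunks]
           lens = [len(c) for c in chunks]
           avg = np.average(np.array(embs), axis=0, weights=lens)
           return (avg / np.linalg.norm(avg)).tolist()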


3. Edit the text-chunking helper under /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/llm

       Add:
       tokens = token_encoder.decode(tokens)   # decode the token ids back into a string
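
The decode matters because the query-side chunker slices text into token ids, while the Ollama embedding endpoint expects plain strings. For context, a minimal sketch of the kind of helper this lands in (function and structure are illustrative; the actual code under query/llm differs):

       import tiktoken

       def chunk_text_sketch(text, max_tokens, token_encoder=None):
           # Split text into token windows, then decode each window back into
           # a string so the embedding call receives text instead of token ids.
           token_encoder = token_encoder or tiktoken.get_encoding("cl100k_base")
           tokens = token_encoder.encode(text)
           for start in range(0, len(tokens), max_tokens):
               yield token_encoder.decode(tokens[start : start + max_tokens])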


4. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/prompt_tune/prompt/entity_relationship.py and change line 25 to:

       Use {{record_delimiter}} as the list delimiter.

   Note: Microsoft's latest code has already fixed this bug.
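
The doubled braces matter because the prompt template string is rendered with Python's str.format: single braces mark placeholders to substitute now, while {{...}} escapes to a literal brace, so {record_delimiter} survives into the prompt for the next formatting pass. A minimal illustration:

       template = "Use {{record_delimiter}} as the list delimiter. Types: {entity_types}"
       print(template.format(entity_types="person, place"))
       # -> Use {record_delimiter} as the list delimiter. Types: person, place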

5. Edit /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/structured_search/local_search/search.py

   and /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graphrag/query/structured_search/global_search/search.py

   Change every search_messages construction from:

       search_messages = [
           {"role": "system", "content": search_prompt},
           {"role": "user", "content": query},
       ]

   to:

       search_messages = [
           {"role": "user", "content": search_prompt + "\n\n### USER QUESTION ###\n\n" + query}
       ]

   The point of the change: some local models served through an OpenAI-compatible endpoint pay little attention to the system role, so folding the prompt and the question into a single user message makes the instructions much harder to miss.

Build & Test

Test data: the primary-school text 《吃水不忘挖井人》 ("When You Drink Water, Don't Forget Who Dug the Well"). A Christmas Carol took too long to index, so I switched to something shorter.

>python -m graphrag.index   # build the graph index; mind the directory you run this from

Build log screenshot: [screenshot not reproduced]

>python -m graphrag.query --method local "这篇文章的主题是什么?"   # "What is the theme of this article?"

This errored out. My guess is it relates to model fit and the corpus being too short.

>python -m graphrag.query --method global "毛主席与水井有啥关系?"   # "What is the connection between Chairman Mao and the well?"

This ran successfully, but the result was underwhelming :( My guess is that the sample text is too short and the model is primarily English-oriented, so it failed to extract enough information. If you have the hardware, try a model with better Chinese support plus a better test corpus.

Error Log

    Note: the errors below occurred even after the code was correctly patched and the model service was running.

  1. File: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/graspologic/partition/leiden.py

     Error:

         hierarchical_clusters_native = gn.hierarchical_leiden(
         leiden.EmptyNetworkError: EmptyNetworkError

     Cause: the LLM is not capable enough, so entity extraction yields an empty graph.
     Fix: switching from qwen2:0.5b to mistral:7b resolved it.
     Reference: Issue #562
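
     Before swapping models, you can confirm that an empty graph really is the problem by checking whether the indexing run produced any entities. A sketch (the artifact file name matches what GraphRAG 0.3.x writes under the storage base_dir, but treat it as an assumption; replace <timestamp> with your run's folder):

         import pandas as pd

         # If this table is empty, extraction produced nothing usable and the
         # Leiden clustering step has no graph to partition.
         df = pd.read_parquet("output/<timestamp>/artifacts/create_final_entities.parquet")
         print(len(df))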
  2. FilePath: "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/frame.py\"
    Error:  
  3. ine 4299, in __setitem__\n    self._setitem_array(key, value)\n  File \"/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/frame.py\", line 4341, in _setitem_array\n    check_key_length(self.columns, key, value)\n  File \"/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/indexers/utils.py\", line 390, in check_key_length\n    raise ValueError(\"Columns must be same length as key\")\nValueError: Columns must be same length as key\n", "source": "Columns must be same length as key", "details": null}
    ......

    File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length raise ValueError("Columns must be same length as key") ValueError: Columns must be same length as key

  4. 错误原因:LLM seems doesn't understand what prompt says.It may be various reasons such like LLM's max context window, or just services is not working as expect.
    解决办法: 将setting.yaml 文件中的 chunks 大小的上限从1200 调整到 300或200.
    参考链接: Issue 362

  3. Error:

         File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1568, in _request
             raise APITimeoutError(request=request) from err
         openai.APITimeoutError: Request timed out.

     Cause: the local machine is underpowered, so model calls take far longer than expected.
     Fix: raise the timeout in settings.yaml; I set request_timeout to 1800 seconds.
  4. Error during a local query:

         File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/lance/dataset.py", line 2704, in _coerce_query_vector
             query = pa.FloatingPointArray.from_pandas(query, type=pa.float32())
         File "pyarrow/array.pxi", line 1115, in pyarrow.lib.Array.from_pandas
         File "pyarrow/array.pxi", line 339, in pyarrow.lib.array
         File "pyarrow/array.pxi", line 85, in pyarrow.lib._ndarray_to_array
         File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
         pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays

     Cause: my guess is model fit plus the very short corpus. Note also that LanceDB expects a flat 1-D query vector, so if the patched code in embedding.py returns a list of per-chunk vectors rather than a single vector, it fails with exactly this error.
     Fix: switching to a stronger model may help.
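
     A quick way to see whether the patched embedding code is the culprit is to check the shape of what it returns for a single query:

         import numpy as np
         import ollama

         vec = ollama.embeddings(model="mistral:7b", prompt="test")["embedding"]
         arr = np.asarray(vec, dtype=np.float32)
         print(arr.ndim, arr.shape)  # must be 1-D, e.g. (4096,) for mistral:7b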

Reference Configuration (settings.yaml)

encoding_model: cl100k_base
skip_workflows: []

llm:
  api_key: ${GRAPHRAG_API_KEY}
  # api_key: ollama
  type: openai_chat # or azure_openai_chat
  # model: qwen2:0.5b
  model: mistral:7b
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  request_timeout: 1800.0
  api_base: http://localhost:11434/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  # target: required # or all
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    # api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    # model: qwen2:0.5b
    model: mistral:7b
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    request_timeout: 1800.0
    tokens_per_minute: 150_000 # set a leaky bucket throttle
    requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request

chunks:
  size: 200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
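
After editing, it is cheap to confirm the file still parses before re-running the index (PyYAML assumed installed; GraphRAG itself validates the schema on startup):

       import yaml

       with open("settings.yaml") as f:
           cfg = yaml.safe_load(f)
       print(cfg["llm"]["model"])                   # mistral:7b
       print(cfg["embeddings"]["llm"]["api_base"])  # http://localhost:11434/api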


 

References

Official docs: Configuration Template, Prompt Tuning ⚙️

Data file: 使用查询引擎 | GraphRAG 中文文档教程 (GraphRAG Chinese documentation: "Using the Query Engine")

Usage walkthroughs (CSDN): 《ollama轻松部署本地GraphRAG(避雷篇)》; 《傻瓜操作:GraphRAG、Ollama 本地部署及踩坑记录》; 《【个人经验】GraphRAG+Ollama 本地部署 已跑通!》; 《GraphRAG本地运行(Ollama的LLM接口+Xinference的embedding模型)无需gpt的api》

Embedding model notes (CSDN): 《六、OpenAI之嵌入式(Embedding)》

How indexing works (Volcano Engine developer community): 《深入Microsoft GraphRAG之索引阶段:原理、测试及如何集成到Neo4j图数据库》

Prompt notes: Community Reports 提示词中文版 | GraphRAG 中文文档教程

Summary

In the spirit of keeping things simple, I used no package managers such as conda or poetry, and I installed the official GraphRAG package rather than a third-party fork.

No proxy or paid OpenAI API service was involved; inference and embeddings were served locally by Ollama. I tried many models along the way (qwen2:0.5b, qwen2:1.5b, gemma2:2b), none of which worked. Switching to mistral:7b finally got me out of the hole, so the best engineering advice here is simply: keep trying.

Constrained by hardware, I did not use a dedicated third-party embedding model; the LLM doubles as the embedding model to save resources.

Because of my laptop's limited performance, the whole experiment was very time-consuming and took several late nights to run end to end. If you get it working on the first try, congratulations on your talent :)
