GraphRAG 替换本地模型报错解决思路

最新推荐文章于 2024-08-29 10:23:36 发布

Sky blue water

最新推荐文章于 2024-08-29 10:23:36 发布

阅读量809

点赞数 9

分类专栏： GraphRAG 踩的坑文章标签： knowledge graph graphql 语言模型

本文链接：https://blog.csdn.net/qq_43513268/article/details/141428327

版权

GraphRAG 踩的坑专栏收录该内容

1 篇文章 0 订阅

订阅专栏

create_base_entity_graph 报错解决

如果有人在更换大模型为本地模型的过程中，碰到类似

ERROR Error executing verb “cluster_graph” in
create_base_entity_graph: EmptyNetworkError

的问题，可以参考 graphrag issue 515 的解决方案，具体而言，就是将

graphrag/prompt_tune/prompt/entity_relationship.py

中

Return output in {language} as a single list of all the entities and relationships identified in steps 1 and 2. Use **{{record_delimiter}}** as the list delimiter.

这句 prompt record_delimeter 周边的 ** 删除。

然后就能正常运行了，感觉是本地小模型的理解能力不足，出现的这个问题

模型本地替换

本地替换的 LLM 是 GLM4-1M-Chat，替换的 embedding-model 是 bge-base-en-v1.5。
具体而言，如果要继续使用 openai-API 调用的形式，需要把 LLM 和 embedding-model 都封装成 openai-API 调用。

封装方法

LLM： LLM 可以使用 llama-factory 的方式封装为 openai API 服务，具体而言命令为 llamafactory-cli api examples/inference/glm_vllm.yaml，同时在 src/llamafactory/api/app.py 中修改 端口号 为自定义端口号即可，具体而言，glm_vllm.yaml 可以如下写

model_name_or_path: <YourModelPath>
template: llama3
infer_backend: vllm
vllm_enforce_eager: true

embedding_model：embedding_model 可以使用 fastchat 封装为 openai API 服务，这里使用的 fschat 版本为 0.2.35，运行具体而言，可以参考如下 shell 配置

#!/bin/bash

python -m fastchat.serve.controller --host 0.0.0.0 --port 21003 > <YourLogFile> 2>&1 &

python -m fastchat.serve.model_worker --model-path <YourModelPath> --model-names <YourModelName, better use gpt-4> --num-gpus 2 --controller-address http://0.0.0.0:21003 > <YourLogFile> 2>&1 &

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port <YourServicePort> --controller-address http://0.0.0.0:21003

GraphRAG 项目 settings.yaml 配置

在生成 ragtest 等目录后的 settings.yaml 的文件中，主要需要修改以下两处配置

llm：需要将 api_base 替换为自己的服务地址和端口
embeddings：需要将 api_base 替换为自己的服务地址和端口，同时注意这里的模型名称最好选择 gpt-4

llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: gpt-4
  model_supports_json: false # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://<IP:Port>/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 2 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  top_p: 0.9 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: gpt-4
    api_base: http://<IP:Port>/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    concurrent_requests: 1 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional