How to Build a Local Knowledge Base with GraphRAG Using LLM and Embedding Model Services Provided by Ollama

Latest recommended article published on 2025-04-16 11:17:57

m0_74824865



Article link: https://blog.csdn.net/m0_74824865/article/details/145460460


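Countless Pitfalls When Using GraphRAG

While using GraphRAG I hit just about every pitfall there is. (I have to complain a little: the official code has many unresolved issues, and the maintainers themselves acknowledge that their focus is on improving the algorithm rather than on compatibility with different models and frameworks.) After a lot of reading and debugging (the indexing process is fairly hard on the machine), I finally got the GraphRAG project running. I tested two setups in turn, and both worked:

Using Ollama to serve both the LLM and the embedding model locally
Using Ollama to serve the LLM and LM Studio to serve the embedding model

The reason for having Ollama serve both the LLM and the embedding model is that Ollama is genuinely elegant: it is extremely simple to use and responds very quickly.

The steps for the Ollama-based setup are as follows: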

1. Install GraphRAG:

pip install graphrag -i https://pypi.tuna.tsinghua.edu.cn/simple

Create the directory ./ragtest/input:

mkdir -p ./ragtest/input
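Place the corpus text files in this directory. The files must be in .txt format. Note: they must be UTF-8 encoded; one way to get that is to open each file in Notepad and use Save As.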
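If there are many corpus files, re-saving each one by hand gets tedious. The following is a minimal sketch of a bulk conversion. The function names `convert_to_utf8` and `convert_directory` are hypothetical, and the source encoding `gbk` is only an assumption (it is common for Chinese text saved on Windows); neither is part of GraphRAG.

```python
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "gbk") -> bool:
    """Rewrite `path` as UTF-8. Returns True if the file was re-encoded.

    `source_encoding` is an assumption about how the file was saved;
    adjust it to match your corpus.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")  # already valid UTF-8: leave the file untouched
        return False
    except UnicodeDecodeError:
        text = raw.decode(source_encoding)
        # Write bytes directly so line endings are preserved exactly.
        path.write_bytes(text.encode("utf-8"))
        return True


def convert_directory(input_dir: str, source_encoding: str = "gbk") -> list[str]:
    """Convert every .txt file in `input_dir`; return the names of changed files."""
    changed = []
    for txt in sorted(Path(input_dir).glob("*.txt")):
        if convert_to_utf8(txt, source_encoding):
            changed.append(txt.name)
    return changed
```

Calling `convert_directory("./ragtest/input")` would then report which files had to be re-encoded.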
Initialize the project with the command python -m graphrag.index --init --root ./ragtest:

python -m graphrag.index --init --root ./ragtest
Edit the .env file so that it contains the following:

GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True

Note: the parameter GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True is required. Without it the covariates are not generated, and Local Search fails.
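Because a missing flag only shows up much later, at query time, it can be worth checking the .env file before starting a long indexing run. The sketch below is a standalone check and not how GraphRAG itself reads the file; `parse_env` and `claim_extraction_enabled` are hypothetical helper names.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def claim_extraction_enabled(env: dict[str, str]) -> bool:
    """Return True if the claim-extraction flag is present and set to true."""
    return env.get("GRAPHRAG_CLAIM_EXTRACTION_ENABLED", "").lower() == "true"
```

For example, `claim_extraction_enabled(parse_env(open("./ragtest/.env", encoding="utf-8").read()))` should be true for the file shown above.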

Edit the settings.yaml file as follows:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:11434/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
…
Start the LLM and embedding services with Ollama. The embedding model is nomic-embed-text:

ollama pull qwen2
ollama pull nomic-embed-text
ollama serve
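Before running the indexer, it may help to confirm that the embedding model actually responds. The sketch below builds a request for Ollama's native /api/embeddings endpoint using only the standard library. The helper names `build_embedding_request` and `fetch_embedding` are hypothetical, and the host is assumed to be Ollama's default address, the same one used in settings.yaml.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # assumed default Ollama address


def build_embedding_request(
    text: str, model: str = "nomic-embed-text"
) -> urllib.request.Request:
    """Build a POST request for Ollama's native /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_HOST}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Send the request and return the embedding vector.

    Requires `ollama serve` to be running with the model already pulled.
    """
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["embedding"]
```

With the service running, `len(fetch_embedding("hello"))` should return the dimensionality of the embedding vector; a connection error here means the problem is with the Ollama service, not with GraphRAG.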

Edit the file D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py (look for it under your own GraphRAG installation path) so that it calls the Ollama service:

import ollama

…

class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Official code, disabled:
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]

        # Replacement: request one embedding per input string from Ollama.
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list

The commented-out part above is the original official code. The added code is:

        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
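One thing to be aware of: ollama.embeddings is a synchronous call, so inside the async _execute_llm method it blocks the event loop while each embedding is computed. A possible variation, sketched here assuming Python 3.9 or later, hands each call to a worker thread with asyncio.to_thread. The names `embed_texts` and `embed_one` are hypothetical; `embed_one` stands in for a function such as `lambda t: ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]`.

```python
import asyncio
from typing import Callable, Sequence

Vector = list[float]


async def embed_texts(
    texts: Sequence[str], embed_one: Callable[[str], Vector]
) -> list[Vector]:
    """Embed each text in order, running the blocking call in a worker thread."""
    results: list[Vector] = []
    for text in texts:
        # to_thread keeps the event loop responsive while the call blocks.
        results.append(await asyncio.to_thread(embed_one, text))
    return results
```

Taking the embedding function as a parameter also makes it possible to try the loop with a stub before pointing it at a live Ollama service.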

Edit the file D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py so that it calls the model service provided by Ollama. The code to change is:

import ollama
#…

embedding = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]

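(Screenshot omitted.) In the screenshot, the official code is commented out and an arrow marks the newly added line, which is the ollama.embeddings call shown above.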

Edit the definition of the chunk_text() function in D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\text_utils.py:

def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore
    tokens = token_encoder.decode(tokens)  # decode the tokens back into a string

    chunk_iterator = batched(iter(tokens), max_tokens)
    yield from chunk_iterator

The added statement is:

tokens = token_encoder.decode(tokens) # decode the tokens back into a string

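This looks like a bug in the official GraphRAG code, or at least an incompatibility with backends other than OpenAI: the token IDs produced by the tokenizer are never decoded back into strings. The OpenAI embeddings API accepts arrays of token IDs as input, but other servers expect text, so the embedding requests fail and the subsequent embedding step raises: ZeroDivisionError: Weights sum to zero, can't be normalized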
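Note that once the tokens have been decoded, batched iterates over the characters of the resulting string, so max_tokens effectively limits characters per chunk rather than tokens. An alternative, shown here only as a sketch and not as the official fix, is to batch the token IDs first and decode each batch back to text. `chunk_text_decoded` and the toy `toy_encode`/`toy_decode` functions are hypothetical; with tiktoken one would pass `token_encoder.encode` and `token_encoder.decode` instead.

```python
from itertools import islice
from typing import Callable, Iterator, Sequence


def chunk_text_decoded(
    text: str,
    max_tokens: int,
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[list[int]], str],
) -> Iterator[str]:
    """Yield string chunks that each hold at most `max_tokens` tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    token_iter = iter(encode(text))
    while True:
        batch = list(islice(token_iter, max_tokens))
        if not batch:
            return
        yield decode(batch)  # each chunk is a plain string again


# Toy stand-ins for a real tokenizer: one token per character.
def toy_encode(s: str) -> list[int]:
    return [ord(c) for c in s]


def toy_decode(ids: list[int]) -> str:
    return "".join(chr(i) for i in ids)
```

With a real byte-pair tokenizer, a chunk boundary can fall inside a multi-byte character, so the decoded text at chunk edges may contain replacement characters. For embedding purposes this is usually tolerable, but it is a trade-off to keep in mind.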

(graphrag) D:LearnGraphRAG>python -m graphrag.query --root ./newTest12 --method local "谁是叶文洁"


INFO: Reading settings from newTest12\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:1234/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 1}
Error embedding chunk {'OpenAIEmbedding': 'Error code: 400 - {'error': "'input' field must be a string or an array of strings"}'}
Traceback (most recent call last):
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\__main__.py", line 76, in <module>
    run_local_search(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\cli.py", line 153, in run_local_search
    result = search_engine.search(query=query)
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\search.py", line 118, in search
    context_text, context_records = self.context_builder.build_context(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\structured_search\local_search\mixed_context.py", line 139, in build_context
    selected_entities = map_query_to_entities(
  File "D:\ProgramData\miniconda3\envs\graphrag\lib\site-packages\graphrag\query\context_builder\entity_extraction.py", line 55, in map_query_to_entities
    search_results = te