Countless pitfalls while using GraphRAG
While using GraphRAG I stepped on pretty much every pitfall there is (I have to grumble a little: the official code has quite a few leftover issues, and the team itself admits that its focus is on improving the algorithm rather than on compatibility with different models and frameworks). After a lot of reading and debugging (the indexing stage is fairly demanding on the machine), I finally got a GraphRAG project running end to end. I tested two setups, and both worked:

- ollama serving both the local LLM and the embedding model
- ollama serving the LLM, with lm-studio serving the embedding model

The reason for wanting ollama to serve both the LLM and the embedding model is simply that ollama is a pleasure to use: setup is trivial and responses are fast.

The ollama-only setup works as follows:
1. Install GraphRAG:

```bash
pip install graphrag -i https://pypi.tuna.tsinghua.edu.cn/simple
```
2. Create the input directory `./ragtest/input`:

```bash
mkdir -p ./ragtest/input
```
3. Place your corpus files in this directory as plain `.txt` files. Note: the files must be UTF-8 encoded; on Windows you can open a file in Notepad and use "Save As" to convert it (a small conversion sketch follows this step).
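If a source file is not already UTF-8 (for example GBK-encoded Chinese text), a few lines of Python will convert it. This is only a minimal sketch; the file names and the GBK source encoding are assumptions to adjust for your data:

```python
# Minimal re-encoding sketch; "source.txt" and the GBK source encoding are assumptions.
with open("source.txt", "r", encoding="gbk") as src:
    text = src.read()

with open("./ragtest/input/book.txt", "w", encoding="utf-8") as dst:
    dst.write(text)
```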
4. Initialize the project:

```bash
python -m graphrag.index --init --root ./ragtest
```

   This creates, among other things, a `.env` file and a `settings.yaml` file under `./ragtest`; both are edited in the next two steps.
5. Edit the `.env` file so that it contains:

```
GRAPHRAG_API_KEY=ollama
GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True
```

   Note: the `GRAPHRAG_CLAIM_EXTRACTION_ENABLED=True` setting is mandatory; without it no covariates are generated and Local Search will fail.
6. Edit the `settings.yaml` file as follows:

```yaml
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: qwen2
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  request_timeout: 180.0
  api_base: http://localhost:11434/v1/
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 150_000 # set a leaky bucket throttle
  requests_per_minute: 10_000 # set a leaky bucket throttle
  max_retries: 10
  max_retry_wait: 10.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 25 # the number of parallel inflight requests that may be made

parallelization:
  stagger: 0.3
  num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ollama
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://localhost:11434/api
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
  # …
```

   The key changes against the default file are the chat model (`model: qwen2` with `api_base: http://localhost:11434/v1/`) and the embedding model (`model: nomic-embed-text` with `api_base: http://localhost:11434/api`). The embeddings block points at ollama's native API rather than an OpenAI-compatible endpoint, which is part of why the code patches in the later steps are needed.
7. Pull the models and start ollama, which serves both the LLM and the embedding model (`nomic-embed-text`); a quick sanity check follows this step:

```bash
ollama pull qwen2
ollama pull nomic-embed-text
ollama serve
```
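Before kicking off indexing, it is worth confirming that both the chat endpoint and the embedding endpoint actually respond. Below is a minimal sanity-check sketch using the `ollama` and `openai` Python clients; the prompt strings are arbitrary examples:

```python
import ollama
from openai import OpenAI

# Embedding via ollama's native API -- the same call the patches below will make.
emb = ollama.embeddings(model="nomic-embed-text", prompt="hello graphrag")
print(len(emb["embedding"]))  # vector length (768 for nomic-embed-text)

# Chat completion via ollama's OpenAI-compatible endpoint -- what settings.yaml points at.
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
resp = client.chat.completions.create(
    model="qwen2",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(resp.choices[0].message.content)
```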
8. Edit the file `D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\llm\openai\openai_embeddings_llm.py` (adjust the path to wherever GraphRAG is installed in your environment) so that it calls the ollama service:

```python
import ollama
# …

class OpenAIEmbeddingsLLM(BaseLLM[EmbeddingInput, EmbeddingOutput]):
    """A text-embedding generator LLM."""

    _client: OpenAIClientTypes
    _configuration: OpenAIConfiguration

    def __init__(self, client: OpenAIClientTypes, configuration: OpenAIConfiguration):
        self.client = client
        self.configuration = configuration

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        # Original OpenAI call, commented out:
        # embedding = await self.client.embeddings.create(
        #     input=input,
        #     **args,
        # )
        # return [d.embedding for d in embedding.data]

        # New code: call ollama directly, one chunk at a time.
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
```
   The commented-out block is the original official code; the newly added code is:

```python
embedding_list = []
for inp in input:
    embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
    embedding_list.append(embedding["embedding"])
return embedding_list
```
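Note that both this patch and the one in the next step import the `ollama` Python client; if it is not already present in the GraphRAG environment, install it with `pip install ollama`.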
9. Edit the file `D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\oai\embedding.py` so that the query side also calls the ollama model service. In the graphrag version used here the change appears to live inside the `embed()` method, where each chunk of the query text is embedded:

```python
import ollama
# …
# (original official embedding call, commented out)
embedding = ollama.embeddings(model='nomic-embed-text', prompt=chunk)['embedding']
```

   The commented-out part is the official code; the `ollama.embeddings(...)` line is the code to add. A standalone sketch of the surrounding logic follows.
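To make clearer what that loop is doing (and where the `ZeroDivisionError` mentioned in the next step comes from), here is a small standalone sketch of the idea, not the library's literal code: each chunk of the query text is embedded with ollama, and the chunk vectors are combined with a length-weighted average.

```python
import numpy as np
import ollama

def embed_query(text_chunks: list[str]) -> list[float]:
    """Embed each chunk and length-weight-average the vectors (conceptual sketch only)."""
    vectors, weights = [], []
    for chunk in text_chunks:
        emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        vectors.append(emb)
        weights.append(len(chunk))
    # If every chunk fails (e.g. the backend rejects non-string input), vectors and
    # weights stay empty and np.average raises
    # "ZeroDivisionError: Weights sum to zero, can't be normalized".
    combined = np.average(np.array(vectors), axis=0, weights=weights)
    combined = combined / np.linalg.norm(combined)
    return combined.tolist()

print(len(embed_query(["谁是叶文洁", "Who is Ye Wenjie?"])))
```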
10. Edit the definition of `chunk_text()` in `D:\ProgramData\miniconda3\envs\graphRAG\Lib\site-packages\graphrag\query\llm\text_utils.py`:

```python
def chunk_text(
    text: str, max_tokens: int, token_encoder: tiktoken.Encoding | None = None
):
    """Chunk text by token length."""
    if token_encoder is None:
        token_encoder = tiktoken.get_encoding("cl100k_base")
    tokens = token_encoder.encode(text)  # type: ignore
    tokens = token_encoder.decode(tokens)  # decode the tokens back into a string
    chunk_iterator = batched(iter(tokens), max_tokens)
    yield from chunk_iterator
```

   The added statement is:

```python
tokens = token_encoder.decode(tokens)  # decode the tokens back into a string
```

   This looks like a bug in the official GraphRAG code: the developers forgot to decode the tokenized text back into a string, so the downstream embedding step fails with `ZeroDivisionError: Weights sum to zero, can't be normalized`. A short illustration follows.
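Why the extra decode matters: without it, `chunk_text()` yields batches of integer token ids rather than text, which is exactly what the embedding backend rejects (see the "'input' field must be a string" error in the log below). A tiny illustration:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("谁是叶文洁")  # a list of ints, e.g. [57668, ...]
text = enc.decode(tokens)          # back to a str that an embedding API accepts

print(type(tokens[0]).__name__, type(text).__name__)  # -> int str
```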
For reference, this is the kind of error output you get from a local-search query before the `chunk_text()` fix (this particular run served the embedding model from lm-studio on port 1234):

(graphrag) D:\LearnGraphRAG>python -m graphrag.query --root ./newTest12 --method local "谁是叶文洁"
INFO: Reading settings from newTest12\settings.yaml
creating llm client with {'api_key': 'REDACTED,len=6', 'type': "openai_chat", 'model': 'qwen2', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:11434/v1/', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25}
creating embedding llm client with {'api_key': 'REDACTED,len=9', 'type': "openai_embedding", 'model': 'nomic-ai/nomic-embed-text-v1.5/nomic-embed-text-v1.5.Q8_0.gguf', 'max_tokens': 4000, 'temperature': 0, 'top_p': 1, 'n': 1, 'request_timeout': 180.0, 'api_base': 'http://localhost:1234/v1', 'api_version': None, 'organization': None, 'proxy': None, 'cognitive_services_endpoint': None, 'deployment_name': None, 'model_supports_json': None, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 1}
Error embedding chunk {'OpenAIEmbedding': 'Error code: 400 - {'error': "'input' field must be a string or an array of strings"}'}
Traceback (most recent call last):
File "D:ProgramDataminiconda3envsgraphraglib
unpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:ProgramDataminiconda3envsgraphraglib
unpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:ProgramDataminiconda3envsgraphraglibsite-packagesgraphragquery__main__.py", line 76, in <module>
run_local_search(
File "D:ProgramDataminiconda3envsgraphraglibsite-packagesgraphragquerycli.py", line 153, in run_local_search
result = search_engine.search(query=query)
File "D:ProgramDataminiconda3envsgraphraglibsite-packagesgraphragquerystructured_searchlocal_searchsearch.py", line 118, in search
context_text, context_records = self.context_builder.build_context(
File "D:ProgramDataminiconda3envsgraphraglibsite-packagesgraphragquerystructured_searchlocal_searchmixed_context.py", line 139, in build_context
selected_entities = map_query_to_entities(
File "D:ProgramDataminiconda3envsgraphraglibsite-packagesgraphragquerycontext_builderentity_extraction.py", line 55, in map_query_to_entities
search_results = te