记录一下部署过程,ollama最好升级到最新版本,并下载llm模型和embedding模型,支持的模型可以在 这里 查询
新建环境
python环境为3.10
到3.12
conda create -n <环境名> python=3.10
安装graphrag和相关依赖库
pip install graphrag
pip install ollama
pip install langchain_community
新建存放数据的文件夹
mkdir -p ./ragtest/input
在input文件夹里放数据文件,只支持txt
和csv
初始化
python -m graphrag.index --init --root ./ragtest
会生成一系列文件,其中settings.yaml
文件需做如下修改:
在encoding_model
部分修改model
为ollama
中下载的模型,api_base
设置地址为ollama
的发布地址
encoding_model: cl100k_base
skip_workflows: []
llm:
api_key: ollama
type: openai_chat # or azure_openai_chat
model: deepseek-v2
model_supports_json: true # recommended if this is available for your model.
# max_tokens: 4000
# request_timeout: 180.0
api_base: http://127.0.0.1:11434/v1
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-02-15-preview
# organization: <organization_id>
# deployment_name: <azure_model_deployment_name>
# tokens_per_minute: 150_000 # set a leaky bucket throttle
# requests_per_minute: 10_000 # set a leaky bucket throttle
# max_retries: 10
# max_retry_wait: 10.0
# sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
# concurrent_requests: 25 # the number of parallel inflight requests that may be made
在embeddings
部分修改model
为ollama
中的embedding
模型,api_base
设置地址为ollama
的发布地址,注意这里的地址和上面那个地址的后缀是不一样的,这里的nomic-embed-text
是ollama
中的embedding
模型
embeddings:
## parallelization: override the global parallelization settings for embeddings
async_mode: threaded # or asyncio
llm:
api_key: ${GRAPHRAG_API_KEY}
type: openai_embedding # or azure_openai_embedding
model: nomic-embed-text
api_base: http://127.0.0.1:11434/api
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-02-15-preview
# organization: <organization_id>
# deployment_name: <azure_model_deployment_name>
# tokens_per_minute: 150_000 # set a leaky bucket throttle
# requests_per_minute: 10_000 # set a leaky bucket throttle
# max_retries: 10
# max_retry_wait: 10.0
# sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
# concurrent_requests: 25 # the number of parallel inflight requests that may be made
# batch_size: 16 # the number of documents to send in a single request
# batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
# target: required # or optional
修改graphrag源码
源码路径为envs/<环境名>/lib/<python版本>/site-packages/graphrag/llm/openai/openai_embeddings_llm.py
如果embedding模型不是nomic-embed-text,在这部分也需要改成自己的模型名称
......
async def _execute_llm(
self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
) -> EmbeddingOutput | None:
args = {
"model": self.configuration.model,
**(kwargs.get("model_parameters") or {}),
}
## 以下为修改内容(记得import ollama)
embedding_list = []
for inp in input:
embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
embedding_list.append(embedding["embedding"])
return embedding_list
## 以下为原始内容
# embedding = await self.client.embeddings.create(
# input=input,
# **args,
# )
# return [d.embedding for d in embedding.data]
(非常必要,能查看日志是哪里报错)源码路径为envs/<环境名>/lib/<python版本>/site-packages/graphrag/query/__main__.py
if __name__ == "__main__":
## 添加log输出
import logging
import sys
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(levelname)s - %(message)s",
handlers=[
logging.StreamHandler(sys.stdout),
],
)
源码路径为envs/<环境名>/lib/<python版本>/site-packages/graphrag/query/llm/oai/embedding.py
整个代码覆盖
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License
"""OpenAI Embedding model implementation."""
import asyncio
from collections.abc import Callable
from typing import Any
import numpy as np
import tiktoken
from tenacity import (
AsyncRetrying,
RetryError,
Retrying,
retry_if_exception_type,
stop_after_attempt,
wait_exponential_jitter,
)
from graphrag.query.llm.base import BaseTextEmbedding
from graphrag.query.llm.oai.base import OpenAILLMImpl
from graphrag.query.llm.oai.typing import (
OPENAI_RETRY_ERROR_TYPES,
OpenaiApiType,
)
from graphrag.query.llm.text_utils import chunk_text
from graphrag.query.progress import StatusReporter
from langchain_community.embeddings import OllamaEmbeddings
class OpenAIEmbedding(BaseTextEmbedding, OpenAILLMImpl):
"""Wrapper for OpenAI Embedding models."""
def __init__(
self,
api_key: str | None = None,
azure_ad_token_provider: Callable | None = None,
model: str = "text-embedding-3-small",
deployment_name: str | None = None,
api_base: str | None = None,
api_version: str | None = None,
api_type: OpenaiApiType = OpenaiApiType.OpenAI,
organization: str | None = None,
encoding_name: str = "cl100k_base",
max_tokens: int = 8191,
max_retries: int = 10,
request_timeout: float = 180.0,
retry_error_types: tuple[type[BaseException]] = OPENAI_RETRY_ERROR_TYPES, # type: ignore
reporter: StatusReporter | None = None,
):
OpenAILLMImpl.__init__(
self=self,
api_key=api_key,
azure_ad_token_provider=azure_ad_token_provider,
deployment_name=deployment_name,
api_base=api_base,
api_version=api_version,
api_type=api_type, # type: ignore
organization=organization,
max_retries=max_retries,
request_timeout=request_timeout,
reporter=reporter,
)
self.model = model
self.encoding_name = encoding_name
self.max_tokens = max_tokens
self.token_encoder = tiktoken.get_encoding(self.encoding_name)
self.retry_error_types = retry_error_types
def embed(self, text: str, **kwargs: Any) -> list[float]:
"""
Embed text using OpenAI Embedding's sync function.
For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
Please refer to: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
"""
token_chunks = chunk_text(
text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
)
chunk_embeddings = []
chunk_lens = []
for chunk in token_chunks:
try:
embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
chunk_embeddings.append(embedding)
chunk_lens.append(chunk_len)
# TODO: catch a more specific exception
except Exception as e: # noqa BLE001
self._reporter.error(
message="Error embedding chunk",
details={self.__class__.__name__: str(e)},
)
continue
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
return chunk_embeddings.tolist()
async def aembed(self, text: str, **kwargs: Any) -> list[float]:
"""
Embed text using OpenAI Embedding's async function.
For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
"""
token_chunks = chunk_text(
text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
)
chunk_embeddings = []
chunk_lens = []
embedding_results = await asyncio.gather(*[
self._aembed_with_retry(chunk, **kwargs) for chunk in token_chunks
])
embedding_results = [result for result in embedding_results if result[0]]
chunk_embeddings = [result[0] for result in embedding_results]
chunk_lens = [result[1] for result in embedding_results]
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens) # type: ignore
chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
return chunk_embeddings.tolist()
def _embed_with_retry(
self, text: str | tuple, **kwargs: Any
) -> tuple[list[float], int]:
try:
retryer = Retrying(
stop=stop_after_attempt(self.max_retries),
wait=wait_exponential_jitter(max=10),
reraise=True,
retry=retry_if_exception_type(self.retry_error_types),
)
for attempt in retryer:
with attempt:
embedding = (
OllamaEmbeddings(
model=self.model,
).embed_query(text)
or []
)
return (embedding, len(text))
except RetryError as e:
self._reporter.error(
message="Error at embed_with_retry()",
details={self.__class__.__name__: str(e)},
)
return ([], 0)
else:
# TODO: why not just throw in this case?
return ([], 0)
async def _aembed_with_retry(
self, text: str | tuple, **kwargs: Any
) -> tuple[list[float], int]:
try:
retryer = AsyncRetrying(
stop=stop_after_attempt(self.max_retries),
wait=wait_exponential_jitter(max=10),
reraise=True,
retry=retry_if_exception_type(self.retry_error_types),
)
async for attempt in retryer:
with attempt:
embedding = (
await OllamaEmbeddings(
model=self.model,
).embed_query(text) or [] )
return (embedding, len(text))
except RetryError as e:
self._reporter.error(
message="Error at embed_with_retry()",
details={self.__class__.__name__: str(e)},
)
return ([], 0)
else:
# TODO: why not just throw in this case?
return ([], 0)
版本号
Package Version
------------------------- -----------
aiofiles 24.1.0
aiohttp 3.9.5
aiolimiter 1.1.0
aiosignal 1.3.1
annotated-types 0.7.0
anyio 4.4.0
anytree 2.12.1
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
autograd 1.6.2
azure-common 1.1.28
azure-core 1.30.2
azure-identity 1.17.1
azure-search-documents 11.5.0
azure-storage-blob 12.21.0
beartype 0.18.5
cachetools 5.4.0
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
click 8.1.7
cloudpickle 3.0.0
contourpy 1.2.1
cramjam 2.8.3
cryptography 43.0.0
cycler 0.12.1
dask 2024.7.1
dask-expr 1.1.9
dataclasses-json 0.6.7
datashaper 0.0.49
decorator 5.1.1
deprecation 2.1.0
devtools 0.12.2
diskcache 5.6.3
distro 1.9.0
environs 11.0.0
exceptiongroup 1.2.2
executing 2.0.1
fastparquet 2024.5.0
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2024.6.1
future 1.0.0
gensim 4.3.3
graphrag 0.1.1
graspologic 3.4.1
graspologic-native 1.2.1
greenlet 3.0.3
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
hyppo 0.4.0
idna 3.7
importlib_metadata 8.1.0
isodate 0.6.1
joblib 1.4.2
jsonpatch 1.33
jsonpointer 3.0.0
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lancedb 0.9.0
langchain 0.2.11
langchain-community 0.2.10
langchain-core 0.2.23
langchain-text-splitters 0.2.2
langsmith 0.1.93
linkify-it-py 2.0.3
llvmlite 0.43.0
locket 1.0.0
markdown-it-py 3.0.0
marshmallow 3.21.3
matplotlib 3.9.1
mdit-py-plugins 0.4.1
mdurl 0.1.2
msal 1.30.0
msal-extensions 1.2.0
multidict 6.0.5
mypy-extensions 1.0.0
networkx 3.3
nltk 3.8.1
numba 0.60.0
numpy 1.26.4
ollama 0.3.0
openai 1.37.0
orjson 3.10.6
overrides 7.7.0
packaging 24.1
pandas 2.2.2
partd 1.4.2
patsy 0.5.6
pillow 10.4.0
pip 24.0
portalocker 2.10.1
POT 0.9.4
psutil 6.0.0
py 1.11.0
pyaml-env 1.2.1
pyarrow 15.0.0
pycparser 2.22
pydantic 2.8.2
pydantic_core 2.20.1
Pygments 2.18.0
PyJWT 2.8.0
pylance 0.13.0
pynndescent 0.5.13
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
pytz 2024.1
PyYAML 6.0.1
ratelimiter 1.2.0.post0
referencing 0.35.1
regex 2024.5.15
requests 2.32.3
retry 0.9.2
rich 13.7.1
rpds-py 0.19.0
scikit-learn 1.5.1
scipy 1.12.0
seaborn 0.13.2
setuptools 69.5.1
six 1.16.0
smart-open 7.0.4
sniffio 1.3.1
SQLAlchemy 2.0.31
statsmodels 0.14.2
swifter 1.4.0
tenacity 8.5.0
textual 0.70.0
threadpoolctl 3.5.0
tiktoken 0.7.0
toolz 0.12.1
tqdm 4.66.4
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2024.1
uc-micro-py 1.0.3
umap-learn 0.5.6
urllib3 2.2.2
uvloop 0.19.0
wheel 0.43.0
wrapt 1.16.0
yarl 1.9.4
zipp 3.19.2
构建索引和知识图谱(需要很长时间)
python -m graphrag.index --root ./ragtest
查询(只能本地查询)
python -m graphrag.query --root ./ragtest --method local "<问题>"
后续
可以尝试https://github.com/TheAiSingularity/graphrag-local-ollama
的优化
尝试过后发现用这个仓库,要改的东西也差不太多
报错处理
⠦ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.
这里的日志报错我记得有三种,不记得具体日志长啥样了:
- embedding模型的名字
nomic-embed-text
写成了下划线,有两个地方要改这个名字 - ollama的版本太低了不支持embedding模型,没法发送embedding的请求。低版本的ollama也可以正常下载embedding模型,所以最好检查一下发送embedding请求能不能正常返回,命令行直接运行下面的代码,能返回矩阵就是能正常用,否则就需要更新ollama:
curl http://localhost:11434/api/embeddings -d '{
"model": "nomic-embed-text",
"prompt": "The sky is blue because of Rayleigh scattering"
}'
- 大模型的选择,有些大模型似乎是不支持json格式(返回空值,也可能是prompt的问题),或者是回答时会在json格式前面加一点解释文字,导致报错:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
解决方法是改 prompt 或者换模型