Deploying Ollama + GraphRAG locally

sahye_plinnae

Last modified: 2024-07-31 12:22:48

Tags: python

First published: 2024-07-24 00:06:02

Article link: https://blog.csdn.net/sahye_plinnae/article/details/140635652


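These are my notes on the deployment process. It is best to upgrade Ollama to the latest version first and to download both an LLM and an embedding model. The supported models can be looked up in Ollama's model list.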

Create a new environment

The Python version must be between 3.10 and 3.12.

conda create -n <env_name> python=3.10
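After activating the environment, a quick check like the one below can confirm that the interpreter falls in that range. It is only a sketch: the function name `is_supported_python` is made up for illustration, and the bounds are simply the 3.10 to 3.12 range stated above.

```python
import sys


def is_supported_python(version):
    """Return True if (major, minor) lies in the 3.10 to 3.12 range stated above."""
    return (3, 10) <= tuple(version[:2]) <= (3, 12)


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    status = "OK" if is_supported_python(current) else "outside the 3.10-3.12 range"
    print(f"Python {current[0]}.{current[1]}: {status}")
```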

Install graphrag and the related dependencies

pip install graphrag
pip install ollama
pip install langchain_community

Create a folder to hold the data

mkdir -p ./ragtest/input

Put the data files in the input folder. Only txt and csv files are supported.
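Before indexing, it can help to list which files in the input folder actually have one of those two extensions. The sketch below does this with the standard library only. The helper name `split_inputs` is hypothetical, and the exact, case-sensitive suffix match is an assumption of this sketch rather than something taken from graphrag.

```python
from pathlib import Path

# The two formats named above. Matching is exact and case-sensitive,
# which is an assumption of this sketch.
SUPPORTED_SUFFIXES = {".txt", ".csv"}


def split_inputs(input_dir):
    """Return (supported, unsupported) file names found directly in input_dir."""
    supported, unsupported = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # sub-folders are ignored by this sketch
        if path.suffix in SUPPORTED_SUFFIXES:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported


if __name__ == "__main__":
    folder = Path("./ragtest/input")
    if folder.is_dir():
        ok, other = split_inputs(folder)
        print("txt/csv files:", ok)
        print("other files:", other)
```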

Initialize

python -m graphrag.index --init --root ./ragtest

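This generates a set of files. The settings.yaml file among them needs the following changes:
In the llm section (just below encoding_model), set model to the model downloaded in Ollama, and set api_base to the address Ollama is serving on.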

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ollama
  type: openai_chat # or azure_openai_chat
  model: deepseek-v2
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://127.0.0.1:11434/v1
  # api_base: https://<instance>.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made

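In the embeddings section, set model to the embedding model in Ollama and set api_base to the address Ollama is serving on. Note that this address has a different suffix from the one above (/api here, /v1 above). The nomic-embed-text used here is the embedding model in Ollama.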

embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://127.0.0.1:11434/api
    # api_base: https://<instance>.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
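The two api_base suffixes and the model name are easy to get wrong, so a small sanity check over the parsed settings can catch slips early. The sketch below works on a plain dict and only mirrors the values used in this article. The function name `check_settings` is hypothetical, and all three rules are heuristics drawn from this write-up, not validation that graphrag itself performs.

```python
def check_settings(settings):
    """Return a list of warnings for a parsed settings.yaml (a plain dict)."""
    warnings = []
    llm = settings.get("llm", {})
    emb = settings.get("embeddings", {}).get("llm", {})

    # The chat model is reached through the address ending in /v1.
    if not str(llm.get("api_base", "")).rstrip("/").endswith("/v1"):
        warnings.append("llm.api_base does not end with /v1")
    # The embedding section in this article uses the /api suffix instead.
    if not str(emb.get("api_base", "")).rstrip("/").endswith("/api"):
        warnings.append("embeddings.llm.api_base does not end with /api")
    # nomic-embed-text is spelled with hyphens; an underscore is a likely typo.
    if "_" in str(emb.get("model", "")):
        warnings.append("embeddings.llm.model contains an underscore")
    return warnings


if __name__ == "__main__":
    example = {
        "llm": {"model": "deepseek-v2", "api_base": "http://127.0.0.1:11434/v1"},
        "embeddings": {
            "llm": {
                "model": "nomic-embed-text",
                "api_base": "http://127.0.0.1:11434/api",
            }
        },
    }
    print(check_settings(example) or "no warnings")
```

To run it against the real file, the dict could be loaded with PyYAML's `yaml.safe_load`, since PyYAML appears in the package list of this environment.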

Modify the graphrag source code

The source path is envs/<env_name>/lib/<python_version>/site-packages/graphrag/llm/openai/openai_embeddings_llm.py
If your embedding model is not nomic-embed-text, the model name in this code must also be changed to your own.


	......

    async def _execute_llm(
        self, input: EmbeddingInput, **kwargs: Unpack[LLMInput]
    ) -> EmbeddingOutput | None:
        args = {
            "model": self.configuration.model,
            **(kwargs.get("model_parameters") or {}),
        }
        
        ## Modified code below (remember to import ollama)
        embedding_list = []
        for inp in input:
            embedding = ollama.embeddings(model="nomic-embed-text", prompt=inp)
            embedding_list.append(embedding["embedding"])
        return embedding_list
        
        ## Original code below
#         embedding = await self.client.embeddings.create(
#             input=input,
#             **args,
#         )
#         return [d.embedding for d in embedding.data]
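The modified loop can also be written so that the model name is passed in rather than hardcoded, which leaves one less place to edit when the embedding model changes. The standalone sketch below shows the idea. `embed_all` and its `embed_fn` parameter are names invented for this illustration, and using it inside the patched method (for example with `self.configuration.model` as the model argument) is an untested assumption.

```python
def embed_all(texts, model, embed_fn=None):
    """Embed each text in turn and return the vectors in input order."""
    if embed_fn is None:
        # Imported lazily so the sketch can be exercised without Ollama installed.
        import ollama

        embed_fn = ollama.embeddings
    vectors = []
    for text in texts:
        response = embed_fn(model=model, prompt=text)
        vectors.append(response["embedding"])
    return vectors


if __name__ == "__main__":
    # A stand-in for ollama.embeddings, so the example runs offline.
    def fake_embeddings(model, prompt):
        return {"embedding": [float(len(prompt))]}

    print(embed_all(["sky", "blue"], "nomic-embed-text", fake_embeddings))
```

Passing the callable in makes the loop easy to test with a stand-in. Without the `embed_fn` argument it falls back to `ollama.embeddings`, the same call used in the patch above.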

(Strongly recommended: this lets you see in the log where an error occurs.) The source path is envs/<env_name>/lib/<python_version>/site-packages/graphrag/query/__main__.py

if __name__ == "__main__":
    ## Add log output
    import logging
    import sys
    
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[
            logging.StreamHandler(sys.stdout),
        ],
    )

The source path is envs/<env_name>/lib/<python_version>/site-packages/graphrag/query/llm/oai/embedding.py
Overwrite the entire file with the following code:

# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License

"""OpenAI Embedding model implementation."""

import asyncio
from collections.abc import Callable
from typing import Any

import numpy as np
import tiktoken
from tenacity import (
    AsyncRetrying,
    RetryError,
    Retrying,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential_jitter,
)

from graphrag.query.llm.base import BaseTextEmbedding
from graphrag.query.llm.oai.base import OpenAILLMImpl
from graphrag.query.llm.oai.typing import (
    OPENAI_RETRY_ERROR_TYPES,
    OpenaiApiType,
)
from graphrag.query.llm.text_utils import chunk_text
from graphrag.query.progress import StatusReporter

from langchain_community.embeddings import OllamaEmbeddings



class OpenAIEmbedding(BaseTextEmbedding, OpenAILLMImpl):
    """Wrapper for OpenAI Embedding models."""

    def __init__(
        self,
        api_key: str | None = None,
        azure_ad_token_provider: Callable | None = None,
        model: str = "text-embedding-3-small",
        deployment_name: str | None = None,
        api_base: str | None = None,
        api_version: str | None = None,
        api_type: OpenaiApiType = OpenaiApiType.OpenAI,
        organization: str | None = None,
        encoding_name: str = "cl100k_base",
        max_tokens: int = 8191,
        max_retries: int = 10,
        request_timeout: float = 180.0,
        retry_error_types: tuple[type[BaseException]] = OPENAI_RETRY_ERROR_TYPES,  # type: ignore
        reporter: StatusReporter | None = None,
    ):
        OpenAILLMImpl.__init__(
            self=self,
            api_key=api_key,
            azure_ad_token_provider=azure_ad_token_provider,
            deployment_name=deployment_name,
            api_base=api_base,
            api_version=api_version,
            api_type=api_type,  # type: ignore
            organization=organization,
            max_retries=max_retries,
            request_timeout=request_timeout,
            reporter=reporter,
        )

        self.model = model
        self.encoding_name = encoding_name
        self.max_tokens = max_tokens
        self.token_encoder = tiktoken.get_encoding(self.encoding_name)
        self.retry_error_types = retry_error_types

    def embed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's sync function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        Please refer to: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        for chunk in token_chunks:
            try:
                embedding, chunk_len = self._embed_with_retry(chunk, **kwargs)
                chunk_embeddings.append(embedding)
                chunk_lens.append(chunk_len)
            # TODO: catch a more specific exception
            except Exception as e:  # noqa BLE001
                self._reporter.error(
                    message="Error embedding chunk",
                    details={self.__class__.__name__: str(e)},
                )

                continue
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    async def aembed(self, text: str, **kwargs: Any) -> list[float]:
        """
        Embed text using OpenAI Embedding's async function.

        For text longer than max_tokens, chunk texts into max_tokens, embed each chunk, then combine using weighted average.
        """
        token_chunks = chunk_text(
            text=text, token_encoder=self.token_encoder, max_tokens=self.max_tokens
        )
        chunk_embeddings = []
        chunk_lens = []
        embedding_results = await asyncio.gather(*[
            self._aembed_with_retry(chunk, **kwargs) for chunk in token_chunks
        ])
        embedding_results = [result for result in embedding_results if result[0]]
        chunk_embeddings = [result[0] for result in embedding_results]
        chunk_lens = [result[1] for result in embedding_results]
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)  # type: ignore
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        return chunk_embeddings.tolist()

    def _embed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = Retrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            for attempt in retryer:
                with attempt:
                    embedding = (
                        OllamaEmbeddings(
                            model=self.model,
                        ).embed_query(text)
                        or []
                    )
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)

    async def _aembed_with_retry(
        self, text: str | tuple, **kwargs: Any
    ) -> tuple[list[float], int]:
        try:
            retryer = AsyncRetrying(
                stop=stop_after_attempt(self.max_retries),
                wait=wait_exponential_jitter(max=10),
                reraise=True,
                retry=retry_if_exception_type(self.retry_error_types),
            )
            async for attempt in retryer:
                with attempt:
                    embedding = (
                        # embed_query is synchronous and cannot be awaited;
                        # aembed_query is its awaitable counterpart.
                        await OllamaEmbeddings(
                            model=self.model,
                        ).aembed_query(text)
                        or []
                    )
                    return (embedding, len(text))
        except RetryError as e:
            self._reporter.error(
                message="Error at embed_with_retry()",
                details={self.__class__.__name__: str(e)},
            )
            return ([], 0)
        else:
            # TODO: why not just throw in this case?
            return ([], 0)
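The way `embed` and `aembed` combine chunk embeddings (a mean weighted by chunk length, then scaling to unit length) can be spelled out without numpy. The sketch below is a plain-Python restatement of that arithmetic for illustration only. `combine_chunk_embeddings` is a made-up name, and raising `ValueError` when no chunk was embedded is a choice of this sketch, not behaviour of the code above.

```python
import math


def combine_chunk_embeddings(chunk_embeddings, chunk_lens):
    """Length-weighted mean of the chunk vectors, scaled to unit L2 norm."""
    if not chunk_embeddings:
        raise ValueError("no chunk was embedded successfully")
    total = sum(chunk_lens)
    dim = len(chunk_embeddings[0])
    # Weighted mean per dimension: longer chunks contribute more.
    mean = [
        sum(vec[i] * n for vec, n in zip(chunk_embeddings, chunk_lens)) / total
        for i in range(dim)
    ]
    # Normalize so the combined vector has length 1.
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]


if __name__ == "__main__":
    # Two chunks of length 3 and 1: the mean is [0.75, 0.25] before scaling.
    print(combine_chunk_embeddings([[1.0, 0.0], [0.0, 1.0]], [3, 1]))
```

In this example the weighted mean is [0.75, 0.25], and dividing by its norm gives roughly [0.9487, 0.3162].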

Package versions

Package                   Version
------------------------- -----------
aiofiles                  24.1.0
aiohttp                   3.9.5
aiolimiter                1.1.0
aiosignal                 1.3.1
annotated-types           0.7.0
anyio                     4.4.0
anytree                   2.12.1
asttokens                 2.4.1
async-timeout             4.0.3
attrs                     23.2.0
autograd                  1.6.2
azure-common              1.1.28
azure-core                1.30.2
azure-identity            1.17.1
azure-search-documents    11.5.0
azure-storage-blob        12.21.0
beartype                  0.18.5
cachetools                5.4.0
certifi                   2024.7.4
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpickle               3.0.0
contourpy                 1.2.1
cramjam                   2.8.3
cryptography              43.0.0
cycler                    0.12.1
dask                      2024.7.1
dask-expr                 1.1.9
dataclasses-json          0.6.7
datashaper                0.0.49
decorator                 5.1.1
deprecation               2.1.0
devtools                  0.12.2
diskcache                 5.6.3
distro                    1.9.0
environs                  11.0.0
exceptiongroup            1.2.2
executing                 2.0.1
fastparquet               2024.5.0
fonttools                 4.53.1
frozenlist                1.4.1
fsspec                    2024.6.1
future                    1.0.0
gensim                    4.3.3
graphrag                  0.1.1
graspologic               3.4.1
graspologic-native        1.2.1
greenlet                  3.0.3
h11                       0.14.0
httpcore                  1.0.5
httpx                     0.27.0
hyppo                     0.4.0
idna                      3.7
importlib_metadata        8.1.0
isodate                   0.6.1
joblib                    1.4.2
jsonpatch                 1.33
jsonpointer               3.0.0
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
lancedb                   0.9.0
langchain                 0.2.11
langchain-community       0.2.10
langchain-core            0.2.23
langchain-text-splitters  0.2.2
langsmith                 0.1.93
linkify-it-py             2.0.3
llvmlite                  0.43.0
locket                    1.0.0
markdown-it-py            3.0.0
marshmallow               3.21.3
matplotlib                3.9.1
mdit-py-plugins           0.4.1
mdurl                     0.1.2
msal                      1.30.0
msal-extensions           1.2.0
multidict                 6.0.5
mypy-extensions           1.0.0
networkx                  3.3
nltk                      3.8.1
numba                     0.60.0
numpy                     1.26.4
ollama                    0.3.0
openai                    1.37.0
orjson                    3.10.6
overrides                 7.7.0
packaging                 24.1
pandas                    2.2.2
partd                     1.4.2
patsy                     0.5.6
pillow                    10.4.0
pip                       24.0
portalocker               2.10.1
POT                       0.9.4
psutil                    6.0.0
py                        1.11.0
pyaml-env                 1.2.1
pyarrow                   15.0.0
pycparser                 2.22
pydantic                  2.8.2
pydantic_core             2.20.1
Pygments                  2.18.0
PyJWT                     2.8.0
pylance                   0.13.0
pynndescent               0.5.13
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
python-dotenv             1.0.1
pytz                      2024.1
PyYAML                    6.0.1
ratelimiter               1.2.0.post0
referencing               0.35.1
regex                     2024.5.15
requests                  2.32.3
retry                     0.9.2
rich                      13.7.1
rpds-py                   0.19.0
scikit-learn              1.5.1
scipy                     1.12.0
seaborn                   0.13.2
setuptools                69.5.1
six                       1.16.0
smart-open                7.0.4
sniffio                   1.3.1
SQLAlchemy                2.0.31
statsmodels               0.14.2
swifter                   1.4.0
tenacity                  8.5.0
textual                   0.70.0
threadpoolctl             3.5.0
tiktoken                  0.7.0
toolz                     0.12.1
tqdm                      4.66.4
typing_extensions         4.12.2
typing-inspect            0.9.0
tzdata                    2024.1
uc-micro-py               1.0.3
umap-learn                0.5.6
urllib3                   2.2.2
uvloop                    0.19.0
wheel                     0.43.0
wrapt                     1.16.0
yarl                      1.9.4
zipp                      3.19.2

Build the index and the knowledge graph (this takes a long time)

python -m graphrag.index --root ./ragtest

Query (only local queries work)

python -m graphrag.query --root ./ragtest --method local "<question>"
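When indexing and querying are run repeatedly, the two commands can be assembled from Python and launched with `subprocess`. The sketch below only builds the same command lines shown in this article. The helper names are hypothetical, and nothing is executed unless `run` is called explicitly.

```python
import shlex
import subprocess
import sys


def index_command(root):
    """Command line for building the index under root."""
    return [sys.executable, "-m", "graphrag.index", "--root", str(root)]


def query_command(root, question, method="local"):
    """Command line for asking one question against the index under root."""
    return [
        sys.executable, "-m", "graphrag.query",
        "--root", str(root),
        "--method", method,
        question,
    ]


def run(command):
    """Run a command and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run(...) to actually execute them.
    print(shlex.join(index_command("./ragtest")))
    print(shlex.join(query_command("./ragtest", "<question>")))
```

Using `sys.executable` keeps the call inside the currently active environment, and passing the question as its own list element avoids shell quoting problems.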

Follow-up

~~You could try the optimizations in https://github.com/TheAiSingularity/graphrag-local-ollama~~
After trying it, I found that using that repository requires about as many changes.

Troubleshooting

⠦ GraphRAG Indexer 
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

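As far as I remember, there were three kinds of errors in the logs here, though I no longer recall exactly what the log output looked like:

The embedding model name nomic-embed-text was written with underscores. The name has to be corrected in two places.
The Ollama version was too old to support embedding models, so embedding requests could not be sent. An old Ollama can still download an embedding model without complaint, so it is worth checking that an embedding request actually returns a result. Run the command below directly from the command line. If it returns a vector of numbers, embeddings work; otherwise Ollama needs to be updated: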

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering"
}'

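The choice of LLM. Some models do not seem to support JSON output (they return an empty value, although the prompt may also be to blame), or they put a few words of explanation in front of the JSON, which causes the error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). The fix is to change the prompt or switch to another model.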
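For the third case, a different workaround from changing the prompt or the model is to pull the JSON object out of the reply before parsing it. The sketch below shows that idea with the standard library only. It is not part of graphrag, the name `extract_json_object` is made up, and where such a helper would be hooked into the pipeline is left open here.

```python
import json


def extract_json_object(reply):
    """Return the first JSON object found in reply, or None if there is none."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(reply, start)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = reply.find("{", start + 1)
    return None


if __name__ == "__main__":
    reply = 'Sure, here is the result:\n{"entities": ["sky", "Rayleigh"]}'
    print(extract_json_object(reply))
    print(extract_json_object(""))  # an empty reply yields None
```

Calling `json.loads` on the raw reply in this example fails because the text does not start with a JSON value, while scanning for the first decodable object skips the leading explanation. An empty reply still yields nothing to parse, so that case needs a different prompt or model, as noted above.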