Connecting Dify to an External RAGFlow Knowledge Base

铁骨铮铮刘大海

Last modified: 2025-02-17 14:39:45

Tags: python, artificial intelligence, language models

First published: 2025-02-17 14:06:11

Article link: https://blog.csdn.net/m0_50664825/article/details/145679130

1. Deploy RAGFlow

Follow the official guide: ragflow/README_zh.md at main · infiniflow/ragflow · GitHub

System requirements:

CPU >= 4 cores
RAM >= 16 GB
Disk >= 50 GB
Docker >= 24.0.0 & Docker Compose >= v2.26.1
For Docker installation, see Install | Docker Docs.
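
As a quick pre-flight check, the hardware minimums above can be compared against the local machine. The following is a minimal sketch and not part of the RAGFlow tooling: the names `meets_requirements` and `local_resources` are hypothetical, the RAM probe uses `os.sysconf` and therefore assumes a Linux host, "Disk" is interpreted as free space (an assumption), and Docker versions are not checked.

```python
import os
import shutil

# Minimums copied from the requirements list above.
MIN_CPU_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50


def meets_requirements(cpu_cores: int, ram_gb: float, disk_gb: float) -> dict:
    """Return a per-resource pass/fail map against the minimums above."""
    return {
        "cpu": cpu_cores >= MIN_CPU_CORES,
        "ram": ram_gb >= MIN_RAM_GB,
        "disk": disk_gb >= MIN_DISK_GB,
    }


def local_resources(path: str = "/") -> tuple:
    """Probe this machine: (CPU cores, RAM in GiB, free disk in GiB).

    Assumes Linux for the RAM query; treats "Disk" as free space on `path`.
    """
    gib = 1024 ** 3
    cpu_cores = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk_bytes = shutil.disk_usage(path).free
    return cpu_cores, ram_bytes / gib, disk_bytes / gib
```

Calling `meets_requirements(*local_resources())` prints nothing by itself; wrap it in `print(...)` to see the result. A machine sold as "16 GB" usually reports slightly less usable memory, so a narrow miss on the RAM check is worth a manual look before treating it as a failure.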

2. Create a RAGFlow knowledge base
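
A knowledge base (called a dataset in RAGFlow's HTTP API) is normally created in the RAGFlow web UI. As a possible alternative, the sketch below builds a request for RAGFlow's `POST /api/v1/datasets` endpoint. The helper names, the example address, and the API key are placeholders, and the response is assumed to return the new dataset's ID under `data.id`; check the RAGFlow HTTP API reference for your version. That dataset ID is the value Dify later sends as `knowledge_id`.

```python
import json
import urllib.request


def build_create_dataset_request(base_url: str, api_key: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a dataset-creation request for RAGFlow."""
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_dataset(base_url: str, api_key: str, name: str) -> str:
    """Send the request and return the dataset ID (assumed to be at data.id)."""
    request = build_create_dataset_request(base_url, api_key, name)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    # A non-zero "code" is treated as an application-level failure.
    if result.get("code") != 0:
        raise RuntimeError(result.get("message", "dataset creation failed"))
    return result["data"]["id"]
```

For example, `create_dataset("http://ragflow.example", "<RAGFLOW_API_KEY>", "demo")` would return the ID to paste into Dify, where both arguments are placeholders for your own deployment.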

3. Build the RAGFlow knowledge base API

See Dify's external knowledge base API specification: External Knowledge API | Dify

Wrap RAGFlow's built-in knowledge base retrieval API (HTTP API | RAGFlow) in an endpoint that conforms to Dify's external knowledge base API specification.

from fastapi import FastAPI, HTTPException, Header, Depends
from pydantic import BaseModel
import requests

app = FastAPI()

# RAGFlow retrieval endpoint; replace the placeholder with your RAGFlow address.
ORIGINAL_API_URL = "http://{your_ragflow_address}/api/v1/retrieval"


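class RetrievalSetting(BaseModel):
    """Retrieval parameters as sent by Dify."""
    top_k: int
    score_threshold: float


class RetrievalRequest(BaseModel):
    """Request body defined by Dify's external knowledge API."""
    knowledge_id: str  # mapped to a RAGFlow dataset ID below
    query: str
    retrieval_setting: RetrievalSetting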


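def get_api_key(authorization: str = Header(...)):
    """Extract the bearer token from the Authorization header."""
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    # Reject a wrong scheme as well as an empty token such as "Bearer ".
    if scheme != "Bearer" or not token:
        raise HTTPException(status_code=401, detail="Invalid Authorization header format.")
    return token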


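@app.post("/retrieval")
def retrieve_chunks(request_data: RetrievalRequest, api_key: str = Depends(get_api_key)):
    # Translate the Dify request into RAGFlow's retrieval payload.
    # Note: RAGFlow documents top_k as the number of chunks that take part in
    # vector similarity scoring, so it bounds the result count rather than
    # matching Dify's top_k exactly.
    payload = {
        "question": request_data.query,
        "dataset_ids": [request_data.knowledge_id],
        "top_k": request_data.retrieval_setting.top_k,
        "similarity_threshold": request_data.retrieval_setting.score_threshold
    }
    # The caller's key is forwarded as-is, so the API key configured in Dify
    # must be a valid RAGFlow API key.
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}"
    }
    response = requests.post(ORIGINAL_API_URL, json=payload, headers=headers, timeout=30)

    if response.status_code != 200:
        # Use the raw text: an error body is not guaranteed to be JSON.
        raise HTTPException(status_code=response.status_code, detail=response.text)

    original_data = response.json()
    # RAGFlow may report failures with HTTP 200 and a non-zero "code" field.
    if original_data.get("code", 0) != 0:
        raise HTTPException(status_code=502, detail=original_data.get("message", "RAGFlow retrieval failed."))

    # Reshape each RAGFlow chunk into the record format Dify expects.
    records = [
        {
            "content": chunk["content"],
            "score": chunk["similarity"],
            "title": chunk.get("document_keyword", "Unknown Document"),
            "metadata": {"document_id": chunk["document_id"]}
        }
        for chunk in (original_data.get("data") or {}).get("chunks", [])
    ]
    return {"records": records}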

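import uvicorn

if __name__ == "__main__":
    # "app:app" assumes this file is saved as app.py.
    # reload=True is convenient during development; disable it in production.
    uvicorn.run("app:app", host="0.0.0.0", port=8500, reload=True)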
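
Once the service is running, it can be exercised with a request shaped like the one Dify sends. The sketch below is a simple smoke test under stated assumptions: `build_retrieval_request` and `query_wrapper` are hypothetical helpers, the base URL presumes the service is reachable on port 8500 as configured above, and the knowledge ID and API key are placeholders for a real RAGFlow dataset ID and RAGFlow API key.

```python
import json
import urllib.request


def build_retrieval_request(base_url, api_key, knowledge_id, query,
                            top_k=3, score_threshold=0.2):
    """Build (but do not send) a Dify-style POST to the wrapper's /retrieval."""
    body = {
        "knowledge_id": knowledge_id,
        "query": query,
        "retrieval_setting": {"top_k": top_k, "score_threshold": score_threshold},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def query_wrapper(base_url, api_key, knowledge_id, query, **settings):
    """Send the request and return the list under "records"."""
    request = build_retrieval_request(base_url, api_key, knowledge_id, query, **settings)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["records"]
```

A call such as `query_wrapper("http://127.0.0.1:8500", "<RAGFLOW_API_KEY>", "<DATASET_ID>", "What is RAGFlow?")` should return a list of records with `content`, `score`, `title`, and `metadata` keys. Because Dify's specification defines the call as a POST to `<your-endpoint>/retrieval`, the endpoint registered in Dify would be the service's base URL without the `/retrieval` suffix.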