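After comparing several open-source RAG frameworks, I chose MaxKB as the base for building a RAG application. MaxKB is lighter and simpler. For in-house enterprise use a complex system is unnecessary: simple, easy to use, and easy to extend is enough. This post records how to set up a local MaxKB development environment.
Current environment: Win11 + Docker Desktop + Conda
Building the development environment
Conda environment setup
Install conda
Create a dedicated conda environment for MaxKB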
conda create --name MaxKB python=3.11
conda activate MaxKB
Install dependencies
pip install poetry
Switch the Poetry package source
poetry self add poetry-plugin-pypi-mirror
Set the package source in the config file:
pyproject.toml
[[tool.poetry.source]]
name = "tsinghua"
url = "https://pypi.tuna.tsinghua.edu.cn/simple/"
priority = "primary"
PostgreSQL installation
Install PostgreSQL with the vector extension (pgvector):
docker-compose.yaml
version: "3.3"
services:
  postgres:
    image: ankane/pgvector:latest
    container_name: pjp_postgres
    restart: always
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: 123456
    ports:
      - 5432:5432
docker-compose up -d
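Before connecting with a database tool, it can help to confirm that the published port actually accepts connections. The following is a minimal sketch using only the standard library. The function name port_is_open is made up for this post, and 127.0.0.1 is an assumption (use the Docker host's address if the container runs elsewhere):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # 5432 is the port published in docker-compose.yaml above.
    print("postgres reachable:", port_is_open("127.0.0.1", 5432))
```

This only shows that something is listening on the port. It does not check credentials or that the maxkb database exists.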
Connect to the database (with a client tool)
create database maxkb;
Connect to the maxkb database, then create the vector extension:
CREATE EXTENSION "vector";
Downloading the embedding model
Download it from the ModelScope community:
https://www.modelscope.cn/models/sungw111/text2vec-base-chinese-sentence
Copy it to: D:\opt\maxkb\model
MaxKB configuration changes
D:/opt/maxkb/conf/config.yml
# Database connection settings
DB_NAME: maxkb
DB_HOST: 192.168.1.152
DB_PORT: 5432
DB_USER: root
DB_PASSWORD: 123456
DB_ENGINE: django.db.backends.postgresql_psycopg2
DEBUG: false
TIME_ZONE: Asia/Shanghai
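# Model settings
# Model path: ignored when EMBEDDING_MODEL_NAME is an absolute path; otherwise the model is downloaded from https://huggingface.co/ into this directory
EMBEDDING_MODEL_PATH: /opt/maxkb/model/
# Model name: if it is a path, the model in that directory is loaded; if it is a model name, the model is downloaded from https://huggingface.co/ into EMBEDDING_MODEL_PATH
EMBEDDING_MODEL_NAME: /opt/maxkb/model/text2vec-base-chinese-sentence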
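The rule in those two comments can be sketched as below. This only illustrates the rule as the config comments describe it and is not MaxKB's actual loading code. The function names are made up for this post, and some-org/some-model is a placeholder name:

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_local_model(model_name: str) -> bool:
    """True when the name is written as an absolute path (POSIX or Windows style)."""
    return (
        PurePosixPath(model_name).is_absolute()
        or PureWindowsPath(model_name).is_absolute()
    )


def describe_model_source(model_path: str, model_name: str) -> str:
    """Say where the model would come from under the rule in the config comments."""
    if is_local_model(model_name):
        # EMBEDDING_MODEL_PATH plays no role in this case.
        return f"load from local directory {model_name}"
    return f"download {model_name} from huggingface.co into {model_path}"


print(describe_model_source(
    "/opt/maxkb/model/",
    "/opt/maxkb/model/text2vec-base-chinese-sentence",
))
```

With the values in config.yml above, the name is an absolute path, so the locally copied model is used and nothing should need to be downloaded.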
Starting the MaxKB frontend
Initialize and start the frontend:
(MaxKB_2) PS D:\rwl_space\project_python\MaxKB> cd ui
(MaxKB_2) PS D:\rwl_space\project_python\MaxKB\ui> npm install
(MaxKB_2) PS D:\rwl_space\project_python\MaxKB\ui> npm run dev
Starting the MaxKB backend
Three modules need to be started: web, celery, and local_model.
Start them from PyCharm:
Start the backend API on port 8080:
python main.py dev web
Start the Celery scheduled-task worker:
python main.py dev celery
Celery start: cmd=['celery', '-A', 'ops', 'worker', '-P', 'threads', '-l', 'info', '-c', '10', '-Q', 'celery', '--heartbeat-interval', '10', '-n', 'celery@%h', '--without-mingle'], kwargs={'cwd': 'D:\\rwl_space\\project_python\\MaxKB\\apps'}
-------------- celery@USER-20240227OR v5.5.1 (immunity)
--- ***** -----
-- ******* ---- Windows-10-10.0.22631-SP0 2025-04-16 11:58:34
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: MaxKB:0x1f4f2d551d0
- ** ---------- .> transport: sqla+sqlite:///D:\rwl_space\project_python\MaxKB\data\celery_task\celery_db.sqlite3
- ** ---------- .> results: sqlite:///D:\rwl_space\project_python\MaxKB\data\celery_task\celery_results.sqlite3
- *** --- * --- .> concurrency: 10 (thread)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> celery exchange=celery(direct) key=celery
Start the local model service on port 11636:
python main.py dev local_model
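If you would rather not create three PyCharm run configurations, the same three commands can be launched from one small script. This is a rough sketch, not part of MaxKB: dev_command and start_all are names made up here, and project_dir is assumed to be the directory that contains main.py.

```python
import subprocess
import sys

# The three dev targets used in the commands above.
DEV_SERVICES = ("web", "celery", "local_model")


def dev_command(service: str, python_exe: str = sys.executable) -> list[str]:
    """Build the 'python main.py dev <service>' command line as a list."""
    if service not in DEV_SERVICES:
        raise ValueError(f"unknown dev service: {service}")
    return [python_exe, "main.py", "dev", service]


def start_all(project_dir: str) -> list[subprocess.Popen]:
    """Start every dev service as a child process and return the handles."""
    return [
        subprocess.Popen(dev_command(service), cwd=project_dir)
        for service in DEV_SERVICES
    ]


# Example (not executed here):
# procs = start_all(r"D:\rwl_space\project_python\MaxKB")
# for p in procs:
#     p.wait()
```

Run it inside the activated conda environment so that sys.executable points at the interpreter with the MaxKB dependencies installed.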
Logging in
http://192.168.1.152:3000
Default account:
Username: admin
Password: MaxKB@123…
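Miscellaneous
2025.04.16: error No module named 'pwd'
Starting MaxKB with start fails:
2025-04-16 10:18:44,563 - root - ERROR - Start service error ['all']: No module named 'pwd'
Printing the stack with traceback.print_exc() shows that utils.py imports daemon, which in turn imports pwd, a Unix-only standard library module that does not exist on Windows: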
File "D:\rwl_space\project_python\MaxKB\apps\common\management\commands\start.py", line 1, in <module>
from .services.command import BaseActionCommand, Action
File "D:\rwl_space\project_python\MaxKB\apps\common\management\commands\services\command.py", line 8, in <module>
from .utils import ServicesUtil
File "D:\rwl_space\project_python\MaxKB\apps\common\management\commands\services\utils.py", line 4, in <module>
import daemon
File "D:\program\miniconda\envs\MaxKB_2\Lib\site-packages\daemon\__init__.py", line 33, in <module>
from .daemon import DaemonContext
File "D:\program\miniconda\envs\MaxKB_2\Lib\site-packages\daemon\daemon.py", line 13, in <module>
import pwd
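Solution 1: start with dev instead, at the cost of having to start several services separately
Reference: https://bbs.fit2cloud.com/t/topic/9750/3
Solution 2: modify the source code
In apps/common/management/commands/services/utils.py, remove the daemon dependency
In apps/common/management/commands/services/services/celery_base.py, branch on Linux vs. Windows
Even so, this does not make it fully compatible with Windows, because many of the services are launched through shell commands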
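One way to remove the hard dependency on daemon in solution 2 is a guarded import. The snippet below is a minimal sketch of that idea, not the actual patch. IS_WINDOWS and can_daemonize are names made up here, and the places in utils.py that use daemon would still need their own fallback.

```python
import sys

IS_WINDOWS = sys.platform.startswith("win")

try:
    # python-daemon imports the Unix-only pwd module, so on Windows this
    # raises ModuleNotFoundError, which is a subclass of ImportError.
    import daemon
except ImportError:
    daemon = None


def can_daemonize() -> bool:
    """True only when python-daemon imported successfully on a non-Windows system."""
    return daemon is not None and not IS_WINDOWS


if not can_daemonize():
    print("daemon mode unavailable; run the services in the foreground")
```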
PID files are recorded in the following directory and need to be cleaned up before a restart:
D:\rwl_space\project_python\MaxKB\tmp
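The cleanup can be scripted. The sketch below assumes the PID files sit directly in that tmp directory and end in .pid. Check the actual file names first, since that naming is an assumption, and remove_pid_files is a helper name made up for this post.

```python
from pathlib import Path


def remove_pid_files(tmp_dir: str) -> list[str]:
    """Delete *.pid files directly under tmp_dir and return the names removed."""
    removed = []
    for pid_file in sorted(Path(tmp_dir).glob("*.pid")):
        pid_file.unlink()
        removed.append(pid_file.name)
    return removed


# Example (not executed here):
# remove_pid_files(r"D:\rwl_space\project_python\MaxKB\tmp")
```

Only run it while the services are stopped, otherwise a PID file belonging to a live process would be deleted.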