

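HuixiangDou (茴香豆) is an open-source knowledge Q&A tool developed by the InternLM (书生·浦语) team, designed and optimized for enterprise use cases in China. As covered in the basic RAG course, RAG can improve the relevance and timeliness of an LLM's knowledge retrieval while avoiding the high cost of training the LLM. In real production and everyday settings, developing, deploying, and tuning a RAG system is harder: it has to answer in group chats, refuse irrelevant questions, respond across multiple channels, and meet stricter security requirements. Based on the practical needs of a large number of users in China, the team distilled a three-stage pipeline architecture for the HuixiangDou knowledge assistant, which helps enterprise users install and deploy it quickly.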


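  • Three-stage pipeline (preprocessing, rejection, response) that improves response accuracy and security

  • Integrates with WeChat and Feishu (Lark) group chats, suiting knowledge Q&A scenarios in China

  • Installs on a wide range of hardware configurations, with few deployment constraints

  • Highly adaptable: compatible with multiple LLMs and APIs

  • Simple to operate, with easy installation and configuration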
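
The three-stage flow named in the first bullet can be pictured with a toy sketch. Everything in it is hypothetical and for illustration only: the function names, the word-overlap relevance score, and the 0.3 threshold are assumptions, not HuixiangDou's actual implementation, which relies on embedding models and an LLM.

```python
import re
from typing import List, Optional


def preprocess(message: str) -> str:
    """Stage 1 (toy): collapse whitespace and lowercase the chat message."""
    return re.sub(r"\s+", " ", message).strip().lower()


def overlap(query: str, doc: str) -> float:
    """Fraction of query words that also appear in the document."""
    words = set(query.split())
    if not words:
        return 0.0
    return len(words & set(doc.lower().split())) / len(words)


def answer(message: str, knowledge: List[str], threshold: float = 0.3) -> Optional[str]:
    query = preprocess(message)                                 # stage 1: preprocessing
    scores = [overlap(query, doc) for doc in knowledge]
    if not scores or max(scores) < threshold:                   # stage 2: rejection
        return None
    best = knowledge[scores.index(max(scores))]                 # stage 3: response
    return f"Based on the knowledge base: {best}"


KNOWLEDGE = [
    "install mmpose with pip install mmpose",
    "huixiangdou is a group chat assistant",
]
print(answer("How to   install mmpose", KNOWLEDGE))
print(answer("nice weather today", KNOWLEDGE))
```

The second call returns None, which stands in for the assistant staying silent on small talk in a group chat.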

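This tutorial walks through setting up both the Web version and the local version of HuixiangDou, showing how to quickly build an enterprise-grade RAG knowledge Q&A system.

1 Web Version of HuixiangDou

The Web version of HuixiangDou is deployed on the OpenXLab (浦源) platform and lets you try HuixiangDou's features without writing any code. A video demonstration by the author is also available.

1.1 Creating a Web Version Account and Password

Log in to OpenXLab (浦源) - App Center to see the knowledge base registration page for the Web version of HuixiangDou. Enter the name and password of the knowledge base you want to create. This name and password are your Web version account credentials; keep them safe, because all later maintenance and changes to this knowledge assistant require them.

1.2 Creating a Web Version Knowledge Base

After creating an account, or entering the credentials of an existing one, you are taken to the development page for that knowledge base. The Web version currently supports: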

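  • Adding and deleting documents

  • Editing positive and negative examples

  • Connecting WeChat and Feishu groups

  • Enabling web search (requires your own Serper token; for how to obtain one, see 3.1 Enabling Web Search)

  • Chat testing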

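Click the View or Upload button under Add Documents to modify the knowledge base documents. Uploading and deleting files in pdf, word, markdown, excel, ppt, html, and txt formats is currently supported. After a file is uploaded or deleted, feature extraction runs automatically, and the resulting vector knowledge base is used for subsequent RAG retrieval and similarity comparison.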
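
To make "feature extraction" and "similarity comparison" concrete, here is a minimal sketch that swaps in a bag-of-words vector and cosine similarity. This is a deliberately simplified stand-in: HuixiangDou uses neural embedding models, and the names `embed`, `build_index`, and `search` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def embed(text: str) -> Counter:
    """Toy feature extraction: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: Dict[str, str]) -> Dict[str, Counter]:
    """Turn each document into a vector: the 'vector knowledge base'."""
    return {name: embed(text) for name, text in docs.items()}


def search(index: Dict[str, Counter], query: str) -> Tuple[str, float]:
    """Return the best-matching document name and its similarity score."""
    q = embed(query)
    return max(((name, cosine(q, vec)) for name, vec in index.items()), key=lambda p: p[1])


index = build_index({
    "install.md": "install mmpose with pip",
    "faq.md": "wechat group chat assistant",
})
print(search(index, "how to install mmpose"))
```

Deleting or uploading a document corresponds to rebuilding the index, which is why the Web version reruns feature extraction after every change.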


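1.3 Tuning the Knowledge Assistant with Positive and Negative Examples

In real-world use, tuning the assistant to answer relevant questions and refuse irrelevant ones (such as small talk) is essential for answer accuracy and efficiency. In HuixiangDou's architecture, besides using the LLM to judge whether a question is relevant, you can manually add positive examples (questions you want the model to answer) and negative examples (questions you want it to refuse) to tune its response behavior.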
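
One simple way to see how positive and negative examples can steer the answer-or-refuse decision is a nearest-example comparison. The sketch below is an illustrative substitute, not HuixiangDou's method: it uses `difflib` string similarity instead of embeddings, and `should_answer` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def should_answer(question: str, positives: List[str], negatives: List[str]) -> bool:
    """Answer only if the question is closer to a positive example than to any negative one."""
    best_pos = max((similarity(question, p) for p in positives), default=0.0)
    best_neg = max((similarity(question, n) for n in negatives), default=0.0)
    return best_pos > best_neg


POSITIVES = ["How do I install mmpose?"]
NEGATIVES = ["What's for lunch today?"]
print(should_answer("How do I install mmpose?", POSITIVES, NEGATIVES))
print(should_answer("What's for lunch today?", POSITIVES, NEGATIVES))
```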

In the Web version of HuixiangDou, click the View or Edit button under Add Positive and Negative Examples to open the example editing page.




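2 Setting Up the Local Standard Version of HuixiangDou

In Part 1, we used the Web version of HuixiangDou to develop and deploy a RAG knowledge assistant with no code. In this part, we deploy HuixiangDou from source on a local server (using InternStudio as the example) and build a simple knowledge assistant demo.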

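2.1 Environment Setup

2.1.1 Configuring the Server

First log in to InternStudio and choose to create a development machine:

Select the Cuda11.7-conda image and the 30% A*100 resource type. Enter huixiangdou as the development machine name and click Create Now.

On the Development Machines page, select the huixiangdou machine you just created and click Start.

Once the server has allocated resources for the development machine, click Enter Development Machine to continue setting up the development environment.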

2.1.2 Creating the HuixiangDou Virtual Environment

Run the following commands to create a dedicated conda environment for HuixiangDou:

studio-conda -o internlm-base -t huixiangdou


conda activate huixiangdou

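Once the environment is activated, the name of the active environment appears in parentheses at the start of the command prompt. Make sure all HuixiangDou commands are run in the huixiangdou environment.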
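
If you prefer a programmatic check over reading the prompt, the sketch below inspects the `CONDA_DEFAULT_ENV` environment variable, which conda sets when an environment is activated. The helper name `in_conda_env` is hypothetical.

```python
import os
from typing import Mapping


def in_conda_env(expected: str = "huixiangdou", environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if the active conda environment has the expected name."""
    return environ.get("CONDA_DEFAULT_ENV") == expected


if not in_conda_env():
    print("Warning: run `conda activate huixiangdou` before using HuixiangDou commands.")
```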

2.2 Installing HuixiangDou


2.2.1 Downloading HuixiangDou


cd /root
# Clone the code repository
git clone https://github.com/internlm/huixiangdou && cd huixiangdou
git checkout 79fa810


2.2.2 Installing HuixiangDou's Dependencies


conda activate huixiangdou
# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install BCEmbedding==0.1.5 cmake==3.30.2 lit==18.1.8 sentencepiece==0.2.0 protobuf==5.27.3 accelerate==0.33.0
pip install -r requirements.txt
# On Python 3.8, install faiss-gpu instead of faiss

2.2.3 Downloading the Model Files


# Create the models folder
cd /root && mkdir -p models

# Link the BCE embedding and reranker models from the shared directory
ln -s /root/share/new_models/maidalun1020/bce-embedding-base_v1 /root/models/bce-embedding-base_v1
ln -s /root/share/new_models/maidalun1020/bce-reranker-base_v1 /root/models/bce-reranker-base_v1

# Link the LLM weights (choose just one model, depending on your assignment progress and task)
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-7b /root/models/internlm2-chat-7b
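
Because the models are symlinks into a shared directory, a broken link only shows up later as a loading error. A small check like the following can catch that early. It is a sketch: `missing_models` is a hypothetical helper, and the directory and model names are simply the ones used in the commands above.

```python
from pathlib import Path
from typing import Iterable, List


def missing_models(models_dir: str, names: Iterable[str]) -> List[str]:
    """Return the model names whose path is absent or is a dangling symlink."""
    root = Path(models_dir)
    # Path.exists() follows symlinks, so a link to a missing target counts as missing.
    return [name for name in names if not (root / name).exists()]


EXPECTED = ["bce-embedding-base_v1", "bce-reranker-base_v1", "internlm2-chat-7b"]
print("Missing models:", missing_models("/root/models", EXPECTED))
```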


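2.2.4 Modifying the Configuration File

All of HuixiangDou's feature switches and model selections are controlled through the config.ini file. Its default parameters are shown below (this listing omits the file's section headers, such as [feature_store], which is why some keys appear more than once):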

# `feature_store.py` use this throttle to distinct `good_questions` and `bad_questions`
reject_throttle = -1.0
# text2vec model, support local relative path, huggingface repo and URL.
# for example:
#  "maidalun1020/bce-embedding-base_v1"
#  "BAAI/bge-m3"
#  "https://api.siliconflow.cn/v1/embeddings"
embedding_model_path = "maidalun1020/bce-embedding-base_v1"

# reranker model, support list:
#  "maidalun1020/bce-reranker-base_v1"
#  "BAAI/bge-reranker-v2-minicpm-layerwise"
#  "https://api.siliconflow.cn/v1/rerank"
reranker_model_path = "maidalun1020/bce-reranker-base_v1"

# if using `siliconcloud` API as `embedding_model_path` or `reranker_model_path`, give the token
api_token = ""
api_rpm = 1000
work_dir = "workdir"

engine = "serper"
# web search engine support ddgs and serper
# For ddgs, see https://pypi.org/project/duckduckgo-search
# For serper, check https://serper.dev/api-key to get a free API key
serper_x_api_key = "YOUR-API-KEY-HERE"
domain_partial_order = ["arxiv.org", "openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
save_dir = "logs/web_search_result"

enable_local = 1
enable_remote = 0
# hybrid llm service address
client_url = ""

# local LLM configuration
# support "internlm/internlm2-chat-7b", "internlm2_5-7b-chat" and "qwen/qwen-7b-chat-int8"
# support local path, for example
# local_llm_path = "/path/to/your/internlm2_5"

local_llm_path = "internlm/internlm2_5-7b-chat"
local_llm_max_text_length = 3000
# llm server listen port
local_llm_bind_port = 8888

# remote LLM service configuration
# support "gpt", "kimi", "deepseek", "zhipuai", "step", "internlm", "xi-api" and "alles-apin"
# support "siliconcloud", see https://siliconflow.cn/zh-cn/siliconcloud
# xi-api and alles-apin is chinese gpt proxy
# for internlm, see https://internlm.intern-ai.org.cn/api/document

remote_type = "kimi"
remote_api_key = "YOUR-API-KEY-HERE"
# max text length for remote LLM.
# use 128000 for kimi, 192000 for gpt/xi-api, 16000 for deepseek, 128000 for zhipuai, 40000 for internlm2
remote_llm_max_text_length = 128000
# openai API model type, support model list:
# "auto" for kimi. To save money, we auto select model name by prompt length.
# "auto" for step to save money, see https://platform.stepfun.com/
# "gpt-4-0613" for gpt/xi-api,
# "deepseek-chat" for deepseek,
# "glm-4" for zhipuai,
# "gpt-4-1106-preview" for alles-apin or OpenAOE
# "internlm2-latest" for internlm
# for example "alibaba/Qwen1.5-110B-Chat", see https://siliconflow.readme.io/reference/chat-completions-1
remote_llm_model = "auto"
# request per minute
rpm = 500

base_url = ''
api_key = 'token-abc123'

# enable web search or not
enable_web_search = 1
# enable search enhancement or not
enable_sg_search = 0
# enable coreference resolution in `PreprocNode`
enable_cr = 0
save_path = "logs/work.txt"

enable = 0
start = "00:00:00"
end = "23:59:59"
has_weekday = 1

# download `src` from https://github.com/sourcegraph/src-cli#installation
binary_src_path = "/usr/local/bin/src"
src_access_token = "YOUR-SRC-ACCESS-TOKEN"

# add your repo here, we just take opencompass and lmdeploy as example
github_repo_id = "open-compass/opencompass"
introduction = "用于评测大型语言模型(LLM). 它提供了完整的开源可复现的评测框架,支持大语言模型、多模态模型的一站式评测,基于分布式技术,对大参数量模型亦能实现高效评测。评测方向汇总为知识、语言、理解、推理、考试五大能力维度,整合集纳了超过70个评测数据集,合计提供了超过40万个模型评测问题,并提供长文本、安全、代码3类大模型特色技术能力评测。"
# introduction = "For evaluating Large Language Models (LLMs). It provides a fully open-source, reproducible evaluation framework, supporting one-stop evaluation for large language models and multimodal models. Based on distributed technology, it can efficiently evaluate models with a large number of parameters. The evaluation directions are summarized in five capability dimensions: knowledge, language, understanding, reasoning, and examination. It integrates and collects more than 70 evaluation datasets, providing in total over 400,000 model evaluation questions. Additionally, it offers evaluations for three types of capabilities specific to large models: long text, security, and coding."

github_repo_id = "internlm/lmdeploy"
introduction = "lmdeploy 是一个用于压缩、部署和服务 LLM(Large Language Model)的工具包。是一个服务端场景下,transformer 结构 LLM 部署工具,支持 GPU 服务端部署,速度有保障,支持 Tensor Parallel,多并发优化,功能全面,包括模型转换、缓存历史会话的 cache feature 等. 它还提供了 WebUI、命令行和 gRPC 客户端接入。"
# introduction = "lmdeploy is a toolkit for compressing, deploying, and servicing Large Language Models (LLMs). It is a deployment tool for transformer-structured LLMs in server-side scenarios, supporting GPU server-side deployment, ensuring speed, and supporting Tensor Parallel along with optimizations for multiple concurrent processes. It offers comprehensive features including model conversion, cache features for caching historical sessions and more. Additionally, it provides access via WebUI, command line, and gRPC clients."
# add your repo here, we just take opencompass and lmdeploy as example

github_repo_id = "open-mmlab/mmpose"
introduction = "MMPose is an open-source toolbox for pose estimation based on PyTorch"

github_repo_id = "open-mmlab/mmdetection"
introduction = "MMDetection is an open source object detection toolbox based on PyTorch."

github_repo_id = "internlm/huixiangdou"
introduction = "茴香豆是一个基于 LLM 的群聊知识助手。设计拒答、响应两阶段 pipeline 应对群聊场景,解答问题同时不会消息泛滥。"

github_repo_id = "internlm/xtuner"
introduction = "XTuner is an efficient, flexible and full-featured toolkit for fine-tuning large models."

github_repo_id = "open-mmlab/mmyolo"
introduction = "OpenMMLab YOLO series toolbox and benchmark. Implemented RTMDet, RTMDet-Rotated,YOLOv5, YOLOv6, YOLOv7, YOLOv8,YOLOX, PPYOLOE, etc."

github_repo_id = "open-mmlab/Amphion"
introduction = "Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development."

github_repo_id = "open-mmlab/mmcv"
introduction = "MMCV is a foundational library for computer vision research and it provides image/video processing, image and annotation visualization, image transformation, various CNN architectures and high-quality implementation of common CPU and CUDA ops"

# chat group assistant type, support "lark_group", "wechat_personal", "wechat_wkteam" and "none"
# for "lark_group", open https://open.feishu.cn/document/home/introduction-to-custom-app-development/self-built-application-development-process to create one
# for "wechat_personal", read ./docs/add_wechat_group_zh.md to setup gateway
# for "wkteam", see https://wkteam.cn/
type = "none"

# for "lark", it is chat group webhook url, send reply to group, for example "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxx"
# for "lark_group", it is the url to fetch chat group message, for example "", `` is your own public IPv4 addr
# for "wechat_personal", it is useless
webhook_url = "https://open.feishu.cn/open-apis/bot/v2/hook/xxxxxxxxxxxxxxx"

# when a new group chat message is received, should it be processed immediately or wait for 18 seconds in case the user hasn't finished speaking?
# support "immediate"
message_process_policy = "immediate"

# "lark_group" configuration examples, use your own app_id and secret !!!
app_id = "cli_a53a34dcb778500e"
app_secret = "2ajhg1ixSvlNm1bJkH4tJhPfTCsGGHT1"
encrypt_key = "abc"
verification_token = "def"

# "wechat_personal" listen port
bind_port = 9527

# wechat message callback server ip
callback_ip = ""
callback_port = 9528

# public redis config
redis_host = ""
redis_port = "6380"
redis_passwd = "hxd123"

# wkteam
account = ""
password = ""
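# !!! `proxy` is a very important parameter: it is your account location
# 1: Beijing 2: Tianjin 3: Shanghai 4: Chongqing 5: Hebei
# 6: Shanxi 7: Jiangsu 8: Zhejiang 9: Anhui 10: Fujian
# 11: Jiangxi 12: Shandong 13: Henan 14: Hubei 15: Hunan
# 16: Guangdong 17: Hainan 18: Sichuan 20: Shaanxi
# a bad proxy value would cause account deactivation !!!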
proxy = -1

# save dir
dir = "wkteam"

# Group IDs and introductions
# HuixiangDou-related groups
name = "茴香豆群(大暑)"
introduction = "github https://github.com/InternLM/HuixiangDou 用户体验群"

name = "茴香豆群(立夏)"
introduction = "github https://github.com/InternLM/HuixiangDou 用户体验群"

name = "茴香豆群(惊蛰)"
introduction = "github https://github.com/InternLM/HuixiangDou 用户体验群"

name = "茴香豆群(谷雨)"
introduction = "github https://github.com/InternLM/HuixiangDou 用户体验群"

name = "茴香豆群(雨水)"
introduction = "github https://github.com/InternLM/HuixiangDou 用户体验群"

# github.com/tencent/ncnn contributors
name = "卷卷群"
introduction = "ncnn contributors group"


# Point the embedding, reranker, and LLM paths at the local models
# (these commands rewrite lines 9, 15, and 43 of config.ini)
sed -i '9s#.*#embedding_model_path = "/root/models/bce-embedding-base_v1"#' /root/huixiangdou/config.ini
sed -i '15s#.*#reranker_model_path = "/root/models/bce-reranker-base_v1"#' /root/huixiangdou/config.ini
sed -i '43s#.*#local_llm_path = "/root/models/internlm2-chat-7b"#' /root/huixiangdou/config.ini

You can also edit the file by hand in an editor; it is located at /root/huixiangdou/config.ini.
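
The sed commands address lines by number, so they only work while the file layout matches the pinned commit. As an alternative sketch, the same substitution can be done by key name. `set_config_value` is a hypothetical helper, and the sample text is a short excerpt rather than the full file.

```python
import re


def set_config_value(text: str, key: str, value: str) -> str:
    """Replace the first uncommented `key = ...` line with `key = "value"`."""
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in the value.
    new_text, count = pattern.subn(lambda m: f'{key} = "{value}"', text, count=1)
    if count == 0:
        raise KeyError(f"{key} not found in config text")
    return new_text


SAMPLE = (
    '# local_llm_path = "/path/to/your/internlm2_5"\n'
    "\n"
    'local_llm_path = "internlm/internlm2_5-7b-chat"\n'
    "local_llm_max_text_length = 3000\n"
)
print(set_config_value(SAMPLE, "local_llm_path", "/root/models/internlm2-chat-7b"))
```

To apply it to the real file, read /root/huixiangdou/config.ini into a string, call the function once per key, and write the result back. Commented-out lines are left untouched because the pattern is anchored to the start of the line.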


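Note: the default configuration refers to the same BCE embedding and reranker models that were linked above, but by their Hugging Face repo names. If the paths are not changed to local model paths, HuixiangDou will pull the models from the Hugging Face Hub automatically. If you choose to pull the models this way, first run huggingface-cli login on the command line to authenticate with Hugging Face.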

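2.3 Creating the Knowledge Base

With the configuration file updated, you can build the knowledge base. This tutorial uses the HuixiangDou and MMPose documentation to build a knowledge Q&A assistant covering both projects.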

conda activate huixiangdou

cd /root/huixiangdou && mkdir repodir

git clone https://github.com/internlm/huixiangdou --depth=1 repodir/huixiangdou
git clone https://github.com/open-mmlab/mmpose    --depth=1 repodir/mmpose

# Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store

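The commands above create a repodir folder under the huixiangdou folder to hold the raw knowledge base documents, and a workdir folder to hold the vector knowledge base extracted from those documents.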
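
Before running feature extraction, it can help to see what kinds of files the cloned repositories actually contain. The following sketch counts files under a folder by extension; `count_by_extension` is a hypothetical helper and is not part of HuixiangDou.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(folder: str) -> Counter:
    """Count regular files under `folder` (recursively), grouped by lowercase extension."""
    return Counter(
        path.suffix.lower() or "(none)"
        for path in Path(folder).rglob("*")
        if path.is_file()
    )
```

For example, `count_by_extension("/root/huixiangdou/repodir").most_common(5)` lists the five most frequent file types in the raw document folder.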

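After the knowledge base is created, a series of small tests check the rejection and response behavior. For the question about installing mmpose, the test returns a suitable answer along with the corresponding reference files. The question about using std::vector belongs to C++ and falls outside the knowledge base, so the test shows a refusal. Together these indicate that the knowledge assistant is working properly.

As with the Web version, the local version lets you adjust HuixiangDou's rejection and response behavior by editing positive and negative examples. Positive examples are in the file /root/huixiangdou/resource/good_questions.json, and negative examples are in /root/huixiangdou/resource/bad_questions.json.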

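Note that every time you update the source documents or the positive and negative examples, you must rerun python3 -m huixiangdou.service.feature_store to rebuild the vector knowledge base and update the response threshold.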
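
Editing the example files can also be scripted. The sketch below appends a question to one of the JSON files, assuming the file holds a flat JSON list of question strings. That layout is an assumption, so check your copy of the file first. `add_question` is a hypothetical helper.

```python
import json
from pathlib import Path


def add_question(json_path: str, question: str) -> bool:
    """Append `question` to a JSON list file; return False if it was already present."""
    path = Path(json_path)
    questions = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    if question in questions:
        return False
    questions.append(question)
    # ensure_ascii=False keeps Chinese questions readable in the file.
    path.write_text(json.dumps(questions, ensure_ascii=False, indent=2), encoding="utf-8")
    return True
```

After calling, for instance, `add_question("/root/huixiangdou/resource/good_questions.json", "mmpose 如何安装?")`, rerun the feature_store command so the change takes effect.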

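As the configuration shows, after feature extraction has run once, HuixiangDou's threshold (reject_throttle) is updated from -1.0 to 0.33. The work_dir parameter in the configuration file specifies where the extracted vector knowledge base is stored; if you need to switch quickly between several knowledge bases, change this parameter.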
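
To build intuition for how a threshold such as 0.33 can be derived from positive and negative examples, the sketch below picks the score cutoff that maximizes F1 on labeled similarity scores. The scores are made up, `best_threshold` is a hypothetical helper, and HuixiangDou's actual selection procedure may differ.

```python
from typing import List


def best_threshold(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Return the candidate cutoff t (answer when score >= t) with the highest F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(pos_scores + neg_scores)):
        tp = sum(s >= t for s in pos_scores)   # positives correctly answered
        fp = sum(s >= t for s in neg_scores)   # negatives wrongly answered
        fn = len(pos_scores) - tp              # positives wrongly refused
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Made-up similarity scores for three positive and three negative examples.
print(best_threshold([0.8, 0.6, 0.5], [0.3, 0.2, 0.1]))
```

With these scores the function returns 0.5, the lowest cutoff that still admits every positive example while refusing every negative one.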

2.4 Testing the Knowledge Assistant

2.4.1 Running from the Command Line


conda activate huixiangdou
cd /root/huixiangdou
python3 -m huixiangdou.main --standalone



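2.4.2 Testing with the Gradio UI

HuixiangDou also provides a Gradio-based Web UI for testing the local assistant.

In this lesson the assistant runs on a remote server, so you first need to set up port forwarding between your local machine and the server. The default forwarded port is 7860. Run the following command in a terminal on your local machine: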

ssh -CNg -L 7860:127.0.0.1:7860 root@ssh.intern-ai.org.cn -p <your SSH port>

On the server running the HuixiangDou assistant, enter the following commands to start the HuixiangDou Web UI:

conda activate huixiangdou
cd /root/huixiangdou
python3 -m huixiangdou.gradio

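Once the output shows that the Gradio service has started, open the forwarded port 7860 on your local machine in a browser to reach the HuixiangDou assistant test page.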
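
If the page does not load, a quick way to tell whether the tunnel and the Gradio service are both up is to test the forwarded port from your local machine. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Port 7860 reachable:", port_is_open("127.0.0.1", 7860))
```

A result of False usually means that either the ssh forwarding command is not running locally or the Gradio process has not finished starting on the server.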



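# After changing the knowledge documents or examples, rebuild the vector knowledge base, then restart the Gradio UI
python3 -m huixiangdou.service.feature_store
python3 -m huixiangdou.gradio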






