Deploying Langchain-chatchat 0.3.1.3 on Windows and Linux

好烦啊想摆了

Last modified: 2024-07-26 17:39:18


Tags: windows linux langchain

First published: 2024-07-26 17:38:13

Article link: https://blog.csdn.net/qq_43612410/article/details/140717753



Preface

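While learning how to deploy RAG recently, I ran into some very odd problems. The official project ships new versions so quickly that its documented deployment steps lag behind the code, and following them was a real headache (or maybe I'm just slow). After persistent effort I finally got it working, so I'm sharing what I learned here.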

Tutorial

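This tutorial covers local deployment. You need conda first. Then, in the official repository (link), open docs > install > README_xinference.md. Here is the link: link
You can skip the conda initialization step described there.
Let's get straight to it: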

1. Create the chatchat environment

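1. Create a virtual environment following the link above (Linux shown here):
conda create -p /home/envs/chatchat python=3.10
The official docs say to install Python 3.8, but 3.8 was incompatible when I tried it, so use 3.10 instead. The environment path is up to you.
2. Activate the environment:
conda activate /home/envs/chatchat
3. Install the remaining dependencies as listed in the official docs:
pip install langchain-chatchat -U
pip install xinference_client faiss-gpu "unstructured[pdf]"
If the GPU build fails to install, replace faiss-gpu with faiss-cpu (fine if you only want to reproduce the setup).
Once that is done, run chatchat init to initialize chatchat. Output like the screenshot below means it worked.
[Screenshot: output of a successful chatchat init]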
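The GPU/CPU choice in step 3 can be made explicit in a small helper. This is only a sketch: `build_pip_commands` is a hypothetical name, and it does nothing more than assemble the two pip commands from this section as argument lists, optionally appending a mirror index.

```python
from typing import List, Optional


def build_pip_commands(use_gpu: bool = True,
                       index_url: Optional[str] = None) -> List[List[str]]:
    """Return the two pip commands from this section as argument lists."""
    # faiss-cpu is the fallback when the GPU build refuses to install.
    faiss_pkg = "faiss-gpu" if use_gpu else "faiss-cpu"
    commands = [
        ["pip", "install", "langchain-chatchat", "-U"],
        ["pip", "install", "xinference_client", faiss_pkg, "unstructured[pdf]"],
    ]
    if index_url:
        # Optional package mirror, passed through pip's -i flag.
        commands = [cmd + ["-i", index_url] for cmd in commands]
    return commands


for cmd in build_pip_commands(use_gpu=False):
    print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` inside the activated chatchat environment if you prefer scripting the install over typing it.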

2. Create the xinference environment

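This is basically the same as above; just follow the official docs. One tip: don't reuse the terminal from the previous step; open a new window.
conda create -p /home/envs/xinference python=3.10
conda activate /home/envs/xinference
pip install xinference --force
pip install tiktoken sentence-transformers
This can take a while. Appending the Tsinghua mirror to the pip commands saves time:
-i https://pypi.tuna.tsinghua.edu.cn/simple/
When everything is installed, start the xinference service:
xinference-local
Important: do NOT close this window! The service keeps running in this terminal, and closing it stops the service.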
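Since everything that follows depends on the xinference service staying up, it can help to check from another terminal that the port is actually accepting connections. A minimal sketch, assuming the default address used later in this post (127.0.0.1:9997); `is_port_open` is a hypothetical helper, not part of xinference.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9997,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not running".
        return False
```

Calling `is_port_open()` before running the later scripts gives a quick yes/no answer; it only proves that something is listening on the port, not that it is xinference.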

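Starting xinference may also fail with an error of the form "torch is missing …". Running pip list showed that torch 2.4 was installed and the torchaudio package was missing, so I uninstalled torch and reinstalled an older version. You can look up the PyTorch installation instructions yourself; here is the link: link. I used version 2.1, installed with one of the following:
conda: conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip: pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121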
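To confirm that the downgrade took effect, the installed versions can be compared against the pins in the commands above. This is a sketch using only the standard library; `PINS`, `installed_version`, and `find_mismatches` are names made up for illustration.

```python
from importlib import metadata
from typing import Callable, Dict

# Pins copied from the install commands above.
PINS = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def installed_version(package: str) -> str:
    """Return the installed version, or "" when the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return ""


def find_mismatches(pins: Dict[str, str],
                    lookup: Callable[[str], str] = installed_version) -> Dict[str, str]:
    """Map each package that differs from its pin to the version found."""
    mismatches = {}
    for package, wanted in pins.items():
        # Wheels from the PyTorch index carry a local suffix such as
        # "+cu121"; compare only the part before it.
        found = lookup(package).split("+")[0]
        if found != wanted:
            mismatches[package] = found or "not installed"
    return mismatches
```

Run `find_mismatches(PINS)` inside the xinference environment; an empty dict means all three packages match the pinned versions.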

3. Initialize the project configuration and data directory

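At this point every command in docs > install > README_xinference.md has been run. For the remaining steps, go back to the .md file on the official repository page (link).
Set the storage location:
Windows: set CHATCHAT_ROOT=<your path>
Linux: export CHATCHAT_ROOT=<your path>, for example: export CHATCHAT_ROOT=/home/envs/chatchat_data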
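Both `set` and `export` only affect the terminal session they are typed in, so a new window starts without CHATCHAT_ROOT. The sketch below shows one way to check what a process actually sees. `resolve_chatchat_root` is a hypothetical helper, and the fallback to the current directory when the variable is unset is an assumption about chatchat's behaviour, not something this post confirms.

```python
import os
from pathlib import Path


def resolve_chatchat_root(default: str = ".", create: bool = False) -> Path:
    """Return CHATCHAT_ROOT as an absolute path, falling back to `default`."""
    # Assumption: an unset variable means "use the current directory".
    root = Path(os.environ.get("CHATCHAT_ROOT", default)).expanduser().resolve()
    if create:
        # Create the data directory (and parents) if it does not exist yet.
        root.mkdir(parents=True, exist_ok=True)
    return root


print(resolve_chatchat_root())
```

If the printed path is not the one you exported, the variable was set in a different terminal and needs to be set again in this one.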

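Next, configure the files as described in the official docs:
1. Edit the configuration file
In the model configuration file (model_settings.yaml):
# Name of the default LLM
DEFAULT_LLM_MODEL: qwen1.5-chat
# Name of the default Embedding model
DEFAULT_EMBEDDING_MODEL: bge-large-zh-v1.5
You can also pick any other inference model that xinference supports. For choosing one, see the official docs: link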
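If you switch models often, the two settings can be changed from a script instead of by hand. The sketch below edits the YAML text with a regular expression so that comments and all other lines stay untouched. `set_yaml_scalar` is a hypothetical helper, and it assumes the key sits on its own unindented line as in the snippet above.

```python
import re


def set_yaml_scalar(text: str, key: str, value: str) -> str:
    """Replace the value of a top-level `key: value` line in YAML text."""
    # Match from the key at the start of a line up to the end of that line.
    pattern = re.compile(rf"^{re.escape(key)}:[^\r\n]*", flags=re.MULTILINE)
    if not pattern.search(text):
        raise KeyError(f"{key} not found")
    # A function replacement avoids backslash interpretation in `value`.
    return pattern.sub(lambda _m: f"{key}: {value}", text, count=1)


sample = "DEFAULT_LLM_MODEL: glm4-chat\nDEFAULT_EMBEDDING_MODEL: bge-m3\n"
print(set_yaml_scalar(sample, "DEFAULT_LLM_MODEL", "qwen1.5-chat"))
```

To apply it to the real file, read model_settings.yaml with `Path.read_text(encoding="utf-8")`, pass the text through the function, and write it back. The file's location depends on your setup and is not assumed here.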
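2. Download the LLM
Run xinference registrations -t LLM to list the available models; qwen1.5-chat is used here.
Note: the llm.py and embedding.py scripts below must both be run in the xinference environment. The terminal that started xinference no longer accepts commands, so open a new window, activate the xinference environment, and run the .py files there.
Once you have chosen a model, create a file named llm.py and copy in the Xinference Client code from the link above. It is reproduced here: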

from xinference.client import Client

client = Client("http://localhost:9997")
# The chatglm2 model has the capabilities of "chat" and "embed".
model_uid = client.launch_model(model_name="chatglm2",
                                model_engine="llama.cpp",
                                model_format="ggmlv3",
                                model_size_in_billions=6,
                                quantization="q4_0")
model = client.get_model(model_uid)

chat_history = []
prompt = "What is the largest animal?"
# If the model has "generate" capability, then you can call the
# model.generate API.
model.chat(
    prompt,
    chat_history=chat_history,
    generate_config={"max_tokens": 1024}
)

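As the official docs note: when loading an LLM, the engines that can run it depend closely on the model_format and quantization parameters.
Looking at the code above, launching a model requires its model_format, quantization, and model_size_in_billions values, which you won't know in advance. See the section on model inference engines in this link: link.
The docs provide the command
xinference engine -e <xinference_endpoint> --model-name qwen-chat
to query them. Here <xinference_endpoint> is the service address you configured, which defaults to http://127.0.0.1:9997 if unchanged, and qwen-chat is the model name.
I use qwen1.5-chat, so the full command becomes:
xinference engine -e http://127.0.0.1:9997 --model-name qwen1.5-chat
Fill the parameters it reports into llm.py. If you'd rather not change anything, just use mine: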

from xinference.client import Client

client = Client("http://localhost:9997")
# qwen1.5-chat is a chat model, so it is queried with model.chat below.
model_uid = client.launch_model(model_name="qwen1.5-chat",
                                model_engine="Transformers",
                                model_format="pytorch",
                                model_size_in_billions=4,
                                quantization="none")
model = client.get_model(model_uid)

chat_history = []
prompt = "What is the largest animal?"
# If the model has "generate" capability, then you can call the
# model.generate API.
resp = model.chat(
    prompt,
    chat_history=chat_history,
    generate_config={"max_tokens": 1024}
)
print(resp)

xinference will now download the qwen1.5-chat model automatically. Run python llm.py and wait patiently for it to finish.
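Once llm.py finishes, one way to confirm the model is being served is to ask the server which models are running. This is a sketch under assumptions: it relies on xinference exposing an OpenAI-style GET /v1/models endpoint, and the two response shapes handled in `parse_model_ids` are guesses that may need adjusting for your xinference version. Both function names are hypothetical.

```python
import json
from typing import Any, List
from urllib.request import urlopen


def parse_model_ids(payload: Any) -> List[str]:
    """Extract model ids from a /v1/models response body.

    Assumed shapes: an OpenAI-style {"data": [{"id": ...}, ...]} object,
    or a plain {model_uid: {...}} mapping.
    """
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        return [item["id"] for item in payload["data"]
                if isinstance(item, dict) and "id" in item]
    if isinstance(payload, dict):
        return list(payload.keys())
    return []


def list_running_models(endpoint: str = "http://127.0.0.1:9997",
                        timeout: float = 5.0) -> List[str]:
    """Query the server and return the ids of the models it is serving."""
    with urlopen(f"{endpoint}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the service running, `list_running_models()` should include qwen1.5-chat after a successful launch.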

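3. Download the embedding model
Refer again to the link used for the LLM download. bge-small-en-v1.5 is used as the example here, but any other supported embedding model works; run xinference registrations -t embedding to list them. The configuration above names bge-large-zh-v1.5, so the model you launch and DEFAULT_EMBEDDING_MODEL presumably need to agree: either launch that model or update the setting. Create a file named embedding.py and paste in the code below (usable as-is):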

from xinference.client import Client

client = Client("http://localhost:9997")
# The bge-small-en-v1.5 is an embedding model, so the `model_type` needs to be specified.
model_uid = client.launch_model(model_name="bge-small-en-v1.5", model_type="embedding")
model = client.get_model(model_uid)

input_text = "What is the capital of China?"
model.create_embedding(input_text)
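The script above discards the return value of `create_embedding`. To see what comes back, the sketch below pulls the vector out of the response and compares two vectors by cosine similarity. It assumes the response follows the OpenAI embedding layout, `{"data": [{"embedding": [...]}]}`, which is not confirmed in this post; `first_embedding` and `cosine_similarity` are hypothetical helpers.

```python
import math
from typing import Dict, List, Sequence


def first_embedding(response: Dict) -> List[float]:
    """Return the first vector from an OpenAI-style embedding response."""
    # Assumed layout: {"data": [{"embedding": [...]}, ...]}
    return list(response["data"][0]["embedding"])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

In embedding.py this would look like `vec = first_embedding(model.create_embedding(input_text))`. Embedding a second, related sentence and comparing the two vectors is a quick sanity check that the model is producing meaningful output.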

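Wait for it to finish. Then open a new window, activate the chatchat environment, and run chatchat kb -r. Log output like the following means it succeeded:
[Screenshot: log output of chatchat kb -r]
Finally, run chatchat start -a. Once it opens the web interface, you're done.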