[Open-Source Project Study] langchain-chatchat Study Notes 3

临风而眠

Last modified 2024-05-18 18:09:27


Category: Large Language Models. Tags: fastchat

First published 2024-05-18 17:58:20

Article link: https://blog.csdn.net/qq_52431436/article/details/139026824


Contents

fastchat
vllm

fastchat

Let's see how fastchat is deployed as an API.

Reference tutorials:

[image]

I followed the second tutorial link and started the controller, the OpenAI API server, and the model worker in turn, and then got this error:

[Errno 99] error while attempting to bind on address ('::1', 21001, 0, 0)

I fixed it by following this post: I added --host 0.0.0.0 to all three commands.

python3 -m fastchat.serve.controller --host 0.0.0.0

python -m fastchat.serve.openai_api_server --host 0.0.0.0

python -m fastchat.serve.model_worker \
   --model-path /root/model/chatglm3-6b --port 21003 \
   --worker-address http://localhost:21003 \
   --host 0.0.0.0

I ran each of the three commands in its own terminal.
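The [Errno 99] error above names ('::1', 21001, 0, 0). The address ::1 is the IPv6 loopback, so the default host apparently resolved to an address that could not be bound on my machine. Below is a minimal sketch for checking which hosts can be bound, using only the standard library. The helper name can_bind is made up for this note and is not part of FastChat.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port).

    Port 0 lets the OS pick any free port, so the check does not
    collide with services that are already running.
    """
    try:
        # Resolve the host the same way a server would before binding.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    family, socktype, proto, _, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.bind(sockaddr)
        return True
    except OSError:
        # Covers "[Errno 99] Cannot assign requested address" and similar.
        return False


if __name__ == "__main__":
    for host in ("::1", "127.0.0.1", "0.0.0.0"):
        print(host, "bindable" if can_bind(host) else "NOT bindable")
```

If ::1 shows up as not bindable while 0.0.0.0 is fine, that matches the situation where passing --host 0.0.0.0 makes the error go away.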

After that, the API can be called (reference).

The commented-out code below also works.

# import requests
# import json

# response = requests.get('http://localhost:8000/v1/models')
# data = response.json()

# # Pretty-print the JSON data with json.dumps
# pretty_data = json.dumps(data, indent=4)
# print(pretty_data)

import requests
import json

url = "http://localhost:8000/v1/chat/completions"
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
}
data = {
    "model": "chatglm3-6b",
    "max_tokens": 2048,
    # "prompt": "写一篇1000字的作文：《2024回家过年》"
    "messages": [ 
      { "role": "system", "content": "你是一名二次元助手，回答要精简。" },
      { "role": "user", "content": "最近有什么好看的番剧？" }
    ]
}

response = requests.post(url, headers=headers, data=json.dumps(data))

# Print the response body
print(response.json())
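print(response.json()) dumps the whole payload. Assuming the server follows the OpenAI chat-completions response layout (a choices list whose items hold a message with a content field), the reply text can be pulled out with a small helper. extract_reply is a hypothetical name, and the sample payload is hand-written for illustration, not real model output.

```python
from typing import Any, Dict, List


def extract_reply(payload: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion payload."""
    choices: List[Dict[str, Any]] = payload.get("choices") or []
    if not choices:
        # Error responses normally carry no choices, so surface them as-is.
        raise ValueError(f"no choices in response: {payload!r}")
    return choices[0]["message"]["content"]


if __name__ == "__main__":
    # Hand-written sample in the assumed layout (not real model output).
    sample = {
        "model": "chatglm3-6b",
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": "(reply text)"}}
        ],
    }
    print(extract_reply(sample))
```

With the request code above, this would be called as extract_reply(response.json()).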

Or use curl:

 curl http://localhost:8000/v1/chat/completions   -H "Content-Type: application/json"   -d '{
     "model": "chatglm3-6b",
     "messages": [{"role": "user", "content": "北京景点"}],
     "temperature": 0.7
   }'

Looking at the source code

You can see that it is built on FastAPI.

The corresponding API docs are available at:

http://localhost:21002/docs

Append /docs to any of these service URLs and click "Try it out" to test each of the endpoints.

These endpoints correspond to the @ decorators in the code.
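FastAPI also serves the machine-readable schema behind /docs at /openapi.json, so the routes can be listed from a script as well as from the browser. This is a rough sketch that assumes the service is reachable at http://localhost:21002, as in the URL above. fetch_schema and list_routes are names invented for this note.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.request import urlopen

# Keys inside an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "delete", "patch", "head", "options", "trace"}


def fetch_schema(base_url: str) -> Dict[str, Any]:
    """Download the OpenAPI schema that FastAPI serves alongside /docs."""
    with urlopen(base_url.rstrip("/") + "/openapi.json", timeout=5) as resp:
        return json.load(resp)


def list_routes(schema: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema.

    Each pair corresponds to one @app.<method>(path) decorator in the code.
    """
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


# Usage (needs a running service):
#   for method, path in list_routes(fetch_schema("http://localhost:21002")):
#       print(method, path)
```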

How to stop the services

[image]

An analysis of how fastchat is wrapped

This post explains it well: FastChat工作原理解析 (an analysis of how FastChat works)

vllm

Unfinished: I ran into a bug that I have not solved yet.

References:
- 【chatglm】（9）：使用fastchat和vllm部署chatlgm3-6b模型，并简单的进行速度测试对比 (deploying the chatglm3-6b model with fastchat and vllm, with a simple speed comparison)
- https://github.com/lm-sys/FastChat/blob/main/docs/vllm_integration.md


python -m vllm.entrypoints.api_server --trust-remote-code --model /root/model/chatglm3-6b
INFO 05-18 15:38:07 llm_engine.py:70] Initializing an LLM engine with config: model='/root/model/chatglm3-6b', tokenizer='/root/model/chatglm3-6b', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1, quantization=None, enforce_eager=False, seed=0)
WARNING 05-18 15:38:07 tokenizer.py:62] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 05-18 15:38:19 llm_engine.py:275] # GPU blocks: 18773, # CPU blocks: 9362
INFO 05-18 15:38:22 model_runner.py:501] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 05-18 15:38:22 model_runner.py:505] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode.
INFO 05-18 15:38:26 model_runner.py:547] Graph capturing finished in 4 secs.
INFO:     Started server process [99226]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:40436 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:40436 - "GET /favicon.ico HTTP/1.1" 404 Not Found
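The two 404 lines at the end of the log are for GET / and GET /favicon.ico, which is typically what a browser requests when the server address is opened directly. The demo server in vllm.entrypoints.api_server should instead answer POST /generate with a JSON body holding prompt plus sampling parameters. Treat that path and the field names as assumptions to check against the installed vllm version. The sketch below only builds the request, and build_generate_request is a made-up helper name.

```python
import json
from urllib.request import Request


def build_generate_request(prompt: str,
                           base_url: str = "http://localhost:8000",
                           max_tokens: int = 128,
                           temperature: float = 0.7) -> Request:
    """Build (but do not send) a POST request for the assumed /generate route."""
    body = {
        "prompt": prompt,
        "max_tokens": max_tokens,    # assumed sampling parameter name
        "temperature": temperature,  # assumed sampling parameter name
    }
    return Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server:
#   from urllib.request import urlopen
#   with urlopen(build_generate_request("北京景点"), timeout=60) as resp:
#       print(json.load(resp))
```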
