LMDeploy量化部署LLM&VLM实战（作业）

最新推荐文章于 2024-05-31 16:22:19 发布

墓袖远笺

最新推荐文章于 2024-05-31 16:22:19 发布

阅读量254

点赞数 3

文章标签：学习笔记

本文链接：https://blog.csdn.net/2301_80703617/article/details/137800294

版权

1.配置 LMDeploy 运行环境

首先创建开发机，选择镜像Cuda12.2-conda；选择10% A100*1GPU，切换为终端(Terminal)模式。然后创建一个名为lmdeploy的环境：

然后激活刚刚创建的虚拟环境，安装lmdeploy

2.以命令行方式与 InternLM2-Chat-1.8B 模型对话

下载模型，如果是在InternStudio开发机上，可以由开发机的共享目录软链接或拷贝模型：

ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b /root/
# cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b /root/

先使用Transformer库运行模型直接运行InternLM2-Chat-1.8B模型，

接着使用LMDeploy与模型进行对话，

3.设置KV Cache最大占用比例为0.4，开启W4A16量化，以命令行方式与模型对话

两者对比，显存占用降低。

开启W4A16后显存占用明显降低。

4.以API Server方式启动 lmdeploy，开启 W4A16量化，调整KV Cache的占用比例为0.4，分别使用命令行客户端与Gradio网页客户端与模型对话

4.1命令行客户端

通过以下命令启动API服务器，推理internlm2-chat-1_8b模型

lmdeploy serve api_server \
    /root/internlm2-chat-1_8b \
    --model-format hf \
    --quant-policy 0 \
    --server-name 0.0.0.0 \
    --server-port 23333 \
    --tp 1

运行命令行客户端，

lmdeploy serve api_client http://localhost:23333

运行后，可以通过命令行窗口直接与模型对话，

运行后，可以通过命令行窗口直接与模型对话。

4.2Gradio网页客户端

使用Gradio作为前端，启动网页客户端。

lmdeploy serve gradio http://localhost:23333 \
    --server-name 0.0.0.0 \
    --server-port 6006

打开浏览器，访问地址http://127.0.0.1:6006

然后就可以与模型进行对话了。

5.使用W4A16量化，调整KV Cache的占用比例为0.4，使用Python代码集成的方式运行internlm2-chat-1.8b模型

新建Python源代码文件，打开，然后填入以下内容，

from lmdeploy import pipeline

pipe = pipeline('/root/internlm2-chat-1_8b')
response = pipe(['Hi, pls intro yourself', '上海是'])
print(response)

保存后运行代码，

设置KV Cache占用比例，后得结果。

6.使用 LMDeploy 运行视觉多模态大模型 llava gradio demo

安装依赖库

pip install git+https://github.com/haotian-liu/LLaVA.git@4e2277a060da264c4f21b364c867cc622c945874

新建文件夹，填入以下内容，

from lmdeploy.vl import load_image
from lmdeploy import pipeline, TurbomindEngineConfig


backend_config = TurbomindEngineConfig(session_len=8192) # 图片分辨率较高时请调高session_len
# pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b', backend_config=backend_config) 非开发机运行此命令
pipe = pipeline('/share/new_models/liuhaotian/llava-v1.6-vicuna-7b', backend_config=backend_config)

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)

保存运行得输出结果。

用Gradio来运行llava模型。新建python文件，

import gradio as gr
from lmdeploy import pipeline, TurbomindEngineConfig


backend_config = TurbomindEngineConfig(session_len=8192) # 图片分辨率较高时请调高session_len
# pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b', backend_config=backend_config) 非开发机运行此命令
pipe = pipeline('/share/new_models/liuhaotian/llava-v1.6-vicuna-7b', backend_config=backend_config)

def model(image, text):
    if image is None:
        return [(text, "请上传一张图片。")]
    else:
        response = pipe((text, image)).text
        return [(text, response)]

demo = gr.Interface(fn=model, inputs=[gr.Image(type="pil"), gr.Textbox()], outputs=gr.Chatbot())
demo.launch()

然后运行，进入网址，就可以用了。

参考资料

课程视频：https://www.bilibili.com/video/BV1tr421x75B/

课程文档：https://github.com/InternLM/Tutorial/blob/camp2/lmdeploy/README.md

墓袖远笺

关注

3
点赞
踩
4

收藏

觉得还不错? 一键收藏
0
评论
LMDeploy量化部署LLM&VLM实战（作业）

课程文档：https://github.com/InternLM/Tutorial/blob/camp2/lmdeploy/README.md。先使用Transformer库运行模型直接运行InternLM2-Chat-1.8B模型，下载模型，如果是在InternStudio开发机上，可以由开发机的共享目录。新建Python源代码文件，打开，然后填入以下内容，运行后，可以通过命令行窗口直接与模型对话，运行后，可以通过命令行窗口直接与模型对话。然后运行，进入网址，就可以用了。新建文件夹，填入以下内容，
复制链接

扫一扫