Deploying MiniCPM on Windows 10

Latest recommended article published on 2024-09-06 10:58:14

老干部rye

Tags: language model, AI

Article link: https://blog.csdn.net/weixin_43881345/article/details/139458794

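1. System configuration

[Screenshot: system configuration]

2. Environment setup

2.1. Create a virtual environment

Install Anaconda; see https://blog.csdn.net/weixin_43881345/article/details/136051556
Create a virtual environment based on Python 3.10: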

conda create -n minicpm python=3.10

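Creating the virtual environment failed with an error. Following the hint in the error message, remove the stray quotation marks from the Path environment variable.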
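To locate the offending entries before editing the variable, a small check like the following may help. It is only a sketch: the function names are made up for illustration, and it merely reports entries that contain a double quote; the actual fix is still made in the Windows environment variable settings.

```python
import os

def find_quoted_path_entries(path_value, sep=os.pathsep):
    """Return the PATH entries that contain a double-quote character."""
    return [entry for entry in path_value.split(sep) if '"' in entry]

def strip_quotes(path_value, sep=os.pathsep):
    """Return path_value with double quotes removed from every entry."""
    return sep.join(entry.replace('"', '') for entry in path_value.split(sep))

if __name__ == "__main__":
    # Report suspicious entries in the PATH of the current process.
    for entry in find_quoted_path_entries(os.environ.get("PATH", "")):
        print("Entry with stray quotes:", entry)
```

Run it from the same prompt in which conda reported the error, so that it sees the same Path value.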

Create the Python 3.10 virtual environment again, name it minicpm, then activate it and install the dependencies:

conda create -n minicpm python=3.10
activate minicpm
pip install Pillow==10.1.0
pip install bitsandbytes==0.43.1

2.2. Download the models

Hugging Face mirror: https://hf-mirror.com/
ModelScope: https://modelscope.cn/

MiniCPM-2B-sft-int4: https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-int4/files
MiniCPM-2B-128k: https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/files
MiniCPM-Llama3-V-2_5 (8B): https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5/files
MiniCPM-Llama3-V-2_5-int4 (8B): https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-int4/files
MiniCPM-Llama3-V-2_5-gguf: https://modelscope.cn/models/OpenBMB/MiniCPM-Llama3-V-2_5-gguf/files

Download with git:

git clone https://www.modelscope.cn/OpenBMB/MiniCPM-2B-sft-int4.git
git clone https://www.modelscope.cn/openbmb/MiniCPM-2B-128k.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5-int4.git
git clone https://www.modelscope.cn/OpenBMB/MiniCPM-Llama3-V-2_5-gguf.git
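The clone commands above all follow one pattern, so they can be generated from the model IDs. The sketch below also checks for git-lfs, on the assumption (worth verifying for each repository) that the large weight files are stored through Git LFS and would otherwise arrive as small pointer files. The helper names are illustrative only.

```python
import shutil

MODELSCOPE_BASE = "https://www.modelscope.cn"

def clone_command(model_id, base=MODELSCOPE_BASE):
    """Build the git clone command for a model ID such as 'OpenBMB/MiniCPM-2B-sft-int4'."""
    return ["git", "clone", f"{base}/{model_id}.git"]

def git_lfs_available():
    """Return True when a git-lfs executable is found on PATH."""
    return shutil.which("git-lfs") is not None

if __name__ == "__main__":
    if not git_lfs_available():
        print("git-lfs not found on PATH; weight files may not download fully")
    # Print the command instead of running it, so nothing is downloaded by accident.
    print(" ".join(clone_command("OpenBMB/MiniCPM-Llama3-V-2_5-int4")))
```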

2.3. Download the repository source code

https://github.com/OpenBMB/MiniCPM-V
https://github.com/OpenBMB/MiniCPM

2.4. Install dependencies in the Anaconda virtual environment

cd MiniCPM-V
pip install -r requirements.txt

2.5. Configure torch

2.5.1. Check the CUDA version

# hello_world.py
import torch

# A version ending in +cpu means torch is a CPU-only build; a CUDA build ends in +cuXXX
print(f'torch version: {torch.__version__}')
print(f'CUDA available to torch: {torch.cuda.is_available()}')
print(f'Number of GPUs: {torch.cuda.device_count()}')
print(f'CUDA version reported by torch: {torch.version.cuda}')

The output shows that torch cannot use the GPU. (If your torch already supports the GPU, skip section 2.5.)
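The suffix rule mentioned in the script comment can be written as a small helper. This is a sketch with an illustrative function name; it inspects only the version string, and a string without any suffix is reported as unknown because the build type cannot be inferred from it alone.

```python
def torch_build_flavor(version):
    """Classify a torch version string by the tag after the '+' sign.

    Returns 'cpu', a CUDA tag such as 'cu121', or 'unknown'.
    """
    _, _, local = version.partition("+")
    if local == "cpu" or local.startswith("cu"):
        return local
    return "unknown"
```

Inside the environment it would be called as torch_build_flavor(torch.__version__).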
Check the CUDA version supported by the driver (available once the graphics driver is installed); on this machine it is 12.4:

nvidia-smi

Check the CUDA runtime version (available once the CUDA Toolkit is installed); on this machine it is 11.7:

nvcc -V

2.5.2. Install a CUDA-enabled torch

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Run hello_world.py again; torch can now use the GPU through CUDA.
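The cu121 wheel works here even though nvcc reports 11.7. As far as I understand, the pip wheels ship their own CUDA runtime libraries, so the version that matters is the one the driver supports (12.4 from nvidia-smi), which must be at least the wheel's CUDA version. The sketch below encodes that rule; the function names and the set of wheel tags are assumptions for illustration, so check the PyTorch site for the tags that are currently published.

```python
def parse_version(text):
    """Turn '12.4' into the tuple (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def pick_wheel_tag(driver_cuda, available):
    """Return the newest wheel tag whose CUDA version the driver supports, or None."""
    limit = parse_version(driver_cuda)
    supported = [
        (parse_version(cuda_version), tag)
        for tag, cuda_version in available.items()
        if parse_version(cuda_version) <= limit
    ]
    return max(supported)[1] if supported else None

# Assumed example set of wheel tags, not an authoritative list.
WHEEL_TAGS = {"cu118": "11.8", "cu121": "12.1"}

if __name__ == "__main__":
    print(pick_wheel_tag("12.4", WHEEL_TAGS))
```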

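3. Setting up MiniCPM-Llama3-V-2_5

3.1. Local WebUI demo deployment

Configure the model path.
Edit MiniCPM-V-main/web_demo_2.5.py and change model_path to the local model path.
GPU memory on this machine is limited, so the int4 quantized model is used here:
model_path = 'D:/model/MiniCPM-Llama3-V-2_5-int4'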

After the change, start the WebUI service:

# For NVIDIA GPUs, run:
python web_demo_2.5.py --device cuda

# For Macs with MPS (Apple silicon or AMD GPUs), run:
PYTORCH_ENABLE_MPS_FALLBACK=1 python web_demo_2.5.py --device mps

Open http://IP:8080 in a browser.

Time taken: about 19.5 s.
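If the page does not load, it can help to first confirm that something is listening on port 8080 at all. A minimal sketch using only the standard library (the helper name is illustrative, and 127.0.0.1 stands in for the IP of the machine running the demo):

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True when a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("WebUI reachable:", is_port_open("127.0.0.1", 8080))
```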

3.2. API call

Create the script api_cpm_v2.5.py:

# api_cpm_v2.5.py
import torch
import time
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Path to the local model
model_path = 'D:/model/MiniCPM-Llama3-V-2_5-int4'

# Load the model and the tokenizer
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

# Fix the random seed so that results are reproducible
torch.manual_seed(0)

start_time = time.time()

image = Image.open('D:/work/svc_mini_cpm/MiniCPM-V-main/assets/airplane.jpeg').convert('RGB')
question = 'What does the image show?'
msgs = [{'role': 'user', 'content': question}]

res = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
end_time = time.time()
execution_time = end_time - start_time
print(res)
print("Execution time:", execution_time, "seconds")

The result was returned successfully.
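The script above times everything after model loading as one block. To time individual stages separately (loading the image, the chat call, and so on), a small context manager is one option. This is a sketch, with time.sleep standing in for the real calls and with illustrative names:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Measure the wall-clock time of the enclosed block and store it in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timed("chat", timings):
    time.sleep(0.01)  # stand-in for model.chat(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

time.perf_counter is used instead of time.time because it is a monotonic clock intended for measuring intervals.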

4. Setting up MiniCPM-2B

4.1. Local WebUI demo deployment (failed)

Start the WebUI service:

cd MiniCPM-main
python demo/vllm_based_demo.py --model_path D:/model/MiniCPM-2B-128k

Install vLLM; download the source from https://github.com/vllm-project/vllm

cd vllm-main
pip install -r requirements-cuda.txt

The installation failed with error: subprocess-exited-with-error. An attempted fix was to upgrade setuptools:

pip install --upgrade setuptools

4.2. API call (failed)

Create the script api_cpm.py:

# api_cpm.py
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import torch

# Fix the random seed so that results are reproducible
torch.manual_seed(0)

path = 'D:/model/MiniCPM-2B-dpo-bf16'
# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

start_time = time.time()
responds, history = model.chat(tokenizer, "Which mountain is the highest in Shandong Province? Is it higher or lower than Mount Huangshan, and by how much?", temperature=0.5, top_p=0.8, repetition_penalty=1.02)
end_time = time.time()
execution_time = end_time - start_time
print(responds)
print("Execution time:", execution_time, "seconds")

The output was obtained successfully.

After switching the model to MiniCPM-2B-sft-int4, the script failed with an error.
Install flash_attn manually: download a flash attention wheel prebuilt for Windows and choose the build that matches the local torch (cu121).
https://github.com/bdashore3/flash-attention/releases/tag/v2.4.1
Direct download link: https://github.com/bdashore3/flash-attention/releases/download/v2.4.1/flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl

pip install D:/work/svc_mini_cpm/flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
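The wheel filename encodes more than the CUDA tag: it also names a torch version (torch2.1), a Python tag (cp310) and a platform. The sketch below pulls those tags out and compares them with the local environment. It assumes the naming pattern of the file above, and the function names are illustrative.

```python
import re

def parse_flash_attn_wheel(filename):
    """Extract the CUDA, torch, Python and platform tags from a flash_attn wheel filename."""
    # Wheel names follow: name-version-pythontag-abitag-platform.whl
    _name, version, py_tag, _abi_tag, platform = filename[:-len(".whl")].split("-")
    match = re.search(r"\+(cu\d+)torch(\d+\.\d+)", version)
    return {
        "cuda": match.group(1) if match else None,
        "torch": match.group(2) if match else None,
        "python": py_tag,
        "platform": platform,
    }

def wheel_matches(filename, torch_version, python_tag, platform="win_amd64"):
    """Compare the wheel tags with a torch version string such as '2.1.2+cu121'."""
    tags = parse_flash_attn_wheel(filename)
    base, _, cuda = torch_version.partition("+")
    major_minor = ".".join(base.split(".")[:2])
    return (
        tags["cuda"] == cuda
        and tags["torch"] == major_minor
        and tags["python"] == python_tag
        and tags["platform"] == platform
    )
```

In the environment, torch_version would come from torch.__version__; a mismatch in any tag is a hint to look for a different build on the release page.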

5. Loading models with ollama

5.1. Install ollama

https://github.com/ollama/ollama
After installation, run C:\Users\<username>\AppData\Local\Programs\Ollama\ollama app.exe. On the first run the model has to be downloaded.

5.2. Load an official model

Official model library: https://ollama.com/library

ollama run modelbest/minicpm-2b-dpo

API call:
http://localhost:11434/api/generate
Setting "stream": false turns off streaming output.
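A call to this endpoint from Python might look like the sketch below, which uses only the standard library. The helper names are illustrative; the model name is the one run above, and the request only succeeds while the ollama server is running locally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming POST request for the generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model, prompt):
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt)) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]

# Example call (needs a running ollama server):
# print(generate("modelbest/minicpm-2b-dpo", "Hello"))
```

With "stream" set to false the server returns a single JSON object, so the reply can be parsed in one step instead of line by line.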

5.3. Load a local model

Create the model configuration file by hand.
Edit the file contents:

FROM ./MiniCPM-2B-dpo-Q4_0.gguf

Import the configuration from PowerShell:

ollama create MiniCPM-2B-dpo-Q4_0 -f MiniCPM-2B-dpo-Q4_0.mb
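Creating the configuration file can also be scripted. The sketch below writes the same single FROM line into a file named after the GGUF file with the .mb extension used above; the function name is illustrative, and the GGUF file is assumed to sit in the same directory.

```python
from pathlib import Path

def write_modelfile(gguf_name, directory="."):
    """Write a one-line configuration file that points ollama at a local GGUF file."""
    target = Path(directory) / (Path(gguf_name).stem + ".mb")
    target.write_text(f"FROM ./{gguf_name}\n", encoding="utf-8")
    return target

# Example: write_modelfile("MiniCPM-2B-dpo-Q4_0.gguf") creates MiniCPM-2B-dpo-Q4_0.mb
```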

The import succeeded. Run the model:

ollama run MiniCPM-2B-dpo-Q4_0

6. Optimizing inference efficiency (failed)

Download w64devkit:
https://github.com/skeeto/w64devkit/releases
Run w64devkit.exe and check the cmake and gcc versions.

Download llama.cpp:
https://github.com/ggerganov/llama.cpp

Build llama.cpp with w64devkit.exe:

cd d:
cd Download/models/llama.cpp-master
make

Download a prebuilt llama.cpp:
https://github.com/ggerganov/llama.cpp/releases
