Environment
GPU: NVIDIA GeForce RTX 3090
RAM: 128GB
Model download
https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct/files
If you have plenty of GPU memory, you can download DeepSeek-70B instead: https://www.modelscope.cn/models/mlx-community/DeepSeek-R1-Distill-Llama-70B-4bit/files
Here I download via the command line:
pip install modelscope
modelscope download --model Qwen/Qwen2.5-7B-Instruct
Wait for the download to complete.
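By default the ModelScope CLI saves models into its local cache rather than the current directory. A quick sketch of locating the files and placing them next to the compose file (the cache layout shown here is the typical default and may vary by modelscope version; note that the dot in "2.5" is replaced by "___" in the directory name):

```shell
# Typical default cache location for the downloaded model
# (layout may differ across modelscope versions)
ls ~/.cache/modelscope/hub/Qwen/Qwen2___5-7B-Instruct

# Copy (or symlink) it into the directory holding docker-compose.yml,
# so the volume mount in the compose file can find it
cp -r ~/.cache/modelscope/hub/Qwen/Qwen2___5-7B-Instruct ./Qwen2___5-7B-Instruct
```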
docker-compose deployment
version: "3"
services:
  vllm:
    container_name: qwen
    restart: always
    image: vllm/vllm-openai:v0.6.6
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      - ./Qwen2___5-7B-Instruct:/models
    command: ["--model", "/models", "--served-model-name", "qwen2.5-7b-instruct", "--gpu-memory-utilization", "0.90"]
    ports:
      - 2025:8000
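With the compose file in place, the service can be started and checked like this (this assumes the NVIDIA Container Toolkit is installed so the container can see the GPU, and that you are on the same host; port 2025 comes from the port mapping above):

```shell
# Start the container in the background
docker-compose up -d

# Follow the logs until vLLM reports the API server is up
docker logs -f qwen

# The OpenAI-compatible endpoint should then list the served model name
curl http://localhost:2025/v1/models
```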
Calling the API
import openai

client = openai.OpenAI(
    base_url='http://192.168.2.25:2025/v1',
    api_key='NOT_NEED'  # vLLM does not verify the key by default
)

text = """I have an apple."""
prompt = f"""Please translate:
{text}"""

predict_ret = client.chat.completions.create(
    # This name must match the --served-model-name passed to vLLM
    model='qwen2.5-7b-instruct',
    messages=[
        {'role': 'user', 'content': prompt}
    ]
)

print(predict_ret.choices[0].message.content)
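The same endpoint can also be exercised without the Python client. A raw curl against the OpenAI-compatible REST API (same host and port as in the script above):

```shell
curl http://192.168.2.25:2025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen2.5-7b-instruct",
        "messages": [{"role": "user", "content": "Please translate: I have an apple."}]
      }'
```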