Environment setup
- Download the vLLM source code
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip3 install -e .
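The editable install compiles vLLM's CUDA kernels, which can take a long time and a lot of memory. Below is a small sketch of running it in an isolated environment; the environment name is arbitrary, and MAX_JOBS is the usual PyTorch cpp_extension knob for capping parallel compile jobs, not something specific to these notes:
python3 -m venv vllm-env && source vllm-env/bin/activate   # optional: isolate the build
export MAX_JOBS=4                                          # cap parallel nvcc jobs if the build runs out of RAM
pip3 install -e .                                          # run from inside the cloned vllm directory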
- Modify the configuration. The pins below target torch 2.0.1 on CUDA 11.x (hence xformers 0.0.22, triton 2.0.0, and cupy-cuda11x): first the runtime requirements, then the build requirements that are mirrored in pyproject.toml, then the `requires` list in pyproject.toml itself. A quick version check is sketched after these blocks.
ninja # For faster builds.
psutil
ray >= 2.9
sentencepiece # Required for LLaMA tokenizer.
numpy
torch == 2.0.1
transformers >= 4.38.0 # Required for Gemma.
# xformers == 0.0.23.post1 # Required for CUDA 12.1.
xformers == 0.0.22
fastapi
uvicorn[standard]
pydantic >= 2.0 # Required for OpenAI server.
prometheus_client >= 0.18.0
pynvml == 11.5.0
triton == 2.0.0
outlines
# cupy-cuda12x == 12.1.0 # Required for CUDA graphs. CUDA 11.8 users should install cupy-cuda11x instead.
cupy-cuda11x
# Should be mirrored in pyproject.toml
ninja
packaging
setuptools>=49.4.0
torch==2.0.1
wheel
requires = [
"ninja",
"packaging",
"setuptools >= 49.4.0",
"torch == 2.0.1",
"wheel",
]
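After editing these pins, it is worth checking that the installed wheels and the local CUDA toolchain actually agree with them. The commands below are a generic sanity check, not something vLLM itself requires:
nvcc --version                                                            # CUDA toolkit used to compile the extensions
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"   # torch build and the CUDA version it was built against
pip3 list | grep -Ei "torch|xformers|triton|cupy"                         # confirm the pinned versions are what got installed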
Problems you may run into
- Installing flash-attention fails with a "pip not found" error.
You can comment out the part of setup.py that installs flash attention, run the commands from that part manually yourself, and then continue the installation (see the sketch below).
- torch version conflict between triton and vLLM.
After the steps above complete, reinstall triton.
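A minimal sketch of both fixes. The flash-attn package name and the --no-build-isolation flag are assumptions; the exact command to replay by hand depends on what your copy of setup.py actually runs:
# 1. After commenting out the flash-attention step in setup.py,
#    install flash-attn by hand (adjust to whatever command setup.py was running):
pip3 install flash-attn --no-build-isolation
# 2. Build vLLM itself from the repo root:
pip3 install -e .
# 3. If the build pulled in a triton that conflicts with torch 2.0.1, pin it back:
pip3 install --force-reinstall triton==2.0.0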
References
- https://www.cnblogs.com/marsggbo/p/17966269