1. Environment
- NVIDIA A40
- CUDA 12.1
- vLLM 0.4.2
- torch 2.3.0
2. Environment setup
PS: The environment above was chosen based on 强哥's pitfall notes:
GPTs-0064 - Deploying DeepSeek-V2-Lite-Chat
Download the model from ModelScope
# Shell: install the ModelScope SDK
pip install modelscope
# Python: download the model snapshot under /root/autodl-tmp
from modelscope import snapshot_download
model_dir = snapshot_download('deepseek-ai/DeepSeek-V2-Lite-Chat', cache_dir='/root/autodl-tmp')
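Before pointing vLLM at the download, it can help to confirm that the snapshot looks complete. The following is a minimal sketch, assuming a Hugging Face-style layout (a config.json plus at least one *.safetensors or *.bin weight file). The helper names and the list of files checked are illustrative assumptions, not part of ModelScope.

```python
from pathlib import Path

# Assumed layout: a guess at a typical checkpoint, not a ModelScope guarantee.
REQUIRED_FILES = ("config.json",)
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def missing_model_files(file_names):
    """Return a list describing what is missing from the given file names."""
    names = set(file_names)
    missing = [name for name in REQUIRED_FILES if name not in names]
    if not any(name.endswith(WEIGHT_SUFFIXES) for name in names):
        missing.append("weight files (*.safetensors or *.bin)")
    return missing


def check_model_dir(model_dir):
    """Check a directory on disk; an empty list means nothing looks missing."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"directory not found: {path}"]
    return missing_model_files(entry.name for entry in path.iterdir())


if __name__ == "__main__":
    # Model path used by the launch command in this post.
    problems = check_model_dir("/root/autodl-tmp/deepseek-ai/DeepSeek-V2-Lite-Chat")
    print("OK" if not problems else f"Missing: {problems}")
```

An empty result only means the expected file names are present; it does not verify checksums or that every weight shard finished downloading.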
vLLM fork
# This repository is DeepSeek's fork, based on vLLM
git clone https://github.com/zwd003/vllm.git
cd vllm
pip install -e .
flash-attention
Reference: https://github.com/Dao-AILab/flash-attention
pip install flash-attn --no-build-isolation
Check with pip list
Package Version Editable project location
--------------------------------- --------------- -------------------------
absl-py 2.0.0
addict 2.4.0
aiohttp 3.9.5
aiosignal 1.3.1
aliyun-python-sdk-core 2.15.1
aliyun-python-sdk-kms 2.16.3
annotated-types 0.7.0
anyio 4.2.0
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
async-timeout 4.0.3
attrs 23.2.0
Babel 2.14.0
beautifulsoup4 4.12.2
bleach 6.1.0
brotlipy 0.7.0
cachetools 5.3.2
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 3.0.0
cmake 3.29.3
comm 0.2.1
conda 22.11.1
conda-content-trust 0.1.3
conda-package-handling 1.9.0
contourpy 1.2.0
crcmod 1.7
cryptography 38.0.1
cycler 0.12.1
datasets 2.18.0
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
diskcache 5.6.3
distro 1.9.0
dnspython 2.6.1
einops 0.8.0
email_validator 2.1.1
exceptiongroup 1.2.0
executing 2.0.1
fastapi 0.111.0
fastapi-cli 0.0.4
fastjsonschema 2.19.1
filelock 3.13.1
fonttools 4.47.0
fqdn 1.5.1
frozenlist 1.4.1
fsspec 2023.12.2
gast 0.5.4
google-auth 2.26.1
google-auth-oauthlib 1.2.0
grpcio 1.60.0
h11 0.14.0
httpcore 1.0.5
httptools 0.6.1
httpx 0.27.0
huggingface-hub 0.23.2
idna 3.4
importlib_metadata 7.1.0
interegular 0.3.3
ipykernel 6.28.0
ipython 8.20.0
ipywidgets 8.1.1
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.2
jmespath 0.10.0
joblib 1.4.2
json5 0.9.14
jsonpointer 2.4
jsonschema 4.20.0
jsonschema-specifications 2023.12.1
jupyter_client 8.6.0
jupyter_core 5.7.1
jupyter-events 0.9.0
jupyter-lsp 2.2.1
jupyter_server 2.12.2
jupyter_server_terminals 0.5.1
jupyterlab 4.0.10
jupyterlab-language-pack-zh-CN 4.0.post6
jupyterlab_pygments 0.3.0
jupyterlab_server 2.25.2
jupyterlab-widgets 3.0.9
kiwisolver 1.4.5
lark 1.1.9
llvmlite 0.42.0
lm-format-enforcer 0.10.1
Markdown 3.5.1
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.8.2
matplotlib-inline 0.1.6
mdurl 0.1.2
mistune 3.0.2
modelscope 1.14.0
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.16
nbclient 0.9.0
nbconvert 7.14.0
nbformat 5.9.2
nest-asyncio 1.5.8
networkx 3.2.1
ninja 1.11.1.1
notebook_shim 0.2.3
numba 0.59.1
numpy 1.26.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.535.161
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
nvitop 1.3.2
oauthlib 3.2.2
openai 1.30.3
orjson 3.10.3
oss2 2.18.5
outlines 0.0.34
overrides 7.4.0
packaging 23.2
pandas 2.2.2
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.9.0
pillow 10.2.0
pip 22.3.1
platformdirs 4.1.0
pluggy 1.0.0
prometheus-client 0.19.0
prometheus-fastapi-instrumentator 7.0.0
prompt-toolkit 3.0.43
protobuf 4.23.4
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pyasn1 0.5.1
pyasn1-modules 0.3.0
pycosat 0.6.4
pycparser 2.21
pycryptodome 3.20.0
pydantic 2.7.1
pydantic_core 2.18.2
Pygments 2.17.2
pyOpenSSL 22.0.0
pyparsing 3.1.1
PySocks 1.7.1
python-dateutil 2.8.2
python-dotenv 1.0.1
python-json-logger 2.0.7
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.1
pyzmq 25.1.2
ray 2.23.0
referencing 0.32.1
regex 2024.5.15
requests 2.31.0
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.7.1
rpds-py 0.16.2
rsa 4.9
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
safetensors 0.4.3
scipy 1.13.1
Send2Trash 1.8.2
sentencepiece 0.2.0
setuptools 65.5.0
shellingham 1.5.4
simplejson 3.19.2
six 1.16.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.5
stack-data 0.6.3
starlette 0.37.2
supervisor 4.2.5
sympy 1.12
tensorboard 2.15.1
tensorboard-data-server 0.7.2
termcolor 2.4.0
terminado 0.18.0
tiktoken 0.6.0
tinycss2 1.2.1
tokenizers 0.19.1
tomli 2.0.1
toolz 0.12.0
torch 2.3.0
torchaudio 2.1.2+cu121
torchvision 0.16.2+cu121
tornado 6.4
tqdm 4.64.1
traitlets 5.14.1
transformers 4.41.1
triton 2.3.0
typer 0.12.3
types-python-dateutil 2.8.19.20240106
typing_extensions 4.9.0
tzdata 2024.1
ujson 5.10.0
uri-template 1.3.0
urllib3 1.26.13
uvicorn 0.29.0
uvloop 0.19.0
vllm 0.4.2 /root/autodl-tmp/vllm
vllm-flash-attn 2.5.8.post1
vllm-nccl-cu12 2.18.1.0.4.0
watchfiles 0.22.0
wcwidth 0.2.13
webcolors 1.13
webencodings 0.5.1
websocket-client 1.7.0
websockets 12.0
Werkzeug 3.0.1
wheel 0.37.1
widgetsnbextension 4.0.9
xformers 0.0.26.post1
xxhash 3.4.1
yapf 0.40.2
yarl 1.9.4
zipp 3.19.0
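The key pins from the environment list can also be checked programmatically instead of scanning the full pip list output. This is a minimal sketch using only the standard library. The EXPECTED mapping copies the vLLM and torch versions stated in this post, and the helper names are illustrative.

```python
from importlib import metadata

# Versions taken from the environment list in this post.
EXPECTED = {"vllm": "0.4.2", "torch": "2.3.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Return {package: (expected, installed)} for every mismatch."""
    mismatches = {}
    for package, want in expected.items():
        have = lookup(package)
        # Compare the public version only, ignoring a local suffix such as "+cu121".
        if have is None or have.split("+")[0] != want:
            mismatches[package] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = compare_versions(EXPECTED)
    if not problems:
        print("All pinned packages match.")
    for package, (want, have) in problems.items():
        print(f"{package}: expected {want}, found {have}")
```

The lookup parameter exists only so the comparison can be exercised without the packages installed; in normal use the default is enough.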
Launch script
The command is as follows (example):
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 6006 --model /root/autodl-tmp/deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --served-model-name DeepSeek-V2-Lite-Chat --max-model-len 8000
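Loading the model takes a while, so requests sent right after launch may be refused. A minimal sketch of a readiness poll follows. It assumes the server answers GET /v1/models once it is up, and the helper names, timeouts, and intervals are illustrative choices rather than anything vLLM prescribes.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout_s=3.0):
    """Return True if a GET to url answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all mean "not ready yet".
        return False


def wait_until_ready(probe, timeout_s=600.0, interval_s=5.0):
    """Call probe() repeatedly until it returns True or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the port from the command above, a call would look like wait_until_ready(lambda: http_ok("http://127.0.0.1:6006/v1/models")).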
Test with curl
curl --location --request POST 'http://0.0.0.0:6006/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
"model": "DeepSeek-V2-Lite-Chat",
"stream": true,
"messages": [
{ "role": "user", "content": "续写以’打工人不要抱怨’为题的废话文学: 打工人不要抱怨; 虽然我们收入不高,但是工作量不低啊; 虽然我们工作时间长,但是退休时间晚啊;" }
]
}'
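The same request can be sent from Python with only the standard library. This is a sketch under assumptions: it expects the streamed response to arrive as OpenAI-style server-sent events ("data: {...}" lines ending with "data: [DONE]"), and the function names and BASE_URL constant are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:6006"  # port from the launch command
MODEL = "DeepSeek-V2-Lite-Chat"     # value passed to --served-model-name


def build_request(prompt, stream=True):
    """Build the POST request for /v1/chat/completions."""
    body = {
        "model": MODEL,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def delta_from_sse_line(line):
    """Extract the text delta from one 'data: {...}' line, else return None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    choices = json.loads(payload).get("choices") or []
    if not choices:
        return None
    return (choices[0].get("delta") or {}).get("content")


def stream_chat(prompt):
    """Yield text fragments as the server streams them back."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        for raw in resp:
            text = delta_from_sse_line(raw.decode("utf-8"))
            if text:
                yield text
```

A typical use is to print fragments as they arrive: for piece in stream_chat("hello"): print(piece, end="", flush=True).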
3. Possible issues
NameError: name 'vllm_ops' is not defined
Reference: https://github.com/vllm-project/vllm/issues/1814
Fix: delete the vllm folder and reinstall with pip install vllm (packages that are already installed are skipped).
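To see which copy of vllm Python would actually pick up, and whether a compiled extension sits next to it, a small diagnostic can help. This is only a sketch: the idea that the compiled ops live in a file named like _C*.so inside the package directory is an assumption and may not hold for every build, and the helper names are illustrative.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Directory a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def compiled_extension_names(file_names, stem="_C"):
    """Filter file names that look like a compiled extension, e.g. '_C.abi3.so'."""
    return sorted(
        name for name in file_names
        if name.startswith(stem + ".") and name.endswith((".so", ".pyd"))
    )


if __name__ == "__main__":
    location = package_dir("vllm")
    if location is None:
        print("vllm is not importable from this environment")
    else:
        found = compiled_extension_names(p.name for p in location.iterdir())
        print(f"vllm resolves to: {location}")
        print(f"compiled extensions found: {found or 'none'}")
```

If the printed path points into the cloned source folder and no compiled extension is listed, that would be consistent with the situation the fix above works around.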