Install vLLM
Note: vLLM only supports Linux (or WSL).
Run:
$ pip install vllm
Installing vllm may fail with the following error.
-
AssertionError: vLLM only supports Linux platform (including WSL).
vLLM cannot run directly on Windows, OMG.
Enter WSL (e.g. Ubuntu 22.04, Python 3.10) and run the steps below.
-
Install CUDA
# apt-get install nvidia-cuda-toolkit
# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
-
Install vLLM
# pip install vllm
This fails:
ERROR: Could not find a version that satisfies the requirement vllm (from versions: none)
ERROR: No matching distribution found for vllm
Upgrade pip:
# python3 -m pip install --upgrade pip
Try again:
# pip install vllm
...
Downloading vllm-0.4.3-cp310-cp310-manylinux1_x86_64.whl (131.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 131.1/131.1 MB 640.5 kB/s eta 0:00:00
Downloading lm_format_enforcer-0.10.1-py3-none-any.whl (42 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.9/42.9 kB 569.5 kB/s eta 0:00:00
Downloading outlines-0.0.34-py3-none-any.whl (76 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.5/76.5 kB 613.2 kB/s eta 0:00:00
Downloading torch-2.3.0-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)
   ━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━ 391.0/779.1 MB 365.6 kB/s eta 0:17:42
...
-
While downloading openai-1.31.1-py3-none-any.whl, pip hits pip._vendor.urllib3.exceptions.ReadTimeoutError. As a workaround, pin an older urllib3 first:
# pip install urllib3==1.25.11
Install openai manually:
# pip install openai
Then upgrade urllib3 back to the latest version, otherwise other dependencies will also time out while downloading:
# pip install --upgrade urllib3
-
vLLM installed successfully
Try again:
# pip install vllm
Installing collected packages: sentencepiece, py-cpuinfo, nvidia-ml-py, ninja, mpmath, websockets, uvloop, uvicorn, ujson, sympy, shellingham, safetensors, rpds-py, regex, python-multipart, python-dotenv, pygments, psutil, protobuf, prometheus-client, packaging, orjson, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, nest-asyncio, multidict, msgpack, mdurl, llvmlite, lark, joblib, interegular, httptools, fsspec, frozenlist, filelock, dnspython, diskcache, cmake, cloudpickle, charset-normalizer, attrs, async-timeout, yarl, watchfiles, triton, starlette, scipy, requests, referencing, nvidia-cusparse-cu12, nvidia-cudnn-cu12, numba, markdown-it-py, email_validator, aiosignal, tiktoken, rich, ray, prometheus-fastapi-instrumentator, nvidia-cusolver-cu12, lm-format-enforcer, huggingface-hub, aiohttp, typer, torch, tokenizers, xformers, vllm-flash-attn, transformers, fastapi-cli, outlines, fastapi, vllm
Successfully installed aiohttp-3.9.5 aiosignal-1.3.1 async-timeout-4.0.3 attrs-23.2.0 charset-normalizer-3.3.2 cloudpickle-3.0.0 cmake-3.29.3 diskcache-5.6.3 dnspython-2.6.1 email_validator-2.1.1 fastapi-0.111.0 fastapi-cli-0.0.4 filelock-3.14.0 frozenlist-1.4.1 fsspec-2024.6.0 httptools-0.6.1 huggingface-hub-0.23.3 interegular-0.3.3 joblib-1.4.2 lark-1.1.9 llvmlite-0.42.0 lm-format-enforcer-0.10.1 markdown-it-py-3.0.0 mdurl-0.1.2 mpmath-1.3.0 msgpack-1.0.8 multidict-6.0.5 nest-asyncio-1.6.0 networkx-3.3 ninja-1.11.1.1 numba-0.59.1 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-ml-py-12.555.43 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.40 nvidia-nvtx-cu12-12.1.105 orjson-3.10.3 outlines-0.0.34 packaging-24.0 prometheus-client-0.20.0 prometheus-fastapi-instrumentator-7.0.0 protobuf-5.27.0 psutil-5.9.8 py-cpuinfo-9.0.0 pygments-2.18.0 python-dotenv-1.0.1 python-multipart-0.0.9 ray-2.23.0 referencing-0.35.1 regex-2024.5.15 requests-2.32.3 rich-13.7.1 rpds-py-0.18.1 safetensors-0.4.3 scipy-1.13.1 sentencepiece-0.2.0 shellingham-1.5.4 starlette-0.37.2 sympy-1.12.1 tiktoken-0.7.0 tokenizers-0.19.1 torch-2.3.0 transformers-4.41.2 triton-2.3.0 typer-0.12.3 ujson-5.10.0 uvicorn-0.30.1 uvloop-0.19.0 vllm-0.4.3 vllm-flash-attn-2.5.8.post2 watchfiles-0.22.0 websockets-12.0 xformers-0.0.26.post1 yarl-1.9.4
Check the vLLM version:
# pip list | grep vllm
vllm                          0.4.3
vllm-flash-attn               2.5.8.post2
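As an extra sanity check, you can confirm from Python that vLLM imports and that it can see a CUDA device. This is a minimal sketch, assuming you run it in the same environment the pip install went into:

import torch
import vllm

print(vllm.__version__)           # expect 0.4.3, matching pip list above
print(torch.cuda.is_available())  # vLLM needs a visible CUDA GPU to serve models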
-
torch\lib\shm.dll not found on Windows
OSError: [WinError 126] The specified module could not be found. Error loading "AppData\Local\Temp\pip-build-env-aq9ya1hl\overlay\Lib\site-packages\torch\lib\shm.dll" or one of its dependencies.
Run
$ nvcc --version
to check the local CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
Following https://pytorch.org/get-started/locally/, since at the time of writing (2024-06-05) my CUDA 12.4 is too new for the stable wheels, I need to install the Preview (Nightly) build of PyTorch:
$ pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124
Looking in indexes: https://download.pytorch.org/whl/nightly/cu124
Collecting torch
  Downloading https://download.pytorch.org/whl/nightly/cu124/torch-2.4.0.dev20240605%2Bcu124-cp310-cp310-win_amd64.whl (2543.7 MB)
     ━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/2.5 GB 2.0 MB/s eta 0:13:16
...
Installing collected packages: tbb, mpmath, intel-openmp, sympy, networkx, mkl, torch, torchvision, torchaudio
Successfully installed intel-openmp-2021.4.0 mkl-2021.4.0 mpmath-1.2.1 networkx-3.2.1 sympy-1.12 tbb-2021.11.0 torch-2.4.0.dev20240605+cu124 torchaudio-2.2.0.dev20240605+cu124 torchvision-0.19.0.dev20240605+cu124
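Once the nightly wheel is installed, a quick check that this PyTorch build actually sees the GPU (a minimal sketch; run it in the same Python 3.10 environment):

import torch

print(torch.__version__)          # e.g. 2.4.0.dev20240605+cu124
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # True means the driver and the cu124 build work together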
Download the GLM-4-9B model
With the huggingface-cli tool, you can easily download the THUDM/glm-4-9b model from https://hf-mirror.com into a local glm-4-9b/ directory:
$ huggingface-cli download --resume-download THUDM/glm-4-9b \
    --local-dir glm-4-9b/
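The same download can also be done from Python. Below is a minimal sketch using huggingface_hub's snapshot_download; pointing HF_ENDPOINT at the mirror is an assumption based on how hf-mirror.com is typically used, and it must be set before the import so it takes effect:

import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # assumed mirror endpoint; set before importing huggingface_hub

from huggingface_hub import snapshot_download

# Download THUDM/glm-4-9b into ./glm-4-9b/, resuming partial files if interrupted
snapshot_download(repo_id="THUDM/glm-4-9b", local_dir="glm-4-9b/", resume_download=True)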
Run GLM-4-9B with vLLM
Download the docs and the Python scripts under basic_demo from https://github.com/THUDM/GLM-4, then run:
$ python3 basic_demo/vllm_cli_demo.py
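If you only want to verify that the model loads under vLLM without the full demo script, here is a minimal offline-inference sketch, assuming the weights sit in the glm-4-9b/ directory downloaded above and that this vLLM version recognizes the GLM-4 architecture (GLM-4 ships custom model code, hence trust_remote_code):

from vllm import LLM, SamplingParams

# Point vLLM at the locally downloaded weights
llm = LLM(model="glm-4-9b/", trust_remote_code=True)

params = SamplingParams(temperature=0.8, max_tokens=256)
outputs = llm.generate(["Briefly introduce the GLM-4-9B model."], params)

for out in outputs:
    print(out.outputs[0].text)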