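The previous article, Deploying the Qwen2 Model Locally, Part 1, recorded the deployment via Ollama. This article continues with a detailed account of deploying the same model locally with vLLM.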
Installing vLLM
Following the instructions in the reference article https://blog.csdn.net/sexy19910923/article/details/140164232, first run pip to install vLLM:
$ pip install vllm
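However, the command ran far too slowly to wait for. I asked the Doubao AI assistant how to speed up pip, and it suggested using a mirror hosted in China. So I aborted the installation and added a domestic PyPI mirror. The mirror can be specified temporarily on the command line, or permanently in a configuration file. For the permanent approach, create the file ~/.pip/pip.conf in the current user's home directory (create the ~/.pip directory first if it does not exist), enter the following content and save it: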
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
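The same file can also be created from the shell in one step. This is a minimal sketch, assuming a Linux bash shell and that overwriting any existing ~/.pip/pip.conf is acceptable (back it up first if you already have one):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Per-user pip config directory used in this article
# (pip on Linux also reads ~/.config/pip/pip.conf)
pip_dir="${HOME}/.pip"
mkdir -p "${pip_dir}"

# Write the mirror setting; '>' overwrites any existing pip.conf
cat > "${pip_dir}/pip.conf" <<'EOF'
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
EOF

# Show the result
cat "${pip_dir}/pip.conf"
```

Running pip config list afterwards should list global.index-url pointing at the Tsinghua mirror. For a one-off install, passing -i with the mirror URL on the command line has the same effect without touching any file.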
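With the mirror configured, I re-ran the install (this time also passing the mirror explicitly with -i), and sure enough it was very fast.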
$ pip install -i https://pypi.tuna.tsinghua.edu.cn/simple vllm
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting vllm
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/c8/f4/e108a902ccad131d8978a9376343a6e95d78d0e12f152a796794647073ec/vllm-0.6.5-cp38-abi3-manylinux1_x86_64.whl (201.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.1/201.1 MB 9.2 MB/s eta 0:00:00
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from vllm) (5.4.1)
Requirement already satisfied: psutil in /usr/lib/python3/dist-packages (from vllm) (5.9.0)
Collecting tokenizers>=0.19.1
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/22/06/69d7ce374747edaf1695a4f61b83570d91cc8bbfc51ccfecf76f56ab4aac/tokenizers-0.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 21.8 MB/s eta 0:00:00
Collecting torchvision==0.20.1
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a2/f6/7ff89a9f8703f623f5664afd66c8600e3f09fe188e1e0b7e6f9a8617f865/torchvision-0.20.1-cp310-cp310-manylinux1_x86_64.whl (7.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 20.8 MB/s eta 0:00:00
Collecting gguf==0.10.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/1b/e4/c5f9bd71840ae9afb7e2b7c285ba209f2ef5e9cd83885f8c596c551d3026/gguf-0.10.0-py3-none-any.whl (71 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.6/71.6 KB 6.3 MB/s eta 0:00:00
Collecting einops
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/44/5a/f0b9ad6c0a9017e62d4735daaeb11ba3b6c009d69a26141b258cd37b5588/einops-0.8.0-py3-none-any.whl (43 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.2/43.2 KB 6.6 MB/s eta 0:00:00
Collecting openai>=1.45.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8e/5a/d22cd07f1a99b9e8b3c92ee0c1959188db4318828a3d88c9daac120bdd69/openai-1.58.1-py3-none-any.whl (454 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 454.3/454.3 KB 12.4 MB/s eta 0:00:00
Collecting partial-json-parser
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/9c/93/791cbea58b8dc27dc76621438d7b030741a4ad3bb5c222363dd01057175a/partial_json_parser-0.2.1.1.post4-py3-none-any.whl (9.9 kB)
Requirement already satisfied: pillow in /usr/lib/python3/dist-packages (from vllm) (9.0.1)
Collecting tiktoken>=0.6.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2e/28/cf3633018cbcc6deb7805b700ccd6085c9a5a7f72b38974ee0bffd56d311/tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 14.1 MB/s eta 0:00:00
Collecting torch==2.5.1
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2a/ef/834af4a885b31a0b32fff2d80e1e40f771e1566ea8ded55347502440786a/torch-2.5.1-cp310-cp310-manylinux1_x86_64.whl (906.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 906.4/906.4 MB 3.3 MB/s eta 0:00:00
Collecting uvicorn[standard]
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/61/14/33a3a1352cfa71812a3a21e8c9bfb83f60b0011f5e36f2b1399d51928209/uvicorn-0.34.0-py3-none-any.whl (62 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.3/62.3 KB 8.4 MB/s eta 0:00:00
Collecting prometheus_client>=0.18.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/ff/c2/ab7d37426c179ceb9aeb109a85cda8948bb269b7561a0be870cc656eefe4/prometheus_client-0.21.1-py3-none-any.whl (54 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.7/54.7 KB 5.3 MB/s eta 0:00:00
Collecting fastapi!=0.113.*,!=0.114.0,>=0.107.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/52/b3/7e4df40e585df024fac2f80d1a2d579c854ac37109675db2b0cc22c0bb9e/fastapi-0.115.6-py3-none-any.whl (94 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.8/94.8 KB 6.7 MB/s eta 0:00:00
Collecting requests>=2.26.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f9/9b/335f9764261e915ed497fcdeb11df5dfd6f7bf257d4a6a2a686d80da4d54/requests-2.32.3-py3-none-any.whl (64 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.9/64.9 KB 10.4 MB/s eta 0:00:00
Requirement already satisfied: protobuf in /usr/lib/python3/dist-packages (from vllm) (3.12.4)
Collecting sentencepiece
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/a6/27/33019685023221ca8ed98e8ceb7ae5e166032686fa3662c68f1f1edf334e/sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 23.7 MB/s eta 0:00:00
Collecting aiohttp
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/2f/cc/3a3fc7a290eabc59839a7e15289cd48f33dd9337d06e301064e1e7fb26c5/aiohttp-3.11.11-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 13.7 MB/s eta 0:00:00
Collecting pydantic>=2.9
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/f3/26/3e1bbe954fde7ee22a6e7d31582c642aad9e84ffe4b5fb61e63b87cd326f/pydantic-2.10.4-py3-none-any.whl (431 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 431.8/431.8 KB 13.1 MB/s eta 0:00:00
Collecting compressed-tensors==0.8.1
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/8a/b2/fbac759d64f52d11ab1a142b410cab732e933074d1e33b0249d59f72addf/compressed_tensors-0.8.1-py3-none-any.whl (87 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87.5/87.5 KB 13.3 MB/s eta 0:00:00
Collecting mistral_common[opencv]>=1.5.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/be/48/b2acdd507d13a3fd926934eaed0a0afcbcd7e85e90639c91242e7423f3c4/mistral_common-1.5.1-py3-none-any.whl (6.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 20.1 MB/s eta 0:00:00
......
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/02/cc/b7e31358aac6ed1ef2bb790a9746ac2c69bcb3c8588b41616914eb106eaf/exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Collecting httpcore==1.*
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/87/f5/72347bc88306acb359581ac4d52f23c0ef445b57157adedb9aee0cd689d2/httpcore-1.0.7-py3-none-any.whl (78 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.6/78.6 KB 12.4 MB/s eta 0:00:00
Collecting attrs>=17.3.0
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/89/aa/ab0f7891a01eeb2d2e338ae8fecbe57fcebea1a24dbb64d45801bfab481d/attrs-24.3.0-py3-none-any.whl (63 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.4/63.4 KB 4.2 MB/s eta 0:00:00
Collecting rpds-py>=0.7.1
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/4d/cf/96f1fd75512a017f8e07408b6d5dbeb492d9ed46bfe0555544294f3681b3/rpds_py-0.22.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (381 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 382.0/382.0 KB 15.4 MB/s eta 0:00:00
Collecting jsonschema-specifications>=2023.03.6
Downloading https://pypi.tuna.tsinghua.edu.cn/packages/d1/0f/8910b19ac0670a0f80ce1008e5e751c4a57e14d2c4c13a482aa6079fa9d6/jsonschema_specifications-2024.10.1-py3-none-any.whl (18 kB)
Installing collected packages: sentencepiece, py-cpuinfo, nvidia-ml-py, mpmath, blake3, websockets, uvloop, typing_extensions, tqdm, sympy, sniffio, safetensors, rpds-py, regex, pyzmq, python-dotenv, pycountry, pybind11, protobuf, propcache, prometheus_client, pillow, partial-json-parser, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, nest_asyncio, msgspec, msgpack, jiter, interegular, httptools, h11, fsspec, frozenlist, filelock, exceptiongroup, einops, diskcache, dill, cloudpickle, charset-normalizer, attrs, async-timeout, astor, annotated-types, airportsdata, aiohappyeyeballs, uvicorn, triton, requests, referencing, pydantic-core, opencv-python-headless, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multidict, httpcore, gguf, depyf, anyio, aiosignal, yarl, watchfiles, tiktoken, starlette, pydantic, nvidia-cusolver-cu12, jsonschema-specifications, huggingface-hub, httpx, torch, tokenizers, prometheus-fastapi-instrumentator, openai, lm-format-enforcer, jsonschema, fastapi, aiohttp, xformers, transformers, torchvision, ray, outlines_core, mistral_common, xgrammar, outlines, compressed-tensors, vllm
Successfully installed aiohappyeyeballs-2.4.4 aiohttp-3.11.11 aiosignal-1.3.2 airportsdata-20241001 annotated-types-0.7.0 anyio-4.7.0 astor-0.8.1 async-timeout-5.0.1 attrs-24.3.0 blake3-1.0.0 charset-normalizer-3.4.0 cloudpickle-3.1.0 compressed-tensors-0.8.1 depyf-0.18.0 dill-0.3.9 diskcache-5.6.3 einops-0.8.0 exceptiongroup-1.2.2 fastapi-0.115.6 filelock-3.16.1 frozenlist-1.5.0 fsspec-2024.12.0 gguf-0.10.0 h11-0.14.0 httpcore-1.0.7 httptools-0.6.4 httpx-0.28.1 huggingface-hub-0.27.0 interegular-0.3.3 jiter-0.8.2 jsonschema-4.23.0 jsonschema-specifications-2024.10.1 lm-format-enforcer-0.10.9 mistral_common-1.5.1 mpmath-1.3.0 msgpack-1.1.0 msgspec-0.18.6 multidict-6.1.0 nest_asyncio-1.6.0 networkx-3.4.2 numpy-1.26.4 nvidia-cublas-cu12-12.4.5.8 nvidia-cuda-cupti-cu12-12.4.127 nvidia-cuda-nvrtc-cu12-12.4.127 nvidia-cuda-runtime-cu12-12.4.127 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.2.1.3 nvidia-curand-cu12-10.3.5.147 nvidia-cusolver-cu12-11.6.1.9 nvidia-cusparse-cu12-12.3.1.170 nvidia-ml-py-12.560.30 nvidia-nccl-cu12-2.21.5 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.4.127 openai-1.58.1 opencv-python-headless-4.10.0.84 outlines-0.1.11 outlines_core-0.1.26 partial-json-parser-0.2.1.1.post4 pillow-10.4.0 prometheus-fastapi-instrumentator-7.0.0 prometheus_client-0.21.1 propcache-0.2.1 protobuf-5.29.2 py-cpuinfo-9.0.0 pybind11-2.13.6 pycountry-24.6.1 pydantic-2.10.4 pydantic-core-2.27.2 python-dotenv-1.0.1 pyzmq-26.2.0 ray-2.40.0 referencing-0.35.1 regex-2024.11.6 requests-2.32.3 rpds-py-0.22.3 safetensors-0.4.5 sentencepiece-0.2.0 sniffio-1.3.1 starlette-0.41.3 sympy-1.13.1 tiktoken-0.7.0 tokenizers-0.21.0 torch-2.5.1 torchvision-0.20.1 tqdm-4.67.1 transformers-4.47.1 triton-3.1.0 typing_extensions-4.12.2 uvicorn-0.34.0 uvloop-0.21.0 vllm-0.6.5 watchfiles-1.0.3 websockets-14.1 xformers-0.0.28.post3 xgrammar-0.1.7 yarl-1.18.3
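Note that the install command in the reference article is pip install vLLM>=0.4.0, and when I actually ran it, it just sat there with no visible progress. I looked it up on https://pypi.org/search/: the latest vLLM release is 0.6.5, published on December 18, 2024, and when no version is specified pip installs the latest release by default. (Source packages for every version can also be downloaded directly from that page; they will come in handy later.)
I also noticed warnings like the following during the installation: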
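One likely reason the unquoted command appears to hang: in bash, the >= never reaches pip. The shell treats > as output redirection, so pip install vLLM>=0.4.0 actually runs pip install vLLM with all of its output written to a file named =0.4.0 in the current directory, and the terminal stays silent while the slow download proceeds. The sketch below reproduces this harmlessly with echo in a throwaway directory instead of calling pip:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so the stray redirect target does not litter $PWD
demo_dir="$(mktemp -d)"
cd "${demo_dir}"

# Unquoted: bash parses '>=0.4.0' as a redirection to the file '=0.4.0',
# so echo receives only "pip install vLLM" and prints nothing to the terminal
echo pip install vLLM>=0.4.0

# Quoted: the whole specifier reaches the command as one argument
echo pip install "vLLM>=0.4.0"

# The redirect created this file, holding the truncated command line
ls -1
cat ./=0.4.0
```

So the command from the reference article should be written with quotes, e.g. pip install "vllm>=0.4.0"; without any version constraint, pip simply installs the latest release.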
WARNING: The script cpuinfo is installed in '/home/zhangsan/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
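Because pip install was run without sudo, i.e. as an ordinary user, pip fell back to a user installation (see the "Defaulting to user installation" line at the top of the log): the packages go under ~/.local/lib/python3.x/site-packages, and the command-line scripts they provide go into the .local/bin subdirectory of the user's home directory. After installation, that subdirectory contains all of the installed program files: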
$ ll ~/.local/bin/
total 144
drwxrwxr-x 2 allen allen 4096 Dec 21 01:12 ./
drwx------ 5 allen allen 4096 Dec 21 01:11 ../
-rwxrwxr-x 1 allen allen 252 Dec 21 01:12 convert-caffe2-to-onnx*
-rwxrwxr-x 1 allen allen 252 Dec 21 01:12 convert-onnx-to-caffe2*
-rwxrwxr-x 1 allen allen 206 Dec 21 01:11 cpuinfo*
-rwxrwxr-x 1 allen allen 212 Dec 21 01:11 dotenv*
-rwxrwxr-x 1 allen allen 216 Dec 21 01:11 f2py*
-rwxrwxr-x 1 allen allen 210 Dec 21 01:12 fastapi*
-rwxrwxr-x 1 allen allen 2457 Dec 21 01:11 get_gprof*
-rwxrwxr-x 1 allen allen 1651 Dec 21 01:11 get_objgraph*
-rwxrwxr-x 1 allen allen 258 Dec 21 01:11 gguf-convert-endian*
-rwxrwxr-x 1 allen allen 238 Dec 21 01:11 gguf-dump*
-rwxrwxr-x 1 allen allen 254 Dec 21 01:11 gguf-new-metadata*
-rwxrwxr-x 1 allen allen 254 Dec 21 01:11 gguf-set-metadata*
-rwxrwxr-x 1 allen allen 204
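To get rid of the "not on PATH" warning shown earlier, ~/.local/bin can be added to PATH. This is a minimal sketch, assuming bash with ~/.bashrc as its startup file (other shells use different files): it updates the current shell and appends the export line to ~/.bashrc only if that exact line is not already there.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Directory where the pip user install placed console scripts such as cpuinfo and fastapi
user_bin="${HOME}/.local/bin"

# Current shell: prepend it to PATH only if it is not already present
case ":${PATH}:" in
  *":${user_bin}:"*) ;;
  *) export PATH="${user_bin}:${PATH}" ;;
esac

# Future shells: append the same line to ~/.bashrc once
# (single quotes keep $HOME and $PATH literal so they expand at shell startup)
line='export PATH="$HOME/.local/bin:$PATH"'
touch "${HOME}/.bashrc"
grep -qxF "${line}" "${HOME}/.bashrc" || echo "${line}" >> "${HOME}/.bashrc"

# Scripts installed by pip, e.g. cpuinfo, should now resolve without a full path
command -v cpuinfo || echo "cpuinfo not found on PATH"
```

On Ubuntu, the stock ~/.profile usually already prepends ~/.local/bin when that directory exists, so logging out and back in may be enough. If you only want to silence the message, the warning itself names --no-warn-script-location.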
The Bumpy Road of Deploying the Qwen2 Model Locally with vLLM
