08-12 (Monday): Setting up a vLLM 0.5.0 and lm-eval evaluation environment

Article link: https://blog.csdn.net/lk142500/article/details/141220707

Date	Version	Author	Description
2024-08-12 16:58:30	V0.1	宋全恒	Created the document

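Introduction

This document walks through setting up an evaluation environment for vLLM 0.5.0. Getting the environment right took a surprising amount of effort.

Image used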

(lmdeploy042) yuzailiang@ubuntu:~$ docker run --name vllm050 --gpus all -v /mnt/self-define/:/mnt/self-define -it 10.101.12.128/schen-zhejianglab.com/vllmcusparselt:1.0-dev-nvidia12.4-cudnn8-jupyter-ssh

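Note: the shared directory is mounted for convenience. It holds frequently used configuration and data, such as the pip cache directory.

Note: --gpus all makes the host GPUs available inside the container.

With the cache on the shared directory, building the environment no longer downloads the same package every time, which saves a long wait. For example, the torch wheel below (779.1 MB) is downloaded once and served from the cache afterwards.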

Collecting torch==2.3.0
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/43/e5/2ddae60ae999b224aceb74490abeb885ee118227f866cb12046f0481d4c9/torch-2.3.0-cp310-cp310-manylinux1_x86_64.whl (779.1 MB)

A colleague advised setting the target CUDA architectures before building:

export TORCH_CUDA_ARCH_LIST="8.0 8.6 8.9 9.0"
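
TORCH_CUDA_ARCH_LIST tells the build which GPU compute capabilities to compile CUDA kernels for, so listing only the architectures you actually need can shorten the build. The sketch below is my own assumption, not part of the original setup: it asks nvidia-smi for the compute capability of the GPUs it can see (the compute_cap query field needs a reasonably recent driver) and falls back to the colleague's list if that fails. The function name detect_arch_list is hypothetical.

```shell
# Hypothetical helper: print a space-separated list of compute capabilities.
detect_arch_list() {
    local fallback="8.0 8.6 8.9 9.0"   # the list used in this article
    local caps=""
    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU, e.g. "8.0"; duplicates are merged by sort -u.
        caps=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | sort -u | xargs)
    fi
    # Accept the result only if it looks like "8.0" or "8.0 9.0".
    if printf '%s\n' "$caps" | grep -Eq '^[0-9]+\.[0-9]+( [0-9]+\.[0-9]+)*$'; then
        echo "$caps"
    else
        echo "$fallback"
    fi
}

export TORCH_CUDA_ARCH_LIST="$(detect_arch_list)"
echo "TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}"
```

If the image will run on machines with other GPUs than the build machine, keeping the full list from the colleague is the safer choice.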

Downloading the source

root@74d4cc1d5091:/workspace# git clone https://github.com/yanchenmochen/vllm.git
Cloning into 'vllm'...
remote: Enumerating objects: 27420, done.
remote: Counting objects: 100% (9923/9923), done.
remote: Compressing objects: 100% (1072/1072), done.
remote: Total 27420 (delta 9392), reused 8851 (delta 8851), pack-reused 17497
Receiving objects: 100% (27420/27420), 23.84 MiB | 783.00 KiB/s, done.
Resolving deltas: 100% (20780/20780), done.

root@74d4cc1d5091:/workspace# cd vllm/
root@74d4cc1d5091:/workspace/vllm# git checkout v0.5.0
Note: switching to 'v0.5.0'.

Building and installing vLLM v0.5.0

To see the details of the source build and installation, I used the following command:

root@74d4cc1d5091:/workspace/vllm#  pip install -e . --verbose -i https://pypi.tuna.tsinghua.edu.cn/simple --cache-dir /mnt/self-define/songquanheng/pip_dir/cache/
Using pip 22.0.2 from /usr/lib/python3/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///workspace/vllm
  Running command pip subprocess to install build dependencies
  Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple

 Collecting cmake>=3.21
    Downloading https://pypi.tuna.tsinghua.edu.cn/packages/69/70/242937601f9ff9e6df4c0587b5a7702be4dbfd33420b409d80e2bccc276a/cmake-3.30.2-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.9/26.9 MB 3.5 MB/s eta 0:00:00
  Collecting ninja
    Downloading https://pypi.tuna.tsinghua.edu.cn/packages/6d/92/8d7aebd4430ab5ff65df2bfee6d5745f95c004284db2d8ca76dcbfd9de47/ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 KB 2.7 MB/s eta 0:00:00

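This populates the download cache.

--verbose prints the installation steps in more detail.

-i https://pypi.tuna.tsinghua.edu.cn/simple uses the Tsinghua mirror to speed up the build.

--cache-dir /mnt/self-define/songquanheng/pip_dir/cache/ stores downloaded packages so that later builds reuse them instead of downloading again.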
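
pip also reads its long options from environment variables named PIP_<OPTION>, so the mirror and the cache directory can be set once per shell instead of being repeated on every command. A minimal sketch, reusing the URL and path from the command above:

```shell
# Equivalent to passing -i and --cache-dir on every pip invocation.
export PIP_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple"
export PIP_CACHE_DIR="/mnt/self-define/songquanheng/pip_dir/cache/"

# Later installs can then drop those two flags, for example:
#   pip install -e . --verbose
echo "index: ${PIP_INDEX_URL}"
echo "cache: ${PIP_CACHE_DIR}"
```

Putting the two export lines in a file on the shared directory and sourcing it in each new container keeps the cache location consistent across builds.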

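Problems and solutions

Installing lm-eval from source

Installation failure

pip reports success, but the package is registered as UNKNOWN-0.0.0 instead of lm_eval, so nothing usable is installed: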

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# pip install -e .
Obtaining file:///mnt/self-define/songquanheng/lm-evaluation-harness
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Installing collected packages: UNKNOWN
  Running setup.py develop for UNKNOWN
Successfully installed UNKNOWN-0.0.0

The fix is to upgrade pip (setuptools turned out to be up to date already):

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# python -m pip install --upgrade pip       
Requirement already satisfied: pip in /usr/lib/python3/dist-packages (22.0.2)
Collecting pip
  Downloading pip-24.2-py3-none-any.whl (1.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 4.4 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.2
    Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-24.2

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# pip install setuptools --upgrade
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (72.1.0)

Installing again now essentially succeeds, with output like the following:

Obtaining file:///mnt/self-define/songquanheng/lm-evaluation-harness
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Requirement already satisfied: accelerate>=0.26.0 in /usr/local/lib/python3.10/dist-packages (from lm_eval==0.4.3) (0.28.0)
Collecting evaluate (from lm_eval==0.4.3)
  Downloading evaluate-0.4.2-py3-none-any.whl.metadata (9.3 kB)
Requirement already satisfied: datasets>=2.16.0 in /usr/local/lib/python3.10/dist-packages (from lm_eval==0.4.3) (2.18.0)
Collecting jsonlines (from lm_eval==0.4.3)
  Downloading jsonlines-4.0.0-py3-none-any.whl.metadata (1.6 kB)
Collecting numexpr (from lm_eval==0.4.3)
  Downloading numexpr-2.10.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.2 kB)
Collecting peft>=0.2.0 (from lm_eval==0.
...

Successfully built lm_eval rouge-score sqlitedict word2number
Installing collected packages: word2number, sqlitedict, zstandard, threadpoolctl, tcolorpy, tabulate, pybind11, portalocker, pathvalidate, numexpr, nltk, more-itertools, lxml, jsonlines, colorama, chardet, absl-py, tqdm-multiprocess, scikit-learn, sacrebleu, rouge-score, mbstrdecoder, typepy, peft, evaluate, DataProperty, tabledata, pytablewriter, lm_eval
Successfully installed DataProperty-1.0.1 absl-py-2.1.0 chardet-5.2.0 colorama-0.4.6 evaluate-0.4.2 jsonlines-4.0.0 lm_eval-0.4.3 lxml-5.3.0 mbstrdecoder-1.1.3 more-itertools-10.4.0 nltk-3.8.2 numexpr-2.10.1 pathvalidate-3.2.0 peft-0.12.0 portalocker-2.10.1 pybind11-2.13.1 pytablewriter-1.2.0 rouge-score-0.1.2 sacrebleu-2.4.2 scikit-learn-1.5.1 sqlitedict-2.1.0 tabledata-1.3.3 tabulate-0.9.0 tcolorpy-0.1.6 threadpoolctl-3.5.0 tqdm-multiprocess-0.0.11 typepy-1.3.2 word2number-1.1 zstandard-0.23.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

Problem and solution

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --tasks list
Traceback (most recent call last):
  File "/usr/local/bin/lm-eval", line 5, in <module>
    from lm_eval.__main__ import cli_evaluate
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/__init__.py", line 1, in <module>
    from .evaluator import evaluate, simple_evaluate
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/evaluator.py", line 12, in <module>
    import lm_eval.api.metrics
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/api/metrics.py", line 12, in <module>
    from lm_eval.api.registry import register_aggregation, register_metric
  File "/mnt/self-define/songquanheng/lm-evaluation-harness/lm_eval/api/registry.py", line 4, in <module>
    import evaluate as hf_evaluate
  File "/usr/local/lib/python3.10/dist-packages/evaluate/__init__.py", line 29, in <module>
    from .evaluation_suite import EvaluationSuite
  File "/usr/local/lib/python3.10/dist-packages/evaluate/evaluation_suite/__init__.py", line 10, in <module>
    from ..evaluator import evaluator
  File "/usr/local/lib/python3.10/dist-packages/evaluate/evaluator/__init__.py", line 17, in <module>
    from transformers.pipelines import SUPPORTED_TASKS as SUPPORTED_PIPELINE_TASKS
  File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py", line 26, in <module>
    from ..image_processing_utils import BaseImageProcessor
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_utils.py", line 21, in <module>
    from .image_transforms import center_crop, normalize, rescale
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_transforms.py", line 22, in <module>
    from .image_utils import (
  File "/usr/local/lib/python3.10/dist-packages/transformers/image_utils.py", line 58, in <module>
    from torchvision.transforms import InterpolationMode
  File "/usr/local/lib/python3.10/dist-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/usr/local/lib/python3.10/dist-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/usr/local/lib/python3.10/dist-packages/torch/library.py", line 467, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/usr/local/lib/python3.10/dist-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist

For the fix, see the PyTorch Forums thread "RuntimeError: operator torchvision::nms does not exist" (vision category).
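
This article does not record which fix from that thread was applied. A common cause of this error is a torchvision build that does not match the installed torch. Assuming that is what happened here, the sketch below maps a torch release to the torchvision release published alongside it (torch 2.3.0 pairs with torchvision 0.18.0) and prints the pip command to review. The function name matching_torchvision is hypothetical and the table only lists a few pairs.

```shell
# Hypothetical helper: torchvision release paired with a given torch release.
# Only a few known pairs are listed; extend as needed.
matching_torchvision() {
    case "$1" in
        2.3.1) echo "0.18.1" ;;
        2.3.0) echo "0.18.0" ;;
        2.2.2) echo "0.17.2" ;;
        *)     return 1 ;;
    esac
}

torch_version="2.3.0"   # the version pulled in by the vLLM 0.5.0 build above
if tv_version=$(matching_torchvision "$torch_version"); then
    # Printed rather than executed, so the command can be reviewed first.
    echo "pip install torchvision==${tv_version}"
fi
```

Comparing the output of pip show torch and pip show torchvision before and after the reinstall confirms whether the versions were really mismatched.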

Functional verification

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --tasks list

|              Group              |                            Config Location                             |
|---------------------------------|------------------------------------------------------------------------|
|aclue                            |lm_eval/tasks/aclue/_aclue.yaml                                         |
|aexams                           |lm_eval/tasks/aexams/_aexams.yaml                                       |
|agieval                          |lm_eval/tasks/agieval/agieval.yaml                                      |
|agieval_cn                       |lm_eval/tasks/agieval/agieval_cn.yaml                                   |
|agieval_en                       |lm_eval/tasks/agieval/agieval_en.yaml                                   |
|agieval_nous                     |lm_eval/tasks/agieval/agieval_nous.yaml                                 |
|arabicmmlu                       |lm_eval/tasks/arabicmmlu/_arabicmmlu.yaml                               |
|arabicmmlu_humanities            |lm_eval/tasks/arabicmmlu/_arabicmmlu_humanities.yaml                    |
|arabicmmlu_language              |lm_eval/tasks/arabicmmlu/_arabicmmlu_language.yaml                      |
|arabicmmlu_other                 |lm_eval/tasks/arabicmmlu/_arabicmmlu_other.yaml                         |
|arabicmmlu_social_science        |lm_eval/tasks/arabicmmlu/_arabicmmlu_social_science.yaml                |
...

lm-eval now lists its tasks, so the installation is considered verified.

Verifying model accuracy

root@74d4cc1d5091:/mnt/self-define/songquanheng/lm-evaluation-harness# lm-eval --model vllm --model_args pretrained=/mnt/self-define/zhangweixing/model/llama2-7b-hf,gpu_memory_utilization=0.8 --tasks arc_easy --device cuda:0 
INFO 08-12 12:25:34 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='/mnt/self-define/zhangweixing/model/llama2-7b-hf', speculative_config=None, tokenizer='/mnt/self-define/zhangweixing/model/llama2-7b-hf', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1234, served_model_name=/mnt/self-define/zhangweixing/model/llama2-7b-hf)
INFO 08-12 12:27:41 model_runner.py:159] Loading model weights took 12.5523 GB
INFO 08-12 12:27:42 gpu_executor.py:83] # GPU blocks: 2345, # CPU blocks: 512
INFO 08-12 12:27:44 model_runner.py:878] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 08-12 12:27:44 model_runner.py:882] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 08-12 12:27:57 model_runner.py:954] Graph capturing finished in 13 secs.
Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.00k/9.00k [00:00<00:00, 15.9MB/s]

Generating test split: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:00<00:00, 290324.14 examples/s]
Generating validation split: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 182361.04 examples/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2376/2376 [00:01<00:00, 1406.79it/s]
Running loglikelihood requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9501/9501 [02:15<00:00, 70.26it/s]
fatal: detected dubious ownership in repository at '/mnt/self-define/songquanheng/lm-evaluation-harness'
To add an exception for this directory, call:

	git config --global --add safe.directory /mnt/self-define/songquanheng/lm-evaluation-harness
vllm (pretrained=/mnt/self-define/zhangweixing/model/llama2-7b-hf,gpu_memory_utilization=0.8), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks  |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
|--------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_easy|      1|none  |     0|acc     |↑  |0.7630|±  |0.0087|
|        |       |none  |     0|acc_norm|↑  |0.7458|±  |0.0089|
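
The "dubious ownership" message in the log above comes from git, presumably while lm-eval looks up the repository's commit hash; the evaluation still ran to completion. It appears because the checkout on the bind-mounted directory is owned by a different user than root in the container. To silence it, run the command that git itself suggests:

```shell
REPO_DIR="/mnt/self-define/songquanheng/lm-evaluation-harness"

# Mark the bind-mounted checkout as trusted for the current user.
git config --global --add safe.directory "$REPO_DIR"

# Show the directories that are currently trusted.
git config --global --get-all safe.directory
```

The setting lives in the user's global git config inside the container, so it needs to be applied again in a fresh container unless it is baked into the image.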

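This completes the vLLM 0.5.0 and lm-eval evaluation environment. From here we can develop on top of the source-built image, use lm-eval to compare the accuracy of quantized models against the original models, and use vLLM's own benchmark scripts to measure time to first token and inference latency.

Backing up the image to Harbor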

(lmdeploy042) yuzailiang@ubuntu:~$ docker push 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh
The push refers to repository [10.200.88.53/framework/vllm]
679b7252e87c: Pushing [>                                                  ]  53.92MB/7.034GB
50bbce084879: Pushing [>                                                  ]  42.99MB/30.22GB
47c82924cf56: Pushed 
ebf20e4fb8d4: Pushing [===>                                               ]  40.63MB/584.8MB
8b9803501a26: Pushing [======>                                            ]  40.87MB/303.9MB
eb4697a44dd2: Pushing [====>                                              ]  35.02MB/404.3MB
b7ad7c045853: Waiting 
816e34807296: Waiting 
09e47d21a1ca: Waiting 
594f9ac14b13: Waiting 
600c676771a0: Waiting 
6ac15100dff6: Waiting 
40f0eb1871b9: Waiting 
8d113b7b997c: Waiting 
cd77f58b80cd: Waiting 
e4b1bddcbe63: Waiting 
765423415d69: Waiting 
7b9433fba79b: Waiting 
256d88da4185: Waiting

Back up the image regularly so that work is not lost.
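
docker push needs an image with that tag to exist locally, and the article does not show how the running container was turned into one. One way, which I am assuming here, is docker commit on the vllm050 container started earlier. The helper name backup_container and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: snapshot a container as an image, then push it.
# Set DRY_RUN=1 to print the docker commands instead of running them.
backup_container() {
    local container="$1" image="$2"
    local run=""
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"
    $run docker commit "$container" "$image" &&
    $run docker push "$image"
}
```

For the image in this article the call would be backup_container vllm050 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh, with DRY_RUN=1 in front for a preview of the two commands.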

Starting a container with port mapping and storage mounted

docker run -d --name smoothquant --gpus all -v /mnt/self-define:/mnt/self-define -p 8022:22 -it 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh

(base) yuzailiang@ubuntu:~$ docker run -d --name vllm-smoothquant --gpus all -v /mnt/self-define:/mnt/self-define -p 38022:22 -it 10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh
e483f52b9cb0942540b1ff205688e8b6588aa723eae494efe660f25c7846d88a
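
With -p 38022:22, port 38022 on the Docker host is forwarded to the container's sshd on port 22, so SSH clients reach the container through the host's address. VS Code Remote-SSH reads ~/.ssh/config, so a Host entry is a convenient way to record this. The sketch below only prints such an entry; the helper name, the alias, and the address 192.0.2.10 are placeholders, not values from the original setup.

```shell
# Hypothetical helper: print an ssh_config Host entry for a container
# whose sshd is published on a host port.
print_ssh_host_entry() {
    local name="$1" host="$2" port="${3:-38022}"
    printf 'Host %s\n' "$name"
    printf '    HostName %s\n' "$host"
    printf '    Port %s\n' "$port"
    printf '    User root\n'
}

# Placeholder alias and address; replace 192.0.2.10 with the Docker host's IP.
print_ssh_host_entry vllm-smoothquant 192.0.2.10 38022
```

Appending the printed entry to ~/.ssh/config on the client lets both ssh vllm-smoothquant and VS Code use the same settings.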

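Summary

With the image built, we can keep creating environments by source installation, turn them into images, and reuse those images. The process also demonstrated pip's download cache, which is a convenient and effective technique.

To recap, this article covers:

Installing vLLM 0.5.0 from source
Installing lm-evaluation-harness from source, to evaluate the accuracy of large models and quantized large models.

Setting up sshd in the container

See the Evernote document "07-09 (Tuesday): Adding openssh to a container started from an image, debugging a Python project with breakpoints in VS Code, and installing vLLM from source in the container".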

root@74d4cc1d5091:/mnt/self-define/songquanheng# cat /etc/os-release 
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

This shows the system is Ubuntu 22.04 (Jammy).

The Aliyun mirror sources for this release are:

deb http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
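
The ten lines above follow one pattern: a deb and a deb-src line for each of five suites. The sketch below generates them into a file of your choice so the result can be reviewed before it replaces anything. The article does not say which file the lines were written to; /etc/apt/sources.list is my assumption, and the function name write_aliyun_sources is hypothetical.

```shell
# Hypothetical helper: write the Aliyun source lines for one Ubuntu release.
write_aliyun_sources() {
    local target="$1" codename="${2:-jammy}"
    local suite kind
    : > "$target"   # start from an empty file
    for suite in "" "-security" "-updates" "-proposed" "-backports"; do
        for kind in deb deb-src; do
            printf '%s http://mirrors.aliyun.com/ubuntu/ %s%s main restricted universe multiverse\n' \
                "$kind" "$codename" "$suite" >> "$target"
        done
    done
}

# Write to a scratch file first and review it.
write_aliyun_sources /tmp/sources.list.aliyun
cat /tmp/sources.list.aliyun
```

After reviewing the scratch file, copying it over /etc/apt/sources.list and running apt-get update switches the container to the mirror.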

Back up the files in /etc/apt/sources.list.d by renaming them to .bak

root@74d4cc1d5091:/etc/apt/sources.list.d# ls | xargs -I {} mv {} {}.bak

Update the package index and install the SSH server

apt-get update
apt-get install openssh-server

Set the port and allow root login

echo "Port 22" >> /etc/ssh/sshd_config
echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
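
sshd uses the first value it finds for each keyword, so an appended line has no effect if an earlier uncommented line already sets the same keyword, and running the echo commands twice leaves duplicates. The sketch below is one way to make the change repeatable: it removes any existing uncommented line for the keyword before appending the new one. It assumes GNU sed (as on Ubuntu) and a config file without an active Match block; the function name set_sshd_option is hypothetical.

```shell
# Hypothetical helper: set one keyword in an sshd_config-style file.
set_sshd_option() {
    local key="$1" value="$2" file="${3:-/etc/ssh/sshd_config}"
    # Delete existing uncommented lines for this keyword (case-insensitive).
    sed -i -E "/^[[:space:]]*${key}[[:space:]]/Id" "$file"
    # Append the desired setting.
    printf '%s %s\n' "$key" "$value" >> "$file"
}
```

In the container this would be called as set_sshd_option Port 22 and set_sshd_option PermitRootLogin yes, followed by a restart of the ssh service.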

Change the root password to tianshu@123

root@a9a7cd77c4ee:~# passwd 
New password: 
Retype new password: 
passwd: password updated successfully

Start the ssh service

/etc/init.d/ssh restart

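Configuring passwordless login for VS Code

See "07-16: Configuring VS Code to connect to a remote server over SSH, with passwordless login". The client's public key is: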

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUZN3Oh46GlQJlG8FGxYWhl9Xvj3Y0gJ2twSpIUA9ukpXySWpVjQ8am3NZjt1lKL5qVFcRobn8hpPwwZ5coFSN8qon228f85eIWCRSMRvqFpoHfLzC5qHG6hwdq0LXKLfj68q5xNKnSZ3MnB7wA4nTBz1bA5vcq//be3nrGzW5DMl8miwmAvJ0P4xasPPB2iePe6Y2DEHtSgTD3yMGTefq1IzaeZaVEGsrSI8J57vzhqFjOpAnwcPFGwXq/RAESchUX/WHJ498bRijDLCrvYPNQlIzwjx8C74Tj6w/cp8QO2sSRVtuKRf3cuHyB7B69+mUYzrgGHqi7JBGuGSNlMCZ zj@DESKTOP-L6VJN12

Run the following commands in the container, then paste the public key above into ~/.ssh/authorized_keys:

root@74d4cc1d5091:/etc/apt/sources.list.d# mkdir ~/.ssh
root@74d4cc1d5091:/etc/apt/sources.list.d# vim ~/.ssh/authorized_keys
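
With its default StrictModes setting, sshd ignores authorized_keys when the file or its directory is writable by other users, and mkdir ~/.ssh fails if the directory already exists. The sketch below is a repeatable variant of the two commands above that also sets the conventional permissions (700 for the directory, 600 for the file). The function name install_pubkey is hypothetical.

```shell
# Hypothetical helper: add a public key to authorized_keys exactly once.
install_pubkey() {
    local pubkey="$1" ssh_dir="${2:-$HOME/.ssh}"
    local keys="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$keys"
    # Append only if this exact line is not present yet.
    grep -qxF -- "$pubkey" "$keys" || printf '%s\n' "$pubkey" >> "$keys"
    chmod 600 "$keys"
}
```

Called as install_pubkey followed by the full key line in quotes, it can be run again later without creating duplicate entries.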

Summary

This document describes the whole process of building a vLLM environment on top of a Docker image. The resulting artifact is:

10.200.88.53/framework/vllm:0.5.0-lm_eval-0.4.3-ssh

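The image is based on vLLM 0.5.0 and includes lm_eval 0.4.3. Both frameworks are built and installed from source and are located under /workspace. Further accuracy and inference testing on vLLM can start from this image.

The document also describes configuring sshd in the container and setting up passwordless login for VS Code, so that VS Code can connect directly to the container for development.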