Setting Up the Large-Model Runtime Environment for the TensorRT-LLM Hands-On Bootcamp

游戏AI开发者

Last modified: 2024-04-14 19:53:16

First published: 2024-04-14 18:44:52

Article link: https://blog.csdn.net/word_world/article/details/137744358

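Through the TensorRT-LLM Hands-On Bootcamp I learned about TensorRT-LLM, NVIDIA's open-source framework for accelerating large language model inference (implemented in C++, with a Python API package, tensorrt_llm). After watching the video course "NVIDIA LLM Full-Stack Solution: Usage and Optimization Best Practices", I set up the environment myself and got the summarize.py model test running. This post records the main steps and the problems I ran into.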

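Installing NVIDIA Container Toolkit

Judging from the NVIDIA Container Toolkit architecture, the lower layers depend on an NVIDIA GPU and a Docker environment, and the application layer depends on the CUDA Toolkit.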

NVIDIA Container Toolkit Overview

Install WSL 2 and Docker Desktop
Windows 10 22H2 or Windows 11 23H2 is recommended. On older Windows versions the Docker engine fails to start for a variety of reasons.

Install the CUDA Toolkit in WSL

$ wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
$ sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
$ sudo dpkg -i cuda-repo-wsl-ubuntu-12-4-local_12.4.1-1_amd64.deb
$ sudo cp /var/cuda-repo-wsl-ubuntu-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get -y install cuda-toolkit-12-4
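After the install, nvcc is usually not on PATH yet, so a quick check helps before moving on. Below is a minimal sketch, assuming the apt package landed in /usr/local/cuda-12.4; that default path and the helper name prepend_path_once are assumptions for illustration, not part of NVIDIA's instructions.

```shell
# Sketch: put the CUDA toolkit on PATH if it is where we assume it is.
# The default location below is an assumption; override CUDA_HOME if yours differs.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.4}"

# Hypothetical helper: prepend directory $1 to the PATH-like string $2
# unless it is already present, and print the result.
prepend_path_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

if [ -x "${CUDA_HOME}/bin/nvcc" ]; then
    PATH="$(prepend_path_once "${CUDA_HOME}/bin" "${PATH}")"
    export PATH
    nvcc --version
else
    echo "nvcc not found under ${CUDA_HOME}/bin; check the install location" >&2
fi
```

Adding the same export to ~/.bashrc keeps the setting across WSL sessions.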

Install and configure NVIDIA Container Toolkit

Install

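Create a cuda-devel-ubuntu Linux container in Docker
The docker options differ slightly from the ones on the official site: --rm is removed and -d is added,
so that the newly created container is kept after you exit.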
```
docker run --runtime=nvidia --gpus all --entrypoint /bin/bash \
      --name TensorRT-LLM \
      -itd nvidia/cuda:12.1.0-devel-ubuntu22.04
```
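Because the container is started detached (-d) and without --rm, it keeps running in the background and you need a separate step to get a shell in it. A minimal sketch, run on the host; the helper name enter_container is hypothetical, while docker start and docker exec are standard Docker CLI commands.

```shell
# Sketch: (re)start the detached container created above and open a shell in it.
CONTAINER_NAME="TensorRT-LLM"   # the --name passed to `docker run`

# Hypothetical helper. $1: container name.
# `docker start` is a no-op for a container that is already running.
enter_container() {
    docker start "$1" >/dev/null && docker exec -it "$1" /bin/bash
}

# Usage (on the host, not inside the container):
#   enter_container "$CONTAINER_NAME"
```

Once inside, running nvidia-smi is a quick way to confirm that the GPU is visible to the container.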

Install dependencies inside the container. TensorRT-LLM requires Python 3.10.

$ apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
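Since the Python version matters here, it can be worth confirming which interpreter python3 actually resolves to. A small sketch; the helper name is_python_310 is an assumption for illustration.

```shell
# Sketch: confirm that python3 is a 3.10.x interpreter.
# Hypothetical helper. $1: a version string such as "3.10.12".
is_python_310() {
    case "$1" in
        3.10|3.10.*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Empty if python3 is missing; `|| true` keeps the script going in that case.
ver="$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null || true)"

if is_python_310 "$ver"; then
    echo "Python $ver OK"
else
    echo "Found Python '${ver:-none}'; this setup expects 3.10" >&2
fi
```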

The official tensorrt_llm 0.9.0 Python package is broken; see TensorRT-LLM/issues/1442.

Clone the TensorRT-LLM repository and install its Python dependencies (which include tensorrt_llm 0.9.0.dev2024040900).

# Install git and git-lfs
$ apt-get -y install git git-lfs
$ git lfs install
# Clone the TensorRT-LLM repository
$ git clone https://github.com/NVIDIA/TensorRT-LLM.git
# Install the Python dependencies
$ cd TensorRT-LLM/
$ pip install -r examples/bloom/requirements.txt
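Given the problem with the official 0.9.0 package, it helps to confirm which build pip actually installed. A hedged sketch: it reads the installed version through Python's importlib.metadata (standard library) instead of importing the package; the helper name is_dev_build is hypothetical.

```shell
# Sketch: report which tensorrt_llm build is installed.
# Hypothetical helper. $1: version string. True for dev builds such as 0.9.0.dev2024040900.
is_dev_build() {
    case "$1" in
        *.dev*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Query package metadata only; empty if the package (or python3) is missing.
installed="$(python3 -c 'from importlib.metadata import version; print(version("tensorrt_llm"))' 2>/dev/null || true)"

if [ -z "$installed" ]; then
    echo "tensorrt_llm is not installed in this Python environment" >&2
elif is_dev_build "$installed"; then
    echo "tensorrt_llm $installed (dev build)"
else
    echo "tensorrt_llm $installed (release build; see issues/1442 if this is 0.9.0)"
fi
```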

Quick Start

Download the bloom-560m model

# The model goes in TensorRT-LLM/examples/bloom/bloom/560M
$ cd TensorRT-LLM/examples/bloom
$ mkdir -p bloom/560M
$ cd bloom
# Download the model from Hugging Face (may fail)
$ git clone https://huggingface.co/bigscience/bloom-560m 560M
# If the Hugging Face download fails, clone from the Gitee mirror instead
$ git clone https://gitee.com/modelee/bloom-560m.git 560M
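A clone can appear to succeed while the large weight files are still Git LFS pointer stubs (tiny text files that start with a "version https://git-lfs.github.com/spec/v1" line). The sketch below flags such files. The helper names and the assumption that the weights use .safetensors or .bin extensions are illustrative, not taken from the TensorRT-LLM docs.

```shell
# Sketch: detect weight files that are still Git LFS pointer stubs.
# Path is relative to TensorRT-LLM/examples/bloom; adjust if you run it elsewhere.
MODEL_DIR="${MODEL_DIR:-./bloom/560M}"

# Hypothetical helper. True if file $1 begins with the Git LFS pointer header.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null | grep -qF 'version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper. Print every weight file in directory $1 that is only a pointer.
report_lfs_pointers() {
    for f in "$1"/*.safetensors "$1"/*.bin; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "pointer only: $f"
        fi
    done
}

report_lfs_pointers "$MODEL_DIR"
```

If anything is listed, running git lfs pull inside the model directory should fetch the real files.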

Build the TensorRT engine
About CUDA lazy loading

$ cd TensorRT-LLM/examples/bloom
# Single GPU on BLOOM 560M
$ python convert_checkpoint.py --model_dir ./bloom/560M/ \
              --dtype float16 \
              --output_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/
# Enable CUDA lazy loading
$ export CUDA_MODULE_LOADING=LAZY
# May need to add trtllm-build to PATH, export PATH=/usr/local/bin:$PATH
$ trtllm-build --checkpoint_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/ \
              --gemm_plugin float16 \
              --output_dir ./bloom/560M/trt_engines/fp16/1-gpu/
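Before running inference, a quick sanity check on the build output can save a confusing error later. This sketch assumes trtllm-build writes a config.json plus one rank*.engine file per GPU into the output directory, which is what I saw with this version; treat the file names and the helper check_engine_dir as assumptions.

```shell
# Sketch: check that the engine directory looks complete.
# Path is relative to TensorRT-LLM/examples/bloom.
ENGINE_DIR="${ENGINE_DIR:-./bloom/560M/trt_engines/fp16/1-gpu}"

# Hypothetical helper. $1: engine directory.
# Succeeds if config.json and at least one rank*.engine file exist (assumed layout).
check_engine_dir() {
    if [ ! -f "$1/config.json" ]; then
        echo "missing $1/config.json" >&2
        return 1
    fi
    for f in "$1"/rank*.engine; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    echo "no rank*.engine file in $1" >&2
    return 1
}

if check_engine_dir "$ENGINE_DIR"; then
    echo "engine directory looks complete: $ENGINE_DIR"
fi
```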

Run

# Use a Hugging Face mirror; otherwise you get ConnectionError: Couldn't reach 'ccdv/cnn_dailymail' on the Hub (SSLError)
$ export HF_ENDPOINT=https://hf-mirror.com
# Use a GitHub mirror; otherwise you get ConnectionError: Couldn't reach https://raw.githubusercontent.com/abisee/cnn-dailymail/master/url_lists/all_test.txt
# (the cached dataset script patched here appears only after a first summarize.py run has downloaded it)
$ sed -i 's/githubusercontent/gitmirror/g' /root/.cache/huggingface/modules/datasets_modules/datasets/ccdv--cnn_dailymail/*/cnn_dailymail.py
# Run the model to summarize the input text
$ python ../summarize.py --test_trt_llm \
                     --hf_model_dir ./bloom/560M/ \
                     --data_type fp16 \
                     --engine_dir ./bloom/560M/trt_engines/fp16/1-gpu/
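To make the sed step above less opaque: it only swaps the host name inside the cached dataset script, so each raw.githubusercontent.com URL becomes a raw.gitmirror.com URL. The sketch below applies the same substitution to a single URL; the helper name mirror_url is hypothetical.

```shell
# Sketch: show what the sed substitution does to one URL.
# Hypothetical helper. $1: a URL; prints it with the GitHub raw host replaced by the mirror.
mirror_url() {
    printf '%s\n' "$1" | sed 's/githubusercontent/gitmirror/g'
}

mirror_url "https://raw.githubusercontent.com/abisee/cnn-dailymail/master/url_lists/all_test.txt"
# prints: https://raw.gitmirror.com/abisee/cnn-dailymail/master/url_lists/all_test.txt
```

If you want to be able to undo the patch, sed -i.bak keeps a backup copy of the original script next to it.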
