Running the ChatGLM-6B model end to end with TensorRT-LLM

Latest recommended article published: 2024-05-22 09:32:30

心瘾こころ

Article link: https://blog.csdn.net/weixin_51954774/article/details/136275218

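This article walks through installing Git LFS, building the TensorRT-LLM Docker image, setting up the Python environment, downloading the ChatGLM-6B weights, converting them to a TensorRT-LLM checkpoint and engine, and running the example.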

0. References

NVIDIA official website
THUDM's GitHub
NVIDIA's GitHub

1. Build the TensorRT-LLM Docker image

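# Git LFS must already be installed on the host; TensorRT-LLM keeps some large files in it
git lfs install
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull
# Build the release image, then start a container from it
sudo make -C docker release_build
sudo make -C docker release_run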
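Before running the Docker build, it can help to confirm that the host actually has the tools these commands call. The following is a minimal Python sketch, assuming the executables are named git, git-lfs, make and docker on PATH; `missing_tools` is a hypothetical helper, not part of TensorRT-LLM.

```python
import shutil
from typing import Callable, Optional

# Executables the build steps above rely on (`git lfs` dispatches to `git-lfs`)
REQUIRED_TOOLS = ("git", "git-lfs", "make", "docker")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list:
    """Return the required tools that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All build prerequisites found.")
```

The `which` parameter defaults to `shutil.which`, so the same function can be exercised with a fake lookup when no Docker host is available.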

2. Configure and run the GLM model inside the Docker container

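2.1 Set up the Python environment

cd ./examples/chatglm
pip install -r requirements.txt
# git-lfs is needed to download the model weights from Hugging Face
apt-get update
apt-get install -y git-lfs
git lfs install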
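Inside the container, a quick way to confirm that the Python side is ready is to ask `importlib.metadata` which package versions are installed. This is a sketch; the distribution names `tensorrt_llm` and `transformers` are assumptions, so compare them against the example's requirements.txt.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names; adjust to match requirements.txt
    for name in ("tensorrt_llm", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```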

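2.2 Download the model weights from Hugging Face

git clone https://huggingface.co/THUDM/chatglm-6b chatglm_6b
# The download takes a while.
# Back up the original tokenizer, then replace it with the tokenization_chatglm.py shipped in examples/chatglm
cp chatglm_6b/tokenization_chatglm.py chatglm_6b/tokenization_chatglm.py-backup
cp tokenization_chatglm.py chatglm_6b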
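If git-lfs was not active during the clone, the weight files in chatglm_6b are left as small Git LFS pointer stubs instead of real weights, and the later conversion cannot load them. The minimal sketch below looks for such stubs by the header line defined in the Git LFS pointer format; the 1024-byte size cut-off is a heuristic, and `find_lfs_pointers` is a hypothetical helper.

```python
from pathlib import Path

# First line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str, max_pointer_size: int = 1024) -> list:
    """List files under model_dir that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip git's own metadata directory
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        # Pointer stubs are tiny text files, so only inspect small files
        if path.stat().st_size <= max_pointer_size:
            with path.open("rb") as f:
                if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                    pointers.append(path)
    return pointers


if __name__ == "__main__":
    stubs = find_lfs_pointers("chatglm_6b")
    for stub in stubs:
        print("Not downloaded:", stub)
    print(f"{len(stubs)} pointer stub(s) found")
```

If any stubs are reported, running `git lfs pull` inside chatglm_6b should fetch the real files.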

2.3 Convert the Hugging Face GLM model to a TensorRT-LLM checkpoint

python3 convert_checkpoint.py --model_dir chatglm_6b --output_dir trt_ckpt/chatglm_6b/fp16/1-gpu
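To sanity-check the conversion output, the sketch below reads the checkpoint's config.json and lists its weight shards. It assumes the layout produced by recent TensorRT-LLM releases (a config.json with `architecture`, `dtype` and `mapping` fields, plus `rankN.safetensors` files); these names are assumptions and may differ between versions.

```python
import json
from pathlib import Path


def summarize_checkpoint(ckpt_dir: str) -> dict:
    """Read a converted checkpoint's config.json and list its weight shards."""
    root = Path(ckpt_dir)
    config = json.loads((root / "config.json").read_text())
    mapping = config.get("mapping", {})
    return {
        "architecture": config.get("architecture"),
        "dtype": config.get("dtype"),
        "world_size": mapping.get("world_size", 1),
        # One safetensors file per rank is expected
        "shards": sorted(p.name for p in root.glob("rank*.safetensors")),
    }


if __name__ == "__main__":
    info = summarize_checkpoint("trt_ckpt/chatglm_6b/fp16/1-gpu")
    for key, value in info.items():
        print(f"{key}: {value}")
```

For the single-GPU fp16 conversion above, one would expect a float16 dtype and a single shard.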

2.4 Build the TensorRT engine

# ChatGLM-6B: single-GPU engine with dtype float16, GPT Attention plugin, and GEMM plugin
trtllm-build --checkpoint_dir trt_ckpt/chatglm_6b/fp16/1-gpu --gemm_plugin float16 --output_dir trt_engines/chatglm_6b/fp16/1-gpu
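When experimenting with several builds, it can be convenient to generate the command from the directory convention used in this article (`<model>/fp16/<N>-gpu`). This sketch uses only the three flags shown above; `trtllm_build_cmd` is a hypothetical helper, and a multi-GPU engine would additionally need a checkpoint converted for that GPU count, which this helper does not create.

```python
import shlex
import subprocess


def trtllm_build_cmd(model: str = "chatglm_6b", gpus: int = 1) -> list:
    """Assemble the trtllm-build command for an fp16 checkpoint."""
    # Same directory convention as above: <root>/<model>/fp16/<N>-gpu
    suffix = f"{model}/fp16/{gpus}-gpu"
    return [
        "trtllm-build",
        "--checkpoint_dir", f"trt_ckpt/{suffix}",
        "--gemm_plugin", "float16",
        "--output_dir", f"trt_engines/{suffix}",
    ]


if __name__ == "__main__":
    cmd = trtllm_build_cmd()
    print("Running:", shlex.join(cmd))
    # check=True raises CalledProcessError if the build fails
    subprocess.run(cmd, check=True)
```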

2.5 Run the example

# Run the default ChatGLM-6B engine on a single GPU; engines for other models can be run the same way once they are built.
python3 ../run.py --input_text "Enter your question here" \
         --max_output_len 50 \
         --tokenizer_dir chatglm_6b \
         --engine_dir trt_engines/chatglm_6b/fp16/1-gpu
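To try several prompts in a row, the run command can be driven from Python. This is a minimal sketch that reuses only the flags and paths from the command above; `run_py_cmd` and the sample questions are hypothetical placeholders.

```python
import subprocess
import sys


def run_py_cmd(question: str,
               engine_dir: str = "trt_engines/chatglm_6b/fp16/1-gpu",
               tokenizer_dir: str = "chatglm_6b",
               max_output_len: int = 50) -> list:
    """Build the ../run.py command line for a single question."""
    return [
        sys.executable, "../run.py",
        "--input_text", question,
        "--max_output_len", str(max_output_len),
        "--tokenizer_dir", tokenizer_dir,
        "--engine_dir", engine_dir,
    ]


if __name__ == "__main__":
    # Placeholder prompts; replace with your own questions
    questions = ["Hello", "What is TensorRT-LLM?"]
    for question in questions:
        # Passing a list (no shell) keeps spaces and non-ASCII text intact
        subprocess.run(run_py_cmd(question), check=True)
```

Each call starts a fresh process and loads the engine again, so this suits a handful of prompts rather than throughput testing.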