Using TensorRT-LLM on AutoDL, with Baichuan 7B as an example

FocusYang55

Last modified 2024-06-30 17:31:47

First published 2024-06-30 14:34:45

Article link: https://blog.csdn.net/boosting1/article/details/140080657

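1. In AutoDL's community images, search for "nvidia" and pick one that ships TensorRT-LLM 0.10 or later. The 30 GB system disk should be enough for the software, and the free 50 GB data disk is used for the models. The original Baichuan 7B weights take about 14 GB, the FP16 checkpoint produced by conversion takes another 14 GB, and the built engine another 14 GB: about 42 GB in total, which fits on the 50 GB data disk.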
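The 14 GB per copy comes from simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming exactly 7e9 parameters and 2 bytes per parameter (FP16/BF16); actual file sizes differ somewhat because of the real parameter count and file overhead.

```python
# Rough disk-space estimate for the three copies of the model.
# Assumption: 7e9 parameters, 2 bytes each (FP16/BF16); real files differ a bit.
PARAMS = 7e9
BYTES_PER_PARAM = 2

def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of one full copy of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

copy_gb = weights_gb(PARAMS, BYTES_PER_PARAM)
stages = ["HF checkpoint", "TRT-LLM FP16 checkpoint", "TRT engine"]
total_gb = copy_gb * len(stages)

if __name__ == "__main__":
    for stage in stages:
        print(f"{stage}: ~{copy_gb:.0f} GB")
    print(f"total: ~{total_gb:.0f} GB, to fit on the 50 GB data disk")
```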

2. Run nvidia-smi to check the GPU: here it reports CUDA 12.4 and 24 GB of available VRAM. Building a 7B model needs at least 22 GB.
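The same check can be scripted. A small sketch, assuming nvidia-smi is on PATH: it uses nvidia-smi's CSV query mode to read total and free memory per GPU (in MiB) and compares the free amount with the 22 GB figure above, treated here as 22 GiB (an assumption).

```python
import shutil
import subprocess

# Assumption: the 22 GB minimum from the step above, treated as 22 GiB (nvidia-smi reports MiB).
MIN_FREE_MIB = 22 * 1024

def parse_gpu_memory(csv_text: str) -> list[tuple[int, int]]:
    """Parse `--query-gpu=memory.total,memory.free --format=csv,noheader,nounits`
    output into (total_mib, free_mib) pairs, one per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        total, free = (int(x) for x in line.split(","))
        gpus.append((total, free))
    return gpus

def query_gpu_memory() -> list[tuple[int, int]]:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.free",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

# Only query when nvidia-smi is actually available on this machine.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for i, (total, free) in enumerate(query_gpu_memory()):
        status = "OK" if free >= MIN_FREE_MIB else "too little"
        print(f"GPU {i}: {free} / {total} MiB free -> {status} for building a 7B engine")
```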

3. Check that tensorrt_llm works. The image used here has the 0.11 development version, and it works without problems.
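A minimal sketch of this check in Python, assuming the package exposes `tensorrt_llm.__version__` and that development builds use version strings like "0.11.0.dev2024061800" (the exact dev suffix here is a hypothetical example). Only the leading numeric components are compared against the 0.10 minimum from step 1.

```python
import re

def version_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric components of a version string,
    e.g. "0.11.0.dev2024061800" -> (0, 11, 0)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.group() != piece:  # stop at a mixed piece such as "0rc1"
            break
    return tuple(parts)

def check_tensorrt_llm(min_version: tuple[int, ...] = (0, 10)) -> bool:
    try:
        import tensorrt_llm  # fails here if the image does not ship the package
    except ImportError:
        print("tensorrt_llm is not installed in this environment")
        return False
    version = tensorrt_llm.__version__
    ok = version_tuple(version) >= min_version
    print(f"tensorrt_llm {version}: {'OK' if ok else 'older than required'}")
    return ok

if __name__ == "__main__":
    check_tensorrt_llm()
```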

4. Check with df: the 30 GB system disk has 19 GB free, and /root/autodl-tmp has 50 GB free.
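The df check can also be done from Python with `shutil.disk_usage`. A sketch, assuming the AutoDL layout above and the roughly 42 GB requirement from step 1; note that `df -h` prints GiB while this sketch uses 1 GB = 1e9 bytes, so the numbers differ slightly.

```python
import os
import shutil

def free_gb(path: str) -> float:
    """Free space on the filesystem holding `path`, in GB (1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9

def report(paths_and_needs: dict[str, float]) -> dict[str, bool]:
    """Print free space per path and return whether each meets its minimum."""
    results = {}
    for path, need_gb in paths_and_needs.items():
        if not os.path.exists(path):
            print(f"{path}: not found")
            results[path] = False
            continue
        free = free_gb(path)
        results[path] = free >= need_gb
        print(f"{path}: {free:.1f} GB free (want >= {need_gb} GB)")
    return results

if __name__ == "__main__":
    # Assumption: ~42 GB needed on the data disk (three 14 GB copies); no minimum for "/".
    report({"/": 0, "/root/autodl-tmp": 42})
```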

5. Copy the Baichuan usage code from Hugging Face and run the PyTorch model, which downloads the model into the cache. Note that every from_pretrained call needs cache_dir="/root/autodl-tmp/huggingface/".

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

MODEL_ID = "baichuan-inc/Baichuan2-7B-Chat"
# Download to the data disk instead of the default cache under /root on the small system disk
CACHE_DIR = "/root/autodl-tmp/huggingface/"

# trust_remote_code=True is needed because Baichuan ships its own model and tokenizer code
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, cache_dir=CACHE_DIR, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, cache_dir=CACHE_DIR, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(MODEL_ID, cache_dir=CACHE_DIR)

# One chat turn to confirm the model works (prompt: explain the idiom "温故而知新")
messages = [{"role": "user", "content": "解释一下“温故而知新”"}]
response = model.chat(tokenizer, messages)  # chat() is provided by Baichuan's remote code
print(response)

6. The downloaded model is under this path:

/root/autodl-tmp/huggingface/models--baichuan-inc--Baichuan2-7B-Chat/snapshots/ea66ced17780ca3db39bc9f8aa601d8463db3da5
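The last path component is a commit hash, so it changes between downloads. A sketch of a helper that finds it instead of copying it by hand, assuming the standard Hugging Face hub cache layout (models--<org>--<name>/snapshots/<hash>, with refs/main holding the current hash); `find_snapshot` is a hypothetical helper name.

```python
from pathlib import Path

def find_snapshot(cache_dir: str, repo_id: str) -> Path:
    """Return the snapshot directory of `repo_id` inside a Hugging Face hub cache.

    Prefers the hash recorded in refs/main; otherwise falls back to the most
    recently modified directory under snapshots/.
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    ref = repo_dir / "refs" / "main"
    if ref.is_file():
        candidate = repo_dir / "snapshots" / ref.read_text().strip()
        if candidate.is_dir():
            return candidate
    snapshots = [p for p in (repo_dir / "snapshots").glob("*") if p.is_dir()]
    if not snapshots:
        raise FileNotFoundError(f"no snapshot for {repo_id} under {repo_dir}")
    return max(snapshots, key=lambda p: p.stat().st_mtime)

if __name__ == "__main__":
    cache = "/root/autodl-tmp/huggingface/"
    if Path(cache).is_dir():
        print(find_snapshot(cache, "baichuan-inc/Baichuan2-7B-Chat"))
```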

7. In /root/TensorRT-LLM-0.10.0/examples/baichuan, convert the model to a TensorRT-LLM checkpoint:

python3 convert_checkpoint.py --model_dir /root/autodl-tmp/huggingface/models--baichuan-inc--Baichuan2-7B-Chat/snapshots/ea66ced17780ca3db39bc9f8aa601d8463db3da5 --output_dir /root/autodl-tmp/ckpt --dtype float16

8. Build the engine:

trtllm-build --checkpoint_dir /root/autodl-tmp/ckpt --output_dir /root/autodl-tmp/engine

9. In the examples directory, run the engine:

python3 run.py --tokenizer_dir /root/autodl-tmp/huggingface/models--baichuan-inc--Baichuan2-7B-Chat/snapshots/ea66ced17780ca3db39bc9f8aa601d8463db3da5/ --engine_dir /root/autodl-tmp/engine/ --input_text "解释一下“温故而知新”"  --max_output_len 1024
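Steps 7 to 9 can be chained in one script. A minimal sketch, assuming the same directories and using only the flags shown above (no other convert_checkpoint.py or trtllm-build options are assumed). With dry_run=True it only prints the commands; set it to False on the instance to run them in order, stopping at the first failure.

```python
import shlex
import subprocess
from pathlib import Path

# Paths from the steps above; the snapshot hash is specific to this download (see step 6).
TRTLLM_EXAMPLES = Path("/root/TensorRT-LLM-0.10.0/examples")
SNAPSHOT = Path("/root/autodl-tmp/huggingface/models--baichuan-inc--Baichuan2-7B-Chat/"
                "snapshots/ea66ced17780ca3db39bc9f8aa601d8463db3da5")
CKPT = Path("/root/autodl-tmp/ckpt")
ENGINE = Path("/root/autodl-tmp/engine")

def pipeline(prompt: str, max_output_len: int = 1024) -> list[tuple[Path, list[str]]]:
    """(working directory, argv) for convert -> build -> run."""
    return [
        (TRTLLM_EXAMPLES / "baichuan",
         ["python3", "convert_checkpoint.py", "--model_dir", str(SNAPSHOT),
          "--output_dir", str(CKPT), "--dtype", "float16"]),
        (TRTLLM_EXAMPLES,
         ["trtllm-build", "--checkpoint_dir", str(CKPT), "--output_dir", str(ENGINE)]),
        (TRTLLM_EXAMPLES,
         ["python3", "run.py", "--tokenizer_dir", str(SNAPSHOT), "--engine_dir", str(ENGINE),
          "--input_text", prompt, "--max_output_len", str(max_output_len)]),
    ]

def run_all(dry_run: bool = True) -> None:
    for cwd, argv in pipeline("解释一下“温故而知新”"):
        print(f"(cd {cwd} && {shlex.join(argv)})")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)  # raises on the first failing step

if __name__ == "__main__":
    run_all(dry_run=True)
```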