Today I needed to use a profiler to analyze the performance of an LLM, so I gave it a try. I'm sharing my example code here, and I hope everyone's coding goes smoothly:
import time

import torch
import torch.profiler as profiler
from transformers import AutoTokenizer, AutoModel

model_name_or_path = 'THUDM/chatglm-6b'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
# Load the model in fp16 on the GPU and put it in eval mode for inference.
model = AutoModel.from_pretrained(model_name_or_path, trust_remote_code=True).half().cuda()
model = model.eval()

prompt = "你好"
inputs = tokenizer([prompt], return_tensors="pt")
inputs = inputs.to("cuda")

# Profile CPU and CUDA activity. The schedule skips 1 step, warms up
# for 1 step, then records 2 active steps, and does this once.
prof = profiler.profile(
    activities=[
        profiler.ProfilerActivity.CPU,
        profiler.ProfilerActivity.CUDA,
    ],
    schedule=profiler.schedule(
        wait=1,
        warmup=1,
        active=2,
        repeat=1,
    ),
)

# Warm up: run a few forward passes first so CUDA kernel initialization
# and caching do not distort the measurements below.
with torch.no_grad():
    for i in range(5):
        result = model(**inputs)

# The profiler must be started before prof.step() is called; using it
# as a context manager handles start/stop automatically.
with prof, torch.no_grad():
    for i in range(10):
        start = time.perf_counter()
        # response, history = model.chat(tokenizer, "你好", history=[])
        # print(response)
        result = model(**inputs)
        # CUDA launches are asynchronous; synchronize before stopping the timer.
        torch.cuda.synchronize()
        forward_ms = (time.perf_counter() - start) * 1000
        print("Forward pass latency (ms):", forward_ms)
        prof.step()

print(prof.key_averages().table(sort_by="self_cpu_time_total"))
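If the summary table is not detailed enough, torch.profiler also accepts record_shapes and profile_memory flags. A minimal variant of the construction above (everything else stays the same):

prof = profiler.profile(
    activities=[
        profiler.ProfilerActivity.CPU,
        profiler.ProfilerActivity.CUDA,
    ],
    schedule=profiler.schedule(wait=1, warmup=1, active=2, repeat=1),
    record_shapes=True,   # attach input tensor shapes to each recorded op
    profile_memory=True,  # track tensor memory allocations and frees
)

With profile_memory=True, the key_averages() table can then also be sorted by memory columns such as "self_cuda_memory_usage".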
My example uses ChatGLM; if you need a different model you can swap it in, since the principle is the same.
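One more tip: if you prefer a visual timeline over the console table, you can pass an on_trace_ready callback that dumps a Chrome trace file for each recorded cycle. A minimal sketch assuming the same setup as above (the file name pattern is my own choice); open the resulting file in chrome://tracing or https://ui.perfetto.dev:

def trace_handler(p):
    # Called automatically at the end of each "active" phase of the schedule.
    p.export_chrome_trace("trace_step{}.json".format(p.step_num))

prof = profiler.profile(
    activities=[
        profiler.ProfilerActivity.CPU,
        profiler.ProfilerActivity.CUDA,
    ],
    schedule=profiler.schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=trace_handler,
)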