1. Problem: large-model inference fails with "cutlassF: no kernel found to launch!"
Running inference with the chatglm4-6b and qianwen2.5-7b models fails with the error: cutlassF: no kernel found to launch!
2. Code
The code below is the example officially provided by GLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
max_model_len, tp_size = 131072, 1
model_name = "THUDM/glm-4-9b-chat"
prompt = [{"role": "user", "content": "你好"}]
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
llm = LLM(
    model=model_name,
    tensor_parallel_size=tp_size,
    max_model_len=max_model_len