BUG: ValueError: The model’s max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (1152).
Environment
linux
python 3.10
torch 2.1.2+cu118
vllm 0.3.3+cu118
xformers 0.0.23.post1
Details
This error occurs when launching a large language model with vLLM.
ValueError: The model’s max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (1152).
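The error means the KV cache that vLLM could allocate from free GPU memory holds only 1152 tokens, far fewer than the model's `max_model_len` of 32768. Typical remedies are lowering `max_model_len`, raising `gpu_memory_utilization` when constructing the engine, or using a GPU with more free memory. Below is a minimal sketch of the consistency check vLLM performs at engine start-up; the function name is illustrative and the logic is simplified, not vLLM's actual internals:

```python
# Sketch of the start-up check that produces this ValueError
# (simplified; check_kv_cache_fits is an illustrative name, not vLLM API).
BLOCK_SIZE = 16  # vLLM's default KV-cache block size (tokens per block)

def check_kv_cache_fits(max_model_len: int, num_gpu_blocks: int) -> None:
    """Raise if the requested context length exceeds KV-cache capacity."""
    max_tokens_in_cache = num_gpu_blocks * BLOCK_SIZE
    if max_model_len > max_tokens_in_cache:
        raise ValueError(
            f"The model's max seq len ({max_model_len}) is larger than the "
            f"maximum number of tokens that can be stored in KV cache "
            f"({max_tokens_in_cache})."
        )

# 1152 tokens of cache corresponds to 72 blocks at the default block size:
assert 72 * BLOCK_SIZE == 1152
# A 1024-token limit fits in that cache, so this passes silently:
check_kv_cache_fits(1024, 72)
# check_kv_cache_fits(32768, 72) would raise, reproducing the error above.
```

In practice, passing a smaller context length (e.g. `max_model_len=1024`) or a higher `gpu_memory_utilization` to `vllm.LLM(...)` (or the equivalent `--max-model-len` / `--gpu-memory-utilization` server flags) makes the check pass.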