Core decision logic
The project determines whether a model is supported by the TurboMind engine via the is_supported() function in lmdeploy/turbomind/supported_models.py. 1
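A minimal usage sketch (the model id and the workspace path below are illustrative placeholders; resolving a Hugging Face model id requires its config to be fetchable):

from lmdeploy.turbomind.supported_models import is_supported

# Either a Hugging Face model id or a local directory can be passed as model_path.
print(is_supported('internlm/internlm2-chat-7b'))    # expected True: InternLM2ForCausalLM is in SUPPORTED_ARCHS
print(is_supported('/path/to/turbomind_workspace'))  # True if the directory contains `triton_models`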
Decision flow
- Check for a TurboMind workspace model: first check whether a triton_models directory exists under the model path; if it does, the model is immediately judged as supported. 2
- Check the model architecture: obtain the architecture via get_model_arch(), then look it up in the SUPPORTED_ARCHS dict to see whether it is supported. 3
- Check the quantization method: models quantized with smooth_quant are not supported. 4
- Extra checks for special architectures: for example, Baichuan models with 40 attention heads (the 13B variants) are rejected, Qwen2/Llama-family models must pass a head_dim check, and GLM models carrying a vision_config (glm-4v) are excluded; the per-architecture snippets are listed under Citations, and a simplified sketch of the whole flow follows this list.
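The steps above can be condensed into a simplified sketch. This is not the project's exact code: arch and cfg_dict stand in for what get_model_arch() derives, and the flat quantization_config lookup replaces the recursive search_nested_config call shown in the citations.

import os

from lmdeploy.turbomind.supported_models import SUPPORTED_ARCHS


def is_supported_sketch(model_path: str, arch: str, cfg_dict: dict) -> bool:
    # 1) a TurboMind workspace produced by `lmdeploy convert` is always supported
    if os.path.exists(os.path.join(model_path, 'triton_models')):
        return True
    # 2) the architecture must be listed in SUPPORTED_ARCHS
    if arch not in SUPPORTED_ARCHS:
        return False
    # 3) smooth_quant quantized models are rejected (simplified lookup; the real
    #    code searches the nested config with search_nested_config)
    if cfg_dict.get('quantization_config', {}).get('quant_method') == 'smooth_quant':
        return False
    # 4) per-architecture checks (attention heads, head_dim, vision_config, ...)
    #    follow; see the snippets under Citations
    return True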
Usage in the system
This check function is called in several places:
- Automatic backend selection: the autoget_backend() function uses it to decide whether to use the TurboMind engine 11 (see the usage sketch after this list)
- TurboMind initialization: the TurboMind._from_hf() method asserts that the model is supported 12
- Test coverage: tests/test_lmdeploy/test_auto_backend.py contains test cases that verify this behavior 13
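For example, automatic backend selection can be exercised directly (the model id is an illustrative placeholder):

from lmdeploy.archs import autoget_backend

backend = autoget_backend('internlm/internlm2-chat-7b')
print(backend)  # 'turbomind' when is_supported() returns True, otherwise 'pytorch'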
Notes
The list of supported model architectures is defined in the SUPPORTED_ARCHS dict and covers mainstream model families such as Llama, InternLM, Qwen, Baichuan, and Mixtral. 3 The complete list of supported models can be found in the documentation at docs/zh_cn/supported_models/supported_models.md. 14
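To inspect the mapping directly, a small sketch (the printed values are the TurboMind model types from the citation below):

from lmdeploy.turbomind.supported_models import SUPPORTED_ARCHS

for arch in ('LlamaForCausalLM', 'Qwen2MoeForCausalLM', 'ChatGLMModel'):
    print(arch, '->', SUPPORTED_ARCHS.get(arch, 'not listed'))
# LlamaForCausalLM -> llama, Qwen2MoeForCausalLM -> qwen2-moe, ChatGLMModel -> glm4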
Citations
File: lmdeploy/turbomind/supported_models.py (L7-59)
SUPPORTED_ARCHS = dict(
    # baichuan-7b
    BaiChuanForCausalLM='baichuan',
    # baichuan2-7b, baichuan-13b, baichuan2-13b
    BaichuanForCausalLM='baichuan2',
    # gpt-oss
    GptOssForCausalLM='gpt-oss',
    # internlm
    InternLMForCausalLM='llama',
    # internlm2
    InternLM2ForCausalLM='internlm2',
    # internlm3
    InternLM3ForCausalLM='llama',
    # llama, llama2, alpaca, vicuna, codellama, ultracm, yi,
    # deepseek-coder, deepseek-llm
    LlamaForCausalLM='llama',
    # Qwen 7B-72B, Qwen-VL-7B
    QWenLMHeadModel='qwen',
    # Qwen2
    Qwen2ForCausalLM='qwen2',
    Qwen2MoeForCausalLM='qwen2-moe',
    # Qwen2-VL
    Qwen2VLForConditionalGeneration='qwen2',
    # Qwen2.5-VL
    Qwen2_5_VLForConditionalGeneration='qwen2',
    # Qwen3
    Qwen3ForCausalLM='qwen3',
    Qwen3MoeForCausalLM='qwen3-moe',
    # mistral
    MistralForCausalLM='llama',
    # llava
    LlavaLlamaForCausalLM='llama',
    LlavaMistralForCausalLM='llama',
    LlavaForConditionalGeneration='llava',
    # xcomposer2
    InternLMXComposer2ForCausalLM='xcomposer2',
    # internvl
    InternVLChatModel='internvl',
    # internvl3
    InternVLForConditionalGeneration='internvl',
    InternS1ForConditionalGeneration='internvl',
    # deepseek-vl
    MultiModalityCausalLM='deepseekvl',
    DeepseekV2ForCausalLM='deepseek2',
    # MiniCPMV
    MiniCPMV='minicpmv',
    # chatglm2/3, glm4
    ChatGLMModel='glm4',
    ChatGLMForConditionalGeneration='glm4',
    # mixtral
    MixtralForCausalLM='mixtral',
    MolmoForCausalLM='molmo',
)
File: lmdeploy/turbomind/supported_models.py (L62-81)
def is_supported(model_path: str):
    """Check whether supported by turbomind engine.

    Args:
        model_path (str): the path of a model.
            It could be one of the following options:
                - i) A local directory path of a turbomind model which is
                    converted by `lmdeploy convert` command or download from
                    ii) and iii).
                - ii) The model_id of a lmdeploy-quantized model hosted
                    inside a model repo on huggingface.co, such as
                    "InternLM/internlm-chat-20b-4bit",
                    "lmdeploy/llama2-chat-70b-4bit", etc.
                - iii) The model_id of a model hosted inside a model repo
                    on huggingface.co, such as "internlm/internlm-chat-7b",
                    "Qwen/Qwen-7B-Chat ", "baichuan-inc/Baichuan2-7B-Chat"
                    and so on.
    Returns:
        support_by_turbomind (bool): Whether input model is supported by turbomind engine
    """ # noqa: E501
File: lmdeploy/turbomind/supported_models.py (L89-91)
triton_model_path = os.path.join(model_path, 'triton_models')
if os.path.exists(triton_model_path):
    support_by_turbomind = True
File: lmdeploy/turbomind/supported_models.py (L95-98)
quant_method = search_nested_config(cfg.to_dict(), 'quant_method')
if quant_method and quant_method in ['smooth_quant']:
    # tm hasn't support quantized models by applying smoothquant
    return False
File: lmdeploy/turbomind/supported_models.py (L103-107)
if arch == 'BaichuanForCausalLM':
    num_attn_head = cfg.num_attention_heads
    if num_attn_head == 40:
        # baichuan-13B, baichuan2-13B not supported by turbomind
        support_by_turbomind = False
File: lmdeploy/turbomind/supported_models.py (L108-109)
elif arch in ['Qwen2ForCausalLM', 'LlamaForCausalLM']:
    support_by_turbomind = _is_head_dim_supported(cfg)
File: lmdeploy/turbomind/supported_models.py (L110-115)
elif arch in ('ChatGLMModel', 'ChatGLMForConditionalGeneration'):
    # chatglm1/2/3 is not working yet
    support_by_turbomind = cfg.num_layers == 40
    if getattr(cfg, 'vision_config', None) is not None:
        # glm-4v-9b not supported
        support_by_turbomind = False
File: lmdeploy/turbomind/supported_models.py (L116-122)
elif arch == 'InternVLChatModel':
    llm_arch = cfg.llm_config.architectures[0]
    support_by_turbomind = (llm_arch in SUPPORTED_ARCHS and _is_head_dim_supported(cfg.llm_config))
elif arch in ['LlavaForConditionalGeneration', 'InternVLForConditionalGeneration']:
    llm_arch = cfg.text_config.architectures[0]
    if llm_arch in ['Qwen2ForCausalLM', 'LlamaForCausalLM']:
        support_by_turbomind = _is_head_dim_supported(cfg.text_config)
File: lmdeploy/turbomind/supported_models.py (L123-126)
elif arch == 'MolmoForCausalLM':
    kv_heads = cfg.num_key_value_heads
    # TM hasn't supported allenai/Molmo-7B-O-0924 yet
    support_by_turbomind = kv_heads is not None
File: lmdeploy/turbomind/supported_models.py (L127-129)
elif arch == 'DeepseekV2ForCausalLM':
    if getattr(cfg, 'vision_config', None) is not None:
        support_by_turbomind = False
File: lmdeploy/archs.py (L38-54)
    from lmdeploy.turbomind.supported_models import is_supported as is_supported_turbomind
    turbomind_has = is_supported_turbomind(model_path)
except ImportError:
    is_turbomind_installed = False

if is_turbomind_installed:
    if not turbomind_has:
        logger.warning('Fallback to pytorch engine because '
                       f'`{model_path}` not supported by turbomind'
                       ' engine.')
else:
    logger.warning('Fallback to pytorch engine because turbomind engine is not '
                   'installed correctly. If you insist to use turbomind engine, '
                   'you may need to reinstall lmdeploy from pypi or build from '
                   'source and try again.')
backend = 'turbomind' if turbomind_has else 'pytorch'
File: lmdeploy/turbomind/turbomind.py (L260-261)
assert is_supported(model_path), (f'turbomind does not support {model_path}. '
                                  'Plz try pytorch engine instead.')
File: tests/test_lmdeploy/test_auto_backend.py (L41-45)
def test_turbomind_is_supported(self, turbomind_workspace, models):
    from lmdeploy.turbomind.supported_models import is_supported
    assert is_supported(turbomind_workspace) is True
    for m, flag in models:
        assert is_supported(m) is flag
File: docs/zh_cn/supported_models/supported_models.md (L1-57)
Supported Models
The following lists are the models supported by the LMDeploy TurboMind engine and PyTorch engine on different software and hardware platforms.
TurboMind on CUDA Platform
| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
|---|---|---|---|---|---|---|
| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
| Llama3.2[2] | 1B, 3B | LLM | Yes | Yes* | Yes* | Yes |
| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
| Intern-S1 | 241B | MLLM | Yes | Yes | Yes | No |
| Intern-S1-mini | 8.3B | MLLM | Yes | Yes | Yes | No |
| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
| Qwen1.5[1] | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
| Qwen2[2] | 0.5B - 72B | LLM | Yes | Yes* | Yes* | Yes |
| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
| Qwen3 | 0.6B-235B | LLM | Yes | Yes | Yes* | Yes |
| Qwen2.5[2] | 0.5B - 72B | LLM | Yes | Yes* | Yes* | Yes |
| Mistral[1] | 7B | LLM | Yes | Yes | Yes | No |
| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes* | Yes* | Yes |
| InternVL2.5(MPO)[2] | 1 - 78B | MLLM | Yes | Yes* | Yes* | Yes |
| InternVL3[2] | 1 - 78B | MLLM | Yes | Yes* | Yes* | Yes |
| InternVL3.5[3] | 1 - 241BA28B | MLLM | Yes | Yes* | Yes* | No |
| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
| gpt-oss | 20B,120B | LLM | Yes | Yes | Yes | Yes |
"-" means not verified yet.
* [1] The turbomind engine does not support window attention. For models that use window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral and Qwen1.5, please choose the pytorch engine for inference.
* [2] When a model's head_dim is not 128, turbomind does not support 4/8-bit kv cache quantization and inference for it, e.g. llama3.2-1B, qwen2-0.5B, internvl2-1B, etc.
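Footnote [2] is the constraint that the _is_head_dim_supported() helper referenced in the snippets above guards for Llama/Qwen2-family architectures. A hedged sketch for reading a model's head_dim from its Hugging Face config; the fallback formula (hidden_size / num_attention_heads) is the common convention, not code quoted from the project:

from transformers import AutoConfig


def head_dim_of(model_id: str) -> int:
    cfg = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    # use an explicit head_dim if the config defines one, otherwise derive it
    return getattr(cfg, 'head_dim', None) or cfg.hidden_size // cfg.num_attention_heads


print(head_dim_of('Qwen/Qwen2-0.5B'))             # 64  -> no 4/8-bit kv cache quantization, per [2]
print(head_dim_of('internlm/internlm2-chat-7b'))  # 128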