RuntimeError: data_parallel_size (2) is not divisible by expert_model_parallel_size
./PAI-Megatron-Patch/Megatron-LM-240405/megatron/core/parallel_state.py
data_parallel_size: int = world_size // (
    tensor_model_parallel_size * pipeline_model_parallel_size * context_parallel_size
)
if data_parallel_size % expert_model_parallel_size != 0:
    raise RuntimeError(
        f"data_parallel_size ({data_parallel_size}) is not divisible by expert_model_parallel_size "
    )
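Cause: world_size=4 (the job was launched on 4 GPUs). Since the error reports data_parallel_size = 2, the product tensor_model_parallel_size * pipeline_model_parallel_size * context_parallel_size must be 2, and the configured expert_model_parallel_size does not divide 2.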
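
The arithmetic behind this check can be reproduced in a few lines before launching a job. This is only a sketch: `check_expert_parallel` is a hypothetical helper, not part of Megatron-LM, and the expert_model_parallel_size value of 4 is an assumption for illustration, since the note does not state which value was used.

```python
def check_expert_parallel(world_size, tp=1, pp=1, cp=1, ep=1):
    """Hypothetical helper mirroring the divisibility check quoted above.

    Returns (data_parallel_size, ok), where ok is True when
    data_parallel_size is divisible by expert_model_parallel_size (ep).
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size ({world_size}) is not divisible by tp*pp*cp ({model_parallel})"
        )
    dp = world_size // model_parallel
    return dp, dp % ep == 0


# 4 GPUs with tp*pp*cp = 2 gives data_parallel_size = 2, as in the error.
# ep=4 is an assumed value; any ep that does not divide 2 fails the same way.
print(check_expert_parallel(4, tp=2, ep=4))  # (2, False)

# Two possible ways out under the same assumptions: lower ep, or use more GPUs.
print(check_expert_parallel(4, tp=2, ep=2))  # (2, True)
print(check_expert_parallel(8, tp=2, ep=4))  # (4, True)
```

Under these assumptions, the run passes the check either by choosing an expert_model_parallel_size that divides 2 or by raising world_size so that data_parallel_size becomes a multiple of it.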