tokenizer
normalization
MHA, GQA, MQA
position embedding
KV cache (TBD)
FlashAttention (TBD)
fine-tuning
SFT
LoRA (TBD)
P-tuning (TBD)
Megatron (TBD)
DeepSpeed (TBD)
BERT (TBD)
MoE (TBD)
parameter count estimation, GPU memory calculation (TBD)
mixed-precision training (TBD)
data