LoRA Fine-tuning
Dataset processing
Dataset format and registration
Alpaca dataset format:
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "system": "system prompt (optional)",
    "history": [
      ["first-turn instruction (optional)", "first-turn response (optional)"],
      ["second-turn instruction (optional)", "second-turn response (optional)"]
    ]
  }
]
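For reference, a filled-in single-turn sample might look like this (the content is purely illustrative):
[
  {
    "instruction": "Summarize the following paragraph.",
    "input": "LoRA fine-tunes a large model by training small low-rank adapter matrices while keeping the base weights frozen.",
    "output": "LoRA adapts a frozen base model by learning compact low-rank weight updates.",
    "system": "You are a helpful assistant."
  }
]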
Register the dataset in dataset_info.json:
Note that the dataset path can be customized! (See the example after the snippet below.)
"数据集名称": {
"file_name": "data.json",
"columns": {
"prompt": "instruction",
"query": "input",
"response": "output",
"system": "system",
"history": "history"
}
}
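If the data file does not live in LLaMA-Factory's data directory, one way to point at it is an absolute file_name (the path below is a placeholder, assuming the loader resolves absolute paths as-is):
"my_dataset": {
  "file_name": "/abs/path/to/my_sft_data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output"
  }
}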
SFT training and its YAML configuration
Training stage: sft; example config file: llama3_lora_sft.yaml (a minimal sketch follows the template note below).
In the config, the template field should follow the official templates on GitHub; for example, Qwen1 through Qwen2.5 all use the qwen template.
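Since llama3_lora_sft.yaml is referenced but not shown, here is a minimal sketch modeled on the DPO config later in this note (model path, dataset name, and output directory are placeholders):
### model
model_name_or_path: /path/to/Qwen2.5-7B-Instruct  # placeholder
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: my_sft_dataset  # the name registered in dataset_info.json
template: qwen
cutoff_len: 2048
preprocessing_num_workers: 16

### output
output_dir: /path/to/results/lora_sft  # placeholder
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true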
Select which GPU(s) to use when launching training:
CUDA_DEVICE_ORDER='PCI_BUS_ID' CUDA_VISIBLE_DEVICES=2 llamafactory-cli train tmp/llama3_lora_sft.yaml
Merging the model
qwen2.5_merge_sft.yaml
### model
model_name_or_path: /data1/ztshao/pretrained_models/Qwen/Qwen2.5-7B-Instruct
adapter_name_or_path: /data1/ztshao/projects/projects428/military/results/lora_sft
template: qwen
finetuning_type: lora
### export
export_dir: /data1/ztshao/projects/projects428/military/results/merge_sft
export_size: 2  # max shard size (GB) per saved file
export_device: cpu  # run the merge on CPU
export_legacy_format: false  # save as .safetensors instead of .bin
After merging, the output folder already contains the Modelfile needed to import the model into Ollama.
Run the merge:
llamafactory-cli export /data1/ztshao/projects/projects428/military/src/finetune/qwen2.5_merge_sft.yaml
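The merged model can then be imported into Ollama using the generated Modelfile, e.g. (the model name qwen2.5-merge-sft is arbitrary):
cd /data1/ztshao/projects/projects428/military/results/merge_sft
ollama create qwen2.5-merge-sft -f Modelfile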
DPO
Dataset processing
Dataset format and registration
Alpaca dataset format:
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "chosen": "preferred response (required)",
    "rejected": "rejected response (required)"
  }
]
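A filled-in sample might look like this (content purely illustrative):
[
  {
    "instruction": "Explain what DPO is in one sentence.",
    "input": "",
    "chosen": "DPO directly optimizes a model on preference pairs without training a separate reward model.",
    "rejected": "DPO is a kind of GPU."
  }
]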
Register the dataset in dataset_info.json (note that ranking: true marks it as a preference dataset):
"数据集名称": {
"file_name": "data.json",
"ranking": true,
"columns": {
"prompt": "instruction",
"query": "input",
"chosen": "chosen",
"rejected": "rejected"
}
}
DPO training and its YAML configuration
My YAML:
### model
model_name_or_path: /data1/ztshao/pretrained_models/Qwen/Qwen2.5-7B-Instruct
trust_remote_code: true
### method
stage: dpo
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
pref_beta: 0.1  # beta coefficient of the DPO loss
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]
### dataset
dataset: military_dpo
template: qwen
cutoff_len: 2048
max_samples: null
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
### output
output_dir: /data1/ztshao/projects/projects428/military/results/models
logging_steps: 2
save_steps: 50
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none # choices: [none, wandb, tensorboard, swanlab, mlflow]
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
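For reference, with per_device_train_batch_size: 2 and gradient_accumulation_steps: 4, the effective batch size is 2 × 4 = 8 per GPU.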
Training:
CUDA_DEVICE_ORDER='PCI_BUS_ID' CUDA_VISIBLE_DEVICES=2 llamafactory-cli train tmp/llama3_lora_dpo.yaml