LLM PEFT (Part 2): Hands-On LoRA Instruction Fine-Tuning

Environment setup

 git clone -b v0.6.1 --depth=1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
 conda create -n py310 python=3.10 
 source activate py310
 pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --ignore-installed
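
A quick sanity check after installation (a minimal check; it assumes requirements.txt pulled in torch and transformers, as it does for LLaMA-Factory v0.6.1):

 python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"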

An issue along the way

!git lfs install
!git clone https://huggingface.co/Qwen/Qwen1.5-0.5B

Running these in the shell produces the output below. The leading `!` is Jupyter notebook syntax; in bash it triggers history expansion, so the commands that actually run are mangled (note the echoed `git lfs install lfs install ...`) and the model repo is never cloned:

(py310) root@intern-studio-40072860:~/LLaMA-Factory# !git lfs install
git lfs install lfs install
Updated Git hooks.
Git LFS initialized.
(py310) root@intern-studio-40072860:~/LLaMA-Factory# !git clone https://huggingface.co/Qwen/Qwen1.5-0.5B
git lfs install lfs install clone https://huggingface.co/Qwen/Qwen1.5-0.5B
Updated Git hooks.
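
In a plain shell the `!` prefix should simply be dropped; the intended commands are just:

 git lfs install
 git clone https://huggingface.co/Qwen/Qwen1.5-0.5B

(Direct cloning still fails behind this proxy, as shown next.)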

Cloning directly from huggingface.co (fails here, blocked by the proxy):

(py310) root@intern-studio-40072860:~/LLaMA-Factory# git clone https://huggingface.co/Qwen/Qwen1.5-0.5B
Cloning into 'Qwen1.5-0.5B'...
fatal: unable to access 'https://huggingface.co/Qwen/Qwen1.5-0.5B/': Received HTTP code 503 from proxy after CONNECT

From the command line, via the mirror https://hf-mirror.com/ (download succeeded):

huggingface-cli download --resume-download Qwen/Qwen1.5-0.5B --local-dir Qwen/Qwen1.5-0.5B
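
Note that for huggingface-cli to actually go through the mirror, the Hugging Face endpoint is normally exported first (this is the usage documented on hf-mirror.com):

 export HF_ENDPOINT=https://hf-mirror.com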

Inference

Before fine-tuning (no LoRA checkpoint exists yet, so run the fine-tuning step first):

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path path_to_llama_model \
    --adapter_name_or_path path_to_checkpoint \
    --template default \
    --finetuning_type lora

Supervised instruction fine-tuning (LoRA)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --template default \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --dataset alpaca_data_zh_demo \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
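
The alpaca_data_zh_demo dataset ships with the repo under data/. A quick way to peek at the raw record that gets tokenized in the logs below (a small sketch, assuming the file follows the standard alpaca schema with instruction/input/output keys):

import json

# load the demo file bundled with LLaMA-Factory and print its single record
with open("data/alpaca_data_zh_demo.json", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))               # the demo file holds one example
print(records[0]["instruction"])  # e.g. 我们如何在日常生活中减少用水?
print(records[0]["output"])       # the reference answer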

Instruction fine-tuning (QLoRA, 4-bit)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --dataset alpaca_data_zh_demo \
    --dataset_dir data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir saves/Qwen1.5-0.5B/qlora/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16

Run log

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
>     --stage sft \
>     --do_train \
>     --model_name_or_path ./Qwen/Qwen1.5-0.5B \
>     --dataset alpaca_data_zh_demo \
>     --dataset_dir data \
>     --template default \
>     --finetuning_type lora \
>     --lora_target q_proj,v_proj \
>     --output_dir saves/Qwen1.5-0.5B/qlora/sft \
>     --overwrite_cache \
>     --overwrite_output_dir \
>     --cutoff_len 1024 \
>     --per_device_train_batch_size 1 \
>     --per_device_eval_batch_size 1 \
>     --gradient_accumulation_steps 8 \
>     --lr_scheduler_type cosine \
>     --logging_steps 10 \
>     --save_steps 100 \
>     --eval_steps 100 \
>     --evaluation_strategy steps \
>     --learning_rate 5e-5 \
>     --num_train_epochs 3.0 \
>     --max_samples 3000 \
>     --val_size 0.1 \
>     --quantization_bit 4 \
>     --plot_loss \
>     --fp16
06/16/2024 11:41:47 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
06/16/2024 11:41:47 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-16 11:41:48,019 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/16/2024 11:41:48 - INFO - llmtuner.data.loader - Loading dataset alpaca_data_zh_demo.json...
06/16/2024 11:41:48 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Converting format of dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.35 examples/s]
Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.03 examples/s]
input_ids:
[33975, 25, 49434, 239, 79478, 100007, 18493, 101254, 102438, 101940, 103135, 94432, 71703, 25, 220, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
inputs:
Human: 我们如何在日常生活中减少用水?
Assistant: 1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
labels:
1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-16 11:41:56,698 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-16 11:41:56,725 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/16/2024 11:41:56 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3473] 2024-06-16 11:41:58,155 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-16 11:42:00,458 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-16 11:42:00,460 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:3615] 2024-06-16 11:42:12,047 >> Detected 4-bit loading: activating 4-bit loading for this model
[INFO|modeling_utils.py:4350] 2024-06-16 11:43:02,163 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-16 11:43:02,163 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-16 11:43:02,202 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-16 11:43:02,203 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/16/2024 11:43:02 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/16/2024 11:43:02 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/16/2024 11:43:02 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
Traceback (most recent call last):
  File "/root/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/root/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/root/LLaMA-Factory/src/llmtuner/train/tuner.py", line 32, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 60, in run_sft
    **split_dataset(dataset, data_args, training_args),
  File "/root/LLaMA-Factory/src/llmtuner/data/utils.py", line 87, in split_dataset
    dataset = dataset.train_test_split(test_size=val_size, seed=training_args.seed)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/fingerprint.py", line 482, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 4587, in train_test_split
    raise ValueError(
ValueError: With n_samples=1, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
The demo dataset contains only one example, so a 10% validation split leaves the training set empty; the run below switches to the larger alpaca_gpt4_en and glaive_toolcall datasets instead:

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py     --stage sft     --do_train     --model_name_or_path ./Qwen/Qwen1.5-0.5B     --dataset alpaca_gpt4_en,glaive_toolcall     --dataset_dir data     --template default     --finetuning_type lora     --lora_target q_proj,v_proj     --output_dir saves/Qwen1.5-0.5B/qlora/sft     --overwrite_cache     --overwrite_output_dir     --cutoff_len 1024     --per_device_train_batch_size 1     --per_device_eval_batch_size 1     --gradient_accumulation_steps 8     --lr_scheduler_type cosine     --logging_steps 10     --save_steps 100     --eval_steps 100     --evaluation_strategy steps     --learning_rate 5e-5     --num_train_epochs 3.0     --max_samples 3000     --val_size 0.1     --quantization_bit 4     --plot_loss     --fp16
06/16/2024 11:50:35 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
06/16/2024 11:50:35 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-16 11:50:35,454 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/16/2024 11:50:35 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
Generating train split: 52002 examples [00:00, 103257.03 examples/s]
Converting format of dataset: 100%|██████████████████████████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 25151.09 examples/s]
06/16/2024 11:50:38 - INFO - llmtuner.data.loader - Loading dataset glaive_toolcall_10k.json...
06/16/2024 11:50:39 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/glaive_toolcall_10k.json.
Generating train split: 10000 examples [00:00, 20636.20 examples/s]
Converting format of dataset: 100%|██████████████████████████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 14743.53 examples/s]
Running tokenizer on dataset: 100%|████████████████████████████████████████████████████████████████████████████████████| 6000/6000 [00:09<00:00, 658.60 examples/s]
input_ids:
[33975, 25, 20678, 2326, 10414, 369, 19429, 9314, 624, 71703, 25, 220, 16, 13, 44514, 264, 23831, 323, 77116, 9968, 25, 7405, 2704, 697, 20969, 525, 28308, 315, 264, 8045, 315, 25322, 323, 23880, 11, 15651, 12833, 11, 4361, 40836, 11, 323, 9314, 49027, 13, 1096, 8609, 311, 3410, 697, 2487, 448, 279, 7565, 36393, 311, 729, 518, 1181, 1850, 323, 646, 1492, 5358, 20601, 18808, 382, 17, 13, 3285, 424, 304, 5792, 6961, 5702, 25, 32818, 374, 16587, 369, 20337, 3746, 24854, 11, 23648, 11, 323, 40613, 2820, 13, 70615, 369, 518, 3245, 220, 16, 20, 15, 4420, 315, 23193, 90390, 10158, 476, 220, 22, 20, 4420, 315, 70820, 10158, 1817, 2003, 382, 18, 13, 2126, 3322, 6084, 25, 24515, 3322, 4271, 6084, 374, 16587, 369, 6961, 323, 10502, 1632, 32751, 13, 1084, 8609, 311, 36277, 19671, 11, 7269, 24675, 729, 11, 323, 11554, 9314, 6513, 323, 22077, 729, 13, 70615, 369, 220, 22, 12, 24, 4115, 315, 6084, 1817, 3729, 13, 151643]
inputs:
Human: Give three tips for staying healthy.
Assistant: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 44514, 264, 23831, 323, 77116, 9968, 25, 7405, 2704, 697, 20969, 525, 28308, 315, 264, 8045, 315, 25322, 323, 23880, 11, 15651, 12833, 11, 4361, 40836, 11, 323, 9314, 49027, 13, 1096, 8609, 311, 3410, 697, 2487, 448, 279, 7565, 36393, 311, 729, 518, 1181, 1850, 323, 646, 1492, 5358, 20601, 18808, 382, 17, 13, 3285, 424, 304, 5792, 6961, 5702, 25, 32818, 374, 16587, 369, 20337, 3746, 24854, 11, 23648, 11, 323, 40613, 2820, 13, 70615, 369, 518, 3245, 220, 16, 20, 15, 4420, 315, 23193, 90390, 10158, 476, 220, 22, 20, 4420, 315, 70820, 10158, 1817, 2003, 382, 18, 13, 2126, 3322, 6084, 25, 24515, 3322, 4271, 6084, 374, 16587, 369, 6961, 323, 10502, 1632, 32751, 13, 1084, 8609, 311, 36277, 19671, 11, 7269, 24675, 729, 11, 323, 11554, 9314, 6513, 323, 22077, 729, 13, 70615, 369, 220, 22, 12, 24, 4115, 315, 6084, 1817, 3729, 13, 151643]
labels:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-16 11:50:52,228 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-16 11:50:52,232 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/16/2024 11:50:52 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3473] 2024-06-16 11:50:52,535 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-16 11:50:52,548 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-16 11:50:52,550 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:3615] 2024-06-16 11:51:03,127 >> Detected 4-bit loading: activating 4-bit loading for this model
[INFO|modeling_utils.py:4350] 2024-06-16 11:51:39,672 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-16 11:51:39,672 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-16 11:51:39,678 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-16 11:51:39,678 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/16/2024 11:51:39 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/16/2024 11:51:39 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/16/2024 11:51:39 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
/root/.conda/envs/py310/lib/python3.10/site-packages/accelerate/accelerator.py:444: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
  warnings.warn(
[INFO|trainer.py:571] 2024-06-16 11:51:40,367 >> Using auto half precision backend
[INFO|trainer.py:1721] 2024-06-16 11:51:40,522 >> ***** Running training *****
[INFO|trainer.py:1722] 2024-06-16 11:51:40,522 >>   Num examples = 5,400
[INFO|trainer.py:1723] 2024-06-16 11:51:40,522 >>   Num Epochs = 3
[INFO|trainer.py:1724] 2024-06-16 11:51:40,522 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1727] 2024-06-16 11:51:40,523 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1728] 2024-06-16 11:51:40,523 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1729] 2024-06-16 11:51:40,523 >>   Total optimization steps = 2,025
[INFO|trainer.py:1730] 2024-06-16 11:51:40,524 >>   Number of trainable parameters = 786,432
{'loss': 1.4165, 'learning_rate': 4.9996991493233693e-05, 'epoch': 0.01}                                                                                           
{'loss': 1.3805, 'learning_rate': 4.998796669702378e-05, 'epoch': 0.03}                                                                                            
{'loss': 1.4455, 'learning_rate': 4.997292778346312e-05, 'epoch': 0.04}                                                                                            
{'loss': 1.2532, 'learning_rate': 4.9951878372125547e-05, 'epoch': 0.06}                                                                                           
{'loss': 1.0753, 'learning_rate': 4.99248235291948e-05, 'epoch': 0.07}                                                                                             
{'loss': 1.0717, 'learning_rate': 4.989176976624511e-05, 'epoch': 0.09}                                                                                            
{'loss': 1.0417, 'learning_rate': 4.985272503867403e-05, 'epoch': 0.1}                                                                                             
{'loss': 1.1164, 'learning_rate': 4.9807698743787744e-05, 'epoch': 0.12}                                                                                           
{'loss': 1.0518, 'learning_rate': 4.975670171853926e-05, 'epoch': 0.13}                                                                                            
{'loss': 1.1058, 'learning_rate': 4.969974623692023e-05, 'epoch': 0.15}                                                                                            
  5%|█████▉                                                                                                                   | 100/2025 [10:25<2:28:34,  4.63s/it][INFO|trainer.py:3242] 2024-06-16 12:02:05,710 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:02:05,711 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:02:05,711 >>   Batch size = 1
{'eval_loss': 1.0310032367706299, 'eval_runtime': 115.7348, 'eval_samples_per_second': 5.184, 'eval_steps_per_second': 5.184, 'epoch': 0.15}                       
  5%|█████▉                                                                                                                   | 100/2025 [12:20<2:28:34,  4.63s/it[INFO|trainer.py:2936] 2024-06-16 12:04:01,477 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:04:01,775 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:04:01,782 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:04:01,786 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/added_tokens.json
{'loss': 1.072, 'learning_rate': 4.963684600700679e-05, 'epoch': 0.16}                                                                                             
{'loss': 1.0128, 'learning_rate': 4.9568016167660334e-05, 'epoch': 0.18}                                                                                           
{'loss': 1.1716, 'learning_rate': 4.9493273284883854e-05, 'epoch': 0.19}                                                                                           
{'loss': 1.0396, 'learning_rate': 4.941263534783482e-05, 'epoch': 0.21}                                                                                            
{'loss': 1.01, 'learning_rate': 4.9326121764495596e-05, 'epoch': 0.22}                                                                                             
{'loss': 0.9211, 'learning_rate': 4.923375335700223e-05, 'epoch': 0.24}                                                                                            
{'loss': 1.0502, 'learning_rate': 4.913555235663305e-05, 'epoch': 0.25}                                                                                            
{'loss': 1.0281, 'learning_rate': 4.9031542398457974e-05, 'epoch': 0.27}                                                                                           
{'loss': 0.8509, 'learning_rate': 4.892174851565004e-05, 'epoch': 0.28}                                                                                            
{'loss': 0.9885, 'learning_rate': 4.880619713346039e-05, 'epoch': 0.3}                                                                                             
 10%|███████████▉                                                                                                             | 200/2025 [20:43<2:14:14,  4.41s/it][INFO|trainer.py:3242] 2024-06-16 12:12:24,024 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:12:24,024 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:12:24,024 >>   Batch size = 1
{'eval_loss': 0.9859623312950134, 'eval_runtime': 97.9886, 'eval_samples_per_second': 6.123, 'eval_steps_per_second': 6.123, 'epoch': 0.3}                         
 10%|███████████▉                                                                                                             | 200/2025 [22:21<2:14:14,  4.41s/it[INFO|trainer.py:2936] 2024-06-16 12:14:02,035 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:14:02,233 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:14:02,239 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:14:02,243 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/added_tokens.json
{'loss': 0.9638, 'learning_rate': 4.868491606285823e-05, 'epoch': 0.31}                                                                                            
{'loss': 0.9803, 'learning_rate': 4.855793449383731e-05, 'epoch': 0.33}                                                                                            
{'loss': 1.0291, 'learning_rate': 4.8425282988390376e-05, 'epoch': 0.34}                                                                                           
{'loss': 0.9204, 'learning_rate': 4.828699347315356e-05, 'epoch': 0.36}                                                                                            
{'loss': 0.9777, 'learning_rate': 4.814309923172227e-05, 'epoch': 0.37}                                                                                            
{'loss': 0.9745, 'learning_rate': 4.7993634896640394e-05, 'epoch': 0.39}                                                                                           
{'loss': 0.9729, 'learning_rate': 4.783863644106502e-05, 'epoch': 0.4}                                                                                             
{'loss': 1.0564, 'learning_rate': 4.7678141170108345e-05, 'epoch': 0.41}                                                                                           
{'loss': 1.1198, 'learning_rate': 4.751218771185906e-05, 'epoch': 0.43}                                                                                            
{'loss': 0.9463, 'learning_rate': 4.734081600808531e-05, 'epoch': 0.44}                                                                                            
 15%|█████████████████▉                                                                                                       | 300/2025 [29:26<2:05:58,  4.38s/it][INFO|trainer.py:3242] 2024-06-16 12:21:07,491 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:21:07,491 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:21:07,491 >>   Batch size = 1
{'eval_loss': 0.9611995220184326, 'eval_runtime': 98.361, 'eval_samples_per_second': 6.1, 'eval_steps_per_second': 6.1, 'epoch': 0.44}                             
 15%|█████████████████▉                                                                                                       | 300/2025 [31:05<2:05:58,  4.38s/it[INFO|trainer.py:2936] 2024-06-16 12:22:45,866 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:22:46,018 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:22:46,023 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:22:46,026 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/added_tokens.json
{'loss': 0.8967, 'learning_rate': 4.7164067304621536e-05, 'epoch': 0.46}                                                                                           
{'loss': 1.0598, 'learning_rate': 4.700043126948131e-05, 'epoch': 0.47}                                                                                            
{'loss': 1.0318, 'learning_rate': 4.683250620667364e-05, 'epoch': 0.49}                                                                                            
{'loss': 1.0222, 'learning_rate': 4.664093230822264e-05, 'epoch': 0.5}                                                                                             
{'loss': 0.9537, 'learning_rate': 4.644414985846934e-05, 'epoch': 0.52}                                                                                            
{'loss': 1.1209, 'learning_rate': 4.624220621912029e-05, 'epoch': 0.53}                                                                                            
{'loss': 0.9613, 'learning_rate': 4.6035149994079896e-05, 'epoch': 0.55}                                                                                           
{'loss': 0.9111, 'learning_rate': 4.5823031017752485e-05, 'epoch': 0.56}                                                                                           
{'loss': 1.0196, 'learning_rate': 4.5605900343048116e-05, 'epoch': 0.58}                                                                                           
{'loss': 1.0488, 'learning_rate': 4.53838102290951e-05, 'epoch': 0.59}                                                                                             
 20%|███████████████████████▉                                                                                                 | 400/2025 [38:47<2:47:24,  6.18s/it][INFO|trainer.py:3242] 2024-06-16 12:30:28,206 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:30:28,206 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:30:28,206 >>   Batch size = 1
{'eval_loss': 0.9470090866088867, 'eval_runtime': 167.2868, 'eval_samples_per_second': 3.587, 'eval_steps_per_second': 3.587, 'epoch': 0.59}                       
 20%|███████████████████████▉                                                                                                 | 400/2025 [41:34<2:47:24,  6.18s/it[INFO|trainer.py:2936] 2024-06-16 12:33:15,509 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:33:15,678 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:33:15,684 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:33:15,687 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/added_tokens.json
{'loss': 0.886, 'learning_rate': 4.5156814128662285e-05, 'epoch': 0.61}                                                                                            
{'loss': 1.0296, 'learning_rate': 4.492496667529399e-05, 'epoch': 0.62}                                                                                            
{'loss': 0.9586, 'learning_rate': 4.468832367016079e-05, 'epoch': 0.64}                                                                                            
{'loss': 0.9578, 'learning_rate': 4.4446942068629284e-05, 'epoch': 0.65}                                                                                           
{'loss': 0.9419, 'learning_rate': 4.420087996655395e-05, 'epoch': 0.67}                                                                                            
{'eval_loss': 0.9372147917747498, 'eval_runtime': 3043.4326, 'eval_samples_per_second': 0.197, 'eval_steps_per_second': 0.197, 'epoch': 0.74}                      
 25%|█████████████████████████████▏                                                                                        | 500/2025 [2:56:33<37:38:05, 88.84s/it[INFO|trainer.py:2936] 2024-06-16 14:48:14,344 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 14:48:14,501 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 14:48:14,508 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 14:48:14,511 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/added_tokens.json
{'loss': 1.0124, 'learning_rate': 4.262961033189341e-05, 'epoch': 0.76}                                                                                            
{'loss': 0.8431, 'learning_rate': 4.235250420699552e-05, 'epoch': 0.77}                                                                                            
{'loss': 1.0225, 'learning_rate': 4.2071221671992086e-05, 'epoch': 0.79}                                                                                           
{'loss': 1.0528, 'learning_rate': 4.1785830426115893e-05, 'epoch': 0.8}                                                                                            
{'loss': 0.9257, 'learning_rate': 4.1496399157486486e-05, 'epoch': 0.81}                                                                                           
 28%|██████████████▉                                       | 559/2025 [5:02:52<45:39:48, 112.13s/it]{'loss': 0.9475, 'learning_rate': 4.1202997526578276e-05, 'epoch': 0.83}                            
{'loss': 0.886, 'learning_rate': 4.09056961494546e-05, 'epoch': 0.84}                               
{'loss': 0.8814, 'learning_rate': 4.060456658077183e-05, 'epoch': 0.86}                             
{'loss': 1.0195, 'learning_rate': 4.029968129655757e-05, 'epoch': 0.87}
(Note: appending `--load_best_model_at_end` to the training command makes the trainer restore the best checkpoint when training finishes; it requires matching evaluation and save strategies/steps, which the command above already uses.)

Run output (for the LoRA supervised fine-tuning command above)

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
>     --stage sft \
>     --do_train \
>     --template default \
>     --model_name_or_path ./Qwen/Qwen1.5-0.5B \
>     --dataset alpaca_data_zh_demo \
>     --finetuning_type lora \
>     --lora_target q_proj,v_proj \
>     --output_dir ./path_to_pt_checkpoint \
>     --overwrite_cache \
>     --per_device_train_batch_size 4 \
>     --gradient_accumulation_steps 4 \
>     --lr_scheduler_type cosine \
>     --logging_steps 10 \
>     --save_steps 1000 \
>     --learning_rate 5e-5 \
>     --num_train_epochs 3.0 \
>     --plot_loss \
>     --fp16
06/09/2024 17:59:15 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,215 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,215 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-09 17:59:15,516 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/09/2024 17:59:15 - INFO - llmtuner.data.loader - Loading dataset alpaca_data_zh_demo.json...
06/09/2024 17:59:15 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Generating train split: 1 examples [00:00,  2.85 examples/s]
Converting format of dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 17.95 examples/s]
Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.98 examples/s]
input_ids:
[33975, 25, 49434, 239, 79478, 100007, 18493, 101254, 102438, 101940, 103135, 94432, 71703, 25, 220, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
inputs:
Human: 我们如何在日常生活中减少用水?
Assistant: 1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
labels:
1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-09 17:59:24,557 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-09 17:59:24,566 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3473] 2024-06-09 17:59:25,443 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-09 17:59:27,858 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-09 17:59:27,860 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:4350] 2024-06-09 18:00:13,863 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-09 18:00:13,863 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-09 18:00:13,891 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-09 18:00:13,891 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/09/2024 18:00:13 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/09/2024 18:00:13 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/09/2024 18:00:14 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
/root/.conda/envs/py310/lib/python3.10/site-packages/accelerate/accelerator.py:444: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
  warnings.warn(
[INFO|trainer.py:571] 2024-06-09 18:00:14,322 >> Using auto half precision backend
[INFO|trainer.py:1721] 2024-06-09 18:00:14,484 >> ***** Running training *****
[INFO|trainer.py:1722] 2024-06-09 18:00:14,484 >>   Num examples = 1
[INFO|trainer.py:1723] 2024-06-09 18:00:14,484 >>   Num Epochs = 3
[INFO|trainer.py:1724] 2024-06-09 18:00:14,484 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1727] 2024-06-09 18:00:14,484 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1728] 2024-06-09 18:00:14,484 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1729] 2024-06-09 18:00:14,484 >>   Total optimization steps = 3
[INFO|trainer.py:1730] 2024-06-09 18:00:14,485 >>   Number of trainable parameters = 786,432
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.21s/it][INFO|trainer.py:1962] 2024-06-09 18:00:19,402 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 4.917, 'train_samples_per_second': 0.61, 'train_steps_per_second': 0.61, 'train_loss': 0.4967418909072876, 'epoch': 3.0}                         
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.64s/it]
[INFO|trainer.py:2936] 2024-06-09 18:00:19,412 >> Saving model checkpoint to ./path_to_pt_checkpoint
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-09 18:00:19,567 >> tokenizer config file saved in ./path_to_pt_checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-09 18:00:19,573 >> Special tokens file saved in ./path_to_pt_checkpoint/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-09 18:00:19,576 >> added tokens file saved in ./path_to_pt_checkpoint/added_tokens.json
***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.4967
  train_runtime            = 0:00:04.91
  train_samples_per_second =       0.61
  train_steps_per_second   =       0.61
06/09/2024 18:00:19 - WARNING - llmtuner.extras.ploting - No metric loss to plot.
06/09/2024 18:00:19 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.
[INFO|modelcard.py:452] 2024-06-09 18:00:19,934 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

Inference with the fine-tuned LoRA adapter

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \
    --template default \
    --finetuning_type lora
(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \--model_name_or_path ./Qwen/Qwen1.5-0.5B \--adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \--template default \--finetuning_type lora
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-09 18:11:41,436 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:727] 2024-06-09 18:11:41,437 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-09 18:11:41,441 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/09/2024 18:11:41 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3473] 2024-06-09 18:11:41,681 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-09 18:11:41,693 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-06-09 18:11:41,695 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:4350] 2024-06-09 18:11:55,341 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-09 18:11:55,341 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-09 18:11:55,345 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-09 18:11:55,345 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Loaded adapter(s): /root/LLaMA-Factory/path_to_pt_checkpoint
06/09/2024 18:11:55 - INFO - llmtuner.model.loader - all params: 463987712
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.

User: 我们如何在日常生活中减少用水?
Assistant: 为了减少用水,我们可以从以下几个方面入手:
1. 减少用水量:我们可以减少洗澡和淋浴的时间,使用节水龙头和淋浴头,关闭水龙头和淋浴头,避免洗完澡后忘记关水龙头,尽可能地使用淋浴喷头。
2. 淋浴时避免浪费水:淋浴时不要让水直接流出,应该将水缓慢地倒入盆中,以避免水流直接滴到地面,同时避免浪费水。
3. 安装节水设备:安装节水器、节水龙头、淋浴头等节水设备可以有效减少用水量。
4. 节约用水:在日常生活中,我们可以选择在不需要使用水时关闭水龙头,将水龙头换成节水型的,这样可以有效地节约用水。
5. 集中用水:将水放在一个地方,集中使用,避免浪费,同时也可以节约用水。
6. 优化用水习惯:养成良好的用水习惯,比如洗手时不要忘记关水龙头,洗完澡后及时关闭水龙头,可以有效减少用水量。
总之,减少用水需要我们从生活中的每个细节做起,从节约用水开始,从小事做起,才能更好地保护水资源,为我们的地球做出贡献。

Merging the LoRA weights and exporting the model

CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \
    --template default \
    --finetuning_type lora \
    --export_dir path_to_export \
    --export_size 2 \
    --export_legacy_format False
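
After export, path_to_export holds a standalone checkpoint with the LoRA weights already merged into the base model, so it can be loaded like any ordinary Hugging Face model. A minimal sketch (the prompt string follows the `default` template format seen in the logs above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# load the merged model exported by export_model.py
tokenizer = AutoTokenizer.from_pretrained("path_to_export")
model = AutoModelForCausalLM.from_pretrained("path_to_export", torch_dtype="auto", device_map="auto")

prompt = "Human: 我们如何在日常生活中减少用水?\nAssistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# print only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))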
