LLM PEFT (Part 2): Hands-On LoRA Instruction Fine-Tuning

Environment setup

 git clone -b v0.6.1 --depth=1 https://github.com/hiyouga/LLaMA-Factory.git
 cd LLaMA-Factory
 conda create -n py310 python=3.10 
 source activate py310
 pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple --ignore-installed
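
A quick sanity check after installation (a minimal check; it assumes requirements.txt pulled in torch and transformers, as it does for LLaMA-Factory v0.6.1):

 python -c "import torch, transformers; print(torch.__version__, transformers.__version__, torch.cuda.is_available())"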

An issue along the way

!git lfs install
!git clone https://huggingface.co/Qwen/Qwen1.5-0.5B

Running these in the shell produces the output below. The leading `!` is Jupyter notebook syntax; in bash it triggers history expansion, so the commands that actually run are mangled (note the echoed `git lfs install lfs install ...`) and the model repo is never cloned:

(py310) root@intern-studio-40072860:~/LLaMA-Factory# !git lfs install
git lfs install lfs install
Updated Git hooks.
Git LFS initialized.
(py310) root@intern-studio-40072860:~/LLaMA-Factory# !git clone https://huggingface.co/Qwen/Qwen1.5-0.5B
git lfs install lfs install clone https://huggingface.co/Qwen/Qwen1.5-0.5B
Updated Git hooks.
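
In a plain shell the `!` prefix should simply be dropped; the intended commands are just:

 git lfs install
 git clone https://huggingface.co/Qwen/Qwen1.5-0.5B

(Direct cloning still fails behind this proxy, as shown next.)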

Cloning directly from huggingface.co (fails here, blocked by the proxy):

(py310) root@intern-studio-40072860:~/LLaMA-Factory# git clone https://huggingface.co/Qwen/Qwen1.5-0.5B
Cloning into 'Qwen1.5-0.5B'...
fatal: unable to access 'https://huggingface.co/Qwen/Qwen1.5-0.5B/': Received HTTP code 503 from proxy after CONNECT

From the command line, via the mirror https://hf-mirror.com/ (download succeeded):

huggingface-cli download --resume-download Qwen/Qwen1.5-0.5B --local-dir Qwen/Qwen1.5-0.5B
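
Note that for huggingface-cli to actually go through the mirror, the Hugging Face endpoint is normally exported first (this is the usage documented on hf-mirror.com):

 export HF_ENDPOINT=https://hf-mirror.com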

Inference

Before fine-tuning (no LoRA checkpoint exists yet, so run the fine-tuning step first):

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path path_to_llama_model \
    --adapter_name_or_path path_to_checkpoint \
    --template default \
    --finetuning_type lora

Supervised instruction fine-tuning (LoRA)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --template default \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --dataset alpaca_data_zh_demo \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
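
The alpaca_data_zh_demo dataset ships with the repo under data/. A quick way to peek at the raw record that gets tokenized in the logs below (a small sketch, assuming the file follows the standard alpaca schema with instruction/input/output keys):

import json

# load the demo file bundled with LLaMA-Factory and print its single record
with open("data/alpaca_data_zh_demo.json", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))               # the demo file holds one example
print(records[0]["instruction"])  # e.g. 我们如何在日常生活中减少用水?
print(records[0]["output"])       # the reference answer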

Instruction fine-tuning (QLoRA, 4-bit)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --dataset alpaca_data_zh_demo \
    --dataset_dir data \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir saves/Qwen1.5-0.5B/qlora/sft \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --evaluation_strategy steps \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 3000 \
    --val_size 0.1 \
    --quantization_bit 4 \
    --plot_loss \
    --fp16

Run log

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
>     --stage sft \
>     --do_train \
>     --model_name_or_path ./Qwen/Qwen1.5-0.5B \
>     --dataset alpaca_data_zh_demo \
>     --dataset_dir data \
>     --template default \
>     --finetuning_type lora \
>     --lora_target q_proj,v_proj \
>     --output_dir saves/Qwen1.5-0.5B/qlora/sft \
>     --overwrite_cache \
>     --overwrite_output_dir \
>     --cutoff_len 1024 \
>     --per_device_train_batch_size 1 \
>     --per_device_eval_batch_size 1 \
>     --gradient_accumulation_steps 8 \
>     --lr_scheduler_type cosine \
>     --logging_steps 10 \
>     --save_steps 100 \
>     --eval_steps 100 \
>     --evaluation_strategy steps \
>     --learning_rate 5e-5 \
>     --num_train_epochs 3.0 \
>     --max_samples 3000 \
>     --val_size 0.1 \
>     --quantization_bit 4 \
>     --plot_loss \
>     --fp16
06/16/2024 11:41:47 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
06/16/2024 11:41:47 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:41:47,515 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-16 11:41:48,019 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/16/2024 11:41:48 - INFO - llmtuner.data.loader - Loading dataset alpaca_data_zh_demo.json...
06/16/2024 11:41:48 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Converting format of dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.35 examples/s]
Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  8.03 examples/s]
input_ids:
[33975, 25, 49434, 239, 79478, 100007, 18493, 101254, 102438, 101940, 103135, 94432, 71703, 25, 220, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
inputs:
Human: 我们如何在日常生活中减少用水?
Assistant: 1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
labels:
1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-16 11:41:56,698 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-16 11:41:56,725 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/16/2024 11:41:56 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3473] 2024-06-16 11:41:58,155 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-16 11:42:00,458 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-16 11:42:00,460 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:3615] 2024-06-16 11:42:12,047 >> Detected 4-bit loading: activating 4-bit loading for this model
[INFO|modeling_utils.py:4350] 2024-06-16 11:43:02,163 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-16 11:43:02,163 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-16 11:43:02,202 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-16 11:43:02,203 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/16/2024 11:43:02 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/16/2024 11:43:02 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/16/2024 11:43:02 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
Traceback (most recent call last):
  File "/root/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/root/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/root/LLaMA-Factory/src/llmtuner/train/tuner.py", line 32, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/root/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 60, in run_sft
    **split_dataset(dataset, data_args, training_args),
  File "/root/LLaMA-Factory/src/llmtuner/data/utils.py", line 87, in split_dataset
    dataset = dataset.train_test_split(test_size=val_size, seed=training_args.seed)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/fingerprint.py", line 482, in wrapper
    out = func(dataset, *args, **kwargs)
  File "/root/.conda/envs/py310/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 4587, in train_test_split
    raise ValueError(
ValueError: With n_samples=1, test_size=0.1 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
The demo dataset contains only one example, so a 10% validation split leaves the training set empty; the run below switches to the larger alpaca_gpt4_en and glaive_toolcall datasets instead:

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py     --stage sft     --do_train     --model_name_or_path ./Qwen/Qwen1.5-0.5B     --dataset alpaca_gpt4_en,glaive_toolcall     --dataset_dir data     --template default     --finetuning_type lora     --lora_target q_proj,v_proj     --output_dir saves/Qwen1.5-0.5B/qlora/sft     --overwrite_cache     --overwrite_output_dir     --cutoff_len 1024     --per_device_train_batch_size 1     --per_device_eval_batch_size 1     --gradient_accumulation_steps 8     --lr_scheduler_type cosine     --logging_steps 10     --save_steps 100     --eval_steps 100     --evaluation_strategy steps     --learning_rate 5e-5     --num_train_epochs 3.0     --max_samples 3000     --val_size 0.1     --quantization_bit 4     --plot_loss     --fp16
06/16/2024 11:50:35 - WARNING - llmtuner.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
06/16/2024 11:50:35 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-16 11:50:35,087 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-16 11:50:35,454 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/16/2024 11:50:35 - INFO - llmtuner.data.loader - Loading dataset alpaca_gpt4_data_en.json...
Generating train split: 52002 examples [00:00, 103257.03 examples/s]
Converting format of dataset: 100%|██████████████████████████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 25151.09 examples/s]
06/16/2024 11:50:38 - INFO - llmtuner.data.loader - Loading dataset glaive_toolcall_10k.json...
06/16/2024 11:50:39 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/glaive_toolcall_10k.json.
Generating train split: 10000 examples [00:00, 20636.20 examples/s]
Converting format of dataset: 100%|██████████████████████████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 14743.53 examples/s]
Running tokenizer on dataset: 100%|████████████████████████████████████████████████████████████████████████████████████| 6000/6000 [00:09<00:00, 658.60 examples/s]
input_ids:
[33975, 25, 20678, 2326, 10414, 369, 19429, 9314, 624, 71703, 25, 220, 16, 13, 44514, 264, 23831, 323, 77116, 9968, 25, 7405, 2704, 697, 20969, 525, 28308, 315, 264, 8045, 315, 25322, 323, 23880, 11, 15651, 12833, 11, 4361, 40836, 11, 323, 9314, 49027, 13, 1096, 8609, 311, 3410, 697, 2487, 448, 279, 7565, 36393, 311, 729, 518, 1181, 1850, 323, 646, 1492, 5358, 20601, 18808, 382, 17, 13, 3285, 424, 304, 5792, 6961, 5702, 25, 32818, 374, 16587, 369, 20337, 3746, 24854, 11, 23648, 11, 323, 40613, 2820, 13, 70615, 369, 518, 3245, 220, 16, 20, 15, 4420, 315, 23193, 90390, 10158, 476, 220, 22, 20, 4420, 315, 70820, 10158, 1817, 2003, 382, 18, 13, 2126, 3322, 6084, 25, 24515, 3322, 4271, 6084, 374, 16587, 369, 6961, 323, 10502, 1632, 32751, 13, 1084, 8609, 311, 36277, 19671, 11, 7269, 24675, 729, 11, 323, 11554, 9314, 6513, 323, 22077, 729, 13, 70615, 369, 220, 22, 12, 24, 4115, 315, 6084, 1817, 3729, 13, 151643]
inputs:
Human: Give three tips for staying healthy.
Assistant: 1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 44514, 264, 23831, 323, 77116, 9968, 25, 7405, 2704, 697, 20969, 525, 28308, 315, 264, 8045, 315, 25322, 323, 23880, 11, 15651, 12833, 11, 4361, 40836, 11, 323, 9314, 49027, 13, 1096, 8609, 311, 3410, 697, 2487, 448, 279, 7565, 36393, 311, 729, 518, 1181, 1850, 323, 646, 1492, 5358, 20601, 18808, 382, 17, 13, 3285, 424, 304, 5792, 6961, 5702, 25, 32818, 374, 16587, 369, 20337, 3746, 24854, 11, 23648, 11, 323, 40613, 2820, 13, 70615, 369, 518, 3245, 220, 16, 20, 15, 4420, 315, 23193, 90390, 10158, 476, 220, 22, 20, 4420, 315, 70820, 10158, 1817, 2003, 382, 18, 13, 2126, 3322, 6084, 25, 24515, 3322, 4271, 6084, 374, 16587, 369, 6961, 323, 10502, 1632, 32751, 13, 1084, 8609, 311, 36277, 19671, 11, 7269, 24675, 729, 11, 323, 11554, 9314, 6513, 323, 22077, 729, 13, 70615, 369, 220, 22, 12, 24, 4115, 315, 6084, 1817, 3729, 13, 151643]
labels:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.

3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-16 11:50:52,228 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-16 11:50:52,232 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/16/2024 11:50:52 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit.
[INFO|modeling_utils.py:3473] 2024-06-16 11:50:52,535 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-16 11:50:52,548 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-16 11:50:52,550 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:3615] 2024-06-16 11:51:03,127 >> Detected 4-bit loading: activating 4-bit loading for this model
[INFO|modeling_utils.py:4350] 2024-06-16 11:51:39,672 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-16 11:51:39,672 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-16 11:51:39,678 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-16 11:51:39,678 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/16/2024 11:51:39 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/16/2024 11:51:39 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/16/2024 11:51:39 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
/root/.conda/envs/py310/lib/python3.10/site-packages/accelerate/accelerator.py:444: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
  warnings.warn(
[INFO|trainer.py:571] 2024-06-16 11:51:40,367 >> Using auto half precision backend
[INFO|trainer.py:1721] 2024-06-16 11:51:40,522 >> ***** Running training *****
[INFO|trainer.py:1722] 2024-06-16 11:51:40,522 >>   Num examples = 5,400
[INFO|trainer.py:1723] 2024-06-16 11:51:40,522 >>   Num Epochs = 3
[INFO|trainer.py:1724] 2024-06-16 11:51:40,522 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1727] 2024-06-16 11:51:40,523 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1728] 2024-06-16 11:51:40,523 >>   Gradient Accumulation steps = 8
[INFO|trainer.py:1729] 2024-06-16 11:51:40,523 >>   Total optimization steps = 2,025
[INFO|trainer.py:1730] 2024-06-16 11:51:40,524 >>   Number of trainable parameters = 786,432
{'loss': 1.4165, 'learning_rate': 4.9996991493233693e-05, 'epoch': 0.01}                                                                                           
{'loss': 1.3805, 'learning_rate': 4.998796669702378e-05, 'epoch': 0.03}                                                                                            
{'loss': 1.4455, 'learning_rate': 4.997292778346312e-05, 'epoch': 0.04}                                                                                            
{'loss': 1.2532, 'learning_rate': 4.9951878372125547e-05, 'epoch': 0.06}                                                                                           
{'loss': 1.0753, 'learning_rate': 4.99248235291948e-05, 'epoch': 0.07}                                                                                             
{'loss': 1.0717, 'learning_rate': 4.989176976624511e-05, 'epoch': 0.09}                                                                                            
{'loss': 1.0417, 'learning_rate': 4.985272503867403e-05, 'epoch': 0.1}                                                                                             
{'loss': 1.1164, 'learning_rate': 4.9807698743787744e-05, 'epoch': 0.12}                                                                                           
{'loss': 1.0518, 'learning_rate': 4.975670171853926e-05, 'epoch': 0.13}                                                                                            
{'loss': 1.1058, 'learning_rate': 4.969974623692023e-05, 'epoch': 0.15}                                                                                            
  5%|█████▉                                                                                                                   | 100/2025 [10:25<2:28:34,  4.63s/it][INFO|trainer.py:3242] 2024-06-16 12:02:05,710 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:02:05,711 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:02:05,711 >>   Batch size = 1
{'eval_loss': 1.0310032367706299, 'eval_runtime': 115.7348, 'eval_samples_per_second': 5.184, 'eval_steps_per_second': 5.184, 'epoch': 0.15}                       
  5%|█████▉                                                                                                                   | 100/2025 [12:20<2:28:34,  4.63s/it[INFO|trainer.py:2936] 2024-06-16 12:04:01,477 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:04:01,775 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:04:01,782 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:04:01,786 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-100/added_tokens.json
{'loss': 1.072, 'learning_rate': 4.963684600700679e-05, 'epoch': 0.16}                                                                                             
{'loss': 1.0128, 'learning_rate': 4.9568016167660334e-05, 'epoch': 0.18}                                                                                           
{'loss': 1.1716, 'learning_rate': 4.9493273284883854e-05, 'epoch': 0.19}                                                                                           
{'loss': 1.0396, 'learning_rate': 4.941263534783482e-05, 'epoch': 0.21}                                                                                            
{'loss': 1.01, 'learning_rate': 4.9326121764495596e-05, 'epoch': 0.22}                                                                                             
{'loss': 0.9211, 'learning_rate': 4.923375335700223e-05, 'epoch': 0.24}                                                                                            
{'loss': 1.0502, 'learning_rate': 4.913555235663305e-05, 'epoch': 0.25}                                                                                            
{'loss': 1.0281, 'learning_rate': 4.9031542398457974e-05, 'epoch': 0.27}                                                                                           
{'loss': 0.8509, 'learning_rate': 4.892174851565004e-05, 'epoch': 0.28}                                                                                            
{'loss': 0.9885, 'learning_rate': 4.880619713346039e-05, 'epoch': 0.3}                                                                                             
 10%|███████████▉                                                                                                             | 200/2025 [20:43<2:14:14,  4.41s/it][INFO|trainer.py:3242] 2024-06-16 12:12:24,024 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:12:24,024 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:12:24,024 >>   Batch size = 1
{'eval_loss': 0.9859623312950134, 'eval_runtime': 97.9886, 'eval_samples_per_second': 6.123, 'eval_steps_per_second': 6.123, 'epoch': 0.3}                         
 10%|███████████▉                                                                                                             | 200/2025 [22:21<2:14:14,  4.41s/it[INFO|trainer.py:2936] 2024-06-16 12:14:02,035 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:14:02,233 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:14:02,239 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:14:02,243 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-200/added_tokens.json
{'loss': 0.9638, 'learning_rate': 4.868491606285823e-05, 'epoch': 0.31}                                                                                            
{'loss': 0.9803, 'learning_rate': 4.855793449383731e-05, 'epoch': 0.33}                                                                                            
{'loss': 1.0291, 'learning_rate': 4.8425282988390376e-05, 'epoch': 0.34}                                                                                           
{'loss': 0.9204, 'learning_rate': 4.828699347315356e-05, 'epoch': 0.36}                                                                                            
{'loss': 0.9777, 'learning_rate': 4.814309923172227e-05, 'epoch': 0.37}                                                                                            
{'loss': 0.9745, 'learning_rate': 4.7993634896640394e-05, 'epoch': 0.39}                                                                                           
{'loss': 0.9729, 'learning_rate': 4.783863644106502e-05, 'epoch': 0.4}                                                                                             
{'loss': 1.0564, 'learning_rate': 4.7678141170108345e-05, 'epoch': 0.41}                                                                                           
{'loss': 1.1198, 'learning_rate': 4.751218771185906e-05, 'epoch': 0.43}                                                                                            
{'loss': 0.9463, 'learning_rate': 4.734081600808531e-05, 'epoch': 0.44}                                                                                            
 15%|█████████████████▉                                                                                                       | 300/2025 [29:26<2:05:58,  4.38s/it][INFO|trainer.py:3242] 2024-06-16 12:21:07,491 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:21:07,491 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:21:07,491 >>   Batch size = 1
{'eval_loss': 0.9611995220184326, 'eval_runtime': 98.361, 'eval_samples_per_second': 6.1, 'eval_steps_per_second': 6.1, 'epoch': 0.44}                             
 15%|█████████████████▉                                                                                                       | 300/2025 [31:05<2:05:58,  4.38s/it[INFO|trainer.py:2936] 2024-06-16 12:22:45,866 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:22:46,018 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:22:46,023 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:22:46,026 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-300/added_tokens.json
{'loss': 0.8967, 'learning_rate': 4.7164067304621536e-05, 'epoch': 0.46}                                                                                           
{'loss': 1.0598, 'learning_rate': 4.700043126948131e-05, 'epoch': 0.47}                                                                                            
{'loss': 1.0318, 'learning_rate': 4.683250620667364e-05, 'epoch': 0.49}                                                                                            
{'loss': 1.0222, 'learning_rate': 4.664093230822264e-05, 'epoch': 0.5}                                                                                             
{'loss': 0.9537, 'learning_rate': 4.644414985846934e-05, 'epoch': 0.52}                                                                                            
{'loss': 1.1209, 'learning_rate': 4.624220621912029e-05, 'epoch': 0.53}                                                                                            
{'loss': 0.9613, 'learning_rate': 4.6035149994079896e-05, 'epoch': 0.55}                                                                                           
{'loss': 0.9111, 'learning_rate': 4.5823031017752485e-05, 'epoch': 0.56}                                                                                           
{'loss': 1.0196, 'learning_rate': 4.5605900343048116e-05, 'epoch': 0.58}                                                                                           
{'loss': 1.0488, 'learning_rate': 4.53838102290951e-05, 'epoch': 0.59}                                                                                             
 20%|███████████████████████▉                                                                                                 | 400/2025 [38:47<2:47:24,  6.18s/it][INFO|trainer.py:3242] 2024-06-16 12:30:28,206 >> ***** Running Evaluation *****
[INFO|trainer.py:3244] 2024-06-16 12:30:28,206 >>   Num examples = 600
[INFO|trainer.py:3247] 2024-06-16 12:30:28,206 >>   Batch size = 1
{'eval_loss': 0.9470090866088867, 'eval_runtime': 167.2868, 'eval_samples_per_second': 3.587, 'eval_steps_per_second': 3.587, 'epoch': 0.59}                       
 20%|███████████████████████▉                                                                                                 | 400/2025 [41:34<2:47:24,  6.18s/it[INFO|trainer.py:2936] 2024-06-16 12:33:15,509 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 12:33:15,678 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 12:33:15,684 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 12:33:15,687 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-400/added_tokens.json
{'loss': 0.886, 'learning_rate': 4.5156814128662285e-05, 'epoch': 0.61}                                                                                            
{'loss': 1.0296, 'learning_rate': 4.492496667529399e-05, 'epoch': 0.62}                                                                                            
{'loss': 0.9586, 'learning_rate': 4.468832367016079e-05, 'epoch': 0.64}                                                                                            
{'loss': 0.9578, 'learning_rate': 4.4446942068629284e-05, 'epoch': 0.65}                                                                                           
{'loss': 0.9419, 'learning_rate': 4.420087996655395e-05, 'epoch': 0.67}                                                                                            
{'eval_loss': 0.9372147917747498, 'eval_runtime': 3043.4326, 'eval_samples_per_second': 0.197, 'eval_steps_per_second': 0.197, 'epoch': 0.74}                      
 25%|█████████████████████████████▏                                                                                        | 500/2025 [2:56:33<37:38:05, 88.84s/it[INFO|trainer.py:2936] 2024-06-16 14:48:14,344 >> Saving model checkpoint to saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500                                        
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-16 14:48:14,501 >> tokenizer config file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-16 14:48:14,508 >> Special tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-16 14:48:14,511 >> added tokens file saved in saves/Qwen1.5-0.5B/qlora/sft/tmp-checkpoint-500/added_tokens.json
{'loss': 1.0124, 'learning_rate': 4.262961033189341e-05, 'epoch': 0.76}                                                                                            
{'loss': 0.8431, 'learning_rate': 4.235250420699552e-05, 'epoch': 0.77}                                                                                            
{'loss': 1.0225, 'learning_rate': 4.2071221671992086e-05, 'epoch': 0.79}                                                                                           
{'loss': 1.0528, 'learning_rate': 4.1785830426115893e-05, 'epoch': 0.8}                                                                                            
{'loss': 0.9257, 'learning_rate': 4.1496399157486486e-05, 'epoch': 0.81}                                                                                           
 28%|██████████████▉                                       | 559/2025 [5:02:52<45:39:48, 112.13s/it]{'loss': 0.9475, 'learning_rate': 4.1202997526578276e-05, 'epoch': 0.83}                            
{'loss': 0.886, 'learning_rate': 4.09056961494546e-05, 'epoch': 0.84}                               
{'loss': 0.8814, 'learning_rate': 4.060456658077183e-05, 'epoch': 0.86}                             
{'loss': 1.0195, 'learning_rate': 4.029968129655757e-05, 'epoch': 0.87}
(Note: appending `--load_best_model_at_end` to the training command makes the trainer restore the best checkpoint when training finishes; it requires matching evaluation and save strategies/steps, which the command above already uses.)

Run output (for the LoRA supervised fine-tuning command above)

(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
>     --stage sft \
>     --do_train \
>     --template default \
>     --model_name_or_path ./Qwen/Qwen1.5-0.5B \
>     --dataset alpaca_data_zh_demo \
>     --finetuning_type lora \
>     --lora_target q_proj,v_proj \
>     --output_dir ./path_to_pt_checkpoint \
>     --overwrite_cache \
>     --per_device_train_batch_size 4 \
>     --gradient_accumulation_steps 4 \
>     --lr_scheduler_type cosine \
>     --logging_steps 10 \
>     --save_steps 1000 \
>     --learning_rate 5e-5 \
>     --num_train_epochs 3.0 \
>     --plot_loss \
>     --fp16
06/09/2024 17:59:15 - INFO - llmtuner.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,215 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,215 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 17:59:15,216 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-09 17:59:15,516 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/09/2024 17:59:15 - INFO - llmtuner.data.loader - Loading dataset alpaca_data_zh_demo.json...
06/09/2024 17:59:15 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json.
Generating train split: 1 examples [00:00,  2.85 examples/s]
Converting format of dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 17.95 examples/s]
Running tokenizer on dataset: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.98 examples/s]
input_ids:
[33975, 25, 49434, 239, 79478, 100007, 18493, 101254, 102438, 101940, 103135, 94432, 71703, 25, 220, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
inputs:
Human: 我们如何在日常生活中减少用水?
Assistant: 1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 16, 13, 85658, 113886, 104919, 3837, 29524, 113886, 101724, 100969, 102125, 64355, 33108, 52510, 102676, 1773, 715, 17, 13, 85658, 52510, 73296, 57191, 52510, 101508, 104412, 101064, 110628, 3837, 77557, 99634, 102565, 33108, 99634, 100969, 1773, 715, 18, 13, 73562, 103935, 15946, 100627, 113886, 100708, 1773, 715, 19, 13, 6567, 96, 222, 32876, 112044, 33108, 112892, 105743, 117624, 99559, 90395, 100667, 104749, 104017, 1773, 715, 20, 13, 6567, 112, 245, 103339, 20450, 107606, 3837, 37029, 99285, 104242, 101724, 100969, 64355, 105455, 103135, 1773, 715, 21, 13, 80090, 114, 42067, 110375, 3837, 100751, 99354, 99434, 105994, 65676, 112147, 100466, 1773, 715, 22, 13, 19468, 115, 100446, 57191, 101432, 44934, 13343, 29256, 100373, 52510, 102676, 1773, 715, 23, 13, 65727, 237, 82647, 118158, 114826, 101975, 1773, 715, 24, 13, 58230, 121, 87267, 111438, 105444, 37029, 100815, 52510, 9909, 101919, 113642, 5373, 113051, 52510, 102776, 33108, 101724, 100969, 9370, 52510, 74276, 715, 16, 15, 13, 26853, 103, 103946, 100727, 101991, 100964, 99634, 102565, 32648, 33108, 113642, 1773, 151643]
labels:
1. 使用节水装置,如节水淋浴喷头和水龙头。 
2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 
3. 在社区中提高节水意识。 
4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 
5. 洗澡时间缩短,使用低流量淋浴头节约用水。 
6. 收集雨水,用于园艺或其他非饮用目的。 
7. 刷牙或擦手时关掉水龙头。 
8. 减少浇水草坪的时间。 
9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 
10. 只购买能源效率高的洗碗机和洗衣机。<|endoftext|>
[INFO|configuration_utils.py:727] 2024-06-09 17:59:24,557 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-09 17:59:24,566 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3473] 2024-06-09 17:59:25,443 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-09 17:59:27,858 >> Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:826] 2024-06-09 17:59:27,860 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:4350] 2024-06-09 18:00:13,863 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-09 18:00:13,863 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-09 18:00:13,891 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-09 18:00:13,891 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/09/2024 18:00:13 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
06/09/2024 18:00:13 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/09/2024 18:00:14 - INFO - llmtuner.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
/root/.conda/envs/py310/lib/python3.10/site-packages/accelerate/accelerator.py:444: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
  warnings.warn(
[INFO|trainer.py:571] 2024-06-09 18:00:14,322 >> Using auto half precision backend
[INFO|trainer.py:1721] 2024-06-09 18:00:14,484 >> ***** Running training *****
[INFO|trainer.py:1722] 2024-06-09 18:00:14,484 >>   Num examples = 1
[INFO|trainer.py:1723] 2024-06-09 18:00:14,484 >>   Num Epochs = 3
[INFO|trainer.py:1724] 2024-06-09 18:00:14,484 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1727] 2024-06-09 18:00:14,484 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:1728] 2024-06-09 18:00:14,484 >>   Gradient Accumulation steps = 4
[INFO|trainer.py:1729] 2024-06-09 18:00:14,484 >>   Total optimization steps = 3
[INFO|trainer.py:1730] 2024-06-09 18:00:14,485 >>   Number of trainable parameters = 786,432
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.21s/it][INFO|trainer.py:1962] 2024-06-09 18:00:19,402 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


{'train_runtime': 4.917, 'train_samples_per_second': 0.61, 'train_steps_per_second': 0.61, 'train_loss': 0.4967418909072876, 'epoch': 3.0}                         
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.64s/it]
[INFO|trainer.py:2936] 2024-06-09 18:00:19,412 >> Saving model checkpoint to ./path_to_pt_checkpoint
/root/.conda/envs/py310/lib/python3.10/site-packages/peft/utils/save_and_load.py:195: UserWarning: Could not find a config file in ./Qwen/Qwen1.5-0.5B - will assume that the vocabulary was not modified.
  warnings.warn(
[INFO|tokenization_utils_base.py:2433] 2024-06-09 18:00:19,567 >> tokenizer config file saved in ./path_to_pt_checkpoint/tokenizer_config.json
[INFO|tokenization_utils_base.py:2442] 2024-06-09 18:00:19,573 >> Special tokens file saved in ./path_to_pt_checkpoint/special_tokens_map.json
[INFO|tokenization_utils_base.py:2493] 2024-06-09 18:00:19,576 >> added tokens file saved in ./path_to_pt_checkpoint/added_tokens.json
***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.4967
  train_runtime            = 0:00:04.91
  train_samples_per_second =       0.61
  train_steps_per_second   =       0.61
06/09/2024 18:00:19 - WARNING - llmtuner.extras.ploting - No metric loss to plot.
06/09/2024 18:00:19 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.
[INFO|modelcard.py:452] 2024-06-09 18:00:19,934 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}

Inference with the fine-tuned LoRA adapter

CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \
    --template default \
    --finetuning_type lora
(py310) root@intern-studio-40072860:~/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \--model_name_or_path ./Qwen/Qwen1.5-0.5B \--adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \--template default \--finetuning_type lora
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2025] 2024-06-09 18:11:41,109 >> loading file tokenizer.json
[WARNING|logging.py:314] 2024-06-09 18:11:41,436 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:727] 2024-06-09 18:11:41,437 >> loading configuration file ./Qwen/Qwen1.5-0.5B/config.json
[INFO|configuration_utils.py:792] 2024-06-09 18:11:41,441 >> Model config Qwen2Config {
  "_name_or_path": "./Qwen/Qwen1.5-0.5B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.37.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

06/09/2024 18:11:41 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3473] 2024-06-09 18:11:41,681 >> loading weights file ./Qwen/Qwen1.5-0.5B/model.safetensors
[INFO|modeling_utils.py:1426] 2024-06-09 18:11:41,693 >> Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-06-09 18:11:41,695 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}

[INFO|modeling_utils.py:4350] 2024-06-09 18:11:55,341 >> All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[INFO|modeling_utils.py:4358] 2024-06-09 18:11:55,341 >> All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at ./Qwen/Qwen1.5-0.5B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-06-09 18:11:55,345 >> loading configuration file ./Qwen/Qwen1.5-0.5B/generation_config.json
[INFO|configuration_utils.py:826] 2024-06-09 18:11:55,345 >> Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048
}

06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
06/09/2024 18:11:55 - INFO - llmtuner.model.adapter - Loaded adapter(s): /root/LLaMA-Factory/path_to_pt_checkpoint
06/09/2024 18:11:55 - INFO - llmtuner.model.loader - all params: 463987712
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.

User: 我们如何在日常生活中减少用水?
Assistant: 为了减少用水,我们可以从以下几个方面入手:
1. 减少用水量:我们可以减少洗澡和淋浴的时间,使用节水龙头和淋浴头,关闭水龙头和淋浴头,避免洗完澡后忘记关水龙头,尽可能地使用淋浴喷头。
2. 淋浴时避免浪费水:淋浴时不要让水直接流出,应该将水缓慢地倒入盆中,以避免水流直接滴到地面,同时避免浪费水。
3. 安装节水设备:安装节水器、节水龙头、淋浴头等节水设备可以有效减少用水量。
4. 节约用水:在日常生活中,我们可以选择在不需要使用水时关闭水龙头,将水龙头换成节水型的,这样可以有效地节约用水。
5. 集中用水:将水放在一个地方,集中使用,避免浪费,同时也可以节约用水。
6. 优化用水习惯:养成良好的用水习惯,比如洗手时不要忘记关水龙头,洗完澡后及时关闭水龙头,可以有效减少用水量。
总之,减少用水需要我们从生活中的每个细节做起,从节约用水开始,从小事做起,才能更好地保护水资源,为我们的地球做出贡献。

Merging the LoRA weights and exporting the model

CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path ./Qwen/Qwen1.5-0.5B \
    --adapter_name_or_path /root/LLaMA-Factory/path_to_pt_checkpoint \
    --template default \
    --finetuning_type lora \
    --export_dir path_to_export \
    --export_size 2 \
    --export_legacy_format False
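
After export, path_to_export holds a standalone checkpoint with the LoRA weights already merged into the base model, so it can be loaded like any ordinary Hugging Face model. A minimal sketch (the prompt string follows the `default` template format seen in the logs above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# load the merged model exported by export_model.py
tokenizer = AutoTokenizer.from_pretrained("path_to_export")
model = AutoModelForCausalLM.from_pretrained("path_to_export", torch_dtype="auto", device_map="auto")

prompt = "Human: 我们如何在日常生活中减少用水?\nAssistant: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# print only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))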
