序言
本文以 LLaMA-Factory 为例,在超算互联网平台SCNet上使用国产异构加速卡AI 显存64GB PCIE,对 Llama3-8B-Instruct 模型进行 LoRA 微调、合并和推理。
一、参考资料
github仓库代码:LLaMA-Factory,使用最新的代码分支:v0.8.3
。
二、准备环境
1. 系统镜像
以jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04-py310
镜像为例。
2. 软硬件依赖
特别注意:由 requirements.txt
文件可知,LLaMA-Factory项目要求最低版本 transformers>=4.41.2
,vllm>=0.4.3
。
必需项 | 至少 | 推荐 |
---|---|---|
python | 3.8 | 3.11 |
torch | 1.13.1 | 2.3.0 |
transformers | 4.41.2 | 4.41.2 |
datasets | 2.16.0 | 2.19.2 |
accelerate | 0.30.1 | 0.30.1 |
peft | 0.11.1 | 0.11.1 |
trl | 0.8.6 | 0.9.4 |
可选项 | 至少 | 推荐 |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.14.0 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.3 | 0.4.3 |
flash-attn | 2.3.0 | 2.5.9 |
3. 克隆base环境
root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# conda create -n llama_factory_torch --clone base
Source: /opt/conda
Destination: /opt/conda/envs/llama_factory_torch
The following packages cannot be cloned out of the root environment:
- https://repo.anaconda.com/pkgs/main/linux-64::conda-23.7.4-py310h06a4308_0
Packages: 44
Files: 53489
Downloading and Extracting Packages
Downloading and Extracting Packages
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate llama_factory_torch
#
# To deactivate an active environment, use
#
# $ conda deactivate
4. 安装 LLaMA Factory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# source activate llama_factory_torch
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install -e ".[torch,metrics]"
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
...
Checking if build backend supports build_editable ... done
Building wheels for collected packages: llamafactory
Building editable for llamafactory (pyproject.toml) ... done
Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=70c0480e2b648516e0eac3d39371d4100cbdaa1f277d87b657bf2adec9e0b2be
Stored in directory: /tmp/pip-ephem-wheel-cache-uhypmj_8/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
DEPRECATION: lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of lmdeploy or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: mmcv 2.0.1-gitc0ccf15.abi0.dtk2404.torch2.1. has a non-standard version number. pip 24.1 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of mmcv or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: pydub, jieba, urllib3, tomlkit, shtab, semantic-version, scipy, ruff, rouge-chinese, joblib, importlib-resources, ffmpy, docstring-parser, aiofiles, nltk, tyro, sse-starlette, tokenizers, gradio-client, transformers, trl, peft, gradio, llamafactory
Attempting uninstall: urllib3
Found existing installation: urllib3 1.26.13
Uninstalling urllib3-1.26.13:
Successfully uninstalled urllib3-1.26.13
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.15.0
Uninstalling tokenizers-0.15.0:
Successfully uninstalled tokenizers-0.15.0
Attempting uninstall: transformers
Found existing installation: transformers 4.38.0
Uninstalling transformers-4.38.0:
Successfully uninstalled transformers-4.38.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lmdeploy 0.1.0-git782048c.abi0.dtk2404.torch2.1. requires transformers==4.33.2, but you have transformers 4.43.3 which is incompatible.
Successfully installed aiofiles-23.2.1 docstring-parser-0.16 ffmpy-0.4.0 gradio-4.40.0 gradio-client-1.2.0 importlib-resources-6.4.0 jieba-0.42.1 joblib-1.4.2 llamafactory-0.8.4.dev0 nltk-3.8.1 peft-0.12.0 pydub-0.25.1 rouge-chinese-1.0.3 ruff-0.5.5 scipy-1.14.0 semantic-version-2.10.0 shtab-1.7.1 sse-starlette-2.1.3 tokenizers-0.19.1 tomlkit-0.12.0 transformers-4.43.3 trl-0.9.6 tyro-0.8.5 urllib3-2.2.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
5. 解决依赖包冲突
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-deps -e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Obtaining file:///public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory
Installing build dependencies ... done
Checking if build backend supports build_editable ... done
Getting requirements to build editable ... done
Preparing editable metadata (pyproject.toml) ... done
Building wheels for collected packages: llamafactory
Building editable for llamafactory (pyproject.toml) ... done
Created wheel for llamafactory: filename=llamafactory-0.8.4.dev0-0.editable-py3-none-any.whl size=20781 sha256=f874a791bc9fdca02075cda0459104b48a57d300a077eca00eee7221cde429c3
Stored in directory: /tmp/pip-ephem-wheel-cache-7vjiq3f3/wheels/e9/b4/89/f13e921e37904ee0c839434aad2d7b2951c2c68e596667c7ef
Successfully built llamafactory
Installing collected packages: llamafactory
Attempting uninstall: llamafactory
Found existing installation: llamafactory 0.8.4.dev0
Uninstalling llamafactory-0.8.4.dev0:
Successfully uninstalled llamafactory-0.8.4.dev0
Successfully installed llamafactory-0.8.4.dev0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
6. 安装 vllm==0.4.3
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip list | grep llvm
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-dependencies vllm==0.4.3
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting vllm==0.4.3
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/1a/1e/10bcb6566f4fa8b95ff85bddfd1675ff7db33ba861f59bd70aa3b92a46b7/vllm-0.4.3-cp310-cp310-manylinux1_x86_64.whl (131.1 MB)
Installing collected packages: vllm
Attempting uninstall: vllm
Found existing installation: vllm 0.3.3+git3380931.abi0.dtk2404.torch2.1
Uninstalling vllm-0.3.3+git3380931.abi0.dtk2404.torch2.1:
Successfully uninstalled vllm-0.3.3+git3380931.abi0.dtk2404.torch2.1
Successfully installed vllm-0.4.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 24.0 -> 24.2
[notice] To update, run: pip install --upgrade pip
三、关键步骤
1. 获取Access Token
获取Access Token,并登录Hugging Face 账户。
详细步骤,请参考另一篇博客:Hugging Face和ModelScope大模型/数据集的下载加速方法
pip install --upgrade huggingface_hub
huggingface-cli login
2. llamafactory-cli
指令
使用 llamafactory-cli help
显示帮助信息。
(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli help
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Fail ed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or d irectory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this w arning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or ` libpng` installed before building `torchvision` from source?
warn(
[2024-08-01 15:12:24,629] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
----------------------------------------------------------------------
| Usage: |
| llamafactory-cli api -h: launch an OpenAI-style API server |
| llamafactory-cli chat -h: launch a chat interface in CLI |
| llamafactory-cli eval -h: evaluate models |
| llamafactory-cli export -h: merge LoRA adapters and export model |
| llamafactory-cli train -h: train models |
| llamafactory-cli webchat -h: launch a chat interface in Web UI |
| llamafactory-cli webui: launch LlamaBoard |
| llamafactory-cli version: show version info |
----------------------------------------------------------------------
3. 快速开始
下面三行命令分别对 Llama3-8B-Instruct 模型进行 LoRA 微调、合并和推理。
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
3.1 LoRA 微调
模型微调训练是在国产异构加速卡上进行的。
3.1.1 单卡环境
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-01 19:06:41,134] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a uto detect)
08/01/2024 19:06:44 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distri buted training: False, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 19:06:45,194 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 19:06:45,563 >> Special tokens have been added in the voca bulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 19:06:45 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|███████████████████| 91/91 [00:00<00:00, 444.18 examples/s]
08/01/2024 19:06:47 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|██████████████| 1000/1000 [00:00<00:00, 4851.17 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|███████████████| 1091/1091 [00:02<00:00, 375.29 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459 , 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 19:06:53,502 >> loading configuration file /root/.cache/modelsc ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:06:53,503 >> Model config LlamaConfig {
"_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:3631] 2024-08-01 19:06:53,534 >> loading weights file /root/.cache/modelscope/hub/LL M-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 19:06:53,534 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 19:06:53,536 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128009
}
Loading checkpoint shards: 100%|██████████████████████████████████████████████| 4/4 [00:08<00:00, 2.04s/it]
[INFO|modeling_utils.py:4463] 2024-08-01 19:07:01,775 >> All model checkpoint weights were used when initial izing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 19:07:01,775 >> All the weights of LlamaForCausalLM were initialize d from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaFor CausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-01 19:07:01,779 >> loading configuration file /root/.cache/modelsc ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 19:07:01,780 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementati on.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 19:07:01 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 19:07:01 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,up_proj,v_pr oj,down_proj,k_proj,o_proj,gate_proj
08/01/2024 19:07:04 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 19:07:04,471 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-08-01 19:07:04,831 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 19:07:04,831 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 19:07:04,831 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-01 19:07:04,832 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-01 19:07:04,832 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-08-01 19:07:04,832 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-01 19:07:04,832 >> Total optimization steps = 366
[INFO|trainer.py:2143] 2024-08-01 19:07:04,836 >> Number of trainable parameters = 20,971,520
{'loss': 1.5025, 'grad_norm': 1.3309401273727417, 'learning_rate': 2.702702702702703e-05, 'epoch': 0.08}
{'loss': 1.3424, 'grad_norm': 1.8096668720245361, 'learning_rate': 5.405405405405406e-05, 'epoch': 0.16}
{'loss': 1.1286, 'grad_norm': 1.2990491390228271, 'learning_rate': 8.108108108108109e-05, 'epoch': 0.24}
{'loss': 0.9808, 'grad_norm': 1.1075998544692993, 'learning_rate': 9.997948550797227e-05, 'epoch': 0.33}
{'loss': 0.9924, 'grad_norm': 1.8073676824569702, 'learning_rate': 9.961525153583327e-05, 'epoch': 0.41}
{'loss': 1.0052, 'grad_norm': 1.2079122066497803, 'learning_rate': 9.879896064123961e-05, 'epoch': 0.49}
{'loss': 0.9973, 'grad_norm': 1.7361079454421997, 'learning_rate': 9.753805025397779e-05, 'epoch': 0.57}
{'loss': 0.8488, 'grad_norm': 1.1059085130691528, 'learning_rate': 9.584400884284545e-05, 'epoch': 0.65}
{'loss': 0.9893, 'grad_norm': 0.8711654543876648, 'learning_rate': 9.373227124134888e-05, 'epoch': 0.73}
{'loss': 0.9116, 'grad_norm': 1.3793599605560303, 'learning_rate': 9.122207801708802e-05, 'epoch': 0.82}
{'loss': 1.0429, 'grad_norm': 1.3769993782043457, 'learning_rate': 8.833630016614976e-05, 'epoch': 0.9}
{'loss': 0.9323, 'grad_norm': 1.2503643035888672, 'learning_rate': 8.510123072976239e-05, 'epoch': 0.98}
{'loss': 0.9213, 'grad_norm': 2.449227809906006, 'learning_rate': 8.154634523184388e-05, 'epoch': 1.06}
{'loss': 0.8386, 'grad_norm': 1.009852409362793, 'learning_rate': 7.770403312015721e-05, 'epoch': 1.14}
40%|███████████████████████████▌ | 146/366 [10:19<15:11, 4.14s/it] {'loss': 0.856, 'grad_norm': 0.863474428653717, 'learning_rate': 7.360930265797935e-05, 'epoch': 1.22}
{'loss': 0.838, 'grad_norm': 0.712546169757843, 'learning_rate': 6.929946195508932e-05, 'epoch': 1.3}
{'loss': 0.8268, 'grad_norm': 1.6060960292816162, 'learning_rate': 6.481377904428171e-05, 'epoch': 1.39}
{'loss': 0.7326, 'grad_norm': 0.7863644957542419, 'learning_rate': 6.019312410053286e-05, 'epoch': 1.47}
{'loss': 0.7823, 'grad_norm': 0.8964634537696838, 'learning_rate': 5.547959706265068e-05, 'epoch': 1.55}
{'loss': 0.7599, 'grad_norm': 0.5305138826370239, 'learning_rate': 5.0716144050239375e-05, 'epoch': 1.63}
{'loss': 0.815, 'grad_norm': 0.8153926730155945, 'learning_rate': 4.594616607090028e-05, 'epoch': 1.71}
{'loss': 0.8258, 'grad_norm': 1.3266267776489258, 'learning_rate': 4.121312358283463e-05, 'epoch': 1.79}
{'loss': 0.7446, 'grad_norm': 1.8706341981887817, 'learning_rate': 3.656014051577713e-05, 'epoch': 1.88}
{'loss': 0.7539, 'grad_norm': 1.5148639678955078, 'learning_rate': 3.202961135812437e-05, 'epoch': 1.96}
{'loss': 0.7512, 'grad_norm': 1.3771291971206665, 'learning_rate': 2.7662814890184818e-05, 'epoch': 2.04}
{'loss': 0.7128, 'grad_norm': 1.420331597328186, 'learning_rate': 2.3499538082923606e-05, 'epoch': 2.12}
{'loss': 0.635, 'grad_norm': 0.9235875010490417, 'learning_rate': 1.9577713588953795e-05, 'epoch': 2.2}
{'loss': 0.6628, 'grad_norm': 1.6558737754821777, 'learning_rate': 1.5933074128684332e-05, 'epoch': 2.28}
{'loss': 0.681, 'grad_norm': 0.8138720393180847, 'learning_rate': 1.2598826920598772e-05, 'epoch': 2.36}
{'loss': 0.6707, 'grad_norm': 1.0700312852859497, 'learning_rate': 9.605351122011309e-06, 'epoch': 2.45}
{'loss': 0.6201, 'grad_norm': 1.3334729671478271, 'learning_rate': 6.979921036993042e-06, 'epoch': 2.53}
{'loss': 0.6698, 'grad_norm': 1.440247893333435, 'learning_rate': 4.746457613389904e-06, 'epoch': 2.61}
{'loss': 0.7072, 'grad_norm': 0.9171076416969299, 'learning_rate': 2.925310493105099e-06, 'epoch': 2.69}
{'loss': 0.6871, 'grad_norm': 0.9809044003486633, 'learning_rate': 1.5330726014397668e-06, 'epoch': 2.77}
{'loss': 0.5931, 'grad_norm': 1.7158288955688477, 'learning_rate': 5.824289648152126e-07, 'epoch': 2.85}
{'loss': 0.6827, 'grad_norm': 1.3241132497787476, 'learning_rate': 8.204113433559201e-08, 'epoch': 2.94}
100%|█████████████████████████████████████████████████████████████████████| 366/366 [25:42<00:00, 4.02s/it] [INFO|trainer.py:3503] 2024-08-01 19:32:47,527 >> Saving model checkpoint to saves/llama3-8b/lora/sft/checkp oint-366
[INFO|configuration_utils.py:731] 2024-08-01 19:32:47,556 >> loading configuration file /root/.cache/modelsc ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:32:47,557 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2702] 2024-08-01 19:32:47,675 >> tokenizer config file saved in saves/llama 3-8b/lora/sft/checkpoint-366/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 19:32:47,677 >> Special tokens file saved in saves/llama3- 8b/lora/sft/checkpoint-366/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-01 19:32:48,046 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 1543.2099, 'train_samples_per_second': 1.907, 'train_steps_per_second': 0.237, 'train_loss ': 0.8416516305318947, 'epoch': 2.98}
100%|█████████████████████████████████████████████████████████████████████| 366/366 [25:43<00:00, 4.22s/it]
[INFO|trainer.py:3503] 2024-08-01 19:32:48,050 >> Saving model checkpoint to saves/llama3-8b/lora/sft
[INFO|configuration_utils.py:731] 2024-08-01 19:32:48,081 >> loading configuration file /root/.cache/modelsc ope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 19:32:48,082 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2702] 2024-08-01 19:32:48,191 >> tokenizer config file saved in saves/llama 3-8b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 19:32:48,192 >> Special tokens file saved in saves/llama3- 8b/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 2.9847
total_flos = 20619353GF
train_loss = 0.8417
train_runtime = 0:25:43.20
train_samples_per_second = 1.907
train_steps_per_second = 0.237
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
08/01/2024 19:32:48 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
08/01/2024 19:32:48 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-08-01 19:32:48,529 >>
***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-08-01 19:32:48,529 >> Num examples = 110
[INFO|trainer.py:3824] 2024-08-01 19:32:48,529 >> Batch size = 1
100%|█████████████████████████████████████████████████████████████████████| 110/110 [00:18<00:00, 6.07it/s]
***** eval metrics *****
epoch = 2.9847
eval_loss = 0.9957
eval_runtime = 0:00:18.23
eval_samples_per_second = 6.031
eval_steps_per_second = 6.031
[INFO|modelcard.py:449] 2024-08-01 19:33:06,773 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
输出结果
root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models# tree -L 6 LLaMA-Factory/saves/
LLaMA-Factory/saves/
`-- llama3-8b
`-- lora
`-- sft
|-- README.md
|-- adapter_config.json
|-- adapter_model.safetensors
|-- all_results.json
|-- checkpoint-366
| |-- README.md
| |-- adapter_config.json
| |-- adapter_model.safetensors
| |-- optimizer.pt
| |-- rng_state.pth
| |-- scheduler.pt
| |-- special_tokens_map.json
| |-- tokenizer.json
| |-- tokenizer_config.json
| |-- trainer_state.json
| `-- training_args.bin
|-- eval_results.json
|-- special_tokens_map.json
|-- tokenizer.json
|-- tokenizer_config.json
|-- train_results.json
|-- trainer_log.jsonl
|-- trainer_state.json
|-- training_args.bin
`-- training_loss.png
运行时的资源占用情况
3.1.2 多卡环境
(llama_factory_torch) root@notebook-1819291427828183041-scnlbe5oi5-51898:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# CUDA_VISIBLE_DEVICES=0,1,2 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-02 18:08:58,775] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/02/2024 18:09:01 - INFO - llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:26472
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING]
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] *****************************************
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-02 18:09:04,227] torch.distributed.run: [WARNING] *****************************************
[2024-08-02 18:09:09,155] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-02 18:09:09,269] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-02 18:09:09,457] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:12.353489 95618 ProcessGroupNCCL.cpp:686] [Rank 2] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=353053408
08/02/2024 18:09:12 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:12 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:12.555290 95617 ProcessGroupNCCL.cpp:686] [Rank 1] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=369111936
08/02/2024 18:09:12 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:12 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0802 18:09:13.120337 95616 ProcessGroupNCCL.cpp:686] [Rank 0] ProcessGroupNCCL initialization options:NCCL_ASYNC_ERROR_HANDLING: 1, NCCL_DESYNC_DEBUG: 0, NCCL_ENABLE_TIMING: 0, NCCL_BLOCKING_WAIT: 0, TIMEOUT(ms): 180000000000, USE_HIGH_PRIORITY_STREAM: 0, TORCH_DISTRIBUTED_DEBUG: OFF, NCCL_DEBUG: OFF, ID=359553664
08/02/2024 18:09:13 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
08/02/2024 18:09:13 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:13 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
I0802 18:09:14.158418 95618 ProcessGroupNCCL.cpp:2780] Rank 2 using GPU 2 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
I0802 18:09:14.165846 95617 ProcessGroupNCCL.cpp:2780] Rank 1 using GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 18:09:14,276 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-02 18:09:14,684 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/02/2024 18:09:14 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Converting format of dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 91/91 [00:00<00:00, 301.60 examples/s]
08/02/2024 18:09:16 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Converting format of dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 3399.93 examples/s]
I0802 18:09:18.295866 95616 ProcessGroupNCCL.cpp:2780] Rank 0 using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
I0802 18:09:19.234498 95616 ProcessGroupNCCL.cpp:1340] NCCL_DEBUG: N/A
08/02/2024 18:09:19 - INFO - llamafactory.data.loader - Loading dataset identity.json...
08/02/2024 18:09:19 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Running tokenizer on dataset (num_proc=16): 0%| | 0/1091 [00:00<?, ? examples/s]08/02/2024 18:09:20 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
08/02/2024 18:09:20 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Running tokenizer on dataset (num_proc=16): 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:03<00:00, 273.44 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-02 18:09:24,080 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:09:24,082 >> Model config LlamaConfig {
"_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:3631] 2024-08-02 18:09:24,119 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-02 18:09:24,119 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-02 18:09:24,121 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128009
}
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.49s/it]
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: up_proj,gate_proj,o_proj,q_proj,k_proj,v_proj,down_proj
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.59s/it]
[INFO|modeling_utils.py:4463] 2024-08-02 18:09:34,552 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-02 18:09:34,552 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-02 18:09:34,555 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-02 18:09:34,555 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: k_proj,o_proj,v_proj,down_proj,q_proj,up_proj,gate_proj
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:10<00:00, 2.52s/it]
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/02/2024 18:09:34 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/02/2024 18:09:34 - INFO - llamafactory.model.model_utils.misc - Found linear modules: gate_proj,down_proj,q_proj,o_proj,up_proj,v_proj,k_proj
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
08/02/2024 18:09:34 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-02 18:09:34,983 >> Using auto half precision backend
08/02/2024 18:09:35 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
[INFO|trainer.py:2134] 2024-08-02 18:09:37,114 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-02 18:09:37,114 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-02 18:09:37,114 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-02 18:09:37,114 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-02 18:09:37,114 >> Total train batch size (w. parallel, distributed & accumulation) = 24
[INFO|trainer.py:2141] 2024-08-02 18:09:37,114 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-02 18:09:37,114 >> Total optimization steps = 120
[INFO|trainer.py:2143] 2024-08-02 18:09:37,119 >> Number of trainable parameters = 20,971,520
{'loss': 1.4267, 'grad_norm': 1.401288628578186, 'learning_rate': 8.333333333333334e-05, 'epoch': 0.24}
{'loss': 1.1319, 'grad_norm': 1.4780751466751099, 'learning_rate': 9.865224352899119e-05, 'epoch': 0.49}
{'loss': 0.9963, 'grad_norm': 0.532632052898407, 'learning_rate': 9.330127018922194e-05, 'epoch': 0.73}
{'loss': 0.9792, 'grad_norm': 0.7996620535850525, 'learning_rate': 8.43120818934367e-05, 'epoch': 0.98}
{'loss': 0.937, 'grad_norm': 0.4041236639022827, 'learning_rate': 7.243995901002312e-05, 'epoch': 1.22}
{'loss': 0.8805, 'grad_norm': 0.5675532221794128, 'learning_rate': 5.868240888334653e-05, 'epoch': 1.47}
{'loss': 0.8467, 'grad_norm': 0.5038197636604309, 'learning_rate': 4.4195354293738484e-05, 'epoch': 1.71}
{'loss': 0.8612, 'grad_norm': 0.7851077914237976, 'learning_rate': 3.019601169804216e-05, 'epoch': 1.96}
{'loss': 0.818, 'grad_norm': 0.450968474149704, 'learning_rate': 1.7860619515673033e-05, 'epoch': 2.2}
{'loss': 0.8308, 'grad_norm': 0.5961077809333801, 'learning_rate': 8.225609429353187e-06, 'epoch': 2.45}
{'loss': 0.8071, 'grad_norm': 0.5323781371116638, 'learning_rate': 2.100524384225555e-06, 'epoch': 2.69}
{'loss': 0.8061, 'grad_norm': 0.7563619017601013, 'learning_rate': 0.0, 'epoch': 2.94}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [09:46<00:00, 4.56s/it][INFO|trainer.py:3503] 2024-08-02 18:19:24,273 >> Saving model checkpoint to saves/llama3-8b/lora/sft/checkpoint-120
[INFO|configuration_utils.py:731] 2024-08-02 18:19:24,304 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:19:24,305 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2702] 2024-08-02 18:19:24,432 >> tokenizer config file saved in saves/llama3-8b/lora/sft/checkpoint-120/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-02 18:19:24,434 >> Special tokens file saved in saves/llama3-8b/lora/sft/checkpoint-120/special_tokens_map.json
[INFO|trainer.py:2394] 2024-08-02 18:19:24,832 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 587.7138, 'train_samples_per_second': 5.008, 'train_steps_per_second': 0.204, 'train_loss': 0.9434665679931641, 'epoch': 2.94}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 120/120 [09:47<00:00, 4.90s/it]
[INFO|trainer.py:3503] 2024-08-02 18:19:24,837 >> Saving model checkpoint to saves/llama3-8b/lora/sft
[INFO|configuration_utils.py:731] 2024-08-02 18:19:24,907 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 18:19:24,908 >> Model config LlamaConfig {
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|tokenization_utils_base.py:2702] 2024-08-02 18:19:25,048 >> tokenizer config file saved in saves/llama3-8b/lora/sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-02 18:19:25,055 >> Special tokens file saved in saves/llama3-8b/lora/sft/special_tokens_map.json
***** train metrics *****
epoch = 2.9358
total_flos = 20332711GF
train_loss = 0.9435
train_runtime = 0:09:47.71
train_samples_per_second = 5.008
train_steps_per_second = 0.204
Figure saved at: saves/llama3-8b/lora/sft/training_loss.png
08/02/2024 18:19:25 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
08/02/2024 18:19:25 - WARNING - llamafactory.extras.ploting - No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-08-02 18:19:25,357 >>
***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-08-02 18:19:25,357 >> Num examples = 110
[INFO|trainer.py:3824] 2024-08-02 18:19:25,357 >> Batch size = 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:08<00:00, 4.50it/s]
***** eval metrics *****
epoch = 2.9358
eval_loss = 0.9702
eval_runtime = 0:00:08.33
eval_samples_per_second = 13.193
eval_steps_per_second = 4.438
[INFO|modelcard.py:449] 2024-08-02 18:19:33,712 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
运行时的资源占用情况
3.2 模型合并
模型合并实在CPU上进行的。
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
[2024-08-01 21:34:37,394] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 21:34:41,664 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 21:34:42,030 >> Special tokens have been added in the vocabulary, make sure the associa ted word embeddings are fine-tuned or trained.
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 21:34:42 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 21:34:42,031 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Lla ma-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 21:34:42,032 >> Model config LlamaConfig {
"_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
08/01/2024 21:34:42 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3631] 2024-08-01 21:34:42,058 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-In struct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 21:34:42,058 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 21:34:42,059 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128009
}
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 3.40it/s]
[INFO|modeling_utils.py:4463] 2024-08-01 21:34:43,324 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 21:34:43,324 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint a t /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions with out further training.
[INFO|configuration_utils.py:991] 2024-08-01 21:34:43,327 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Lla ma-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 21:34:43,327 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
08/01/2024 21:34:43 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
08/01/2024 21:40:34 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/llama3-8b/lora/sft
08/01/2024 21:40:34 - INFO - llamafactory.model.loader - all params: 8,030,261,248
08/01/2024 21:40:34 - INFO - llamafactory.train.tuner - Convert model dtype to: torch.bfloat16.
[INFO|configuration_utils.py:472] 2024-08-01 21:40:34,700 >> Configuration saved in models/llama3_lora_sft/config.json
[INFO|configuration_utils.py:807] 2024-08-01 21:40:34,704 >> Configuration saved in models/llama3_lora_sft/generation_config.json
[INFO|modeling_utils.py:2763] 2024-08-01 21:40:49,039 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 9 checkpoint shards. You can find where each parameters has been saved in the index located at models/llama3_lora_sft/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2702] 2024-08-01 21:40:49,046 >> tokenizer config file saved in models/llama3_lora_sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-08-01 21:40:49,048 >> Special tokens file saved in models/llama3_lora_sft/special_tokens_map.json
输出结果
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models# tree -L 6 LLaMA-Factory/models/llama3_lora_sft/
LLaMA-Factory/models/llama3_lora_sft/
|-- config.json
|-- generation_config.json
|-- model-00001-of-00009.safetensors
|-- model-00002-of-00009.safetensors
|-- model-00003-of-00009.safetensors
|-- model-00004-of-00009.safetensors
|-- model-00005-of-00009.safetensors
|-- model-00006-of-00009.safetensors
|-- model-00007-of-00009.safetensors
|-- model-00008-of-00009.safetensors
|-- model-00009-of-00009.safetensors
|-- model.safetensors.index.json
|-- special_tokens_map.json
|-- tokenizer.json
`-- tokenizer_config.json
运行时的资源占用情况
3.3 LoRA 推理
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-20553:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
[2024-08-02 22:08:48,070] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-02 22:08:52,267 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-02 22:08:52,535 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-02 22:08:52,818 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/02/2024 22:08:52 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-02 22:08:52,820 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-02 22:08:52,821 >> Model config LlamaConfig {
"_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
08/02/2024 22:08:52 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3631] 2024-08-02 22:08:52,847 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-02 22:08:52,847 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-02 22:08:52,848 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128009
}
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:07<00:00, 1.98s/it]
[INFO|modeling_utils.py:4463] 2024-08-02 22:09:01,148 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-02 22:09:01,148 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-02 22:09:01,151 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-02 22:09:01,152 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
08/02/2024 22:09:01 - INFO - llamafactory.model.model_utils.attention - Using vanilla attention implementation.
08/02/2024 22:09:06 - INFO - llamafactory.model.adapter - Merged 1 adapter(s).
08/02/2024 22:09:06 - INFO - llamafactory.model.adapter - Loaded adapter(s): saves/llama3-8b/lora/sft
08/02/2024 22:09:06 - INFO - llamafactory.model.loader - all params: 8,030,261,248
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.
User: 中国深圳有哪些旅游景点
Assistant: 深圳是一个非常有名的旅游城市,拥有许多名副其名的旅游景点。以下是一些主要的旅游景点:
1. Window of the World:这是一个规模宏大的主题公园,展示了世界各地的风土人情。
2. Splendid China Miniature Theme Park:这个公园展现了中国的历史和文化,拥有许多精致的模型和景观。
3. Dafen Oil Painting Village:这个村庄是中国最大的油画村,拥有数以万计的油画作品,展示了中国油画的技艺。
4. Dameisha Beach:这个沙滩是深圳最为人知的旅游景点之一,拥有洁洁的沙滩和清澈的海水,是一个非常适合休闲的场所。
5. Mangrove Forest Nature Reserve:这个自然保护区拥有广泛的 mangrove 森林,展示了中国的自然景观。
6. Shenzhen Museum:这个博物馆展现了深圳的历史和文化,拥有许多历史和艺术的收藏品。
7. Lianhua Mountain Park:这个公园是深圳最大的公园,拥有许多山路和景观,展示了中国的自然美景。
8. Shenzhen Bay Sports Center:这个体育中心拥有许多不同的运动场所,展示了中国的体育技艺。
9. OCT-LOFT:这个文化区拥有许多艺术和文化的项目,展示了中国的艺术和文化。
10. Fairy Lake Botanical Garden:这个植物园拥有许多不同的植物和花卉,展示了中国的自然美景。
User: 中国广州有哪些旅游景点
Assistant: 广州是一个非常有名的旅游城市,拥有许多名副其名的旅游景点。以下是一些主要的旅游景点:
1. Canton Tower:这是一个位于广州的超高建筑,拥有360度的观景台,展示了广州的全景。
2. Chimelong Paradise:这个主题公园拥有许多不同的游乐设施和景观,展示了中国的游乐技艺。
3. Baiyun Mountain:这个山区拥有许多不同的景观和游乐设施,展示了中国的自然美景。
4. Yuexiu Park:这个公园是广州最大的公园,拥有许多不同的景观和游乐设施,展示了中国的自然美景。
5. Temple of the Six Banyan Trees:这个寺庙拥有许多不同的文化和历史的收藏品,展示了中国的历史和文化。
6. Museum of the Chinese Revolution:这个博物馆展现了中国革命的历史和文化,拥有许多不同的收藏品和展品。
7. Guangzhou Tower:这个塔楼是广州最早的建筑,拥有许多不同的景观和游乐设施,展示了中国的历史和文化。
8. Guangzhou Museum:这个博物馆展现了广州的历史和文化,拥有许多不同的收藏品和展品。
9. Flower Street:这个街区拥有许多不同的花卉和景观,展示了中国的自然美景。
10. Shamian Island:这个岛区拥有许多不同的景观和游乐设施,展示了中国的自然美景和历史文化。
四、FAQ
Q:OSError: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Fail ed to load image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or d irectory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this w arning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or ` libpng` installed before building `torchvision` from source?
warn(
[2024-08-01 15:13:21,242] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/01/2024 15:13:24 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_auto.py:682] 2024-08-01 15:13:25,152 >> Could not locate the tokenizer configuration file, will try to use the model config instead.
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
resolved_file = hf_hub_download(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
return f(*args, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1854, in _raise_on_head_call_error
raise head_call_error
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1751, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1673, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 376, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 400, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66ab3595-53663c2f4d5cf81405b65b9e;080cfa15-3220-4ab1-b123-4a32ba31a03a)
Cannot access gated repo for url https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
run_exp()
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 44, in run_sft
tokenizer_module = load_tokenizer(model_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 69, in load_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 853, in from_pretrained
config = AutoConfig.from_pretrained(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 972, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
resolved_config_file = cached_file(
File "/opt/conda/envs/llama_fct/lib/python3.10/site-packages/transformers/utils/hub.py", line 420, in cached_file
raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
401 Client Error. (Request ID: Root=1-66ab3595-53663c2f4d5cf81405b65b9e;080cfa15-3220-4ab1-b123-4a32ba31a03a)
Cannot access gated repo for url https://hf-mirror.com/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.
错误原因:默认是从Hugging Face中获取模型,由于Hugging Face 模型授权失败,导致获取模型失败。
解决方法:从modelscope下载模型。
export USE_MODELSCOPE_HUB=1 # Windows 使用 `set USE_MODELSCOPE_HUB=1`
将 model_name_or_path
设置为模型 ID 来加载对应的模型。在魔搭社区查看所有可用的模型,例如 LLM-Research/Meta-Llama-3-8B-Instruct
。
修改 llama3_lora_sft.yaml
文件:
# model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
改为
model_name_or_path: LLM-Research/Meta-Llama-3-8B-Instruct
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
Q:OSError: LLM-Research/Meta-Llama-3-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory# llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
[2024-08-01 21:17:22,212] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://hf-mirror.com/LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/utils/hub.py", line 402, in cached_file
resolved_file = hf_hub_download(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
raise head_call_error
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
hf_raise_for_status(response)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-66ab8ae6-4ed0547e1f86fcb201b723f8;acee559e-0676-48e4-8871-b6eb58e797ca)
Repository Not Found for url: https://hf-mirror.com/LLM-Research/Meta-Llama-3-8B-Instruct/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/envs/llama_factory_torch/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 81, in main
run_chat()
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 125, in run_chat
chat_model = ChatModel()
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 44, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 53, in __init__
tokenizer_module = load_tokenizer(model_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 69, in load_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 833, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 665, in get_tokenizer_config
resolved_config_file = cached_file(
File "/opt/conda/envs/llama_factory_torch/lib/python3.10/site-packages/transformers/utils/hub.py", line 425, in cached_file
raise EnvironmentError(
OSError: LLM-Research/Meta-Llama-3-8B-Instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
错误原因:找不到 LLM-Research/Meta-Llama-3-8B-Instruct
模型。
解决方法:从modelscope下载模型。
export USE_MODELSCOPE_HUB=1
Q:ModuleNotFoundError: No module named 'modelscope'
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
[2024-08-01 19:05:15,320] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a uto detect)
08/01/2024 19:05:18 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distri buted training: False, compute dtype: torch.bfloat16
Traceback (most recent call last):
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 219, i n try_download_model_from_ms
from modelscope import snapshot_download
ModuleNotFoundError: No module named 'modelscope'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/llama_factory_torch/bin/llamafactory-cli", line 8, in <module>
sys.exit(main())
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
run_exp()
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 44, in run_sft
tokenizer_module = load_tokenizer(model_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 67, i n load_tokenizer
init_kwargs = _get_init_kwargs(model_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/model/loader.py", line 52, i n _get_init_kwargs
model_args.model_name_or_path = try_download_model_from_ms(model_args)
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/extras/misc.py", line 224, i n try_download_model_from_ms
raise ImportError("Please install modelscope via `pip install modelscope -U`")
ImportError: Please install modelscope via `pip install modelscope -U`
错误原因:缺少modelscope
依赖包。
解决方法:安装modelscope
。
(llama_factory_torch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/m
odels/LLaMA-Factory# pip install --no-dependencies modelscope
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting modelscope
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/38/37/9fe505ebc67ba5e0345a69d6e8b2ee8630523975b484 d221691ef60182bd/modelscope-1.16.1-py3-none-any.whl (5.7 MB)
Installing collected packages: modelscope
Successfully installed modelscope-1.16.1
Q:ImportError: /PATH/TO/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig
(llama_fct_pytorch) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/mod
els/LLaMA-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
Traceback (most recent call last):
File "/opt/conda/envs/llama_fct_pytorch/bin/llamafactory-cli", line 5, in <module>
from llamafactory.cli import main
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/__init__.py", line 38, in <module>
from .cli import VERSION
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/cli.py", line 21, in <module>
from . import launcher
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/launcher.py", line 15, in <module>
from llamafactory.train.tuner import run_exp
File "/public/home/scnlbe5oi5/Downloads/models/LLaMA-Factory/src/llamafactory/train/tuner.py", line 19, in <module>
import torch
File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403
ImportError: /opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403
ImportError: /opt/conda/envs/llama_fct_pytorch/lib/python3.10/site-packages/torch/lib/libtorch_hip.so: undefined symbol: ncclCommInitRankConfig
错误原因:当前PyTorch版本不支持国产异构加速卡。
该问题的解决方法,请参考下文的FAQ。
Q:PyTorch版本不支持国产异构加速卡
(llama_fct) root@notebook-1813389960667746306-scnlbe5oi5-17811:/public/home/scnlbe5oi5/Downloads/models/LLaM
A-Factory# llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
No ROCm runtime is found, using ROCM_HOME='/opt/dtk'
/opt/conda/envs/llama_fct/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to lo ad image Python extension: 'libc10_hip.so: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before buildin g `torchvision` from source?
warn(
[2024-08-01 17:49:08,805] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (a uto detect)
08/01/2024 17:49:12 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cpu, n_gpu: 0, distribut ed training: False, compute dtype: torch.bfloat16
Downloading: 100%|█████████████████████████████████████████████████████████| 654/654 [00:00<00:00, 2.56kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 48.0/48.0 [00:00<00:00, 183B/s]
Downloading: 100%|███████████████████████████████████████████████████████████| 187/187 [00:00<00:00, 759B/s]
Downloading: 100%|█████████████████████████████████████████████████████| 7.62k/7.62k [00:00<00:00, 29.9kB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.63G/4.63G [01:33<00:00, 53.4MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.66G/4.66G [01:02<00:00, 79.9MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.58G/4.58G [01:00<00:00, 81.7MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 1.09G/1.09G [00:22<00:00, 51.6MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 23.4k/23.4k [00:00<00:00, 53.6kB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 36.3k/36.3k [00:00<00:00, 125kB/s]
Downloading: 100%|█████████████████████████████████████████████████████████| 73.0/73.0 [00:00<00:00, 293B/s]
Downloading: 100%|█████████████████████████████████████████████████████| 8.66M/8.66M [00:00<00:00, 13.5MB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 49.8k/49.8k [00:00<00:00, 90.0kB/s]
Downloading: 100%|█████████████████████████████████████████████████████| 4.59k/4.59k [00:00<00:00, 18.7kB/s]
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,510 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2287] 2024-08-01 17:53:53,511 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-01 17:53:53,854 >> Special tokens have been added in the voca bulary, make sure the associated word embeddings are fine-tuned or trained.
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
08/01/2024 17:53:53 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Generating train split: 91 examples [00:00, 10580.81 examples/s]
Converting format of dataset (num_proc=16): 100%|███████████████████| 91/91 [00:00<00:00, 427.78 examples/s]
08/01/2024 17:53:56 - INFO - llamafactory.data.loader - Loading dataset alpaca_en_demo.json...
Generating train split: 1000 examples [00:00, 66788.28 examples/s]
Converting format of dataset (num_proc=16): 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 4688.60 examples/s]
Running tokenizer on dataset (num_proc=16): 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1091/1091 [00:03<00:00, 295.08 examples/s]
training example:
input_ids:
[128000, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 5991, 609, 39254, 459, 15592, 18328, 8040, 555, 5991, 3170, 3500, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am {{name}}, an AI assistant developed by {{author}}. How can I assist you today?<|eot_id|>
[INFO|configuration_utils.py:731] 2024-08-01 17:54:02,106 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:800] 2024-08-01 17:54:02,108 >> Model config LlamaConfig {
"_name_or_path": "/root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128009,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.3",
"use_cache": true,
"vocab_size": 128256
}
[INFO|modeling_utils.py:3631] 2024-08-01 17:54:02,139 >> loading weights file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1572] 2024-08-01 17:54:02,140 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:02,142 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128009
}
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.68it/s]
[INFO|modeling_utils.py:4463] 2024-08-01 17:54:03,708 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4471] 2024-08-01 17:54:03,709 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:991] 2024-08-01 17:54:03,712 >> loading configuration file /root/.cache/modelscope/hub/LLM-Research/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:1038] 2024-08-01 17:54:03,713 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"do_sample": true,
"eos_token_id": [
128001,
128009
],
"max_length": 4096,
"temperature": 0.6,
"top_p": 0.9
}
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
08/01/2024 17:54:03 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
08/01/2024 17:54:03 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,down_proj,o_proj,k_proj,gate_proj,up_proj,v_proj
08/01/2024 17:54:08 - INFO - llamafactory.model.loader - trainable params: 20,971,520 || all params: 8,051,232,768 || trainable%: 0.2605
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[INFO|trainer.py:648] 2024-08-01 17:54:08,091 >> Using cpu_amp half precision backend
[INFO|trainer.py:2134] 2024-08-01 17:54:09,008 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-08-01 17:54:09,008 >> Num examples = 981
[INFO|trainer.py:2136] 2024-08-01 17:54:09,008 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-08-01 17:54:09,008 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2140] 2024-08-01 17:54:09,008 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2141] 2024-08-01 17:54:09,008 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-08-01 17:54:09,008 >> Total optimization steps = 366
[INFO|trainer.py:2143] 2024-08-01 17:54:09,012 >> Number of trainable parameters = 20,971,520
0%| | 0/366 [00:00<?, ?it/s
错误原因:当前PyTorch不支持国产异构加速卡,导致程序卡住,模型无法微调训练。
解决方法:在光合社区中查询并下载安装PyTorch。以 torch-2.1.0+das1.1.git3ac1bdd.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64
为例,尝试安装 torch-2.1.0
。