(3) DeepSpeed / P-Tuning v2 Fine-Tuning of ChatGLM-6B

Preparing the model files and related code

Install date: 2023-04-19

Model files: https://huggingface.co/THUDM/chatglm-6b/tree/main
Hash: 35ca523

(Screenshot: the chatglm-6b file listing on Hugging Face, showing the recent commits)

Since my previous article (04-09), the team has updated the model files and added DeepSpeed support, so I'm jumping right in to try it (they're working hard too; at the time of the screenshot, the code had been updated just 3 hours earlier).

Following the appendix on Python code for downloading large files in my earlier write-up, (2) ChatGLM-6B model deployment and ptuning fine-tuning tutorial, get ready to re-download the model files (the servers are overseas; the slow speed can't be helped).
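
If that downloader script isn't at hand, a plain resumable wget loop does the same job. This is an alternative sketch of mine, not the method from that post; the shard names follow this checkpoint's 8-shard layout (the log later confirms 8 shards):

# -c resumes a partial file after a dropped connection
for i in 1 2 3 4 5 6 7 8; do
    wget -c "https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-0000${i}-of-00008.bin"
done
wget -c "https://huggingface.co/THUDM/chatglm-6b/resolve/main/ice_text.model"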

Once the big model files finish downloading, fetch the remaining configuration files, then move the large files into place:

# Pick a directory for the model files, wherever suits you
mkdir /data/thudm2/
cd /data/thudm2/
# Fetch the other config files (GIT_LFS_SKIP_SMUDGE=1 skips the large LFS payloads)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b

# Move the big files in, overwriting what's there (the placeholder files from
# the clone above are useless stubs, so overwriting them is fine)
mv -f pytor* chatglm-6b/
mv -f ice_text.model chatglm-6b/

Deploying ChatGLM-6B

Preparing the code

https://github.com/THUDM/ChatGLM-6B
Hash: 01e6313

git clone https://github.com/THUDM/ChatGLM-6B.git
cd ChatGLM-6B

# Link the model in ahead of time; adjust this path to your setup, yours may differ from mine
ln -s /data/thudm2/ THUDM

Edit requirements.txt. Pay special attention to the torch/torchvision version pairing (check the official PyTorch site):

protobuf==3.20.0
transformers==4.28.0
cpm_kernels
gradio
mdtex2html
sentencepiece
rouge_chinese
nltk
jieba
datasets
deepspeed
accelerate
torchvision==0.14.0
torch==1.13.0
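
One note on the pinned versions: torch 1.13.0 pairs with torchvision 0.14.0, and the stock PyPI wheel for torch 1.13.0 bundles CUDA 11.7, which is consistent with the py39_cu117 extension-cache directory that shows up in the build logs later.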

My GPU environment:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:00:09.0 Off |                    0 |
| N/A   22C    P8     8W / 250W |      0MiB / 22919MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Setting up the venv environment

yum -y install python3.9-devel

# Point both python and python3 at 3.9 (you have to select it manually)
update-alternatives --config python
update-alternatives --config python3
------------------------------------------------------------
* 0            /usr/bin/python3.9   3         auto mode

# From here on, python defaults to 3.9
python -m venv venv
source venv/bin/activate   # activate the venv so pip installs into it
pip3 install -r requirements.txt
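
A quick sanity check that the pinned versions installed and torch can see the GPU:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"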

CUDA settings: device visibility and memory split size
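
The exact values used aren't recorded here; below is a minimal sketch of the two settings this heading refers to (the 128 MB split size is an assumption on my part, not a value confirmed by the post):

# Limit training to GPU 0, the only card in the nvidia-smi output above
export CUDA_VISIBLE_DEVICES=0
# Cap the allocator's block split size to curb fragmentation; 128 is a
# common starting point, tune it per the advice in the OOM message
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128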


Edit ds_train_finetune.sh to run full-parameter fine-tuning with DeepSpeed.
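
The script body isn't reproduced here, so below is a sketch reconstructed from the launch command captured in the training log further down; the deepspeed invocation itself, with --num_gpus=1, is inferred from the single-GPU WORLD INFO in that log:

LR=1e-4

deepspeed --num_gpus=1 main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path ../THUDM/chatglm-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --max_steps 5000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --fp16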


Fixing the ninja build error

If the following error appears:

/data/chatgml2/ChatGLM-6B/venv/lib/python3.9/site-packages/torch/include/torch/csrc/python_headers.h:10:10: fatal error: Python.h: No such file or directory
   10 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/torch/utils/cpp_extension.py", line 1717, in _run_ninja_build
    subprocess.run(
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 

This seems to be a ninja version issue: ninja's version flag is --version, not -v. Upgrading and downgrading ninja both failed to help, so the only option left was to patch the source by hand.

In venv/lib64/python3.9/site-packages/torch/utils/cpp_extension.py,

change ['ninja','-v'] to ['ninja','--version'].
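
If you'd rather script the patch, a one-liner works; this assumes the file spells the list exactly as ['ninja', '-v'] (with the space), which is how torch 1.13 writes it, so check first:

sed -i "s/\['ninja', '-v'\]/['ninja', '--version']/" \
    venv/lib64/python3.9/site-packages/torch/utils/cpp_extension.py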

Running again still fails, but the error message has changed:

# Error output
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
ImportError: /root/.cache/torch_extensions/py39_cu117/utils/utils.so: cannot open shared object file: No such file or directory

utils.so can be built by hand:

# The fix: make sure python and python3 both point to the same version (3.9 here),
# then compile by hand, which turns out to work
# The py39_cu117 directory may differ on your machine; follow the path in the error
cd /root/.cache/torch_extensions/py39_cu117/utils/
# Compile manually
(venv) [root@VM-245-24-centos utils]# ninja
[2/2] g++ flatten_unflatten.o -shared -L/data/chatgml2/Cha...h/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so

After the fiddling above, the ninja problems are resolved.

Starting training

bash ds_train_finetune.sh 

GPU memory blew up: CUDA out of memory. Tried to allocate 11.50 GiB (GPU 0; 22.38 GiB total capacity; 11.50 GiB already allocated; 10.31 GiB free; 11.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. And this despite the fact that I had already set PYTORCH_CUDA_ALLOC_CONF.

(venv) [root@VM-245-24-centos ptuning]# bash ds_train_finetune.sh 
[2023-04-20 13:04:57,843] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-20 13:04:57,851] [INFO] [runner.py:540:main] cmd = /data/chatgml2/ChatGLM-6B/venv/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=35859 --enable_each_rank_log=None main.py --deepspeed deepspeed.json --do_train --train_file AdvertiseGen/train.json --test_file AdvertiseGen/dev.json --prompt_column content --response_column summary --overwrite_cache --model_name_or_path ../THUDM/chatglm-6b --output_dir ./output/adgen-chatglm-6b-ft-1e-4 --overwrite_output_dir --max_source_length 64 --max_target_length 64 --per_device_train_batch_size 4 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --predict_with_generate --max_steps 5000 --logging_steps 10 --save_steps 1000 --learning_rate 1e-4 --fp16
[2023-04-20 13:05:00,282] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-20 13:05:00,282] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-20 13:05:00,282] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-20 13:05:00,282] [INFO] [launch.py:247:main] dist_world_size=1
[2023-04-20 13:05:00,282] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-20 13:05:05,024] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/20/2023 13:05:05 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
04/20/2023 13:05:05 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=deepspeed.json,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=./output/adgen-chatglm-6b-ft-1e-4/runs/Apr20_13-05-05_VM-245-24-centos,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=5000,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=./output/adgen-chatglm-6b-ft-1e-4,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=4,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=./output/adgen-chatglm-6b-ft-1e-4,
save_on_each_node=False,
save_safetensors=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
04/20/2023 13:05:06 - WARNING - datasets.builder - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-dbfa988cf4fa1ea5/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 535.09it/s]
[INFO|configuration_utils.py:666] 2023-04-20 13:05:06,079 >> loading configuration file ../THUDM/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-20 13:05:06,079 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-20 13:05:06,082 >> loading configuration file ../THUDM/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-20 13:05:06,083 >> Model config ChatGLMConfig {
  "_name_or_path": "../THUDM/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}

[WARNING|tokenization_auto.py:675] 2023-04-20 13:05:06,083 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:05:06,085 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:05:06,086 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:05:06,086 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:05:06,086 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-20 13:05:06,345 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-20 13:05:06,365 >> loading weights file ../THUDM/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-20 13:05:06,366 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 8/8 [00:06<00:00,  1.17it/s]
[INFO|modeling_utils.py:3190] 2023-04-20 13:05:13,287 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[INFO|modeling_utils.py:3198] 2023-04-20 13:05:13,288 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at ../THUDM/chatglm-6b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|modeling_utils.py:2839] 2023-04-20 13:05:13,292 >> Generation config file not found, using a generation config created from the model config.
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤*版型#宽松*风格#性感*图案#线条*裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[2023-04-20 13:07:02,968] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-04-20 13:07:07,408] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-20 13:07:07,409] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-04-20 13:07:07,409] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-04-20 13:07:07,426] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-04-20 13:07:07,426] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>
[2023-04-20 13:07:07,426] [WARNING] [engine.py:1118:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2023-04-20 13:07:07,426] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-04-20 13:07:07,426] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500000000
[2023-04-20 13:07:07,426] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500000000
[2023-04-20 13:07:07,426] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-04-20 13:07:07,426] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
1.11.1.git.kitware.jobserver-1
Loading extension module utils...
Time to load utils op: 0.2386620044708252 seconds
Traceback (most recent call last):
  File "/data/chatgml2/ChatGLM-6B/ptuning/main.py", line 434, in <module>
    main()
  File "/data/chatgml2/ChatGLM-6B/ptuning/main.py", line 373, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/chatgml2/ChatGLM-6B/ptuning/trainer.py", line 1636, in train
    return inner_training_loop(
  File "/data/chatgml2/ChatGLM-6B/ptuning/trainer.py", line 1705, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
    deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/deepspeed/__init__.py", line 156, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/deepspeed/runtime/engine.py", line 328, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/deepspeed/runtime/engine.py", line 1187, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/deepspeed/runtime/engine.py", line 1418, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 346, in __init__
    self.single_partition_of_fp32_groups.append(self.parallel_partitioned_bit16_groups[i][partition_id].to(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 11.50 GiB (GPU 0; 22.38 GiB total capacity; 11.50 GiB already allocated; 10.31 GiB free; 11.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
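
For reference, the log above shows ZeRO stage 2 with CPU Offload: False. A commonly suggested knob for exactly this kind of OOM, untested here, is offloading optimizer state to CPU in deepspeed.json; the snippet below is a sketch under that assumption, not the config that was actually run:

{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}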

I suspect the DeepSpeed parameter settings. To regain a little confidence, let's first check whether the old approach, P-Tuning v2 fine-tuning of ChatGLM-6B, still works:
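
Here train.sh is the stock P-Tuning v2 launcher from the repo's ptuning directory. Judging from the parameters in the log below (learning rate 2e-2, 3000 steps, 4-bit quantization, output dir pt-128-2e-2), mine was roughly the following; treat it as a reconstruction, not a verbatim copy:

PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path ../THUDM/chatglm-6b \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4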

(venv) [root@VM-245-24-centos ptuning]# bash train.sh 
04/20/2023 13:09:39 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False
04/20/2023 13:09:39 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=no,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'fsdp_min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=16,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.02,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=output/adgen-chatglm-6b-pt-128-2e-2/runs/Apr20_13-09-39_VM-245-24-centos,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=3000,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_hf,
optim_args=None,
output_dir=output/adgen-chatglm-6b-pt-128-2e-2,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=4,
predict_with_generate=True,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=output/adgen-chatglm-6b-pt-128-2e-2,
save_on_each_node=False,
save_safetensors=False,
save_steps=1000,
save_strategy=steps,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-d4e829b82a89c7e1/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...
Downloading data files: 100%|███████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 9187.96it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1529.93it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-d4e829b82a89c7e1/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 537.63it/s]
[INFO|configuration_utils.py:666] 2023-04-20 13:09:41,765 >> loading configuration file ../THUDM/chatglm-6b/config.json
[WARNING|configuration_auto.py:925] 2023-04-20 13:09:41,766 >> Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|configuration_utils.py:666] 2023-04-20 13:09:41,768 >> loading configuration file ../THUDM/chatglm-6b/config.json
[INFO|configuration_utils.py:720] 2023-04-20 13:09:41,769 >> Model config ChatGLMConfig {
  "_name_or_path": "../THUDM/chatglm-6b",
  "architectures": [
    "ChatGLMModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_chatglm.ChatGLMConfig",
    "AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
    "AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
  },
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "gmask_token_id": 130001,
  "hidden_size": 4096,
  "inner_hidden_size": 16384,
  "layernorm_epsilon": 1e-05,
  "mask_token_id": 130000,
  "max_sequence_length": 2048,
  "model_type": "chatglm",
  "num_attention_heads": 32,
  "num_layers": 28,
  "pad_token_id": 3,
  "position_encoding_2d": true,
  "pre_seq_len": null,
  "prefix_projection": false,
  "quantization_bit": 0,
  "torch_dtype": "float16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 130528
}

[WARNING|tokenization_auto.py:675] 2023-04-20 13:09:41,769 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:09:41,772 >> loading file ice_text.model
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:09:41,772 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:09:41,772 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1807] 2023-04-20 13:09:41,772 >> loading file tokenizer_config.json
[WARNING|auto_factory.py:456] 2023-04-20 13:09:42,044 >> Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[INFO|modeling_utils.py:2531] 2023-04-20 13:09:42,065 >> loading weights file ../THUDM/chatglm-6b/pytorch_model.bin.index.json
[INFO|configuration_utils.py:575] 2023-04-20 13:09:42,066 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 130004,
  "eos_token_id": 130005,
  "pad_token_id": 3,
  "transformers_version": "4.28.0"
}

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 8/8 [00:06<00:00,  1.17it/s]
[INFO|modeling_utils.py:3190] 2023-04-20 13:09:49,180 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3192] 2023-04-20 13:09:49,180 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at ../THUDM/chatglm-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2839] 2023-04-20 13:09:49,185 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
input_ids [5, 65421, 61, 67329, 32, 98339, 61, 72043, 32, 65347, 61, 70872, 32, 69768, 61, 68944, 32, 67329, 64103, 61, 96914, 130001, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
inputs 类型#裤*版型#宽松*风格#性感*图案#线条*裤型#阔腿裤 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
label_ids [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 130004, 5, 87052, 96914, 81471, 64562, 65759, 64493, 64988, 6, 65840, 65388, 74531, 63825, 75786, 64009, 63823, 65626, 63882, 64619, 65388, 6, 64480, 65604, 85646, 110945, 10, 64089, 65966, 87052, 67329, 65544, 6, 71964, 70533, 64417, 63862, 89978, 63991, 63823, 77284, 88473, 64219, 63848, 112012, 6, 71231, 65099, 71252, 66800, 85768, 64566, 64338, 100323, 75469, 63823, 117317, 64218, 64257, 64051, 74197, 6, 63893, 130005, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
labels 宽松的阔腿裤这两年真的吸粉不少,明星时尚达人的心头爱。毕竟好穿时尚,谁都能穿出腿长2米的效果宽松的裤腿,当然是遮肉小能手啊。上身随性自然不拘束,面料亲肤舒适贴身体验感棒棒哒。系带部分增加设计看点,还
/data/chatgml2/ChatGLM-6B/venv/lib64/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
  0%|                                                                                          | 0/3000 [00:00<?, ?it/s]04/20/2023 13:13:51 - WARNING - transformers_modules.chatglm-6b.modeling_chatglm - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
{'loss': 5.8336, 'learning_rate': 0.019933333333333334, 'epoch': 0.01}                                                  
{'loss': 5.0271, 'learning_rate': 0.019866666666666668, 'epoch': 0.01}                                                  
{'loss': 5.3076, 'learning_rate': 0.0198, 'epoch': 0.02}                                                                
{'loss': 4.9874, 'learning_rate': 0.019733333333333335, 'epoch': 0.02}                                                  
{'loss': 4.9656, 'learning_rate': 0.019666666666666666, 'epoch': 0.03}                                                  
{'loss': 5.157, 'learning_rate': 0.0196, 'epoch': 0.03}                                                                 
{'loss': 5.2796, 'learning_rate': 0.019533333333333333, 'epoch': 0.04}                                                  
{'loss': 5.151, 'learning_rate': 0.019466666666666667, 'epoch': 0.04} 

OK, so the old method still works. For now it's just the environment that can't keep up: full fine-tuning won't run on this single machine, or the DeepSpeed parameters need closer study. I'll try DeepSpeed parallel tuning on a bigger machine later.

If you've read this far and know how to fix it, or I've gotten something wrong, feel free to leave a comment and discuss.
