Analysis of DeepSpeed in HF transformers

HuggingFace's transformers library has DeepSpeed integrated.

Take Qwen's finetune.py script as the reference:

from transformers import Trainer, GPTQConfig, deepspeed

Note that the DeepSpeed integration inside transformers does not fully support every DeepSpeed feature. The integration notes, plus a lot of practical advice, are documented here: transformers/docs/source/zh/main_classes/deepspeed.md at eb5b968c5d80271ecb29917dffecc8f4c00247a8 · huggingface/transformers · GitHub
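For orientation, a minimal sketch of how the --deepspeed flag travels (simplified; Qwen's real finetune.py also defines ModelArguments and DataArguments): it is just a field on TrainingArguments that holds the path to the JSON config, and nothing DeepSpeed-related is initialized until Trainer starts training.

from transformers import HfArgumentParser, TrainingArguments

# Running this file with:  --output_dir out --deepspeed finetune/ds_config_zero2.json
# leaves the config path in training_args.deepspeed; the engine is built later by Trainer.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()
print(training_args.deepspeed)   # -> "finetune/ds_config_zero2.json"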

Stepping into trainer.train() in transformers/trainer.py,

it looks like self.model_wrapped is the new model converted for DeepSpeed; see the comment at line 2122:

        # important: at this point:
        # self.model         is the Transformers Model
        # self.model_wrapped is DDP(Transformers Model), Deepspeed(Transformers Model),
        # FSDP(Transformers Model), Dynamo Optimized Module(Transformers Model) etc.

model = self._wrap_model(self.model_wrapped),
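To confirm the relationship between the two attributes, a few expressions worth evaluating in the debug console at this point (a sketch; the exact class paths depend on your transformers/DeepSpeed versions):

type(self.model)            # the plain Transformers model, e.g. Qwen2ForCausalLM
type(self.model_wrapped)    # deepspeed.runtime.engine.DeepSpeedEngine when --deepspeed is set
self.model_wrapped.module is self.model   # the engine still holds the original model inside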

Debugging with torchrun

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        
        {
            "name": "Python: Module",
            "type": "python",
            "request": "launch",
            "module": "torch.distributed.run",
            "cwd": "/mnt/workspace/workgroup/dlz/tmp2/Qwen",
            "args": [
                "--nproc_per_node", "2",
                "--nnodes", "1",
                "--node_rank", "0",
                "--master_addr", "localhost",
                "--master_port", "6819",
                "finetune.py",
                "--model_name_or_path", "/path/to/models/Qwen1.5-7B",
                "--data_path", "/path/to/finetune_full/data_100000.json",
                "--bf16", "True",
                "--output_dir", "output_qwen_full_1gpu_bs1",
                "--num_train_epochs", "2",
                "--per_device_train_batch_size", "2",
                "--per_device_eval_batch_size", "1",
                "--gradient_accumulation_steps", "4",
                "--evaluation_strategy", "no",
                "--save_strategy", "steps",
                "--save_steps", "1000",
                "--save_total_limit", "10",
                "--learning_rate", "1e-5",
                "--weight_decay", "0.1",
                "--adam_beta2", "0.95",
                "--warmup_ratio", "0.01",
                "--lr_scheduler_type", "cosine",
                "--logging_steps", "1",
                "--report_to", "none",
                "--model_max_length", "2048",
                "--gradient_checkpointing", "True",
                "--lazy_preprocess", "True",
                "--deepspeed", "finetune/ds_config_zero2.json"
             ],
             "env": {
                "CUDA_VISIBLE_DEVICES": "0,1"  // 添加环境变量
            },
            "justMyCode": false
        }
    ]
}
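For reference, a minimal ZeRO-2 config in the spirit of the finetune/ds_config_zero2.json referenced above (an assumed sketch, not the actual file shipped with Qwen), written as a Python dict since TrainingArguments(deepspeed=...) also accepts a dict; the "auto" values are filled in by the HF integration from the TrainingArguments:

ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}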

With two GPUs, two child processes show up in the debugger. To inspect data inside a particular process, just click into the corresponding frame (the line of the function you care about) in that process's call stack.
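To tell which rank the current process is, check the environment variables that torchrun sets, right from the debug console (a sketch; torch.distributed may not be initialized yet early in the script):

import os
import torch.distributed as dist

os.environ.get("LOCAL_RANK"), os.environ.get("RANK"), os.environ.get("WORLD_SIZE")
dist.get_rank() if dist.is_initialized() else None   # same answer once the process group exists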

model.optimizer on rank 0 (index 0)

Here it is split into two groups of exactly the same size, holding the 16-bit parameters,

which matches the data field shown here.

This is the total parameter count of the whole model (taken from the local variables after entering _inner_training_loop, not from the model object itself):

There are two more fields, params_in_partition and params_not_in_partition, holding the parameters that do and do not fall into this rank's partition.
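A sketch of expressions for poking at the ZeRO stage-2 optimizer from the debug console (attribute names as observed on DeepSpeed's DeepSpeedZeroOptimizer; they vary across versions, e.g. bit16_groups was fp16_groups in older releases):

opt = model.optimizer                           # the ZeRO stage-1/2 optimizer behind the engine
len(opt.bit16_groups)                           # number of optimizer parameter groups
sum(p.numel() for p in opt.bit16_groups[0])     # total 16-bit parameter count of group 0
len(opt.params_in_partition[0])                 # tensors this rank owns in group 0
len(opt.params_not_in_partition[0])             # tensors owned by the other rank(s)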

model.optimizer on rank 1 (index 1), where you can see

Looking at param_names in the model, there are 1 + 32*12 + 2 = 387 parameter tensors in total (presumably the token embedding, 12 tensors per decoder layer times 32 layers, plus the final norm and lm_head), so the two partitions above are split around the 193rd tensor, with that 193rd tensor itself split between the two ranks?
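A sketch for double-checking that count against the unwrapped model and locating the boundary tensor (index 192 would be the 193rd tensor if the split falls exactly there):

names = [n for n, _ in self.model.named_parameters()]
len(names)              # 387 reported above for Qwen1.5-7B
names[0], names[-2:]    # embedding at the front; final norm and lm_head at the end
names[192]              # candidate boundary tensor between the two ranks' partitions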

With 4 GPUs:
