LoRA-SDXL Training

I recently got hold of a machine with an NVIDIA 4090 Ti — it can actually run LoRA-SDXL training. So exciting!

I haven't actually touched the machine yet — I'm told it runs Ubuntu, so my earlier Windows experience is only partly transferable. Fortunately the models and the tooling are the same, so I got straight to it. After about a week (I started last week) the whole environment was up and running: xformers==0.0.26.post1, torch==2.3.0+cu121, accelerate needs to be installed, and the rest is just pip install -r requirements.txt. As in the earlier Windows post, a few things need tweaking so it doesn't reinstall and uninstall packages on every launch — anyway, it all works now. The setup is again GitHub - bmaltais/kohya_ss plus GitHub - LarryJane491/Lora-Training-in-Comfy: This custom node lets you train LoRA directly in ComfyUI!

Installation went without much trouble, so I first tested SD1.5 training — very smooth, done in a few seconds. Wait, where does lora-training-in-comfy let you pick an SDXL model? Hmm, couldn't find one. Using an SDXL checkpoint the same way as an SD1.5 one seems to run, except it errors out no matter what. It turns out SDXL mode is never enabled and it still calls train_network.py (it should switch to sdxl_train_network.py). OK, that shouldn't be hard to fix — even just editing train.py so it only supports SDXL would do — but let's set it aside for now.

Let's go back and play with the kohya_ss GUI first. Note how it needs to be launched:

./gui.sh --listen={ip} --server_port 7860 --headless

Why add --listen? Mainly so other machines on the LAN can reach it (by default it only accepts connections from the local machine).

Good. At this point only 19 MB of GPU memory is in use — Ubuntu and the Linux family really are light on resources. Enough digressing, let's go. Open http://{ip}:7860 in a browser and you get an interface like this:

First I tried the default parameters, changing only the model path, the image path and the output path, with resolution="1024,1024" (supposedly the standard for SDXL). I clicked "Start training" and promptly got the familiar "...CUDA out of memory..." error. Even 24 GB of VRAM can't handle this??? Fine — I dropped resolution to "768,768" and it went through. VRAM is clearly the big pitfall here.

After two or three days of reading and experimenting, today I can finally run the standard SDXL training flow. Below are a few screenshots, plus some rambling about the settings involved.

1. Configure mixed precision (this path is supposedly optimized; VRAM usage is said to go full float > bf16 > fp16. I can't verify that — all I know is that choosing "no" instantly blows past 24 GB and training dies).

As I said last time, setting mixed precision to "no" is actually a trap: on a low-end card it at least lets you get into caching latents, but that's no help here — even a 4090 can't take "no". Only after countless failures (about 30) did I switch it to fp16. Why fp16? Presumably because it uses the least VRAM!

2. Configure the model and the image locations (I pushed the images over with ssh).

3. Configure the metadata (I just made something up).

4. Output path. Intermediate files like xxx_0000001.safetensors are written here; how many you get depends on the epochs, the number of images, max train epoch and max train steps. In general, if you leave max train epoch and max train steps both at 0, then training steps = epochs × number of images × the number in the folder name (a folder called 20_title means every image in it is repeated 20 times per epoch). Example: my training image path has a single subfolder "20_girls" with 100 photos and epoch set to 10, so training steps = 20 × 100 × 10 = 20,000. If you do fill in max train epoch and max train steps, my first guess was that epoch and max train epoch are compared, the larger one epoch_max wins, and with max train epoch = 10 that would give tmp_train_steps = epoch_max × 100 × 10 = 10,000 — but that didn't quite hold up. From what I can observe, the rule is: sum_a = sum(folder-name prefix × image count) over all folders; epoch_max = max(epoch, max train epoch); total steps = sum_a × epoch_max. And max train steps? When it's 0 it does nothing; when it's > 0 the computed total seems to be compared against it, something like max(max train steps, total)? Still a bit muddled. So here's a real case: the training directory has two folders, 150_aa with 4 images and 150_bb with 8, epoch = 10, max train epoch = 20, and the computed steps = (150×4 + 150×8) × 20 = 36,000! A small sketch of this arithmetic follows below.
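
To keep the arithmetic straight, here is a minimal Python sketch of the step count as I currently understand it (the handling of max train steps is my guess, not something confirmed in the kohya-ss code, and the folder names are just my examples):

    # Rough sketch of how the total step count seems to be derived (my reading, not official).
    def total_training_steps(folders, epoch, max_train_epoch=0, max_train_steps=0):
        # folders: {"20_girls": 100} -> prefix 20 = repeats per image, 100 images in that folder
        steps_per_epoch = sum(int(name.split("_")[0]) * count for name, count in folders.items())
        epoch_max = max(epoch, max_train_epoch)   # the larger of the two epoch fields wins
        total = steps_per_epoch * epoch_max
        if max_train_steps > 0:                   # guess: only compared when non-zero
            total = max(max_train_steps, total)
        return total

    print(total_training_steps({"20_girls": 100}, epoch=10))                               # 20*100*10 = 20000
    print(total_training_steps({"150_aa": 4, "150_bb": 8}, epoch=10, max_train_epoch=20))  # (150*4+150*8)*20 = 36000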

5. Configure the train batch size and the number of epochs, plus cache latents and cache latents to disk. The caching options are up to you; I'd tick cache latents and leave cache latents to disk off. The key settings here are batch size and epochs!

After plenty of trial and error: the advice you see online (batch size 1 for an 8 GB card, 6 for 16 GB+, and so on) doesn't work with this version — only 1 gets through. I suspect VRAM usage just isn't well optimized in this build, because it eats a lot; I'll post the VRAM figures further down. As for epochs, plenty of posts claim that raising the value reduces VRAM usage, but that doesn't apply here either.

I tried several different epoch values and VRAM usage stayed around 16.6 GB every time, so with this version you clearly can't save memory that way. Using a handful of images, I ran each setting once or twice and got this simple table:

Epochs vs. VRAM usage and training time

Epoch   VRAM usage   Training time
10      16.6 GB      1:43
20      16.6 GB      3:27
40      16.6 GB      6:57
80      16.6 GB      13:51

Pretty clear: in this version VRAM usage is fixed, and training time scales linearly with the number of epochs (a quick sanity check of the linearity is sketched below).
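
Back-of-the-envelope check of that linearity (the per-epoch rate is simply derived from my own 10-epoch run and only applies to this particular image set):

    # Training time scales with epochs; VRAM stays flat at ~16.6 GB regardless.
    SECONDS_PER_EPOCH = 103 / 10   # the 10-epoch run took 1:43 (103 s) on my images

    for epochs in (10, 20, 40, 80):
        minutes, seconds = divmod(round(epochs * SECONDS_PER_EPOCH), 60)
        print(f"{epochs} epochs ≈ {minutes}:{seconds:02d}")   # 80 epochs ≈ 13:44 vs. measured 13:51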

6. For the optimizer, pick PagedAdamW8bit. The default is AdamW8bit, and I first assumed they were the same — they're not. With AdamW8bit the run simply wouldn't get through; presumably it needs quite a bit more VRAM...

7. Configure the resolution.

8. Tick No half VAE; the fine print makes it sound pretty good.

9. And finally, training can start.

That's it for now; I'll follow up once the ComfyUI version has been properly modified.

20240523: here is the modified Lora-Training-in-Comfy for ComfyUI.

First go into the {Comfyui_install_path}/ComfyUI/custom_nodes/Lora-Training-in-Comfy directory and find train.py.

1. Modify INPUT_TYPES and loratraining under class LoraTraininginComfy. Too lazy for screenshots, the code goes straight in below (only these two methods change; the file's existing imports stay as they are).

Two parameters are added here: optimizerType and is_sdxl. The old version's default optimizer is AdamW8bit, which gave me endless grief when training LoRA — it instantly maxed out the 4090 Ti's 24 GB — so I ported the whole optimizer list over. The list comes from https://github.com/bmaltais/kohya_ss.git; I can't guarantee every entry works, so pick what suits your card, e.g. "PagedAdamW8bit" 😄. The loratraining function gains parameters matching INPUT_TYPES one to one, passes the optimizer through with "optimizer_type = optimizerType", and adds an is_sdxl check, because SDXL training has to run sdxl_train_network.py. Oh, I almost forgot: this only supports networks.lora; other modes may break — untested, so no promises.

Right, here is where the changes go — copy the homework below.

    def INPUT_TYPES(s):
         return {
            "required": {
            "ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
            #"theseed": ("INT", {"default": 0, "min": 0, "max": 0xffffffffffffffff}),
            "data_path": ("STRING", {"default": "Insert path of image folders"}),
			"batch_size": ("INT", {"default": 1, "min":1}),
            "max_train_epoches": ("INT", {"default":10, "min":1}),
            "save_every_n_epochs": ("INT", {"default":10, "min":1}),
            #"lr": ("INT": {"default":"1e-4"}),
            #"optimizer_type": ("STRING", {["AdamW8bit", "Lion8bit", "SGDNesterov8bit", "AdaFactor", "prodigy"]}),
            "optimizerType": (["AdamW","AdamW8bit","Adafactor","DAdaptation","DAdaptAdaGrad","DAdaptAdam","DAdaptAdan","DAdaptAdanIP","DAdaptAdamPreprint","DAdaptLion","DAdaptSGD","Lion","Lion8bit","PagedAdamW8bit","PagedAdamW32bit","PagedLion8bit","Prodigy","SGDNesterov","SGDNesterov8bit",], ),
            "output_name": ("STRING", {"default":'Desired name for LoRA.'}),
            "clip_skip": ("INT", {"default":2, "min":1}),
            "output_dir": ("STRING", {"default":'models/loras'}),
            "is_sdxl": (["No", "Yes"], ),
            },
        }

    def loratraining(self, ckpt_name, data_path, batch_size, max_train_epoches, save_every_n_epochs,optimizerType, output_name, clip_skip, output_dir,is_sdxl):
        #free memory first of all
        loadedmodels=model_management.current_loaded_models
        unloaded_model = False
        for i in range(len(loadedmodels) -1, -1, -1):
            m = loadedmodels.pop(i)
            m.model_unload()
            del m
            unloaded_model = True
        if unloaded_model:
            model_management.soft_empty_cache()
            
        print(model_management.current_loaded_models)
        #loadedmodel = model_management.LoadedModel()
        #loadedmodel.model_unload(self, current_loaded_models)
        #transform backslashes into slashes for user convenience.
        train_data_dir = data_path.replace( "\\", "/")
        #print(train_data_dir)
        optimizer_type = optimizerType  #get optimizer
        #generates a random seed
        theseed = random.randint(0, 2**32 - 1)  # ** is exponentiation; ^ would be bitwise XOR
        
        if multi_gpu:
            launch_args.append("--multi_gpu")

        if lowram:
            ext_args.append("--lowram")

        if is_v2_model:
            ext_args.append("--v2")
        else:
            ext_args.append(f"--clip_skip={clip_skip}")

        if parameterization:
            ext_args.append("--v_parameterization")

        if train_unet_only:
            ext_args.append("--network_train_unet_only")

        if train_text_encoder_only:
            ext_args.append("--network_train_text_encoder_only")

        if network_weights:
            ext_args.append(f"--network_weights={network_weights}")

        if reg_data_dir:
            ext_args.append(f"--reg_data_dir={reg_data_dir}")

        if optimizer_type:
            ext_args.append(f"--optimizer_type={optimizer_type}")

        if optimizer_type == "DAdaptation":
            ext_args.append("--optimizer_args")
            ext_args.append("decouple=True")

        if network_module == "lycoris.kohya":
            ext_args.extend([
                f"--network_args",
                f"conv_dim={conv_dim}",
                f"conv_alpha={conv_alpha}",
                f"algo={algo}",
                f"dropout={dropout}"
            ])

        if noise_offset != 0:
            ext_args.append(f"--noise_offset={noise_offset}")

        if stop_text_encoder_training != 0:
            ext_args.append(f"--stop_text_encoder_training={stop_text_encoder_training}")

        if save_state == 1:
            ext_args.append("--save_state")

        if resume:
            ext_args.append(f"--resume={resume}")

        if min_snr_gamma != 0:
            ext_args.append(f"--min_snr_gamma={min_snr_gamma}")

        if persistent_data_loader_workers:
            ext_args.append("--persistent_data_loader_workers")

        if use_wandb == 1:
            ext_args.append("--log_with=all")
            if wandb_api_key:
                ext_args.append(f"--wandb_api_key={wandb_api_key}")
            if log_tracker_name:
                ext_args.append(f"--log_tracker_name={log_tracker_name}")
        else:
            ext_args.append("--log_with=tensorboard")

        launchargs=' '.join(launch_args)
        extargs=' '.join(ext_args)

        pretrained_model = folder_paths.get_full_path("checkpoints", ckpt_name)
        
        #Looking for the training script.
        progpath = os.getcwd()
        nodespath=''
        #auto sdxl (the is_sdxl dropdown yields the string "No"/"Yes", so compare it explicitly)
        if is_sdxl == "No":
            for dirpath, dirnames, filenames in os.walk(progpath):
                if 'sd-scripts' in dirnames:
                    nodespath= dirpath + '/sd-scripts/train_network.py'
                    print(nodespath)
        else:
            for dirpath, dirnames, filenames in os.walk(progpath):
                if 'sd-scripts' in dirnames:
                    nodespath= dirpath + '/sd-scripts/sdxl_train_network.py'
                    print(nodespath)
            
        nodespath = nodespath.replace( "\\", "/")
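        #build the accelerate launch command; the actual LoRA training is done by sd-scripts (train_network.py / sdxl_train_network.py)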
        command = "python -m accelerate.commands.launch " + launchargs + f'--num_cpu_threads_per_process=2 "{nodespath}" --enable_bucket --pretrained_model_name_or_path={pretrained_model} --train_data_dir="{train_data_dir}" --output_dir="{output_dir}" --logging_dir="./logs" --log_prefix={output_name} --resolution={resolution} --network_module={network_module} --max_train_epochs={max_train_epoches} --learning_rate={lr} --unet_lr={unet_lr} --text_encoder_lr={text_encoder_lr} --lr_scheduler={lr_scheduler} --lr_warmup_steps={lr_warmup_steps} --lr_scheduler_num_cycles={lr_restart_cycles} --network_dim={network_dim} --network_alpha={network_alpha} --output_name={output_name} --train_batch_size={batch_size} --save_every_n_epochs={save_every_n_epochs} --mixed_precision="fp16" --save_precision="fp16" --seed={theseed} --cache_latents --prior_loss_weight=1 --max_token_length=225 --caption_extension=".txt" --save_model_as={save_model_as} --min_bucket_reso={min_bucket_reso} --max_bucket_reso={max_bucket_reso} --keep_tokens={keep_tokens} --xformers --shuffle_caption ' + extargs
        #print(command)
        subprocess.run(command, shell=True)
        print("Train finished")
        #input()
        return ()

        

The Advanced version gets a similar modification; again, here is the code:

    def INPUT_TYPES(s):
         return {
            "required": {
            "ckpt_name": (folder_paths.get_filename_list("checkpoints"), ),
            "v2": (["No", "Yes"], ),
            "networkmodule": (["networks.lora", "lycoris.kohya"], ),
            "networkdimension": ("INT", {"default": 32, "min":0}),
            "networkalpha": ("INT", {"default":32, "min":0}),
            "trainingresolution": ("INT", {"default":512, "step":8}),
            "data_path": ("STRING", {"default": "Insert path of image folders"}),
			"batch_size": ("INT", {"default": 1, "min":1}),
            "max_train_epoches": ("INT", {"default":10, "min":1}),
            "save_every_n_epochs": ("INT", {"default":10, "min":1}),
            "keeptokens": ("INT", {"default":0, "min":0}),
            "minSNRgamma": ("FLOAT", {"default":0, "min":0, "step":0.1}),
            "learningrateText": ("FLOAT", {"default":0.0001, "min":0, "step":0.00001}),
            "learningrateUnet": ("FLOAT", {"default":0.0001, "min":0, "step":0.00001}),
            "learningRateScheduler": (["cosine_with_restarts", "linear", "cosine", "polynomial", "constant", "constant_with_warmup"], ),
            "lrRestartCycles": ("INT", {"default":1, "min":1}),
            "optimizerType": (["AdamW","AdamW8bit","Adafactor","DAdaptation","DAdaptAdaGrad","DAdaptAdam","DAdaptAdan","DAdaptAdanIP","DAdaptAdamPreprint","DAdaptLion","DAdaptSGD","Lion","Lion8bit","PagedAdamW8bit","PagedAdamW32bit","PagedLion8bit","Prodigy","SGDNesterov","SGDNesterov8bit",], ),
            "output_name": ("STRING", {"default":'Desired name for LoRA.'}),
            "algorithm": (["lora","loha","lokr","ia3","dylora", "locon"], ),
            "networkDropout": ("FLOAT", {"default": 0, "step":0.1}),
            "clip_skip": ("INT", {"default":2, "min":1}),
            "output_dir": ("STRING", {"default":'models/loras'}),
            "is_sdxl": (["No", "Yes"], ),
            },
        }

    def loratraining(self, ckpt_name, v2, networkmodule, networkdimension, networkalpha, trainingresolution, data_path, batch_size, max_train_epoches, save_every_n_epochs, keeptokens, minSNRgamma, learningrateText, learningrateUnet, learningRateScheduler, lrRestartCycles, optimizerType, output_name, algorithm, networkDropout, clip_skip, output_dir,is_sdxl):
        #free memory first of all
        loadedmodels=model_management.current_loaded_models
        unloaded_model = False
        for i in range(len(loadedmodels) -1, -1, -1):
            m = loadedmodels.pop(i)
            m.model_unload()
            del m
            unloaded_model = True
        if unloaded_model:
            model_management.soft_empty_cache()
            
        #print(model_management.current_loaded_models)
        #loadedmodel = model_management.LoadedModel()
        #loadedmodel.model_unload(self, current_loaded_models)
        
        #transform backslashes into slashes for user convenience.
        train_data_dir = data_path.replace( "\\", "/")
        
        
        
        #ADVANCED parameters initialization
        is_v2_model=0
        network_module="networks.lora"
        network_dim=32
        network_alpha=32
        resolution = "512,512"
        keep_tokens = 0
        min_snr_gamma = 0
        unet_lr = "1e-4"
        text_encoder_lr = "1e-5"
        lr_scheduler = "cosine_with_restarts"
        lr_restart_cycles = 0
        #optimizer_type = "AdamW8bit"
        algo= "lora"
        dropout = 0.0
        
        if v2 == "Yes":
            is_v2_model = 1
        
        network_module = networkmodule
        network_dim = networkdimension
        network_alpha = networkalpha
        resolution = f"{trainingresolution},{trainingresolution}"
        
        formatted_value = str(format(learningrateText, "e")).rstrip('0').rstrip()
        text_encoder_lr = ''.join(c for c in formatted_value if not (c == '0'))
        
        formatted_value2 = str(format(learningrateUnet, "e")).rstrip('0').rstrip()
        unet_lr = ''.join(c for c in formatted_value2 if not (c == '0'))
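        # both conversions just compact the float, e.g. 0.0001 -> "1.000000e-04" -> "1.e-4", for the command line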
        
        keep_tokens = keeptokens
        min_snr_gamma = minSNRgamma
        lr_scheduler = learningRateScheduler
        lr_restart_cycles = lrRestartCycles
        optimizer_type = optimizerType
        algo = algorithm
        dropout = f"{networkDropout}"

        #generates a random seed
        theseed = random.randint(0, 2**32 - 1)  # ** is exponentiation; ^ would be bitwise XOR
        
        if multi_gpu:
            launch_args.append("--multi_gpu")

        if lowram:
            ext_args.append("--lowram")

        if is_v2_model:
            ext_args.append("--v2")
        else:
            ext_args.append(f"--clip_skip={clip_skip}")

        if parameterization:
            ext_args.append("--v_parameterization")

        if train_unet_only:
            ext_args.append("--network_train_unet_only")

        if train_text_encoder_only:
            ext_args.append("--network_train_text_encoder_only")

        if network_weights:
            ext_args.append(f"--network_weights={network_weights}")

        if reg_data_dir:
            ext_args.append(f"--reg_data_dir={reg_data_dir}")

        if optimizer_type:
            ext_args.append(f"--optimizer_type={optimizer_type}")

        if optimizer_type == "DAdaptation":
            ext_args.append("--optimizer_args")
            ext_args.append("decouple=True")

        if network_module == "lycoris.kohya":
            ext_args.extend([
                f"--network_args",
                f"conv_dim={conv_dim}",
                f"conv_alpha={conv_alpha}",
                f"algo={algo}",
                f"dropout={dropout}"
            ])

        if noise_offset != 0:
            ext_args.append(f"--noise_offset={noise_offset}")

        if stop_text_encoder_training != 0:
            ext_args.append(f"--stop_text_encoder_training={stop_text_encoder_training}")

        if save_state == 1:
            ext_args.append("--save_state")

        if resume:
            ext_args.append(f"--resume={resume}")

        if min_snr_gamma != 0:
            ext_args.append(f"--min_snr_gamma={min_snr_gamma}")

        if persistent_data_loader_workers:
            ext_args.append("--persistent_data_loader_workers")

        if use_wandb == 1:
            ext_args.append("--log_with=all")
            if wandb_api_key:
                ext_args.append(f"--wandb_api_key={wandb_api_key}")
            if log_tracker_name:
                ext_args.append(f"--log_tracker_name={log_tracker_name}")
        else:
            ext_args.append("--log_with=tensorboard")

        launchargs=' '.join(launch_args)
        extargs=' '.join(ext_args)

        pretrained_model = folder_paths.get_full_path("checkpoints", ckpt_name)
        
        #Looking for the training script.
        progpath = os.getcwd()
        nodespath=''

        #auto sdxl (the is_sdxl dropdown yields the string "No"/"Yes", so compare it explicitly)
        if is_sdxl == "No":
            for dirpath, dirnames, filenames in os.walk(progpath):
                if 'sd-scripts' in dirnames:
                    nodespath= dirpath + '/sd-scripts/train_network.py'
                    print(nodespath)
        else:
            for dirpath, dirnames, filenames in os.walk(progpath):
                if 'sd-scripts' in dirnames:
                    nodespath= dirpath + '/sd-scripts/sdxl_train_network.py'
                    print(nodespath)
            

        nodespath = nodespath.replace( "\\", "/")
        
        command = "python -m accelerate.commands.launch " + launchargs + f'--num_cpu_threads_per_process=8 "{nodespath}" --enable_bucket --pretrained_model_name_or_path={pretrained_model} --train_data_dir="{train_data_dir}" --output_dir="{output_dir}" --logging_dir="./logs" --log_prefix={output_name} --resolution={resolution} --network_module={network_module} --max_train_epochs={max_train_epoches} --learning_rate={lr} --unet_lr={unet_lr} --text_encoder_lr={text_encoder_lr} --lr_scheduler={lr_scheduler} --lr_warmup_steps={lr_warmup_steps} --lr_scheduler_num_cycles={lr_restart_cycles} --network_dim={network_dim} --network_alpha={network_alpha} --output_name={output_name} --train_batch_size={batch_size} --save_every_n_epochs={save_every_n_epochs} --mixed_precision="fp16" --save_precision="fp16" --seed={theseed} --cache_latents --prior_loss_weight=1 --max_token_length=225 --caption_extension=".txt" --save_model_as={save_model_as} --min_bucket_reso={min_bucket_reso} --max_bucket_reso={max_bucket_reso} --keep_tokens={keep_tokens} --xformers --shuffle_caption ' + extargs
        #print(command)
        subprocess.run(command, shell=True)
        print("Train finished")
        #input()
        return ()
        
        

99% of the code above comes from LarryJane491's Lora-Training-in-Comfy — I'm just copying homework; Python isn't my strong suit.

Finally, a node graph to round things off.

If a workflow already contains the Lora Training in ComfyUI node, delete it and re-create it, otherwise the inputs won't match the new definition. The Lora Training in ComfyUI node also still needs a resolution parameter added, otherwise it stays at the default 512, which is no good at all for LoRA-SDXL. The change is simple — a sketch follows.
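
For reference, a minimal sketch of that change, mirroring the trainingresolution handling already used in the Advanced node above (the field name and the 1024 default are my own choice):

    # In INPUT_TYPES of the basic node, add a resolution input (1024 is the usual SDXL size):
    "trainingresolution": ("INT", {"default": 1024, "step": 8}),

    # In loratraining(...), accept the new argument and build the "W,H" string passed to the script:
    resolution = f"{trainingresolution},{trainingresolution}"   # e.g. "1024,1024"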

Done — phew~~
