Datawhale X 魔搭 AI夏令营 task03


Datawhale X 魔搭 AI夏令营第四期-AIGC文生图方向

Task3:进阶上分-实战优化


打卡链接:
电脑端打开 https://linklearner.com/activity/14/10/37
学习手册:
https://datawhaler.feishu.cn/wiki/UM7awcAuQicI4ukd2qtccT51nug
魔搭学习链接:
在魔搭使用ComfyUI,玩转AIGC!
https://modelscope.cn/headlines/article/429
ComfyUI的官方地址:
https://github.com/comfyanonymous/ComfyUI
ComfyUI官方示范:
https://comfyanonymous.github.io/ComfyUI_examples/
别人的基础工作流示范:
https://github.com/cubiq/ComfyUI_Workflows
https://github.com/wyrde/wyrde-comfyui-workflows
工作流分享网站:
https://comfyworkflows.com/
comfyui的github仓库网站:
https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO?tab=readme-ov-file


任务内容

  1. 精读 baseline2、在群里分享思考
  2. 图像工作流、Lora微调基本原理
  3. 写笔记,完成 Task3 打卡

ComfyUI工作流

ComfyUI 是GUI的一种,是基于节点工作的用户界面,主要用于操作图像的生成技术,ComfyUI 的特别之处在于它采用了一种模块化的设计,把图像生成的过程分解成了许多小的步骤,每个步骤都是一个节点。这些节点可以连接起来形成一个工作流程,这样用户就可以根据需要定制自己的图像生成过程。
在这里插入图片描述


启动ComfyUI

在这里插入图片描述


不带Lora的工作流

在这里插入图片描述
在这里插入图片描述

在这里插入图片描述
在这里插入图片描述

在这里插入图片描述

工作流脚本

{
  "last_node_id": 15,
  "last_link_id": 18,
  "nodes": [
    {
      "id": 11,
      "type": "VAELoader",
      "pos": [
        1323,
        240
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            12
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "sdxl.vae.safetensors"
      ]
    },
    {
      "id": 10,
      "type": "VAEDecode",
      "pos": [
        1368,
        369
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 12,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            13
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 14,
      "type": "KolorsSampler",
      "pos": [
        1011,
        371
      ],
      "size": {
        "0": 315,
        "1": 222
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 16
        },
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "link": 17
        }
      ],
      "outputs": [
        {
          "name": "latent",
          "type": "LATENT",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsSampler"
      },
      "widgets_values": [
        1024,
        1024,
        1000102404233412,
        "fixed",
        25,
        5,
        "EulerDiscreteScheduler"
      ]
    },
    {
      "id": 6,
      "type": "DownloadAndLoadKolorsModel",
      "pos": [
        201,
        368
      ],
      "size": {
        "0": 315,
        "1": 82
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            16
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadKolorsModel"
      },
      "widgets_values": [
        "Kwai-Kolors/Kolors",
        "fp16"
      ]
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1366,
        468
      ],
      "size": [
        535.4001724243165,
        562.2001106262207
      ],
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 13
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 12,
      "type": "KolorsTextEncode",
      "pos": [
        519,
        529
      ],
      "size": [
        457.2893696934723,
        225.28656056301645
      ],
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "link": 14,
          "slot_index": 0
        }
      ],
      "outputs": [
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "links": [
            17
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsTextEncode"
      },
      "widgets_values": [
        "cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf  |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
        "",
        1
      ]
    },
    {
      "id": 15,
      "type": "Note",
      "pos": [
        200,
        636
      ],
      "size": [
        273.5273818969726,
        149.55464588512064
      ],
      "flags": {},
      "order": 2,
      "mode": 0,
      "properties": {
        "text": ""
      },
      "widgets_values": [
        "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "DownloadAndLoadChatGLM3",
      "pos": [
        206,
        522
      ],
      "size": [
        274.5334274291992,
        58
      ],
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "links": [
            14
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadChatGLM3"
      },
      "widgets_values": [
        "fp16"
      ]
    }
  ],
  "links": [
    [
      12,
      11,
      0,
      10,
      1,
      "VAE"
    ],
    [
      13,
      10,
      0,
      3,
      0,
      "IMAGE"
    ],
    [
      14,
      13,
      0,
      12,
      0,
      "CHATGLM3MODEL"
    ],
    [
      16,
      6,
      0,
      14,
      0,
      "KOLORSMODEL"
    ],
    [
      17,
      12,
      0,
      14,
      1,
      "KOLORS_EMBEDS"
    ],
    [
      18,
      14,
      0,
      10,
      0,
      "LATENT"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.1,
      "offset": {
        "0": -114.73954010009766,
        "1": -139.79705810546875
      }
    }
  },
  "version": 0.4
}

带Lora的工作流

在这里插入图片描述

在这里插入图片描述

工作流脚本

{
  "last_node_id": 16,
  "last_link_id": 20,
  "nodes": [
    {
      "id": 11,
      "type": "VAELoader",
      "pos": [
        1323,
        240
      ],
      "size": {
        "0": 315,
        "1": 58
      },
      "flags": {},
      "order": 0,
      "mode": 0,
      "outputs": [
        {
          "name": "VAE",
          "type": "VAE",
          "links": [
            12
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "VAELoader"
      },
      "widgets_values": [
        "sdxl.vae.safetensors"
      ]
    },
    {
      "id": 10,
      "type": "VAEDecode",
      "pos": [
        1368,
        369
      ],
      "size": {
        "0": 210,
        "1": 46
      },
      "flags": {},
      "order": 7,
      "mode": 0,
      "inputs": [
        {
          "name": "samples",
          "type": "LATENT",
          "link": 18
        },
        {
          "name": "vae",
          "type": "VAE",
          "link": 12,
          "slot_index": 1
        }
      ],
      "outputs": [
        {
          "name": "IMAGE",
          "type": "IMAGE",
          "links": [
            13
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "VAEDecode"
      }
    },
    {
      "id": 15,
      "type": "Note",
      "pos": [
        200,
        636
      ],
      "size": {
        "0": 273.5273742675781,
        "1": 149.5546417236328
      },
      "flags": {},
      "order": 1,
      "mode": 0,
      "properties": {
        "text": ""
      },
      "widgets_values": [
        "Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
      ],
      "color": "#432",
      "bgcolor": "#653"
    },
    {
      "id": 13,
      "type": "DownloadAndLoadChatGLM3",
      "pos": [
        206,
        522
      ],
      "size": {
        "0": 274.5334167480469,
        "1": 58
      },
      "flags": {},
      "order": 2,
      "mode": 0,
      "outputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "links": [
            14
          ],
          "shape": 3
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadChatGLM3"
      },
      "widgets_values": [
        "fp16"
      ]
    },
    {
      "id": 6,
      "type": "DownloadAndLoadKolorsModel",
      "pos": [
        201,
        368
      ],
      "size": {
        "0": 315,
        "1": 82
      },
      "flags": {},
      "order": 3,
      "mode": 0,
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            19
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "DownloadAndLoadKolorsModel"
      },
      "widgets_values": [
        "Kwai-Kolors/Kolors",
        "fp16"
      ]
    },
    {
      "id": 12,
      "type": "KolorsTextEncode",
      "pos": [
        519,
        529
      ],
      "size": {
        "0": 457.28936767578125,
        "1": 225.28656005859375
      },
      "flags": {},
      "order": 4,
      "mode": 0,
      "inputs": [
        {
          "name": "chatglm3_model",
          "type": "CHATGLM3MODEL",
          "link": 14,
          "slot_index": 0
        }
      ],
      "outputs": [
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "links": [
            17
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsTextEncode"
      },
      "widgets_values": [
        "二次元,长发,少女,白色背景",
        "",
        1
      ]
    },
    {
      "id": 3,
      "type": "PreviewImage",
      "pos": [
        1366,
        469
      ],
      "size": {
        "0": 535.400146484375,
        "1": 562.2001342773438
      },
      "flags": {},
      "order": 8,
      "mode": 0,
      "inputs": [
        {
          "name": "images",
          "type": "IMAGE",
          "link": 13
        }
      ],
      "properties": {
        "Node name for S&R": "PreviewImage"
      }
    },
    {
      "id": 16,
      "type": "LoadKolorsLoRA",
      "pos": [
        606,
        368
      ],
      "size": {
        "0": 317.4000244140625,
        "1": 82
      },
      "flags": {},
      "order": 5,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 19
        }
      ],
      "outputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "links": [
            20
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "LoadKolorsLoRA"
      },
      "widgets_values": [
        "/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt",
        2
      ]
    },
    {
      "id": 14,
      "type": "KolorsSampler",
      "pos": [
        1011,
        371
      ],
      "size": {
        "0": 315,
        "1": 266
      },
      "flags": {},
      "order": 6,
      "mode": 0,
      "inputs": [
        {
          "name": "kolors_model",
          "type": "KOLORSMODEL",
          "link": 20
        },
        {
          "name": "kolors_embeds",
          "type": "KOLORS_EMBEDS",
          "link": 17
        },
        {
          "name": "latent",
          "type": "LATENT",
          "link": null
        }
      ],
      "outputs": [
        {
          "name": "latent",
          "type": "LATENT",
          "links": [
            18
          ],
          "shape": 3,
          "slot_index": 0
        }
      ],
      "properties": {
        "Node name for S&R": "KolorsSampler"
      },
      "widgets_values": [
        1024,
        1024,
        0,
        "fixed",
        25,
        5,
        "EulerDiscreteScheduler",
        1
      ]
    }
  ],
  "links": [
    [
      12,
      11,
      0,
      10,
      1,
      "VAE"
    ],
    [
      13,
      10,
      0,
      3,
      0,
      "IMAGE"
    ],
    [
      14,
      13,
      0,
      12,
      0,
      "CHATGLM3MODEL"
    ],
    [
      17,
      12,
      0,
      14,
      1,
      "KOLORS_EMBEDS"
    ],
    [
      18,
      14,
      0,
      10,
      0,
      "LATENT"
    ],
    [
      19,
      6,
      0,
      16,
      0,
      "KOLORSMODEL"
    ],
    [
      20,
      16,
      0,
      14,
      0,
      "KOLORSMODEL"
    ]
  ],
  "groups": [],
  "config": {},
  "extra": {
    "ds": {
      "scale": 1.2100000000000002,
      "offset": {
        "0": -183.91309381910426,
        "1": -202.11110769225016
      }
    }
  },
  "version": 0.4
}

LoRA相关知识

LoRA通过在预训练模型的关键层中添加低秩矩阵来实现。这些低秩矩阵通常被设计成具有较低维度的参数空间,这样它们就可以在不改变模型整体结构的情况下进行微调。在训练过程中,只有这些新增的低秩矩阵被更新,而原始模型的大部分权重保持不变。

示例:

import os
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \ # 选择使用可图的Lora训练脚本DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \ # 选择unet模型
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \ # 选择text_encoder
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \ # 选择vae模型
  --lora_rank 16 \ # lora_rank 16 表示在权衡模型表达能力和训练效率时,选择了使用 16 作为秩,适合在不显著降低模型性能的前提下,通过 LoRA 减少计算和内存的需求
  --lora_alpha 4.0 \ # 设置 LoRA 的 alpha 值,影响调整的强度
  --dataset_path data/lora_dataset_processed \ # 指定数据集路径,用于训练模型
  --output_path ./models \ # 指定输出路径,用于保存模型
  --max_epochs 1 \ # 设置最大训练轮数为 1
  --center_crop \ # 启用中心裁剪,这通常用于图像预处理
  --use_gradient_checkpointing \ # 启用梯度检查点技术,以节省内存
  --precision "16-mixed" # 指定训练时的精度为混合 16 位精度(half precision),这可以加速训练并减少显存使用
""".strip()
os.system(cmd) # 执行可图Lora训练    

参数详细说明:
在这里插入图片描述


心得

这次的“Datawhale X 魔搭 AI夏令营”让我第一次真正的接触了什么是AIGC,当模型跑通的时候还是很令人兴奋的,虽然是一键baseline哈哈哈~
也燃起了我对AIGC的兴趣,希望以后能参加夏令营更多的方向!
剩下的内容我还要再多研究研究,也推荐各位小伙伴没事多去Datawhale社区和魔搭社区逛逛,说不定就会有什么惊喜发现~

https://linklearner.com/home

https://www.modelscope.cn/topic/cf0de97eb6284e16812d7c54fbe29fe7/pub/summary

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值