Today's session is the intensive one and a bit rough; just do your best, and you can spend more time going over it later.
Understanding the background
- A first look at ComfyUI
GUI stands for "Graphical User Interface". Simply put, a GUI is the kind of interaction you see on a computer screen, with icons, buttons, and menus.
ComfyUI is one such GUI: a node-based user interface mainly used to drive image-generation techniques. What makes ComfyUI special is its modular design, which breaks the image-generation process into many small steps, each of which is a node. These nodes can be wired together into a workflow, so users can customize their own image-generation pipeline as needed.
- ComfyUI core modules
The core modules are the model loader, the prompt manager, the sampler, and the decoder.
The content of this subsection comes from the ModelScope community; for details, see the official ModelScope tutorial.
The basic principle of Stable Diffusion is denoising: starting from a noise signal (e.g. a pure-noise image), it gradually turns it into a noise-free signal (e.g. an image a person can understand). This denoising process involves many rounds of sampling.
The sampling parameters are configured in the KSampler node:
(You can dig into these on your own later; personally, I don't think you can fully grasp them just by reading.)
seed: the random seed that controls noise generation
control_after_generate: controls how the seed changes after each generation
steps: the number of denoising iterations; more steps give a more precise signal, but also a longer generation time
cfg: classifier-free guidance, which determines how much the prompt influences the final image. A higher value means the prompt's description shows through more strongly.
denoise: how much of the content gets covered by noise
sampler_name, scheduler: denoising parameters
With a bit of English these should be fairly readable; you just need to understand the ideas behind them.
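To make cfg a bit more concrete, here is a toy sketch of how classifier-free guidance combines the unconditional and prompt-conditioned noise predictions. This is pure Python written for illustration, not ComfyUI code; the function name and vectors are made up:

```python
def cfg_combine(noise_uncond, noise_cond, cfg):
    # Classifier-free guidance: start from the unconditional prediction and
    # push it toward the prompt-conditioned one, scaled by cfg.
    return [u + cfg * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 0.0]   # noise predicted with an empty prompt
cond = [1.0, -1.0]    # noise predicted with the actual prompt

print(cfg_combine(uncond, cond, 1.0))  # [1.0, -1.0] -> exactly the conditional prediction
print(cfg_combine(uncond, cond, 5.0))  # [5.0, -5.0] -> the prompt's influence is amplified
```

This is why raising cfg makes the image follow the prompt more literally, at some cost in flexibility.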
Hands-on part
1. Install ComfyUI in a 20-minute speedrun
Here, we again use the Notebook and free GPU compute provided by the ModelScope community to try out ComfyUI.
2. Download the script files
Download the ComfyUI installation script and the LoRA file fine-tuned in Task 1:
git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
Run the installer in one click (about 10 minutes)
I found this step fairly slow when I ran it. The link may already be shown (as in the screenshot below), but if you copy it into a browser right away you will very likely get a blank page. Don't panic; just wait a bit and copy it again.
Copy the link into your browser to open it
PS: If the link gives a blank page or an error, wait a while and retry; the program may not have finished starting up.
{
"last_node_id": 15,
"last_link_id": 18,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 222
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 16
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
1000102404233412,
"fixed",
25,
5,
"EulerDiscreteScheduler"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
16
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
468
],
"size": [
535.4001724243165,
562.2001106262207
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": [
457.2893696934723,
225.28656056301645
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
"",
1
]
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": [
273.5273818969726,
149.55464588512064
],
"flags": {},
"order": 2,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": [
274.5334274291992,
58
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
16,
6,
0,
14,
0,
"KOLORSMODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1,
"offset": {
"0": -114.73954010009766,
"1": -139.79705810546875
}
}
},
"version": 0.4
}
Download the workflow script (without the LoRA model)
Then load the models and generate your first image
PS: The first time you click generate, resources are loaded and it takes a while; please be patient.
{
"last_node_id": 16,
"last_link_id": 20,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": {
"0": 273.5273742675781,
"1": 149.5546417236328
},
"flags": {},
"order": 1,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": {
"0": 274.5334167480469,
"1": 58
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
19
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": {
"0": 457.28936767578125,
"1": 225.28656005859375
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"二次元,长发,少女,白色背景",
"",
1
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
469
],
"size": {
"0": 535.400146484375,
"1": 562.2001342773438
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 16,
"type": "LoadKolorsLoRA",
"pos": [
606,
368
],
"size": {
"0": 317.4000244140625,
"1": 82
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 19
}
],
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
20
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "LoadKolorsLoRA"
},
"widgets_values": [
"/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt",
2
]
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 266
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 20
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
},
{
"name": "latent",
"type": "LATENT",
"link": null
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
0,
"fixed",
25,
5,
"EulerDiscreteScheduler",
1
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
],
[
19,
6,
0,
16,
0,
"KOLORSMODEL"
],
[
20,
16,
0,
14,
0,
"KOLORSMODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2100000000000002,
"offset": {
"0": -183.91309381910426,
"1": -202.11110769225016
}
}
},
"version": 0.4
}
Since I don't know how to attach the file here, you'll have to copy and paste it; feel free to message me if you need it.
This is the version with the LoRA model.
The prompt text and parameters above can all be tuned yourself; I suggest playing around with them to build a better intuition.
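If you'd rather not paste the JSON by hand each time, a small sketch like the following saves it to a file that ComfyUI's "Load" button can open, and lists the node types as a sanity check. The file name is my own choice, and the tiny JSON here is a stand-in for the full workflow:

```python
import json

# Paste the full workflow JSON between the quotes (a tiny stand-in here)
workflow_json = '{"last_node_id": 1, "nodes": [{"id": 1, "type": "VAELoader"}], "links": []}'

workflow = json.loads(workflow_json)  # fails loudly if the paste got truncated

# Save it so ComfyUI's "Load" button can open it directly
with open("kolors_workflow.json", "w", encoding="utf-8") as f:
    json.dump(workflow, f, indent=2)

# Quick sanity check: which node types does this workflow contain?
print(sorted(node["type"] for node in workflow["nodes"]))  # ['VAELoader']
```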
At this point the whole pipeline should be fairly clear. Below are some pipeline details and learning materials. I recommend reading the detailed LoRA explanation later on; I found it quite useful.
Some learning resources
| Name | Link |
| --- | --- |
| Using ComfyUI on ModelScope to play with AIGC | https://modelscope.cn/headlines/article/429 |
| ComfyUI official repository | https://github.com/comfyanonymous/ComfyUI |
| ComfyUI official examples | https://comfyanonymous.github.io/ComfyUI_examples/ |
| Community basic workflow examples | https://github.com/cubiq/ComfyUI_Workflows |
| | https://github.com/wyrde/wyrde-comfyui-workflows |
| Workflow-sharing site | https://comfyworkflows.com/ |
| A recommended GitHub repository of ComfyUI workflows | https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO?tab=readme-ov-file |
Learning slides on Yuque (login required)
LoRA in detail
import os

# Assemble the Kolors LoRA training command (the training script ships with DiffSynth-Studio)
cmd = " ".join([
    "python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py",
    "--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",  # UNet weights
    "--pretrained_text_encoder_path models/kolors/Kolors/text_encoder",  # text encoder
    "--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",  # VAE weights
    "--lora_rank 16",  # rank 16 balances expressiveness against compute and memory cost
    "--lora_alpha 4.0",  # alpha scales the strength of the LoRA update
    "--dataset_path data/lora_dataset_processed",  # training dataset
    "--output_path ./models",  # where checkpoints are saved
    "--max_epochs 1",  # train for at most one epoch
    "--center_crop",  # center-crop images during preprocessing
    "--use_gradient_checkpointing",  # recompute activations to save VRAM
    '--precision "16-mixed"',  # mixed 16-bit precision: faster and lighter on VRAM
])
os.system(cmd)  # run the Kolors LoRA training
| Parameter | Value | Description |
| --- | --- | --- |
| pretrained_unet_path | models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors | Path to the pretrained UNet model |
| pretrained_text_encoder_path | models/kolors/Kolors/text_encoder | Path to the pretrained text encoder |
| pretrained_fp16_vae_path | models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors | Path to the pretrained VAE model |
| lora_rank | 16 | The LoRA rank, which affects model capacity and performance |
| lora_alpha | 4 | The LoRA alpha value, which controls the strength of the fine-tuning |
| dataset_path | data/lora_dataset_processed | Path to the training dataset |
| output_path | ./models | Path where the trained model is saved |
| max_epochs | 1 | Maximum number of training epochs |
| center_crop | | Enables center cropping for image preprocessing |
| use_gradient_checkpointing | | Enables gradient checkpointing to save VRAM |
| precision | "16-mixed" | Mixed 16-bit (half) precision for training |
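The roles of lora_rank and lora_alpha are easiest to see from the LoRA update rule W' = W + (alpha / rank) * B @ A, where only the small matrices A (rank x d_in) and B (d_out x rank) are trained. A small sketch of what rank 16 buys you in parameter count (the 4096-dimensional layer size is a made-up example):

```python
def lora_param_count(d_in, d_out, rank):
    # LoRA trains only A and B instead of the full d_out x d_in weight matrix.
    full = d_in * d_out           # trainable parameters without LoRA
    lora = rank * (d_in + d_out)  # trainable parameters with a rank-r adapter
    return full, lora

full, lora = lora_param_count(4096, 4096, 16)
print(full, lora, full // lora)  # 16777216 131072 128 -> the adapter is 128x smaller
```

A higher rank gives the adapter more capacity but costs more memory and compute, which is the trade-off the table above describes.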
- How the UNet, VAE, and text encoder work together
- UNet: generates the image from the input noise and the text condition. In a Stable Diffusion model, the UNet receives the noisy latent produced by the VAE encoder and the text vectors produced by the text encoder, and predicts the noise to remove, thereby generating an image that matches the text description
- VAE: a generative model that maps input data into a latent space and samples from it to generate new images. In Stable Diffusion, the VAE encoder first produces noisy latent representations, which are then fed into the UNet together with the text condition
- Text encoder: converts the text input into vector representations the model can understand. In Stable Diffusion, the text encoder uses a CLIP model to turn the text prompt into vectors, which are fed into the UNet together with the VAE-generated noise to guide the image-generation process
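The collaboration above can be sketched end to end. Every function below is a toy stand-in written for illustration, not a real Stable Diffusion API; only the shape of the loop matters:

```python
import random

def text_encoder(prompt):
    # stand-in: turn a prompt into a small fixed-size embedding
    return [float(ord(c) % 7) for c in prompt[:4]]

def unet(latent, embedding, step):
    # stand-in: predict the noise present in the latent at this step
    return [x * 0.1 for x in latent]

def vae_decode(latent):
    # stand-in: map the denoised latent back into image space
    return [round(x, 4) for x in latent]

def generate(prompt, steps=25, seed=0):
    random.seed(seed)                                # KSampler's "seed"
    latent = [random.gauss(0, 1) for _ in range(4)]  # start from pure noise
    cond = text_encoder(prompt)                      # text encoder output
    for step in range(steps):                        # KSampler's "steps"
        predicted_noise = unet(latent, cond, step)
        latent = [x - n for x, n in zip(latent, predicted_noise)]  # remove it
    return vae_decode(latent)                        # VAE decodes the latent

image = generate("a cat wearing a top hat", steps=25, seed=42)
print(len(image))  # 4
```

Fixing the seed makes the run reproducible, and more steps pull the latent further from the initial noise, mirroring the KSampler parameters described earlier.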
- Organizing dataset sources
All of the channels below raise compliance questions; please choose carefully when using these datasets.
| Source type | Recommendation |
| --- | --- |
| Public data platforms | The ModelScope community hosts nearly 3,000 open datasets covering text, image, audio, video, and multimodal scenarios; the tag bar on the left helps you browse quickly, so check whether the dataset you need is there. Other recommended data platforms: |
| APIs or web scraping | |
| Synthetic data | Use existing graphics engines (such as Unity or Unreal Engine) or dedicated software to generate synthetic data, which is very useful for training certain kinds of models. Recently Datawhale, together with Alibaba Cloud Tianchi, put together a full course on data synthesis for multimodal large models; everyone is welcome to join the discussion: 从零入门多模态大模型数据合成 |
| Data augmentation | For smaller datasets, you can augment the data with rotation, flipping, scaling, color transforms, and so on. |
| Purchase or commission | If your application targets a specific domain, such as medical or satellite imagery, consider buying datasets from a reputable vendor. |
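As a tiny illustration of the data-augmentation row above, here are three classic flips/rotations on a 2x2 "image" represented as nested lists (pure Python, no image library needed):

```python
def hflip(img):
    # horizontal flip: reverse each row
    return [row[::-1] for row in img]

def vflip(img):
    # vertical flip: reverse the row order
    return img[::-1]

def rotate90(img):
    # rotate 90 degrees clockwise: reverse the rows, then transpose
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))     # [[2, 1], [4, 3]]
print(vflip(img))     # [[3, 4], [1, 2]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```

Real augmentation pipelines apply the same idea to pixel arrays, often combined with random scaling and color jitter.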
Keep at it, fighting!!!!