Today's session is the intensive one and a bit rough; just do your best, and you can spend more time going over it later.
Understanding the background
- A first look at ComfyUI
GUI stands for "Graphical User Interface". Simply put, a GUI is the kind of interaction you see on a computer screen, with icons, buttons, and menus.
ComfyUI is one such GUI: a node-based user interface mainly used to drive image-generation techniques. What makes ComfyUI special is its modular design, which breaks the image-generation process into many small steps, each of which is a node. These nodes can be wired together into a workflow, so users can customize their own image-generation pipeline as needed.
- ComfyUI core modules
The core modules are the model loader, the prompt manager, the sampler, and the decoder.
The content of this subsection comes from the ModelScope community; for details, see the official ModelScope tutorial.
The basic principle of Stable Diffusion is denoising: starting from a noise signal (e.g. a pure-noise image), it gradually turns it into a noise-free signal (e.g. an image a person can understand). This denoising process involves many rounds of sampling.
The sampling parameters are configured in the KSampler node:
(You can dig into these on your own later; personally, I don't think you can fully grasp them just by reading.)
seed: the random seed that controls noise generation
control_after_generate: controls how the seed changes after each generation
steps: the number of denoising iterations; more steps give a more precise signal, but also a longer generation time
cfg: classifier-free guidance, which determines how much the prompt influences the final image. A higher value means the prompt's description shows through more strongly.
denoise: how much of the content gets covered by noise
sampler_name, scheduler: denoising parameters
With a bit of English these should be fairly readable; you just need to understand the ideas behind them.
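To make cfg a bit more concrete, here is a toy sketch of how classifier-free guidance combines the unconditional and prompt-conditioned noise predictions. This is pure Python written for illustration, not ComfyUI code; the function name and vectors are made up:

```python
def cfg_combine(noise_uncond, noise_cond, cfg):
    # Classifier-free guidance: start from the unconditional prediction and
    # push it toward the prompt-conditioned one, scaled by cfg.
    return [u + cfg * (c - u) for u, c in zip(noise_uncond, noise_cond)]

uncond = [0.0, 0.0]   # noise predicted with an empty prompt
cond = [1.0, -1.0]    # noise predicted with the actual prompt

print(cfg_combine(uncond, cond, 1.0))  # [1.0, -1.0] -> exactly the conditional prediction
print(cfg_combine(uncond, cond, 5.0))  # [5.0, -5.0] -> the prompt's influence is amplified
```

This is why raising cfg makes the image follow the prompt more literally, at some cost in flexibility.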
Hands-on part
1. Install ComfyUI in a 20-minute speedrun
Here, we again use the Notebook and free GPU compute provided by the ModelScope community to try out ComfyUI.
2. Download the script files
Download the ComfyUI installation script and the LoRA file fine-tuned in Task 1:
git lfs install
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git
mv kolors_test_comfyui/* ./
rm -rf kolors_test_comfyui/
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/
Run the installer in one click (about 10 minutes)
I found this step fairly slow when I ran it. The link may already be shown (as in the screenshot below), but if you copy it into a browser right away you will very likely get a blank page. Don't panic; just wait a bit and copy it again.
Copy the link into your browser to open it
PS: If the link gives a blank page or an error, wait a while and retry; the program may not have finished starting up.
{
"last_node_id": 15,
"last_link_id": 18,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 222
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 16
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
1000102404233412,
"fixed",
25,
5,
"EulerDiscreteScheduler"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 1,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
16
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
468
],
"size": [
535.4001724243165,
562.2001106262207
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": [
457.2893696934723,
225.28656056301645
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"cinematic photograph of an astronaut riding a horse in space |\nillustration of a cat wearing a top hat and a scarf |\nphotograph of a goldfish in a bowl |\nanime screencap of a red haired girl",
"",
1
]
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": [
273.5273818969726,
149.55464588512064
],
"flags": {},
"order": 2,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": [
274.5334274291992,
58
],
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
16,
6,
0,
14,
0,
"KOLORSMODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.1,
"offset": {
"0": -114.73954010009766,
"1": -139.79705810546875
}
}
},
"version": 0.4
}
Download the workflow script (without the LoRA model)
Then load the models and generate your first image
PS: The first time you click generate, resources are loaded and it takes a while; please be patient.
{
"last_node_id": 16,
"last_link_id": 20,
"nodes": [
{
"id": 11,
"type": "VAELoader",
"pos": [
1323,
240
],
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 0,
"mode": 0,
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
12
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"sdxl.vae.safetensors"
]
},
{
"id": 10,
"type": "VAEDecode",
"pos": [
1368,
369
],
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 18
},
{
"name": "vae",
"type": "VAE",
"link": 12,
"slot_index": 1
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
13
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 15,
"type": "Note",
"pos": [
200,
636
],
"size": {
"0": 273.5273742675781,
"1": 149.5546417236328
},
"flags": {},
"order": 1,
"mode": 0,
"properties": {
"text": ""
},
"widgets_values": [
"Text encoding takes the most VRAM, quantization can reduce that a lot.\n\nApproximate values I have observed:\nfp16 - 12 GB\nquant8 - 8-9 GB\nquant4 - 4-5 GB\n\nquant4 reduces the quality quite a bit, 8 seems fine"
],
"color": "#432",
"bgcolor": "#653"
},
{
"id": 13,
"type": "DownloadAndLoadChatGLM3",
"pos": [
206,
522
],
"size": {
"0": 274.5334167480469,
"1": 58
},
"flags": {},
"order": 2,
"mode": 0,
"outputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"links": [
14
],
"shape": 3
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadChatGLM3"
},
"widgets_values": [
"fp16"
]
},
{
"id": 6,
"type": "DownloadAndLoadKolorsModel",
"pos": [
201,
368
],
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 3,
"mode": 0,
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
19
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "DownloadAndLoadKolorsModel"
},
"widgets_values": [
"Kwai-Kolors/Kolors",
"fp16"
]
},
{
"id": 12,
"type": "KolorsTextEncode",
"pos": [
519,
529
],
"size": {
"0": 457.28936767578125,
"1": 225.28656005859375
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "chatglm3_model",
"type": "CHATGLM3MODEL",
"link": 14,
"slot_index": 0
}
],
"outputs": [
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"links": [
17
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsTextEncode"
},
"widgets_values": [
"二次元,长发,少女,白色背景",
"",
1
]
},
{
"id": 3,
"type": "PreviewImage",
"pos": [
1366,
469
],
"size": {
"0": 535.400146484375,
"1": 562.2001342773438
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 13
}
],
"properties": {
"Node name for S&R": "PreviewImage"
}
},
{
"id": 16,
"type": "LoadKolorsLoRA",
"pos": [
606,
368
],
"size": {
"0": 317.4000244140625,
"1": 82
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 19
}
],
"outputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"links": [
20
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "LoadKolorsLoRA"
},
"widgets_values": [
"/mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt",
2
]
},
{
"id": 14,
"type": "KolorsSampler",
"pos": [
1011,
371
],
"size": {
"0": 315,
"1": 266
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "kolors_model",
"type": "KOLORSMODEL",
"link": 20
},
{
"name": "kolors_embeds",
"type": "KOLORS_EMBEDS",
"link": 17
},
{
"name": "latent",
"type": "LATENT",
"link": null
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
18
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "KolorsSampler"
},
"widgets_values": [
1024,
1024,
0,
"fixed",
25,
5,
"EulerDiscreteScheduler",
1
]
}
],
"links": [
[
12,
11,
0,
10,
1,
"VAE"
],
[
13,
10,
0,
3,
0,
"IMAGE"
],
[
14,
13,
0,
12,
0,
"CHATGLM3MODEL"
],
[
17,
12,
0,
14,
1,
"KOLORS_EMBEDS"
],
[
18,
14,
0,
10,
0,
"LATENT"
],
[
19,
6,
0,
16,
0,
"KOLORSMODEL"
],
[
20,
16,
0,
14,
0,
"KOLORSMODEL"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 1.2100000000000002,
"offset": {
"0": -183.91309381910426,
"1": -202.11110769225016
}
}
},
"version": 0.4
}
Since I don't know how to attach the file here, you'll have to copy and paste it; feel free to message me if you need it.
This is the version with the LoRA model.
The prompt text and parameters above can all be tuned yourself; I suggest playing around with them to build a better intuition.
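If you'd rather not paste the JSON by hand each time, a small sketch like the following saves it to a file that ComfyUI's "Load" button can open, and lists the node types as a sanity check. The file name is my own choice, and the tiny JSON here is a stand-in for the full workflow:

```python
import json

# Paste the full workflow JSON between the quotes (a tiny stand-in here)
workflow_json = '{"last_node_id": 1, "nodes": [{"id": 1, "type": "VAELoader"}], "links": []}'

workflow = json.loads(workflow_json)  # fails loudly if the paste got truncated

# Save it so ComfyUI's "Load" button can open it directly
with open("kolors_workflow.json", "w", encoding="utf-8") as f:
    json.dump(workflow, f, indent=2)

# Quick sanity check: which node types does this workflow contain?
print(sorted(node["type"] for node in workflow["nodes"]))  # ['VAELoader']
```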
At this point the whole pipeline should be fairly clear. Below are some pipeline details and learning materials. I recommend reading the detailed LoRA explanation later on; I found it quite useful.
Some learning resources
| Name | Link |
| --- | --- |
| Using ComfyUI on ModelScope to play with AIGC | https://modelscope.cn/headlines/article/429 |
| ComfyUI official repository | https://github.com/comfyanonymous/ComfyUI |
| ComfyUI official examples | https://comfyanonymous.github.io/ComfyUI_examples/ |
| Community basic workflow examples | https://github.com/cubiq/ComfyUI_Workflows |
| | https://github.com/wyrde/wyrde-comfyui-workflows |
| Workflow-sharing site | https://comfyworkflows.com/ |
| A recommended GitHub repository of ComfyUI workflows | https://github.com/ZHO-ZHO-ZHO/ComfyUI-Workflows-ZHO?tab=readme-ov-file |
Learning slides on Yuque (login required)
LoRA in detail
import os

# Assemble the Kolors LoRA training command (the training script ships with DiffSynth-Studio)
cmd = " ".join([
    "python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py",
    "--pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",  # UNet weights
    "--pretrained_text_encoder_path models/kolors/Kolors/text_encoder",  # text encoder
    "--pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",  # VAE weights
    "--lora_rank 16",  # rank 16 balances expressiveness against compute and memory cost
    "--lora_alpha 4.0",  # alpha scales the strength of the LoRA update
    "--dataset_path data/lora_dataset_processed",  # training dataset
    "--output_path ./models",  # where checkpoints are saved
    "--max_epochs 1",  # train for at most one epoch
    "--center_crop",  # center-crop images during preprocessing
    "--use_gradient_checkpointing",  # recompute activations to save VRAM
    '--precision "16-mixed"',  # mixed 16-bit precision: faster and lighter on VRAM
])
os.system(cmd)  # run the Kolors LoRA training
| Parameter | Value | Description |
| --- | --- | --- |
| pretrained_unet_path | models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors | Path to the pretrained UNet model |
| pretrained_text_encoder_path | models/kolors/Kolors/text_encoder | Path to the pretrained text encoder |
| pretrained_fp16_vae_path | models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors | Path to the pretrained VAE model |
| lora_rank | 16 | The LoRA rank, which affects model capacity and performance |
| lora_alpha | 4 | The LoRA alpha value, which controls the strength of the fine-tuning |
| dataset_path | data/lora_dataset_processed | Path to the training dataset |
| output_path | ./models | Path where the trained model is saved |
| max_epochs | 1 | Maximum number of training epochs |
| center_crop | | Enables center cropping for image preprocessing |
| use_gradient_checkpointing | | Enables gradient checkpointing to save VRAM |
| precision | "16-mixed" | Mixed 16-bit (half) precision for training |
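The roles of lora_rank and lora_alpha are easiest to see from the LoRA update rule W' = W + (alpha / rank) * B @ A, where only the small matrices A (rank x d_in) and B (d_out x rank) are trained. A small sketch of what rank 16 buys you in parameter count (the 4096-dimensional layer size is a made-up example):

```python
def lora_param_count(d_in, d_out, rank):
    # LoRA trains only A and B instead of the full d_out x d_in weight matrix.
    full = d_in * d_out           # trainable parameters without LoRA
    lora = rank * (d_in + d_out)  # trainable parameters with a rank-r adapter
    return full, lora

full, lora = lora_param_count(4096, 4096, 16)
print(full, lora, full // lora)  # 16777216 131072 128 -> the adapter is 128x smaller
```

A higher rank gives the adapter more capacity but costs more memory and compute, which is the trade-off the table above describes.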
- How the UNet, VAE, and text encoder work together
- UNet: generates the image from the input noise and the text condition. In a Stable Diffusion model, the UNet receives the noisy latent produced by the VAE encoder and the text vectors produced by the text encoder, and predicts the noise to remove, thereby generating an image that matches the text description
- VAE: a generative model that maps input data into a latent space and samples from it to generate new images. In Stable Diffusion, the VAE encoder first produces noisy latent representations, which are then fed into the UNet together with the text condition
- Text encoder: converts the text input into vector representations the model can understand. In Stable Diffusion, the text encoder uses a CLIP model to turn the text prompt into vectors, which are fed into the UNet together with the VAE-generated noise to guide the image-generation process
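The collaboration above can be sketched end to end. Every function below is a toy stand-in written for illustration, not a real Stable Diffusion API; only the shape of the loop matters:

```python
import random

def text_encoder(prompt):
    # stand-in: turn a prompt into a small fixed-size embedding
    return [float(ord(c) % 7) for c in prompt[:4]]

def unet(latent, embedding, step):
    # stand-in: predict the noise present in the latent at this step
    return [x * 0.1 for x in latent]

def vae_decode(latent):
    # stand-in: map the denoised latent back into image space
    return [round(x, 4) for x in latent]

def generate(prompt, steps=25, seed=0):
    random.seed(seed)                                # KSampler's "seed"
    latent = [random.gauss(0, 1) for _ in range(4)]  # start from pure noise
    cond = text_encoder(prompt)                      # text encoder output
    for step in range(steps):                        # KSampler's "steps"
        predicted_noise = unet(latent, cond, step)
        latent = [x - n for x, n in zip(latent, predicted_noise)]  # remove it
    return vae_decode(latent)                        # VAE decodes the latent

image = generate("a cat wearing a top hat", steps=25, seed=42)
print(len(image))  # 4
```

Fixing the seed makes the run reproducible, and more steps pull the latent further from the initial noise, mirroring the KSampler parameters described earlier.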
- Organizing dataset sources
All of the channels below raise compliance questions; please choose carefully when using these datasets.
| Source type | Recommendation |
| --- | --- |
| Public data platforms | The ModelScope community hosts nearly 3,000 open datasets covering text, image, audio, video, and multimodal scenarios; the tag bar on the left helps you browse quickly, so check whether the dataset you need is there. Other recommended data platforms: |
| APIs or web scraping | |
| Synthetic data | Use existing graphics engines (such as Unity or Unreal Engine) or dedicated software to generate synthetic data, which is very useful for training certain kinds of models. Recently Datawhale, together with Alibaba Cloud Tianchi, put together a full course on data synthesis for multimodal large models; everyone is welcome to join the discussion: 从零入门多模态大模型数据合成 |
| Data augmentation | For smaller datasets, you can augment the data with rotation, flipping, scaling, color transforms, and so on. |
| Purchase or commission | If your application targets a specific domain, such as medical or satellite imagery, consider buying datasets from a reputable vendor. |
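As a tiny illustration of the data-augmentation row above, here are three classic flips/rotations on a 2x2 "image" represented as nested lists (pure Python, no image library needed):

```python
def hflip(img):
    # horizontal flip: reverse each row
    return [row[::-1] for row in img]

def vflip(img):
    # vertical flip: reverse the row order
    return img[::-1]

def rotate90(img):
    # rotate 90 degrees clockwise: reverse the rows, then transpose
    return [list(row) for row in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))     # [[2, 1], [4, 3]]
print(vflip(img))     # [[3, 4], [1, 2]]
print(rotate90(img))  # [[3, 1], [4, 2]]
```

Real augmentation pipelines apply the same idea to pixel arrays, often combined with random scaling and color jitter.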
Keep at it, fighting!!!!