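3. Datawhale AI Summer Camp AIGC Task 3: Understand the basics of fine-tuning and its parameters, get better results, and learn ComfyUI, a workflow tool for text-to-image generation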

Latest recommended article published on 2024-09-26 17:23:00

神迹小卒

Tags: Artificial Intelligence, AIGC

Article link: https://blog.csdn.net/aiqq136/article/details/141280380

0817

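1. Review and outlook

Task 1: Run the code end to end to get a feel for the AIGC process

Task 2: Understand the structure of the AIGC code, and learn how to use GPT to explain code and fill in background knowledge

Task 3: Understand the basics of fine-tuning and its parameters, get better results, and learn ComfyUI, a workflow tool for text-to-image generation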

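2. Part 1: Exploring ComfyUI use cases

2.1 What is ComfyUI

GUI stands for "Graphical User Interface".

Put simply, a GUI is the kind of interaction you see on a computer screen, with icons, buttons, and menus.

ComfyUI is a GUI for driving image generation. It breaks the AIGC pipeline into modules and lets you control generation through a node graph that looks like a flowchart or mind map.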

2.2 Core ComfyUI modules

1. Model loader (Load Checkpoint): loads the base model file, i.e. a pre-trained model state (checkpoint). Model: the model file.

2. Prompt manager: CLIP (Contrastive Language–Image Pre-training) converts text into embeddings in a latent space that the model can work with, a space into which both text and images are projected in order to learn a shared representation. What is a latent space? Put simply, high-dimensional data is transformed into a lower-dimensional space that keeps the essential features of the original data.

3. Sampler: controls how the model generates the image. Different sampler settings affect the quality and diversity of the output, and trade generation speed against quality. Stable Diffusion works by denoising: it starts from a noisy signal (for example, an image of pure noise) and turns it into a noise-free signal (an image a person can understand). How is the model trained? During training, the model learns to reverse the diffusion process. It essentially learns how an image gradually becomes noisier and then uses this knowledge to do the opposite: start with noise and reduce it step by step until a coherent image emerges. This denoising process involves many sampling steps. The sampling parameters are configured in KSampler:

seed: the random seed that controls the initial noise
control_after_generate: how the seed changes after each generation
steps: the number of denoising iterations; more steps generally give a more refined result but take longer
cfg: classifier-free guidance, which determines how strongly the prompt influences the final image. Higher values follow the prompt description more closely.
denoise: how much of the content is covered by noise
sampler_name, scheduler: the sampling algorithm and noise schedule used for denoising
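
To make the cfg parameter more concrete, here is a minimal sketch of the usual classifier-free guidance blend. It assumes the standard formulation, in which the model predicts noise once without the prompt and once with it; the function name apply_cfg and the toy arrays are illustrative and are not taken from ComfyUI's code.

```python
import numpy as np

def apply_cfg(noise_uncond, noise_cond, cfg):
    """Blend the unconditional and prompt-conditioned noise predictions.

    cfg = 0 ignores the prompt, cfg = 1 uses the conditioned prediction as is,
    and larger values push the result further in the prompt's direction.
    """
    return noise_uncond + cfg * (noise_cond - noise_uncond)

# Toy predictions standing in for the model's output (illustrative values).
noise_uncond = np.array([0.0, 1.0, 2.0])
noise_cond = np.array([1.0, 1.0, 0.0])

for cfg in (0.0, 1.0, 2.0):
    print(cfg, apply_cfg(noise_uncond, noise_cond, cfg))
```

Very high cfg values exaggerate the difference between the two predictions, which is one reason extreme settings tend to produce over-saturated or distorted images.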

4. Decoder: VAE decoder (Variational Autoencoder). A variational autoencoder (VAE) is a neural network architecture that belongs to the families of probabilistic graphical models and variational Bayesian methods. The VAE module decodes the embedding in latent space into a pixel-level image. A VAE has two parts. Encoder: takes input data (like images) and compresses it into a lower-dimensional representation called the latent space, learning the underlying patterns or features of the data. Decoder: takes the encoded data from the latent space and reconstructs the original data. For images, this means transforming the latent representation back into pixels.
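
To get a feel for how much smaller the latent space is, the sketch below computes the latent tensor shape for a given image size. It assumes a downsampling factor of 8 and 4 latent channels, which is the usual setup for Stable Diffusion style VAEs; the helper name latent_shape is made up for this example.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Return the (channels, height, width) of the latent for an RGB image.

    The defaults are assumptions based on the common Stable Diffusion VAE layout.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be a multiple of the downsampling factor")
    return (channels, height // downsample, width // downsample)

c, h, w = latent_shape(1024, 1024)
pixel_values = 3 * 1024 * 1024   # RGB image
latent_values = c * h * w
print((c, h, w), pixel_values / latent_values)
```

Under these assumptions the sampler works on a tensor that is dozens of times smaller than the image, which is why denoising in latent space is much cheaper than denoising pixels directly.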

2.3 Install ComfyUI in 20 minutes

Please explain the following code line by line in the form of Python comments.

1. Download the script files

Download the ComfyUI installer files and the LoRA file fine-tuned in Task 1

# Installs Git Large File Storage (LFS). This is useful for handling large files (like models or datasets) that are too big for standard git operations.
git lfs install

# Clones a Git repository from the provided URL. This repository likely contains large files tracked under Git LFS, like datasets or model files.
git clone https://www.modelscope.cn/datasets/maochase/kolors_test_comfyui.git

# Moves all the contents from the cloned repository directory 'kolors_test_comfyui' into the current directory.
mv kolors_test_comfyui/* ./

# Removes what is left of 'kolors_test_comfyui' after the move. The '*' glob skips hidden entries such as '.git', so the directory is not actually empty.
rm -rf kolors_test_comfyui/

# Creates a new directory structure '/mnt/workspace/models/lightning_logs/version_0/checkpoints/'. This is likely a path where model checkpoints will be stored.
mkdir -p /mnt/workspace/models/lightning_logs/version_0/checkpoints/

# Moves a specific model checkpoint file 'epoch=0-step=500.ckpt' to the newly created directory. This checkpoint file is probably a saved state of a machine learning model at a certain training epoch and step.
mv epoch=0-step=500.ckpt /mnt/workspace/models/lightning_logs/version_0/checkpoints/

2. Open the ComfyUI installation file

3. Run the installer in one click (about 10 minutes)

4. Download and install ComfyUI

# Import the Path class from the pathlib module for handling filesystem paths.
from pathlib import Path

# Initialize an empty dictionary to store options for different installations.
OPTIONS = {}

# Define boolean variables to control the installation and updating of components.
UPDATE_COMFY_UI = True  # Whether to update ComfyUI.
INSTALL_COMFYUI_MANAGER = True  # Whether to install the ComfyUI Manager.
INSTALL_KOLORS = True  # Whether to install the KOLORS module.
INSTALL_CUSTOM_NODES_DEPENDENCIES = True  # Whether to install dependencies for custom nodes.

# Populate the OPTIONS dictionary with the above boolean values.
OPTIONS['UPDATE_COMFY_UI'] = UPDATE_COMFY_UI
OPTIONS['INSTALL_COMFYUI_MANAGER'] = INSTALL_COMFYUI_MANAGER
OPTIONS['INSTALL_KOLORS'] = INSTALL_KOLORS
OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES'] = INSTALL_CUSTOM_NODES_DEPENDENCIES

# Get the current working directory and store it.
current_dir = !pwd
# Define the workspace path by appending '/ComfyUI' to the current directory.
WORKSPACE = f"{current_dir[0]}/ComfyUI"

# Change directory to '/mnt/workspace/'.
%cd /mnt/workspace/

# Check if the WORKSPACE directory does not exist, then clone the ComfyUI repository from GitHub.
![ ! -d $WORKSPACE ] && echo -= Initial setup ComfyUI =- && git clone https://github.com/comfyanonymous/ComfyUI
# Change directory to the defined WORKSPACE.
%cd $WORKSPACE

# If the option to update ComfyUI is true, pull the latest changes from the repository.
if OPTIONS['UPDATE_COMFY_UI']:
  !echo "-= Updating ComfyUI =-"
  !git pull

# If the option to install ComfyUI Manager is true, change to the custom_nodes directory, check if ComfyUI-Manager directory does not exist, then clone and pull the latest changes.
if OPTIONS['INSTALL_COMFYUI_MANAGER']:
  %cd custom_nodes
  ![ ! -d ComfyUI-Manager ] && echo -= Initial setup ComfyUI-Manager =- && git clone https://github.com/ltdrdata/ComfyUI-Manager
  %cd ComfyUI-Manager
  !git pull

# If the option to install KOLORS is true, go to the custom_nodes directory, clone the wrapper if it does not exist yet, then pull the latest changes.
if OPTIONS['INSTALL_KOLORS']:
  # Use an absolute path so this step also works when the ComfyUI-Manager step is skipped.
  %cd $WORKSPACE/custom_nodes
  ![ ! -d ComfyUI-KwaiKolorsWrapper ] && echo -= Initial setup KOLORS =- && git clone https://github.com/kijai/ComfyUI-KwaiKolorsWrapper.git
  %cd ComfyUI-KwaiKolorsWrapper
  !git pull

# Navigate back to the WORKSPACE directory.
%cd $WORKSPACE

# If the option to install custom nodes dependencies is true, navigate to the dependencies script and execute it.
if OPTIONS['INSTALL_CUSTOM_NODES_DEPENDENCIES']:
  !pwd
  !echo "-= Install custom nodes dependencies =-"
  ![ -f "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py" ] && python "custom_nodes/ComfyUI-Manager/scripts/colab-dependencies.py"

# Download the cloudflared package from Aliyun OSS and install it using dpkg.
!wget "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/cloudflared-linux-amd64.deb"
!dpkg -i cloudflared-linux-amd64.deb

5. Download the models

# Initialize an empty dictionary named OPTIONS, possibly used to store configuration options.
OPTIONS = {}

# The following lines are using the wget command with the '-c' option, which enables continuation of partially downloaded files.

# Download the fp16 U-Net weights in safetensors format and save them in a specific directory.
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/diffusion_pytorch_model.fp16.safetensors" -P ./models/diffusers/Kolors/unet/

# Download the configuration file for the U-Net model and save it in the same directory as the model.
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/unet/config.json" -P ./models/diffusers/Kolors/unet/

# Use the 'modelscope download' command to download the text encoder model named 'chatglm3-6b-base' from ZhipuAI and save it in a specific directory.
!modelscope download --model=ZhipuAI/chatglm3-6b-base --local_dir ./models/diffusers/Kolors/text_encoder/

# Download the VAE (Variational Autoencoder) model in the safe-tensors format and save it in the designated directory.
!wget -c "https://modelscope.cn/models/AI-ModelScope/sdxl-vae-fp16-fix/resolve/master/sdxl.vae.safetensors" -P ./models/vae/

# Download the configuration file for the scheduler and save it in the specified directory.
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/scheduler/scheduler_config.json" -P ./models/diffusers/Kolors/scheduler/

# Download the model index JSON file which may contain metadata or references to various models in the system and save it in the designated directory.
!wget -c "https://modelscope.cn/models/Kwai-Kolors/Kolors/resolve/master/model_index.json" -P ./models/diffusers/Kolors/
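
Because several large files are fetched here, it can help to confirm that everything landed where the later steps expect it. The following is a small sketch rather than part of the original notebook: the helper missing_files is hypothetical, and the relative paths are simply the -P targets and file names from the wget commands above.

```python
from pathlib import Path

# Relative paths taken from the wget commands above (relative to the ComfyUI directory).
EXPECTED_FILES = [
    "models/diffusers/Kolors/unet/diffusion_pytorch_model.fp16.safetensors",
    "models/diffusers/Kolors/unet/config.json",
    "models/vae/sdxl.vae.safetensors",
    "models/diffusers/Kolors/scheduler/scheduler_config.json",
    "models/diffusers/Kolors/model_index.json",
]

def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]

missing = missing_files(".")
if missing:
    print("Missing files:")
    for rel in missing:
        print("  " + rel)
else:
    print("All expected model files are present.")
```

Run it from the ComfyUI directory. The text encoder fetched with modelscope download is not checked here, since its file list is not shown in the commands above.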

6. Install the LoRA node (Low-Rank Adaptation)

This script defines a Python class for applying LoRA to a model, then writes this class definition into an initialization file in a specified directory, ensuring that the directory structure exists beforehand. This setup is likely part of a larger framework or application that utilizes dynamically loaded Python modules for model manipulation.

Please summarize what lora_node does specifically.

The LoadKolorsLoRA class encapsulated within the lora_node string is specifically designed to handle the integration and application of the Low-Rank Adaptation (LoRA) technique to a neural network model.

Here's a concise summary of its functionalities:

1. Initialization and Configuration:

The class defines a method INPUT_TYPES that outlines the required inputs for the operation:
kolors_model: This indicates the model to which LoRA will be applied.
lora_path: The file path to a saved state dictionary that contains the LoRA weights.
lora_alpha: A floating-point value that determines the scaling of the LoRA weights.
Constants such as RETURN_TYPES, FUNCTION, and CATEGORY are set to categorize and facilitate the referencing of this module within a larger system or framework.

2. State Dictionary Conversion:

The method convert_state_dict is used to modify the keys in a model's state dictionary. This modification ensures that the keys match the format expected by the target model architecture once LoRA has been injected.
This method includes specific renaming rules for both prefixes and suffixes of the dictionary keys, so that the LoRA checkpoint keys match the names expected by the target model.

3. Model Updating:

The load_lora method integrates LoRA into the model by configuring specific LoRA parameters (like rank and alpha values) and targeting specific model layers.
This method uses the inject_adapter_in_model function to insert the LoRA adaptation into the designated layers and then loads the modified state dictionary.

4. Applying LoRA:

The add_lora method orchestrates the process of applying LoRA to the model:
It loads the LoRA parameters from a file.
Converts the state dictionary to match the targeted model structure.
Applies the LoRA adaptation using the processed state dictionary.
Returns the updated model with LoRA adaptations integrated.

5. Deployment and Accessibility:

This class and its methods are encapsulated as a node that can be dynamically loaded and executed within a custom modeling environment, as indicated by the dictionary mappings for class and display names.

Overall, the LoadKolorsLoRA class serves as a specialized component for enhancing a neural network model with LoRA adaptations, which are designed to improve model performance by enabling more efficient fine-tuning of large models with minimal extra computational overhead.

This setup is likely part of a larger machine learning framework that allows for modular adjustments and enhancements to pre-trained models.
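
The key renaming described in step 2 can be sketched on its own. This is an illustration only: the real prefix and suffix tables are elided in the code below, so the rename maps and key names used here are hypothetical placeholders, and rename_keys is not the original method.

```python
def rename_keys(state_dict, prefix_map, suffix_map):
    """Return a copy of state_dict with key prefixes and suffixes renamed."""
    renamed = {}
    for name, value in state_dict.items():
        for old, new in prefix_map.items():
            if name.startswith(old):
                name = new + name[len(old):]   # replace only the leading prefix
                break
        for old, new in suffix_map.items():
            if name.endswith(old):
                name = name[:-len(old)] + new  # replace only the trailing suffix
                break
        renamed[name] = value
    return renamed

# Hypothetical rename rules and keys, for illustration only.
prefix_map = {"old_model.": "new_model."}
suffix_map = {".down.weight": ".lora_A.weight", ".up.weight": ".lora_B.weight"}
state_dict = {
    "old_model.block1.attn.down.weight": "tensor-1",
    "old_model.block1.attn.up.weight": "tensor-2",
}

for key in rename_keys(state_dict, prefix_map, suffix_map):
    print(key)
```

Slicing by prefix and suffix length touches only the start and end of each key, which avoids accidentally rewriting the same substring in the middle of a name.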

# Define a multiline string containing Python code. This string includes a class definition and other configurations.
lora_node = """
import torch  # Import the PyTorch library for tensor computations and neural network operations.
from peft import LoraConfig, inject_adapter_in_model  # Import specific functions for LoRA configuration and model manipulation from the 'peft' package.

# Define a class responsible for loading and applying LoRA configurations to a Kolors model.
class LoadKolorsLoRA:
    @classmethod
    def INPUT_TYPES(s):  # Define a class method to specify the input types for the class methods.
        return {
            "required": {
                "kolors_model": ("KOLORSMODEL", ),
                "lora_path": ("STRING", {"multiline": False, "default": "",}),
                "lora_alpha": ("FLOAT", {"default": 2.0, "min": 0.0, "max": 4.0, "step": 0.01}),
            },
        }

    RETURN_TYPES = ("KOLORSMODEL",)  # Define the types of values returned by the class methods.
    RETURN_NAMES = ("kolors_model",)  # Define the names of the return values.
    FUNCTION = "add_lora"  # Specify the function name that will be associated with this class in the system.
    CATEGORY = "KwaiKolorsWrapper"  # Define the category under which this class is registered.

    # Define a method to adjust the naming within a model's state dictionary to match expected names.
    def convert_state_dict(self, state_dict):
        # Define dictionaries for renaming prefixes and suffixes within state dictionary keys.
        prefix_rename_dict = { ... }
        suffix_rename_dict = { ... }
        state_dict_ = {}
        for name, param in state_dict.items():  # Iterate through each parameter in the original state dictionary.
            for prefix in prefix_rename_dict:  # Check and replace prefixes.
                if name.startswith(prefix):
                    name = name.replace(prefix, prefix_rename_dict[prefix])
            for suffix in suffix_rename_dict:  # Check and replace suffixes.
                if name.endswith(suffix):
                    name = name.replace(suffix, suffix_rename_dict[suffix])
            state_dict_[name] = param  # Store the renamed parameter in the new state dictionary.
        lora_rank = state_dict_["..."].shape[0]  # Extract LoRA rank from the updated state dictionary.
        return state_dict_, lora_rank  # Return the updated state dictionary and LoRA rank.

    # Define a method to load the LoRA model given its configuration and state dictionary.
    def load_lora(self, model, lora_rank, lora_alpha, state_dict):
        # Configure LoRA settings.
        lora_config = LoraConfig( ... )
        model = inject_adapter_in_model(lora_config, model)  # Inject the LoRA adapter into the model.
        model.load_state_dict(state_dict, strict=False)  # Load the updated state dictionary into the model.
        return model  # Return the updated model.

    # Define a method to add LoRA to a given model using specified configurations.
    def add_lora(self, kolors_model, lora_path, lora_alpha):
        state_dict = torch.load(lora_path, map_location="cpu")  # Load the saved LoRA state dictionary onto the CPU.
        state_dict, lora_rank = self.convert_state_dict(state_dict)  # Convert the state dictionary and get LoRA rank.
        kolors_model["pipeline"].unet = self.load_lora(kolors_model["pipeline"].unet, lora_rank, lora_alpha, state_dict)  # Apply LoRA to the model.
        return (kolors_model,)  # Return the updated model.

# Define mappings to associate the class with a specific node name.
NODE_CLASS_MAPPINGS = {
    "LoadKolorsLoRA": LoadKolorsLoRA,
}
NODE_DISPLAY_NAME_MAPPINGS = {
    "LoadKolorsLoRA": "Load Kolors LoRA",
}
__all__ = ["NODE_CLASS_MAPPINGS", "NODE_DISPLAY_NAME_MAPPINGS"]
""".strip()  # Strip any leading or trailing whitespace from the multiline string.

# Import the os module for operating system dependent functionality.
import os

# Create a directory if it does not exist, using the path specified.
os.makedirs("/mnt/workspace/ComfyUI/custom_nodes/ComfyUI-LoRA", exist_ok=True)

# Open a file for writing in the specified directory and write the content of 'lora_node' to it.
with open("/mnt/workspace/ComfyUI/custom_nodes/ComfyUI-LoRA/__init__.py", "w", encoding="utf-8") as f:
    f.write(lora_node)
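
To see the node conventions used by LoadKolorsLoRA in isolation (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY, NODE_CLASS_MAPPINGS), here is a toy node that runs without ComfyUI. The ScaleNumber class and the run_node dispatcher are hypothetical; run_node is a simplified stand-in for how a host application might call a node, not ComfyUI's actual executor.

```python
class ScaleNumber:
    """Toy node that multiplies a float by a factor (not part of ComfyUI)."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "value": ("FLOAT", {"default": 1.0}),
                "factor": ("FLOAT", {"default": 2.0}),
            },
        }

    RETURN_TYPES = ("FLOAT",)   # one output socket of type FLOAT
    FUNCTION = "scale"          # name of the method the host should call
    CATEGORY = "examples"       # menu category for the node

    def scale(self, value, factor):
        return (value * factor,)  # nodes return a tuple, one item per output


NODE_CLASS_MAPPINGS = {"ScaleNumber": ScaleNumber}


def run_node(node_name, **inputs):
    """Look up a node class, fill in declared defaults, and call its FUNCTION."""
    node_cls = NODE_CLASS_MAPPINGS[node_name]
    required = node_cls.INPUT_TYPES()["required"]
    kwargs = {name: spec[1]["default"] for name, spec in required.items()}
    kwargs.update(inputs)
    return getattr(node_cls(), node_cls.FUNCTION)(**kwargs)


print(run_node("ScaleNumber", value=3.0))
```

The same pattern explains the return statement in add_lora: it returns a one-element tuple because RETURN_TYPES declares a single output.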

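7. Launch ComfyUI

After launch, open the UI page through the link printed by the code
Click "Load" on the right and load "kolors_example.json" (without LoRA) or "kolors_with_lora_example.json" (with LoRA)
When loading a LoRA, fill in the path of the LoRA model under "lora_path", for example /mnt/workspace/models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt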

# Change the current working directory to '/mnt/workspace/ComfyUI'.
%cd /mnt/workspace/ComfyUI

# Import necessary modules for running subprocesses, managing threads, handling time, working with sockets, and sending requests over HTTP.
import subprocess
import threading
import time
import socket
import urllib.request

# Define a function to check when a specific port is available and then launch a Cloudflare tunnel to expose that port.
def iframe_thread(port):
    while True:  # Poll until the port accepts connections.
        time.sleep(0.5)  # Pause for 0.5 seconds between attempts to avoid high CPU usage.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # Create a new socket using IPv4 and TCP.
        result = sock.connect_ex(('127.0.0.1', port))  # Attempt to connect to localhost on the specified port.
        sock.close()  # Always close the probe socket, whether or not the connection succeeded.
        if result == 0:  # If the connection succeeded, the server is up, so stop polling.
            break
    # Announce that the UI has finished loading and Cloudflare tunnel is starting.
    print("\nComfyUI finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")

    # Start a subprocess to run the Cloudflare tunnel command that exposes the local server on the given port.
    p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in p.stderr:  # Iterate over each line of the standard error output from the subprocess.
        l = line.decode()  # Decode the line from bytes to string.
        if "trycloudflare.com " in l:  # Check if the expected Cloudflare URL is in the decoded string.
            # Print the URL that can be used to access the local server through Cloudflare.
            print("This is the URL to access ComfyUI:", l[l.find("http"):], end='')
        # The next line is commented out. It would print every line of output, useful for debugging but noisy for regular use.
        # print(l, end='')

# Start a new thread to run the iframe_thread function with the specified port 8188 and mark it as a daemon.
threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()

# Run ComfyUI's 'main.py'; the '--dont-print-server' flag suppresses the server's own output.
!python main.py --dont-print-server
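
The polling loop in iframe_thread waits forever if ComfyUI never starts. As a sketch of one possible variation (not the original notebook's code), the hypothetical helper wait_for_port below adds a timeout and reports whether the port opened in time. The demo opens a throwaway local listener instead of starting ComfyUI.

```python
import socket
import time

def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The context manager closes the probe socket on every iteration.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False

# Demo: listen on a free local port chosen by the OS, then probe it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
demo_port = server.getsockname()[1]
print("port open:", wait_for_port(demo_port, timeout=2.0))
server.close()
```

In the notebook, a False result from such a helper could be used to print a clear error instead of leaving the cell waiting silently.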

8. Open the preview interface

2.4 ComfyUI workflows

1. Example workflow without LoRA

2. Example workflow with LoRA

3. Part 2: LoRA fine-tuning

3.1 Introduction to LoRA

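LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning pre-trained models. It adapts a model to a specific task or domain in an efficient and flexible way while retaining good generalization and keeping resource consumption low, which matters for putting large pre-trained models to practical use.

3.2 How LoRA fine-tuning works

LoRA adds low-rank matrices to key layers of the pre-trained model. These matrices have far fewer parameters than the weights they adapt, so the model can be fine-tuned without changing its overall structure. During training, only the added low-rank matrices are updated, while the original model weights stay frozen.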
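
The idea can be illustrated with a single linear layer in NumPy. This is a minimal sketch under common assumptions: the update is scaled by alpha / rank, B starts at zero so that training begins from the pre-trained behavior, and the layer size 1024 is arbitrary. It is not the code used by the training script.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 1024, 1024, 16, 4.0

W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable low-rank factor, small random init
B = np.zeros((d_out, rank))                 # trainable low-rank factor, zero init

def lora_forward(x, W, A, B, alpha, rank):
    """Compute x W^T plus the scaled low-rank update x A^T B^T."""
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha, rank)

full_params = W.size
lora_params = A.size + B.size
print("output shape:", y.shape)
print("trainable share:", lora_params / full_params)
```

With these sizes the two low-rank factors hold about 3% as many values as the full weight matrix, which is where the memory and storage savings come from.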

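3.3 Advantages of LoRA fine-tuning

Fast adaptation to new tasks
Even with only a small amount of labeled data in a specific domain, the model can be effectively customized and quickly adapted to a new domain or task.
Preserved generalization
Because LoRA fine-tunes only part of the model, it helps preserve generalization to unseen data while still learning task-specific knowledge.
Resource efficiency
LoRA fine-tunes only a small set of weights rather than the whole model, which reduces the compute and storage required.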

3.4 LoRA in detail

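import os
# Argument notes are kept outside the command string, because text after a trailing backslash would break the line continuation.
# train_kolors_lora.py: the Kolors LoRA training script from DiffSynth-Studio.
# --pretrained_unet_path: the UNet, which takes noisy latents plus the text embeddings and predicts the noise to remove, so that the denoised image matches the text description.
# --pretrained_text_encoder_path: the text encoder, which turns the prompt into embeddings that condition the UNet (for Kolors this is ChatGLM3, downloaded above as chatglm3-6b-base).
# --pretrained_fp16_vae_path: the VAE, which maps images into latent space and decodes latents back into images.
# --lora_rank 16: the rank of the LoRA matrices, a trade-off between expressiveness and training cost.
# --lora_alpha 4.0: the LoRA alpha value, which scales the strength of the adaptation.
# --dataset_path / --output_path: where the training data is read from and where the model is saved.
# --max_epochs 1: train for at most one epoch.
# --center_crop: center-crop images during preprocessing.
# --use_gradient_checkpointing: enable gradient checkpointing to save memory.
# --precision "16-mixed": 16-bit mixed precision, which speeds up training and reduces GPU memory use.
cmd = """
python DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --lora_rank 16 \
  --lora_alpha 4.0 \
  --dataset_path data/lora_dataset_processed \
  --output_path ./models \
  --max_epochs 1 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed"
""".strip()
os.system(cmd)  # Run the Kolors LoRA training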

Parameter	Value	Description
`pretrained_unet_path`	models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors	Path to the pre-trained UNet model
`pretrained_text_encoder_path`	models/kolors/Kolors/text_encoder	Path to the pre-trained text encoder
`pretrained_fp16_vae_path`	models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors	Path to the pre-trained VAE model
`lora_rank`	16	Rank of the LoRA matrices; affects model capacity and performance
`lora_alpha`	4.0	LoRA alpha value; controls the strength of the fine-tuning
`dataset_path`	data/lora_dataset_processed	Path to the training dataset
`output_path`	./models	Path where the trained model is saved
`max_epochs`	1	Maximum number of training epochs
`center_crop`		Enable center cropping for image preprocessing
`use_gradient_checkpointing`		Enable gradient checkpointing to save GPU memory
`precision`	"16-mixed"	Train with 16-bit mixed precision
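
When experimenting with these parameters, it can be convenient to build the command from a dictionary instead of editing one long string. The sketch below is one way to do that; build_command is a hypothetical helper, and the option names and values are the ones from the table above.

```python
import shlex

def build_command(script, options, flags=()):
    """Build an argument list: python <script> --name value ... --flag ..."""
    args = ["python", script]
    for name, value in options.items():
        args += ["--" + name, str(value)]
    args += ["--" + flag for flag in flags]
    return args

args = build_command(
    "DiffSynth-Studio/examples/train/kolors/train_kolors_lora.py",
    {
        "pretrained_unet_path": "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
        "pretrained_text_encoder_path": "models/kolors/Kolors/text_encoder",
        "pretrained_fp16_vae_path": "models/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors",
        "lora_rank": 16,
        "lora_alpha": 4.0,
        "dataset_path": "data/lora_dataset_processed",
        "output_path": "./models",
        "max_epochs": 1,
        "precision": "16-mixed",
    },
    flags=("center_crop", "use_gradient_checkpointing"),
)

# Print the command for inspection; to launch it, pass the list to
# subprocess.run(args, check=True) from the standard library.
print(shlex.join(args))
```

Passing a list to subprocess avoids shell quoting issues, and changing lora_rank or lora_alpha becomes a one-line edit in the dictionary.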

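4. Part 3: How to prepare a high-quality dataset

4.1 Define your requirements and goals

Application scenarios: artistic style transfer, product image generation, medical image synthesis, etc.

Data type: What kind of images do you need? Real-world photos or synthetic images? Black-and-white or color? High or low resolution?

Data volume: Consider how many images your task needs to support training and validation.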

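4.2 Dataset sources

Source type	Recommendation
Public data platforms	The ModelScope community hosts nearly 3,000 open datasets covering text, image, audio, video, and multimodal scenarios, with a tag bar on the left for quick browsing, so you can check whether the dataset you need is there. https://www.modelscope.cn/datasets?Tags=object-tracking&dataType=video&page=1 ; Other recommended data platforms: ImageNet: contains millions of images, widely used for classification and also usable for generation tasks. Open Images: maintained by Google, contains millions of labeled images. Flickr: in particular the Flickr30K and Flickr8K datasets, commonly used for image captioning. CelebA: a dataset focused on face images. LSUN (Large-scale Scene Understanding): a large-scale dataset covering many scene categories.
API or web crawler	If you need a specific type of content, you can use APIs to fetch images from stock photo sites such as Unsplash and Pexels. You can also use web crawlers to collect images from the internet, but pay attention to copyright.
Data synthesis	Use existing graphics engines (such as Unity or Unreal Engine) or dedicated software to generate synthetic data, which is very useful when training certain types of models. Datawhale recently teamed up with Alibaba Cloud Tianchi on a full course on data synthesis for multimodal large models, and everyone is welcome to join the discussion: "从零入门多模态大模型数据合成" (Multimodal Large Model Data Synthesis from Scratch)
Data augmentation	For smaller datasets, you can augment the data through rotation, flipping, scaling, color transformation, and similar methods.
Purchase or custom collection	If your application targets a specific domain, such as medical imaging or satellite imagery, consider buying datasets from a reliable source.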
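
The data augmentation row can be illustrated with a few NumPy operations. This is a minimal sketch, assuming images are stored as H x W x C uint8 arrays; the augment helper and the brightness factor 1.2 are illustrative choices, and a real pipeline would more likely use an image library.

```python
import numpy as np

def augment(image):
    """Return simple augmented variants of an H x W x C uint8 image array."""
    brighter = np.clip(image.astype(np.float32) * 1.2, 0, 255).astype(np.uint8)
    return {
        "original": image,
        "flip_horizontal": np.fliplr(image),   # mirror left to right
        "flip_vertical": np.flipud(image),     # mirror top to bottom
        "rotate_90": np.rot90(image),          # rotate the first two axes by 90 degrees
        "brighter": brighter,                  # simple color transformation
    }

# Tiny synthetic image (2 rows, 3 columns, 3 channels) standing in for a photo.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
for name, variant in augment(image).items():
    print(name, variant.shape)
```

For LoRA training on a particular style, it is worth checking that an augmentation does not destroy what the model should learn; flipping, for example, is a poor fit for images that contain text.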