Datawhale X ModelScope AI Summer Camp Task 2 Notes

These are my study notes for Task 02 of the AIGC track of the 4th Datawhale Summer Camp. Official learning link: Datawhale Summer Camp, 4th session, AIGC track, Task 02

1 Annotations on the baseline.ipynb code

1.1 Installing the basic libraries

!pip install simple-aesthetics-predictor

!pip install -v -e data-juicer

!pip uninstall pytorch-lightning -y
!pip install peft lightning pandas torchvision

!pip install -e DiffSynth-Studio

The !pip install commands above install the required libraries and set up the basic environment, so that the rest of the notebook can run.
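A quick way to confirm the environment is ready is to import the key packages and print their versions. This is a minimal sanity-check sketch (not part of the official baseline; the package names simply follow the installs above):

import importlib

# Sanity check: make sure the freshly installed packages can be imported.
for pkg in ["peft", "lightning", "pandas", "torchvision", "data_juicer", "diffsynth"]:
    try:
        module = importlib.import_module(pkg)
        print(pkg, getattr(module, "__version__", "installed"))
    except ImportError as e:
        print(pkg, "NOT installed:", e)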

1.2 The dataset and its preparation

from modelscope.msdatasets import MsDataset

ds = MsDataset.load(
    'AI-ModelScope/lowres_anime',
    subset_name='default',
    split='train',
    cache_dir="/mnt/workspace/kolors/data"
)
import json, os
from data_juicer.utils.mm_utils import SpecialTokens
from tqdm import tqdm


os.makedirs("./data/lora_dataset/train", exist_ok=True)
os.makedirs("./data/data-juicer/input", exist_ok=True)
with open("./data/data-juicer/input/metadata.jsonl", "w") as f:
    for data_id, data in enumerate(tqdm(ds)):
        image = data["image"].convert("RGB")
        image.save(f"/mnt/workspace/kolors/data/lora_dataset/train/{data_id}.jpg")
        metadata = {"text": "二次元", "image": [f"/mnt/workspace/kolors/data/lora_dataset/train/{data_id}.jpg"]}
        f.write(json.dumps(metadata))
        f.write("\n")

These two code blocks download the lowres_anime dataset from ModelScope, convert each image to RGB, save it to disk, and write a metadata.jsonl file whose records pair the fixed caption "二次元" with the corresponding image path.
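To verify what was written, you can read back the first record of metadata.jsonl and open the image it points to. A small sketch, assuming the paths created above:

import json
from PIL import Image

# Inspect the first metadata record and its image.
with open("./data/data-juicer/input/metadata.jsonl") as f:
    first = json.loads(f.readline())
print(first["text"], first["image"][0])

img = Image.open(first["image"][0])
print(img.size, img.mode)  # e.g. (width, height), "RGB"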

1.3 Processing the data with data-juicer

data_juicer_config = """
# global parameters
project_name: 'data-process'
dataset_path: './data/data-juicer/input/metadata.jsonl'  # path to your dataset directory or file
np: 4  # number of subprocess to process your dataset

text_keys: 'text'
image_key: 'image'
image_special_token: '<__dj__image>'

export_path: './data/data-juicer/output/result.jsonl'

# process schedule
# a list of several process operators with their arguments
process:
    - image_shape_filter:
        min_width: 1024
        min_height: 1024
        any_or_all: any
    - image_aspect_ratio_filter:
        min_ratio: 0.5
        max_ratio: 2.0
        any_or_all: any
"""
with open("data/data-juicer/data_juicer_config.yaml", "w") as file:
    file.write(data_juicer_config.strip())

!dj-process --config data/data-juicer/data_juicer_config.yaml

Save the filtered data:

import pandas as pd
import os, json
from PIL import Image
from tqdm import tqdm


texts, file_names = [], []
os.makedirs("./data/lora_dataset_processed/train", exist_ok=True)
with open("./data/data-juicer/output/result.jsonl", "r") as file:
    for data_id, data in enumerate(tqdm(file.readlines())):
        data = json.loads(data)
        text = data["text"]
        texts.append(text)
        image = Image.open(data["image"][0])
        image_path = f"./data/lora_dataset_processed/train/{data_id}.jpg"
        image.save(image_path)
        file_names.append(f"{data_id}.jpg")
data_frame = pd.DataFrame()
data_frame["file_name"] = file_names
data_frame["text"] = texts
data_frame.to_csv("./data/lora_dataset_processed/train/metadata.csv", index=False, encoding="utf-8-sig")
data_frame
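After the filter pass, only images at least 1024x1024 in size with an aspect ratio between 0.5 and 2.0 remain. A small sketch to check how many samples survived (the column names follow the metadata.csv written above):

import pandas as pd

# Count how many samples passed the data-juicer filters.
df = pd.read_csv("./data/lora_dataset_processed/train/metadata.csv")
print(f"{len(df)} images kept after filtering")
print(df.head())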

1.4 Training the model

from diffsynth import download_models

download_models(["Kolors", "SDXL-vae-fp16-fix"])

Use the download_models function from the diffsynth library to download the Kolors and SDXL-vae-fp16-fix model weights.
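The paths used by ModelManager in the next cell assume the weights land under models/kolors/Kolors. A small sketch to list what was downloaded (purely for inspection, assuming that directory layout):

import os

# Walk the download directory and list the model files with their sizes.
for root, _, files in os.walk("models/kolors/Kolors"):
    for name in files:
        path = os.path.join(root, name)
        size_mb = os.path.getsize(path) / 1e6
        print(f"{path}  ({size_mb:.1f} MB)")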

from diffsynth import ModelManager, SDXLImagePipeline
from peft import LoraConfig, inject_adapter_in_model
import torch


def load_lora(model, lora_rank, lora_alpha, lora_path):
    lora_config = LoraConfig(
        r=lora_rank,
        lora_alpha=lora_alpha,
        init_lora_weights="gaussian",
        target_modules=["to_q", "to_k", "to_v", "to_out"],
    )
    model = inject_adapter_in_model(lora_config, model)
    state_dict = torch.load(lora_path, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)
    return model


# Load models
model_manager = ModelManager(torch_dtype=torch.float16, device="cuda",
                             file_path_list=[
                                 "models/kolors/Kolors/text_encoder",
                                 "models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors",
                                 "models/kolors/Kolors/vae/diffusion_pytorch_model.safetensors"
                             ])
pipe = SDXLImagePipeline.from_model_manager(model_manager)

# Load LoRA
pipe.unet = load_lora(
    pipe.unet,
    lora_rank=16, # This parameter should be consistent with that in your training script.
    lora_alpha=2.0, # lora_alpha can control the weight of LoRA.
    lora_path="models/lightning_logs/version_0/checkpoints/epoch=0-step=500.ckpt"
)

Load the base model with ModelManager, build the SDXLImagePipeline, and then inject the LoRA weights trained earlier. The effective LoRA scaling is lora_alpha / lora_rank, so lora_alpha controls how strongly the fine-tuned update influences the output.
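If you want to confirm that the adapter was actually injected, you can count the LoRA parameters by name. A minimal sketch, assuming the standard peft naming convention (lora_A / lora_B tensors inside the wrapped modules):

# Count LoRA parameters injected by inject_adapter_in_model.
lora_params = [
    (name, p.numel())
    for name, p in pipe.unet.named_parameters()
    if "lora_" in name
]
total = sum(n for _, n in lora_params)
print(f"{len(lora_params)} LoRA tensors, {total / 1e6:.2f}M parameters")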

1.5 Generating images

torch.manual_seed(0)
image = pipe(
    prompt="二次元,一个灰色短发小女孩,在家中沙发上坐着,双手托着腮,很无聊,全身,蓝色连衣裙",
    negative_prompt="丑陋、变形、嘈杂、模糊、低对比度",
    cfg_scale=4,
    num_inference_steps=50, height=1024, width=1024,
)
image.save("1.jpg")

The remaining code cells follow the same pattern; each one generates an image by changing the arguments passed to **pipe()**:

  • prompt: the positive prompt, i.e. what the generated image should contain
  • negative_prompt: the negative prompt, i.e. what you do not want to appear in the image
  • cfg_scale: controls how closely the image matches the prompt; higher values give a closer match
  • num_inference_steps: more inference steps generally produce a more detailed and cleaner image
  • height: height of the generated image
  • width: width of the generated image

Finally, **image.save()** writes the generated image to disk. The remaining seven panels can be produced the same way, as in the sketch below.
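A compact way to produce panels 2-8 is to loop over one prompt per panel. The prompts below are placeholders for illustration only; replace them with your own story descriptions (only the first panel's prompt comes from the baseline above):

# Generate panels 2-8 of the comic. Replace these placeholder prompts
# with your own per-panel story descriptions.
prompts = {
    i: f"二次元,一个灰色短发小女孩,连环画第{i}张图的场景描述"
    for i in range(2, 9)
}
negative = "丑陋、变形、嘈杂、模糊、低对比度"

for i, prompt in prompts.items():
    torch.manual_seed(i)  # a different seed per panel
    img = pipe(
        prompt=prompt,
        negative_prompt=negative,
        cfg_scale=4,
        num_inference_steps=50, height=1024, width=1024,
    )
    img.save(f"{i}.jpg")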

import numpy as np
from PIL import Image


images = [np.array(Image.open(f"{i}.jpg")) for i in range(1, 9)]
image = np.concatenate([
    np.concatenate(images[0:2], axis=1),
    np.concatenate(images[2:4], axis=1),
    np.concatenate(images[4:6], axis=1),
    np.concatenate(images[6:8], axis=1),
], axis=0)
image = Image.fromarray(image).resize((1024, 2048))
image

Finally, numpy's **concatenate()** stitches the eight images into a 4x2 grid, which is resized to 1024x2048. This completes the eight-panel comic-strip story, and the work can then be submitted.
(The image shown here is a result after fine-tuning, for reference only.)

2 Supplementary knowledge

2.1 Data-Juicer

Data-Juicer is a data processing and transformation toolkit designed to simplify data extraction, transformation, and loading. Its strength lies in handling multimodal data: it covers text, images, audio, and even video, providing strong support for the development of current and future multimodal models.

2.2 DiffSynth-Studio

DiffSynth-Studio is a tool for efficient fine-tuning of large models. Built on modern machine-learning techniques, it offers users a new way of creating and makes style transfer more efficient and intuitive. Its target audience is broad, including but not limited to artists, designers, video editors, and AI enthusiasts.

3 Scepter Webui

3.1 Official definition

SCEPTER is an open-source code repository dedicated to generative training, fine-tuning, and inference, encompassing a suite of downstream tasks such as image generation, transfer, editing. SCEPTER integrates popular community-driven implementations as well as proprietary methods by Tongyi Lab of Alibaba Group, offering a comprehensive toolkit for researchers and practitioners in the field of AIGC. This versatile library is designed to facilitate innovation and accelerate development in the rapidly evolving domain of generative models.

3.2 Trying it on ModelScope and private deployment

ModelScope experience: a quick first look at the features
Official Scepter GitHub repository: GitHub link
PS: If you run into problems during installation, feel free to ask AI tools such as Tongyi for help.
