This article walks through creating a venv with the virtualenv package and using it to deploy and run the Hunyuan-DiT text-to-image model, along with some practical tips.
1. Online Demo
The code from this article is deployed on the Baidu PaddlePaddle AI Studio platform, so you can try the Hunyuan-DiT text-to-image model online.
Project link: Tencent Hunyuan-DiT online demo
2. Virtual Environment Setup
Original GitHub repository: https://github.com/Tencent/HunyuanDiT
2.1 conda setup
The official method creates the virtual environment with conda:
git clone https://github.com/tencent/HunyuanDiT
cd HunyuanDiT
# 1. Prepare conda environment
conda env create -f environment.yml
# 2. Activate the environment
conda activate HunyuanDiT
# 3. Install pip dependencies
python -m pip install -r requirements.txt
# 4. (Optional) Install flash attention v2 for acceleration (requires CUDA 11.6 or above)
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.1.2.post3
The environment.yml file is shown below; note that the official setup uses Python 3.8.12.
name: HunyuanDiT
channels:
  - pytorch
  - nvidia
dependencies:
  - python=3.8.12
  - pytorch=1.13.1
  - pip
2.2 virtualenv setup
The conda commands above cannot be used on the Baidu AI Studio platform, so this article explores creating and running the environment with the virtualenv package instead.
Testing shows the project deploys and runs normally under Python 3.10.
The virtualenv setup commands:
git clone https://github.com/tencent/HunyuanDiT
cd HunyuanDiT
pip install -U virtualenv
python -m virtualenv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.1.2.post3
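Before pulling 45 GB of weights, it is worth a quick sanity check that the fresh venv works and that CUDA is visible. A minimal check, assuming requirements.txt pulled in PyTorch (save it as, say, check_env.py, a hypothetical filename, and run it inside the activated venv):
import torch

# Confirm the venv's interpreter sees PyTorch and the GPU
print("torch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of GPU 0, e.g. "Tesla V100-SXM2-32GB"
    print(torch.cuda.get_device_name(0))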
3. Downloading the Model
The model is large, 45 GB in total, and needs to be downloaded with the huggingface_hub[cli] tool.
The project's other Python files expect the model under HunyuanDiT/ckpts by default; to avoid "model not found" problems, make sure the weights are downloaded into HunyuanDiT/ckpts.
Download commands:
python -m pip install "huggingface_hub[cli]"
cd ~/HunyuanDiT
mkdir ckpts
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Tencent-Hunyuan/HunyuanDiT --local-dir ./ckpts
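If huggingface-cli is interrupted, re-running the same command resumes the download. The same download can also be driven from Python with huggingface_hub's snapshot_download; a minimal sketch (note that HF_ENDPOINT must be set before huggingface_hub is imported for the mirror to take effect):
import os

# Route through the mirror, same as the shell export above.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

from huggingface_hub import snapshot_download

# Download (and resume, if interrupted) into HunyuanDiT/ckpts
snapshot_download(repo_id="Tencent-Hunyuan/HunyuanDiT", local_dir="./ckpts")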
Model inventory (obtained via cd ~/HunyuanDiT/ckpts && du -ah):
2.0K ./.gitattributes
16K ./README.md
1.0K ./dialoggen/openai/clip-vit-large-patch14-336/tokenizer_config.json
513K ./dialoggen/openai/clip-vit-large-patch14-336/merges.txt
5.0K ./dialoggen/openai/clip-vit-large-patch14-336/config.json
512 ./dialoggen/openai/clip-vit-large-patch14-336/special_tokens_map.json
2.2M ./dialoggen/openai/clip-vit-large-patch14-336/tokenizer.json
1.6G ./dialoggen/openai/clip-vit-large-patch14-336/tf_model.h5
843K ./dialoggen/openai/clip-vit-large-patch14-336/vocab.json
512 ./dialoggen/openai/clip-vit-large-patch14-336/preprocessor_config.json
1.5K ./dialoggen/openai/clip-vit-large-patch14-336/README.md
1.6G ./dialoggen/openai/clip-vit-large-patch14-336/pytorch_model.bin
3.2G ./dialoggen/openai/clip-vit-large-patch14-336
3.2G ./dialoggen/openai
4.7G ./dialoggen/model-00001-of-00004.safetensors
4.7G ./dialoggen/model-00002-of-00004.safetensors
512 ./dialoggen/generation_config.json
251M ./dialoggen/model-00004-of-00004.safetensors
1.0K ./dialoggen/special_tokens_map.json
4.6G ./dialoggen/model-00003-of-00004.safetensors
72K ./dialoggen/model.safetensors.index.json
2.0K ./dialoggen/config.json
1.5K ./dialoggen/tokenizer_config.json
482K ./dialoggen/tokenizer.model
18G ./dialoggen
22K ./Notice
291K ./asset/mllm.png
500K ./asset/radar.png
5.0M ./asset/long text understanding.png
356K ./asset/framework.png
72K ./asset/logo.png
512 ./asset/chinese elements understanding.png
123K ./asset/cover.png
6.3M ./asset
2.9G ./t2i/model/pytorch_model_module.pt
5.7G ./t2i/model/pytorch_model_ema.pt
8.5G ./t2i/model
512 ./t2i/tokenizer/special_tokens_map.json
1.0K ./t2i/tokenizer/tokenizer_config.json
310K ./t2i/tokenizer/vocab.txt
107K ./t2i/tokenizer/vocab_org.txt
422K ./t2i/tokenizer
3.7G ./t2i/clip_text_encoder/pytorch_model.bin
1.0K ./t2i/clip_text_encoder/config.json
3.7G ./t2i/clip_text_encoder
1.0K ./t2i/sdxl-vae-fp16-fix/config.json
320M ./t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.bin
320M ./t2i/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors
639M ./t2i/sdxl-vae-fp16-fix
14G ./t2i/mt5/pytorch_model.bin
512 ./t2i/mt5/tokenizer_config.json
512 ./t2i/mt5/generation_config.json
4.2M ./t2i/mt5/spiece.model
512 ./t2i/mt5/special_tokens_map.json
3.0K ./t2i/mt5/README.md
1.0K ./t2i/mt5/config.json
14G ./t2i/mt5
27G ./t2i
15K ./LICENSE.txt
45G .
4. Running the Model
4.1 Activating the virtual environment
Activate the conda environment:
cd HunyuanDiT
conda activate HunyuanDiT
Activate the venv environment:
cd HunyuanDiT
source venv/bin/activate
4.2 Running the model
The official project offers two ways to run: a Gradio interactive UI or command-line mode.
Gradio interactive UI:
# By default, we start a Chinese UI.
python app/hydit_app.py
# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa
# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance
# Start with English UI
python app/hydit_app.py --lang en
# Start a multi-turn T2I generation UI.
# If your GPU memory is less than 32GB, use '--load-4bit' to enable 4-bit quantization, which requires at least 22GB of memory.
python app/multiTurnT2I_app.py
Command-line mode:
# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "渔舟唱晚"
# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "渔舟唱晚" --no-enhance
# Only Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "渔舟唱晚"
# Generate an image with other image sizes.
python sample_t2i.py --prompt "渔舟唱晚" --image-size 1280 768
# Prompt Enhancement + Text-to-Image. DialogGen loads with 4-bit quantization, which may degrade quality.
python sample_t2i.py --prompt "渔舟唱晚" --load-4bit
5. Some Practical Notes
1. The model needs more than 16 GB of system RAM, mainly while the mt5 model is loading; if RAM runs out, the process is killed automatically.
2. GPU memory usage (on a 32 GB V100) exceeds 24 GB and peaks near 30 GB; this may be specific to the V100. Both figures can be watched during loading, as shown below.
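Something like the following can be left running in a second terminal (standard nvidia-smi query flags):
# Print used/total GPU memory once per second
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
# One-shot view of system RAM
free -h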
3. In the downloaded weights, the path on line 50 of HunyuanDiT/ckpts/dialoggen/config.json must be edited by hand; otherwise the app fails to find the openai model at startup and re-downloads it.
Change line 50 to "mm_vision_tower": "/home/aistudio/HunyuanDiT/ckpts/dialoggen/openai/clip-vit-large-patch14-336"
Here /home/aistudio should be the home directory of the user on the system you are actually running.
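Rather than editing the file by hand, the path can be patched with a few lines of Python; a minimal sketch, assuming ckpts sits under the home directory as above:
import json, os

# Rewrite mm_vision_tower to point at the locally downloaded CLIP weights.
cfg_path = os.path.expanduser("~/HunyuanDiT/ckpts/dialoggen/config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["mm_vision_tower"] = os.path.expanduser(
    "~/HunyuanDiT/ckpts/dialoggen/openai/clip-vit-large-patch14-336"
)
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)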
The modified config.json:
{
  "_name_or_path": "./",
  "architectures": [
    "LlavaMistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "freeze_mm_mlp_adapter": false,
  "freeze_mm_vision_resampler": false,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "image_aspect_ratio": "anyres",
  "image_crop_resolution": 224,
  "image_grid_pinpoints": [
    [336, 672],
    [672, 336],
    [672, 672],
    [1008, 336],
    [336, 1008]
  ],
  "image_split_resolution": 224,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "mm_hidden_size": 1024,
  "mm_patch_merge_type": "spatial_unpad",
  "mm_projector_lr": null,
  "mm_projector_type": "mlp2x_gelu",
  "mm_resampler_type": null,
  "mm_use_im_patch_token": false,
  "mm_use_im_start_end": false,
  "mm_vision_select_feature": "patch",
  "mm_vision_select_layer": -2,
  "mm_vision_tower": "/home/aistudio/HunyuanDiT/ckpts/dialoggen/openai/clip-vit-large-patch14-336",
  "mm_vision_tower_lr": 2e-06,
  "model_type": "llava_mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "tokenizer_model_max_length": 4096,
  "tokenizer_padding_side": "left",
  "torch_dtype": "float16",
  "transformers_version": "4.37.2",
  "tune_mm_mlp_adapter": false,
  "tune_mm_vision_resampler": false,
  "unfreeze_mm_vision_tower": true,
  "use_cache": true,
  "use_mm_proj": true,
  "vocab_size": 32000
}
4. Fixing the two warnings emitted while loading the mt5 model (optional, since they are only warnings, but loading mt5 seems slightly faster after the change).
Change line 28 of HunyuanDiT/hydit/modules/text_encoder.py to:
self.tokenizer = AutoTokenizer.from_pretrained(model_dir, legacy=False, use_fast=False)
That is, add the legacy=False and use_fast=False arguments: legacy=False opts into the new T5 tokenizer behavior and silences the legacy-behavior warning, while use_fast=False loads the slow SentencePiece tokenizer directly instead of converting it to a fast one.
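To confirm both warnings are gone, the tokenizer can be loaded in isolation; a quick check, assuming the weights sit under ckpts/t2i/mt5 as in the inventory above:
from transformers import AutoTokenizer

# Slow SentencePiece tokenizer with the new (non-legacy) T5 behavior;
# neither warning should be printed.
tok = AutoTokenizer.from_pretrained("ckpts/t2i/mt5", legacy=False, use_fast=False)
print(tok("渔舟唱晚").input_ids)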
The modified text_encoder.py:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, T5EncoderModel, T5ForConditionalGeneration


class MT5Embedder(nn.Module):
    available_models = ["t5-v1_1-xxl"]

    def __init__(
        self,
        model_dir="t5-v1_1-xxl",
        model_kwargs=None,
        torch_dtype=None,
        use_tokenizer_only=False,
        conditional_generation=False,
        max_length=128,
    ):
        super().__init__()
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.torch_dtype = torch_dtype or torch.bfloat16
        self.max_length = max_length
        if model_kwargs is None:
            model_kwargs = {
                # "low_cpu_mem_usage": True,
                "torch_dtype": self.torch_dtype,
            }
        model_kwargs["device_map"] = {"shared": self.device, "encoder": self.device}
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir, legacy=False, use_fast=False)
        if use_tokenizer_only:
            return
        if conditional_generation:
            self.model = None
            self.generation_model = T5ForConditionalGeneration.from_pretrained(
                model_dir
            )
            return
        self.model = T5EncoderModel.from_pretrained(model_dir, **model_kwargs).eval().to(self.torch_dtype)

    def get_tokens_and_mask(self, texts):
        text_tokens_and_mask = self.tokenizer(
            texts,
            max_length=self.max_length,
            padding="max_length",
            truncation=True,
            return_attention_mask=True,
            add_special_tokens=True,
            return_tensors="pt",
        )
        tokens = text_tokens_and_mask["input_ids"][0]
        mask = text_tokens_and_mask["attention_mask"][0]
        # tokens = torch.tensor(tokens).clone().detach()
        # mask = torch.tensor(mask, dtype=torch.bool).clone().detach()
        return tokens, mask

    def get_text_embeddings(self, texts, attention_mask=True, layer_index=-1):
        text_tokens_and_mask = self.tokenizer(
            texts,
            max_length=self.max_length,
            padding="max_length",
            truncation=True,
            return_attention_mask=True,
            add_special_tokens=True,
            return_tensors="pt",
        )
        with torch.no_grad():
            outputs = self.model(
                input_ids=text_tokens_and_mask["input_ids"].to(self.device),
                attention_mask=text_tokens_and_mask["attention_mask"].to(self.device)
                if attention_mask
                else None,
                output_hidden_states=True,
            )
            text_encoder_embs = outputs["hidden_states"][layer_index].detach()
        return text_encoder_embs, text_tokens_and_mask["attention_mask"].to(self.device)

    @torch.no_grad()
    def __call__(self, tokens, attention_mask, layer_index=-1):
        with torch.cuda.amp.autocast():
            outputs = self.model(
                input_ids=tokens,
                attention_mask=attention_mask,
                output_hidden_states=True,
            )
            z = outputs.hidden_states[layer_index].detach()
        return z

    def general(self, text: str):
        # input_ids = input_ids = torch.tensor([list(text.encode("utf-8"))]) + num_special_tokens
        input_ids = self.tokenizer(text, max_length=128).input_ids
        print(input_ids)
        outputs = self.generation_model(input_ids)
        return outputs