A Beginner's Guide to HunyuanCustom

Original article. First published 2025-05-09 15:45:33; last modified 2025-05-09 16:45:39.

License: CC 4.0 BY-SA

Tags: AI tools, HunyuanCustom

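Hello everyone, I'm Mr. Lu (鹿先森), a developer who writes code by day and explores AI by night, with the goal of becoming an early mover in the AI field. If you are interested in AI, follow me for free, up-to-date AI material, and let's explore AI together.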

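Still grinding through manual video editing in the AI era? Tencent has open-sourced HunyuanCustom, billed here as a 10x efficiency boost for video creation. Beginner tutorial included.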

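1. Introduction to HunyuanCustom

HunyuanCustom is a multimodal customized video generation framework from Tencent. It preserves subject consistency and accepts image, audio, video, and text conditions as input. Its main features are:

Generating videos that contain a specific subject from one or more reference images
Generating talking-person videos driven by audio
Taking a video as input and replacing a specific object in it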

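2. Hardware and Software Requirements

Hardware requirements:

Minimum: an NVIDIA GPU with at least 24GB of VRAM (generation is slow)

Recommended: a GPU with 80GB of VRAM for better generation quality

Software requirements:

Operating system: Linux (tested)

CUDA: version 12.4 or 11.8 recommended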
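Before installing anything, it can help to check which tier your GPU falls into. The following is a minimal sketch, not part of the project: it assumes `nvidia-smi` is on your PATH and uses its standard `--query-gpu=memory.total` query. The helper names (`parse_vram_mib`, `classify_gpu`, `query_gpus`) are made up for this example.

```python
import shutil
import subprocess

MIN_GB = 24          # minimum VRAM from the requirements above
RECOMMENDED_GB = 80  # recommended VRAM from the requirements above


def parse_vram_mib(smi_output: str) -> list[int]:
    """Parse one memory.total value (in MiB) per non-empty line."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def classify_gpu(vram_mib: int) -> str:
    """Map a reported VRAM size to a tier, rounding to whole GiB.

    Rounding is used because a nominal 24GB card may report
    slightly less than 24 * 1024 MiB.
    """
    gib = round(vram_mib / 1024)
    if gib >= RECOMMENDED_GB:
        return "recommended"
    if gib >= MIN_GB:
        return "minimum"
    return "insufficient"


def query_gpus() -> list[int]:
    """Return VRAM per GPU in MiB, or an empty list if it cannot be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    try:
        return parse_vram_mib(result.stdout)
    except ValueError:
        # Unexpected output such as "[N/A]"; treat it as unknown.
        return []


if __name__ == "__main__":
    for index, mib in enumerate(query_gpus()):
        print(f"GPU {index}: {mib} MiB -> {classify_gpu(mib)}")
```

If the script prints nothing, no NVIDIA GPU was detected, or `nvidia-smi` is missing.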

3. Installation

Step 1: Clone the repository

git clone https://github.com/Tencent/HunyuanCustom.git
cd HunyuanCustom

Step 2: Create and activate the Conda environment

# 1. Create the conda environment
conda create -n HunyuanCustom python==3.10.9

# 2. Activate the environment
conda activate HunyuanCustom

Step 3: Install PyTorch and the other dependencies

# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# Or for CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install the pip dependencies
python -m pip install -r requirements.txt
python -m pip install tensorrt-cu12-bindings==10.6.0 tensorrt-cu12-libs==10.6.0

# Install flash attention v2 for acceleration (requires CUDA 11.8 or later)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
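Once the installation commands finish, a quick sanity check can confirm that the key packages are importable and that PyTorch sees CUDA. This is a small sketch rather than anything shipped with the project. The module list is an assumption based on the install commands above (`flash_attn` is the import name of the flash-attention package), and `find_missing` and `report` are names invented for the example.

```python
import importlib.util

# Import names for the packages installed above (assumed list for this sketch).
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "flash_attn"]


def find_missing(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report() -> None:
    """Print missing modules, or PyTorch/CUDA details when everything is found."""
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        return
    import torch  # only imported once we know it is installed
    print("PyTorch version:", torch.__version__)
    print("CUDA build:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())


if __name__ == "__main__":
    report()
```

Run it inside the activated HunyuanCustom environment. If "CUDA available" prints False, the PyTorch build and your driver or CUDA version probably do not match.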

4. Download the Pretrained Models

Step 1: Install huggingface-cli

python -m pip install "huggingface_hub[cli]"

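Step 2: Download the models

# Switch to the HunyuanCustom directory
cd HunyuanCustom
# Use the huggingface-cli tool to download the models into the models directory
# Download time depends on your network and ranges from about 10 minutes to 1 hour
huggingface-cli download tencent/HunyuanCustom --local-dir ./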
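If you prefer to trigger the download from Python, for example inside a setup script, the `huggingface_hub` package installed in step 1 exposes `snapshot_download`. The sketch below mirrors the CLI command above under that assumption. The wrapper name `download_models` is made up for this example, and nothing is downloaded until you call it.

```python
def download_models(repo_id: str = "tencent/HunyuanCustom", local_dir: str = "./") -> str:
    """Download the model repository and return the local path.

    Mirrors: huggingface-cli download tencent/HunyuanCustom --local-dir ./
    """
    # Imported here so this file can be loaded before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_models()` from the HunyuanCustom directory should have the same effect as the CLI command; an interrupted download can simply be run again.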

After the download finishes, the models are stored under HunyuanCustom/models with the following structure:

HunyuanCustom
  ├──models
  │  ├──README.md
  │  ├──hunyuancustom_720P
  │  │  ├──mp_rank_00_model_states.pt
  │  │  ├──mp_rank_00_model_states_fp8.pt
  │  │  ├──mp_rank_00_model_states_fp8_map.pt
  │  ├──vae_3d
  │  ├──openai_clip-vit-large-patch14
  │  ├──llava-llama-3-8b-v1_1
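A partial download is a common cause of confusing errors later on, so it may be worth checking the layout before running inference. The sketch below compares a models directory against the entries in the tree above. The function name `missing_entries` and the default `./models` path are assumptions made for this example.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the models directory.
EXPECTED_ENTRIES = [
    "hunyuancustom_720P/mp_rank_00_model_states.pt",
    "hunyuancustom_720P/mp_rank_00_model_states_fp8.pt",
    "hunyuancustom_720P/mp_rank_00_model_states_fp8_map.pt",
    "vae_3d",
    "openai_clip-vit-large-patch14",
    "llava-llama-3-8b-v1_1",
]


def missing_entries(model_base: str) -> list[str]:
    """Return the expected files or folders that do not exist under model_base."""
    base = Path(model_base)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries("./models")
    if missing:
        print("Missing:", missing)
    else:
        print("All expected entries found.")
```

The check only looks at names, not at file sizes or checksums, so a truncated file would still pass.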

5. Run Inference

Option 1: Run on a single GPU (good for getting started)

cd HunyuanCustom

export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 512 896 \
    --seed 1024 \
    --sample-n-frames 129 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/1gpu_540p' \
    --use-fp8

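Option 2: Run with very low VRAM (CPU offload)

If your GPU has very limited VRAM, use the following command, which offloads part of the work to the CPU: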

cd HunyuanCustom

export MODEL_BASE="./models"
export CPU_OFFLOAD=1
export PYTHONPATH=./
python hymm_sp/sample_gpu_poor.py \
    --input './assets/images/seg_woman_01.png' \
    --pos-prompt "Realistic, High-quality. A woman is drinking coffee at a café." \
    --neg-prompt "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion, blurring, text, subtitles, static, picture, black border." \
    --ckpt ${MODEL_BASE}"/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt" \
    --video-size 720 1280 \
    --seed 1024 \
    --sample-n-frames 129 \
    --infer-steps 30 \
    --flow-shift-eval-video 13.0 \
    --save-path './results/cpu_720p' \
    --use-fp8 \
    --cpu-offload
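The two commands above differ only in the video size, the save path, and the `--cpu-offload` flag. If you plan to run many generations, a small wrapper can make that explicit. The following is a sketch under the assumption that the flags behave exactly as in the commands above. `build_command` and `run` are hypothetical helpers, not part of HunyuanCustom, and the two `--video-size` numbers are passed through in the order given without interpreting them.

```python
import os
import subprocess


def build_command(pos_prompt, neg_prompt, image, video_size, save_path,
                  model_base="./models", cpu_offload=False,
                  seed=1024, frames=129, steps=30):
    """Build the argument list for hymm_sp/sample_gpu_poor.py."""
    cmd = [
        "python", "hymm_sp/sample_gpu_poor.py",
        "--input", image,
        "--pos-prompt", pos_prompt,
        "--neg-prompt", neg_prompt,
        "--ckpt", f"{model_base}/hunyuancustom_720P/mp_rank_00_model_states_fp8.pt",
        "--video-size", str(video_size[0]), str(video_size[1]),
        "--seed", str(seed),
        "--sample-n-frames", str(frames),
        "--infer-steps", str(steps),
        "--flow-shift-eval-video", "13.0",
        "--save-path", save_path,
        "--use-fp8",
    ]
    if cpu_offload:
        # Only the low-VRAM command above adds this flag.
        cmd.append("--cpu-offload")
    return cmd


def run(cmd, model_base="./models"):
    """Run the command with the same environment variables the shell commands export."""
    env = dict(os.environ, MODEL_BASE=model_base, CPU_OFFLOAD="1", PYTHONPATH="./")
    subprocess.run(cmd, env=env, check=True)
```

For example, `run(build_command("Realistic, High-quality. A woman is drinking coffee at a café.", "low quality, blurring", "./assets/images/seg_woman_01.png", (720, 1280), "./results/cpu_720p", cpu_offload=True))` would correspond to the low-VRAM command with a shortened negative prompt. It must be started from the HunyuanCustom directory.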

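Option 3: Use the Gradio interface (the most beginner-friendly)

Gradio provides a simple web interface where you can upload a reference image, set prompts, and generate a video:

cd HunyuanCustom
python hymm_gradio/gradio_id.py

In the Gradio interface, you can:

Upload a reference image
Enter a descriptive prompt
Adjust the video size, frame count, and other parameters
Click the "Generate" button to generate the video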

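Practical Tips for Beginners

Make sure your hardware meets the requirements: before you try to run anything, confirm that your GPU has at least 24GB of VRAM. If you have less, use the CPU offload mode.

Start simple: try the single-GPU mode first, and move on to the more complex features only after the basics work.

Use the Gradio interface: it is a friendlier way to work for anyone who is not comfortable with the command line.

Use the bundled examples: the assets directory in the project contains sample images, so try generating with those first.

The currently open-sourced part of HunyuanCustom is mainly the inference code and pretrained models for single-subject video customization; other modes such as audio-driven and video-driven generation are planned for future release. If you hit a floating point exception (core dump) on certain GPU types, see the solution in the README. The project also provides a Docker image for users who would rather not set up the environment by hand.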
