Deep Learning Series 66: Getting Started with the IDM-VTON Virtual Try-On Model

Article link: https://blog.csdn.net/kittyzc/article/details/138205119

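This article introduces the structure of the IDM-VTON model, including the high-level semantic branch (IP-Adapter) and the low-level semantic network (GarmentNet), and shows how to get started quickly with the HuggingFace demo. The steps cover downloading the pretrained models and setting up the environment to run virtual try-on.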


1. Model overview

[Figure: IDM-VTON overall pipeline]
As the figure shows, the overall pipeline is:

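Inputs: the garment encoding xg; the encoding xt of the person image plus noise; and the mask of the clothing worn by the person together with the human pose map (DensePose).
The garment goes through two networks: 1) a high-level semantic branch, IP-Adapter, which encodes the garment image with an image encoder such as CLIP; 2) a low-level semantic network called GarmentNet, a UNet that extracts low-level detail features such as texture and patterns.
The person goes through TryonNet, which is also a UNet. Its features are concatenated with the GarmentNet features from the same layer and fed into a self-attention layer. Only the left half of the output (the part that corresponds to TryonNet) is kept, and it then goes through cross-attention with the IP-Adapter output and the text encoding.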
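To make the attention flow concrete, here is a toy sketch in plain Python. It is not the IDM-VTON implementation: there are no learned projections, the token counts and feature dimension are arbitrary, the names fuse_block and ip_scale are hypothetical, and summing the text and image cross-attention outputs is an assumption borrowed from IP-Adapter's decoupled cross-attention.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of token vectors.

    Illustration only: no learned Q/K/V projections, single head.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def fuse_block(tryon_tokens, garment_tokens, text_tokens, image_tokens, ip_scale=1.0):
    """Hypothetical fusion step following the description above."""
    # 1) Concatenate TryonNet and GarmentNet features along the token axis.
    joint = tryon_tokens + garment_tokens
    # 2) Self-attention over the concatenated sequence.
    joint = attention(joint, joint, joint)
    # 3) Keep only the half that corresponds to TryonNet.
    person = joint[:len(tryon_tokens)]
    # 4) Cross-attention with the text tokens and the IP-Adapter image tokens.
    #    Adding the two results is an assumption (decoupled cross-attention).
    from_text = attention(person, text_tokens, text_tokens)
    from_image = attention(person, image_tokens, image_tokens)
    return [
        [t + ip_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(from_text, from_image)
    ]


def random_tokens(rng, n, dim):
    """Random stand-in features; real features would come from the UNets."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 8
tryon = random_tokens(rng, 16, dim)    # person features (TryonNet)
garment = random_tokens(rng, 16, dim)  # garment features (GarmentNet)
text = random_tokens(rng, 5, dim)      # text encoder tokens
image = random_tokens(rng, 4, dim)     # IP-Adapter image tokens

out = fuse_block(tryon, garment, text, image)
print(len(out), len(out[0]))  # 16 8
```

The output has as many tokens as the TryonNet input, which is the point of discarding the garment half after self-attention: the garment features influence the person features but are not carried forward themselves.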

Project page: https://idm-vton.github.io/
The comparison with other models is shown below:
[Figure: comparison of results from different models]

2. Quick start

You can try the model on the HuggingFace demo: https://hf-mirror.com/spaces/yisol/IDM-VTON
To run it yourself, follow https://github.com/camenduru/IDM-VTON-jupyter/blob/main/IDM_VTON_jupyter.ipynb and execute:

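# The download paths below assume the repository is cloned into /content (as on Colab).
git clone https://hub.nuaa.cf/camenduru/IDM-VTON-hf
cd IDM-VTON-hf
# aria2 provides the aria2c downloader used below (run apt as root or via sudo).
apt -y install -qq aria2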
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/densepose/model_final_162be9.pkl -d /content/IDM-VTON-hf/ckpt/densepose -o model_final_162be9.pkl
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/humanparsing/parsing_atr.onnx -d /content/IDM-VTON-hf/ckpt/humanparsing -o parsing_atr.onnx
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/humanparsing/parsing_lip.onnx -d /content/IDM-VTON-hf/ckpt/humanparsing -o parsing_lip.onnx
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/openpose/ckpts/body_pose_model.pth -d /content/IDM-VTON-hf/ckpt/openpose/ckpts -o body_pose_model.pth
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/IDM-VTON-DC/unet/diffusion_pytorch_model.bin -d /content/IDM-VTON-hf/ckpt/openpose/ckpts/unet -o diffusion_pytorch_model.bin
aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://hf-mirror.com/camenduru/IDM-VTON/resolve/main/IDM-VTON-DC/unet/config.json -d /content/IDM-VTON-hf/ckpt/openpose/ckpts/unet -o config.json
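
Before launching the app it can help to confirm that every download actually landed. The following is a minimal sketch, not part of the referenced notebook: the helper name missing_checkpoints is hypothetical, and the file list simply mirrors the -d and -o arguments of the aria2c commands above, relative to the clone directory.

```python
from pathlib import Path

# Relative paths taken from the aria2c commands above (-d directory plus -o file name).
EXPECTED_FILES = [
    "ckpt/densepose/model_final_162be9.pkl",
    "ckpt/humanparsing/parsing_atr.onnx",
    "ckpt/humanparsing/parsing_lip.onnx",
    "ckpt/openpose/ckpts/body_pose_model.pth",
    "ckpt/openpose/ckpts/unet/diffusion_pytorch_model.bin",
    "ckpt/openpose/ckpts/unet/config.json",
]


def missing_checkpoints(repo_root):
    """Return the expected files that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        # An interrupted download can leave a zero-byte file, so check the size too.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # /content/IDM-VTON-hf is the clone location that the commands above assume.
    for rel in missing_checkpoints("/content/IDM-VTON-hf"):
        print("missing:", rel)
```

If the script prints nothing, all six files are present and non-empty. Because aria2c is invoked with -c, rerunning the corresponding command resumes a partial download.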

pip install -q diffusers==0.25.0 accelerate==0.26.1 einops==0.7.0 onnxruntime==1.16.2 cloudpickle omegaconf gradio==4.24.0 fvcore av config spaces -i https://pypi.tuna.tsinghua.edu.cn/simple
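
Since several packages are pinned to exact versions, a quick check that the environment matches can save debugging time. This is a small sketch using the standard library's importlib.metadata. The helper name check_pins is hypothetical, and the pin table just repeats the versions from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pins copied from the pip install command above.
PINNED = {
    "diffusers": "0.25.0",
    "accelerate": "0.26.1",
    "einops": "0.7.0",
    "onnxruntime": "1.16.2",
    "gradio": "4.24.0",
}


def check_pins(pins):
    """Map each package name to (expected, installed); installed is None if absent."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```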

Then run python app.py to launch the app.
The downloaded models can also be replaced with the F16 versions; see https://hf-mirror.com/camenduru/IDM-VTON-F16/tree/main