Reference: the Moore-AnimateAnyone Git repository
# Pull the Docker image
docker pull pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime
# Start the Docker container
docker run -itd -v /root/scratch/Moore-AnimateAnyone:/workspace 5dba57
pip install
Pitfall 1: importing cv2 fails because libGL is missing
Traceback (most recent call last):
File "/workspace/tools/extract_dwpose_from_vid.py", line 10, in <module>
from src.dwpose import DWposeDetector
File "/workspace/src/dwpose/__init__.py", line 12, in <module>
import cv2
File "/opt/conda/lib/python3.11/site-packages/cv2/__init__.py", line 181, in <module>
File "/opt/conda/lib/python3.11/site-packages/cv2/__init__.py", line 153, in bootstrap
native_module = importlib.import_module("cv2")
File "/opt/conda/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
apt-get update
apt-get install libgl1
apt-get install libglib2.0-0
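Before rerunning the extraction script, it can help to confirm that the loader can now open the libraries. The following is a minimal sketch using the standard `ctypes` module; the helper name `missing_shared_libs` is illustrative, and `libgthread-2.0.so.0` is an assumption about which library the `libglib2.0-0` package supplies for OpenCV.

```python
import ctypes


def missing_shared_libs(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library is not found
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both libraries are loadable.
    print(missing_shared_libs(["libGL.so.1", "libgthread-2.0.so.0"]))
```

If the printed list still contains `libGL.so.1`, the `apt-get install libgl1` step probably did not complete inside this container.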
Pitfall 2: the diffusers module is missing
ModuleNotFoundError: No module named 'diffusers'
pip install diffusers==0.24.0
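To verify that the pinned version is the one actually installed, a small check with the standard `importlib.metadata` module may be enough. This is a sketch; the helper name `check_pins` is hypothetical, and the only pin taken from these notes is `diffusers==0.24.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Map each package whose installed version differs from its pin
    to a tuple (expected, found); found is None if not installed."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    # An empty dict means every pin is satisfied.
    print(check_pins({"diffusers": "0.24.0"}))
```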
Pitfall 3: no GPU is attached to the container
The container must be started with docker run --gpus all.
docker run --gpus all -itd -v /root/scratch/Moore-AnimateAnyone:/workspace 5dba57
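A quick way to confirm from inside the container that the GPUs are visible is to query `nvidia-smi`. The sketch below assumes `nvidia-smi` is injected into the container by the NVIDIA runtime when `--gpus all` is used; the function name `visible_gpus` is illustrative. `torch.cuda.is_available()` gives an equivalent check from PyTorch.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # nvidia-smi is absent, so no GPU runtime was attached
    result = subprocess.run(
        [exe, "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # the tool exists but cannot reach a driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print(visible_gpus())
```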
Pitfall 4: the training job fails
IndexError: The shape of the mask [0] at index 0 does not match the shape of the indexed tensor [1, 9216, 320] at index 0
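One suggested fix is to edit stage1.py (this requires import copy at the top of the file) and change
reference_unet = ori_net.reference_unet
denoising_unet = ori_net.denoising_unet
to
reference_unet = copy.deepcopy(ori_net.reference_unet)
denoising_unet = copy.deepcopy(ori_net.denoising_unet)
Note that the deep copies may cause an out-of-memory (OOM) error if VRAM is insufficient.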
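The difference between plain assignment and `copy.deepcopy`, and why the latter costs extra memory, can be seen in a small standalone sketch. The `Net` class here is a hypothetical stand-in with plain dictionaries instead of real networks; it is not the repository's model class.

```python
import copy


class Net:
    """Hypothetical stand-in for a container holding two sub-networks."""

    def __init__(self):
        self.reference_unet = {"weight": [1.0, 2.0]}
        self.denoising_unet = {"weight": [3.0, 4.0]}


ori_net = Net()

alias = ori_net.reference_unet                 # same object, no extra memory
clone = copy.deepcopy(ori_net.reference_unet)  # independent copy, extra memory

ori_net.reference_unet["weight"][0] = 99.0     # mutate the original

print(alias["weight"][0])  # 99.0, the alias sees the change
print(clone["weight"][0])  # 1.0, the deep copy does not
```

With real models, the deep copy duplicates every parameter tensor, which is where the additional VRAM usage comes from.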
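A commenter on the thread explained that the real cause is the batch size: the code author did not set drop_last=True when creating the DataLoader, yet the code assumes every batch is full. If the last batch is not full, the error above is raised. Setting drop_last=True avoids this:
train_dataloader = torch.utils.data.DataLoader(
    train_dataset, batch_size=cfg.data.train_bs, shuffle=True, num_workers=4, drop_last=True
)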
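The effect of `drop_last` on batch sizes can be illustrated without PyTorch. The sketch below is pure Python that mirrors the documented DataLoader behaviour (a trailing partial batch is kept unless `drop_last=True`); the function name `batch_sizes` and the sample numbers are illustrative.

```python
def batch_sizes(num_samples, batch_size, drop_last):
    """Return the size of each batch an epoch would produce."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the short final batch that breaks the code
    return sizes


print(batch_sizes(10, 4, drop_last=False))  # [4, 4, 2]
print(batch_sizes(10, 4, drop_last=True))   # [4, 4]
```

Code that hard-codes the batch dimension works on the first two batches and fails on the third unless the short batch is dropped.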
Pitfall 5: the job fails due to insufficient shared memory
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
# Check the shared memory available on the host
df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 237G 0 237G 0% /dev/shm
# Set the Docker shared memory size
# docker run -it --shm-size=256m ubuntu /bin/bash
docker run -itd --gpus all --shm-size 100g -v /root/scratch/Moore-AnimateAnyone:/workspace 5dba57
Filesystem Size Used Avail Use% Mounted on
overlay 113G 87G 27G 77% /
tmpfs 64M 0 64M 0% /dev
shm 100G 0 100G 0% /dev/shm
/dev/sdb 5.5T 108G 5.4T 2% /workspace
/dev/sda1 113G 87G 27G 77% /etc/hosts
tmpfs 237G 12K 237G 1% /proc/driver/nvidia
tmpfs 48G 4.0M 48G 1% /run/nvidia-persistenced/socket
tmpfs 237G 0 237G 0% /proc/acpi
tmpfs 237G 0 237G 0% /proc/scsi
tmpfs 237G 0 237G 0% /sys/firmware
tmpfs 237G 0 237G 0% /sys/devices/virtual/powercap
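The same check can be done from Python before launching training, so that a too-small /dev/shm is caught early instead of surfacing as a bus error in a DataLoader worker. This is a sketch using the standard `shutil.disk_usage`; the helper names and the 16 GiB threshold in the usage line are assumptions, not values from the repository.

```python
import shutil


def free_gib(path="/dev/shm"):
    """Return the free space at `path` in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def shm_is_sufficient(required_gib, path="/dev/shm"):
    """True if at least `required_gib` GiB is free at `path`."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # 16 GiB is an arbitrary example threshold.
    print(free_gib(), shm_is_sufficient(16))
```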
Pitfall 6: train_stage_2 fails
ValueError: Unexpected keyword arguments: encoder_hidden_states,timestep,attention_mask,video_length,self_attention_additional_feats,mode
The issue can be fixed either by changing the torch version or by setting gradient_checkpointing to False in stage2.yaml.
One thing I tried was a different version of torch, which seemed to fix the issue once:
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
Otherwise, your best bet is to turn off gradient checkpointing. I am unsure why exactly; we ran into the same issue when we turned on gradient checkpointing in stage1.yaml too.
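Turning the flag off can also be scripted. The sketch below rewrites the value with a regular expression so that the rest of the file, including comments, is left untouched. The function name is illustrative, and the sample YAML (the `solver` and `max_train_steps` keys) is a hypothetical fragment, not the actual content of stage2.yaml.

```python
import re


def disable_gradient_checkpointing(yaml_text):
    """Set every `gradient_checkpointing:` entry in the text to False."""
    pattern = re.compile(
        r"^([ \t]*gradient_checkpointing[ \t]*:[ \t]*)\S+", re.MULTILINE
    )
    return pattern.sub(r"\g<1>False", yaml_text)


# Hypothetical config fragment, used only to demonstrate the rewrite.
sample = "solver:\n  gradient_checkpointing: True\n  max_train_steps: 10000\n"
print(disable_gradient_checkpointing(sample))
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back.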
accelerate 0.33.0
anaconda-anon-usage 0.4.4
antlr4-python3-runtime 4.9.3
archspec 0.2.3
asttokens 2.0.5
astunparse 1.6.3
attrs 23.1.0
av 12.3.0
beautifulsoup4 4.12.3
boltons 23.0.0
Brotli 1.0.9
certifi 2024.7.4
cffi 1.16.0
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
coloredlogs 15.0.1
conda 24.5.0
conda-build 24.5.1
conda-content-trust 0.2.0
conda_index 0.5.0
conda-libmamba-solver 24.1.0
conda-package-handling 2.3.0
conda_package_streaming 0.10.0
contourpy 1.2.1
controlnet-aux 0.0.9
cryptography 42.0.5
cycler 0.12.1
decorator 5.1.1
diffusers 0.24.0
distro 1.9.0
dnspython 2.6.1
einops 0.8.0
executing 0.8.3
expecttest 0.2.1
filelock 3.13.1
flatbuffers 24.3.25
fonttools 4.53.1
frozendict 2.4.2
fsspec 2024.6.1
gmpy2 2.1.2
huggingface-hub 0.24.5
humanfriendly 10.0
hypothesis 6.108.4
idna 3.7
imageio 2.35.0
importlib_metadata 8.2.0
ipython 8.25.0
jedi 0.19.1
Jinja2 3.1.4
jsonpatch 1.33
jsonpointer 2.1
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
lazy_loader 0.4
libarchive-c 2.9
libmambapy 1.5.8
lintrunner 0.12.5
MarkupSafe 2.1.3
matplotlib 3.9.2
matplotlib-inline 0.1.6
menuinst 2.1.1
mkl-fft 1.3.8
mkl-random 1.2.4
mkl-service 2.4.0
more-itertools 10.1.0
mpmath 1.3.0
networkx 3.3
numpy 1.26.4
omegaconf 2.3.0
onnxruntime 1.19.0
optree 0.12.1
packaging 24.1
parso 0.8.3
pexpect 4.8.0
pillow 10.4.0
pip 24.0
pkginfo 1.10.0
platformdirs 3.10.0
pluggy 1.0.0
prompt-toolkit 3.0.43
protobuf 5.27.3
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
pycosat 0.6.6
pycparser 2.21
Pygments 2.15.1
pyparsing 3.1.2
PySocks 1.7.1
python-dateutil 2.9.0.post0
python-etcd 0.4.5
pytz 2024.1
PyYAML 6.0.1
referencing 0.30.2
regex 2024.7.24
requests 2.32.3
rpds-py 0.10.6
ruamel.yaml 0.17.21
safetensors 0.4.4
scikit-image 0.24.0
scipy 1.14.0
setuptools 69.5.1
six 1.16.0
sortedcontainers 2.4.0
soupsieve 2.5
stack-data 0.2.0
sympy 1.12
tifffile 2024.8.10
timm 0.6.7
tokenizers 0.19.1
torch 2.4.0
torchaudio 2.4.0
torchelastic 0.2.2
torchvision 0.19.0
tqdm 4.66.4
traitlets 5.14.3
transformers 4.44.0
triton 3.0.0
truststore 0.8.0
types-dataclasses 0.6.6
typing_extensions 4.11.0
urllib3 2.2.2
wcwidth 0.2.5
wheel 0.43.0
zipp 3.20.0
zstandard 0.22.0
python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 784 -L 64
python tools/extract_dwpose_from_vid.py --video_root /workspace/data/Batch_4/19489910/BV1mX4y1z75Q
root@46e99c62b9aa:/workspace# nvidia-smi
Sat Aug 17 02:52:24 2024
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
| 0 NVIDIA H100 PCIe Off | 00000000:01:00.0 Off | 0 |
| N/A 39C P0 52W / 350W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
| 1 NVIDIA H100 PCIe Off | 00000000:02:00.0 Off | 0 |
| N/A 41C P0 51W / 350W | 1MiB / 81559MiB | 0% Default |
| | | Disabled |
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| No running processes found |