Problems encountered with motion-related code

shlore

Last modified 2024-06-12 19:18:40

First published 2024-01-05 11:36:28

Article link: https://blog.csdn.net/shlore/article/details/135405706

MotionDiffuse:

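Problem: when debugging in PyCharm, single-stepping through the forward function in ddpm_trainer hangs.
Details: it is related to the multi-process DataLoader (worker processes). It hangs at: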
```
r = index_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
```
Solution: set num_workers to 0 and persistent_workers to False.
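A minimal sketch for switching between the two settings without editing the training code every time. The environment variable DEBUG_DATALOADER and the helper name loader_kwargs are hypothetical; the helper only builds keyword arguments for torch.utils.data.DataLoader, so it runs without PyTorch installed.

```python
import os


def loader_kwargs(num_workers=4, debug=None):
    """Return DataLoader worker settings, forcing single-process loading when debugging."""
    if debug is None:
        # Hypothetical switch: set DEBUG_DATALOADER=1 in the PyCharm run/debug configuration
        debug = os.environ.get("DEBUG_DATALOADER", "0") == "1"
    workers = 0 if debug else num_workers
    # DataLoader rejects persistent_workers=True when num_workers == 0,
    # so both settings are derived from the same value
    return {"num_workers": workers, "persistent_workers": workers > 0}


# Usage with PyTorch:
# from torch.utils.data import DataLoader
# loader = DataLoader(dataset, batch_size=32, shuffle=True, **loader_kwargs())
```

With num_workers=0 the batches are produced in the main process, so the debugger is never left waiting on a worker's index_queue.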

OMOMO:

~~Problem: during environment setup, `pip install -r requirements.txt` fails with:~~

ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

~~Solution: force-remove PyYAML~~

```
find /home/yifei/anaconda3/envs/omomo_env -name "*PyYAML*"
rm -r /home/yifei/anaconda3/envs/omomo_env/lib/python3.8/site-packages/PyYAML-5.1.2-py3.8.egg-info
```

Problem: after the environment is installed, running the test fails with a `No module named` error:

Traceback (most recent call last):
  File "trainer_hand_foot_manip_diffusion.py", line 36, in <module>
    from human_body_prior.body_model.body_model import BodyModel
ModuleNotFoundError: No module named 'human_body_prior.body_model'

Solution (from a junior labmate): copy the package's src folder directly into the working directory.
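A sketch of the same workaround as a script. It assumes the human_body_prior checkout keeps the package under src/human_body_prior (check your clone); the helper names and paths are hypothetical. Copying works because the directory of the script being run is first on sys.path, so `import human_body_prior.body_model` then resolves to the copied folder.

```python
import shutil
from pathlib import Path


def vendor_paths(repo_root, package, workdir="."):
    """Map <repo_root>/src/<package> to <workdir>/<package>."""
    return Path(repo_root) / "src" / package, Path(workdir) / package


def vendor_package(repo_root, package, workdir="."):
    """Copy a package's source tree next to the entry script (dirs_exist_ok needs Python 3.8+)."""
    src, dst = vendor_paths(repo_root, package, workdir)
    if not src.is_dir():
        raise FileNotFoundError(f"expected package sources at {src}")
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


# Usage (hypothetical clone location):
# vendor_package("/path/to/human_body_prior", "human_body_prior")
```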

Detectron2:

问题：
```
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1
```
Solution: see the CSDN blog post "Fixing subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1".
Errors kept coming: fixing one exposed the next. Part of the cause was that I was on Python 3.10 and PyTorch 1.12.1, versions that detectron2 (at least its prebuilt wheels) did not support. So I switched to the combination Python 3.8 + PyTorch 1.10 + CUDA 10.2.
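Matching detectron2 to the installed torch/CUDA pair is the crux here. A small sketch that builds the prebuilt-wheel index URL for a given pair; the URL pattern is inferred from the one used in the SLAHMR install commands in this post, and whether a wheel actually exists for a given pair is not checked.

```python
def detectron2_wheel_index(torch_version, cuda_version):
    """Build the detectron2 prebuilt-wheel index URL for a torch/CUDA pair (pattern inferred, not verified)."""
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])  # "1.10.1+cu102" -> "1.10"
    cu_tag = "cu" + cuda_version.replace(".", "")  # "10.2" -> "cu102"
    return f"https://dl.fbaipublicfiles.com/detectron2/wheels/{cu_tag}/torch{torch_mm}/index.html"


# Usage inside the target environment (needs a CUDA build of torch; torch.version.cuda is None otherwise):
# import torch
# print(detectron2_wheel_index(torch.__version__, torch.version.cuda))
```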

Slahmr:

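This installation is complicated and runs into many problems; detectron2, for instance, is one of the packages SLAHMR needs. So I'll just record the correct installation procedure directly.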

```
conda create -n slahmr python=3.8 -y
conda activate slahmr
# pip install torch==1.11.0+cu102 torchvision==0.12.0+cu102 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu102
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.1+cu102.html
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
python -m pip install pytube
pip install PyOpenGL PyOpenGL_accelerate
pip install chumpy
git clone https://github.com/shubham-goel/NMR.git
cd NMR/
python setup.py install
git clone https://github.com/shubham-goel/4D-Humans.git
cd 4D-Humans/
conda create -n slahmr --clone slahmr_bu
git clone https://github.com/brjathu/PHALP.git
cd PHALP
pip install pytorch_lightning==1.8.0
pip install -e .[all]
cd ../slahmr/
pip install -r requirements.txt
pip install -e .
pip install -v -e third-party/ViTPose
cd third-party/DROID-SLAM
python setup.py install
cd ../..
```

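The main problems along the way:

(1) After installing torch-scatter, installing PHALP directly failed, so I checked PHALP's setup.py and installed its requirements one by one: detectron2, pytube, PyOpenGL, chumpy, NMR, 4D-Humans, pytorch-lightning.

(2) detectron2 failed to install because of Python/CUDA version problems; switching to compatible versions fixed it.

(3) For NMR and 4D-Humans, you may also need to check their setup.py.

(4) If pytorch-lightning is installed automatically (without a pinned version), it also pulls in the latest torch, which makes later package installs fail. So specify the version when installing pytorch-lightning.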
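Going through a setup.py "one requirement at a time" can be scripted. A sketch that reads the install_requires list out of a setup.py without running it; it only handles a literal list written directly in the setup(...) call (anything computed at runtime makes ast.literal_eval raise), and PHALP's actual setup.py may be structured differently.

```python
import ast


def install_requires_from_setup(source):
    """Return the literal install_requires list passed to setup() in a setup.py source string."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `setup(...)` and `setuptools.setup(...)`
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "install_requires":
                return list(ast.literal_eval(kw.value))
    return []


# Usage: list PHALP's requirements so they can be installed (and pinned) one at a time
# with open("PHALP/setup.py") as f:
#     for req in install_requires_from_setup(f.read()):
#         print(f"pip install {req}")
```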

There were other errors too, for example:

1. `nvcc fatal : Unsupported gpu architecture 'compute_80'` / `ninja: build stopped: subcommand failed` (see the CSDN blog post on this error)
2. `subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1` (see the CSDN blog post on this error)
3. `/bin/sh: 1: ffmpeg: not found`

`module 'PIL.Image' has no attribute 'LINEAR'`: uninstall Pillow and install pillow==8.4.0; scikit-learn also has to be downgraded to a matching version.
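As an alternative to downgrading (a different technique from the fix above), a shim can restore the removed name. This assumes the failing code only uses Image.LINEAR as a resampling filter: in older Pillow, LINEAR was an alias of BILINEAR, and it was removed in Pillow 10. The helper name is hypothetical.

```python
def ensure_alias(module, missing, existing):
    """Re-create a removed attribute as an alias of one that still exists."""
    if not hasattr(module, missing):
        setattr(module, missing, getattr(module, existing))
    return getattr(module, missing)


# Usage: run this before importing the code that references Image.LINEAR
# from PIL import Image
# ensure_alias(Image, "LINEAR", "BILINEAR")
```

Downgrading keeps everything on a known-good version; the shim avoids touching other packages but has to run before any code reads Image.LINEAR.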

It still didn't work: 4D-Humans is built on Python 3.10 and much of its code depends on 3.10, so I had to rebuild the environment with Python 3.10.

```
conda create -n slahmr python=3.10 -y
conda activate slahmr
pip install torch==1.11.0+cu102 torchvision==0.12.0+cu102 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu102
# torch-scatter wheels must match the installed torch (1.11.0 here, not 1.10.1)
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
# get the detectron2 source: git clone it, or download the archive directly (it unpacks to detectron2-main/)
cd ../../detectron2-main/
pip install .
python -m pip install pytube
pip install PyOpenGL PyOpenGL_accelerate
pip install chumpy
git clone https://github.com/shubham-goel/NMR.git
cd NMR/
python setup.py install
git clone https://github.com/shubham-goel/4D-Humans.git
cd 4D-Humans/
pip install -e .
conda create -n slahmr --clone slahmr_bu
git clone https://github.com/brjathu/PHALP.git
cd PHALP
pip install pytorch_lightning==1.8.0
pip install -e .[all]
cd ../slahmr/
pip install -r requirements.txt
pip install -e .
pip install -v -e third-party/ViTPose
cd third-party/DROID-SLAM
python setup.py install
cd ../..
```

Then I ran out of GPU memory:

CUDA out of memory · Issue #9 · vye16/slahmr (github.com)

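`No module named 'sklearn.utils.linear_assignment_'` / `ImportError: cannot import name 'linear_assignment'` (CSDN blog post)

Fix for the linear_assignment error: `sklearn.utils.linear_assignment_` was removed from newer scikit-learn releases, so use SciPy's `linear_sum_assignment` instead and stack the two index arrays it returns into the (N, 2) array the old function produced:

```
import numpy as np
from scipy.optimize import linear_sum_assignment as linear_assignment

indices = linear_assignment(cost_matrix)  # cost_matrix: the existing cost matrix at the call site; returns (row_ind, col_ind)
indices = np.hstack([indices[0].reshape(((indices[0].shape[0]), 1)), indices[1].reshape(((indices[0].shape[0]), 1))])
```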

ValueError: batch_size should be a positive integer value, but got batch_size=0

Possibly because the input video contains shot changes (cuts). Keep only a part of the video without cuts.
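A rough sketch for locating the longest stretch without cuts before trimming. It assumes the frames are already decoded into a NumPy array of shape (T, H, W, C) and uses a crude heuristic, the mean absolute difference between consecutive frames; the threshold value is an arbitrary assumption to tune per video, and a dedicated shot-boundary detector will be more reliable.

```python
import numpy as np


def longest_cut_free_segment(frames, threshold=30.0):
    """Return (start, end) frame indices (end exclusive) of the longest segment without a detected cut."""
    f = np.asarray(frames, dtype=np.float32)
    # Mean absolute pixel difference between each frame and the previous one
    diffs = np.abs(f[1:] - f[:-1]).mean(axis=(1, 2, 3))
    # A jump above the threshold is treated as a cut between frame i-1 and frame i
    cuts = (np.nonzero(diffs > threshold)[0] + 1).tolist()
    bounds = [0, *cuts, len(f)]
    segments = list(zip(bounds[:-1], bounds[1:]))
    return max(segments, key=lambda s: s[1] - s[0])
```

The returned frame range can be converted to timestamps by dividing by the frame rate, and the clip cut out with ffmpeg's -ss/-to options before feeding it to SLAHMR.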

InterControl:

Errors encountered during installation:

1. Creating the conda environment from the bundled environment.yml fails with:

Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.6.2rc0 Requires-Python >=3.8; 0.7.0 Requires-Python >=3.8; 0.7.0rc1 Requires-Python >=3.8; 0.7.0rc2 Requires-Python >=3.8; 2.1.0 Requires-Python >=3.8.0; 2.1.1 Requires-Python >=3.8.0
ERROR: Could not find a version that satisfies the requirement en-core-web-sm==3.3.0 (from versions: none)
ERROR: No matching distribution found for en-core-web-sm==3.3.0
failed

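Cause: some packages in environment.yml have to be installed separately. For example, en-core-web-sm is a spaCy model that is not published on PyPI; it is installed with `python -m spacy download en_core_web_sm`.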

2. libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11

Cause: installing torch and cudatoolkit through the bundled environment.yml can lead to this problem.

3. ffmpeg: error while loading shared libraries: libopenh264.so.5: cannot open shared object file: No such file or directory

Cause: the libopenh264.so in this environment is too new, while ffmpeg looks for libopenh264.so.5.

Fix: download the openh264 binary from GitHub, then copy/rename it into the conda env, e.g. ~/anaconda3/envs/py38/lib/libopenh264.so.5, where py38 is the env name.
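A sketch of the copy/rename step. It assumes the downloaded file may still be bz2-compressed (a .so.bz2 download) and that the target env is either passed explicitly or currently active, in which case conda sets CONDA_PREFIX; the function names are hypothetical.

```python
import bz2
import os
import shutil
from pathlib import Path


def openh264_target(env_prefix=None, soname="libopenh264.so.5"):
    """Where ffmpeg in the env expects the library: <env>/lib/<soname>."""
    # CONDA_PREFIX is set by `conda activate`; pass env_prefix explicitly otherwise
    prefix = Path(env_prefix or os.environ["CONDA_PREFIX"])
    return prefix / "lib" / soname


def install_openh264(binary, env_prefix=None, soname="libopenh264.so.5"):
    """Copy (and decompress, if needed) the downloaded binary under the name ffmpeg looks for."""
    dst = openh264_target(env_prefix, soname)
    if str(binary).endswith(".bz2"):
        with bz2.open(binary, "rb") as src, open(dst, "wb") as out:
            shutil.copyfileobj(src, out)
    else:
        shutil.copy2(binary, dst)
    return dst
```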

Revised installation procedure:

```
# first move the pip packages listed in environment.yml into a separate pip_requires.txt
conda env create -f environment.yml
conda activate InterControl
conda uninstall pytorch torchvision torchaudio cudatoolkit
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r pip_requires.txt
python -m spacy download en_core_web_sm  # the package may need to be downloaded separately first
pip install git+https://github.com/openai/CLIP.git  # the package may need to be downloaded separately first
```
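A sketch of the first step (moving the pip packages out of environment.yml). It assumes the standard layout where the pip packages sit in an indented `- pip:` block under dependencies; it is a line-based split, not a YAML parser, and the helper name is hypothetical.

```python
def split_pip_section(yml_text):
    """Split conda environment.yml text into (yml without the pip block, pip requirement lines)."""
    kept, pip_reqs = [], []
    pip_indent = None  # indentation of the "- pip:" line while we are inside its block
    for line in yml_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if pip_indent is not None and (not stripped or indent > pip_indent):
            # Inside the pip block: collect "- package==version" items and drop the line from the yml
            if stripped.startswith("- "):
                pip_reqs.append(stripped[2:].strip())
            continue
        pip_indent = None
        if stripped == "- pip:":
            pip_indent = indent
            continue
        kept.append(line)
    return "\n".join(kept) + "\n", pip_reqs


# Usage (keep a copy of the original environment.yml first):
# with open("environment.yml") as f:
#     conda_yml, reqs = split_pip_section(f.read())
# with open("environment.yml", "w") as f:
#     f.write(conda_yml)
# with open("pip_requires.txt", "w") as f:
#     f.write("\n".join(reqs) + "\n")
```

Entries such as en-core-web-sm can then be deleted from pip_requires.txt, since they are installed separately at the end of the procedure.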

DIMOS:

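1. DIMOS doesn't need pointnet2 itself, only pointnet2_ops. However, pointnet2_ops requires the cudatoolkit version to match the nvcc version exactly, and pytorch-cuda cannot be used in place of cudatoolkit. So I finally chose PyTorch 1.12 with CUDA 11.6.4.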
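Before building the extension, a quick check that the CUDA version PyTorch was built with matches the nvcc that will compile it can save a failed build. A sketch; it only compares major.minor, and the helper names are hypothetical.

```python
import re
import subprocess


def nvcc_release(nvcc_output):
    """Parse (major, minor) from `nvcc --version` output, e.g. 'release 11.6, V11.6.124'."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def cuda_versions_match(torch_cuda, nvcc_output):
    """Compare torch.version.cuda (e.g. '11.6') with nvcc's release at major.minor level."""
    major, minor = (int(x) for x in torch_cuda.split(".")[:2])
    return nvcc_release(nvcc_output) == (major, minor)


def nvcc_output():
    # Runs whichever nvcc is first on PATH (inside a conda env this may be the env's own nvcc)
    return subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True).stdout


# Usage:
# import torch
# print(torch.version.cuda, cuda_versions_match(torch.version.cuda, nvcc_output()))
```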

2. RuntimeError: Mask shape should match input shape; transformer_mask is not supported in the fallback case.

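Cause: this may not happen with PyTorch 1.13 or later, but because of the problem above I was limited to PyTorch 1.12. Reading the source file where the error is raised showed that it is related to the fast path. So I added dropout=0.1 when initializing the transformer, which avoids the fast path. Since I only run inference with an off-the-shelf model, there are no other side effects of the dropout to worry about (dropout is inactive once the model is in eval mode).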

3. Installing packages into Blender's bundled Python [link1, link2]

While blender's python doesn't come with pip installed, it does have ensurepip. That means that you can do something like this:

in blender's python:

```
>>> import sys
>>> sys.exec_prefix
'/path/to/blender/python'
```

then in a shell:

```
cd /path/to/blender/python/bin
./python -m ensurepip
./python -m pip install scipy
```

As Noam Peled mentions, you need to run these commands as an administrator on Windows - it probably depends on how you have blender installed on your linux machine, but you may also need to do this with escalated privileges.
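The same two steps can be driven from Python, for example from Blender's own console. A sketch with hypothetical helper names; using sys.executable as Blender's interpreter is an assumption that holds for recent Blender releases, but it should be checked against sys.exec_prefix as in the answer above.

```python
import subprocess


def blender_pip_commands(python_exe, packages):
    """Commands that bootstrap pip with ensurepip and then install `packages` for one interpreter."""
    return [
        [python_exe, "-m", "ensurepip"],
        [python_exe, "-m", "pip", "install", *packages],
    ]


def install_into_blender_python(python_exe, packages):
    for cmd in blender_pip_commands(python_exe, packages):
        subprocess.run(cmd, check=True)


# Usage, from Blender's Python console (otherwise pass the binary under sys.exec_prefix/bin):
# import sys
# install_into_blender_python(sys.executable, ["scipy"])
```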

open-VASA:

1. ~~Downloading Hugging Face models without installing git lfs: hfd.sh [link]~~

```
pip install -U "huggingface_hub[cli]"
huggingface-cli download --resume-download facebook/DiT-XL-2-512 --local-dir ./models--facebook--DiT-XL-2-512
```

2. In Diffusers, load just the DiT transformer instead of the whole pipeline:

```
import torch
from diffusers import Transformer2DModel

reference_net = Transformer2DModel.from_pretrained(
    "facebook/DiT-XL-2-256",
    subfolder="transformer",
    cache_dir="/mnt/disk2/pretrained_weights/",
    torch_dtype=torch.float16,
).to(device="cuda")
```

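3. Using multiple GitHub accounts on one machine:

First add an SSH public key to each GitHub account (GitHub will not accept the same key on two accounts, so each account needs its own key pair). Then switch the account inside the git repo folder:

```
git config --local user.email hongwei.yi@tuebingen.mpg.de
```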
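Setting user.email only changes the author recorded in commits; which GitHub account a push authenticates as is decided by the SSH key. A sketch that sets both per repository through git's core.sshCommand option; the key path in the usage line is a hypothetical example and the helper names are illustrative.

```python
import subprocess


def account_config_cmds(email, key_path, name=None):
    """git commands that pin the commit identity and the SSH key for the current repository."""
    cmds = [["git", "config", "--local", "user.email", email]]
    if name is not None:
        cmds.append(["git", "config", "--local", "user.name", name])
    # core.sshCommand replaces plain ssh for fetch/push in this repo;
    # IdentitiesOnly=yes makes ssh use only the given key, not other keys from the agent
    cmds.append(["git", "config", "--local", "core.sshCommand",
                 f"ssh -i {key_path} -o IdentitiesOnly=yes"])
    return cmds


def switch_account(repo_dir, email, key_path, name=None):
    for cmd in account_config_cmds(email, key_path, name):
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Usage (hypothetical key file, generated separately for the second account):
# switch_account(".", "hongwei.yi@tuebingen.mpg.de", "~/.ssh/id_ed25519_second")
```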

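4. Downloading a Google Drive file with gdown. For a share link such as https://drive.google.com/file/d/13FKCIASBK-DrWJ_vVMrelGNofMEd_4d4/view?usp=sharing, the file ID is the part in the d/ID/view position:

```
pip install gdown
gdown "https://drive.google.com/uc?id=ID"
```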
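The ID extraction described above as a small sketch (helper names are hypothetical); it only handles links of the /file/d/<ID>/view form.

```python
import re


def drive_file_id(share_url):
    """Extract the file ID from a link of the form https://drive.google.com/file/d/<ID>/view?..."""
    m = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if m is None:
        raise ValueError(f"no /file/d/<ID> segment in {share_url!r}")
    return m.group(1)


def gdown_url(share_url):
    # The uc?id= form passed to gdown
    return f"https://drive.google.com/uc?id={drive_file_id(share_url)}"


# Usage from Python instead of the CLI:
# import gdown
# gdown.download(gdown_url(link), quiet=False)
```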

5. When I train with mixed precision, I get a "ValueError: Attempting to unscale FP16 gradients" error.
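A common cause (an assumption here, since no fix is recorded for this item) is that the parameters being trained are themselves float16, for example after loading a model with torch_dtype=torch.float16; PyTorch's gradient scaler refuses to unscale fp16 gradients. A sketch that upcasts only the trainable parameters to float32 and leaves frozen fp16 weights alone; the helper name is hypothetical.

```python
import torch


def upcast_trainable_params(model):
    """Cast parameters with requires_grad=True from float16 to float32 in place."""
    for param in model.parameters():
        if param.requires_grad and param.dtype == torch.float16:
            param.data = param.data.float()
    return model
```

Calling it after deciding which parameters are trainable and before creating the optimizer keeps the optimizer working on float32 weights, while autocast still runs the forward pass in fp16.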

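6. Pushing a cloned repository to your own repository as a new branch:

1. Remove the link to the original remote: `git remote rm origin`
2. Link your own repository: `git remote add origin url`
3. Create a new branch in PyCharm, or with `git checkout -b branchname`
4. `git push origin branchname`

Reference: "Two ways to create a branch on GitHub" (CSDN blog)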

PS: switching the account inside a repo:

```
git config user.name "username"
git config user.email "email"
```

Reference: "How to switch the local git account" (CSDN Library)

Motion Diffusion Model:

1. Fixing `_pickle.UnpicklingError: the STRING opcode argument must be quoted` (CSDN blog)

2. Efficiently fixing `TypeError: 'numpy._DTypeMeta' object is not subscriptable` (CSDN blog)

3. `ImportError: attempted relative import with no known parent package` when running the code (blog post by 午夜稻草人 on cnblogs.com)
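For item 1, a frequent cause of this particular message (an assumption here, as the linked post is not reproduced) is a text-format (protocol 0) pickle whose line endings were converted to CRLF, e.g. by Windows tools or git's autocrlf. A sketch that retries after normalizing line endings; the helper names are hypothetical, and it should only be relied on for such text pickles, since rewriting bytes in a binary pickle can corrupt it.

```python
import pickle


def loads_crlf_tolerant(data, **kwargs):
    """pickle.loads that retries with CRLF line endings converted to LF."""
    try:
        return pickle.loads(data, **kwargs)
    except pickle.UnpicklingError:
        # Protocol-0 opcodes are newline-terminated; a stray '\r' breaks STRING argument parsing
        return pickle.loads(data.replace(b"\r\n", b"\n"), **kwargs)


def load_crlf_tolerant(path, **kwargs):
    with open(path, "rb") as f:
        return loads_crlf_tolerant(f.read(), **kwargs)
```

Extra keyword arguments are passed through, e.g. encoding="latin1" for pickles written by Python 2.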

PantoMatrix:

1. module 'pyarrow' has no attribute 'serialize'

pyarrow 2.0.0 deprecated `serialize` and `deserialize`, and later releases removed them. Use pickle.dumps and pickle.loads instead.
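Drop-in stand-ins for the removed functions, as a sketch: the names mirror pyarrow's, but they are plain wrappers around pickle. One caveat: data already written with pyarrow.serialize (e.g. a preprocessed dataset cache) cannot be read back with pickle.loads, so such caches have to be regenerated.

```python
import pickle


def serialize(obj):
    """Stand-in for pyarrow.serialize(obj).to_buffer(): returns bytes."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize(buf):
    """Stand-in for pyarrow.deserialize(buf); accepts bytes or another bytes-like object."""
    return pickle.loads(buf)
```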