MotionDiffuse:
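- Problem: when debugging in PyCharm, single-stepping inside the forward function of ddpm_trainer hangs.
Details: the hang is related to the multi-worker DataLoader. It stalls at: r = index_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
Solution: set num_workers to 0 and persistent_workers to False.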
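The workaround can be limited to debugging sessions. A minimal sketch, assuming the training script builds its DataLoader from keyword arguments; the helper name dataloader_kwargs and the default of 4 workers are hypothetical, and sys.gettrace() is only a heuristic for detecting an attached debugger:

```python
import sys


def dataloader_kwargs(num_workers=4, debugging=None):
    """Return DataLoader keyword arguments that are safe for single-stepping.

    `debugging=None` means "auto-detect"; a trace function is usually set
    when a debugger such as PyCharm's is attached (heuristic, not guaranteed).
    """
    if debugging is None:
        debugging = sys.gettrace() is not None
    if debugging:
        # Single-process loading: no worker queues to block on.
        return {"num_workers": 0, "persistent_workers": False}
    # persistent_workers is only valid when there is at least one worker.
    return {"num_workers": num_workers, "persistent_workers": num_workers > 0}
```

The result can be unpacked into the loader call, for example torch.utils.data.DataLoader(dataset, **dataloader_kwargs()).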
OMOMO:
- Problem: while setting up the environment, pip install -r requirements.txt fails with: ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Solution: force-remove PyYAML by locating its files and deleting the egg-info:
find /home/yifei/omomo_env -name "*PyYAML*"
rm -r /home/yifei/anaconda3/envs/omomo_env/lib/python3.8/site-packages/PyYAML-5.1.2-py3.8.egg-info
- Problem: after the environment is installed, running the test fails with a "no module named" error:
Traceback (most recent call last):
  File "trainer_hand_foot_manip_diffusion.py", line 36, in <module>
    from human_body_prior.body_model.body_model import BodyModel
ModuleNotFoundError: No module named 'human_body_prior.body_model'
Solution (from a junior labmate): copy the package's src folder directly into the working directory.
Detectron2:
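- Problem: subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1
- Errors kept coming, one after another. Part of the cause was using python3.10 and pytorch1.12.1, versions too recent for detectron2 to support. I therefore switched to the combination python3.8, pytorch1.10, cuda10.2.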
Slahmr:
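This installation is complicated and involves many problems; detectron2, for example, is one of the packages slahmr requires. So the correct installation procedure is recorded directly here.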
conda create -n slahmr python=3.8 -y
conda activate slahmr
# pip install torch==1.11.0+cu102 torchvision==0.12.0+cu102 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu102
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu102/torch_stable.html
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.1+cu102.html
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.10/index.html
python -m pip install pytube
pip install PyOpenGL PyOpenGL_accelerate
pip install chumpy
git clone https://github.com/shubham-goel/NMR.git
cd NMR/
python setup.py install
git clone https://github.com/shubham-goel/4D-Humans.git
cd 4D-Humans/
conda create -n slahmr --clone slahmr_bu
git clone https://github.com/brjathu/PHALP.git
cd PHALP
pip install pytorch_lightning==1.8.0
pip install -e .[all]
cd ../slahmr/
pip install -r requirements.txt
pip install -e .
pip install -v -e third-party/ViTPose
cd third-party/DROID-SLAM
python setup.py install
cd ../..
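The main problems encountered along the way:
(1) After installing torch-scatter, installing PHALP directly failed. I therefore went through PHALP's setup.py and installed its requirements one by one: detectron2/pytube/pyopengl/chumpy/NMR/4D-Humans/pytorch-lightning.
(2) detectron2 failed to install because of the python and cuda versions. Switching to the correct versions fixed it.
(3) For NMR/4D-Humans, setup.py may also need to be checked.
(4) If pytorch-lightning is installed automatically, the latest torch is installed along with it, which breaks the installation of later packages. So pin the version number when installing pytorch-lightning.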
Other errors came up as well (solutions found in CSDN blog posts), for example:
nvcc fatal : Unsupported gpu architecture 'compute_80' (ninja: build stopped: subcommand failed)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1
/bin/sh: 1: ffmpeg: not found
module 'PIL.Image' has no attribute 'LINEAR': uninstall Pillow and install pillow=8.4.0; scikit-learn also has to be downgraded to match.
This still did not work, because 4D-Humans is built on python3.10 and much of its code depends on 3.10. So the environment has to be reinstalled with python3.10.
conda create -n slahmr python=3.10 -y
conda activate slahmr
pip install torch==1.11.0+cu102 torchvision==0.12.0+cu102 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu102
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.10.1+cu102.html
# git clone detectron2, or download the source directly
cd ../../detectron2-main/
pip install .
python -m pip install pytube
pip install PyOpenGL PyOpenGL_accelerate
pip install chumpy
git clone https://github.com/shubham-goel/NMR.git
cd NMR/
python setup.py install
git clone https://github.com/shubham-goel/4D-Humans.git
cd 4D-Humans/
pip install -e .
conda create -n slahmr --clone slahmr_bu
git clone https://github.com/brjathu/PHALP.git
cd PHALP
pip install pytorch_lightning==1.8.0
pip install -e .[all]
cd ../slahmr/
pip install -r requirements.txt
pip install -e .
pip install -v -e third-party/ViTPose
cd third-party/DROID-SLAM
python setup.py install
cd ../..
After that, GPU memory ran out again. See:
CUDA out of memory · Issue #9 · vye16/slahmr (github.com)
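linear_assignment raises an error. Fix: use scipy's linear_sum_assignment instead and stack its two index arrays into one array of pairs: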
from scipy.optimize import linear_sum_assignment as linear_assignment
indices = np.hstack([indices[0].reshape(((indices[0].shape[0]), 1)),indices[1].reshape(((indices[0].shape[0]), 1))])
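The two lines above can be wrapped into a drop-in replacement. A minimal sketch, assuming the failing call expected the older interface that returns an (N, 2) array of [row, column] pairs; the wrapper simply reuses the name linear_assignment and needs numpy and scipy installed:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def linear_assignment(cost_matrix):
    """Solve the assignment problem and return an (N, 2) array of [row, col] pairs."""
    rows, cols = linear_sum_assignment(cost_matrix)
    # Equivalent to the hstack/reshape line above: one [row, col] pair per match.
    return np.column_stack((rows, cols))


cost = np.array([[4, 1, 3],
                 [2, 0, 5],
                 [3, 2, 2]])
pairs = linear_assignment(cost)
```

Code that iterates over the result as pairs of matched indices should then work unchanged.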
ValueError: batch_size should be a positive integer value, but got batch_size=0
Possible cause: the input video contains shot changes (cuts). Keep only a part without cuts.
InterControl:
Errors encountered during installation:
1. Creating the conda environment from the bundled environment.yml fails with:
Pip subprocess error:
ERROR: Ignored the following versions that require a different python version: 0.6.2rc0 Requires-Python >=3.8; 0.7.0 Requires-Python >=3.8; 0.7.0rc1 Requires-Python >=3.8; 0.7.0rc2 Requires-Python >=3.8; 2.1.0 Requires-Python >=3.8.0; 2.1.1 Requires-Python >=3.8.0
ERROR: Could not find a version that satisfies the requirement en-core-web-sm==3.3.0 (from versions: none)
ERROR: No matching distribution found for en-core-web-sm==3.3.0
failed
Cause: some packages listed in environment.yml have to be installed separately.
2. libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11
Cause: installing torch and cudatoolkit through the bundled environment.yml can lead to this problem.
3. ffmpeg: error while loading shared libraries: libopenh264.so.5: cannot open shared object file: No such file or directory
Cause: the libopenh264.so version in this environment is too new.
Solution:
- Download the openh264 binary from GitHub
- Copy/rename the binary into the conda env, e.g. ~/anaconda3/envs/py38/lib/libopenh264.so.5, where py38 is the env name
New installation procedure:
First save the pip packages listed in environment.yml separately as pip_requires.txt, then run:
conda env create -f environment.yml
conda activate InterControl
conda uninstall pytorch torchvision torchaudio cudatoolkit
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r pip_requires.txt
python -m spacy download en_core_web_sm # the package may need to be downloaded separately first
pip install git+https://github.com/openai/CLIP.git # the package may need to be downloaded separately first
DIMOS:
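1. DIMOS does not need pointnet2 itself, only pointnet2_obs. However, pointnet2_obs requires the cudatoolkit version to match the nvcc version exactly, and pytorch-cuda cannot be used as a substitute. The final choice was pytorch1.12 with cuda11.6.4.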
2. RuntimeError: Mask shape should match input shape; transformer_mask is not supported in the fallback case.
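Cause: pytorch 1.13 and above may not show this problem, but because of the constraint in item 1 only pytorch1.12 can be used. Inspecting the source file that raised the error showed that the problem is tied to the fast_path. Adding dropout=0.1 when initializing the transformer avoids the fast_path. Since the off-the-shelf model is only used for inference, there is no need to worry about other effects of dropout.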
3. Installing packages for Blender's bundled Python [link1, link2]
While blender's python doesn't come with pip installed, it does have ensurepip. That means that you can do something like this:
in blender's python:
>>> import sys
>>> sys.exec_prefix
'/path/to/blender/python'
then in a shell:
cd /path/to/blender/python/bin
./python -m ensurepip
./python -m pip install scipy
As Noam Peled mentions, you need to run these commands as an administrator on Windows - it probably depends on how you have blender installed on your linux machine, but you may also need to do this with escalated privileges.
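The same two shell steps can also be assembled from Python. A rough sketch, assuming the interpreter sits at bin/python under sys.exec_prefix as in the shell commands above (some Blender builds name the binary differently); blender_pip_commands and run_commands are hypothetical helper names:

```python
import os
import subprocess


def blender_pip_commands(exec_prefix, packages):
    """Build the ensurepip and pip-install command lines for Blender's Python."""
    # Assumption: the interpreter is <exec_prefix>/bin/python, as in the shell steps;
    # adjust the file name if the Blender build ships e.g. python3.10 instead.
    python_bin = os.path.join(exec_prefix, "bin", "python")
    return [
        [python_bin, "-m", "ensurepip"],
        [python_bin, "-m", "pip", "install", *packages],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)


commands = blender_pip_commands("/path/to/blender/python", ["scipy"])
```

Calling run_commands(commands) with the real exec_prefix performs the installation; elevated privileges may still be needed, as noted above.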
open-VASA:
1. Downloading a Hugging Face model without installing git lfs: hfd.sh [link]
pip install -U "huggingface_hub[cli]"
huggingface-cli download --resume-download facebook/DiT-XL-2-512 --local-dir ./models--facebook--DiT-XL-2-512
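2. In the Diffusers package, loading only the DiT transformer instead of the whole pipeline:
import torch
from diffusers import Transformer2DModel
# Load only the "transformer" subfolder of the pipeline repository
reference_net = Transformer2DModel.from_pretrained(
    "facebook/DiT-XL-2-256", subfolder="transformer", cache_dir='/mnt/disk2/pretrained_weights/', torch_dtype=torch.float16,
).to(device="cuda")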
3. Using multiple GitHub accounts on one machine:
First upload the machine's SSH public key to each GitHub account. Then switch accounts inside the git repo folder:
git config --local user.email hongwei.yi@tuebingen.mpg.de
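4. Downloading a Google Drive file from the command line. In a share link such as
https://drive.google.com/file/d/13FKCIASBK-DrWJ_vVMrelGNofMEd_4d4/view?usp=sharing
the file ID is the part between d/ and /view (d/ID/view):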
pip install gdown
gdown https://drive.google.com/uc?id=ID
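Pulling the ID out of a share link can be scripted. A small sketch using only the standard library; the function name drive_download_url is hypothetical, and it assumes the link follows the /file/d/ID/view pattern shown above:

```python
import re


def drive_download_url(share_url):
    """Turn a Google Drive share link into the uc?id=... form that gdown accepts."""
    # The file ID is the path segment between "/d/" and the next "/".
    match = re.search(r"/d/([^/]+)/", share_url)
    if match is None:
        raise ValueError(f"no file ID found in {share_url!r}")
    return f"https://drive.google.com/uc?id={match.group(1)}"


url = drive_download_url(
    "https://drive.google.com/file/d/13FKCIASBK-DrWJ_vVMrelGNofMEd_4d4/view?usp=sharing"
)
```

The resulting URL is what gets passed to the gdown command.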
5. When I train with mixed precision, I get a "ValueError: Attempting to unscale FP16 gradients" error.
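6. After cloning another repository, push it to a chosen repository as a new branch:
(1) Run git remote rm origin to remove the origin remote currently linked
(2) Link your own repository: git remote add origin url
(3) Create a new branch in PyCharm, or run git checkout -b branchname
(4) git push origin branchname
Reference: "Two ways to create a branch on GitHub" (CSDN blog)
PS: switching accounts within a repository: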
git config user.name "username"
git config user.email "email"
Motion Diffusion Model:
1. _pickle.UnpicklingError: the STRING opcode argument must be quoted (solution found in a CSDN blog post)
PantoMatrix:
1. module 'pyarrow' has no attribute 'serialize'
pyarrow 2.0.0 deprecated 'serialize' and 'deserialize'. Use pickle.dumps and pickle.loads instead.
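The replacement can be kept behind two small helpers so call sites change only once. A minimal sketch, assuming the code only needs to round-trip ordinary Python objects to bytes; the helper names serialize and deserialize are chosen here for illustration, and data written earlier with pyarrow cannot be read back this way:

```python
import pickle


def serialize(obj):
    """Stand-in for the deprecated pyarrow call: object to bytes."""
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def deserialize(buf):
    """Stand-in for the deprecated pyarrow call: bytes back to object."""
    # Only unpickle data from trusted sources.
    return pickle.loads(buf)


sample = {"pose": [0.1, 0.2, 0.3], "word": "hello"}
restored = deserialize(serialize(sample))
```

Any cache files produced with the old serializer would need to be regenerated after the switch.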