Step 1: copy the conda environment
docker cp <conda environment path on the host> <container name or ID>:<environment path inside the container>
To find the environment paths:
conda info -e
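To pull the path of a single environment out of that listing, the output can be filtered by environment name. This is a minimal sketch: the helper name conda_env_path is hypothetical, and it assumes the path is the last whitespace-separated field on the line (so paths containing spaces are not handled).

```shell
#!/bin/sh
# Read `conda info -e` output on stdin and print the path of the named environment.
# Header lines start with '#', so their first field never equals a real environment name.
conda_env_path() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Usage (on the host or inside the container):
#   conda info -e | conda_env_path py38torch18
```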
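For example, if the environment is at /home/XXX/anaconda3/envs/py38torch18 on the host and the container keeps its environments under /opt/conda/envs/, the copy command is: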
docker cp /home/XXX/anaconda3/envs/py38torch18 <container name or ID>:/opt/conda/envs/
After the copy finishes, enter the container and verify that the environment is listed:
conda info -e
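Problems:
1. pip fails with: bash: /opt/conda/envs/py38torch18/bin/pip: /home/XXXX/anaconda3/envs/py38torch18/bin/python: bad interpreter: No such file or directory
Cause:
The environment was copied as-is, so the first line (the shebang) of the script still points to the Python path on the original machine. It has to be changed to the path inside the container.
Solution:
Open the script under /opt/conda/envs/py38torch18/bin (use the path of your own environment):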
vi /opt/conda/envs/py38torch18/bin/pip
Change the first line from
#!/home/xxx/anaconda3/envs/py38torch18/bin/python
to
#!/opt/conda/envs/py38torch18/bin/python
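pip is usually not the only script in bin with this first line, so the same edit can be applied to every script in one pass. The following is a minimal sketch, not a definitive tool: the function name fix_shebangs is hypothetical, it relies on GNU sed's -i option (normally present in a Linux container), and it assumes neither prefix contains a '|' character or regular-expression metacharacters that would change the match. Trying it on a copy of the bin directory first is a sensible precaution.

```shell
#!/bin/sh
# Rewrite stale shebang lines in the regular files directly under BIN_DIR.
# Arguments: OLD_PREFIX (environment root on the host),
#            NEW_PREFIX (environment root in the container), BIN_DIR.
fix_shebangs() {
    old_prefix=$1
    new_prefix=$2
    bin_dir=$3
    for f in "$bin_dir"/*; do
        # Skip directories and symlinks; only edit regular files in place.
        if [ ! -f "$f" ] || [ -L "$f" ]; then
            continue
        fi
        # Only touch files whose first line references the old prefix as a shebang.
        if head -n 1 "$f" | grep -qF "#!$old_prefix/"; then
            # Replace the prefix on line 1 only; '|' is used as the sed delimiter.
            sed -i "1s|^#!$old_prefix/|#!$new_prefix/|" "$f"
            echo "fixed: $f"
        fi
    done
}

# Usage inside the container:
#   fix_shebangs /home/XXX/anaconda3/envs/py38torch18 \
#                /opt/conda/envs/py38torch18 \
#                /opt/conda/envs/py38torch18/bin
```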
2. Horovod was installed in the original conda environment, but after the copy it no longer works either. Importing it fails:
(py38torch18) root@xxxxx:/Waymo/OpenPCDet# python
Python 3.8.12 (default, Oct 12 2021, 13:49:34)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import horovod.torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/conda/envs/py38torch18/lib/python3.8/site-packages/horovod/torch/__init__.py", line 35, in <module>
from horovod.torch import elastic
File "/opt/conda/envs/py38torch18/lib/python3.8/site-packages/horovod/torch/elastic/__init__.py", line 17, in <module>
from horovod.torch.mpi_ops import init, shutdown
File "/opt/conda/envs/py38torch18/lib/python3.8/site-packages/horovod/torch/mpi_ops.py", line 36, in <module>
raise e
File "/opt/conda/envs/py38torch18/lib/python3.8/site-packages/horovod/torch/mpi_ops.py", line 33, in <module>
from horovod.torch import mpi_lib_v2 as mpi_lib
ImportError: libmpi.so.12: cannot open shared object file: No such file or directory
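The ImportError above means the dynamic loader could not find libmpi.so.12, the MPI library this Horovod build links against, inside the container. One way to see which shared libraries are missing is to run ldd on the compiled extension module and keep only the "not found" entries. This is a minimal sketch: the helper name missing_libs is hypothetical, and the mpi_lib_v2*.so glob in the usage comment is an assumption about the module's file name.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the names of libraries reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage inside the container:
#   ldd /opt/conda/envs/py38torch18/lib/python3.8/site-packages/horovod/torch/mpi_lib_v2*.so | missing_libs
```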
Normally you can also check the Horovod build with:
horovodrun --check-build
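This fails with the same bad interpreter error.
The cause is the same: the files were copied verbatim, so some of them still contain paths from the original machine that do not exist in the container. Change the first line of the affected script (here horovodrun) to the conda path inside the container.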
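To find out which other scripts still carry the old path before they fail one by one, the first line of each file in bin can be checked. This is a rough sketch under the same assumptions as the example paths in this post; the function name list_stale_shebangs is hypothetical, and it only inspects the first line, so paths embedded elsewhere in a file are not reported.

```shell
#!/bin/sh
# List files directly under DIR whose first line is a shebang pointing into OLD_PREFIX.
list_stale_shebangs() {
    old_prefix=$1
    dir=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -F treats the prefix as a fixed string; -q suppresses output.
        if head -n 1 "$f" | grep -qF "#!$old_prefix/"; then
            echo "$f"
        fi
    done
}

# Usage inside the container:
#   list_stale_shebangs /home/XXX/anaconda3/envs/py38torch18 /opt/conda/envs/py38torch18/bin
```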
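Verify again: horovodrun now runs, but Horovod itself still has to be installed in the container, because the check reports no available Frameworks, Controllers, or Tensor Operations.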
For installation instructions, see the Horovod Installation Guide in the Horovod documentation.
Done *★,°*:.☆( ̄▽ ̄)/$:*.°★*