We found that IsaacGym rendering works only on the first GPU when multiple GPUs are available. If you encounter the same issue, follow the instructions below to render on other GPUs. This may still use a few megabytes of memory on GPU 0 because of OpenGL, but the actual IsaacGym rendering runs on the selected Vulkan device.
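On a multi-GPU server, rendering RGB or depth images may fail with an error such as Segmentation fault (core dumped). First rule out a CUDA Out Of Memory problem as the cause. The simplest workable fix is to force rendering onto GPU 0 only.
This problem is not specific to IsaacGym; Isaac Sim and ManiSkill (SAPIEN) have it as well.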
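As a rough sketch of the GPU-0-only workaround mentioned above: Isaac Gym's create_sim takes a compute device index and a separate graphics device index, so one way to pin rendering to GPU 0 is to always pass 0 as the graphics index. The helper name split_devices is hypothetical, and the create_sim call is shown only as a comment, assuming the standard isaacgym Python bindings. Whether physics can stay on a different GPU depends on your pipeline; if in doubt, use GPU 0 for both.

```python
from typing import Tuple


def split_devices(sim_device: str, graphics_gpu: int = 0) -> Tuple[int, int]:
    """Return (compute_device_id, graphics_device_id) for a string like 'cuda:2'.

    Hypothetical helper: the compute id is parsed from the device string,
    while the graphics id is pinned to `graphics_gpu` (GPU 0 by default).
    """
    kind, _, index = sim_device.partition(":")
    if kind != "cuda":
        raise ValueError("expected a device like 'cuda:N', got %r" % sim_device)
    compute_id = int(index) if index else 0
    return compute_id, graphics_gpu


if __name__ == "__main__":
    compute_id, graphics_id = split_devices("cuda:2")
    print(compute_id, graphics_id)
    # Assumed usage with Isaac Gym (not executed here):
    #   from isaacgym import gymapi
    #   gym = gymapi.acquire_gym()
    #   sim = gym.create_sim(compute_id, graphics_id,
    #                        gymapi.SIM_PHYSX, gymapi.SimParams())
```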
A more involved but effective solution is as follows.
First, check that Vulkan is installed on the system:
vulkaninfo
# If it is missing, install it:
sudo apt-get install cmake git gcc g++ mesa-* libwayland-dev libxrandr-dev
sudo apt-get install libvulkan1 mesa-vulkan-drivers vulkan-utils
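If you want to run the Vulkan check from a setup script rather than by hand, a minimal sketch could look like the following. The function names are hypothetical, and the script only reports whether vulkaninfo is on PATH and exits with status 0; it does not inspect which devices are listed.

```python
import shutil
import subprocess


def vulkaninfo_available() -> bool:
    """Return True if a `vulkaninfo` executable is found on PATH."""
    return shutil.which("vulkaninfo") is not None


def vulkaninfo_runs(timeout_s: float = 30.0) -> bool:
    """Run `vulkaninfo` and report whether it exited with status 0."""
    if not vulkaninfo_available():
        return False
    try:
        result = subprocess.run(
            ["vulkaninfo"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if vulkaninfo_runs():
        print("vulkaninfo ran successfully")
    else:
        print("vulkaninfo is missing or failing; install the Vulkan packages first")
```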
Next, follow the solution below (from https://github.com/mihdalal/manipgen#multi-gpu-rendering):
- Set up the Vulkan device chooser layer (see the link in the ManipGen README cited above). If the meson compile -C builddir command does not work during installation, use ninja -C builddir instead.
- Set the following environment variables, either in .bashrc or on the command line before the training command:
  - VK_ICD_FILENAMES=/etc/vulkan/icd.d/nvidia_icd.json (Verify whether this file is located at /usr/share/vulkan/icd.d/nvidia_icd.json or at the path above. If it is present in both locations, delete the /usr/share/vulkan/icd.d/ directory.)
  - DISABLE_LAYER_NV_OPTIMUS_1=1
  - DISABLE_LAYER_AMD_SWITCHABLE_GRAPHICS_1=1
- Run the following in your terminal:
  - unset DISPLAY
  - sudo /usr/bin/X :1 & (launches a virtual display)
- For training on GPU X, use the command:
  DISPLAY=:1 ENABLE_DEVICE_CHOOSER_LAYER=1 VULKAN_DEVICE_INDEX=X CUDA_VISIBLE_DEVICES=X python dagger.py device='cuda:X' [other arguments]
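The environment variables and the final command above can also be assembled programmatically, which helps when launching one job per GPU. The sketch below is only an illustration: the helper names (find_nvidia_icd_files, build_render_env, build_command) are hypothetical, the variable names and values are copied from the steps above, and it assumes the virtual display on :1 is already running. It prints the command instead of launching it.

```python
import os
from typing import Dict, List, Optional, Sequence

# The two ICD locations mentioned in the steps above.
ICD_CANDIDATES = (
    "/etc/vulkan/icd.d/nvidia_icd.json",
    "/usr/share/vulkan/icd.d/nvidia_icd.json",
)


def find_nvidia_icd_files(candidates: Sequence[str] = ICD_CANDIDATES) -> List[str]:
    """Return the candidate ICD paths that exist on this machine."""
    return [path for path in candidates if os.path.isfile(path)]


def build_render_env(
    gpu_index: int,
    icd_path: str = ICD_CANDIDATES[0],
    display: str = ":1",
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy the base environment and add the variables listed in the steps above."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "VK_ICD_FILENAMES": icd_path,
        "DISABLE_LAYER_NV_OPTIMUS_1": "1",
        "DISABLE_LAYER_AMD_SWITCHABLE_GRAPHICS_1": "1",
        "DISPLAY": display,
        "ENABLE_DEVICE_CHOOSER_LAYER": "1",
        "VULKAN_DEVICE_INDEX": str(gpu_index),
        "CUDA_VISIBLE_DEVICES": str(gpu_index),
    })
    return env


def build_command(gpu_index: int, script: str = "dagger.py",
                  extra_args: Sequence[str] = ()) -> List[str]:
    """Mirror the training command above as an argv list (no shell quoting needed)."""
    return ["python", script, "device=cuda:%d" % gpu_index] + list(extra_args)


if __name__ == "__main__":
    found = find_nvidia_icd_files()
    if len(found) > 1:
        print("nvidia_icd.json exists in both locations:", found)
    env = build_render_env(gpu_index=1)
    print(" ".join(build_command(1)))
    # To actually launch (assumption, not executed here):
    #   import subprocess
    #   subprocess.run(build_command(1), env=env, check=True)
```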
Ref:
https://forums.developer.nvidia.com/t/segmentation-fault-when-using-different-gpus/224136/5