Singularity (9) | Configuring Deep Learning Containers

Latest recommended article published on 2024-05-19 09:58:46

ShengXinF3


Article link: https://blog.csdn.net/long_1998/article/details/136664619


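9.1 Introduction to NVIDIA CUDA

Why use GPUs for AI applications?

Artificial intelligence work mainly draws on two subfields: machine learning and deep learning. The latter belongs to the broader family of machine learning methods based on artificial neural networks.

The core operations in deep learning are essentially matrix multiplications, which GPUs execute far more efficiently than CPUs (central processing units). This is why GPU usage has grown so much in recent years: thanks to their massively parallel architecture, GPUs are regarded as the heart of deep learning.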
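
To make the link between neural networks and matrix multiplication concrete, here is a minimal pure-Python sketch. The names `matmul` and `dense_forward` are illustrative and do not come from any library. Each output element is an independent dot product, which is the kind of work a GPU can spread across many cores at once.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Every out[i][j] depends only on row i of a and column j of b,
    # so all n * m dot products could be computed in parallel.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def dense_forward(inputs, weights):
    """Forward pass of a bias-free dense layer with ReLU: relu(inputs @ weights)."""
    return [[max(0.0, value) for value in row] for row in matmul(inputs, weights)]


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0]]        # 2 samples, 2 features
    weights = [[0.5, -1.0], [0.25, 1.0]]    # 2 features -> 2 units
    print(dense_forward(batch, weights))
```

A real framework performs the same computation on tensors with thousands of rows and columns, which is where the parallelism of the GPU pays off.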

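However, GPUs cannot run just any program. To take advantage of their architecture, code has to be written against a dedicated programming platform, NVIDIA's CUDA. So how does an application use a GPU and communicate with it?

NVIDIA CUDA technology

NVIDIA CUDA (Compute Unified Device Architecture) is a parallel computing architecture combined with an API for programming GPUs. CUDA translates application code into an instruction set that the GPU can execute.

NVIDIA provides the CUDA SDK along with libraries such as cuBLAS (Basic Linear Algebra Subroutines) and cuDNN (Deep Neural Network) so that applications can communicate with the GPU easily and efficiently. CUDA is available for C, C++, and Fortran. Deep learning libraries such as TensorFlow and Keras are built on these technologies.

Why use nvidia-docker?

nvidia-docker addresses the needs of developers who want to add AI features to their applications, containerize them, and deploy them on servers powered by NVIDIA GPUs.

The goal of nvidia-docker is to provide an architecture in which deep learning models can be developed and deployed as services exposed through an API. Because GPU resources can be offered to multiple application instances, their utilization can be optimized.

We also gain the usual benefits of a containerized environment:

Isolation of each AI model instance.
Co-hosting of multiple models that have specific dependencies.
Co-hosting of multiple versions of the same model.
Consistent model deployment.
Model performance monitoring.

Using a GPU inside a container requires CUDA to be installed in the container and the container to be granted access to the device. With this in mind, NVIDIA developed the nvidia-docker tool, which exposes NVIDIA GPU devices inside containers in an isolated and secure way.

The nvidia-docker tool

nvidia-docker gives us access to GPU resources from an isolated environment. To bring GPU acceleration to applications, NVIDIA has developed a number of tools (non-exhaustive list):

CUDA Toolkit: a set of tools for developing software that uses the CPU, RAM, and GPU together to perform computations. It is available for x86, Arm, and POWER platforms.
cuDNN: a library of primitives that accelerates deep learning networks and optimizes GPU performance for major frameworks such as TensorFlow and Keras.
NVIDIA cuBLAS: a GPU-accelerated library of linear algebra subroutines.

The CUDA Toolkit is the lowest-level option. It offers the most control (over memory and instructions) for building custom applications. The libraries provide abstractions over CUDA functionality, letting you focus on application development rather than on the CUDA implementation.

Once all of these elements are in place, the architecture built around the nvidia-docker service is ready to use.

The diagram below summarizes this process:

This architecture lets our applications use GPU resources from an isolated environment. In summary, it consists of the following parts:

Host operating system: Linux, Windows, ...
Docker: isolates environments using Linux containers
NVIDIA driver: the driver for the installed hardware (must be present on the host)
NVIDIA container runtime: coordinates the three components above
Applications in the Docker container:
- CUDA
- cuDNN
- cuBLAS
- TensorFlow/Keras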

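CUDA Toolkit

The CUDA Toolkit is a set of software tools for parallel computing developed by NVIDIA to exploit the computing power of NVIDIA GPUs. It gives developers the support they need to run high-performance parallel applications on the GPU. The toolkit contains several major modules, briefly introduced below:

CUDA Driver API: a set of low-level, driver-oriented functions that let developers interact with the GPU driver directly, used for low-level CUDA programming. Compared with the CUDA Runtime API, the Driver API offers lower-level and more flexible control over the GPU; it allows finer control over CUDA details and, in special cases, additional performance tuning. The driver library ships with the NVIDIA driver, which CUDA Toolkit installers can install as well. The shared library we usually link CUDA programs against is libcuda.so, and its header file is cuda.h.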
```
$ find / -name cuda.h
/usr/include/cuda.h
/usr/include/linux/cuda.h
/usr/src/linux-headers-5.4.0-47/include/linux/cuda.h
/usr/src/linux-headers-5.4.0-47/include/uapi/linux/cuda.h
/usr/src/linux-headers-5.4.0-48/include/linux/cuda.h
/usr/src/linux-headers-5.4.0-48/include/uapi/linux/cuda.h
$ find / -name libcuda.*
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so.450.66
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/stubs/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.450.66
/usr/lib/x86_64-linux-gnu/libcuda.so
```
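The library file name shows that the NVIDIA driver version is 450.66.
CUDA Runtime API: a set of runtime library functions for implementing GPU parallel computing in applications. Developers use these functions to manage GPU devices, allocate memory, transfer data, and launch computations. The Runtime API provides a higher-level abstraction that makes CUDA programs comparatively easy to write and maintain.

The CUDA Runtime library is installed with the NVIDIA CUDA Toolkit and is used for high-level CUDA programming. The shared library we usually link CUDA programs against is libcudart.so, and its header file is cuda_runtime.h.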
```
$ find / -name cuda_runtime.h
/usr/include/cuda_runtime.h
$ find / -name libcudart.*
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1.243
```
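The library file name shows that the CUDA Runtime library version is 10.1.243.
cuBLAS: NVIDIA's GPU-accelerated library for linear algebra. It contains optimized routines such as matrix multiplication, matrix inversion, and LU decomposition that run efficiently on the GPU, and it is very useful in scientific computing and deep learning.
cuDNN: NVIDIA's GPU-accelerated library for deep learning. It provides optimized algorithms and functions for training and inference of deep neural networks and significantly speeds up common operations such as convolution, pooling, and normalization.
cuFFT: NVIDIA's GPU library for high-performance Fourier transforms. It lets developers run fast Fourier transforms (FFT) and their inverses on the GPU, for signal processing, image processing, scientific computing, and similar fields.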
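
The version checks described for libcuda.so and libcudart.so can also be scripted. The sketch below is one possible approach, not an official tool: `version_from_filename` reads the version from a fully versioned file name, and `query_version` asks the library itself through `ctypes`, using the `cuDriverGetVersion` and `cudaRuntimeGetVersion` entry points. The helper names and the library names passed in the example are assumptions; the lookup simply returns `None` when a library is not installed.

```python
import ctypes
import re


def version_from_filename(path):
    """Extract the version from names like 'libcuda.so.450.66'.

    Names with a single trailing number (e.g. 'libcuda.so.1') are sonames,
    not full versions, so they yield None.
    """
    match = re.search(r"\.so\.(\d+(?:\.\d+)+)$", path)
    return match.group(1) if match else None


def decode_cuda_version(value):
    """CUDA encodes versions as 1000 * major + 10 * minor (10010 -> (10, 1))."""
    return value // 1000, (value % 1000) // 10


def query_version(library, function):
    """Call a CUDA version query through ctypes; None if it is unavailable."""
    try:
        func = getattr(ctypes.CDLL(library), function)
    except (OSError, AttributeError):
        return None
    out = ctypes.c_int(0)
    status = func(ctypes.byref(out))  # 0 means success for both APIs
    return decode_cuda_version(out.value) if status == 0 else None


if __name__ == "__main__":
    print(version_from_filename("/usr/lib/x86_64-linux-gnu/libcuda.so.450.66"))
    print(version_from_filename("/usr/lib/x86_64-linux-gnu/libcudart.so.10.1.243"))
    # Driver API: the newest CUDA version the installed driver supports.
    print(query_version("libcuda.so.1", "cuDriverGetVersion"))
    # Runtime API: the version of the CUDA Runtime library itself.
    print(query_version("libcudart.so", "cudaRuntimeGetVersion"))
```

Note that the file name of libcuda.so carries the driver version (450.66 here), whereas `cuDriverGetVersion` reports a CUDA version, so the two numbers answer different questions.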

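Beyond these main modules, the CUDA Toolkit includes other tools and libraries for profiling, debugging, optimization, and parallel programming. For example, nvcc is the CUDA compiler, which compiles CUDA C/C++ code into binaries that can run on the GPU, and the NVIDIA Nsight tools are used to profile and debug CUDA applications. Bindings and interfaces for other languages such as Python are also available, so CUDA GPU programming can be used across many domains.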

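9.2 Configuring GPU support in Singularity

SingularityCE natively supports running application containers that use NVIDIA's CUDA GPU compute framework or AMD's ROCm solution. This gives users easy access to GPU-enabled machine learning frameworks such as TensorFlow regardless of the host operating system. As long as the host has the CUDA/ROCm driver and libraries installed, TensorFlow can be run in, for example, an up-to-date Ubuntu 20.04 container.

Installing the NVIDIA driver

Go to the official NVIDIA website (https://www.nvidia.com/Download/index.aspx) and search for the latest Linux driver for the GPU model in the host.
Download the driver for that GPU model, usually a .run file.
In a terminal, change to the download directory and run the installer as root: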
```
sh ./NVIDIA-Linux-x86_64-xxx.xx.xx.run
```
The script walks through the installation. Follow the prompts (answer yes).
Verify that the driver was installed successfully:
```
$ nvidia-smi
```
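
For scripted checks, the same verification can be done from Python. This is a small sketch under the assumption that `nvidia-smi` is on the PATH; it uses the `--query-gpu` and `--format=csv,noheader` options to get machine-readable output. The function names are illustrative, and `query_gpus` returns an empty list when the tool is missing or fails.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Parse 'index, name, driver, memory' lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, rest = line.split(",", 1)
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, driver, memory = rest.rsplit(",", 2)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "driver": driver.strip(),
            "memory": memory.strip(),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return the parsed GPUs, or [] if it is unavailable."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result on a machine that has a GPU usually means the driver is not installed or not loaded.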

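NVIDIA GPUs and CUDA

The run command, and the other commands that execute a container (shell, exec), accept the --nv option, which sets up the container's environment to use an NVIDIA GPU and the basic CUDA libraries needed to run a CUDA-enabled application. The --nv option will:

Ensure that the /dev/nvidiaX device entries are available inside the container, so that the GPUs on the host are accessible;
Locate the basic CUDA libraries on the host and bind them into the container, so that they are available to the container and match the kernel GPU driver on the host;
Set LD_LIBRARY_PATH (the search path for shared libraries) inside the container, so that applications running in the container use the bound versions of the CUDA libraries.

To run a CUDA application in a container with --nv, make sure that:

The host has an NVIDIA GPU driver installed, together with a matching version of the basic NVIDIA/CUDA libraries. The host does not need an X server running unless you want to run graphical applications from the container.
The NVIDIA libraries are in the system's library search path.
The application inside the container was compiled for a CUDA version and device capability level that the host card and driver support.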
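
A quick way to see which /dev/nvidiaX entries exist, on the host or inside a container started with --nv, is to list the device directory. The following is a minimal sketch; the function name `nvidia_device_nodes` is illustrative, and it only reports the per-GPU nodes, not control nodes such as nvidiactl.

```python
import os
import re


def nvidia_device_nodes(dev_dir="/dev"):
    """Return the per-GPU device nodes (nvidia0, nvidia1, ...) found in dev_dir."""
    try:
        entries = os.listdir(dev_dir)
    except OSError:
        return []
    nodes = [name for name in entries if re.fullmatch(r"nvidia\d+", name)]
    # Sort numerically so that nvidia10 comes after nvidia2.
    return sorted(nodes, key=lambda name: int(name[len("nvidia"):]))


if __name__ == "__main__":
    nodes = nvidia_device_nodes()
    print(f"{len(nodes)} GPU device node(s): {nodes}")
```

Running it on the host and then inside the container should give the same list when --nv is in effect.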

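These requirements are usually met by installing the NVIDIA driver and CUDA packages directly from the NVIDIA website. Linux distributions may provide NVIDIA drivers and CUDA libraries, but they are often outdated, which can cause problems when running applications compiled for the latest CUDA versions.

SingularityCE finds the NVIDIA/CUDA libraries on the host using the list of libraries in the configuration file /etc/singularity/nvliblist.conf, and resolves their paths through the ldconfig cache. At release time, this list is appropriate for the latest stable CUDA version. Administrators can edit it to add further libraries if necessary.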
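
To see which host libraries such a lookup would find, the resolution step can be imitated in a few lines. This is only an approximation of the idea, not SingularityCE's actual implementation: the file format (one name per line, `#` comments) and the prefix-matching rule are assumptions, and all function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as: "\tlibcuda.so.1 (libc6,x86-64) => /usr/lib/.../libcuda.so.1"
LDCONFIG_LINE = re.compile(r"^\s*(\S+)\s+\(.*?\)\s+=>\s+(\S+)\s*$")


def read_liblist(path="/etc/singularity/nvliblist.conf"):
    """Read the non-comment, non-empty entries of the list (assumed format)."""
    try:
        with open(path) as handle:
            lines = [line.strip() for line in handle]
    except OSError:
        return []
    return [line for line in lines if line and not line.startswith("#")]


def parse_ldconfig(text):
    """Map each library name in `ldconfig -p` output to its first listed path."""
    cache = {}
    for line in text.splitlines():
        match = LDCONFIG_LINE.match(line)
        if match:
            cache.setdefault(match.group(1), match.group(2))
    return cache


def resolve_libraries(wanted, cache):
    """Return {name: path} for cached libraries that match a wanted entry.

    Assumed rule: 'libcuda.so' matches 'libcuda.so' and 'libcuda.so.<suffix>'.
    """
    return {
        name: path
        for name, path in cache.items()
        if any(name == w or name.startswith(w + ".") for w in wanted)
    }


if __name__ == "__main__":
    try:
        output = subprocess.run(["ldconfig", "-p"], capture_output=True,
                                text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        output = ""
    for name, path in sorted(resolve_libraries(read_liblist(), parse_ldconfig(output)).items()):
        print(name, "->", path)
```

If a library that an application needs is missing from the output, it is either not installed on the host or not listed in the configuration file.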

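9.3 Configuring tensorflow-gpu

TensorFlow is widely used in machine learning projects, but it can be hard to install on older systems and is updated frequently. Running TensorFlow from a container removes the installation problems and makes it easy to try new versions.

The official tensorflow repository on Docker Hub contains NVIDIA GPU-enabled containers that use CUDA for processing. The available versions are listed on the repository's tags page on Docker Hub.

The container is large, so it is best to build or pull the Docker image into a local image before you start working with it. Here we build it into a sandbox directory rather than a SIF file: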

$ singularity build --sandbox tensorflow_2.12.0 docker://tensorflow/tensorflow:2.12.0-gpu

Then run the GPU-enabled container:

$ singularity run --nv tensorflow_2.12.0

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/


You are running this container as user with ID 1002 and group 1002,
which should map to the ID and group for your user on the Docker host. Great!

/sbin/ldconfig.real: Can't create temporary cache file /etc/ld.so.cache~: Read-only file system
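
When the build and run steps are driven from a script, it can help to assemble the command lines in one place. The helpers below are hypothetical conveniences, not part of Singularity; they only use the `build`, `--sandbox`, `run`, and `--nv` arguments shown above.

```python
import shlex


def docker_uri(repository, tag):
    """Build a docker:// source URI such as docker://tensorflow/tensorflow:2.12.0-gpu."""
    return f"docker://{repository}:{tag}"


def build_command(target, repository, tag, sandbox=False):
    """Return the argv for building a SIF file or, with sandbox=True, a sandbox directory."""
    argv = ["singularity", "build"]
    if sandbox:
        argv.append("--sandbox")
    return argv + [target, docker_uri(repository, tag)]


def run_command(image, gpu=True):
    """Return the argv for running an image, adding --nv when GPU access is wanted."""
    argv = ["singularity", "run"]
    if gpu:
        argv.append("--nv")
    return argv + [image]


if __name__ == "__main__":
    build = build_command("tensorflow_2.12.0", "tensorflow/tensorflow",
                          "2.12.0-gpu", sandbox=True)
    print(shlex.join(build))
    print(shlex.join(run_command("tensorflow_2.12.0")))
```

The resulting lists can be passed to `subprocess.run(argv, check=True)`; keeping them as lists avoids shell quoting problems.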

We can verify that the GPUs are available inside the container with TensorFlow's list_local_devices() function:

Singularity> python
Python 3.8.10 (default, May 26 2023, 14:05:08) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.client import device_lib
2023-08-07 15:10:24.927897: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> print(device_lib.list_local_devices())
2023-08-07 15:10:43.407577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /device:GPU:0 with 22136 MB memory:  -> device: 0, name: Tesla P40, pci bus id: 0000:0b:00.0, compute capability: 6.1
2023-08-07 15:10:43.408850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /device:GPU:1 with 23668 MB memory:  -> device: 1, name: Tesla P40, pci bus id: 0000:84:00.0, compute capability: 6.1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11406739320947338039
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 23211278336
locality {
  bus_id: 1
  links {
  }
}
incarnation: 447019799153434400
physical_device_desc: "device: 0, name: Tesla P40, pci bus id: 0000:0b:00.0, compute capability: 6.1"
xla_global_id: 416903419
, name: "/device:GPU:1"
device_type: "GPU"
memory_limit: 24818548736
locality {
  bus_id: 2
  numa_node: 1
  links {
  }
}
incarnation: 12753825214935049399
physical_device_desc: "device: 1, name: Tesla P40, pci bus id: 0000:84:00.0, compute capability: 6.1"
xla_global_id: 2144165316
]
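
The same check can be written against TensorFlow's public configuration API instead of the `tensorflow.python` internals. The sketch below assumes TensorFlow 2.x; the wrapper name `visible_gpus` is illustrative, and it returns an empty list when TensorFlow cannot be imported.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see ([] if none or TF is missing)."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

Unlike list_local_devices(), this call only enumerates the devices and does not need to initialize them first.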

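By default, the SingularityCE --nv option ensures that all NVIDIA devices on the host are present in the container.

This behavior differs from nvidia-docker, where the NVIDIA_VISIBLE_DEVICES environment variable controls whether some or all of the host GPUs are visible inside the container. The nvidia-container-runtime explicitly binds devices into the container according to the value of NVIDIA_VISIBLE_DEVICES.

To control which GPUs are used in a SingularityCE container, set SINGULARITYENV_CUDA_VISIBLE_DEVICES before running the container, or set CUDA_VISIBLE_DEVICES inside the container. This variable limits the GPU devices that CUDA programs can see.

For example, to run a TensorFlow container (here a SIF image named tensorflow_latest-gpu.sif) using only the first GPU in the host: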

$ SINGULARITYENV_CUDA_VISIBLE_DEVICES=0 singularity run --nv tensorflow_latest-gpu.sif
# or
$ export SINGULARITYENV_CUDA_VISIBLE_DEVICES=0
$ singularity run --nv tensorflow_latest-gpu.sif
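
When the container is launched from a Python script, the same variable can be set on the child process environment. This is a minimal sketch with hypothetical helper names: `gpu_environment` prepares the host-side environment, and `visible_device_ids` shows how a program inside the container could read the resulting CUDA_VISIBLE_DEVICES value.

```python
import os


def gpu_environment(gpu_ids, base_env=None):
    """Copy the environment and restrict the container to the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    # Singularity passes SINGULARITYENV_<NAME> into the container as <NAME>.
    env["SINGULARITYENV_CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def visible_device_ids(env):
    """Parse CUDA_VISIBLE_DEVICES; None means the variable is unset (no restriction)."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    env = gpu_environment([0])
    print(env["SINGULARITYENV_CUDA_VISIBLE_DEVICES"])
    print(visible_device_ids(os.environ))
```

The prepared environment would then be passed along with the command, for example `subprocess.run(["singularity", "run", "--nv", image], env=env)`. An empty value hides all GPUs from CUDA programs, which is why an empty list and `None` are kept distinct.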

9.4 Creating a sandbox

$ singularity shell -w --nv --no-home -B /mnt tensorflow_2.12.0
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /usr/bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
WARNING: Skipping mount /var/run/nvidia-persistenced/socket [files]: /var/run/nvidia-persistenced/socket doesn't exist in container

A simple way to deal with the warnings is to create the missing bind targets inside the container:

Singularity> touch /usr/bin/nvidia-smi /usr/bin/nvidia-debugdump /usr/bin/nvidia-persistenced /usr/bin/nvidia-cuda-mps-control /usr/bin/nvidia-cuda-mps-server
Singularity> mkdir /var/run/nvidia-persistenced
Singularity> touch /var/run/nvidia-persistenced/socket
Singularity> exit
$ singularity shell -w --nv --no-home -B /mnt tensorflow_2.12.0
Singularity> nvidia-smi

When this succeeds, nvidia-smi reports the GPU information from inside the container.
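
If the list of skipped mounts differs on another system, the commands for creating the placeholders can be derived from the warnings themselves. The sketch below uses hypothetical helper names and only generates the `mkdir` and `touch` commands as text; they are meant to be reviewed and then run inside the writable container, as in the session above.

```python
import posixpath
import re
import shlex

SKIPPED = re.compile(r"Skipping mount (\S+) \[files\]")


def missing_bind_targets(log_text):
    """Collect the container paths named in 'Skipping mount' warnings, in order."""
    return SKIPPED.findall(log_text)


def placeholder_commands(paths):
    """Build mkdir/touch commands that create an empty file for each path."""
    commands = []
    for path in paths:
        commands.append(f"mkdir -p {shlex.quote(posixpath.dirname(path))}")
        commands.append(f"touch {shlex.quote(path)}")
    return commands


if __name__ == "__main__":
    log = (
        "WARNING: nv files may not be bound with --writable\n"
        "WARNING: Skipping mount /usr/bin/nvidia-smi [files]: "
        "/usr/bin/nvidia-smi doesn't exist in container\n"
        "WARNING: Skipping mount /var/run/nvidia-persistenced/socket [files]: "
        "/var/run/nvidia-persistenced/socket doesn't exist in container\n"
    )
    for command in placeholder_commands(missing_bind_targets(log)):
        print(command)
```

Generating commands rather than writing into the sandbox directory from the host avoids following symbolic links such as /var/run out of the sandbox.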

9.5 Configuring R Keras

Installing R

apt update
apt install wget language-pack-en dirmngr gnupg apt-transport-https ca-certificates software-properties-common
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/'
apt install r-base
R --version
apt install build-essential

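Reference: https://tensorflow.rstudio.com/install/custom

Some TensorFlow components (for example the Keras library) depend on additional Python packages. If TensorFlow is installed from R, the install_keras() function installs these dependencies automatically. Here, however, R calls the TensorFlow already installed in the container's Python directly, so the additional dependencies should first be installed manually in Python: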

$ pip install tensorflow-hub tensorflow-datasets scipy requests Pillow h5py pandas pydot

Installing the R tensorflow and keras packages

Singularity> LC_ALL=C.UTF-8 R

Run the following in the R session:

install.packages("tensorflow")
install.packages("keras")
install.packages("ggplot2")

Locate the Python environment that TensorFlow depends on:

Singularity> which python
/usr/bin/python
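
Before pointing R at an interpreter, it can be worth confirming that this interpreter really can import TensorFlow. The following sketch starts the candidate interpreter as a subprocess and asks it whether a module is importable; the function name `interpreter_has_module` is illustrative.

```python
import subprocess
import sys

# Exit status 0 if the module named in argv[1] can be found, 1 otherwise.
CHECK = ("import importlib.util, sys; "
         "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)")


def interpreter_has_module(python_path, module):
    """Return True if the interpreter at python_path can locate the module."""
    try:
        result = subprocess.run([python_path, "-c", CHECK, module],
                                capture_output=True)
    except OSError:
        return False  # the interpreter itself could not be started
    return result.returncode == 0


if __name__ == "__main__":
    for candidate in ("/usr/bin/python", sys.executable):
        print(candidate, interpreter_has_module(candidate, "tensorflow"))
```

An interpreter for which this prints `True` is a reasonable candidate for the Python path that R should use.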

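Locating TensorFlow from R

Once TensorFlow is installed, we need to make sure that the R tensorflow package can find the TensorFlow already installed in the container's Python environment. The package scans the system for the various Python versions, as well as available virtual environments and conda environments, so in many cases no further action is needed. We can, however, force the probe to use a specific Python environment by setting the RETICULATE_PYTHON environment variable: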

Sys.setenv(RETICULATE_PYTHON="/usr/bin/python")
library(tensorflow)

Test whether TensorFlow can use the local GPU:

tf$config$list_physical_devices("GPU")

Convert the sandbox into a SIF image, then test whether the following script runs successfully on the GPU:

$ singularity build keras.sif tensorflow_2.12.0
$ singularity exec --nv -B /mnt keras.sif Rscript /mnt/data1/Customers/LongZhengbiao/Container/test_container.R

The test_container.R script contains:

#Sys.setenv(RETICULATE_PYTHON="/usr/bin/python")

library(keras)

mnist <- dataset_mnist()
train_images <- mnist$train$x
train_labels <- mnist$train$y
test_images <- mnist$test$x
test_labels <- mnist$test$y

network <- keras_model_sequential() %>% 
  layer_dense(units = 512, activation = "relu", input_shape = c(28 * 28)) %>%
  layer_dense(units = 10, activation = "softmax")

network %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)

train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255
test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255

train_labels <- to_categorical(train_labels)
test_labels <- to_categorical(test_labels)

history <- network %>% fit(train_images, train_labels, epochs = 5, batch_size = 128)

metrics <- network %>% evaluate(test_images, test_labels, verbose = 0)
metrics

pdf("test_container.pdf")
plot(history)
dev.off()
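
The preprocessing in this script (flattening each 28 x 28 image, scaling pixels to the range 0 to 1, and one-hot encoding the labels) can be illustrated without any framework. The plain-Python sketch below uses illustrative function names and mirrors what array_reshape, the division by 255, and to_categorical do on a tiny example.

```python
def flatten_and_scale(image, max_value=255):
    """Flatten a 2-D image (list of rows) row by row and scale pixels to [0, 1]."""
    return [pixel / max_value for row in image for pixel in row]


def to_one_hot(labels, num_classes):
    """One-hot encode integer class labels, in the spirit of to_categorical."""
    encoded = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[label] = 1.0
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    tiny_image = [[0, 255], [51, 102]]   # stands in for a 28 x 28 digit
    print(flatten_and_scale(tiny_image))
    print(to_one_hot([0, 2], num_classes=3))
```

The one-hot labels are what makes the categorical_crossentropy loss in the script applicable, since it compares a predicted probability vector with a target vector of the same length.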

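On a successful run, the script prints the evaluation metrics and writes the training history plot to test_container.pdf.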

9.6 Troubleshooting

【ERROR】E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: UNKNOWN ERROR (34)

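【Solution】Check that the --nv option was used when invoking the container. It sets up the container's environment to use an NVIDIA GPU and the basic CUDA libraries needed to run CUDA-enabled applications.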

【ERROR】CUDA_ERROR_UNKNOWN

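【Solution】CUDA depends on several kernel modules being loaded, and not all of them are loaded at system startup. If you encounter CUDA_ERROR_UNKNOWN in a container, first initialize the driver stack on the host: run modprobe nvidia_uvm as root, and use nvidia-persistenced to keep the driver from being unloaded.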
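
Whether the relevant modules are loaded can be checked on the host by reading /proc/modules. This is a small sketch with illustrative function names; the default list of required modules (`nvidia` and `nvidia_uvm`) is an assumption based on the module named above.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_nvidia_modules(proc_modules_text, required=("nvidia", "nvidia_uvm")):
    """Return the required modules that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's module list; returns '' where the file is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    missing = missing_nvidia_modules(read_proc_modules())
    print("missing modules:", missing if missing else "none")
```

If nvidia_uvm is reported as missing, loading it with modprobe as described above is the next step.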

【ERROR】nv files may not be bound with --writable

[root@gpu-test01 Container]# singularity shell -w --no-home --nv tensorflow_2.12.0
WARNING: nv files may not be bound with --writable
WARNING: Skipping mount /usr/bin/nvidia-smi [files]: /usr/bin/nvidia-smi doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-debugdump [files]: /usr/bin/nvidia-debugdump doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-persistenced [files]: /usr/bin/nvidia-persistenced doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-control [files]: /usr/bin/nvidia-cuda-mps-control doesn't exist in container
WARNING: Skipping mount /usr/bin/nvidia-cuda-mps-server [files]: /usr/bin/nvidia-cuda-mps-server doesn't exist in container
Singularity> nvidia-smi
bash: nvidia-smi: command not found
Singularity> exit
exit

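【Solution】Sandbox directories are not very useful for NVIDIA containers, because the NVIDIA files cannot be bound when the container is opened in writable mode; it is usually better to use the more convenient SIF format. Alternatively, create the files named in the warnings inside the container (this has not caused problems in the present example):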

Singularity> touch /usr/bin/nvidia-smi /usr/bin/nvidia-debugdump /usr/bin/nvidia-persistenced /usr/bin/nvidia-cuda-mps-control /usr/bin/nvidia-cuda-mps-server
Singularity> mkdir /var/run/nvidia-persistenced
Singularity> touch /var/run/nvidia-persistenced/socket

【ERROR】When running R or Rscript: During startup - Warning messages: 1: Setting LC_CTYPE failed, using "C"

During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C"

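【Solution】These warnings mean that the R session tried to apply the locale requested by the environment, but the setting failed, so R fell back to the default "C" locale.

First install the required language pack: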

apt-get install language-pack-en

Then add the environment variables to ~/.bash_profile:

export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
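
To confirm that the locale named in these variables is really available in the container, the output of `locale -a` can be checked. The sketch below uses illustrative function names and normalizes the spelling first, since `locale -a` typically prints `en_US.utf8` where the variables say `en_US.UTF-8`.

```python
import subprocess


def normalize_locale(name):
    """Fold spellings such as 'en_US.UTF-8' and 'en_US.utf8' into one form."""
    base, _, codeset = name.partition(".")
    if not codeset:
        return base
    return base + "." + codeset.lower().replace("-", "")


def has_locale(wanted, locale_a_output):
    """Return True if `wanted` appears in the given `locale -a` output."""
    available = {normalize_locale(line.strip())
                 for line in locale_a_output.splitlines() if line.strip()}
    return normalize_locale(wanted) in available


def installed_locales():
    """Return the output of `locale -a`, or '' if the command is unavailable."""
    try:
        return subprocess.run(["locale", "-a"], capture_output=True, check=True,
                              encoding="utf-8", errors="replace").stdout
    except (OSError, subprocess.CalledProcessError):
        return ""


if __name__ == "__main__":
    print(has_locale("en_US.UTF-8", installed_locales()))
```

If this prints `False` after the language pack has been installed, the locale has not been generated in the container yet.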

【ERROR】modprobe: ERROR: could not insert 'nvidia_uvm': Required key not available

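【Solution】Try loading the module with modprobe nvidia-uvm. If the error persists, note that "Required key not available" is typically reported when Secure Boot rejects an unsigned kernel module, so check the host's Secure Boot settings or module signing.

【ERROR】Could not load dynamic library 'libnvinfer.so.7'

【ERROR】Could not load dynamic library 'libnvinfer_plugin.so.7'

【Solution】See: https://stackoverflow.com/questions/74956134/could-not-load-dynamic-library-libnvinfer-so-7

【ERROR】TF-TRT Warning: Cannot dlopen some TensorRT libraries.

【Solution】See https://stackoverflow.com/questions/74956134/could-not-load-dynamic-library-libnvinfer-so-7 and install TensorRT with pip: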

pip install tensorrt

【ERROR】Couldn't open CUDA library libcuda.so.1

【Solution】See: https://stackoverflow.com/questions/41890549/tensorflow-cannot-open-libcuda-so-1
