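FreeBSD can run CUDA AI training too!
After installing the CPU builds of PyTorch and PaddlePaddle on FreeBSD, I tried installing the driver for an NVIDIA Tesla P4 compute card, because training on the CPU alone is far too slow. After several days of stumbling around, I finally got the CUDA driver for the P4 installed and PyTorch running on the GPU. FreeBSD can run CUDA AI training too!
Reference: https://github.com/verm/freebsd-stable-diffusion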
Prerequisites
You need some experience with running Linux binaries on FreeBSD; see the FreeBSD Handbook, Chapter 12, "Linux Binary Compatibility", on the FreeBSD Documentation Portal.
In short, run the following:
pkg install linux_base-c7
pkg install linux-c7-devtools
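Install Miniconda by following the CSDN blog post "安装Miniconda@FreeBSD13" (Installing Miniconda on FreeBSD 13).
The bumpy road to GPU training on FreeBSD
Trying driver version 470
FreeBSD already packages NVIDIA drivers. Search for them: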
pkg search nvidia
libva-nvidia-driver-0.0.11 NVDEC-based backend for VAAPI
linux-nvidia-libs-550.54.14 NVidia graphics libraries and programs (Linux version)
linux-nvidia-libs-304-304.137 NVidia graphics libraries and programs (Linux version)
linux-nvidia-libs-340-340.108 NVidia graphics libraries and programs (Linux version)
linux-nvidia-libs-390-390.154 NVidia graphics libraries and programs (Linux version)
linux-nvidia-libs-470-470.161.03 NVidia graphics libraries and programs (Linux version)
nvidia-driver-550.54.14 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-driver-304-304.137_10 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-driver-340-340.108_4 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-driver-390-390.154_1 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-driver-470-470.161.03_1 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-drm-510-kmod-550.54.14_1 NVIDIA DRM Kernel Module
nvidia-drm-515-kmod-550.54.14_1 NVIDIA DRM Kernel Module
nvidia-drm-kmod-550.54.14 NVIDIA DRM Kernel Module
nvidia-hybrid-graphics-0.6 NVIDIA secondary GPU configuration - Optimus Technology support
nvidia-hybrid-graphics-390-0.6 NVIDIA secondary GPU configuration - Optimus Technology support
nvidia-secondary-driver-550.54.14_1 NVidia graphics card binary drivers for hardware OpenGL rendering on secondary device
nvidia-secondary-driver-390-390.154_1 NVidia graphics card binary drivers for hardware OpenGL rendering on secondary device
nvidia-settings-535.146.02_1 Display Control Panel for X NVidia driver
nvidia-texture-tools-2.1.2 Texture Tools with support for DirectX 10 texture formats
nvidia-xconfig-525.116.04 Tool to manipulate X configuration files for the NVidia driver
nvidia_gpu_prometheus_exporter-g20181028_19 NVIDIA GPU Prometheus exporter
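I started with version 470. Version 550 seemed problematic; I had not managed to install it on Linux either. Version 525 or 535 would probably work too, but FreeBSD did not appear to offer ready-made packages for them.
Install the driver: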
pkg install nvidia-driver-470
Run directly on FreeBSD, however, nvidia-smi does not report a CUDA version:
nvidia-smi
Thu May 2 17:17:58 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:13:00.0 Off | 0 |
| N/A 48C P0 23W / 75W | 0MiB / 7611MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
nvidia-smi has to be run through nv-sglrun, which is provided by the libc6-shim package:
pkg install libc6-shim
Install the linux-nvidia-libs-470 libraries:
pkg install linux-nvidia-libs-470
Check the driver:
nv-sglrun nvidia-smi
The driver now reports correctly:
nv-sglrun nvidia-smi
shim init
Thu May 2 17:48:27 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla P4 Off | 00000000:13:00.0 Off | 0 |
| N/A 48C P0 23W / 75W | 0MiB / 7611MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
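So the CUDA version is 11.4; I had remembered it wrongly. That means installing a PaddlePaddle or PyTorch build for CUDA 11.4.
Installing PaddlePaddle 2.6.1 (CUDA 11.2 build)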
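The two nvidia-smi runs above differ only in the "CUDA Version" field of the header line. Here is a minimal sketch of pulling the driver and CUDA versions out of that line. The helper name parse_smi_header is made up for illustration, and the regex assumes the header layout shown in the output above.

```python
import re

# Matches the header line printed by nvidia-smi, for example:
# "| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |"
_HEADER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+|N/A)"
)


def parse_smi_header(text):
    """Return (driver_version, cuda_version); cuda_version is None for 'N/A'."""
    match = _HEADER.search(text)
    if match is None:
        raise ValueError("no nvidia-smi header line found")
    cuda = match.group("cuda")
    return match.group("driver"), (None if cuda == "N/A" else cuda)
```

Feeding it the text printed by nv-sglrun nvidia-smi should give the CUDA version to match against when choosing a PaddlePaddle or PyTorch build. A None result corresponds to the "N/A" case seen when nvidia-smi is run without nv-sglrun.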
conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.2 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
Test PaddlePaddle:
python -c "import paddle; paddle.utils.run_check()"
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0502 18:13:53.988302 1225 default_variables.cpp:433] Fail to open /proc/self/io: No such file or directory [2]
/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/framework.py:688: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
warnings.warn(
Running verify PaddlePaddle program ...
[2024-05-02 18:13:55,300] [ WARNING] install_check.py:60 - You are using GPU version PaddlePaddle, but there is no GPU detected on your machine. Maybe CUDA devices is not set properly.
Original Error is
I0502 18:13:55.313150 1225 program_interpreter.cc:212] New Executor is Running.
I0502 18:13:55.346310 1225 interpreter_util.cc:624] Standalone Executor is Used.
PaddlePaddle works well on 1 CPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
The GPU was not used. More work needed!
Try the CUDA 11.6 build of PaddlePaddle instead:
conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.6 -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/Paddle/ -c conda-forge
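At first I thought I had found the problem: it looked as if the install command for CUDA 11.4 pulled in the 11.7 build (paddlepaddle 2.6.1 py310_gpu_cuda11.7_many_linux), while the command for CUDA 11.6 was conda install paddlepaddle-gpu==2.6.1 cudatoolkit=11.2.
That was my own mix-up. The PaddlePaddle site had not listed the wrong versions; I saw conda report that package xx would be replaced by BB and assumed xx was the one being installed.
Unsurprisingly, the same error appeared after switching versions.
Then I set this variable: export LD_PRELOAD="/home/skywalk/work/dummy-uvm.so"
That produced a different error: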
W0502 18:46:41.397257 1347 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
W0502 18:46:41.397300 1347 gpu_resources.cc:196] WARNING: device: 0. The installed Paddle is compiled with CUDA 11.6, but CUDA runtime version in your machine is 11.4, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
ExternalError: CUDA error(46), all CUDA-capable devices are busy or unavailable.
In other words, without the variable the GPU is not visible at all.
Trying driver version 550
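The Paddle warning in the log above compares the CUDA version the package was built for (11.6) with the version the driver reports (11.4). A minimal sketch of that kind of check follows. The helper names are hypothetical, and the rule is a conservative rule of thumb, not Paddle's actual logic. NVIDIA's minor-version compatibility can relax it within one major version, so treat a True result as a warning, not proof of failure.

```python
def version_tuple(version):
    """Turn '11.6' into (11, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def runtime_newer_than_driver(runtime_cuda, driver_cuda):
    """True when the framework's CUDA runtime is newer than the driver's CUDA version."""
    return version_tuple(runtime_cuda) > version_tuple(driver_cuda)
```

By this rule the CUDA 11.6 build on the 470 driver (CUDA 11.4) is flagged, while a CUDA 12.1 build on a driver reporting CUDA 12.2 is not.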
pkg install nvidia-driver-550.54.14
pkg install linux-nvidia-libs-550.54.14
After installing and rebooting, nvidia-smi reports CUDA 12.4:
nv-sglrun nvidia-smi
shim init
Thu May 2 19:52:25 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
The PaddlePaddle site offers no conda package for CUDA 12.4, so pip is the only option:
python -m pip install paddlepaddle-gpu==2.6.1.post120 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
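It failed with an error after installation.
I then tried installing PyTorch, but the conda install was too slow and I gave up. Outside China, conda could probably set up a torch environment matching the 550 driver, but from inside China the -c nvidia channel is slow enough that the install errors out, so I dropped it for now.
Along the way I also tried nvidia-driver-390, but it reports no CUDA version at all, which made it impossible to pick a matching PaddlePaddle or torch build, so I gave up on that as well.
Fed up, I installed driver version 535, and it worked!
Run the command below. It may also pull in the AMD GPU driver, but I did not worry about that and installed everything: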
pkg install nvidia-driver-535.146.02 linux-nvidia-libs-535.146.02 libva-nvidia-driver nvidia-drm-kmod-535.146.02
The driver installed successfully:
nv-sglrun nvidia-smi
shim init
Thu May 2 20:46:41 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02 Driver Version: 535.146.02 CUDA Version: 12.2 |
Good heavens, what just happened? It actually worked!
(pytorch) [skywalk@fb14 ~]$ export LD_PRELOAD="/home/skywalk/work/dummy-uvm.so"
(pytorch) [skywalk@fb14 ~]$ python3 -c 'import torch; print(torch.cuda.is_available())'
True
(pytorch) [skywalk@fb14 ~]$ python3 -c "import torch ; print(torch.randn((2,3),device='cuda'))"
tensor([[-0.6359, 0.0748, 1.7495],
[ 2.2609, 0.0373, -0.1241]], device='cuda:0')
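Amazing: with the NVIDIA driver installed on FreeBSD, PyTorch is up and running!
PaddlePaddle still does not work, but that no longer matters; getting either of the two running is enough.
Testing with fastai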
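The two one-liners above can be folded into one small report that also shows which CUDA version the installed torch wheel was built for. This is a sketch only: the function name cuda_report is made up, and the import is guarded so the script also runs where torch is not installed.

```python
def cuda_report():
    """Collect basic CUDA facts from PyTorch; also works when torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Only query the device when CUDA initialised successfully.
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```

On this setup it would have to be run with the same LD_PRELOAD prefix as the commands above. Without it, cuda_available is expected to be False.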
LD_PRELOAD="/home/skywalk/work/dummy-uvm.so" python3 testai.py
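The test code, testai.py:
# fastai text classification on the IMDB dataset
from fastai.text.all import *

# Download and unpack IMDB (cached after the first run)
path = untar_data(URLs.IMDB)
# Build DataLoaders from the dataset folder, using test/ as the validation set
dls = TextDataLoaders.from_folder(path, valid='test')
# AWD-LSTM classifier, fine-tuned for 4 epochs with a base learning rate of 1e-2
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
learn.show_results()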
The temperature climbed:
0 Tesla P4 Off | 00000000:13:00.0 Off | 0 |
| N/A 82C P0 69W / 75W | 1852MiB / 7680MiB | 98% Default |
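It felt more than 10 times faster than the CPU. It later failed with an error, though; possibly the fastai example needs more video memory than the P4's 8 GB.
Summary: running CUDA on FreeBSD
On FreeBSD, use the NVIDIA 535 driver: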
pkg install nvidia-driver-535.146.02 linux-nvidia-libs-535.146.02 libva-nvidia-driver nvidia-drm-kmod-535.146.02
For PyTorch, use the CUDA 12.1 build:
pip3 install torch torchvision torchaudio
Packages needed for FreeBSD's Linux compatibility layer:
pkg install linux_base-c7
pkg install linux-c7-devtools
pkg install libc6-shim
FreeBSD also needs one extra library, preloaded with LD_PRELOAD="/home/skywalk/work/dummy-uvm.so".
This file is downloaded and compiled as follows:
# Download
fetch https://gist.githubusercontent.com/shkhln/40ef290463e78fb2b0000c60f4ad797e/raw/f640983249607e38af405c95c457ce4afc85c608/uvm_ioctl_override.c
# Compile
/compat/linux/bin/cc --sysroot=/compat/linux -m64 -std=c99 -Wall -ldl -fPIC -shared -o dummy-uvm.so uvm_ioctl_override.c
Put the compiled dummy-uvm.so in ~/work; it is needed later.
I installed miniconda3 and created a conda environment named pytorch. To activate it each time, run:
source miniconda3/etc/profile.d/conda.sh
conda activate pytorch
Prefix every command with LD_PRELOAD="/home/skywalk/work/dummy-uvm.so", that is, LD_PRELOAD="/home/skywalk/work/dummy-uvm.so" python3 xx.py. For example:
LD_PRELOAD="/home/skywalk/work/dummy-uvm.so" python3 -c "import torch ; print(torch.randn((2,3),device='cuda'))"
tensor([[ 0.7900, -0.0157, 0.6979],
[-1.2775, -0.4350, 1.0054]], device='cuda:0')
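LD_PRELOAD is set per command because exporting it into the user's environment affects every program started from that shell. Even ls and pwd stop working, since dummy-uvm.so is a Linux library and native FreeBSD binaries cannot load its libdl.so.2 dependency.
Troubleshooting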
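The same per-command scoping can be done from a launcher script: copy the environment, set LD_PRELOAD in the copy, and hand it to the child process only. This is a minimal sketch. The helper names are made up for illustration, and the path is the one used in this article, so adjust it to wherever dummy-uvm.so was built.

```python
import os
import subprocess

# Path taken from the article; change it to your own build location.
DUMMY_UVM = "/home/skywalk/work/dummy-uvm.so"


def preload_env(so_path=DUMMY_UVM, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set; the original is untouched."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = so_path
    return env


def run_with_preload(argv, so_path=DUMMY_UVM):
    """Run argv (a list such as ['python3', 'xx.py']) with LD_PRELOAD set for that child only."""
    return subprocess.run(argv, env=preload_env(so_path), check=False)
```

Calling run_with_preload(["python3", "xx.py"]) is meant to behave like the LD_PRELOAD=... python3 xx.py form above. Because only the child's environment is changed, ls and pwd in the calling shell are unaffected.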
nv-sglrun nvidia-smi fails with: Failed to initialize NVML: Driver/library version mismatch
Installing the Linux libraries that match the driver (linux-nvidia-libs-470 for the 470 driver) fixes it:
pkg install linux-nvidia-libs-470
Also, reboot the machine after every driver change.
Slow FreeBSD pkg downloads
Frequent driver changes call for a fast download speed. Edit this file:
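The mismatch error generally means the kernel driver and the Linux user-space libraries come from different driver releases. Here is a small sketch of checking that from a list of installed package strings in the name-version form shown in the pkg search output earlier. The function names are hypothetical, and how you collect the list is up to you.

```python
def split_pkg(name_version):
    """Split 'nvidia-driver-470-470.161.03_1' into ('nvidia-driver-470', '470.161.03')."""
    name, _, version = name_version.rpartition("-")
    # Drop the port revision ('_1') and port epoch (',1'); they are packaging details.
    upstream = version.split(",")[0].split("_")[0]
    return name, upstream


def driver_matches_libs(installed):
    """True when the kernel driver and the Linux libs share one upstream version."""
    versions = {"driver": set(), "libs": set()}
    for entry in installed:
        name, upstream = split_pkg(entry)
        if name.startswith("linux-nvidia-libs"):
            versions["libs"].add(upstream)
        elif name.startswith("nvidia-driver"):
            versions["driver"].add(upstream)
    return len(versions["driver"]) == 1 and versions["driver"] == versions["libs"]
```

For example, nvidia-driver-470-470.161.03_1 together with linux-nvidia-libs-470-470.161.03 passes, while a 550 driver left next to the 470 libraries does not.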
vi /usr/local/etc/pkg/repos/FreeBSD.conf
and add:
FreeBSD: {
url: "http://mirrors.ustc.edu.cn/freebsd-pkg/${ABI}/quarterly",
}
pkg downloads are now faster!
The GPU run fails, but at least this shows the GPU is now visible!
export LD_PRELOAD="/home/skywalk/work/dummy-uvm.so"
(pytorch) [skywalk@fb14 ~]$ python -c "import paddle; paddle.utils.run_check()"
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0502 18:46:39.726972 1347 default_variables.cpp:433] Fail to open /proc/self/io: No such file or directory [2]
Running verify PaddlePaddle program ...
I0502 18:46:41.369956 1347 program_interpreter.cc:212] New Executor is Running.
W0502 18:46:41.370352 1347 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.4, Runtime API Version: 11.6
W0502 18:46:41.397257 1347 gpu_resources.cc:164] device: 0, cuDNN Version: 8.4.
W0502 18:46:41.397300 1347 gpu_resources.cc:196] WARNING: device: 0. The installed Paddle is compiled with CUDA 11.6, but CUDA runtime version in your machine is 11.4, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/utils/install_check.py", line 273, in run_check
_run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/utils/install_check.py", line 150, in _run_static_single
exe.run(startup_prog)
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/executor.py", line 1746, in run
res = self._run_impl(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/executor.py", line 1952, in _run_impl
ret = new_exe.run(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/executor.py", line 831, in run
tensors = self._new_exe.run(
OSError: In user code:
File "<string>", line 1, in <module>
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/utils/install_check.py", line 273, in run_check
_run_static_single(use_cuda, use_xpu, use_custom, custom_device_name)
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/utils/install_check.py", line 135, in _run_static_single
input, out, weight = _simple_network()
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/utils/install_check.py", line 31, in _simple_network
weight = paddle.create_parameter(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/tensor/creation.py", line 228, in create_parameter
return helper.create_parameter(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/layer_helper_base.py", line 444, in create_parameter
self.startup_program.global_block().create_parameter(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/framework.py", line 4381, in create_parameter
initializer(param, self)
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/nn/initializer/initializer.py", line 40, in __call__
return self.forward(param, block)
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/nn/initializer/constant.py", line 84, in forward
op = block.append_op(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/framework.py", line 4467, in append_op
op = Operator(
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/framework.py", line 3016, in __init__
for frame in traceback.extract_stack():
ExternalError: CUDA error(46), all CUDA-capable devices are busy or unavailable.
[Hint: 'cudaErrorDevicesUnavailable'. This indicates that all CUDA devices are busy or unavailable at the current time. Devices are often busy/unavailable due touse of cudaComputeModeExclusive, cudaComputeModeProhibited or when long running CUDA kernels have filled up the GPU and are blocking new work from starting. They can also be unavailabledue to memory constraints on a device that already has active CUDA work being performed.] (at ../paddle/phi/backends/gpu/cuda/cuda_info.cc:209)
[operator < fill_constant > error]
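I later tested various versions. Both 550 and 470 seemed to have problems, so in the end driver 535 did the job. Also, PaddlePaddle never worked; it was torch that did.
pkg on FreeBSD downgraded to 1.20.9
While installing the NVIDIA 470 driver, pkg itself got downgraded. pkg --version reported: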
1.20.9
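As a result, the 535 driver could not be installed.
Running pkg update repeatedly did not help. Rebuilding from ports (cd /usr/ports/ports-mgmt/pkg && make install) still gave version 1.20.6.
I then renamed the repository file (run in /usr/local/etc/pkg/repos): mv FreeBSD.conf FreeBSD.confbak
After that, pkg update finally refreshed the package catalogue, and pkg install pkg installed version 1.21.2.
Error from the CUDA 12 build of PaddlePaddle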
python -c "import paddle; paddle.utils.run_check()"
Error: Can not import paddle core while this file exists: /home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/libpaddle.so
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/__init__.py", line 28, in <module>
from .base import core # noqa: F401
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/__init__.py", line 36, in <module>
from . import core
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/core.py", line 380, in <module>
raise e
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/core.py", line 268, in <module>
from . import libpaddle
ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/paddle/base/libpaddle.so)
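The CUDA 12 build does have this GLIBCXX not found problem; it appears in the project's GitHub issues and is a long-standing one.
I tried installing PyTorch to see whether it could rescue PaddlePaddle. The conda install of torch turned out to be too slow, so I tried pip: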
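The ImportError above says the libstdc++.so.6 that was found does not export GLIBCXX_3.4.20. As far as I know, CentOS 7 ships GCC 4.8.5, whose libstdc++ stops at GLIBCXX_3.4.19, and 3.4.20 first appeared with GCC 4.9. To see what a given libstdc++ provides, the version strings can be read straight from the binary. This is a rough sketch with made-up helper names. The library path is left as a parameter; on FreeBSD it would presumably sit below /compat/linux.

```python
import re


def versions_in(data):
    """List the GLIBCXX_x.y[.z] version strings found in raw library bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    # Sort numerically so that 3.4.9 comes before 3.4.19.
    return sorted(
        (v.decode("ascii") for v in found),
        key=lambda v: tuple(int(p) for p in v.split(".")),
    )


def glibcxx_versions(libstdcxx_path):
    """Read a libstdc++ file and return the GLIBCXX versions it mentions."""
    with open(libstdcxx_path, "rb") as handle:
        return versions_in(handle.read())


def provides(libstdcxx_path, wanted):
    """True when the library mentions the wanted version, e.g. '3.4.20'."""
    return wanted in glibcxx_versions(libstdcxx_path)
```

If provides(path, "3.4.20") is False for the library the loader picks up, the Paddle wheel needs a newer libstdc++ than the one installed.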
pip3 install torch torchvision torchaudio
torch did not work:
python3 -c 'import torch; print(torch.cuda.is_available())'
/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
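PaddlePaddle did not work either. The output below is not the right one (it is the torch output again) and will be corrected later.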
python3 -c 'import torch; print(torch.cuda.is_available())'
/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
The fix was to drop the 550 driver and switch to 535, which got torch working!
After torch works, ls and other commands fail
ls
ld-elf.so.1: Shared object "libdl.so.2" not found, required by "dummy-uvm.so"
So do not set dummy-uvm.so globally; specify it on each command that needs it, for example:
LD_PRELOAD="/home/skywalk/work/dummy-uvm.so" python3 xx.py
Training fails with CUDA out of memory
When running the fastai example, the following error appears after about 3 to 4 minutes of training:
File "/home/skywalk/miniconda3/envs/pytorch/lib/python3.10/site-packages/fastai/text/models/awdlstm.py", line 86, in forward
masked_embed = self.emb.weight * mask
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 92.00 MiB. GPU
I do not know whether the test program exceeded the 8 GB of video memory or whether this P4 card is faulty.
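One way to tell the two apart is to retry with a smaller batch: if training succeeds at a smaller size, the card is probably fine and the example simply needs more memory. A minimal sketch follows. run_with_smaller_batches is a hypothetical helper, and train stands for any function that takes a batch size and runs training. It relies on torch.cuda.OutOfMemoryError being a RuntimeError whose message contains "out of memory", as in the traceback above.

```python
def run_with_smaller_batches(train, batch_size, min_batch_size=1):
    """Call train(batch_size), halving batch_size after each CUDA out-of-memory error."""
    while True:
        try:
            return batch_size, train(batch_size)
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or when we cannot shrink further.
            if "out of memory" not in str(err) or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

With fastai, train could rebuild the DataLoaders with a bs=batch_size argument to TextDataLoaders.from_folder before fine-tuning. That is an assumption about how to wire it up, not something tested in this article.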