ViViT: Facial-Video Stress Recognition in Practice (14)

God knows whose advice I'm actually supposed to follow in cuDNN, cuFFT, and cuBLAS Errors · Issue #62075 · tensorflow/tensorflow · GitHub

Step 1: Debug oneDNN

import os
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
import tensorflow as tf

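Setting 'TF_ENABLE_ONEDNN_OPTS' to 0 before importing tensorflow gets rid of the first message (strictly speaking it is an info line tagged I, not a warning). But who is going to set this every single time before running a program??? Not realistic, which kind of suggests it isn't a big deal, so I'm ignoring it.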
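If setting it by hand every time is the annoying part, one option is a tiny helper called at the very top of the training script, before anything imports tensorflow. This is a minimal sketch: `disable_onednn_opts` is a hypothetical name, and only the variable `TF_ENABLE_ONEDNN_OPTS` comes from the TensorFlow log. Another option, in the spirit of the activate.d trick used in Step 2, would be exporting the variable from the env's `env_vars.sh`.

```python
import os
import sys
from typing import MutableMapping


def disable_onednn_opts(env: MutableMapping[str, str] = os.environ) -> bool:
    """Set TF_ENABLE_ONEDNN_OPTS=0 unless a value was already chosen.

    Returns True if the setting can still take effect, i.e. TensorFlow has
    not been imported yet in this interpreter.
    """
    # setdefault keeps a value that may already be exported in the shell
    env.setdefault("TF_ENABLE_ONEDNN_OPTS", "0")
    # The variable has to be in place before "import tensorflow" runs;
    # changing it after TensorFlow is loaded is too late
    return "tensorflow" not in sys.modules


if __name__ == "__main__":
    ready = disable_onednn_opts()
    print("TF_ENABLE_ONEDNN_OPTS =", os.environ["TF_ENABLE_ONEDNN_OPTS"])
    print("can still take effect:", ready)
```

Call it first, then `import tensorflow as tf` as usual.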

Anyway, a quick note: oneDNN = oneAPI Deep Neural Network Library.

Also, I really need to memorize the command for clearing the pip cache. Why can't I remember it!

pip cache purge

It's just three words!

I also went through WSL2 - TensorFlow Install Issue Unable to register cuDNN factory

TensorFlow 🤝Conda🤝NVIDIA GPU on Ubuntu

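So with tensorflow 2.14.0 and later you don't install cudatoolkit and cuDNN separately at all. Then it can't be a version incompatibility problem, because I don't even need to install them!!!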

pip install tensorflow[and-cuda]

One command and you're done! Except it still throws errors, and the GPU doesn't work at all!!!

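But the blogger who had actually fixed these errors says you simply can't use tensorflow 2.14.0 or later; you need 2.13.0. So I'll create another env and follow her steps one by one (that's the two-🤝 blog) to see if that solves it.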
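The version rule above can be written as a small check that reads the installed TensorFlow version from pip metadata, without importing TensorFlow. A sketch under the assumption stated in this post (2.14+ pulls CUDA/cuDNN via `tensorflow[and-cuda]`, older releases need them installed separately); both function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def needs_separate_cuda_install(tf_version: str) -> bool:
    """True for TensorFlow releases older than 2.14.

    Assumption from the notes above: from 2.14 on, `pip install
    tensorflow[and-cuda]` brings the CUDA/cuDNN wheels itself, while older
    releases expect cudatoolkit/cuDNN to be installed separately.
    """
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) < (2, 14)


def installed_tensorflow_version() -> Optional[str]:
    # Reads pip metadata only; TensorFlow itself is not imported
    try:
        return version("tensorflow")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    tf_version = installed_tensorflow_version()
    if tf_version is None:
        print("tensorflow is not installed in this environment")
    else:
        print(tf_version, "needs separate CUDA install:",
              needs_separate_cuda_install(tf_version))
```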

Step 2: 🤝🤝

If I install with conda install -c conda-forge cudatoolkit=11.8 cudnn=8.8, pip stops working afterwards, and I have no idea why.

It prints the warning below, and after that pip install tensorflow==2.13.0 refuses to install no matter what I try.

<frozen graalpy.pip_hook>:48: RuntimeWarning: You are using an untested version of pip. GraalPy provides patches and workarounds for a number of packages when used with compatible pip versions. We recommend to stick with the pip version that ships with this version of GraalPy.
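The `<frozen graalpy.pip_hook>` in that warning suggests the pip being run was hooked into GraalPy rather than a regular CPython interpreter. That's my reading of the message, not something the post confirms. One way to check which interpreter is active, and to make sure pip (including `pip cache purge`) targets the env you think it does, is to call pip through the current interpreter. A sketch with hypothetical helper names; it only prints the commands, it doesn't run them.

```python
import shlex
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip call bound to the interpreter that runs this script."""
    # "python -m pip" uses the pip installed for sys.executable, instead of
    # whichever "pip" happens to come first on PATH
    return [sys.executable, "-m", "pip", *args]


def describe_interpreter() -> str:
    # sys.implementation.name is "cpython" for the standard interpreter
    return f"{sys.implementation.name} ({sys.version.split()[0]}) at {sys.executable}"


if __name__ == "__main__":
    print(describe_interpreter())
    # Print, rather than run, the commands used in this post
    print(shlex.join(pip_command("cache", "purge")))
    print(shlex.join(pip_command("install", "tensorflow==2.13.0")))
```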

Fine, I'll install them myself.

See https://www.tensorflow.org/install/source#gpu

(vivit-env) dddcyy@dddcyy6100846:~$ conda search cudatoolkit
Loading channels: done
# Name                       Version           Build  Channel
cudatoolkit                      9.0      h13b8566_0  pkgs/main
cudatoolkit                      9.2               0  pkgs/main
cudatoolkit                 10.0.130               0  pkgs/main
cudatoolkit                 10.1.168               0  pkgs/main
cudatoolkit                 10.1.243      h6bb024c_0  pkgs/main
cudatoolkit                  10.2.89      hfd86e86_0  pkgs/main
cudatoolkit                  10.2.89      hfd86e86_1  pkgs/main
cudatoolkit                 11.0.221      h6bb024c_0  pkgs/main
cudatoolkit                   11.3.1      h2bc3f7f_2  pkgs/main
cudatoolkit                   11.8.0      h6a678d5_0  pkgs/main

(vivit-env) dddcyy@dddcyy6100846:~$ conda search cudnn
Loading channels: done
# Name                       Version           Build  Channel
cudnn                          7.0.5       cuda8.0_0  pkgs/main
cudnn                          7.1.2       cuda9.0_0  pkgs/main
cudnn                          7.1.3       cuda8.0_0  pkgs/main
cudnn                          7.2.1       cuda9.2_0  pkgs/main
cudnn                          7.3.1      cuda10.0_0  pkgs/main
cudnn                          7.3.1       cuda9.0_0  pkgs/main
cudnn                          7.3.1       cuda9.2_0  pkgs/main
cudnn                          7.6.0      cuda10.0_0  pkgs/main
cudnn                          7.6.0      cuda10.1_0  pkgs/main
cudnn                          7.6.0       cuda9.0_0  pkgs/main
cudnn                          7.6.0       cuda9.2_0  pkgs/main
cudnn                          7.6.4      cuda10.0_0  pkgs/main
cudnn                          7.6.4      cuda10.1_0  pkgs/main
cudnn                          7.6.4       cuda9.0_0  pkgs/main
cudnn                          7.6.4       cuda9.2_0  pkgs/main
cudnn                          7.6.5      cuda10.0_0  pkgs/main
cudnn                          7.6.5      cuda10.1_0  pkgs/main
cudnn                          7.6.5      cuda10.2_0  pkgs/main
cudnn                          7.6.5       cuda9.0_0  pkgs/main
cudnn                          7.6.5       cuda9.2_0  pkgs/main
cudnn                          8.2.1      cuda11.3_0  pkgs/main
cudnn                       8.9.2.26        cuda11_0  pkgs/main
cudnn                       8.9.2.26        cuda12_0  pkgs/main
cudnn                       9.1.1.17        cuda12_0  pkgs/main

So I'll install cudatoolkit==11.8.0 & cudnn==8.9.2.26 (the cuda11_0 build).
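Picking "the newest cuDNN build for CUDA 11" from the `conda search` listing can be sketched as a small parser. This assumes the four-column layout shown above (name, version, build, channel) and purely numeric versions; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str, str]  # (name, version, build)


def parse_conda_search(output: str) -> List[Row]:
    """Turn `conda search` output into (name, version, build) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines, "Loading channels: done" and the "# Name" header
        if len(parts) != 4 or parts[0].startswith("#"):
            continue
        name, ver, build, _channel = parts
        rows.append((name, ver, build))
    return rows


def version_key(ver: str) -> Tuple[int, ...]:
    # Numeric comparison so that 8.9.2.26 sorts after 8.2.1
    return tuple(int(p) for p in ver.split("."))


def newest_for_cuda(rows: List[Row], cuda_prefix: str = "cuda11") -> Optional[Row]:
    """Newest row whose build string starts with cuda_prefix, or None."""
    matches = [r for r in rows if r[2].startswith(cuda_prefix)]
    return max(matches, key=lambda r: version_key(r[1]), default=None)
```

Fed the cudnn listing above, this picks 8.9.2.26 / cuda11_0, the same choice as in the text.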

conda create -n vivit-env python=3.10
conda activate vivit-env
conda install cudatoolkit==11.8.0
conda install cudnn==8.9.2.26

mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
conda deactivate

echo $CONDA_PREFIX
:/home/dddcyy/miniconda3/envs/vivit-env
echo $LD_LIBRARY_PATH
:/home/dddcyy/miniconda3/envs/vivit-env/lib/
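To confirm that the activate.d script actually put `$CONDA_PREFIX/lib` on `LD_LIBRARY_PATH` after `conda activate`, a check that compares normalized path entries (so the trailing slash doesn't matter) could look like this. A sketch; both helpers are hypothetical.

```python
import os
from typing import List


def path_entries(value: str) -> List[str]:
    """Split a colon-separated path variable into normalized entries."""
    # A leading ":" (as in the echo output above) yields an empty entry;
    # empty entries are skipped here
    return [os.path.normpath(p) for p in value.split(":") if p]


def conda_lib_on_path(ld_library_path: str, conda_prefix: str) -> bool:
    """Check whether $CONDA_PREFIX/lib is one of the LD_LIBRARY_PATH entries."""
    wanted = os.path.normpath(os.path.join(conda_prefix, "lib"))
    return wanted in path_entries(ld_library_path)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX", "")
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    print("CONDA_PREFIX:", prefix or "(not set)")
    print("$CONDA_PREFIX/lib on LD_LIBRARY_PATH:",
          bool(prefix) and conda_lib_on_path(ld_path, prefix))
```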

conda activate vivit-env
pip install tensorflow==2.13

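You think that's it? Nope! Now it complains that there's no CUDA driver. Me: ???? It had never given me that error before, and I was stunned. How could my machine not have one? Then Stack Overflow told me to install TensorRT. Fine, I'll install it, and conveniently 🤝🤝 covers that too!!!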

pip install tensorrt==8.5.3.1
TENSORRT_PATH=$(dirname $(python -c "import tensorrt;print(tensorrt.__file__)"))
echo $TENSORRT_PATH
:/home/dddcyy/miniconda3/envs/vivit-env/lib/python3.10/site-packages/tensorrt
#linking tensorrt library files to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dddcyy/miniconda3/envs/vivit-env/lib/python3.10/site-packages/tensorrt
conda deactivate
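The `export` above hard-codes the site-packages path, and it only lasts for the current shell (persisting it would mean adding it to `env_vars.sh` as in the earlier step). A sketch that finds the tensorrt package directory without importing it, and prints an export line that avoids duplicate entries; `package_dir` and `append_path` are hypothetical helpers.

```python
import importlib.util
import os
import shlex
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Directory of an installed top-level package, found without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


def append_path(current: str, new_entry: str) -> str:
    """Append new_entry to a colon-separated path unless it is already there."""
    # Empty entries (e.g. from a leading ":") are dropped
    entries = [p for p in current.split(":") if p]
    if new_entry not in entries:
        entries.append(new_entry)
    return ":".join(entries)


if __name__ == "__main__":
    trt_dir = package_dir("tensorrt")
    if trt_dir is None:
        print("tensorrt is not installed in this environment")
    else:
        new_value = append_path(os.environ.get("LD_LIBRARY_PATH", ""), trt_dir)
        print(f"export LD_LIBRARY_PATH={shlex.quote(new_value)}")
```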

Only after installing tensorrt did pip pull in the nvidia-prefixed packages, so it looks like it really is required.

Installing collected packages: nvidia-cuda-runtime-cu11, nvidia-cublas-cu11, nvidia-cudnn-cu11, tensorrt
Successfully installed nvidia-cublas-cu11-11.11.3.6 nvidia-cuda-runtime-cu11-11.8.89 nvidia-cudnn-cu11-9.6.0.74 tensorrt-8.5.3.1
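One detail in that pip output: the wheel pulled in nvidia-cudnn-cu11 9.6.0.74, while the conda cudnn installed earlier is 8.9.2.26, so this env now holds two cuDNN major versions side by side. I can't tell from this log whether that matters, but it's cheap to check. A sketch that parses pip's "Successfully installed" line; the helper names are hypothetical.

```python
from typing import Dict


def parse_successfully_installed(line: str) -> Dict[str, str]:
    """Map package name -> version from pip's 'Successfully installed ...' line."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        raise ValueError("not a pip 'Successfully installed' line")
    packages = {}
    for token in line[len(prefix):].split():
        # pip prints name-version; the version follows the last hyphen
        name, _, ver = token.rpartition("-")
        packages[name] = ver
    return packages


def major(ver: str) -> int:
    return int(ver.split(".")[0])


if __name__ == "__main__":
    pip_line = ("Successfully installed nvidia-cublas-cu11-11.11.3.6 "
                "nvidia-cuda-runtime-cu11-11.8.89 nvidia-cudnn-cu11-9.6.0.74 "
                "tensorrt-8.5.3.1")
    pkgs = parse_successfully_installed(pip_line)
    conda_cudnn = "8.9.2.26"  # version picked from `conda search cudnn`
    if major(pkgs["nvidia-cudnn-cu11"]) != major(conda_cudnn):
        print("cuDNN major differs:", pkgs["nvidia-cudnn-cu11"], "vs", conda_cudnn)
```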

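So, mission accomplished? I did install tensorflow before tensorrt; surely that order won't trip me up too??? As it turns out, even after reinstalling tensorflow I still get errors, just a different set this time. The three "Unable to register" errors are gone, but the ones that came next don't look trivial either, hahahahaha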

(vivit-env) dddcyy@dddcyy6100846:~$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2025-01-13 12:57:24.373915: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-01-13 12:57:24.546476: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-13 12:57:26.321954: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-01-13 12:57:26.343778: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2025-01-13 12:57:26.343840: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
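The one-liner above only lists devices. A slightly longer check can also show which CUDA and cuDNN versions the installed TensorFlow wheel was built against, which helps when comparing with the conda and pip packages installed earlier. A sketch: `gpu_report` is a hypothetical helper, while `tf.sysconfig.get_build_info()`, `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices()` are existing TensorFlow functions. CPU-only builds may not report the CUDA keys.

```python
import importlib
from typing import Any, Dict


def gpu_report() -> Dict[str, Any]:
    """Collect what TensorFlow reports about its CUDA build and visible GPUs."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return {"tensorflow_importable": False}
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow_importable": True,
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # CPU-only builds may not include these keys, hence .get()
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```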

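Ignore oneDNN; next come three NUMA messages. I used to feel lucky I never got these, haha, and now NUMA has finally found me too. What's coming will come. Luckily 🤝🤝 covers this as well, phew, I nearly had a heart attack.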
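Every TensorFlow line in the log above carries the letter `I` right after the timestamp. In TensorFlow's C++ log format that letter is the severity (I info, W warning, E error, F fatal), so these NUMA lines are informational, and the run still ends with the GPU listed. A small sketch to triage such logs by severity; the regex is my own approximation of the line format shown above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines like:
# 2025-01-13 12:57:24.373915: I tensorflow/core/util/port.cc:110] message
TF_LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+)\] (.*)$"
)


def count_levels(lines: Iterable[str]) -> Counter:
    """Count TensorFlow log lines by severity letter; other lines are ignored."""
    levels: Counter = Counter()
    for line in lines:
        match = TF_LOG_LINE.match(line)
        if match:
            levels[match.group(1)] += 1
    return levels
```

Pasting the log above into it (e.g. `count_levels(text.splitlines())`) shows only `I` entries.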

Non-Uniform Memory Access (NUMA)

Following the Fixing NUMA problem blog linked in 🤝🤝, this should still be fixable.

Step 3: Fixing NUMA problems

What I'm curious about: why did none of the GitHub answers help, while the thing that finally solved my problems was Medium???? Is Medium the new dawn now???

lspci | grep -i nvidia

I can't even run the first step. Can I give up? I want to give up.

Second step: huh, I only have three devices.

ls /sys/bus/pci/devices/
490b:00:00.0  5582:00:00.0  92ab:00:00.0

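And I really only have those three entries, so the next step can't be done either.

I don't have anything called 0000:01:00.0/numa_node at all!!! No idea how to fix this NUMA issue, but maybe I can just leave it unfixed?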
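The Fixing-NUMA steps assume a device directory named 0000:01:00.0, but on this machine the PCI directories have different names (490b:00:00.0 and so on). Instead of guessing the path, a sketch that lists every device under /sys/bus/pci/devices and whether it has a readable numa_node file. A value of -1 there usually means the kernel reports no NUMA affinity for that device. `read_numa_nodes` is a hypothetical helper, and it only reads, it never writes to sysfs.

```python
import os
from typing import Dict, Optional


def read_numa_nodes(root: str = "/sys/bus/pci/devices") -> Dict[str, Optional[str]]:
    """Map each PCI device directory under root to its numa_node value.

    None means the device has no readable numa_node file.
    """
    result: Dict[str, Optional[str]] = {}
    if not os.path.isdir(root):
        return result
    for device in sorted(os.listdir(root)):
        path = os.path.join(root, device, "numa_node")
        try:
            with open(path) as fh:
                result[device] = fh.read().strip()
        except OSError:
            result[device] = None
    return result


if __name__ == "__main__":
    nodes = read_numa_nodes()
    if not nodes:
        print("no PCI device directories found")
    for device, value in nodes.items():
        print(device, "->", value if value is not None else "(no numa_node file)")
```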
