Notes on fixing a GPU environment problem

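At first, nvidia-smi could not talk to the GPU: "nvidia-smi has failed because it couldn't communicate with the nvidia driver. make sure that the lat…". I tried various fixes and none of them worked, and then it inexplicably started working again on its own.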
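When the driver comes and goes like this, it can help to check it from a script rather than by eye. The following is a minimal sketch, assuming nvidia-smi is on PATH; the function name query_nvidia_smi is hypothetical. It only uses the standard nvidia-smi query flags (--query-gpu, --format) and reports whether the call succeeded.

```python
import shutil
import subprocess


def query_nvidia_smi(timeout=30):
    """Run nvidia-smi and return (ok, text).

    ok is False if the binary is missing, times out, or exits non-zero,
    which is what happens when it cannot communicate with the driver.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False, "nvidia-smi not found on PATH"
    try:
        proc = subprocess.run(
            [exe, "--query-gpu=index,name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "nvidia-smi timed out"
    # The failure message may land on stdout or stderr, so take whichever is non-empty.
    output = (proc.stdout or proc.stderr).strip()
    return proc.returncode == 0, output


if __name__ == "__main__":
    ok, text = query_nvidia_smi()
    print("ok" if ok else "FAILED", text, sep="\n")
```

Running it in a loop or from cron would show when the driver became reachable again, although it cannot explain why.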

Next, torch.cuda.is_available() returned False.
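Before running the full collect_env report, a quick sketch like the one below gathers the facts most relevant to is_available() returning False. It assumes a Linux host (where the NVIDIA driver exposes device nodes such as /dev/nvidia0 and /dev/nvidiactl); the function name cuda_diagnostics is hypothetical. It uses only torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count(), and still runs if torch is not installed.

```python
import glob
import os


def cuda_diagnostics():
    """Collect a few facts relevant to torch.cuda.is_available() returning False."""
    info = {
        # The CUDA warning explicitly mentions this variable as a possible culprit.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # On Linux, a loaded NVIDIA driver creates device nodes under /dev.
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
    }
    try:
        import torch
    except ImportError:
        info["torch_version"] = None
        return info
    info["torch_version"] = torch.__version__
    info["torch_built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    info["device_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```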

!python -m torch.utils.collect_env

Collecting environment information…
/datasdc_3421/asr/ubuntu20.04/espnet/tools/anaconda/envs/yi/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 1.7.1+cu110
Is debug build: False
CUDA used to build PyTorch: 11.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.7 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.20.5

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla P40
GPU 1: Tesla P40

Nvidia driver version: 450.51.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.3
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.2.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.2.4
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.8
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.1.2
[pip3] torch==1.7.1+cu110
[pip3] torchaudio==0.7.2
[pip3] torchmetrics==0.9.3
[pip3] torchvision==0.8.2+cu110
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py37he8ac12f_0
[conda] mkl_fft 1.3.0 py37h54f3939_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.19.2 py37h54aff64_0
[conda] numpy-base 1.19.2 py37hfa32c7d_0
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] pytorch-lightning 1.1.2
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.7.1+cu110
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchaudio 0.7.2
[conda] torchmetrics 0.9.3
[conda] torchvision 0.8.2+cu110
[conda] torchvision 0.8.2 py37_cu102 pytorch
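One detail in the "Versions of relevant libraries" section is worth noticing: it lists two PyTorch builds side by side, the pip wheel torch 1.7.1+cu110 (built for CUDA 11.0) and the conda package pytorch 1.7.1 built for CUDA 10.2 (with conda cudatoolkit 10.2.89), plus a similar pair for torchvision. The post does not say this caused the problem, but mixed installs like this are easy to miss. Below is a minimal sketch that parses those report lines and flags packages reported with more than one version; the alias table (conda's pytorch mapped to pip's torch) and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical alias table: conda names the package "pytorch", pip names it "torch".
ALIASES = {"pytorch": "torch"}

# Matches lines such as "[pip3] torch==1.7.1+cu110" or "[conda] pytorch 1.7.1 ...".
LINE_RE = re.compile(r"^\[(?P<src>pip3|conda)\]\s+(?P<body>.+)$")


def parse_line(line):
    """Return (source, name, version) for one library line, or None if it is not one."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    src, fields = m.group("src"), m.group("body").split()
    if src == "pip3":
        # pip entries look like "name==version"
        if "==" not in fields[0]:
            return None
        name, version = fields[0].split("==", 1)
    else:
        # conda entries look like "name version [build] [channel]"
        if len(fields) < 2:
            return None
        name, version = fields[0], fields[1]
    return src, ALIASES.get(name, name), version


def conflicting_packages(lines):
    """Return {name: set_of_versions} for packages reported with more than one version."""
    seen = defaultdict(set)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, name, version = parsed
            seen[name].add(version)
    return {name: versions for name, versions in seen.items() if len(versions) > 1}


if __name__ == "__main__":
    report = """\
[pip3] torch==1.7.1+cu110
[pip3] torchvision==0.8.2+cu110
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] torchvision 0.8.2 py37_cu102 pytorch
[conda] torchaudio 0.7.2 py37 pytorch"""
    for name, versions in sorted(conflicting_packages(report.splitlines()).items()):
        print(name, sorted(versions))
```

On these sample lines it flags torch and torchvision but not torchaudio, whose pip and conda entries agree on 0.7.2.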

Running torch produced this warning: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up env
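The full warning text names one possible cause: changing CUDA_VISIBLE_DEVICES after the program has started. Whether that applied here is not stated, and the author's actual fix is the one referenced in the next line. If you do set the variable from Python, one conservative pattern is sketched below; the helper name set_visible_devices is hypothetical, and refusing once torch is imported is stricter than strictly necessary (CUDA is initialized lazily), but it is a simple rule to enforce.

```python
import os
import sys


def set_visible_devices(device_ids):
    """Set CUDA_VISIBLE_DEVICES, refusing if torch has already been imported."""
    if "torch" in sys.modules:
        raise RuntimeError("set CUDA_VISIBLE_DEVICES before importing torch")
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    # Hypothetical: expose both Tesla P40s (GPU 0 and GPU 1) to this process.
    print(set_visible_devices([0, 1]))
```

Call it at the very top of the entry script, before any `import torch`.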

Fix: I followed the CSDN post "UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up env" by qq_409992227.

Separately, pytorch-lightning changed the installed tensorboard version, but this did not cause TensorFlow to fail.

tensorboard               2.10.0                    <pip>
tensorboard-data-server   0.6.1                     <pip>
tensorboard-plugin-wit    1.8.1                     <pip>
tensorflow-estimator      1.13.0                    <pip>
tensorflow-gpu            1.13.1                    <pip>
It had already been printing warnings before this, but the code still ran without problems.
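The listing above pairs tensorboard 2.10.0 with tensorflow-gpu 1.13.1. A quick way to spot this kind of drift is sketched below. It assumes the heuristic that a TensorFlow release expects TensorBoard from the same major.minor series (which matches the pins TensorFlow has used, but treat it as a rule of thumb); the helper names are hypothetical. importlib.metadata needs Python 3.8+, so on Python 3.7, as in this environment, the importlib_metadata backport is required.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Turn "2.10.0" into (2, 10)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def tensorboard_matches_tensorflow(tb_version, tf_version):
    """Heuristic: TensorBoard should share TensorFlow's major.minor series."""
    return major_minor(tb_version) == major_minor(tf_version)


if __name__ == "__main__":
    tb = installed_version("tensorboard")
    tf = installed_version("tensorflow-gpu") or installed_version("tensorflow")
    if tb and tf and not tensorboard_matches_tensorflow(tb, tf):
        print(f"tensorboard {tb} does not match tensorflow {tf}")
```

With the versions listed above, tensorboard_matches_tensorflow("2.10.0", "1.13.1") returns False, consistent with the warnings the author saw even though training kept running.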
