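I recently switched to a new laptop with an RTX 3050 Ti GPU. Training models with tensorflow-gpu kept failing, and I fell into a deep pit along the way, so I am writing down how I got out of it. While trying to get the GPU to run I swapped CUDA versions, cuDNN versions, and tensorflow-gpu and Keras versions, so the errors I collected are a mixed bag.
Error 1: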
2021-08-09 21:04:53.637764: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2021-08-09 21:04:58.598447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2021-08-09 21:17:47.603456: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2021-08-09 21:17:47.675868: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2021-08-09 21:17:47.676730: I tensorflow/stream_executor/stream.cc:4963] [stream=000001774007A1F0,impl=00000177393F7250] did not memzero GPU location; source: 000000726209DF28
2021-08-09 21:17:47.676867: I tensorflow/stream_executor/stream.cc:316] did not allocate timer: 000000726209DED0
2021-08-09 21:17:47.676954: I tensorflow/stream_executor/stream.cc:1964] [stream=000001774007A1F0,impl=00000177393F7250] did not enqueue 'start timer': 000000726209DED0
2021-08-09 21:17:47.677084: I tensorflow/stream_executor/stream.cc:1976] [stream=000001774007A1F0,impl=00000177393F7250] did not enqueue 'stop timer': 000000726209DED0
2021-08-09 21:17:47.677201: F tensorflow/stream_executor/gpu/gpu_timer.cc:65] Check failed: start_event_ != nullptr && stop_event_ != nullptr
Error 2:
Error 3:
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Error 4:
CuDNN library: 7.4.1 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration
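The rule stated in that message can be written out as a small check. This is only a sketch of my reading of the message text (same major version, and for cuDNN 7.0 or later a runtime minor version at least as high as the one TensorFlow was compiled against); the function names are made up for illustration and are not part of TensorFlow or cuDNN.

```python
def parse_version(text):
    """Turn a version string such as '7.4.1' into a tuple of ints (7, 4, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_runtime_compatible(loaded, compiled):
    """Hypothetical helper: apply the rule quoted in the error message.

    loaded   -- version of the cuDNN library found at runtime
    compiled -- version of cuDNN that TensorFlow was built against
    """
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]

    # The major version always has to match.
    if loaded_major != compiled_major:
        return False
    # From cuDNN 7.0 on, a higher runtime minor version is accepted.
    if compiled_major >= 7:
        return loaded_minor >= compiled_minor
    # Before 7.0 the minor version has to match exactly.
    return loaded_minor == compiled_minor


# The pair from Error 4: runtime 7.4.1, TensorFlow built against 7.6.0.
print(cudnn_runtime_compatible("7.4.1", "7.6.0"))  # False
# A runtime with the same major and an equal minor version passes.
print(cudnn_runtime_compatible("7.6.5", "7.6.0"))  # True
```

For the versions in Error 4 the check fails because the installed minor version (4) is lower than the compiled one (6), which is exactly what the message complains about.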
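All of these errors actually trace back to a single cause: my GPU is a 3050 Ti, a 30-series card, which only works with CUDA 11 or later. The log in Error 1 shows the mismatch, since it loads cublas64_100.dll and cudnn64_7.dll, that is, CUDA 10.0 and cuDNN 7. I therefore reinstalled CUDA 11.3.1 with the matching cuDNN 8.2.0, and the problem went away. (The first time around I installed cuDNN 8.2.1, which threw errors and left me baffled; I then fell back to CUDA 10 with its matching cuDNN, and that did not work either.)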
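After reinstalling, a quick way to confirm that TensorFlow really sees the card is to print the CUDA and cuDNN versions it was built against together with the GPUs it detects. The snippet below is a minimal sketch and not what I ran at the time; it assumes TensorFlow 2.3 or later (where `tf.sysconfig.get_build_info()` is available), and `tf_gpu_report` is a made-up helper name.

```python
def tf_gpu_report():
    """Hypothetical helper: summarise TensorFlow's view of CUDA, cuDNN and the GPU.

    Returns None when TensorFlow cannot be imported at all.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # get_build_info() is assumed to exist (TensorFlow 2.3+); fall back to an
    # empty dict on older releases instead of crashing.
    if hasattr(tf.sysconfig, "get_build_info"):
        build = dict(tf.sysconfig.get_build_info())
    else:
        build = {}

    return {
        "tf_version": tf.__version__,
        # Versions TensorFlow was compiled against; absent on CPU-only builds.
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        # An empty list means TensorFlow did not find a usable GPU.
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    report = tf_gpu_report()
    if report is None:
        print("TensorFlow is not installed in this environment.")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
```

If `gpus` comes back empty, or the reported CUDA version is below 11 on a 30-series card, the installed TensorFlow build and the CUDA/cuDNN libraries most likely still do not match.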