Installing and Configuring tensorflow-gpu 2.6, CUDA, and cuDNN on Windows: Problems and Fixes

Cam.W.

Last modified 2024-08-30 09:39:35

Tags: windows, tensorflow, artificial intelligence, deep learning, python

First published 2024-08-29 19:55:13

Article link: https://blog.csdn.net/weixin_43938638/article/details/141681910

Preface

The versions of tensorflow-gpu, CUDA, and cuDNN must match each other!

For the tested version combinations, see: Build from source on Windows | TensorFlow (google.cn)
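To make "the versions must match" concrete, here is a minimal sketch of a consistency check. The `TESTED` table holds only the one row relevant to this post (tensorflow-gpu 2.6.0 with CUDA 11.2 and cuDNN 8.1); the names `TESTED`, `version_matches`, and `check` are hypothetical helpers of mine, not part of TensorFlow, and any other release would need its own row copied from the official table.

```python
# Tested pairing for the release used in this post (tensorflow-gpu 2.6.0).
# Assumption: other releases need their own row from the official table.
TESTED = {"2.6.0": {"cuda": "11.2", "cudnn": "8.1"}}


def version_matches(installed, required):
    """True if `installed` begins with every dot-separated component of `required`."""
    have = installed.split(".")
    need = required.split(".")
    return have[:len(need)] == need


def check(tf_version, cuda, cudnn):
    """Return a list of readable mismatches (an empty list means consistent)."""
    row = TESTED.get(tf_version)
    if row is None:
        return [f"no tested configuration recorded for tensorflow-gpu {tf_version}"]
    problems = []
    if not version_matches(cuda, row["cuda"]):
        problems.append(f"CUDA {cuda} != expected {row['cuda']}.x")
    if not version_matches(cudnn, row["cudnn"]):
        problems.append(f"cuDNN {cudnn} != expected {row['cudnn']}.x")
    return problems


# The versions used in this post
print(check("2.6.0", "11.2.0", "8.1.0.77"))
```

Comparing component by component rather than with a plain string prefix avoids treating 8.10 as a match for 8.1.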

1. Environment

My setup: Windows 11 (Home, x64) + tensorflow-gpu 2.6.0 + cudatoolkit 11.2.0 + cudnn 8.1.0.77 + Anaconda3 (2024.06_1) + PyCharm 2023.3.2 + RTX 4050 GPU

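2. Installation steps

1. Installation order

Step 1: Install PyCharm
Step 2: Install Anaconda3
Step 3: Open Anaconda Prompt as administrator, and use conda or pip commands to install the packages in the following steps
Step 4: Install cudatoolkit with conda
Step 5: Install cudnn with conda
Step 6: Install tensorflow-gpu with pip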
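As a rough sketch of what the conda and pip steps for cudatoolkit, cudnn, and tensorflow-gpu can look like, the script below only assembles and prints the commands (a dry run). Several things here are my assumptions rather than something this post records: the conda-forge channel, that an environment named tf-gpu-2.6 (the name visible in the paths later on) already exists with a Python version TensorFlow 2.6 supports, and the helper names `install_commands` and `run_all`. Follow the reference post for the exact commands it uses.

```python
import subprocess

ENV_NAME = "tf-gpu-2.6"  # environment name seen in the paths later in this post


def install_commands(env=ENV_NAME):
    """Return the install commands as argument lists (nothing is executed here)."""
    return [
        # Assumption: packages are pulled from the conda-forge channel
        ["conda", "install", "-n", env, "-c", "conda-forge", "-y", "cudatoolkit=11.2.0"],
        ["conda", "install", "-n", env, "-c", "conda-forge", "-y", "cudnn=8.1.0.77"],
        # Run pip inside the environment so the package lands in the right place
        ["conda", "run", "-n", env, "pip", "install", "tensorflow-gpu==2.6.0"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all(install_commands())
```

Keeping `dry_run=True` by default makes it easy to review the exact versions before anything is installed.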

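2. Reference links

Installing in the order above completes the environment setup. The posts I followed are:

Anaconda3 installation: 1. "Detailed Anaconda installation tutorial for Windows" (CSDN blog)

2. "Still don't understand what Anaconda is? This one article is enough" (CSDN blog, supplementary reference)

Steps 3 to 6: "Install Tensorflow-gpu 2.6.0 in ten minutes with local CUDA 12, plus version coordination for numpy, matplotlib, and other packages" (CSDN blog)

Notes:

1. For "Changing conda's default virtual environment path" in the first Anaconda3 reference above, method 1 is the one that worked for me (method 2 also works):

2. When configuring tensorflow-gpu, CUDA, and cuDNN by following the reference post, the package versions that conda list shows at the end will differ slightly from the ones the blogger shows. There is no need to rush to change them. My list after installation: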

3. Testing and verification

1. TensorFlow-gpu test

The test code:

import tensorflow as tf

# Sanity check for a tensorflow-gpu 2.6.0 installation
print(tf.__version__)
# Name of the first GPU device, or an empty string if none is usable
print(tf.test.gpu_device_name())
# Physical devices visible to TensorFlow
print('GPU:', tf.config.list_physical_devices('GPU'))
print('CPU:', tf.config.list_physical_devices(device_type='CPU'))
# Deprecated in favor of list_physical_devices, but still available in 2.6
print(tf.test.is_gpu_available())
# Number of available GPUs
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

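This produced errors such as "Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll", meaning that dynamic libraries could not be loaded.

Most of what I found online says the same thing: whichever DLL is missing, download it and put it under C:\Windows\System. That treats the symptom, not the cause. On reflection, my tensorflow-gpu, CUDA, and cuDNN versions all follow the TensorFlow site's recommendations, so they should not be wrong. So I first checked whether the missing DLLs were present in my conda virtual environment, which I created at D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\Library\bin:

Every DLL reported as unloadable was there, so I suspected the system simply could not find that directory. I opened the system environment variables and looked at Path: sure enough, the directory had not been added. After adding it, I ran the test code again, with the following result:

With that result the test passes, which means the TensorFlow-gpu environment is basically working. Congratulations!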
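The two manual checks above (are the DLLs in the environment's Library\bin, and is that directory on Path?) can be sketched in a few lines of Python. The helper names `dir_on_path` and `missing_dlls` are hypothetical, and the DLL list contains only the one file named in the error message; extend it with whatever your own log reports.

```python
import os


def dir_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    entries = [e for e in path_value.split(os.pathsep) if e]
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries)


def missing_dlls(directory, names):
    """Return the names from `names` that are not files inside `directory`."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


# Directory from this post; adjust it to your own environment
env_bin = r"D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\Library\bin"
print("on PATH:", dir_on_path(env_bin))
print("missing:", missing_dlls(env_bin, ["cudart64_110.dll"]))
```

If the files exist but the directory is not on Path, the fix is the one described above rather than copying DLLs into a system folder.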

2. CUDA acceleration test

The test code:

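import cv2
import tensorflow as tf
from mtcnn import MTCNN

# Request at most 60% of GPU memory and let the allocation grow on demand
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)

detector = MTCNN()

# cv2.imread returns None when the file cannot be read
img = cv2.imread("Lena3.png")
if img is None:
    raise FileNotFoundError("Lena3.png")

output = detector.detect_faces(img)

print(output)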

This produced the messages below. The main ones are:

Error 1. Call to CreateProcess failed. Error code: 2

Warning 2. Couldn't get ptxas version string: Internal: Couldn't invoke ptxas.exe --version

Warning 3. Internal: Failed to launch ptxas

D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\python.exe D:\Codes\Python\tf_gpu_test\cuda_test.py 
2024-08-29 17:45:39.732544: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 17:45:40.595161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 17:45:40.618800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 17:45:40.827761: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-08-29 17:45:41.754506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8100
2024-08-29 17:45:42.639977: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.641084: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.641227: W tensorflow/stream_executor/gpu/asm_compiler.cc:77] Couldn't get ptxas version string: Internal: Couldn't invoke ptxas.exe --version
2024-08-29 17:45:42.648187: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2024-08-29 17:45:42.648554: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
2024-08-29 17:45:45.448664: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[{'box': [202, 182, 158, 210], 'confidence': 0.9971387386322021, 'keypoints': {'left_eye': (267, 265), 'right_eye': (336, 267), 'nose': (316, 317), 'mouth_left': (265, 348), 'mouth_right': (322, 351)}}]

Process finished with exit code 0

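For error 1 (Call to CreateProcess failed), method 1 from the blogger I consulted is to run conda install -c nvidia cuda-nvcc so that ptxas is present in the conda environment. (My cudatoolkit was installed inside the conda virtual environment rather than from the official installer, so method 2 does not apply to me.) The full command: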

conda install -c nvidia cuda-nvcc

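I tried it and still got the same errors, identical to the output above, with no improvement at all. (Don't rush to uninstall cuda-nvcc.)

Searching the computer for ptxas.exe showed that it was already under my conda virtual environment path:

So, same fix as before: add that directory to Path in the system environment variables. After closing PyCharm, reopening it, and running the test code again, the error and warnings above were all gone. The correct output: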
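A quick way to confirm whether ptxas is discoverable before and after editing Path is `shutil.which` from the standard library. The sketch below also shows how a directory could be prepended to the search path for the current process only, which I have not verified as a substitute for the system-wide change described above. `prepend_to_path` and `tool_visible` are hypothetical helper names, and the ptxas directory is deliberately left for you to fill in, since it depends on where the search finds the file on your machine.

```python
import os
import shutil


def prepend_to_path(directory, path_value):
    """Return a PATH-style string with `directory` first (unchanged if already listed)."""
    entries = [e for e in path_value.split(os.pathsep) if e]
    if directory in entries:
        return path_value
    return os.pathsep.join([directory] + entries)


def tool_visible(name, path_value):
    """True if an executable called `name` can be found on the given PATH string."""
    return shutil.which(name, path=path_value) is not None


current = os.environ.get("PATH", "")
print("ptxas visible:", tool_visible("ptxas", current))
```

To try the per-process variant, set `os.environ["PATH"] = prepend_to_path(your_ptxas_dir, os.environ["PATH"])` before importing tensorflow, where `your_ptxas_dir` is the directory found by the search.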

D:\ProgramData\Data\Anaconda_envs\envs\tf-gpu-2.6\python.exe D:\Codes\Python\tf_gpu_test\cuda_test.py 
2024-08-29 19:11:11.700032: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 19:11:14.105552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 19:11:14.129957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3684 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4050 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.9
2024-08-29 19:11:14.347828: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2024-08-29 19:11:15.332231: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8100
2024-08-29 19:11:21.279191: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
[{'box': [202, 182, 158, 210], 'confidence': 0.9971387386322021, 'keypoints': {'left_eye': (267, 265), 'right_eye': (336, 267), 'nose': (316, 317), 'mouth_left': (265, 348), 'mouth_right': (322, 351)}}]

Process finished with exit code 0

3. Face detection code test

Next, run a piece of face detection code as a test:

import cv2
import tensorflow as tf
from mtcnn import MTCNN

# Request at most 60% of GPU memory and let the allocation grow on demand
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)

detector = MTCNN()
img = cv2.imread('Lena3.png')
if img is None:
    raise FileNotFoundError('Lena3.png')
output = detector.detect_faces(img)
print(output)
if not output:
    raise SystemExit('No face detected')

# Use the first detected face: bounding box plus five landmarks
x, y, width, height = output[0]['box']
keypoints = output[0]['keypoints']
left_eye_X, left_eye_Y = keypoints['left_eye']
right_eye_X, right_eye_Y = keypoints['right_eye']
nose_X, nose_Y = keypoints['nose']
mouth_left_X, mouth_left_Y = keypoints['mouth_left']
mouth_right_X, mouth_right_Y = keypoints['mouth_right']

# OpenCV color order is BGR (Blue, Green, Red)
cv2.rectangle(img, pt1=(x, y), pt2=(x + width, y + height), color=(0, 255, 0), thickness=2)
cv2.circle(img, center=(left_eye_X, left_eye_Y), color=(0, 0, 255), thickness=2, radius=1)
cv2.circle(img, center=(right_eye_X, right_eye_Y), color=(0, 0, 255), thickness=2, radius=1)
cv2.circle(img, center=(nose_X, nose_Y), color=(255, 0, 0), thickness=2, radius=1)
cv2.circle(img, center=(mouth_left_X, mouth_left_Y), color=(255, 0, 0), thickness=2, radius=1)
cv2.circle(img, center=(mouth_right_X, mouth_right_Y), color=(255, 0, 0), thickness=2, radius=1)
cv2.imshow('result', img)

# Wait for a key press, then close the window
cv2.waitKey(0)
cv2.destroyAllWindows()

The result: