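While training DeepSpeech, the step count was increasing very slowly. Checking with nvidia-smi showed that GPU memory was barely in use and all the work was running on the CPU. The card is a newly purchased RTX 3060, which uses the recent Ampere architecture and requires CUDA 11.x and cuDNN 8.x. DeepSpeech, however, still depends on tensorflow==1.15.4, and Google's official builds of that version do not target those CUDA and cuDNN versions (the official 1.15 wheels are built against CUDA 10.0 and cuDNN 7), so training failed with the error below.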
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=249]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
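The telling detail in this log is that the op has GPU kernels registered, yet the "Registered devices" line contains no plain GPU device, only CPU and the XLA devices. That usually means TensorFlow could not initialize the GPU at all. A minimal sketch of that check in Python follows; the helper names `registered_devices` and `has_plain_gpu` are made up for illustration and are not part of TensorFlow or DeepSpeech.

```python
import re


def registered_devices(error_text):
    """Return the device types from the 'Registered devices: [...]' line.

    Hypothetical helper: it only parses the text of the error message.
    """
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def has_plain_gpu(error_text):
    """True only if a plain 'GPU' device (not XLA_GPU) is registered."""
    return "GPU" in registered_devices(error_text)


log = "Registered devices: [CPU, XLA_CPU, XLA_GPU]"
print(registered_devices(log))  # ['CPU', 'XLA_CPU', 'XLA_GPU']
print(has_plain_gpu(log))       # False
```

If `has_plain_gpu` returns False, as it does for the log above, the problem is likely in the CUDA/cuDNN setup rather than in the DeepSpeech model code.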
I later found that NVIDIA maintains its own build of TensorFlow:
Accelerating TensorFlow on NVIDIA A100 GPUs | NVIDIA Developer Blog
Installing it solved the problem.
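One way to confirm the fix before starting a long training run is to ask TensorFlow which devices it can see. The sketch below assumes a TensorFlow 1.15 style install where `tensorflow.python.client.device_lib` is importable; `gpu_names` and `local_devices` are illustrative names, not something DeepSpeech provides.

```python
def gpu_names(devices):
    """Keep only plain GPU devices from (name, device_type) pairs.

    XLA_GPU entries are skipped on purpose: they were present in the
    failing setup too, so they do not prove the GPU is usable.
    """
    return [name for name, device_type in devices if device_type == "GPU"]


def local_devices():
    """Return (name, device_type) pairs for the devices TensorFlow sees."""
    # Imported lazily so gpu_names() can be used without TensorFlow installed.
    from tensorflow.python.client import device_lib
    return [(d.name, d.device_type) for d in device_lib.list_local_devices()]
```

Calling `print(gpu_names(local_devices()))` in the training environment should list at least one device; an empty list suggests training would still fall back to the CPU.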
GPU driver (from nvidia-smi): NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4
pip packages:
nvidia-cublas 11.2.1.74
nvidia-cuda-cupti 11.1.69
nvidia-cuda-nvcc 11.1.74
nvidia-cuda-nvrtc 11.1.105
nvidia-cuda-runtime 11.1.74
nvidia-cudnn 8.0.4.30
nvidia-cufft 10.3.0.74
nvidia-curand 10.2.2.74
nvidia-cusolver 11.0.0.74
nvidia-cusparse 11.2.0.275
nvidia-dali-cuda110 0.27.0
nvidia-dali-nvtf-plugin 0.27.0+nv20.11
nvidia-nccl 2.8.2
nvidia-tensorboard 1.15.0+nv20.11
nvidia-tensorflow 1.15.4+nv20.11
nvidia-tensorrt 7.2.1.6
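The listing above can be checked against the Ampere requirement (CUDA 11.x, cuDNN 8.x) with a few lines of Python. This is only a sketch over pasted "name version" text; `parse_pip_list` and `major` are hypothetical helpers, and the sample covers three of the packages listed above.

```python
def parse_pip_list(text):
    """Map package name to version for lines of the form 'name version'."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def major(version):
    """Return the leading integer of a dotted version string."""
    return int(version.split(".")[0])


listing = """\
nvidia-cuda-runtime 11.1.74
nvidia-cudnn 8.0.4.30
nvidia-tensorflow 1.15.4+nv20.11
"""

pkgs = parse_pip_list(listing)
print(major(pkgs["nvidia-cuda-runtime"]) == 11)  # True
print(major(pkgs["nvidia-cudnn"]) == 8)          # True
```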