Issue log - TensorFlow 1 - InternalError (see above for traceback): Blas GEMM launch failed

UNCLE-TOU

Last modified 2024-07-01 19:12:32

Tags: tensorflow

First published 2024-07-01 15:03:08

Article link: https://blog.csdn.net/uncle_tou/article/details/140101296

I was reproducing the open-source project https://github.com/macanv/BERT-BiLSTM-CRF-NER and ran into what looked like a version incompatibility among tensorflow-gpu, cuDNN, and CUDA. Details:

Running nvidia-smi in the base conda environment fails with "command not found":

nvitop, however, displays the GPU normally:

nvcc -V prints:

From these, the graphics driver version is 470.199.02 and the CUDA version is 11.4.
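The two numbers come from different places: the "CUDA Version" in the nvidia-smi/nvitop header is the newest CUDA version the driver supports, while nvcc -V reports the CUDA toolkit installed on the system. Neither is necessarily the cudatoolkit that conda installs into the project environment. A minimal sketch for pulling both values out of captured command output; the sample strings are illustrative (not copied from this machine) and the function names are hypothetical.

```python
import re

# Illustrative sample output; real build strings vary by installation.
NVIDIA_SMI_HEADER = "| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |"
NVCC_OUTPUT = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 11.4, V11.4.152\n"
)


def parse_driver_info(header: str) -> dict:
    """Extract the driver version and the newest CUDA version that driver supports."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*(\d+)\.(\d+)", header)
    if m is None:
        raise ValueError("not an nvidia-smi header line")
    return {"driver": m.group(1), "max_cuda": (int(m.group(2)), int(m.group(3)))}


def parse_nvcc_release(output: str) -> tuple:
    """Extract the installed CUDA toolkit release (major, minor) from `nvcc -V` output."""
    m = re.search(r"release (\d+)\.(\d+)", output)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    print(parse_driver_info(NVIDIA_SMI_HEADER))
    print(parse_nvcc_release(NVCC_OUTPUT))
```

If the two versions disagree, the driver value is only an upper bound on the toolkit versions it can run.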
Package versions in the environment:

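The TensorFlow version follows the project's README and can't easily be changed. The cudatoolkit and cudnn packages above were installed automatically along with tensorflow-gpu==1.12.0. The version dependencies I looked up are: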

Could any of these versions be mismatched?
Main symptoms
With the project downloaded and left unchanged, the first run failed with:

totalMemory: 23.70GiB freeMemory: 23.45GiB
2024-07-01 14:45:52.995573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2024-07-01 14:46:32.655609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2024-07-01 14:46:32.655637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2024-07-01 14:46:32.655643: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2024-07-01 14:46:32.655769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22732 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:5e:00.0, compute capability: 8.6)
2024-07-01 14:47:35.319078: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
2024-07-01 14:47:35.321609: I tensorflow/stream_executor/stream.cc:2076] [stream=0x137c5e60,impl=0x137c5f00] did not wait for [stream=0x17edbb60,impl=0x1378c680]
2024-07-01 14:47:35.321668: I tensorflow/stream_executor/stream.cc:5011] [stream=0x137c5e60,impl=0x137c5f00] did not memcpy device-to-host; source: 0x7fd8d8251400
2024-07-01 14:47:35.321761: F tensorflow/core/common_runtime/gpu/gpu_util.cc:292] GPU->CPU Memcpy failed
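The first-run log already contains the two facts that matter: the GPU's compute capability (8.6) and the cuBLAS call that failed. A small sketch that extracts both from saved log text; `summarize_log` is a hypothetical helper, the sample consists of two lines from the log above, and the `needs_cuda11` flag encodes this post's conclusion (compute capability 8.x needs CUDA 11 or newer), not a complete compatibility table.

```python
import re

# Two lines from the first-run log above.
LOG = (
    "2024-07-01 14:46:32.655769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] "
    "Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22732 MB memory) "
    "-> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:5e:00.0, "
    "compute capability: 8.6)\n"
    "2024-07-01 14:47:35.319078: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] "
    "failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED\n"
)

GPU_RE = re.compile(r"name: ([^,]+), pci bus id: [^,]+, compute capability: (\d+)\.(\d+)")
CUBLAS_RE = re.compile(r"CUBLAS_STATUS_[A-Z_]+")


def summarize_log(text):
    """Collect (GPU name, compute capability) pairs and cuBLAS status codes from a TF 1.x log."""
    gpus = [(name, (int(major), int(minor))) for name, major, minor in GPU_RE.findall(text)]
    return {
        "gpus": gpus,
        "cublas_errors": CUBLAS_RE.findall(text),
        # Rule from this post: compute capability 8.x (RTX 30 series) needs CUDA 11+.
        "needs_cuda11": any(cc[0] >= 8 for _, cc in gpus),
    }


if __name__ == "__main__":
    print(summarize_log(LOG))
```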

The second run failed with:

InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(202, 2), b.shape=(2, 768), m=202, n=768, k=2
         [[node bert/embeddings/MatMul (defined at /home/dell/下载/enter/envs/TY_NER_tf1/lib/python3.6/site-packages/bert_base-0.0.9py3.6.egg/bert_base/bert/modeling.py:486) = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/embeddings/one_hot, bert/embeddings/token_type_embeddings/read)]]
         [[{{node crf_loss/Mean/_4075}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_3726_crf_loss/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

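What I could rule out: GPU memory is ample (24 GB), the batch size is small (it still fails at 1), rebooting does not help, and the code itself is fine (others have reproduced the project successfully).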
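The failing GEMM is tiny, which supports ruling out memory. From the error message, m=202, n=768, k=2: the (202, 2) one-hot token-type matrix times the (2, 768) token_type_embeddings table. A quick size check, assuming float32 operands (the node has T=DT_FLOAT and the first run called cublasSgemm_v2); `gemm_footprint` is a hypothetical helper, and 23.45 GiB is the freeMemory value from the first-run log.

```python
def gemm_footprint(m, n, k, bytes_per_elem=4):
    """Return (bytes for A (m x k), B (k x n) and C (m x n), FLOPs for C = A @ B).

    FLOPs count one multiply and one add per inner-product term: 2 * m * n * k.
    """
    elems = m * k + k * n + m * n
    return elems * bytes_per_elem, 2 * m * n * k


nbytes, flops = gemm_footprint(202, 768, 2)
free_bytes = 23.45 * 2**30  # freeMemory reported by TensorFlow
print(f"{nbytes} bytes ({nbytes / 2**20:.2f} MiB), {flops} FLOPs, "
      f"{nbytes / free_bytes:.1e} of free GPU memory")
```

Under one megabyte of operands cannot exhaust a 24 GB card, so the failure points at the cuBLAS/CUDA stack itself rather than at resources.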

Cause

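RTX 30-series GPUs (compute capability 8.6 in the log above) do not support CUDA 10 or earlier. tensorflow-gpu 1.12.0 is built against CUDA 9, so it cannot run on this card, and I had to switch to TensorFlow==2.4.0, which is built against CUDA 11.0 and cuDNN 8.0.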
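The conclusion can be written as a small check. The table holds only the two releases discussed here, taken from TensorFlow's tested build configurations for Linux GPU (tensorflow_gpu-1.12.0: CUDA 9, cuDNN 7; tensorflow-2.4.0: CUDA 11.0, cuDNN 8.0). `TF_GPU_BUILDS` and `cuda_too_old` are hypothetical names, and the rule only covers the RTX 30-series case from this post.

```python
# Subset of TensorFlow's tested build configurations (Linux GPU):
# release -> ((CUDA major, CUDA minor), cuDNN version). Extend from the official table as needed.
TF_GPU_BUILDS = {
    "1.12.0": ((9, 0), "7"),
    "2.4.0": ((11, 0), "8.0"),
}


def cuda_too_old(tf_version, compute_capability):
    """True if this post's rule says the CUDA build of `tf_version` is too old for the GPU.

    Only encodes one rule: compute capability 8.x (RTX 30 series) is not supported by
    CUDA 10 or earlier. Other architectures are not checked and always return False.
    """
    (cuda_major, _cuda_minor), _cudnn = TF_GPU_BUILDS[tf_version]
    return compute_capability[0] >= 8 and cuda_major < 11


if __name__ == "__main__":
    rtx3090 = (8, 6)  # "compute capability: 8.6" in the first-run log
    for version, ((major, minor), cudnn) in TF_GPU_BUILDS.items():
        verdict = "CUDA too old" if cuda_too_old(version, rtx3090) else "OK"
        print(f"TF {version} (CUDA {major}.{minor}, cuDNN {cudnn}): {verdict} on RTX 3090")
```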
