[New approach] RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Problem

Training on the GPU fails with the following error output:

/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [16,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [18,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [19,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [20,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [21,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [22,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [24,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [25,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [26,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [27,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [28,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [29,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [30,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:247: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [31,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "/home/work/ner/Msra/train.py", line 92, in <module>
    train()
  File "/home/work/ner/Msra/train.py", line 83, in train
    train_data.map(start_train,
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2376, in map
    return self._map_single(
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 518, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2764, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2644, in apply_function_on_filtered_inputs
    processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 2336, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "/home/work/ner/Msra/train.py", line 67, in start_train
    loss.backward()
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/anaconda3/envs/bitter/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
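
Reading the assertion itself already narrows things down: `t >= 0 && t < n_classes` is raised by the NLL/cross-entropy kernel when some target label falls outside [0, n_classes), and the CUBLAS_STATUS_ALLOC_FAILED reported afterwards is most likely just a follow-on symptom of the now-broken CUDA context. A minimal sanity check, with labels and num_classes as placeholder names for whatever the training script actually uses:

import torch

def check_labels(labels: torch.Tensor, num_classes: int) -> None:
    # Run the check on the CPU so it cannot itself trip a device-side assert.
    labels = labels.detach().cpu()
    assert labels.min().item() >= 0, f"negative label found: {labels.min().item()}"
    assert labels.max().item() < num_classes, \
        f"label {labels.max().item()} >= num_classes ({num_classes})"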

Attempted fixes

1. Carefully re-checked the feature dimensions; they are correct.
2. GPU memory is not running out.
3. Reduced batch_size; no effect.
4. num_classes is correct.
5. The CUDA and PyTorch versions are compatible (a quick way to print the versions and memory figures for items 2 and 5 is sketched below, after the commands in item 8).
6. Added a 'sigmoid' after the last layer of the network; no effect.
7. No classification label is out of range.
8. Deleted the hidden Jupyter Lab checkpoint files; no effect.

ls -a 
rm .ipynb_checkpoints/ -r
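
For items 2 and 5 above, the relevant figures are easy to print from inside the environment; a small sketch, assuming a single GPU at index 0:

import torch

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
print(torch.cuda.memory_allocated(0) / 2**20, "MiB allocated")
print(torch.cuda.memory_reserved(0) / 2**20, "MiB reserved")
print(torch.cuda.get_device_properties(0).total_memory / 2**20, "MiB total on device 0")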

9. Forced synchronous kernel launches

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

With this set, the error became:

RuntimeError: CUDA error: device-side assert triggered
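
One caveat about this step, which is general CUDA/PyTorch behaviour rather than anything from the code above: if CUDA_LAUNCH_BLOCKING is set after CUDA has already been initialized it may have no effect, so it should be set before torch is imported, or exported in the shell. With blocking launches the failure surfaces at the offending kernel instead of at a later cublasCreate call, which is why the message changes to "device-side assert triggered".

# Either export it when launching:
#   CUDA_LAUNCH_BLOCKING=1 python train.py
# or set it at the very top of the script, before importing torch:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

import torch  # imported only after the variable is set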

10. Switched to running on the CPU

# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('cpu')

On the CPU, the error became:

IndexError: Target -1 is out of bounds.
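
Running on the CPU is what finally produces an interpretable message: a target of -1 reached the loss. As a standalone illustration (not the author's training code; the shapes and names are made up), the default nn.CrossEntropyLoss only ignores the index -100, so a -1 label is treated as an ordinary, out-of-range class index:

import torch
import torch.nn as nn

logits = torch.randn(4, 7)            # 4 tokens, 7 classes (hypothetical shapes)
labels = torch.tensor([0, 3, -1, 2])  # the -1 is what triggers the error

loss_fn = nn.CrossEntropyLoss()       # default ignore_index=-100, so -1 is NOT ignored
# loss_fn(logits, labels)             # raises IndexError: Target -1 is out of bounds.

loss_fn_ignore = nn.CrossEntropyLoss(ignore_index=-1)
print(loss_fn_ignore(logits, labels)) # runs; the -1 position is skipped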

11. Tried a different loss function; no effect.

I went through every method I could find online and not one of them solved the problem; absolutely maddening.

The final solution!!!

Installed without a pinned version, transformers resolves to 4.20.1.
Changing it to 4.16.2 solved the problem.
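
For reference, pinning the working version explicitly (pip shown here; the exact command depends on your package manager):

pip install transformers==4.16.2

and it can be verified from Python afterwards:

import transformers
print(transformers.__version__)  # should now print 4.16.2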
