NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend.

Today, while loading a previously trained model to run detection inference, I ran into a strange error:

NotImplementedError: Could not run 'aten::empty_strided' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty_strided' is only available for these backends: [CPU, Meta, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].


The full error details are as follows:


CPU: registered at aten\src\ATen\RegisterCPU.cpp:18433 [kernel]
Meta: registered at aten\src\ATen\RegisterMeta.cpp:12703 [kernel]
BackendSelect: registered at aten\src\ATen\RegisterBackendSelect.cpp:665 [kernel]
Python: registered at ..\aten\src\ATen\core\PythonFallbackKernel.cpp:47 [backend fallback]
Named: registered at ..\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: fallthrough registered at ..\aten\src\ATen\ConjugateFallback.cpp:22 [kernel]
Negative: fallthrough registered at ..\aten\src\ATen\native\NegateFallback.cpp:22 [kernel]
ADInplaceOrView: fallthrough registered at ..\aten\src\ATen\core\VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradCPU: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradCUDA: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradXLA: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradLazy: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradXPU: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradMLC: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradHPU: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradNestedTensor: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse1: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse2: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
AutogradPrivateUse3: registered at ..\torch\csrc\autograd\generated\VariableType_2.cpp:10483 [autograd kernel]
Tracer: registered at ..\torch\csrc\autograd\generated\TraceType_2.cpp:11423 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:466 [backend fallback]
Autocast: fallthrough registered at ..\aten\src\ATen\autocast_mode.cpp:305 [backend fallback]
Batched: registered at ..\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ..\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]

Tracing the error back showed that it is raised where torch loads the model.

The exact line in the code is:

model = torch.jit.load(model_path).to(device)

I rarely use jit. In the past I almost always loaded models directly with torch.load, so I looked up torch.jit.load in the official PyTorch documentation, which reads:

torch.jit.load

torch.jit.load(f, map_location=None, _extra_files=None, _restore_shapes=False)

Load a ScriptModule or ScriptFunction previously saved with torch.jit.save.

All previously saved modules, no matter their device, are first loaded onto CPU, and then are moved to the devices they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised.

Parameters

  • f – a file-like object (has to implement read, readline, tell, and seek), or a string containing a file name

  • map_location (string or torch.device) – A simplified version of map_location in torch.jit.save used to dynamically remap storages to an alternative set of devices.

  • _extra_files (dictionary of filename to content) – The extra filenames given in the map would be loaded and their content would be stored in the provided map.

  • _restore_shapes (bool) – Whether or not to retrace the module on load using stored inputs

Returns

ScriptModule object.

Example code:

import torch
import io

torch.jit.load('scriptmodule.pt')

# Load ScriptModule from io.BytesIO object
with open('scriptmodule.pt', 'rb') as f:
    buffer = io.BytesIO(f.read())

# Load all tensors to the original device
torch.jit.load(buffer)

# Load all tensors onto CPU, using a device
buffer.seek(0)
torch.jit.load(buffer, map_location=torch.device('cpu'))

# Load all tensors onto CPU, using a string
buffer.seek(0)
torch.jit.load(buffer, map_location='cpu')

# Load with extra files.
extra_files = {'foo.txt': ''}  # values will be replaced with data
torch.jit.load('scriptmodule.pt', _extra_files=extra_files)
print(extra_files['foo.txt'])

In short: whatever device a module was saved from, it is first loaded onto the CPU and then moved back to that device. If that device does not exist at runtime, an exception is raised.

Because the model had been saved with torch.jit.save, it also has to be loaded with torch.jit.load. A model saved with plain torch.save, by contrast, should be loaded with torch.load, not torch.jit.load.
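To make the distinction concrete, here is a minimal sketch of the two save/load pairings. `TinyNet` is a hypothetical stand-in for the trained detection model, and in-memory buffers replace real files so the snippet runs anywhere:

```python
import io

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the trained detection model."""

    def __init__(self) -> None:
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


model = TinyNet().eval()

# Pair 1: TorchScript. torch.jit.save writes a self-contained archive
# (code + weights), so torch.jit.load can rebuild it without the Python class.
scripted_buf = io.BytesIO()
torch.jit.save(torch.jit.script(model), scripted_buf)
scripted_buf.seek(0)
scripted = torch.jit.load(scripted_buf, map_location="cpu")

# Pair 2: plain torch.save of the state_dict. Only tensors are stored,
# so the class must be instantiated first and the weights read with torch.load.
state_buf = io.BytesIO()
torch.save(model.state_dict(), state_buf)
state_buf.seek(0)
restored = TinyNet()
restored.load_state_dict(torch.load(state_buf, map_location="cpu"))
restored.eval()

x = torch.randn(3, 4)
print(torch.allclose(scripted(x), model(x)), torch.allclose(restored(x), model(x)))
```

Both pairings reproduce the original outputs; the difference is only whether the model's code travels with the file.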

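The official documentation actually already points to the cause. The original model was trained and saved on a GPU. torch.jit.load first loads the model onto the CPU and then moves it to the device it was saved from, i.e. the GPU. My local machine has no GPU (and the backend list in the error contains no CUDA entry, meaning this PyTorch install has no CUDA kernels registered at all), so that move fails and the error is raised.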
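One way to confirm this diagnosis on the inference machine is to ask PyTorch how it was built. A minimal sketch; `describe_torch_build` is a hypothetical helper name, while `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch attributes and functions:

```python
import torch


def describe_torch_build() -> dict:
    """Summarize whether this PyTorch install can run CUDA kernels."""
    return {
        "torch": str(torch.__version__),
        # None on CPU-only builds: no CUDA kernels are registered at all.
        "built_with_cuda": torch.version.cuda,
        # False if the build lacks CUDA, or if no usable GPU/driver is visible.
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


print(describe_torch_build())
```

If `built_with_cuda` is `None`, the installed wheel is CPU-only, so any attempt to create tensors on CUDA fails with the kind of error shown above. If it is a version string but `cuda_available` is `False`, the build supports CUDA but no usable GPU or driver was found.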

The fix is simple: change the original loading code to:

model = torch.jit.load(model_path, map_location='cpu')

Or:

model = torch.jit.load(model_path, map_location=torch.device('cpu')).to(torch.device('cpu'))
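If the same script has to run both on GPU servers and on CPU-only machines, the device can be chosen at runtime and passed as `map_location`. A minimal sketch under that assumption; `load_scripted` is a hypothetical helper, not part of PyTorch:

```python
import io
from typing import BinaryIO, Optional, Union

import torch


def load_scripted(
    source: Union[str, BinaryIO],
    device: Optional[torch.device] = None,
) -> torch.jit.ScriptModule:
    """Load a TorchScript model onto `device`, defaulting to CUDA if usable.

    map_location remaps the stored tensors while loading, so a model saved
    on a GPU never tries to allocate CUDA memory on a CPU-only machine.
    """
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.jit.load(source, map_location=device)
    model.eval()
    return model


# Usage: round-trip a tiny scripted module through an in-memory buffer.
buf = io.BytesIO()
torch.jit.save(torch.jit.script(torch.nn.Linear(4, 2)), buf)
buf.seek(0)
model = load_scripted(buf, torch.device("cpu"))
print(next(model.parameters()).device)
```

Passing `map_location` skips the move back to the original CUDA device, so no CUDA kernel is ever requested on a machine that lacks one. A trailing `.to(device)` then becomes unnecessary, though harmless.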

With that, the problem was solved. Noting it down here for future reference!

Author: Together_CZ