While converting a PyTorch model to ONNX, the following errors appeared.
/path/model/bin2onnx.py:157: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sign != 0:
/path/model/bin2onnx.py:172: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sign != 0:
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [100,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:922: indexSelectSmallIndex: block: [2,0,0], thread: [101,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
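The last line of the log suggests CUDA_LAUNCH_BLOCKING=1. A minimal sketch of setting it from inside the script is shown here. It assumes the variable has to be in place before CUDA is initialized, so it is set before torch is imported; the commented import is only a placeholder for the real export script.

```python
import os

# Set this before the first CUDA call (safest: before `import torch`),
# so that kernel launches run synchronously and the stack trace
# points at the operation that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and run the failing export only after this point
```

The same effect can be had by prefixing the command in the shell, e.g. CUDA_LAUNCH_BLOCKING=1 python bin2onnx.py.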
1. Switch the device to CPU
# device = torch.device("cuda:0")
# ⬆️ original code, ⬇️ after the change
device = torch.device("cpu")
After moving every tensor and variable to the CPU, the error message becomes far more readable:
IndexError: index out of range in self
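To see where this message comes from, here is a minimal sketch that reproduces it on the CPU. VOCAB_SIZE and the tensor values are made up for illustration and are not taken from the original model; the only point is that one index equals the vocabulary size.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10  # hypothetical vocabulary size, for illustration only

embed = nn.Embedding(VOCAB_SIZE, 4)         # created on the CPU by default
bad_ids = torch.tensor([0, 3, VOCAB_SIZE])  # valid indices are 0 .. VOCAB_SIZE - 1

try:
    embed(bad_ids)
    error_message = None
except IndexError as exc:
    # On the CPU the lookup fails immediately with a Python exception
    # instead of an asynchronous device-side assert.
    error_message = str(exc)

print(error_message)
```

On the GPU the same lookup is what triggers the `srcIndex < srcSelectDimSize` assertion shown in the log.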
2. Inspect each input variable to locate the problem
class Embedder(nn.Module):
    def __init__(self, vocab_size, d_model, padding_idx=None):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx)

    def forward(self, x):
        return self.embed(x)
self.coord_embed_x = Embedder(BBOX+COORD_PAD+SVG_END, self.embed_dim, padding_idx=MASK)
self.coord_embed_y = Embedder(BBOX+COORD_PAD+SVG_END, self.embed_dim, padding_idx=MASK)
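In the code used in this post, Embedder fixes vocab_size = BBOX+COORD_PAD+SVG_END when it is defined.
Every index fed into it must therefore lie in [0, vocab_size); any larger value triggers the out-of-range error. The dummy inputs for the export have to be generated within that range: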
pixel_seq = torch.randint(0, BBOX+COORD_PAD+SVG_END, (n_samples, 2), device=device)
xy_seq = torch.randint(0, BBOX+COORD_PAD+SVG_END, (n_samples, 2, 2), device=device)
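One way to do the per-variable inspection from step 2 is a small range check run on each input before it reaches the embedding. The sketch below is only an illustration: check_index_range is a hypothetical helper, and VOCAB_SIZE is a made-up stand-in for BBOX+COORD_PAD+SVG_END.

```python
import torch


def check_index_range(name, indices, vocab_size):
    """Return True if every index lies in [0, vocab_size); otherwise print a report."""
    lo, hi = int(indices.min()), int(indices.max())
    ok = 0 <= lo and hi < vocab_size
    if not ok:
        print(f"{name}: indices span [{lo}, {hi}], expected [0, {vocab_size - 1}]")
    return ok


VOCAB_SIZE = 200  # hypothetical stand-in for BBOX+COORD_PAD+SVG_END
n_samples = 8

pixel_seq = torch.randint(0, VOCAB_SIZE, (n_samples, 2))
xy_seq = torch.randint(0, VOCAB_SIZE, (n_samples, 2, 2))
# Deliberately out of range, to show what a bad input looks like.
too_large = torch.full((n_samples, 2), VOCAB_SIZE, dtype=torch.long)

results = {
    "pixel_seq": check_index_range("pixel_seq", pixel_seq, VOCAB_SIZE),
    "xy_seq": check_index_range("xy_seq", xy_seq, VOCAB_SIZE),
    "too_large": check_index_range("too_large", too_large, VOCAB_SIZE),
}
print(results)
```

Running such a check on the CPU before the export makes the offending input stand out by name, rather than leaving it buried in a CUDA assertion.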
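🔥 Conclusion
When you hit a similar problem, move everything to the CPU first, find out what the real error is, and then track it down step by step.
References: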
[1] https://blog.csdn.net/weixin_43301333/article/details/121155260
[2] https://blog.csdn.net/BetrayFree/article/details/134267306