RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor ref

ssimmu

Last modified 2024-03-06 15:54:10


Tags: pytorch, artificial intelligence, transformer

First published 2024-03-06 15:33:50

Article link: https://blog.csdn.net/ssimmu/article/details/136507731


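Summary (generated by CSDN's AI tool): While using a Transformer model, the author hit a RuntimeError and traced it to the PositionalEncoding layer. The problem is a compatibility issue between `register_buffer` (a method of `nn.Module`) and a multi-GPU setup. Replacing the buffer with an `nn.Parameter` solved it. Watch for this kind of issue when training on multiple GPUs in parallel.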

Reference: `RuntimeError: unsupported operation: some elements of the input tensor and the written-to tensor refer to a single memory location` · Lightning-AI/pytorch-lightning · Discussion #14377 · GitHub

Problem:

Calling the model raised the error in the title, while other models ran without error. After some searching, I traced the problem to PositionalEncoding.

Model background: the model uses a Transformer with the original PositionalEncoding code:

import numpy as np
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Sinusoidal table of shape (max_len, d_model); assumes d_model is even
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Reshape to (max_len, 1, d_model) so it broadcasts over the batch dimension
        pe = pe.unsqueeze(0).transpose(0, 1)

        # Buffer: saved in state_dict and moved by .to(), but not a parameter
        self.register_buffer('pe', pe)

    def forward(self, x):
        # not used in the final model
        # x: (seq_len, batch, d_model)
        x = x + self.pe[:x.shape[0], :]
        return self.dropout(x)
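For reference, the table built in `__init__` is the standard sinusoidal positional encoding from the original Transformer paper. Let $i$ index pairs of columns ($0 \le i < d_{\text{model}}/2$). Written out, the code computes:

$$
\mathrm{div}_i = \exp\!\left(-\frac{2i \ln 10000}{d_{\text{model}}}\right) = 10000^{-2i/d_{\text{model}}},\qquad
PE_{(pos,\,2i)} = \sin\!\left(pos \cdot \mathrm{div}_i\right),\qquad
PE_{(pos,\,2i+1)} = \cos\!\left(pos \cdot \mathrm{div}_i\right)
$$

After `unsqueeze(0).transpose(0, 1)` the table has shape `(max_len, 1, d_model)`. So `forward` assumes `x` is laid out as `(seq_len, batch, d_model)` and adds the first `seq_len` rows, broadcasting over the batch dimension.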

My solution

Changing the code to the following fixes the error:

import numpy as np
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Same sinusoidal table as before, (max_len, d_model); assumes d_model is even
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0).transpose(0, 1)  # -> (max_len, 1, d_model)

        # self.register_buffer('pe', pe)
        # Register pe as a parameter excluded from gradient computation, instead of a buffer
        self.register_parameter('pe', nn.Parameter(pe, requires_grad=False))

    def forward(self, x):
        # not used in the final model
        # x: (seq_len, batch, d_model)
        x = x + self.pe[:x.shape[0], :]
        return self.dropout(x)
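To see what the change does to the module's bookkeeping, here is a minimal sketch that compares the two registrations. It uses two hypothetical toy modules, `WithBuffer` and `WithFrozenParam`, that hold only a small `pe` tensor. `describe` is likewise a helper written just for this illustration.

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):
    """Hypothetical toy module: pe registered as a buffer (original code)."""

    def __init__(self) -> None:
        super().__init__()
        self.register_buffer("pe", torch.zeros(4, 1, 8))


class WithFrozenParam(nn.Module):
    """Hypothetical toy module: pe registered as a frozen parameter (the fix)."""

    def __init__(self) -> None:
        super().__init__()
        self.register_parameter("pe", nn.Parameter(torch.zeros(4, 1, 8), requires_grad=False))


def describe(module: nn.Module) -> dict:
    """List where the module exposes its tensors: buffers, parameters, state_dict."""
    return {
        "buffers": [name for name, _ in module.named_buffers()],
        "parameters": [name for name, _ in module.named_parameters()],
        "state_dict": list(module.state_dict().keys()),
    }


if __name__ == "__main__":
    print(describe(WithBuffer()))       # pe listed under buffers and state_dict
    print(describe(WithFrozenParam()))  # pe listed under parameters and state_dict
```

In both variants the `state_dict` key stays `pe`. So in this toy case a checkpoint saved from one variant loads into the other. After the change, `pe` does show up in `model.parameters()`, which means an optimizer built from that iterator receives it. Because `requires_grad=False`, its `.grad` stays `None`, and the built-in optimizers skip parameters with no gradient, so the table is not updated.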

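The cause is most likely an interaction between `register_buffer`, the PyTorch version, and DDP (DistributedDataParallel), because the same model ran without errors when tested on a single GPU. I had also previously run into other errors caused by the line `self.register_buffer('pe', pe)`. In short, pay attention to this line when you use PositionalEncoding with multi-GPU parallel training.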
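The error message comes from PyTorch's memory-overlap check. That check rejects an in-place write when the destination and a source are different views that partially share the same memory. The sketch below is a minimal, generic reproduction on a plain tensor, assuming a recent PyTorch version. It is not the DDP code path the author hit, which this post does not show. It also shows the fix the message itself suggests: `clone()` gives the source its own storage.

As an unverified hypothesis for why the error only appeared under DDP: by default (`broadcast_buffers=True`), DDP re-broadcasts module buffers from rank 0 and writes them in place before each forward pass. Parameters are broadcast once when the wrapper is built. Turning `pe` into a parameter therefore takes it out of that per-step write.

```python
import torch


def overlapping_copy_error() -> str:
    """Return the error raised by an in-place copy between partially overlapping views."""
    x = torch.arange(5.0)
    try:
        # x[1:] covers elements 1..4 and x[:-1] covers elements 0..3;
        # they share elements 1..3, i.e. the two views partially overlap.
        x[1:].copy_(x[:-1])
    except RuntimeError as err:
        return str(err)
    return ""


def shift_with_clone() -> torch.Tensor:
    """Same shift, but the source is cloned first, so the views no longer share memory."""
    x = torch.arange(5.0)
    x[1:].copy_(x[:-1].clone())
    return x


if __name__ == "__main__":
    print(overlapping_copy_error())
    print(shift_with_clone())
```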
