Ran into this problem in a Transformer model.
Reference: https://www.jianshu.com/p/e1a0b14916f9
Cause
import torch.nn as nn

class Add_Norm(nn.Module):
    def __init__(self):
        super(Add_Norm, self).__init__()
        self.dropout = nn.Dropout(config.p)

    def forward(self, x, sub_layer, **kwargs):
        sub_output = sub_layer(x, **kwargs)
        x = self.dropout(x + sub_output)
        # A new LayerNorm is built on every call; its weight and bias are
        # created on the CPU even though x already lives on the GPU.
        layer_norm = nn.LayerNorm(x.size()[1:])
        out = layer_norm(x)
        return out
In the forward function of the Transformer's Add_Norm module, nn.LayerNorm is constructed on the fly. The freshly created layer's parameters live on the CPU while the input x is on the GPU, which causes the device conflict.
The usual fix is simply to declare the LayerNorm layer in the __init__ function, so that moving the model to the GPU moves the LayerNorm along with everything else.
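For comparison, that pattern would look roughly like the sketch below, assuming the normalized shape is known up front (config.d_model here is a hypothetical config value, not something from the original code):

import torch.nn as nn

class Add_Norm(nn.Module):
    def __init__(self):
        super(Add_Norm, self).__init__()
        self.dropout = nn.Dropout(config.p)
        # Declared in __init__, so it is registered as a submodule and
        # model.to("cuda:0") moves its parameters together with the rest.
        self.layer_norm = nn.LayerNorm(config.d_model)

    def forward(self, x, sub_layer, **kwargs):
        sub_output = sub_layer(x, **kwargs)
        return self.layer_norm(self.dropout(x + sub_output))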
In my case, however, the LayerNorm's normalized shape depends on the shape of x, so declaring it in __init__ is awkward; the crude workaround is to move the layer onto GPU memory right after constructing it.
The code is as follows:
class Add_Norm(nn.Module):
    def __init__(self):
        super(Add_Norm, self).__init__()
        self.dropout = nn.Dropout(config.p)

    def forward(self, x, sub_layer, **kwargs):
        sub_output = sub_layer(x, **kwargs)
        x = self.dropout(x + sub_output)
        # Explicitly move the freshly built LayerNorm onto the GPU so its
        # parameters are on the same device as x.
        layer_norm = nn.LayerNorm(x.size()[1:]).to("cuda:0")
        out = layer_norm(x)
        return out
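Two more portable variants are possible (these are sketches based on my reading of the code, not part of the original post). One is to replace the hardcoded "cuda:0" with x.device, so the temporary layer always follows the input's device. The other is to use torch.nn.functional.layer_norm, which has no parameters to move at all; since the LayerNorm above is re-created on every forward pass, its affine weight and bias never get trained anyway, so the functional form behaves the same:

import torch.nn as nn
import torch.nn.functional as F

class Add_Norm(nn.Module):
    def __init__(self):
        super(Add_Norm, self).__init__()
        self.dropout = nn.Dropout(config.p)

    def forward(self, x, sub_layer, **kwargs):
        sub_output = sub_layer(x, **kwargs)
        x = self.dropout(x + sub_output)
        # F.layer_norm is parameter-free here, so there is nothing that can
        # end up on the wrong device, whatever device x is on.
        return F.layer_norm(x, x.size()[1:])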