A common error when running RNN, LSTM, GRU and similar models on the GPU

Error message:

RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu

The message says the input tensor is on cuda:0 while the hidden tensor is on the CPU. Yet when we check, the model has clearly already been moved to cuda:

# Instantiate the model
model = LSTM(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)

# Move the model and the data to the GPU
if torch.cuda.is_available():
    device = 'cuda:0'
    model = model.to(device)
    trainX = trainX.to(device)
    trainY = trainY.to(device)
    testX = testX.to(device)
    testY = testY.to(device)
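A quick way to confirm where things actually live is to print the devices directly. This is just a minimal check, assuming model and the data tensors are defined as above:

```
# Parameters registered on the module follow model.to(device)
print(next(model.parameters()).device)   # cuda:0
# Plain tensors report the device they were moved to
print(trainX.device)                     # cuda:0
```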

So where exactly is the problem?

Let's take a look at the model class:

input_dim = 5      # number of input features
hidden_dim = 32    # number of units in the hidden layer
num_layers = 2     # number of LSTM layers
output_dim = 1     # number of output features
                   
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(LSTM, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.num_layers = num_layers

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)

        # Readout layer: a fully connected layer after the LSTM; this is a regression problem, so no activation function follows the linear layer
        self.fc = nn.Linear(hidden_dim, output_dim) 

    def forward(self, x):
        # Initialize hidden state with zeros   
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).requires_grad_()
        # x.size(0) is the batch size

        # Initialize cell state
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim).requires_grad_()

        # One time step
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        out = self.fc(out) 

        return out

Looking at the forward function, we find two tensors, h0 and c0, which serve as the initial hidden and cell state for the LSTM's first time step. They are created on the CPU, and since they are not model parameters (they are built fresh inside forward on every call), the model = model.to(device) call cannot move them to the GPU.
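To see why, recall that .to(device) only moves what is registered on the module, i.e. its parameters and buffers; anything created inside forward stays wherever it was created. The following is a small self-contained demo of that behavior (the Demo class, scale buffer, and bias tensor are illustrative names, and a CUDA device is assumed to be available):

```
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)                     # parameter: moved by .to()
        self.register_buffer('scale', torch.ones(4))  # buffer: also moved by .to()

    def forward(self, x):
        bias = torch.zeros(4)                         # created in forward: stays on the CPU
        return self.fc(x) * self.scale + bias

m = Demo().to('cuda:0')
print(next(m.parameters()).device)  # cuda:0
print(m.scale.device)               # cuda:0
# calling m on a cuda tensor would fail, because `bias` is still created on the CPU
```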

Change the code above to:

if torch.cuda.is_available():
    device = 'cuda:0'
else:
    device = 'cpu'

trainX = trainX.to(device)
trainY = trainY.to(device)
testX = testX.to(device)
testY = testY.to(device)

input_dim = 5      # number of input features
hidden_dim = 32    # number of units in the hidden layer
num_layers = 2     # number of LSTM layers
output_dim = 1     # number of output features
                   
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super(LSTM, self).__init__()
        # Hidden dimensions
        self.hidden_dim = hidden_dim

        # Number of hidden layers
        self.num_layers = num_layers

        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)

        # Readout layer: a fully connected layer after the LSTM; this is a regression problem, so no activation function follows the linear layer
        self.fc = nn.Linear(hidden_dim, output_dim) 

    def forward(self, x):
        # Initialize hidden state with zeros   
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device).requires_grad_()
        # x.size(0) is the batch size

        # Initialize cell state
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device).requires_grad_()

        # One time step
        # We need to detach as we are doing truncated backpropagation through time (BPTT)
        # If we don't, we'll backprop all the way to the start even after going through another batch
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))

        out = self.fc(out) 

        return out

With h0 and c0 now created on the same device as the input (the model itself is still moved with model.to(device) as before), the program runs correctly.
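If you always want zero initial states, an even simpler variant is to not pass them at all: when no (h0, c0) is given, nn.LSTM defaults both to zeros on the input's device and dtype. A sketch of forward written that way:

```
    def forward(self, x):
        # nn.LSTM creates zero h0/c0 on x's device when no initial state is passed
        out, (hn, cn) = self.lstm(x)
        out = self.fc(out)
        return out
```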

When experimenting with GRU, LSTM or RNN models, you may run into the following problems:

1. Exploding or vanishing gradients: gradients grow or shrink exponentially during backpropagation, so the network cannot train. Possible remedies:
   - Gradient clipping: limit the magnitude of the gradients during backpropagation, either by thresholding or by rescaling them.
   - Use a more stable activation function such as ReLU or LeakyReLU.
   - Use regularization such as dropout or L2 regularization.
   - Use Batch Normalization.

2. Overfitting: likely when the model is too complex or the dataset too small. Possible remedies:
   - Collect more data.
   - Use regularization such as dropout or L2 regularization (a weight_decay sketch follows the examples below).
   - Early stopping: stop training once the validation loss has stopped improving.
   - Try a different optimizer, such as Adam or Adagrad.

3. Slow training: because of the long-range dependencies RNNs have to model, training is usually slow. Possible remedies:
   - Use GPU acceleration.
   - Use truncated backpropagation through time (truncated BPTT), splitting long sequences into segments.
   - Use a bidirectional RNN, which processes the sequence in both directions.

Concrete examples:

1. Gradient clipping:

```
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
total_loss = 0.
for epoch in range(num_epochs):
    for i, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        # clip the global gradient norm before the optimizer step
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
        optimizer.step()
        total_loss += loss.item()
```

2. Dropout regularization (the dropout argument of nn.GRU is only applied between layers, so it has an effect only when num_layers > 1):

```
class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, dropout):
        super(GRU, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers, dropout=dropout)

    def forward(self, x, h0):
        out, hn = self.gru(x, h0)
        return out, hn
```

3. Early stopping (save the model whenever the validation loss improves, stop once it rises):

```
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
prev_loss = float('inf')
best_loss = float('inf')
for epoch in range(num_epochs):
    for i, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
    val_loss = evaluate(model, val_loader, criterion)
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model.state_dict(), 'model.pt')
    if val_loss > prev_loss:
        break
    prev_loss = val_loss
```

4. Truncated backpropagation through time:

```
bptt = 5
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
total_loss = 0.
for epoch in range(num_epochs):
    for i, (inputs, targets) in enumerate(train_loader):
        hidden = model.init_hidden(batch_size)
        for j in range(0, inputs.size(1), bptt):
            inputs_ = inputs[:, j:j+bptt]
            targets_ = targets[:, j:j+bptt]
            # detach so gradients do not flow back beyond this segment
            hidden = hidden.detach()
            optimizer.zero_grad()
            outputs, hidden = model(inputs_, hidden)
            loss = criterion(outputs, targets_)
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
            optimizer.step()
            total_loss += loss.item()
```

5. Bidirectional RNN:

```
class BiGRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(BiGRU, self).__init__()
        self.hidden_size = hidden_size
        self.gru = nn.GRU(input_size, hidden_size, num_layers, bidirectional=True)

    def forward(self, x, h0):
        out, hn = self.gru(x, h0)
        # sum the outputs of the forward and backward directions
        out = out[:, :, :self.hidden_size] + out[:, :, self.hidden_size:]
        return out, hn
```
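The L2 regularization mentioned above is not shown in the examples. In PyTorch it is most easily applied through the optimizer's weight_decay argument; the snippet below is a minimal sketch that reuses the hypothetical model, lr, criterion and train_loader names assumed in the examples above:

```
# weight_decay adds an L2 penalty on every parameter handled by this optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-5)

for inputs, targets in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
```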
