Deep Learning Notes — an RNN training pitfall: in PyTorch you must set hidden = hidden.data before each training batch, otherwise backpropagation will traverse all earlier timesteps


In PyTorch, you must set hidden = hidden.data before training on each batch; otherwise the backpropagated gradients will traverse all earlier timesteps.

TensorFlow likewise feeds the updated new_state back in, but without any explicit detach operation. No detach is needed there: the state fetched by sess.run comes back as plain NumPy arrays with no gradient history, so each sess.run call backpropagates only through the num_steps timesteps executed in that call:

```python
# assumes the surrounding char-RNN setup (sess, model, get_batches, encoded, etc.)
counter = 0
for e in range(epochs):
    # reset the RNN state at the start of each epoch
    new_state = sess.run(model.initial_state)
    for x, y in get_batches(encoded, batch_size, num_steps):
        counter += 1
        start = time.time()  # timing for the (elided) logging code
        feed = {model.inputs: x,
                model.targets: y,
                model.keep_prob: keep_prob,
                model.initial_state: new_state}
        # new_state comes back as NumPy arrays, so feeding it into the next
        # step carries the values forward without any gradient history
        batch_loss, new_state, _ = sess.run([model.loss,
                                             model.final_state,
                                             model.optimizer],
                                             feed_dict=feed)
```
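To make that implicit truncation concrete, here is a minimal TF1.x sketch (a hypothetical toy graph, not the model above — the cell, shapes, and placeholder names are assumptions for illustration). Everything sess.run returns is a plain NumPy array, so feeding it back in cannot drag any graph history along:

```python
import numpy as np
import tensorflow as tf  # assumes TensorFlow 1.x APIs

cell = tf.nn.rnn_cell.BasicLSTMCell(8)
inputs = tf.placeholder(tf.float32, [1, 5, 3])
init_state = cell.zero_state(1, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=init_state)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = sess.run(init_state)  # plain NumPy, not a graph node
    out, state = sess.run([outputs, final_state],
                          feed_dict={inputs: np.random.randn(1, 5, 3).astype(np.float32),
                                     init_state: state})
    print(type(state.c))  # <class 'numpy.ndarray'> -- no gradient history
```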

In PyTorch, you must set hidden = hidden.data before training on each batch, otherwise the backpropagated gradients traverse all earlier timesteps. Because PyTorch builds the autograd graph automatically, you have to pull the state tensor out of the graph explicitly; this is equivalent to a detach (hidden = hidden.detach() does the same thing), and the backward pass stops at that point.
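To see concretely what goes wrong without the detach, here is a minimal sketch (the toy rnn and tensor shapes below are made up for illustration, not taken from the article's model). Carrying an un-detached hidden state into the next batch makes autograd walk back into the previous batch's already-freed graph and raise a RuntimeError:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=4, batch_first=True)
x = torch.randn(1, 5, 1)       # (batch, seq_len, input_size)

out, h = rnn(x)                # first "batch"
out.sum().backward()           # frees the first graph's saved buffers

out2, h2 = rnn(x, h)           # h still points into the first graph
try:
    out2.sum().backward()      # autograd walks back through h -> error
except RuntimeError as e:
    print(e)                   # "Trying to backward through the graph a second time..."

# With h = h.detach() (or h.data) between batches, the second backward succeeds.
```

The training loop below applies exactly this fix via hidden = hidden.data: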

```python
import numpy as np
import torch
import matplotlib.pyplot as plt

# assumes criterion, optimizer, and seq_length are defined elsewhere

# train the RNN
def train(rnn, n_steps, print_every):

    # initialize the hidden state
    hidden = None

    for batch_i, step in enumerate(range(n_steps)):
        # defining the training data
        time_steps = np.linspace(step * np.pi, (step + 1) * np.pi, seq_length + 1)
        data = np.sin(time_steps)
        data.resize((seq_length + 1, 1))  # input_size=1

        x = data[:-1]
        y = data[1:]

        # convert data into Tensors
        x_tensor = torch.Tensor(x).unsqueeze(0)  # adds a batch dimension of size 1
        y_tensor = torch.Tensor(y)

        # outputs from the rnn
        prediction, hidden = rnn(x_tensor, hidden)

        ## Representing Memory ##
        # make a new variable for hidden and detach the hidden state from its history
        # this way, we don't backpropagate through the entire history
        hidden = hidden.data

        # calculate the loss
        loss = criterion(prediction, y_tensor)
        # zero gradients
        optimizer.zero_grad()
        # perform backprop and update weights
        loss.backward()
        optimizer.step()

        # display loss and predictions
        if batch_i % print_every == 0:
            print('Loss: ', loss.item())
            plt.plot(time_steps[:-1], x, 'r.')  # input
            plt.plot(time_steps[1:], prediction.data.numpy().flatten(), 'b.')  # predictions
            plt.show()

    return rnn
```
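One caveat worth adding: for an LSTM the hidden state is a (h, c) tuple rather than a single tensor, so both members have to be detached. A minimal sketch of that case (the toy lstm and shapes are illustrative assumptions, not from the article):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
hidden = None

for _ in range(3):                    # stand-in for the batch loop
    x = torch.randn(1, 5, 1)          # (batch, seq_len, input_size)
    out, hidden = lstm(x, hidden)
    # hidden is a tuple (h_n, c_n); detach both before the next batch
    hidden = tuple(h.detach() for h in hidden)
    out.sum().backward()              # backprop stays within this batch
```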
Below is a basic RNN model that can be used to train on a dataset whose inputs have shape [85, 1139]:

```python
import torch
import torch.nn as nn

# hyperparameters
input_size = 1139
hidden_size = 512
num_layers = 2
batch_size = 16
seq_length = 85
num_epochs = 10
learning_rate = 0.001

# randomly generated data: (batch, time, features); the time axis is made
# longer than seq_length so the windowed loop below has work to do
total_steps = 10 * seq_length
data = torch.randn(batch_size, total_steps, input_size)

# model definition
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        out, _ = self.rnn(x, h0)
        out = self.fc(out[:, -1, :])
        return out

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = RNN(input_size, hidden_size, num_layers).to(device)

# loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# training loop
for epoch in range(num_epochs):
    for i in range(0, total_steps - seq_length, seq_length):
        inputs = data[:, i:i+seq_length, :].to(device)
        targets = torch.randn(batch_size, 1).to(device)  # placeholder labels

        # forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)

        # backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
              .format(epoch+1, num_epochs, i//seq_length + 1,
                      (total_steps - seq_length)//seq_length, loss.item()))
```

This model uses a two-layer RNN with a hidden size of 512 per layer, MSE as the loss function, and the Adam optimizer for training. During training, the data is split into sequences of length 85, each paired with a random target value as a stand-in for real labels.
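Note that this model rebuilds h0 from zeros on every forward call, so each window's graph is already independent and no detach is needed. If you instead carried the hidden state across windows (stateful training), the detach rule from the top of this post would apply. A rough sketch, reusing the hypothetical model and hyperparameters above:

```python
# stateful variant of the loop above: carry h across windows and detach it
# each step so backprop is truncated at the window boundary
h = torch.zeros(num_layers, batch_size, hidden_size).to(device)
for i in range(0, total_steps - seq_length, seq_length):
    inputs = data[:, i:i+seq_length, :].to(device)
    targets = torch.randn(batch_size, 1).to(device)

    out, h = model.rnn(inputs, h)     # use the inner nn.RNN directly
    h = h.detach()                    # equivalent to hidden = hidden.data
    outputs = model.fc(out[:, -1, :])

    loss = criterion(outputs, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```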