How to Use PyTorch's RNN

Preface

This article describes the basic usage of some models with pytorch, tensorflow and keras. Its purpose is to serve as the author's own reference, so it is probably only readable to the author; feel free to point out any problems, and note that all terminology is chosen purely for my own convenience. Honestly, I only started using pytorch recently (I used keras before), so I am still fuzzy on some of the details.

RNN(GRU/LSTM)

All examples below use GRU, but I will refer to them collectively as RNN.

Common parameters

input_size:        dimension of the input features at each timestep (e.g. embedding_dim)

hidden_size:        dimension of the hidden state

num_layers:        number of stacked layers

bidirectional:        whether to use a bidirectional rnn

batch_first:        whether the batch dimension comes first, i.e. (batch, seq, feature); see the sketch right after this list
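Below is a minimal sketch of how these parameters are typically passed to nn.GRU; the concrete numbers are placeholders I picked for illustration, not values from this article.

import torch
from torch import nn

# Placeholder sizes, only to show where each parameter goes.
gru = nn.GRU(
	input_size=32,       # feature dimension of each timestep (embedding_dim)
	hidden_size=64,      # dimension of the hidden state
	num_layers=2,        # number of stacked layers
	bidirectional=True,  # bidirectional rnn
	batch_first=True     # tensors are (batch, seq, feature)
)

x = torch.zeros(8, 20, 32)  # (batch_size, seq_len, input_size)
output, h_n = gru(x)
print(output.shape)  # torch.Size([8, 20, 128]) -> last dim is 2*hidden_size because bidirectional
print(h_n.shape)     # torch.Size([4, 8, 64])   -> first dim is 2*num_layers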

Data flow overview

From pytorch's official documentation:

Inputs:

input, h_0

input:

unbatched input:        (L, H_{in})        =>(seq_len, embedding_dim)

batch_first=False:        (L, N, H_{in})        =>(seq_len, batch_size, embedding_dim)

batch_first=True:        (N, L, H_{in})        =>(batch_size, seq_len, embedding_dim)

h_0:

(D*num_layers, H_{out}) or (D*num_layers, N, H_{out}), where D = 2 if bidirectional else 1

bidirectional=False:        (num_layers, hidden_size) or (num_layers, batch_size, hidden_size)

bidirectional=True:        (2*num_layers, hidden_size) or (2*num_layers, batch_size, hidden_size)

containing the initial hidden state for the input sequence.

Defaults to zeros if not provided.

Outputs:

output, h_n

output:

unbatched input:        (L, D*H_{out})        =>(seq_len, D*hidden_size)

batch_first=False:        (L, N, D*H_{out})        =>(seq_len, batch_size, D*hidden_size)

batch_first=True:        (N, L, D*H_{out})        =>(batch_size, seq_len, D*hidden_size)

h_n:

(D*num_layers, H_{out}) or (D*num_layers, N, H_{out})

bidirectional=False:        (num_layers, hidden_size) or (num_layers, batch_size, hidden_size)

bidirectional=True:        (2*num_layers, hidden_size) or (2*num_layers, batch_size, hidden_size)

containing the final hidden state for the input sequence.
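To make the shape rules above concrete, here is a small check for a stacked bidirectional GRU; the sizes are placeholders I chose for illustration.

import torch
from torch import nn

batch_size, seq_len, embedding_dim, hidden_size, num_layers = 4, 10, 16, 32, 3
D = 2  # D = 2 because bidirectional=True, otherwise D = 1

gru = nn.GRU(embedding_dim, hidden_size, num_layers=num_layers,
	bidirectional=True, batch_first=True)

x = torch.zeros(batch_size, seq_len, embedding_dim)          # (N, L, H_in)
h_0 = torch.zeros(D * num_layers, batch_size, hidden_size)   # (D*num_layers, N, H_out)

output, h_n = gru(x, h_0)
print(output.shape)  # torch.Size([4, 10, 64]) == (N, L, D*H_out)
print(h_n.shape)     # torch.Size([6, 4, 32])  == (D*num_layers, N, H_out)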

The author's understanding

Compared with keras's rnn, pytorch's rnn returns the whole output sequence by default. When hidden_size equals embedding_dim, the output shapes of pytorch's rnn match keras's exactly; the code below shows this.

PyTorch code

import torch
from torch import nn
import numpy as np


class RNN(nn.Module):
	def __init__(
			self,
			input_dim,
			hidden_size
	):
		super().__init__()
		self.gru = nn.GRU(
			input_dim,
			hidden_size=hidden_size,
			batch_first=True  # tensors are (batch, seq, feature)
		)

	def forward(self, x, hidden=None):
		if hidden is None:
			output, state = self.gru(x)
		else:
			output, state = self.gru(x, hidden)
		return output, state


if __name__ == '__main__':
	batch_size = 128
	seq_len = 50
	embedding_dim = 32
	input_tensor = torch.from_numpy(
		np.zeros(
			shape=(batch_size, seq_len, embedding_dim),
			dtype='float32'
		)
	)

	net = RNN(embedding_dim, 114)
	output, state = net(input_tensor)
	print("input_size: ", input_tensor.shape)
	print("output_size: ", output.shape)
	print("state_size: ", state.shape)

Output

input_size:  torch.Size([128, 50, 32])
output_size:  torch.Size([128, 50, 114])
state_size:  torch.Size([1, 128, 114])

You can see that this matches the (batch_size, seq_len, hidden_size) and (num_layers, batch_size, hidden_size) formats.
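Since forward also accepts a hidden state, the returned state can be fed back in for the next chunk of the sequence. A quick sketch using the RNN class above (assuming the same imports and sizes):

net = RNN(embedding_dim, 114)
chunk1 = torch.zeros(batch_size, 25, embedding_dim)
chunk2 = torch.zeros(batch_size, 25, embedding_dim)
out1, state = net(chunk1)          # state: (1, batch_size, 114)
out2, state = net(chunk2, state)   # continue from the previous hidden state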

Keras code

import numpy as np
import keras
from keras import layers
import tensorflow as tf


if __name__ == '__main__':
	batch_size = 128
	seq_len = 50
	embedding_dim = 32
	input_tensor = np.zeros(
			shape=(batch_size, seq_len, embedding_dim),
			dtype='float32'
	)

	rnn = layers.GRU(embedding_dim)
	rnn_seq = layers.GRU(embedding_dim, return_sequences=True, return_state=True)
	output = rnn(input_tensor)
	print("return_sequences=False: ", output.shape)
	output, state = rnn_seq(input_tensor)
	print("return_sequences=False: ", output.shape)
	print("state: ", state.shape)

Output

return_sequences=False:  (128, 32)
return_sequences=True:  (128, 50, 32)
state:  (128, 32)

You can see this matches pytorch's output when hidden_size=embedding_dim. Since keras rnn layers have to be stacked by hand, the state is missing the leading num_layers dimension that pytorch has; everything else is identical.
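As a side note, if you want something closer to pytorch's num_layers in keras, you stack the GRU layers yourself. This is only a rough sketch of that idea with placeholder sizes, not code from the experiment above.

import numpy as np
from keras import layers

batch_size, seq_len, embedding_dim = 128, 50, 32
x = np.zeros((batch_size, seq_len, embedding_dim), dtype='float32')

# Two manually stacked GRU layers, roughly mimicking num_layers=2 in pytorch.
gru1 = layers.GRU(embedding_dim, return_sequences=True, return_state=True)
gru2 = layers.GRU(embedding_dim, return_sequences=True, return_state=True)

seq1, state1 = gru1(x)     # full sequence and final state of layer 1
seq2, state2 = gru2(seq1)  # feed layer 1's sequence into layer 2
print(seq2.shape)                  # (128, 50, 32)
print(state1.shape, state2.shape)  # (128, 32) (128, 32) -> one state per layer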

Update plan

Tomorrow I will analyze attention, as well as seq2seq.
