Preface
This post describes the basic usage of some models in PyTorch and in TensorFlow/Keras. It is meant as a reference for myself, so it is written mainly for my own understanding and the terminology is chosen purely for my own convenience; feel free to point out any problems. To be honest, I only started using PyTorch recently (I used Keras before that), so I am still fuzzy on some of the details.
RNN(GRU/LSTM)
GRU is used in all of the examples below, but I will use "RNN" as the general term.
Common parameters
input_size: feature dimension of each input timestep (i.e. the embedding dim)
hidden_size: dimension of the hidden state
num_layers: number of stacked layers
bidirectional: whether the RNN is bidirectional
batch_first: whether the batch dimension comes first in the input/output tensors (see the sketch below)
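As a minimal sketch, here is how these parameters map onto the nn.GRU constructor (all five names are real nn.GRU arguments; the concrete values are made up for illustration):

from torch import nn

gru = nn.GRU(
    input_size=32,       # feature dimension of each timestep
    hidden_size=64,      # dimension of the hidden state (per direction)
    num_layers=2,        # two stacked GRU layers
    bidirectional=True,  # adds a second pass over the reversed sequence
    batch_first=True,    # tensors are (batch, seq, feature)
)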
Shape overview
From the PyTorch official documentation:
Inputs:
input, h_0
input:
unbatched input: => (seq_len, embedding_dim)
batch_first=False: => (seq_len, batch_size, embedding_dim)
batch_first=True: => (batch_size, seq_len, embedding_dim)
h_0:
bidirectional=False: => (num_layers, hidden_size) or (num_layers, batch_size, hidden_size)
bidirectional=True: => (2*num_layers, hidden_size) or (2*num_layers, batch_size, hidden_size)
containing the initial hidden state for the input sequence.
Defaults to zeros if not provided.
Outputs:
output, h_n
output:
unbatched input: => (seq_len, hidden_size)
batch_first=False: => (seq_len, batch_size, hidden_size)
batch_first=True: => (batch_size, seq_len, hidden_size)
(with bidirectional=True, the last dimension of output becomes 2*hidden_size)
h_n:
bidirectional=False: => (num_layers, hidden_size) or (num_layers, batch_size, hidden_size)
bidirectional=True: => (2*num_layers, hidden_size) or (2*num_layers, batch_size, hidden_size)
containing the final hidden state for the input sequence.
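A quick script to check those shapes (a sketch; the sizes are arbitrary, and note how the last dimension of output doubles along with the leading dimension of h_n when bidirectional=True):

import torch
from torch import nn

batch_size, seq_len, embedding_dim = 8, 5, 16
hidden_size, num_layers = 32, 2

gru = nn.GRU(embedding_dim, hidden_size, num_layers=num_layers,
             bidirectional=True, batch_first=True)
x = torch.zeros(batch_size, seq_len, embedding_dim)
h_0 = torch.zeros(2 * num_layers, batch_size, hidden_size)  # 2* because bidirectional

output, h_n = gru(x, h_0)
print(output.shape)  # torch.Size([8, 5, 64]) -- last dim is 2*hidden_size
print(h_n.shape)     # torch.Size([4, 8, 32]) -- (2*num_layers, batch_size, hidden_size)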
My understanding
Compared with Keras's RNN, PyTorch's RNN returns the whole output sequence by default. When hidden_size equals embedding_dim, the output shapes of PyTorch's RNN match Keras's exactly. Code below.
PyTorch code
import torch
from torch import nn
import numpy as np


class RNN(nn.Module):
    def __init__(self, input_dim, hidden_size):
        super().__init__()
        self.gru = nn.GRU(
            input_dim,
            hidden_size=hidden_size,
            batch_first=True
        )

    def forward(self, x, hidden=None):
        # nn.GRU treats hidden=None as an all-zeros initial state,
        # so it can be passed through directly.
        output, state = self.gru(x, hidden)
        return output, state


if __name__ == '__main__':
    batch_size = 128
    seq_len = 50
    embedding_dim = 32
    input_tensor = torch.from_numpy(
        np.zeros(
            shape=(batch_size, seq_len, embedding_dim),
            dtype='float32'
        )
    )
    net = RNN(embedding_dim, 114)
    output, state = net(input_tensor)
    print("input_size: ", input_tensor.shape)
    print("output_size: ", output.shape)
    print("state_size: ", state.shape)
Output:
input_size: torch.Size([128, 50, 32])
output_size: torch.Size([128, 50, 114])
state_size: torch.Size([1, 128, 114])
As you can see, these match the (batch_size, seq_len, hidden_size) format for output and (num_layers, batch_size, hidden_size) for state.
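One extra sanity check worth knowing (a sketch, using the same sizes as above): for a unidirectional single-layer GRU, the final hidden state is exactly the last timestep of the output sequence.

import torch
from torch import nn

gru = nn.GRU(32, 114, batch_first=True)
x = torch.zeros(128, 50, 32)
output, state = gru(x)
# state has shape (1, 128, 114); state[0] is the hidden state after
# the last timestep, which is also the last slice of output.
print(torch.allclose(output[:, -1, :], state[0]))  # True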
Keras code
import numpy as np
import keras
from keras import layers
import tensorflow as tf

if __name__ == '__main__':
    batch_size = 128
    seq_len = 50
    embedding_dim = 32
    input_tensor = np.zeros(
        shape=(batch_size, seq_len, embedding_dim),
        dtype='float32'
    )
    rnn = layers.GRU(embedding_dim)
    rnn_seq = layers.GRU(embedding_dim, return_sequences=True, return_state=True)
    output = rnn(input_tensor)
    print("return_sequences=False: ", output.shape)
    # rnn_seq returns the full sequence plus the final state
    output, state = rnn_seq(input_tensor)
    print("return_sequences=True: ", output.shape)
    print("state: ", state.shape)
Output:
return_sequences=False: (128, 32)
return_sequences=True:  (128, 50, 32)
state: (128, 32)
As you can see, this matches PyTorch's output when hidden_size=embedding_dim. Because stacking RNN layers in Keras has to be done by hand (see the sketch below), state is missing the leading num_layers dimension that PyTorch has; everything else is identical.
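A sketch of that manual stacking (sizes taken from the script above; every layer except the last needs return_sequences=True so that the next layer receives the full sequence):

import numpy as np
from keras import layers

x = np.zeros((128, 50, 32), dtype='float32')

# Two GRU layers stacked by hand -- roughly what num_layers=2 does in PyTorch.
h = layers.GRU(32, return_sequences=True)(x)  # (128, 50, 32): full sequence
output = layers.GRU(32)(h)                    # (128, 32): last timestep only
print(output.shape)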
Update plan
Tomorrow I will analyze attention, and then seq2seq.