MachineLP's GitHub (follows welcome): https://github.com/MachineLP
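Why hasn't GAN achieved good results in NLP?
Although GANs have achieved strong results in image generation, they have not produced comparably impressive results in natural language processing (NLP) tasks. The reasons can be roughly summarized as follows:
(1) The original GAN is defined on real-valued (continuous) data and does not work for generating discrete data such as text. Ian Goodfellow, who proposed GANs, answered this question as follows: "GANs have not yet been applied to natural language processing (NLP). The original GANs are defined only on the real-valued domain. A GAN trains a generator to produce synthetic data and then runs a discriminator on that synthetic data; the gradient of the discriminator's output tells you how to change the synthetic data slightly to make it more realistic. In general, you can change the synthetic data slightly only when the data is continuous; if the data is discrete, you cannot simply nudge it. For example, if you output an image with a pixel value of 1.0, you can change that value to 1.0001 on the next step. If you output the word 'penguin', you cannot change it to 'penguin .001' on the next step, because there is no such word as 'penguin .001'. Because all of NLP is built on discrete values such as words, letters, or syllables, applying GANs to NLP is very difficult. In general, one would use a reinforcement learning algorithm. As far as I know, nobody has seriously started studying the use of reinforcement learning algorithms for this NLP problem yet."
(2) When generating text, a GAN models and scores the whole text sequence. For a partially generated sequence, it is very hard to judge what score the fully generated sequence will eventually receive.
(3) Another potential challenge concerns the nature of RNNs (most text generators use an RNN). Suppose we try to generate text from latent codes: the error accumulates exponentially with sentence length. The first few words may be reasonable, but sentence quality keeps degrading as the sentence gets longer. In addition, sentence length is generated from a random latent representation, so it is also hard to control.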
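Point (1) can be made concrete with a small sketch. It is a toy illustration and not part of SeqGAN: the vocabulary, the scores, and the helper names `pick_token` and `finite_difference` are made up here. Nudging a continuous value moves the output smoothly, whereas nudging the scores behind a discrete choice usually leaves the chosen token unchanged, so a finite-difference "gradient" through that choice comes out as zero.

```python
# Toy illustration (hypothetical names): a discrete choice is piecewise
# constant in its scores, so small changes carry no learning signal.
VOCAB = ["penguin", "ostrich", "eagle"]

def pick_token(scores):
    """Return the index of the highest-scoring token (a discrete choice)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def finite_difference(fn, x, eps=1e-4):
    """Numerically estimate d fn / d x for a scalar function."""
    return (fn(x + eps) - fn(x)) / eps

# Continuous case: a pixel value can be nudged and the output moves with it.
pixel_slope = finite_difference(lambda p: 2.0 * p, 1.0)

# Discrete case: nudging the first score does not change the chosen token.
scores = [2.0, 1.0, 0.5]
token_slope = finite_difference(
    lambda s: float(pick_token([s, scores[1], scores[2]])), scores[0])

print(VOCAB[pick_token(scores)], pixel_slope, token_slope)
```

The zero slope in the discrete case is the reason the discriminator's gradient cannot be pushed straight back into a generator that emits tokens.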
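This article focuses on how SeqGAN works and on a detailed walkthrough of the SeqGAN code:
SeqGAN principles:
First, GAN:
A GAN consists of two parts. (The goal of a GAN is to train a generative model that fits the real data distribution so well that the discriminative model cannot tell the two apart.)
(1) Generative model: models the distribution of the real data.
(2) Discriminative model: decides whether a sample is real or generated.
Next, SeqGAN:
GANs are widely used for images but work poorly on text, mainly because a GAN runs into two problems when generating sequences of discrete tokens. First, the generator's output is discrete, so it is hard to pass gradient updates from the discriminator back to the generator. Second, the discriminator can only judge a sequence once it has been fully generated, and by then the signal gives little guidance for the individual steps; if the discriminator instead judges while the generator is still producing the sequence, balancing the score of the current partial sequence against the score of the future complete sequence becomes a hard problem of its own.
In the SeqGAN paper (https://arxiv.org/abs/1609.05473), the authors propose a sequence generation model, SeqGAN, to address both problems. They treat the generator as a stochastic policy in reinforcement learning, so SeqGAN can sidestep the generator's differentiability problem by updating it directly with policy gradients. At the same time, the discriminator's score for a complete sequence serves as the reinforcement learning reward signal, and Monte Carlo search passes it back to the intermediate steps of sequence generation.
Concretely, the authors view sequence generation as a sequential decision process in reinforcement learning. The generative model is the agent, the sequence generated so far is the current state, the next word to generate is the action, and the discriminative model's score for the sequence is the reward returned.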
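The agent/state/action/reward framing can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the SeqGAN implementation: `toy_policy`, `toy_discriminator`, `rollout_reward`, the 5-token vocabulary, and the scoring rule (fraction of even tokens) are all hypothetical stand-ins. It shows how a partial sequence can be given a reward by completing it several times with the policy and averaging the scores of the completed sequences.

```python
import random

VOCAB_SIZE = 5   # hypothetical toy vocabulary: tokens 0..4
SEQ_LEN = 4      # hypothetical full sequence length T

def toy_policy(prefix, rng):
    """Generator stand-in: sample the next token (action) given the prefix (state)."""
    return rng.randrange(VOCAB_SIZE)

def toy_discriminator(sequence):
    """Discriminator stand-in: score a complete sequence in [0, 1].
    The score here is simply the fraction of even tokens."""
    return sum(1 for tok in sequence if tok % 2 == 0) / len(sequence)

def rollout_reward(prefix, n_rollouts, rng):
    """Estimate the reward of a prefix by completing it n_rollouts times
    with the policy and averaging the discriminator scores."""
    if len(prefix) == SEQ_LEN:
        return toy_discriminator(prefix)  # nothing left to roll out
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < SEQ_LEN:
            seq.append(toy_policy(seq, rng))
        total += toy_discriminator(seq)
    return total / n_rollouts

rng = random.Random(0)
full_reward = rollout_reward([0, 2, 4, 1], 16, rng)   # complete sequence
partial_reward = rollout_reward([0, 2], 16, rng)      # two tokens decided so far
print(full_reward, partial_reward)
```

A complete sequence is scored directly, while a prefix gets the average score of its sampled completions, which is how an end-of-sequence reward reaches intermediate steps.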
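The model structure is shown below:
As the figure above shows, the left side is discriminator training: the discriminator is trained on positive examples from the real data and negative examples produced by the generator, and it is built from a CNN. The right side is generator training: the probability output by the discriminator is passed back to the generator, using Monte Carlo search and the policy gradient method.
In detail:
(1) The left panel is step 1 of GAN training. Discriminator D distinguishes real samples from fake ones and is implemented as a CNN.
(2) The right panel is step 2 of GAN training. The probability produced by discriminator D is passed back to generator G, which is updated by reinforcement learning; generator G is implemented as an LSTM.
(3) Because G is updated by reinforcement learning, the four reinforcement learning elements state, action, policy, and reward are as follows: the state is the tokens generated so far (the LSTM decoder's output before the current timestep), the action is the next token to generate (the word being decoded now), the policy is the GAN's generator network G, and the reward is the probability produced by the GAN's discriminator network D. The reward is approximated as follows:
The key point of this procedure: once decoding has reached timestep t, Monte Carlo search samples N paths for the remaining T-t timesteps. Each of these N paths is joined with the already decoded prefix to form N complete outputs, and the average of the rewards that network D assigns to them is used as the reward. When t = T there is nothing left to explore, so the reward of the complete decoded result is used directly.
(4) For the RL part, the paper uses the policy gradient method. According to policy gradient theory, the objective function of generator G can be written as follows:
Its gradient is (see the appendix of the original paper for the full derivation):
(5) Every so often, once more realistic sentences have been generated, discriminator D is retrained. The discriminator's objective function is as follows: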
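For reference, the formulas that items (3) to (5) point to can be written as below. This is a transcription in the notation of the SeqGAN paper (https://arxiv.org/abs/1609.05473), up to minor notational choices, and not a new derivation. The symbols are that paper's: $G_\theta$ is the generator, $D_\phi$ the discriminator, $G_\beta$ the roll-out policy, $Y_{1:t}$ the first $t$ tokens, and $\mathrm{MC}^{G_\beta}(Y_{1:t}; N)$ the set of $N$ Monte Carlo completions of that prefix.

```latex
% (3) Reward (action value) estimated by Monte Carlo roll-out
Q_{D_\phi}^{G_\theta}(s = Y_{1:t-1},\, a = y_t) =
\begin{cases}
\dfrac{1}{N} \sum_{n=1}^{N} D_\phi\!\left(Y_{1:T}^{n}\right),
  \quad Y_{1:T}^{n} \in \mathrm{MC}^{G_\beta}(Y_{1:t}; N) & \text{for } t < T \\[6pt]
D_\phi(Y_{1:t}) & \text{for } t = T
\end{cases}

% (4) Generator objective
J(\theta) = \mathbb{E}[R_T \mid s_0, \theta]
          = \sum_{y_1 \in \mathcal{Y}} G_\theta(y_1 \mid s_0)\, Q_{D_\phi}^{G_\theta}(s_0, y_1)

% (4) Gradient of the generator objective
\nabla_\theta J(\theta) \simeq \sum_{t=1}^{T}
  \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})}
  \left[ \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1})\,
         Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) \right]

% (5) Discriminator objective
\min_\phi \; -\mathbb{E}_{Y \sim p_{\text{data}}}\left[\log D_\phi(Y)\right]
          - \mathbb{E}_{Y \sim G_\theta}\left[\log\left(1 - D_\phi(Y)\right)\right]
```

The two cases of the first formula match the description in (3): an average over N roll-outs for t < T, and the discriminator's score of the finished sequence for t = T.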
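The algorithm can be laid out as follows:
Step by step, the algorithm reads as follows:
First, define the generator, the roll-out (which completes partial sentences using Monte Carlo search, as mentioned in the introductory part above), the discriminator, and the sequence dataset S.
(1) Initialize the generator and the discriminator.
(2) Pre-train the generator on S by maximum likelihood estimation.
(3) Copy the pre-trained generator's parameters into the roll-out, which is used later to complete sentences so that rewards can be computed with the discriminator.
(4) Take data produced by the generator as negative samples and data from S as positive samples.
(5) Pre-train the discriminator with cross-entropy.
(6) Iterate. For each epoch (train the generator g times and the discriminator d times):
(7) For each generator training step:
(8) Generate text data with the generator.
(9) Split the generated data into prefixes (so that rewards can be computed for a single word, for intermediate words, and for the whole sentence).
(10) Compute the rewards with the roll-out (based on Monte Carlo search).
(11) end
(12) Update the generator's parameters by policy gradient.
(13) end
(14) For each discriminator training step:
(15) Take data produced by the generator as negative samples and data from S as positive samples.
(16) Train the discriminator for k epochs.
(17) end
(18) Update the roll-out parameters from the generator's parameters.
(19) Repeat until SeqGAN converges.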
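The ordering of the steps above can be sketched as a schedule. This is only an outline under stated assumptions: no model is trained, the function name `seqgan_schedule` and the event labels are hypothetical, and the roll-out sync is placed at the end of each iteration as in step (18).

```python
def seqgan_schedule(total_batches, g_steps, d_steps, k):
    """Return the order of updates as a list of labels.
    Placeholders only: nothing is trained here."""
    # Steps (1)-(5): pre-training and the first roll-out sync.
    events = ["pretrain_generator", "sync_rollout", "pretrain_discriminator"]
    for _ in range(total_batches):
        # Steps (7)-(13): generator updates driven by roll-out rewards.
        for _ in range(g_steps):
            events += ["generate", "rollout_rewards", "policy_gradient_update"]
        # Steps (14)-(17): fresh negatives, then k discriminator epochs.
        for _ in range(d_steps):
            events.append("sample_negatives")
            events += ["train_discriminator_epoch"] * k
        # Step (18): copy generator parameters into the roll-out.
        events.append("sync_rollout")
    return events

schedule = seqgan_schedule(total_batches=2, g_steps=1, d_steps=5, k=3)
print(len(schedule), schedule[:6])
```

With g_steps=1, d_steps=5, and k=3, each iteration holds one generator update and fifteen discriminator epochs.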
It is also worth comparing maximum likelihood training with reinforcement learning:
Some other papers related to text generation:
1. Generating Text via Adversarial Training
Paper: http://people.duke.edu/~yz196/pdf/textgan.pdf
2. Adversarial Learning for Neural Dialogue Generation
Paper: https://arxiv.org/pdf/1701.06547.pdf
3. GANs for sequence of discrete elements with the Gumbel-softmax distribution
Paper: https://arxiv.org/pdf/1611.04051.pdf
4. Connecting generative adversarial network and actor-critic methods
Paper: https://arxiv.org/pdf/1610.01945.pdf
Detailed code walkthrough:
sequence_gan.py
import numpy as np
import tensorflow as tf
import random
from dataloader import Gen_Data_loader, Dis_dataloader
from generator import Generator
from discriminator import Discriminator
from rollout import ROLLOUT
from target_lstm import TARGET_LSTM
import cPickle
####
# The generator, target_lstm, and rollout all use the same kind of model (an RNN, in different variants).
# Discriminator: a CNN.
####
#########################################################################################
# Generator Hyper-parameters
#########################################################################################
# Word embedding size
EMB_DIM = 32 # embedding dimension
# Number of hidden units in the RNN
HIDDEN_DIM = 32 # hidden state dimension of lstm cell
# Maximum sequence length
SEQ_LENGTH = 20 # sequence length
# Start-of-sequence token for the RNN
START_TOKEN = 0
# Pre-training
PRE_EPOCH_NUM = 120 # supervise (maximum likelihood estimation) epochs
SEED = 88
BATCH_SIZE = 64
#########################################################################################
# Discriminator Hyper-parameters
#########################################################################################
# The discriminator uses a word embedding size of 64
dis_embedding_dim = 64
# Convolution filter sizes used in the CNN
dis_filter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
# Number of filters for each filter size in the CNN
dis_num_filters = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160]
dis_dropout_keep_prob = 0.75
dis_l2_reg_lambda = 0.2
dis_batch_size = 64
#########################################################################################
# Basic Training Parameters
#########################################################################################
TOTAL_BATCH = 200
positive_file = 'save/real_data.txt'
negative_file = 'save/generator_sample.txt'
eval_file = 'save/eval_file.txt'
generated_num = 10000
# Generate text from a trained model; the idea follows sequence-to-sequence style decoding.
def generate_samples(sess, trainable_model, batch_size, generated_num, output_file):
    # Generate Samples
    # Collects the generated sequences before they are saved to the txt file.
    generated_samples = []
    # Generate the sequences batch by batch.
    for _ in range(int(generated_num / batch_size)):
        generated_samples.extend(trainable_model.generate(sess))
    # Save the generated sequences to the txt file; they are used later to train the RNN model.
    with open(output_file, 'w') as fout:
        for poem in generated_samples:
            buffer = ' '.join([str(x) for x in poem]) + '\n'
            fout.write(buffer)
# The task loss compares the generated samples against the true (oracle) data distribution (note: the focus is on the sample distribution).
def target_loss(sess, target_lstm, data_loader):
    # target_loss means the oracle negative log-likelihood tested with the oracle model "target_lstm"
    # For more details, please see the Section 4 in https://arxiv.org/abs/1609.05473
    nll = []
    # Reset the batch pointer to 0.
    data_loader.reset_pointer()
    # Iterate over every batch and collect the oracle's NLL on the generated samples.
    for it in xrange(data_loader.num_batch):
        batch = data_loader.next_batch()
        g_loss = sess.run(target_lstm.pretrain_loss, {target_lstm.x: batch})
        nll.append(g_loss)
    return np.mean(nll)
# Pre-trains the generator. This is a conventional RNN training pass.
def pre_train_epoch(sess, trainable_model, data_loader):
    # Pre-train the generator using MLE for one epoch
    supervised_g_losses = []
    # Reset the batch pointer to 0.
    data_loader.reset_pointer()
    # Run one MLE update of the model parameters per batch and record the loss.
    for it in xrange(data_loader.num_batch):
        batch = data_loader.next_batch()
        _, g_loss = trainable_model.pretrain_step(sess, batch)
        supervised_g_losses.append(g_loss)
    return np.mean(supervised_g_losses)
def main():
    # Random seeds. A question for the reader: what are these for?
    random.seed(SEED)
    np.random.seed(SEED)
    # Assert that START_TOKEN is 0.
    assert START_TOKEN == 0
    # Data loader used for training the generator
    gen_data_loader = Gen_Data_loader(BATCH_SIZE)
    # Data loader used for testing
    likelihood_data_loader = Gen_Data_loader(BATCH_SIZE) # For testing
    vocab_size = 5000
    # Data loader used for training the discriminator
    dis_data_loader = Dis_dataloader(BATCH_SIZE)
    # Define the generator, used for pre-training and testing.
    generator = Generator(vocab_size, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN)
    # Weights of TARGET_LSTM, used below to initialize its parameters.
    target_params = cPickle.load(open('save/target_params.pkl'))
    # Build TARGET_LSTM from those parameters; it generates the data and evaluates generated samples against the true distribution via the task loss.
    target_lstm = TARGET_LSTM(vocab_size, BATCH_SIZE, EMB_DIM, HIDDEN_DIM, SEQ_LENGTH, START_TOKEN, target_params) # The oracle model
    # Define the discriminator, used for pre-training.
    discriminator = Discriminator(sequence_length=20, num_classes=2, vocab_size=vocab_size, embedding_size=dis_embedding_dim,
                                  filter_sizes=dis_filter_sizes, num_filters=dis_num_filters, l2_reg_lambda=dis_l2_reg_lambda)
    # GPU config
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)
    sess.run(tf.global_variables_initializer())
    # First, use the oracle model to provide the positive examples, which are sampled from the oracle data distribution
    # Generate the training data with the fixed target_lstm.
    generate_samples(sess, target_lstm, BATCH_SIZE, generated_num, positive_file)
    # Load the training data into the generator data loader.
    gen_data_loader.create_batches(positive_file)
    # Open the log file.
    log = open('save/experiment-log.txt', 'w')
    # pre-train generator
    print ('Start pre-training...')
    log.write('pre-training...\n')
    # Pre-train the generator.
    for epoch in xrange(PRE_EPOCH_NUM):
        loss = pre_train_epoch(sess, generator, gen_data_loader)
        # Test once every five epochs.
        if epoch % 5 == 0:
            # Sample from the generator for evaluation.
            generate_samples(sess, generator, BATCH_SIZE, generated_num, eval_file)
            likelihood_data_loader.create_batches(eval_file)
            test_loss = target_loss(sess, target_lstm, likelihood_data_loader)
            print ('pre-train epoch ', epoch, 'test_loss ', test_loss)
            buffer = 'epoch:\t'+ str(epoch) + '\tnll:\t' + str(test_loss) + '\n'
            log.write(buffer)
    print ('Start pre-training discriminator...')
    # Train 3 epoch on the generated data and do this for 50 times
    # Pre-train the discriminator.
    for _ in range(50):
        # Generate fake text data with the pre-trained generator.
        generate_samples(sess, generator, BATCH_SIZE, generated_num, negative_file)
        # Load the fake text data into the discriminator data loader for discriminator pre-training.
        dis_data_loader.load_train_data(positive_file, negative_file)
        # Train the discriminator for 3 epochs on this data; the outer loop then regenerates fake data and continues.
        for _ in range(3):
            dis_data_loader.reset_pointer()
            for it in xrange(dis_data_loader.num_batch):
                x_batch, y_batch = dis_data_loader.next_batch()
                feed = {
                    discriminator.input_x: x_batch,
                    discriminator.input_y: y_batch,
                    discriminator.dropout_keep_prob: dis_dropout_keep_prob
                }
                _ = sess.run(discriminator.train_op, feed)
    ####################### Preparation is done; now for the main event #########################
    # What does ROLLOUT do? Put simply, it completes a partial sequence into a full sentence so the discriminator can score it.
    # This addresses the two problems described earlier: the generator's output is discrete, so gradients are hard to pass from
    # the discriminator to the generator; and the discriminator can only judge a fully generated sequence, so intermediate steps
    # need a way to estimate the score of the future complete sequence.
    # Initialize the rollout's parameters from the generator's parameters.
    rollout = ROLLOUT(generator, 0.8)
    print ('#########################################################################')
    print ('Start Adversarial Training...')
    log.write('adversarial training...\n')
    # Adversarial training starts here. Rule: train the generator once, then train the discriminator five times, to keep the two balanced.
    for total_batch in range(TOTAL_BATCH):
        # Train the generator for one step
        # Train the generator once.
        for it in range(1):
            samples = generator.generate(sess)
            # Compute rewards from the generator's samples and the discriminator.
            rewards = rollout.get_reward(sess, samples, 16, discriminator)
            feed = {generator.x: samples, generator.rewards: rewards}
            # Update the generator's parameters using the rewards.
            _ = sess.run(generator.g_updates, feed_dict=feed)
        # Test
        # Test once every five iterations; the procedure is the same as above.
        if total_batch % 5 == 0 or total_batch == TOTAL_BATCH - 1:
            generate_samples(sess, generator, BATCH_SIZE, generated_num, eval_file)
            likelihood_data_loader.create_batches(eval_file)
            test_loss = target_loss(sess, target_lstm, likelihood_data_loader)
            buffer = 'epoch:\t' + str(total_batch) + '\tnll:\t' + str(test_loss) + '\n'
            print ('total_batch: ', total_batch, 'test_loss: ', test_loss)
            log.write(buffer)
        # Update roll-out parameters
        # Remember to update the rollout with the generator's parameters.
        rollout.update_params()
        # Train the discriminator
        # Train the discriminator five times.
        for _ in range(5):
            # Generate sentences with the current generator.
            generate_samples(sess, generator, BATCH_SIZE, generated_num, negative_file)
            # Load the fake text data into the discriminator data loader for discriminator training.
            dis_data_loader.load_train_data(positive_file, negative_file)
            # Train the discriminator for 3 epochs on this data; the outer loop then regenerates fake data and continues.
            for _ in range(3):
                # Reset the batch pointer to 0.
                dis_data_loader.reset_pointer()
                # Read each batch and train the discriminator.
                for it in xrange(dis_data_loader.num_batch):
                    x_batch, y_batch = dis_data_loader.next_batch()
                    feed = {
                        discriminator.input_x: x_batch,
                        discriminator.input_y: y_batch,
                        discriminator.dropout_keep_prob: dis_dropout_keep_prob
                    }
                    _ = sess.run(discriminator.train_op, feed)
    # close
    log.close()
if __name__ == '__main__':
    main()
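The metric that `target_loss` reports is the average negative log-likelihood that the oracle assigns to generated samples. The sketch below illustrates the idea with a much simpler oracle than `target_lstm`: a fixed unigram distribution over three tokens. The names `ORACLE_PROBS` and `oracle_nll` and the sample sequences are hypothetical.

```python
import math

# Hypothetical oracle: a fixed unigram distribution over a 3-token vocabulary.
ORACLE_PROBS = [0.5, 0.25, 0.25]

def oracle_nll(sequences):
    """Average negative log-likelihood per token under the oracle."""
    total, count = 0.0, 0
    for seq in sequences:
        for tok in seq:
            total += -math.log(ORACLE_PROBS[tok])
            count += 1
    return total / count

likely = [[0, 0, 0, 0]]     # tokens the oracle favours
unlikely = [[1, 2, 1, 2]]   # tokens the oracle finds less probable
print(oracle_nll(likely), oracle_nll(unlikely))
```

Samples that the oracle finds more probable get a lower value, which is why a falling `test_loss` in the log is read as the generator moving closer to the oracle distribution.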
dataloader.py
import numpy as np
class Gen_Data_loader():
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.token_stream = []

    def create_batches(self, data_file):
        self.token_stream = []
        with open(data_file, 'r') as f:
            for line in f:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                if len(parse_line) == 20:
                    self.token_stream.append(parse_line)
        self.num_batch = int(len(self.token_stream) / self.batch_size)
        self.token_stream = self.token_stream[:self.num_batch * self.batch_size]
        self.sequence_batch = np.split(np.array(self.token_stream), self.num_batch, 0)
        self.pointer = 0

    def next_batch(self):
        ret = self.sequence_batch[self.pointer]
        self.pointer = (self.pointer + 1) % self.num_batch
        return ret

    def reset_pointer(self):
        self.pointer = 0

class Dis_dataloader():
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.sentences = np.array([])
        self.labels = np.array([])

    def load_train_data(self, positive_file, negative_file):
        # Load data
        positive_examples = []
        negative_examples = []
        with open(positive_file) as fin:
            for line in fin:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                positive_examples.append(parse_line)
        with open(negative_file) as fin:
            for line in fin:
                line = line.strip()
                line = line.split()
                parse_line = [int(x) for x in line]
                if len(parse_line) == 20:
                    negative_examples.append(parse_line)
        self.sentences = np.array(positive_examples + negative_examples)
        # Generate labels
        positive_labels = [[0, 1] for _ in positive_examples]
        negative_labels = [[1, 0] for _ in negative_examples]
        self.labels = np.concatenate([positive_labels, negative_labels], 0)
        # Shuffle the data
        shuffle_indices = np.random.permutation(np.arange(len(self.labels)))
        self.sentences = self.sentences[shuffle_indices]
        self.labels = self.labels[shuffle_indices]
        # Split batches
        self.num_batch = int(len(self.labels) / self.batch_size)
        self.sentences = self.sentences[:self.num_batch * self.batch_size]
        self.labels = self.labels[:self.num_batch * self.batch_size]
        self.sentences_batches = np.split(self.sentences, self.num_batch, 0)
        self.labels_batches = np.split(self.labels, self.num_batch, 0)
        self.pointer = 0

    def next_batch(self):
        ret = self.sentences_batches[self.pointer], self.labels_batches[self.pointer]
        self.pointer = (self.pointer + 1) % self.num_batch
        return ret

    def reset_pointer(self):
        self.pointer = 0
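Both loaders keep only as many examples as fill whole batches, and `Dis_dataloader` labels real sequences `[0, 1]` and generated ones `[1, 0]`. The following plain-Python sketch mirrors that labelling and truncation on a tiny example. It is an illustration only: `make_batches` and the sample data are hypothetical, and the shuffle step is left out so the result is deterministic.

```python
def make_batches(examples, batch_size):
    """Drop the tail so every batch is full, then split into equal batches."""
    num_batch = len(examples) // batch_size
    kept = examples[:num_batch * batch_size]
    return [kept[i * batch_size:(i + 1) * batch_size] for i in range(num_batch)]

positives = [[1, 2, 3]] * 5          # hypothetical real sequences
negatives = [[4, 5, 6]] * 4          # hypothetical generated sequences
# Same label convention as Dis_dataloader: [0, 1] = real, [1, 0] = generated.
labelled = [(s, [0, 1]) for s in positives] + [(s, [1, 0]) for s in negatives]

batches = make_batches(labelled, batch_size=4)
print(len(batches), [len(b) for b in batches])
```

Nine examples with a batch size of 4 give two full batches, and the ninth example is dropped, just as the loaders discard the remainder after integer division.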
discriminator.py
import tensorflow as tf
import numpy as np
# An alternative to tf.nn.rnn_cell._linear function, which has been removed in TensorFlow 1.0.1
# The highway layer is borrowed from https://github.com/mkroutikov/tf-lstm-char-cnn
def linear(input_, output_size, scope=None):
    '''
    Linear map: output[k] = sum_i(Matrix[k, i] * input_[i] ) + Bias[k]
    Args:
        input_: a tensor or a list of 2D, batch x n, Tensors.
        output_size: int, second dimension of W[i].
        scope: VariableScope for the created subgraph; defaults to "Linear".
    Returns:
        A 2D Tensor with shape [batch x output_size] equal to
        sum_i(input_[i] * W[i]), where W[i]s are newly created matrices.
    Raises:
        ValueError: if some of the arguments has unspecified or wrong shape.
    '''
    shape = input_.get_shape().as_list()
    if len(shape) != 2:
        raise ValueError("Linear is expecting 2D arguments: %s" % str(shape))
    if not shape[1]:
        raise ValueError("Linear expects shape[1] of arguments: %s" % str(shape))
    input_size = shape[1]
    # Now the computation.
    with tf.variable_scope(scope or "SimpleLinear"):
        matrix = tf.get_variable("Matrix", [output_size, input_size], dtype=input_.dtype)
        bias_term = tf.get_variable("Bias", [output_size], dtype=input_.dtype)
    return tf.matmul(input_, tf.transpose(matrix)) + bias_term

def highway(input_, size, num_layers=1, bias=-2.0, f=tf.nn.relu, scope='Highway'):
    """Highway Network (cf. http://arxiv.org/abs/1505.00387).
    t = sigmoid(Wy + b)
    z = t * g(Wy + b) + (1 - t) * y
    where g is nonlinearity, t is transform gate, and (1 - t) is carry gate.
    """
    with tf.variable_scope(scope):
        for idx in range(num_layers):
            g = f(linear(input_, size, scope='highway_lin_%d' % idx))
            t = tf.sigmoid(linear(input_, size, scope='highway_gate_%d' % idx) + bias)
            output = t * g + (1. - t) * input_
            input_ = output
    return output
class Discriminator(object):
    """
    A CNN for text classification.
    Uses an embedding layer, followed by a convolutional, max-pooling and softmax layer.
    """
    def __init__(
            self, sequence_length, num_classes, vocab_size,
            embedding_size, filter_sizes, num_filters, l2_reg_lambda=0.0):
        # Placeholders for input, output and dropout
        self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
        self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
        self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
        # Keeping track of l2 regularization loss (optional)
        l2_loss = tf.constant(0.0)
        with tf.variable_scope('discriminator'):
            # Embedding layer
            with tf.device('/cpu:0'), tf.name_scope("embedding"):
                self.W = tf.Variable(
                    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
                    name="W")
                self.embedded_chars = tf.nn.embedding_lookup(self.W, self.input_x)
                self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
            # Create a convolution + maxpool layer for each filter size
            pooled_outputs = []
            for filter_size, num_filter in zip(filter_sizes, num_filters):
                with tf.name_scope("conv-maxpool-%s" % filter_size):
                    # Convolution Layer
                    filter_shape = [filter_size, embedding_size, 1, num_filter]
                    W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
                    b = tf.Variable(tf.constant(0.1, shape=[num_filter]), name="b")
                    conv = tf.nn.conv2d(
                        self.embedded_chars_expanded,
                        W,
                        strides=[1, 1, 1, 1],
                        padding="VALID",
                        name="conv")
                    # Apply nonlinearity
                    h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
                    # Maxpooling over the outputs
                    pooled = tf.nn.max_pool(
                        h,
                        ksize=[1, sequence_length - filter_size + 1, 1, 1],
                        strides=[1, 1, 1, 1],
                        padding='VALID',
                        name="pool")
                    pooled_outputs.append(pooled)
            # Combine all the pooled features
            num_filters_total = sum(num_filters)
            self.h_pool = tf.concat(pooled_outputs, 3)
            self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
            # Add highway
            with tf.name_scope("highway"):
                self.h_highway = highway(self.h_pool_flat, self.h_pool_flat.get_shape()[1], 1, 0)
            # Add dropout
            with tf.name_scope("dropout"):
                self.h_drop = tf.nn.dropout(self.h_highway, self.dropout_keep_prob)
            # Final (unnormalized) scores and predictions
            with tf.name_scope("output"):
                W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
                b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
                l2_loss += tf.nn.l2_loss(W)
                l2_loss += tf.nn.l2_loss(b)
                self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
                self.ypred_for_auc = tf.nn.softmax(self.scores)
                self.predictions = tf.argmax(self.scores, 1, name="predictions")
            # Calculate mean cross-entropy loss
            with tf.name_scope("loss"):
                losses = tf.nn.softmax_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
                self.loss = tf.reduce_mean(losses) + l2_reg_lambda * l2_loss
        self.params = [param for param in tf.trainable_variables() if 'discriminator' in param.name]
        d_optimizer = tf.train.AdamOptimizer(1e-4)
        grads_and_vars = d_optimizer.compute_gradients(self.loss, self.params, aggregation_method=2)
        self.train_op = d_optimizer.apply_gradients(grads_and_vars)
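The shapes inside the discriminator follow from simple arithmetic. The sketch below uses the filter sizes and filter counts from the hyper-parameters in sequence_gan.py; the helper name `conv_output_length` is introduced here for illustration. A stride-1 VALID convolution over a length-20 sequence yields `sequence_length - filter_size + 1` positions, which is exactly the max-pool window used above, so each filter contributes a single feature after pooling.

```python
SEQ_LENGTH = 20
dis_filter_sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
dis_num_filters = [100, 200, 200, 200, 200, 100, 100, 100, 100, 100, 160, 160]

def conv_output_length(sequence_length, filter_size):
    """Positions produced by a stride-1 VALID convolution along the sequence."""
    return sequence_length - filter_size + 1

# One entry per filter size: how many positions the max-pool collapses.
conv_lengths = [conv_output_length(SEQ_LENGTH, fs) for fs in dis_filter_sizes]
# Width of the concatenated pooled feature vector fed to the highway layer.
num_filters_total = sum(dis_num_filters)
print(conv_lengths, num_filters_total)
```

With these settings the flattened feature vector that enters the highway layer has 1720 entries per example.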
generator.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops
class Generator(object):
    def __init__(self, num_emb, batch_size, emb_dim, hidden_dim,
                 sequence_length, start_token,
                 learning_rate=0.01, reward_gamma=0.95):
        self.num_emb = num_emb
        self.batch_size = batch_size
        self.emb_dim = emb_dim
        self.hidden_dim = hidden_dim
        self.sequence_length = sequence_length
        self.start_token = tf.constant([start_token] * self.batch_size, dtype=tf.int32)
        self.learning_rate = tf.Variable(float(learning_rate), trainable=False)
        self.reward_gamma = reward_gamma
        self.g_params = []
        self.d_params = []
        self.temperature = 1.0
        self.grad_clip = 5.0
        self.expected_reward = tf.Variable(tf.zeros([self.sequence_length]))
        with tf.variable_scope('generator'):
            self.g_embeddings = tf.Variable(self.init_matrix([self.num_emb, self.emb_dim]))
            self.g_params.append(self.g_embeddings)
            self.g_recurrent_unit = self.create_recurrent_unit(self.g_params) # maps h_tm1 to h_t for generator
            self.g_output_unit = self.create_output_unit(self.g_params) # maps h_t to o_t (output token logits)
        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length]) # sequence of tokens generated by generator
        self.rewards = tf.placeholder(tf.float32, shape=[self.batch_size, self.sequence_length]) # get from rollout policy and discriminator
        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2]) # seq_length x batch_size x emb_dim
        # Initial states
        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])
        gen_o = tensor_array_ops.TensorArray(dtype=tf.float32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        # The usual sequence-to-sequence style decoding loop: sample a token and feed it back as the next input.
        def _g_recurrence(i, x_t, h_tm1, gen_o, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1) # hidden_memory_tuple
            o_t = self.g_output_unit(h_t) # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token) # batch x emb_dim
            gen_o = gen_o.write(i, tf.reduce_sum(tf.multiply(tf.one_hot(next_token, self.num_emb, 1.0, 0.0),
                                                             tf.nn.softmax(o_t)), 1)) # [batch_size] , prob
            gen_x = gen_x.write(i, next_token) # indices, batch_size
            return i + 1, x_tp1, h_t, gen_o, gen_x
        _, _, _, self.gen_o, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, gen_o, gen_x))
        self.gen_x = self.gen_x.stack() # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0]) # batch_size x seq_length
        # supervised pretraining for generator
        g_predictions = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length,
            dynamic_size=False, infer_shape=True)
        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)
        # A conventional RNN pass: the ground-truth token from self.x is fed as the next input.
        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t)) # batch x vocab_size
            x_tp1 = ta_emb_x.read(i)
            return i + 1, x_tp1, h_t, g_predictions
        _, _, _, self.g_predictions = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3: i < self.sequence_length,
            body=_pretrain_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token),
                       self.h0, g_predictions))
        self.g_predictions = tf.transpose(self.g_predictions.stack(), perm=[1, 0, 2]) # batch_size x seq_length x vocab_size
        # pretraining loss
        self.pretrain_loss = -tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
            )
        ) / (self.sequence_length * self.batch_size)
        # training updates
        pretrain_opt = self.g_optimizer(self.learning_rate)
        self.pretrain_grad, _ = tf.clip_by_global_norm(tf.gradients(self.pretrain_loss, self.g_params), self.grad_clip)
        self.pretrain_updates = pretrain_opt.apply_gradients(zip(self.pretrain_grad, self.g_params))
        #######################################################################################################
        # Unsupervised Training
        #######################################################################################################
        self.g_loss = -tf.reduce_sum(
            tf.reduce_sum(
                tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                    tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
                ), 1) * tf.reshape(self.rewards, [-1])
        )
        g_opt = self.g_optimizer(self.learning_rate)
        self.g_grad, _ = tf.clip_by_global_norm(tf.gradients(self.g_loss, self.g_params), self.grad_clip)
        self.g_updates = g_opt.apply_gradients(zip(self.g_grad, self.g_params))
    def generate(self, sess):
        outputs = sess.run(self.gen_x)
        return outputs

    def pretrain_step(self, sess, x):
        outputs = sess.run([self.pretrain_updates, self.pretrain_loss], feed_dict={self.x: x})
        return outputs

    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=0.1)

    def init_vector(self, shape):
        return tf.zeros(shape)

    def create_recurrent_unit(self, params):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Ui = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bi = tf.Variable(self.init_matrix([self.hidden_dim]))
        self.Wf = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uf = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bf = tf.Variable(self.init_matrix([self.hidden_dim]))
        self.Wog = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uog = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bog = tf.Variable(self.init_matrix([self.hidden_dim]))
        self.Wc = tf.Variable(self.init_matrix([self.emb_dim, self.hidden_dim]))
        self.Uc = tf.Variable(self.init_matrix([self.hidden_dim, self.hidden_dim]))
        self.bc = tf.Variable(self.init_matrix([self.hidden_dim]))
        params.extend([
            self.Wi, self.Ui, self.bi,
            self.Wf, self.Uf, self.bf,
            self.Wog, self.Uog, self.bog,
            self.Wc, self.Uc, self.bc])

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)
            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )
            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )
            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )
            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )
            # Final Memory cell
            c = f * c_prev + i * c_
            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)
            return tf.stack([current_hidden_state, c])
        return unit

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.init_matrix([self.hidden_dim, self.num_emb]))
        self.bo = tf.Variable(self.init_matrix([self.num_emb]))
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits
        return unit

    def g_optimizer(self, *args, **kwargs):
        return tf.train.AdamOptimizer(*args, **kwargs)
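The adversarial loss `g_loss` in generator.py weights the log-probability of each sampled token by that step's reward and sums the result. The plain-Python sketch below reproduces the same quantity for a single sequence. It is an illustration under stated assumptions: `reinforce_loss` and the numbers are hypothetical, and only the clipping floor `1e-20` is taken from the listing.

```python
import math

def reinforce_loss(token_probs, rewards):
    """-sum_t log p(y_t) * reward_t for one sequence.
    token_probs[t] is the probability the generator gave to the token it
    actually sampled at step t; rewards[t] is that step's roll-out reward."""
    return -sum(math.log(max(p, 1e-20)) * r for p, r in zip(token_probs, rewards))

probs = [0.5, 0.25, 0.5]          # hypothetical per-step probabilities
low_reward = [0.1, 0.1, 0.1]      # discriminator mostly unconvinced
high_reward = [0.9, 0.9, 0.9]     # discriminator mostly fooled
print(reinforce_loss(probs, low_reward), reinforce_loss(probs, high_reward))
```

Because the reward multiplies the log-probability term, steps that led to highly rated sequences contribute a larger gradient towards making their tokens more likely.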
rollout.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops
import numpy as np
class ROLLOUT(object):
    def __init__(self, lstm, update_rate):
        self.lstm = lstm
        self.update_rate = update_rate
        self.num_emb = self.lstm.num_emb
        self.batch_size = self.lstm.batch_size
        self.emb_dim = self.lstm.emb_dim
        self.hidden_dim = self.lstm.hidden_dim
        self.sequence_length = self.lstm.sequence_length
        self.start_token = tf.identity(self.lstm.start_token)
        self.learning_rate = self.lstm.learning_rate
        self.g_embeddings = tf.identity(self.lstm.g_embeddings)
        self.g_recurrent_unit = self.create_recurrent_unit() # maps h_tm1 to h_t for generator
        self.g_output_unit = self.create_output_unit() # maps h_t to o_t (output token logits)
        #####################################################################################################
        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length]) # sequence of tokens generated by generator
        self.given_num = tf.placeholder(tf.int32)
        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2]) # seq_length x batch_size x emb_dim
        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)
        ta_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length)
        ta_x = ta_x.unstack(tf.transpose(self.x, perm=[1, 0]))
        #####################################################################################################
        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])
        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        # When current index i < given_num, use the provided tokens as the input at each time step
        def _g_recurrence_1(i, x_t, h_tm1, given_num, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1) # hidden_memory_tuple
            x_tp1 = ta_emb_x.read(i)
            gen_x = gen_x.write(i, ta_x.read(i))
            return i + 1, x_tp1, h_t, given_num, gen_x
        # When current index i >= given_num, start roll-out, use the output as time step t as the input at time step t+1
        def _g_recurrence_2(i, x_t, h_tm1, given_num, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1) # hidden_memory_tuple
            o_t = self.g_output_unit(h_t) # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token) # batch x emb_dim
            gen_x = gen_x.write(i, next_token) # indices, batch_size
            return i + 1, x_tp1, h_t, given_num, gen_x
        i, x_t, h_tm1, given_num, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, given_num, _4: i < given_num,
            body=_g_recurrence_1,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, self.given_num, gen_x))
        _, _, _, _, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence_2,
            loop_vars=(i, x_t, h_tm1, given_num, self.gen_x))
        self.gen_x = self.gen_x.stack() # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0]) # batch_size x seq_length
    def get_reward(self, sess, input_x, rollout_num, discriminator):
        rewards = []
        for i in range(rollout_num):
            # given_num between 1 to sequence_length - 1 for a part completed sentence
            for given_num in range(1, self.sequence_length):
                feed = {self.x: input_x, self.given_num: given_num}
                samples = sess.run(self.gen_x, feed)
                feed = {discriminator.input_x: samples, discriminator.dropout_keep_prob: 1.0}
                ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
                ypred = np.array([item[1] for item in ypred_for_auc])
                if i == 0:
                    rewards.append(ypred)
                else:
                    rewards[given_num - 1] += ypred
            # the last token reward
            feed = {discriminator.input_x: input_x, discriminator.dropout_keep_prob: 1.0}
            ypred_for_auc = sess.run(discriminator.ypred_for_auc, feed)
            ypred = np.array([item[1] for item in ypred_for_auc])
            if i == 0:
                rewards.append(ypred)
            else:
                # completed sentence reward
                rewards[self.sequence_length - 1] += ypred
        rewards = np.transpose(np.array(rewards)) / (1.0 * rollout_num) # batch_size x seq_length
        return rewards
    def create_recurrent_unit(self):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.identity(self.lstm.Wi)
        self.Ui = tf.identity(self.lstm.Ui)
        self.bi = tf.identity(self.lstm.bi)
        self.Wf = tf.identity(self.lstm.Wf)
        self.Uf = tf.identity(self.lstm.Uf)
        self.bf = tf.identity(self.lstm.bf)
        self.Wog = tf.identity(self.lstm.Wog)
        self.Uog = tf.identity(self.lstm.Uog)
        self.bog = tf.identity(self.lstm.bog)
        self.Wc = tf.identity(self.lstm.Wc)
        self.Uc = tf.identity(self.lstm.Uc)
        self.bc = tf.identity(self.lstm.bc)

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)
            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )
            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )
            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )
            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )
            # Final Memory cell
            c = f * c_prev + i * c_
            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)
            return tf.stack([current_hidden_state, c])
        return unit

    def update_recurrent_unit(self):
        # Weights and Bias for input and hidden tensor
        self.Wi = self.update_rate * self.Wi + (1 - self.update_rate) * tf.identity(self.lstm.Wi)
        self.Ui = self.update_rate * self.Ui + (1 - self.update_rate) * tf.identity(self.lstm.Ui)
        self.bi = self.update_rate * self.bi + (1 - self.update_rate) * tf.identity(self.lstm.bi)

        self.Wf = self.update_rate * self.Wf + (1 - self.update_rate) * tf.identity(self.lstm.Wf)
        self.Uf = self.update_rate * self.Uf + (1 - self.update_rate) * tf.identity(self.lstm.Uf)
        self.bf = self.update_rate * self.bf + (1 - self.update_rate) * tf.identity(self.lstm.bf)

        self.Wog = self.update_rate * self.Wog + (1 - self.update_rate) * tf.identity(self.lstm.Wog)
        self.Uog = self.update_rate * self.Uog + (1 - self.update_rate) * tf.identity(self.lstm.Uog)
        self.bog = self.update_rate * self.bog + (1 - self.update_rate) * tf.identity(self.lstm.bog)

        self.Wc = self.update_rate * self.Wc + (1 - self.update_rate) * tf.identity(self.lstm.Wc)
        self.Uc = self.update_rate * self.Uc + (1 - self.update_rate) * tf.identity(self.lstm.Uc)
        self.bc = self.update_rate * self.bc + (1 - self.update_rate) * tf.identity(self.lstm.bc)

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def create_output_unit(self):
        self.Wo = tf.identity(self.lstm.Wo)
        self.bo = tf.identity(self.lstm.bo)

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit

    def update_output_unit(self):
        self.Wo = self.update_rate * self.Wo + (1 - self.update_rate) * tf.identity(self.lstm.Wo)
        self.bo = self.update_rate * self.bo + (1 - self.update_rate) * tf.identity(self.lstm.bo)

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit

    def update_params(self):
        self.g_embeddings = tf.identity(self.lstm.g_embeddings)
        self.g_recurrent_unit = self.update_recurrent_unit()
        self.g_output_unit = self.update_output_unit()
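
The two ideas in the rollout code above, averaging discriminator scores over several Monte Carlo completions and blending the rollout weights toward the generator weights with `update_rate`, can be sketched without TensorFlow. This is a minimal NumPy illustration, not the listing's actual implementation: `complete_fn`, `score_fn`, `random_complete`, and `toy_score` are hypothetical stand-ins for `sess.run(self.gen_x, ...)` and the discriminator.

```python
import numpy as np


def rollout_rewards(input_x, rollout_num, complete_fn, score_fn):
    """Average discriminator scores over Monte Carlo completions.

    complete_fn(input_x, given_num) returns sequences whose first given_num
    tokens are kept and whose remaining tokens are re-sampled (assumed helper).
    score_fn(sequences) returns one score per sequence (assumed helper).
    """
    batch_size, seq_len = input_x.shape
    rewards = np.zeros((seq_len, batch_size))
    for _ in range(rollout_num):
        # Prefixes of length 1 .. seq_len - 1 are completed by the rollout policy.
        for given_num in range(1, seq_len):
            samples = complete_fn(input_x, given_num)
            rewards[given_num - 1] += score_fn(samples)
        # The complete sequence needs no rollout, so it is scored directly.
        rewards[seq_len - 1] += score_fn(input_x)
    return rewards.T / rollout_num  # batch_size x seq_len


def soft_update(old, new, update_rate):
    """Blend old rollout weights with new generator weights."""
    return update_rate * old + (1.0 - update_rate) * new


# Toy setup: random completions and a made-up "discriminator".
rng = np.random.default_rng(0)
vocab_size = 10


def random_complete(x, given_num):
    out = x.copy()
    tail_shape = out[:, given_num:].shape
    out[:, given_num:] = rng.integers(0, vocab_size, size=tail_shape)
    return out


def toy_score(seqs):
    # Fraction of even tokens, standing in for P(real).
    return (seqs % 2 == 0).mean(axis=1)


x = rng.integers(0, vocab_size, size=(4, 6))
r = rollout_rewards(x, rollout_num=8, complete_fn=random_complete, score_fn=toy_score)
```

Each column of `r` is the estimated reward for the token at that time step. The last column equals the score of the finished sequence, because no sampling is involved there.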
target_lstm.py
import tensorflow as tf
from tensorflow.python.ops import tensor_array_ops, control_flow_ops


class TARGET_LSTM(object):
    def __init__(self, num_emb, batch_size, emb_dim, hidden_dim, sequence_length, start_token, params):
        self.num_emb = num_emb
        self.batch_size = batch_size
        self.emb_dim = emb_dim
        self.hidden_dim = hidden_dim
        self.sequence_length = sequence_length
        self.start_token = tf.constant([start_token] * self.batch_size, dtype=tf.int32)
        self.g_params = []
        self.temperature = 1.0
        self.params = params

        tf.set_random_seed(66)

        with tf.variable_scope('generator'):
            self.g_embeddings = tf.Variable(self.params[0])
            self.g_params.append(self.g_embeddings)
            self.g_recurrent_unit = self.create_recurrent_unit(self.g_params)  # maps h_tm1 to h_t for generator
            self.g_output_unit = self.create_output_unit(self.g_params)  # maps h_t to o_t (output token logits)

        # placeholder definition
        self.x = tf.placeholder(tf.int32, shape=[self.batch_size, self.sequence_length])  # sequence of tokens generated by generator

        # processed for batch
        with tf.device("/cpu:0"):
            self.processed_x = tf.transpose(tf.nn.embedding_lookup(self.g_embeddings, self.x), perm=[1, 0, 2])  # seq_length x batch_size x emb_dim

        # initial states
        self.h0 = tf.zeros([self.batch_size, self.hidden_dim])
        self.h0 = tf.stack([self.h0, self.h0])

        # generator on initial randomness
        gen_o = tensor_array_ops.TensorArray(dtype=tf.float32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)
        gen_x = tensor_array_ops.TensorArray(dtype=tf.int32, size=self.sequence_length,
                                             dynamic_size=False, infer_shape=True)

        def _g_recurrence(i, x_t, h_tm1, gen_o, gen_x):
            h_t = self.g_recurrent_unit(x_t, h_tm1)  # hidden_memory_tuple
            o_t = self.g_output_unit(h_t)  # batch x vocab , logits not prob
            log_prob = tf.log(tf.nn.softmax(o_t))
            next_token = tf.cast(tf.reshape(tf.multinomial(log_prob, 1), [self.batch_size]), tf.int32)
            x_tp1 = tf.nn.embedding_lookup(self.g_embeddings, next_token)  # batch x emb_dim
            gen_o = gen_o.write(i, tf.reduce_sum(tf.multiply(tf.one_hot(next_token, self.num_emb, 1.0, 0.0),
                                                             tf.nn.softmax(o_t)), 1))  # [batch_size] , prob
            gen_x = gen_x.write(i, next_token)  # indices, batch_size
            return i + 1, x_tp1, h_t, gen_o, gen_x

        _, _, _, self.gen_o, self.gen_x = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3, _4: i < self.sequence_length,
            body=_g_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token), self.h0, gen_o, gen_x)
        )

        self.gen_x = self.gen_x.stack()  # seq_length x batch_size
        self.gen_x = tf.transpose(self.gen_x, perm=[1, 0])  # batch_size x seq_length

        # supervised pretraining for generator
        g_predictions = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length,
            dynamic_size=False, infer_shape=True)

        ta_emb_x = tensor_array_ops.TensorArray(
            dtype=tf.float32, size=self.sequence_length)
        ta_emb_x = ta_emb_x.unstack(self.processed_x)

        def _pretrain_recurrence(i, x_t, h_tm1, g_predictions):
            h_t = self.g_recurrent_unit(x_t, h_tm1)
            o_t = self.g_output_unit(h_t)
            g_predictions = g_predictions.write(i, tf.nn.softmax(o_t))  # batch x vocab_size
            x_tp1 = ta_emb_x.read(i)
            return i + 1, x_tp1, h_t, g_predictions

        _, _, _, self.g_predictions = control_flow_ops.while_loop(
            cond=lambda i, _1, _2, _3: i < self.sequence_length,
            body=_pretrain_recurrence,
            loop_vars=(tf.constant(0, dtype=tf.int32),
                       tf.nn.embedding_lookup(self.g_embeddings, self.start_token),
                       self.h0, g_predictions))

        self.g_predictions = tf.transpose(
            self.g_predictions.stack(), perm=[1, 0, 2])  # batch_size x seq_length x vocab_size

        # pretraining loss
        self.pretrain_loss = -tf.reduce_sum(
            tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                tf.reshape(self.g_predictions, [-1, self.num_emb]))) / (self.sequence_length * self.batch_size)

        self.out_loss = tf.reduce_sum(
            tf.reshape(
                -tf.reduce_sum(
                    tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
                        tf.reshape(self.g_predictions, [-1, self.num_emb])), 1
                ), [-1, self.sequence_length]
            ), 1
        )  # batch_size

    def generate(self, session):
        # h0 = np.random.normal(size=self.hidden_dim)
        outputs = session.run(self.gen_x)
        return outputs

    def init_matrix(self, shape):
        return tf.random_normal(shape, stddev=1.0)

    def create_recurrent_unit(self, params):
        # Weights and Bias for input and hidden tensor
        self.Wi = tf.Variable(self.params[1])
        self.Ui = tf.Variable(self.params[2])
        self.bi = tf.Variable(self.params[3])

        self.Wf = tf.Variable(self.params[4])
        self.Uf = tf.Variable(self.params[5])
        self.bf = tf.Variable(self.params[6])

        self.Wog = tf.Variable(self.params[7])
        self.Uog = tf.Variable(self.params[8])
        self.bog = tf.Variable(self.params[9])

        self.Wc = tf.Variable(self.params[10])
        self.Uc = tf.Variable(self.params[11])
        self.bc = tf.Variable(self.params[12])
        params.extend([
            self.Wi, self.Ui, self.bi,
            self.Wf, self.Uf, self.bf,
            self.Wog, self.Uog, self.bog,
            self.Wc, self.Uc, self.bc])

        def unit(x, hidden_memory_tm1):
            previous_hidden_state, c_prev = tf.unstack(hidden_memory_tm1)

            # Input Gate
            i = tf.sigmoid(
                tf.matmul(x, self.Wi) +
                tf.matmul(previous_hidden_state, self.Ui) + self.bi
            )

            # Forget Gate
            f = tf.sigmoid(
                tf.matmul(x, self.Wf) +
                tf.matmul(previous_hidden_state, self.Uf) + self.bf
            )

            # Output Gate
            o = tf.sigmoid(
                tf.matmul(x, self.Wog) +
                tf.matmul(previous_hidden_state, self.Uog) + self.bog
            )

            # New Memory Cell
            c_ = tf.nn.tanh(
                tf.matmul(x, self.Wc) +
                tf.matmul(previous_hidden_state, self.Uc) + self.bc
            )

            # Final Memory cell
            c = f * c_prev + i * c_

            # Current Hidden state
            current_hidden_state = o * tf.nn.tanh(c)

            return tf.stack([current_hidden_state, c])

        return unit

    def create_output_unit(self, params):
        self.Wo = tf.Variable(self.params[13])
        self.bo = tf.Variable(self.params[14])
        params.extend([self.Wo, self.bo])

        def unit(hidden_memory_tuple):
            hidden_state, c_prev = tf.unstack(hidden_memory_tuple)
            # hidden_state : batch x hidden_dim
            logits = tf.matmul(hidden_state, self.Wo) + self.bo
            # output = tf.nn.softmax(logits)
            return logits

        return unit
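
The two losses defined in `TARGET_LSTM` are both negative log-likelihoods of the observed tokens: `pretrain_loss` averages over every token in the batch, while `out_loss` sums over time steps to give one value per sequence. The NumPy sketch below shows the same arithmetic on a tiny input. The function name `nll_losses` and the uniform toy predictions are assumptions made only for illustration.

```python
import numpy as np


def nll_losses(x, predictions):
    """Negative log-likelihood losses for token sequences.

    x:           int array, batch_size x seq_length (observed token ids)
    predictions: float array, batch_size x seq_length x vocab_size (probabilities)
    """
    batch_size, seq_length = x.shape
    # Selecting the probability of each observed token is equivalent to
    # multiplying by a one-hot vector and summing over the vocabulary axis.
    token_prob = np.take_along_axis(predictions, x[..., None], axis=2)[..., 0]
    token_nll = -np.log(token_prob)  # batch_size x seq_length
    pretrain_loss = token_nll.sum() / (seq_length * batch_size)  # scalar
    out_loss = token_nll.sum(axis=1)  # batch_size
    return pretrain_loss, out_loss


# Toy check: with uniform predictions over 5 tokens, every token costs log(5).
x = np.array([[0, 1, 2], [3, 4, 0]])
uniform = np.full((2, 3, 5), 1.0 / 5)
pretrain_loss, out_loss = nll_losses(x, uniform)
```

With a uniform distribution the per-token loss is `log(vocab_size)`, so `pretrain_loss` is `log(5)` and each entry of `out_loss` is `3 * log(5)`.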
References:
(1) SeqGAN: Sequence GAN with Policy Gradient: https://zhuanlan.zhihu.com/p/50790727
(2) Paper notes on "SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient": https://zhuanlan.zhihu.com/p/23326430
(3) A detailed look at the problems of GANs in natural language processing: principles, techniques, and applications: http://www.360doc.com/content/17/0210/18/32056199_628087216.shtml