Writing Like Shakespeare with an RNN

Latest recommended article published on 2024-01-27 21:02:27

Mr Robot

Category: NLP

Article link: https://blog.csdn.net/leva345/article/details/126283383

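Event: CSDN 21-Day Learning Challenge

The best reason to learn is to escape mediocrity: start a day earlier and life gains a day of brilliance; start a day later and mediocrity troubles you for one more day.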

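Because of how web fiction is produced, authors have to keep up a high output and a steady update schedule. Speed comes at the cost of polish: chapters are published as soon as they are written, and uncorrected typos are very common. Some novels today are even stitched together by AI. In this post, let's look at how to use AI to generate text in the style of a famous author.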

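Take Shakespeare as an example. The core idea is simple: feed real text written by Shakespeare into an RNN as training data, then use the trained model to generate new text that looks as if it were written by England's greatest writer.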

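To keep things simple, we use TFLearn (http://tflearn.org/), a framework that runs on top of TensorFlow. The example is taken from the examples shipped with TFLearn (https://github.com/tflearn/tflearn/blob/master/examples/nlp/lstm_generator_shakespeare.py). The model is a character-level RNN language model: the sequences it works on are sequences of characters, not words.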

Install TFLearn with pip:

pip install tflearn

Import the modules we need and download the Shakespeare text. The text used in this example is at https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/shakespeare_input.txt:

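from __future__ import absolute_import, division, print_function

import os
import pickle
from six.moves import urllib

import tflearn
# Import only the helpers used in this post instead of a wildcard import.
from tflearn.data_utils import (random_sequence_from_textfile,
                                textfile_to_semi_redundant_sequences)

path = "shakespeare_input.txt"      # local copy of the training text
char_idx_file = 'char_idx.pickle'   # cached character-to-index dictionary

# Download the text only if it is not already on disk.
if not os.path.isfile(path):
    urllib.request.urlretrieve("https://raw.githubusercontent.com/tflearn/tflearn.github.io/master/resources/shakespeare_input.txt", path)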

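Convert the input text into vectors. textfile_to_semi_redundant_sequences() parses the file and returns a tuple of three items: the input sequences, the targets, and the character dictionary. If a dictionary saved by an earlier run exists, it is loaded and reused so that character indices stay the same across runs: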

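maxlen = 25  # length of each input sequence, in characters

# Reuse the dictionary from a previous run, if one was saved.
char_idx = None
if os.path.isfile(char_idx_file):
    print('Loading previous char_idx')
    with open(char_idx_file, 'rb') as f:
        char_idx = pickle.load(f)

# X: one-hot input sequences, Y: one-hot next characters, char_idx: dictionary.
X, Y, char_idx = \
    textfile_to_semi_redundant_sequences(path, seq_maxlen=maxlen, redun_step=3,
                                         pre_defined_char_idx=char_idx)

# Save the dictionary so that later runs use the same indices.
with open(char_idx_file, 'wb') as f:
    pickle.dump(char_idx, f)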
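
To make the idea of "semi-redundant sequences" concrete, here is a minimal sketch of the windowing and one-hot encoding in plain NumPy. It is an illustrative re-implementation written for this post, not TFLearn's own code; the function name semi_redundant_sequences and the sample sentence are made up for the example.

```python
import numpy as np

def semi_redundant_sequences(text, seq_maxlen=25, redun_step=3):
    """Illustrative sketch only; not TFLearn's implementation."""
    chars = sorted(set(text))
    char_idx = {c: i for i, c in enumerate(chars)}

    windows, next_chars = [], []
    # Slide a window of seq_maxlen characters, moving redun_step at a time.
    # Consecutive windows overlap, which is why they are "semi-redundant".
    for start in range(0, len(text) - seq_maxlen, redun_step):
        windows.append(text[start:start + seq_maxlen])
        next_chars.append(text[start + seq_maxlen])

    # One-hot encode: X is (n_sequences, seq_maxlen, vocab),
    # Y is (n_sequences, vocab) and marks the character after each window.
    X = np.zeros((len(windows), seq_maxlen, len(chars)), dtype=bool)
    Y = np.zeros((len(windows), len(chars)), dtype=bool)
    for i, window in enumerate(windows):
        for t, c in enumerate(window):
            X[i, t, char_idx[c]] = True
        Y[i, char_idx[next_chars[i]]] = True
    return X, Y, char_idx

sample = "To be, or not to be, that is the question."
X_demo, Y_demo, idx_demo = semi_redundant_sequences(sample, seq_maxlen=10, redun_step=3)
print(X_demo.shape, Y_demo.shape)
```

A smaller redun_step gives more, heavily overlapping training sequences; a larger one gives fewer sequences and faster epochs.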

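Define an RNN made of three stacked LSTM layers with 512 units each. The first two layers return the full sequence (return_seq=True) so that the next LSTM receives an output for every time step; the third returns only its last output. A dropout layer with a probability of 50% follows each LSTM layer. The final layer is fully connected, with a softmax whose size equals the dictionary size. The loss function is categorical_crossentropy and the optimizer is Adam: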

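# Input: batches of maxlen one-hot encoded characters.
g = tflearn.input_data([None, maxlen, len(char_idx)])
# The first two LSTM layers pass their full output sequence to the next layer.
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
g = tflearn.lstm(g, 512, return_seq=True)
g = tflearn.dropout(g, 0.5)
# The last LSTM layer returns only its final output.
g = tflearn.lstm(g, 512)
g = tflearn.dropout(g, 0.5)
# Softmax over the dictionary: one probability per possible next character.
g = tflearn.fully_connected(g, len(char_idx), activation='softmax')
g = tflearn.regression(g, optimizer='adam', loss='categorical_crossentropy',
                       learning_rate=0.001)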
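
To get a feel for the size of this network, the sketch below estimates the number of trainable parameters per layer. It assumes a standard LSTM cell without peephole connections (four gates, each with weights over the input and the previous hidden state, plus a bias), and the vocabulary size of 67 is only an example value; the real one is len(char_idx). The helper names are made up for this post.

```python
def lstm_params(input_dim, units):
    # Four gates, each with weights over [input, previous hidden state] plus a bias.
    return 4 * ((input_dim + units) * units + units)

def count_params(vocab_size, units=512):
    layers = {
        "lstm_1": lstm_params(vocab_size, units),   # reads one-hot characters
        "lstm_2": lstm_params(units, units),        # reads the output of lstm_1
        "lstm_3": lstm_params(units, units),        # reads the output of lstm_2
        "softmax": units * vocab_size + vocab_size, # fully connected output layer
    }
    layers["total"] = sum(layers.values())
    return layers

vocab_size = 67  # assumed example value; use len(char_idx) for the real count
for name, n in count_params(vocab_size).items():
    print(f"{name}: {n:,}")
```

Under these assumptions, almost all of the parameters sit in the LSTM layers, which is one reason training takes a while without a GPU.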

Now create the sequence generator with the library class tflearn.models.generator.SequenceGenerator(network, dictionary=char_idx, seq_maxlen=maxlen, clip_gradients=5.0, checkpoint_path='model_shakespeare'):

m = tflearn.SequenceGenerator(g, dictionary=char_idx,
                              seq_maxlen=maxlen,
                              clip_gradients=5.0,
                              checkpoint_path='model_shakespeare')
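
The clip_gradients=5.0 argument limits how large a gradient update can be, which helps keep RNN training stable. One common way to do this is clipping by global norm, sketched below in NumPy. Whether TFLearn applies exactly this rule internally is an assumption here; the function and the example gradients are illustrative only.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # If the norm is already below the threshold, scale is 1 and nothing changes.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Two made-up gradient tensors with a joint norm well above the threshold.
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print("norm before:", norm)
print("norm after:", np.sqrt(sum(np.sum(g ** 2) for g in clipped)))
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its length is capped.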

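The loop below runs 50 iterations. In each one, it picks a random sequence from the input text as the seed, trains the model for one more epoch, and generates new text from the seed. The temperature parameter controls how novel the generated sequence is: a temperature close to 0 produces text that looks like the training samples, and the higher the temperature, the more varied the result: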

for i in range(50):
    seed = random_sequence_from_textfile(path, maxlen)
    m.fit(X, Y, validation_set=0.1, batch_size=128,
          n_epoch=1, run_id='shakespeare')
    print("-- TESTING...")
    print("-- Test with temperature of 1.0 --")
    print(m.generate(600, temperature=1.0, seq_seed=seed))
    print("-- Test with temperature of 0.5 --")
    print(m.generate(600, temperature=0.5, seq_seed=seed))
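
The effect of the temperature can be seen on a toy distribution. The sketch below shows one common way to apply a temperature to predicted probabilities before sampling the next character; it is an illustration under that assumption, not TFLearn's internal code, and the function names and the three-character distribution are made up.

```python
import numpy as np

def apply_temperature(probs, temperature=1.0):
    """Sharpen (temperature < 1) or flatten (temperature > 1) a distribution."""
    probs = np.asarray(probs, dtype=np.float64)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()          # subtract the max for numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum()    # renormalize so the values sum to 1

def sample_index(probs, temperature=1.0, rng=None):
    """Draw the index of the next character from the rescaled distribution."""
    if rng is None:
        rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=apply_temperature(probs, temperature)))

probs = [0.5, 0.3, 0.2]  # made-up prediction over three characters
for t in (0.1, 0.5, 1.0, 2.0):
    print(t, np.round(apply_temperature(probs, t), 3))
```

At a low temperature nearly all of the probability goes to the most likely character, so the output stays close to what the model saw during training; at a high temperature the choices even out and the text becomes more varied, and also more error-prone.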

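When a new, unknown, or forgotten work needs to be attributed to an author, scholars compare it with the author's other works. They look for common features in the text of the author's known works and hope to find similar features in the work under examination. The approach in this post works in a similar way: the RNN learns the features of Shakespeare's writing, and these features are then used to produce new, never-before-seen text that reflects the style of the greatest English writer.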