#Background
From the tensorflow_cookbook repository on GitHub: https://github.com/nfmcclure/tensorflow_cookbook/tree/master/09_Recurrent_Neural_Networks
Implementing an LSTM Model for Text Generation
We show how to implement an LSTM (Long Short-Term Memory) RNN for Shakespeare language generation, using a word-level vocabulary.
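To make "word-level vocabulary" concrete, here is a minimal sketch of how text can be turned into integer word ids before it reaches an RNN. It is an illustration under stated assumptions, not the cookbook's own code: the helper names `build_vocab` and `encode`, the `<unk>` token, and the lowercasing step are hypothetical choices. The punctuation rule (keep hyphens and apostrophes) and the `min_word_freq` threshold mirror the parameters used in the script.

```python
import collections
import string


def build_vocab(text, min_word_freq=1):
    """Split text into words and map each frequent word to an integer id.

    Id 0 is reserved for a hypothetical '<unk>' token that stands in for
    every word seen fewer than min_word_freq times.
    """
    # Remove all punctuation except hyphens and apostrophes.
    punctuation = ''.join(c for c in string.punctuation if c not in "-'")
    cleaned = text.lower().translate(str.maketrans('', '', punctuation))
    words = cleaned.split()

    # Count occurrences and keep only the sufficiently frequent words.
    counts = collections.Counter(words)
    kept = sorted(w for w, c in counts.items() if c >= min_word_freq)

    word_to_ix = {'<unk>': 0}
    for word in kept:
        word_to_ix[word] = len(word_to_ix)
    return words, word_to_ix


def encode(words, word_to_ix):
    """Replace each word with its id, falling back to 0 for rare words."""
    return [word_to_ix.get(w, 0) for w in words]


sample = "To be, or not to be: that is the question."
words, word_to_ix = build_vocab(sample, min_word_freq=2)
ids = encode(words, word_to_ix)
print(word_to_ix)  # {'<unk>': 0, 'be': 1, 'to': 2}
print(ids)         # [2, 1, 0, 0, 2, 1, 0, 0, 0, 0]
```

With a threshold of 2, only "to" and "be" survive in this tiny sample; every other word collapses to id 0. On the full Shakespeare text, a threshold such as 5 plays the same role of keeping the vocabulary small.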
#Code
# Implementing an LSTM RNN Model
#------------------------------
# Here we implement an LSTM model on a data set of Shakespeare's works.
'''
We start by loading the necessary libraries and resetting the default computational graph.
'''
import os
import re
import string
import requests
import numpy as np
import collections
import random
import pickle
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.python.framework import ops
ops.reset_default_graph()
'''We start a computational graph session.'''
sess = tf.Session()
'''
Next, we set the algorithm and data processing parameters.
Parameter : Descriptions
min_word_freq: Only attempt to model words that appear at least 5 times
rnn_size: Size of our RNN (equal to the embedding size)
epochs: Number of epochs to cycle through the data
batch_size: How many examples to train on at once
learning_rate: The learning rate or the convergence parameter
training_seq_len: The length of the surrounding word group (e.g. 10 = 5 on each side)
embedding_size: Must be equal to the rnn_size
save_every: How often to save the model
eval_every: How often to evaluate the model
prime_texts: List of test sentences
'''
# Set RNN Parameters
min_word_freq = 5 # Trim the less frequent words off
rnn_size = 128 # RNN Model size
epochs = 10 # Number of epochs to cycle through data
batch_size = 100 # Train on this many examples at once
learning_rate = 0.001 # Learning rate
training_seq_len = 50 # how long of a word group to consider
embedding_size = rnn_size # Word embedding size
save_every = 500 # How often to save model checkpoints
eval_every = 50 # How often to evaluate the test sentences
prime_texts = ['thou art more', 'to be or not to', 'wherefore art thou']
# Download/store Shakespeare data
data_dir = 'temp'
data_file = 'shakespeare.txt'
model_path = 'shakespeare_model'
full_model_dir = os.path.join(data_dir, model_path)
# Declare punctuation to remove: everything except hyphens and apostrophes
punctuation = string.punctuation
punctuation = ''.join([x for x in punctuation if x not in ['-', "'"]])
# Make Model Directory
if not os.path.exists(full_model_dir):
    os.makedirs(full_model_dir)  # os.makedirs creates directories recursively
# Make data directory
if not os.path.exists(data_dir):
    os.makedirs(data_dir)
'''
Download the data if we don't have it saved already. The data comes from Project Gutenberg.
'''
print('Loading Shakespeare Data')
# Check if file is downloaded.
if not os.path.isfile(os.path.join(data_dir, data_file)):
    print('Not found, downloading Shakespeare texts from www.gutenberg.org')
    shakespeare_url = 'http://www.gutenberg.org/cache/epub/100/pg100.txt'
# Get Shakesp