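An automatic Tang poetry writing bot based on an RNN
Project directory:
my_rnn_model stores the trained model;
the clean_txt.py script cleans the Tang poetry collection (poetry.txt) and writes the result to new_poetry.txt.
The code is as follows: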
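poem_dir = "./poetry.txt"
# Read the raw poetry file
with open(poem_dir, "r", encoding="utf-8") as f:
    lines = f.readlines()
# List that collects the poem bodies
poem_collections = []
with open("new_poetry.txt", "w", encoding="utf-8") as f1:
    for line in lines:
        # Skip lines that are too short or too long, or that have no
        # title separator (the split below would otherwise raise ValueError)
        if len(line) < 10 or len(line) > 100 or ":" not in line:
            continue
        # maxsplit=1 splits the line into exactly two parts: title and body
        name, content = line.strip().split(":", maxsplit=1)
        poem_collections.append(content)
        for word in name:
            # Strip irregular characters from the title
            if word not in "_()\\/:《[] ":
                f1.write(word)
        f1.write(":")
        for word in content:
            # Strip the same irregular characters from the body
            if word not in "_()\\/:《[] ":
                f1.write(word)
        f1.write("\n")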
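To see what the cleaning step does to a single line, here is a minimal sketch that applies the same rules as a pure function. The helper name clean_line, the NOISE_CHARS constant, and the sample line are all made up for illustration and are not part of the project; skipping lines without a ":" separator is likewise an assumption.

```python
# Hypothetical constant: the same character set the script strips.
NOISE_CHARS = set("_()\\/:《[] ")


def clean_line(line, min_len=10, max_len=100):
    """Return the cleaned 'title:body' string, or None if the line is skipped.

    min_len and max_len mirror the 10/100 length bounds used in the script.
    """
    if len(line) < min_len or len(line) > max_len:
        return None
    if ":" not in line:
        # Assumption: a line without a title separator is skipped.
        return None
    # maxsplit=1 keeps any later ":" inside the body part.
    name, content = line.strip().split(":", maxsplit=1)
    clean_name = "".join(ch for ch in name if ch not in NOISE_CHARS)
    clean_content = "".join(ch for ch in content if ch not in NOISE_CHARS)
    return clean_name + ":" + clean_content


# Made-up sample line with a space, an underscore, and brackets in the title.
sample = "春 晓_[一]:春眠不觉晓,处处闻啼鸟。\n"
print(clean_line(sample))
```

Keeping the rule in a function like this makes it easy to try on a handful of lines before rewriting the whole file. Note that the character set contains 《 but not 》, so a closing title mark would survive the cleaning.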
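Next is the model training code, my_poem_train.py:
1. When training a model on text, you usually first build a character dictionary that holds every character together with its index, so that word vectors can be built from those indices later. The code is as follows: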
# -------------------------------Data preprocessing---------------------------#
# Path to the cleaned poetry file
poem_dir = "./new_poetry.txt"
# Read the text file
with open(poem_dir, "r", encoding="utf-8") as f:
    lines =