This program trains on an existing text, learning to predict the next character from the ones before it; in this way it learns to write new text one character at a time.
First, print char_indices:
{'\n': 0, ' ': 1, '!': 2, '"': 3, "'": 4, '(': 5, ')': 6, ',': 7, '-': 8, '.': 9, '0': 10, '1': 11, '2': 12, '3': 13, '4': 14, '5': 15, '6': 16, '7': 17, '8': 18, '9': 19, ':': 20, ';': 21, '=': 22, '?': 23, '[': 24, ']': 25, '_': 26, 'a': 27, 'b': 28, 'c': 29, 'd': 30, 'e': 31, 'f': 32, 'g': 33, 'h': 34, 'i': 35, 'j': 36, 'k': 37, 'l': 38, 'm': 39, 'n': 40, 'o': 41, 'p': 42, 'q': 43, 'r': 44, 's': 45, 't': 46, 'u': 47, 'v': 48, 'w': 49, 'x': 50, 'y': 51, 'z': 52, 'ä': 53, 'æ': 54, 'é': 55, 'ë': 56}
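A minimal sketch of how such a mapping is typically built: collect the sorted unique characters of the corpus and number them. The short sample string below is a stand-in for the real corpus (the post does not show the file-loading code):

```python
# Sample text standing in for the real corpus (an assumption for illustration).
text = "preface\n\n\nsupposing that truth is a woman--what then?"

# Sorted unique characters; sorting makes the index assignment deterministic.
chars = sorted(set(text))
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}  # inverse map, used later for decoding

print(char_indices)
```

Because `'\n'` sorts before `' '` and punctuation sorts before letters, the newline gets index 0 and space gets index 1, just as in the dictionary above.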
Next, construct the training data: the inputs are `sentences` and the targets are `next_chars`, structured as shown below. Each entry in `sentences` is a fixed-length window sliced out of the text, and `next_chars`, as the name suggests, holds the single character that immediately follows each window:
sentences next_chars
preface\n\n\nsupposing that truth is a woma n
face\n\n\nsupposing that truth is a woman-- w
e\n\n\nsupposing that truth is a woman--wha t
\nsupposing that truth is a woman--what t h
pposing that truth is a woman--what then ?
sing that truth is a woman--what then? i s
g that truth is a woman--what then? is t h
hat truth is a woman--what then? is ther e
truth is a woman--what then? is there n o
uth is a woman--what then? is there not g
is a woman--what then? is there not gro u
a woman--what then? is there not ground \n
woman--what then? is there not ground\nfo r
an--what then? is there not ground\nfor s u
-what then? is there not ground\nfor susp e
at then? is there not ground\nfor suspect i
then? is there not ground\nfor suspecting
n? is there not ground\nfor suspecting th a
is there not ground\nfor suspecting that a
there not ground\nfor suspecting that all
re not ground\nfor suspecting that all ph i
not ground\nfor suspecting that all philo s
ground\nfor suspecting that all philosop h
ound\nfor suspecting that all philosopher s
d\nfor suspecting that all philosophers, i
or suspecting that all philosophers, in s
suspecting that all philosophers, in so f
pecting that all philosophers, in so far
ting that all philosophers, in so far as
g that all philosophers, in so far as th e
hat all philosophers, in so far as they h
all philosophers, in so far as they hav e
l philosophers, in so far as they have b e
hilosophers, in so far as they have been \n
osophers, in so far as they have been\ndo g
phers, in so far as they have been\ndogma t
rs, in so far as they have been\ndogmatis t
in so far as they have been\ndogmatists,
so far as they have been\ndogmatists, ha v
far as they have been\ndogmatists, have f
r as they have been\ndogmatists, have fai l
s they have been\ndogmatists, have failed
hey have been\ndogmatists, have failed to
have been\ndogmatists, have failed to un d
ve been\ndogmatists, have failed to under s
been\ndogmatists, have failed to understa n
n\ndogmatists, have failed to understand w
ogmatists, have failed to understand wom e
atists, have failed to understand women- -
sts, have failed to understand women--th a
One caveat: the sentences above look like they have different lengths, but they are all exactly the same length. The first few rows contain newline characters, and a newline prints as the two-character escape `\n` even though it is a single character, which is why the columns appear misaligned.
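The windowing above can be sketched as follows, using the usual maxlen=40, step=3 pattern from this example. The excerpt below stands in for the full corpus:

```python
maxlen = 40  # length of each input window ("sentence")
step = 3     # stride: slide the window 3 characters at a time

# Short excerpt standing in for the full corpus (an assumption for illustration).
text = ("preface\n\n\nsupposing that truth is a woman--what then? "
        "is there not ground\nfor suspecting that all philosophers")

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])   # 40-character input window
    next_chars.append(text[i + maxlen])    # the character right after the window
```

Running this reproduces the first rows of the table: the first window is `"preface\n\n\nsupposing that truth is a woma"` with target `'n'`, and the second (shifted by 3) ends in `"woman--"` with target `'w'`.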
Then everything is one-hot encoded, which is standard NLP preprocessing. The resulting input and output shapes are:
x.shape (200285, 40, 57)
y.shape (200285, 57)
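A sketch of the one-hot step, shrunk to a toy alphabet so the shapes are easy to check by eye. The real run follows the same pattern and produces x of shape (200285, 40, 57) and y of shape (200285, 57):

```python
import numpy as np

# Toy setup (assumptions for illustration): a 4-character alphabet and
# two windows of length 4 instead of the real 57 characters and maxlen 40.
chars = sorted(set("abc "))                       # [' ', 'a', 'b', 'c']
char_indices = {c: i for i, c in enumerate(chars)}
maxlen = 4
sentences = ["ab c", "b ca"]
next_chars = ["a", "b"]

# x: one row per window, one time step per character, one column per symbol.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print(x.shape, y.shape)  # (2, 4, 4) (2, 4)
```

Exactly one position is set per time step in x and per row in y, which is what "one-hot" means here.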
The neural network model is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 128) 95232
_________________________________________________________________
dense_1 (Dense) (None, 57) 7353
=================================================================
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
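The parameter counts in the summary can be verified by hand with plain arithmetic (no framework needed). An LSTM layer has 4 gates, and each gate has a kernel over the input, a recurrent kernel over the hidden state, and a bias:

```python
# Dimensions taken from the summary above: 57 input symbols, 128 LSTM units,
# and a 57-way softmax output.
input_dim, units, n_chars = 57, 128, 57

# LSTM: 4 gates x (input kernel + recurrent kernel + bias)
lstm_params = 4 * (units * (input_dim + units) + units)
# Dense: weight matrix + bias
dense_params = units * n_chars + n_chars
total = lstm_params + dense_params

print(lstm_params, dense_params, total)  # 95232 7353 102585
```

These match the summary exactly: 95,232 for the LSTM, 7,353 for the Dense layer, 102,585 in total.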