RNN with Keras

1. A simple RNN program: text generation

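RNNs are widely used in NLP, and one of their applications is building language models. A language model lets us predict the probability of the next word given the text that precedes it.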
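As a rough illustration of what "the probability of the next item given the preceding text" means, the sketch below estimates it from raw counts at the character level. This is a simple count-based stand-in, not the RNN used in this post; the function name `next_char_probs` and the sample sentence are assumptions made for the example.

```python
from collections import Counter


def next_char_probs(text, context):
    """Estimate P(next char | context) by counting what follows context in text."""
    counts = Counter()
    n = len(context)
    for i in range(len(text) - n):
        if text[i:i + n] == context:
            counts[text[i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: c / total for ch, c in counts.items()}


sample = "the cat sat on the mat"  # toy corpus, assumed for illustration
print(next_char_probs(sample, "th"))  # 'th' is always followed by 'e' here
print(next_char_probs(sample, "t"))   # 't' is followed by 'h' or a space
```

An RNN language model plays the same role, but it learns the distribution from data rather than looking up exact counts.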

Here we build a character-level language model: given the previous ten characters, it predicts the next character.
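To make the setup concrete, here is a minimal sketch of how a text can be cut into ten-character inputs, each paired with the single character that follows it. The helper name `make_windows` and the sample string are assumptions for illustration only; the full script uses the same slicing idea on the whole text.

```python
def make_windows(text, seqlen=10, step=1):
    """Return (inputs, labels): each input is seqlen chars, each label is the next char."""
    inputs, labels = [], []
    for i in range(0, len(text) - seqlen, step):
        inputs.append(text[i:i + seqlen])
        labels.append(text[i + seqlen])
    return inputs, labels


inputs, labels = make_windows("alice was beginning")
# show the first three (input, label) pairs
for window, label in list(zip(inputs, labels))[:3]:
    print(repr(window), "->", repr(label))
```

Each step moves the window one character to the right, so a text of length n yields n - 10 training pairs.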

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Jun  4 17:18:03 2018

@author: john
"""

from keras.layers import Dense,SimpleRNN
from keras.models import Sequential
import numpy as np

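####load data
# read the file as bytes, lowercase each line, and drop non-ASCII characters
lines=[]
with open('./11-0.txt','rb') as fin:
    for line in fin:
        line=line.strip().lower()
        line=line.decode('ascii','ignore')
        if len(line)==0:
            continue
        lines.append(line)
# join all non-empty lines into one long string
text=' '.join(lines)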

####create the lookup tables
# sort the characters so the index assignment is the same on every run
chars=sorted(set(text))
nb_chars=len(chars)
char2index=dict((c,i) for i,c in enumerate(chars))
index2char=dict((i,c) for i,c in enumerate(chars))

####create the input and label texts
SEQLEN=10
STEP=1

input_chars=[]
label_chars=[]

for i in range(0,len(text)-SEQLEN,STEP):
    input_chars.append(text[i:i+SEQLEN])
    label_chars.append(text[i+SEQLEN])

####vectorize the input data and label texts
# one-hot encode: X is (samples, SEQLEN, nb_chars), y is (samples, nb_chars)
# np.bool was removed from recent NumPy releases, so use the built-in bool
X=np.zeros((len(input_chars),SEQLEN,nb_chars),dtype=bool)
y=np.zeros((len(input_chars),nb_chars),dtype=bool)
for i,input_char in enumerate(input_chars):
    for j,ch in enumerate(input_char):
        X[i,j,char2index[ch]]=1
    y[i,char2index[label_chars[i]]]=1


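####define the model
HIDDEN_SIZE=128                 # number of units in the recurrent layer
BATCH_SIZE=128
NUM_ITERATIONS=25               # train/generate rounds
NUM_EPOCHES_PER_ITERATION=1     # epochs of training per round
NUM_PERDS_PER_EPOCH=100         # characters generated per round
model=Sequential()
# return_sequences=False: only the last time step's output is passed on
# unroll=True: unroll the recurrence over the SEQLEN steps
model.add(SimpleRNN(HIDDEN_SIZE,return_sequences=False,input_shape=(SEQLEN,nb_chars),
                    unroll=True))
# softmax over the vocabulary gives a probability for each possible next character
model.add(Dense(nb_chars,activation='softmax'))
model.compile(loss='categorical_crossentropy',optimizer='rmsprop')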

####train, then generate text after each iteration
for iteration in range(NUM_ITERATIONS):
    print('='*50)
    print('iteration #:%d'%(iteration))
    model.fit(X,y,batch_size=BATCH_SIZE,epochs=NUM_EPOCHES_PER_ITERATION)

    # pick a random ten-character seed from the training inputs
    test_idx=np.random.randint(len(input_chars))
    test_chars=input_chars[test_idx]
    print('Generating from seed:%s'%(test_chars))
    print(test_chars,end='')
    for i in range(NUM_PERDS_PER_EPOCH):
        # one-hot encode the current window
        Xtest=np.zeros((1,SEQLEN,nb_chars))
        for j,ch in enumerate(test_chars):
            Xtest[0,j,char2index[ch]]=1
        pred=model.predict(Xtest,verbose=0)[0]
        # greedy choice: take the most probable next character
        ypred=index2char[np.argmax(pred)]
        print(ypred,end='')
        # slide the window: drop the oldest character, append the prediction
        test_chars=test_chars[1:]+ypred
    print()

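The test simply gives the model ten characters as a seed and lets it keep generating in a loop, one character at a time, so we can see what it produces.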
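The generation loop can be sketched without a trained network. In the sketch below, `fake_predict` is a hypothetical stand-in for `model.predict`: it puts all probability on the letter that follows the window's last letter in a toy alphabet. The alphabet, the function names, and the seed are all assumptions for illustration; only the sliding-window logic mirrors the script.

```python
ALPHABET = "abcde"  # toy vocabulary, assumed for illustration
char2index = {c: i for i, c in enumerate(ALPHABET)}
index2char = {i: c for i, c in enumerate(ALPHABET)}


def fake_predict(window):
    """Hypothetical stand-in for model.predict: one probability per character."""
    probs = [0.0] * len(ALPHABET)
    nxt = (char2index[window[-1]] + 1) % len(ALPHABET)
    probs[nxt] = 1.0
    return probs


def generate(seed, num_preds):
    """Greedily extend seed, feeding each prediction back in as input."""
    window, generated = seed, []
    for _ in range(num_preds):
        probs = fake_predict(window)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        next_char = index2char[best]
        generated.append(next_char)
        # slide the window: drop the oldest character, append the new one
        window = window[1:] + next_char
    return "".join(generated)


print(generate("abc", 4))  # prints "deab"
```

Because the window always keeps the same length, the model only ever sees the most recent characters, including the ones it generated itself.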