1. After downloading from the official site, you can run imdb_lstm.py directly, but downloading the dataset this way is very slow.
Workaround: on Ubuntu 16.04, first download the imdb.npz dataset yourself (Baidu Netdisk download link), then simply place imdb.npz under ~/.keras/datasets/.
Open a terminal in the directory where imdb.npz was downloaded, then run:
mkdir -p ~/.keras/datasets/
cp imdb.npz ~/.keras/datasets/
(No sudo is needed: the directory is inside your own home, and under sudo the ~ may expand to /root instead.)
Run the example again and the download step is skipped.
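A quick way to confirm that Keras will pick up the cached file is to check for it at the expected path. This is a minimal sketch; the cache location simply mirrors the copy step above:

```python
import os

# Keras looks for downloaded datasets under ~/.keras/datasets/ before
# trying to fetch them, so imdb.load_data() will use this copy if present.
cache_path = os.path.join(os.path.expanduser("~"), ".keras", "datasets", "imdb.npz")
print("found cached imdb.npz" if os.path.exists(cache_path) else "not cached yet")
```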
2. Program analysis
(Note: this script uses the Keras 1.x argument names nb_words, dropout_W/dropout_U, and nb_epoch; in Keras 2 these became num_words, dropout/recurrent_dropout on LSTM, and epochs, and Embedding no longer accepts a dropout argument.)
# -*- coding: utf-8 -*-
# Import the Python modules needed for the Keras model
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Activation, Embedding
from keras.layers import LSTM
from keras.datasets import imdb
max_features = 20000
maxlen = 80  # pad/truncate each review to 80 words (among the top max_features most common words)
batch_size = 32
# Load the data
print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
# Preprocess the data: pad all sequences to the same length
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
# Build the LSTM model
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128, dropout=0.2))  # word embedding layer
model.add(LSTM(128, dropout_W=0.2, dropout_U=0.2))  # LSTM layer with input and recurrent dropout
model.add(Dense(1))  # single output unit for binary classification
model.add(Activation('sigmoid'))
model.summary()  # print the model architecture
# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
print('Train...')
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=15,
          validation_data=(X_test, y_test))
score, acc = model.evaluate(X_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
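The pad_sequences call above is what makes every review exactly maxlen words long. A minimal pure-Python sketch of its default behavior (Keras both pads and truncates at the front by default, so it keeps the last maxlen words of a long review):

```python
def pad_sequences(seqs, maxlen, value=0):
    # Mimics keras.preprocessing.sequence.pad_sequences defaults:
    # truncating='pre' (drop tokens from the front) and
    # padding='pre' (insert the pad value at the front).
    out = []
    for s in seqs:
        s = list(s)[-maxlen:]                    # keep the last maxlen tokens
        out.append([value] * (maxlen - len(s)) + s)
    return out

print(pad_sequences([[1, 2, 3], [1, 2, 3, 4, 5]], maxlen=4))
# -> [[0, 1, 2, 3], [2, 3, 4, 5]]
```

Because the network reads the review left to right, pre-padding keeps the real words next to the final timestep, which is the state the LSTM's output is taken from.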