Following along with *Deep Learning with Python*, I worked through a small project from the book; below is a summary of what it covers.
-
Load the data (the IMDB dataset, which provides train_data and train_labels: train_data holds the English reviews, encoded as sequences of word indices, and train_labels holds the sentiment, 0 = negative, 1 = positive)
```python
from keras.datasets import imdb

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
```
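Note that each review in train_data is a list of word indices, not raw text. A minimal, self-contained sketch of how such an encoding round-trips (the tiny vocabulary here is hypothetical; the real mapping comes from `imdb.get_word_index()`):

```python
# Hypothetical mini-vocabulary standing in for imdb.get_word_index()
word_index = {'the': 1, 'movie': 2, 'was': 3, 'great': 4}
reverse_index = {i: w for w, i in word_index.items()}

review = [1, 2, 3, 4]  # an encoded "review" as stored in train_data
decoded = ' '.join(reverse_index[i] for i in review)
print(decoded)  # → the movie was great
```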
-
Preprocess the data (the integer lists must be converted to tensors, usually in one of two ways: padding the lists so they all have the same length, or one-hot encoding them into vectors of 0s and 1s)
```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set the indices of the words present to 1
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

# Vectorize the labels as well
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
```
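To make the multi-hot encoding concrete, here is the same function on toy data (the vocabulary size of 5 is arbitrary, chosen only so the output fits on screen):

```python
import numpy as np

def vectorize_sequences(sequences, dimension=5):
    # One row per sequence; a 1 at every index that occurs in the sequence
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

encoded = vectorize_sequences([[0, 2], [1, 2, 4]])
print(encoded)
# → [[1. 0. 1. 0. 0.]
#    [0. 1. 1. 0. 1.]]
```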
-
Build the network (two relu layers with 16 units each, followed by a single-unit sigmoid output layer)
```python
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
```
-
Compile the model (using binary crossentropy (binary_crossentropy) as the loss function)
```python
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```
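For reference, binary crossentropy for a single prediction is -(y·log(p) + (1-y)·log(1-p)), averaged over the batch. A small numpy sketch of the formula (not Keras's actual implementation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

# Confident, correct predictions give a small loss
loss = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
print(loss)  # ≈ 0.1054
```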
-
Validate the model (hold out 10,000 samples for validation and train on the rest; from the fourth epoch onward the validation loss and accuracy get worse and worse, which indicates the model has started to overfit)
```python
# Hold out a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]

# Train the model
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val))
```
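The overfitting point can be read off `history.history`, which maps each metric name to a per-epoch list. A minimal sketch of picking the best epoch (the loss values below are hypothetical, for illustration only):

```python
# Hypothetical stand-in for history.history['val_loss']
val_loss = [0.38, 0.30, 0.28, 0.27, 0.29, 0.31, 0.35]

# The best epoch is the one with the lowest validation loss (1-indexed)
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
print(best_epoch)  # → 4
```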
-
Retrain the network from scratch, this time setting the number of epochs to 4.
```python
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=4, batch_size=512)
result = model.evaluate(x_test, y_test)
print(result)
```