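Example 3:
Predicting house prices from the Boston housing data
This is a regression problem because the output, the house price, is a continuous value.
Load the data: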
from keras.datasets import boston_housing
(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()
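Preprocessing:
Each sample has 13 features, and the features are on different scales, so they need feature scaling (standardization): subtract each feature's mean and divide by its standard deviation, so that every feature in the training set has mean 0 and standard deviation 1. The test data is scaled with the mean and standard deviation computed on the training data.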
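# Preprocessing: feature scaling
mean = train_data.mean(axis=0)  # per-feature mean
train_data -= mean
std = train_data.std(axis=0)  # per-feature standard deviation
train_data /= std
# Scale the test data with the training mean and std
test_data -= mean
test_data /= std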
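To make the effect of standardization concrete, here is a minimal sketch on a made-up toy array (the numbers are illustrative and are not taken from the Boston data). It assumes only NumPy and checks that the scaled training features have mean 0 and standard deviation 1, while the test set, scaled with the training statistics, generally does not.

```python
import numpy as np

# Toy data (made up for illustration): 4 training samples,
# 2 features on very different scales.
train = np.array([[1.0, 100.0],
                  [2.0, 200.0],
                  [3.0, 300.0],
                  [4.0, 400.0]])
test = np.array([[2.5, 250.0],
                 [5.0, 500.0]])

# The statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(test_scaled)                # not necessarily mean 0 / std 1
```

Computing `std` before or after subtracting the mean gives the same value, because shifting the data does not change its spread.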
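Build the network and set the optimizer, loss function, and evaluation metric:
The network is built inside a function so that it is easy to call in the next step: training uses K-fold validation, which needs a fresh model several times.
The loss function is mean squared error (MSE), a common choice for regression problems, and the monitored metric is mean absolute error (MAE).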
from keras import models
from keras import layers

def build_model():
    # Because we will need to instantiate
    # the same model multiple times,
    # we use a function to construct it.
    model = models.Sequential()
    model.add(layers.Dense(64, activation='relu', input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(1))
    # Scalar regression: MSE as the loss, MAE as the metric
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model
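For readers who want to see what the two quantities measure, the following sketch computes MSE and MAE by hand with NumPy. The targets and predictions are made up for illustration; this is not how Keras computes them internally, only the same arithmetic on a tiny example.

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 40.0])

errors = y_pred - y_true        # [2, -2, 3, 0]
mse = np.mean(errors ** 2)      # (4 + 4 + 9 + 0) / 4
mae = np.mean(np.abs(errors))   # (2 + 2 + 3 + 0) / 4

print(mse)  # 4.25
print(mae)  # 1.75
```

MSE penalizes large errors more heavily because of the square, while MAE is in the same units as the target, which makes it easier to read as a monitoring metric.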
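Train and validate the model:
The dataset is small, so the validation score can vary a lot depending on which samples end up in the training set and which in the validation set. K-fold validation handles this neatly: split the data into K partitions, train K models, each on K-1 partitions while validating on the remaining one, and average the K validation scores.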
from keras import backend as K
# Some memory clean-up
K.clear_session()
import numpy as np
k = 4
num_val_samples = len(train_data) // k
num_epochs = 100
# all_scores = []
all_mae_histories = []
for i in range(k):
    print('processing fold #', i)
    # Prepare the validation data: data from partition # i
    val_data = train_data[i * num_val_samples: (i + 1) * num_val_samples]
    val_targets = train_targets[i * num_val_samples: (i + 1) * num_val_samples]
    # Prepare the training data: data from all other partitions
    partial_train_data = np.concatenate(
        [train_data[:i * num_val_samples],
         train_data[(i + 1) * num_val_samples:]],
        axis=0)
    partial_train_targets = np.concatenate(
        [train_targets[:i * num_val_samples],
         train_targets[(i + 1) * num_val_samples:]],
        axis=0)
    # Build the Keras model (already compiled)
    model = build_model()
    # Train the model (in silent mode, verbose=0)
    history = model.fit(partial_train_data, partial_train_targets,
                        validation_data=(val_data, val_targets),
                        epochs=num_epochs, batch_size=1, verbose=0)
    # Note: in Keras 2.3 and later this key is named 'val_mae'
    mae_history = history.history['val_mean_absolute_error']
    all_mae_histories.append(mae_history)
    # Evaluate the model on the validation data
    # val_mse, val_mae = model.evaluate(val_data, val_targets, verbose=0)
    # all_scores.append(val_mae)
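The slicing logic of the loop above can be tried out without Keras. The sketch below is an illustration under stated assumptions: `k_fold_slices` is a hypothetical helper (not part of Keras or NumPy), the targets are made up, and the "model" is a stand-in that always predicts the mean of the training targets.

```python
import numpy as np

def k_fold_slices(num_samples, k):
    """Yield (fold_index, val_start, val_stop) for k equal-sized folds.

    Hypothetical helper for illustration; not a library function.
    """
    fold_size = num_samples // k
    for i in range(k):
        yield i, i * fold_size, (i + 1) * fold_size

# Made-up targets, for illustration only.
targets = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

fold_maes = []
for i, start, stop in k_fold_slices(len(targets), k=4):
    val_targets = targets[start:stop]
    partial_train_targets = np.concatenate([targets[:start], targets[stop:]])
    # Stand-in "model": always predict the mean of the training targets.
    prediction = partial_train_targets.mean()
    fold_maes.append(np.mean(np.abs(val_targets - prediction)))

print(fold_maes)           # one validation MAE per fold
print(np.mean(fold_maes))  # the K-fold score is the average over folds
```

Note how much the per-fold scores differ on such a small dataset, which is the reason for averaging them. With this simple slicing, any samples left over when the dataset size is not divisible by k are always used for training and never for validation.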
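Plot the validation MAE per epoch, averaged over the folds:
import matplotlib.pyplot as plt
# Average the per-epoch validation MAE across the k folds
average_mae_history = [
    np.mean([x[i] for x in all_mae_histories]) for i in range(num_epochs)]
plt.plot(range(1, len(average_mae_history) + 1), average_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()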
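The validation MAE for roughly the first 10 epochs is comparatively large and has not settled yet, so those points can be dropped;
the curve is also quite jagged, so it can be smoothed by replacing each point with an exponential moving average of the previous smoothed point and the current point;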
def smooth_curve(points, factor=0.9):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

smooth_mae_history = smooth_curve(average_mae_history[10:])
plt.plot(range(1, len(smooth_mae_history) + 1), smooth_mae_history)
plt.xlabel('Epochs')
plt.ylabel('Validation MAE')
plt.show()
The lowest point is now clearly visible.
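The lowest point can also be read off programmatically instead of by eye. The sketch below repeats `smooth_curve` so that it is self-contained, checks it on a hand-computable input, and then locates the minimum of a made-up validation curve. The curve `fake_history` and the variable `skipped` are assumptions for illustration, not results from the Boston data.

```python
import numpy as np

def smooth_curve(points, factor=0.9):
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)
    return smoothed_points

# Hand-checkable smoothing with factor=0.5:
# 1.0 stays 1.0, then 0.5*1.0 + 0.5*2.0 = 1.5, then 0.5*1.5 + 0.5*3.0 = 2.25
print(smooth_curve([1.0, 2.0, 3.0], factor=0.5))  # [1.0, 1.5, 2.25]

# Made-up validation MAE curve: it falls for 30 epochs, then rises again.
skipped = 10  # number of initial epochs dropped before smoothing
fake_history = ([5.0 - 0.1 * e for e in range(30)]
                + [2.1 + 0.05 * e for e in range(30)])
smoothed = smooth_curve(fake_history[skipped:])

# Index 0 of `smoothed` corresponds to epoch skipped + 1,
# so shift the argmin back to a 1-based epoch number.
best_epoch = int(np.argmin(smoothed)) + skipped + 1
print(best_epoch)
```

Keep in mind that the x-axis of the smoothed plot starts after the dropped epochs, so a position read from that plot has to be shifted by the same offset to get the actual epoch number.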