14.4 Prediction with an MLP Neural Network
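(1) Create the function Performance_tune, which handles data preparation, the train/test split, model fitting, and prediction on the test set. It first calls the Prepare and Indicator functions to prepare the data and build features, then uses mutual information to keep only the features whose mutual-information score with the target column exceeds a threshold. Next, it transforms the features and the target with PowerTransformer (the target transform is inverted later so that results can be reported in price units) and holds out the last 60 rows as the test set. An MLP model is then built, fit on the training set, and used to predict on the test set. Finally, the function prints the MAE and RMSE of the predictions and plots predicted against actual values. It returns the predictions, the actual values, and a table comparing the two.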
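The scaling and hold-out steps can be looked at in isolation. The sketch below is not the function's own code: it uses synthetic, illustrative data, and it fits PowerTransformer on the training rows only before applying it to the test rows, a common variant that keeps test-set statistics out of the transform. Performance_tune, as listed next, fits its scalers on all rows before splitting. The names y_train_s, y_test_s, and y_test_back are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Synthetic, positively skewed target values (illustrative only, not gold prices)
y = rng.lognormal(mean=0.0, sigma=0.5, size=200).reshape(-1, 1)

# Time-ordered split: the last 60 rows form the test set, as in Performance_tune
test_size = 60
y_train, y_test = y[:-test_size], y[-test_size:]

# Assumption of this sketch: fit on the training rows only, then reuse on the test rows
y_scaler = PowerTransformer()  # Yeo-Johnson transform plus standardization by default
y_train_s = y_scaler.fit_transform(y_train)
y_test_s = y_scaler.transform(y_test)

# inverse_transform maps scaled values back to the original units
y_test_back = y_scaler.inverse_transform(y_test_s)
print(y_train_s.shape, y_test_s.shape)
print(np.allclose(y_test_back, y_test, rtol=1e-4))
```

Errors computed on the scaled values and on the inverse-transformed values are in different units, which is why the function reports MAE and RMSE twice.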
# Imports used by this function (skip any that are already loaded earlier in the chapter)
from math import sqrt
from numpy.random import seed  # assumption: seed refers to numpy.random.seed
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Prepare, Indicator and mutual_information_lag are helper functions defined elsewhere
def Performance_tune(Data, col, Lag_list, Roll_window, mic):
    seed(111)
    # Prepare the data: lag/rolling features plus indicators, then drop the first 100 rows
    data = Prepare(Data, col, Lag_list, Roll_window)
    df = Indicator(Data)
    data = pd.concat([df, data], axis=1)
    data = data.tail(data.shape[0] - 100)
    # Compute the mutual information between the lagged columns and the target column
    MIC = mutual_information_lag(data, col, 100)
    val = list(MIC[1].values())
    # Count the columns whose mutual-information score exceeds "mic"
    selected = [i for i in val if i > mic]
    bestCol = len(selected)
    # Keep the first bestCol keys (this relies on MIC[1] being ordered by descending score)
    Best_col = list(MIC[1].keys())
    Best_col = Best_col[:bestCol]
    # Drop every other feature column; the target is expected to be the last column
    Remove_col = list(set(data.columns[:-1]) - set(Best_col))
    data.drop(Remove_col, axis=1, inplace=True)
    # Define the training data (note: both scalers are fit on all rows, test rows included)
    x_scaler = PowerTransformer()
    X = data.values[:, :-1]
    X = x_scaler.fit_transform(X)
    X_train, X_test = X[:-60, :], X[-60:, :]
    y_scaler = PowerTransformer()
    y = data.values[:, -1].reshape(-1, 1)
    y = y_scaler.fit_transform(y)
    y_train, y_test = y[:-60, :], y[-60:, :]
    n_steps = X.shape[1]  # number of input features
    print(n_steps)
    # Define the model: one hidden layer, dropout, and a single output unit
    model = Sequential()
    model.add(Dense(1425, activation='relu', input_dim=n_steps, kernel_initializer='he_uniform'))
    model.add(Dropout(0.058665802462848915))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=[tf.keras.metrics.RootMeanSquaredError()])
    # Fit the model
    model.fit(X_train, y_train, epochs=500, batch_size=32, verbose=0)
    model.summary()
    # Predict on the test set and report the errors on the scaled values
    predict = model.predict(X_test, verbose=0)
    mae = mean_absolute_error(y_test, predict)
    rmse = sqrt(mean_squared_error(y_test, predict))
    print('MAE before inverse Scaling: %f' % mae)
    print('RMSE before inverse Scaling: %f' % rmse)
    # Map actual and predicted values back to price units and round them
    y_test = [float('%.1f' % v) for v in y_scaler.inverse_transform(y_test).ravel()]
    predict = [float('%.3f' % v) for v in y_scaler.inverse_transform(predict).ravel()]
    # x-axis values for the plot: the index of the last 60 rows
    x = Data.index[-60:]
    # MAE and RMSE after inverse scaling
    mae = mean_absolute_error(y_test, predict)
    rmse = sqrt(mean_squared_error(y_test, predict))
    compare = pd.DataFrame(columns=['Actual', 'Predict'])
    compare['Actual'] = y_test
    compare['Predict'] = predict
    print('=======')
    print('MAE after inverse Scaling: %f' % mae)
    print('RMSE after inverse Scaling: %f' % rmse)
    print('======Summary======')
    print('Lag_list:', '===>', 'From:', min(Lag_list), 'To:', max(Lag_list))
    print('Roll_window:', '===>', 'From:', min(Roll_window), 'To:', max(Roll_window))
    print('==============================')
    # Plot predicted and actual prices
    plt.plot(x, predict, marker='o', linestyle='dashed', color='C6', label="predict")
    plt.plot(x, y_test, marker='o', label="Actual")
    plt.xticks(rotation=90)
    plt.legend()
    plt.show()
    return predict, y_test, compare
#%%
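In the code above, Performance_tune takes the data frame (Data), the target column name (col), the list of lags (Lag_list), the list of rolling windows (Roll_window), and the mutual-information score threshold (mic) as parameters.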
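The role of the mic threshold can be illustrated with a small standalone sketch. The function select_by_threshold and the scores below are hypothetical; the real scores come from mutual_information_lag, whose implementation is not shown here. Performance_tune keeps the first N keys of the score dictionary, which gives the intended result only if the dictionary is ordered by descending score. The sketch sorts explicitly, so it does not depend on that ordering.

```python
def select_by_threshold(scores, threshold):
    """Return the feature names whose score exceeds `threshold`, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score > threshold]

# Hypothetical, deliberately unsorted scores (illustrative only)
scores = {"lag_1": 4.2, "roll_mean_5": 3.4, "lag_7": 2.9, "lag_2": 3.9}

selected = select_by_threshold(scores, 3)
print(selected)  # ['lag_1', 'lag_2', 'roll_mean_5']

# Taking the first N keys of an unsorted dictionary can pick a column below the threshold
first_n = list(scores)[:len(selected)]
print(first_n)  # ['lag_1', 'roll_mean_5', 'lag_7']
```

With a threshold of 3, lag_7 (score 2.9) is excluded by the explicit filter but would be kept by the positional slice when the dictionary is not sorted.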
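(2) The code below calls Performance_tune with the gold price data frame Gold, the target column 'Close', lags from 1 to 29, rolling windows from 2 to 59, and a mutual-information score threshold of 3. The function prepares the data, trains the model, predicts on the test set, and evaluates performance. The returned object, Close, is a tuple holding the predictions, the actual values, and the table comparing them.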
Close = Performance_tune(Gold, 'Close', list(range(1, 30)), list(range(2, 60)), 3)
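Running it prints the information below: the number of input features, the model summary, and the MAE and RMSE before and after inverse scaling. It also plots predicted against actual values, as shown in Figure 7-8.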
188
2023-12-07 20:03:36.542539: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-07 20:03:36.655383: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-12-07 20:03:36.656169: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
#### part of the output omitted
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 1425)              269325
_________________________________________________________________
dropout (Dropout)            (None, 1425)              0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 1426
=================================================================
Total params: 270,751
Trainable params: 270,751
Non-trainable params: 0
_________________________________________________________________
MAE before inverse Scaling: 0.016588
RMSE before inverse Scaling: 0.018311
=======
MAE after inverse Scaling: 8.961983
RMSE after inverse Scaling: 9.891533
======Summary======
Lag_list: ===> From: 1 To: 29
Roll_window: ===> From: 2 To: 59
==============================
Figure 7-8 Comparison of predicted and actual values
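The parameter counts in the model summary follow from the layer sizes. A fully connected layer has one weight per input-unit pair plus one bias per unit, and the first line of the output (188) is the number of input features. The helper dense_params below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: weights plus one bias per unit."""
    return (n_inputs + 1) * n_units

hidden = dense_params(188, 1425)   # 188 input features -> 1425 hidden units
output = dense_params(1425, 1)     # 1425 hidden units -> 1 output unit
total = hidden + output            # the Dropout layer has no parameters

print(hidden, output, total)  # 269325 1426 270751
```

These values match the Param # column and the Total params line of the summary.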
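(3) In the code below, Close[2] is the data frame created inside Performance_tune that holds the actual and predicted prices. It is the third element (index 2) of the returned tuple, and its two columns, Actual and Predict, list the actual test-set price and the model's prediction for each of the 60 test rows, so the two can be compared directly. The data frame can also be used for further analysis and visualization of the model's accuracy.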
Close[2]
Running it prints:
    Actual   Predict
0   1704.5  1713.324
1   1702.4  1714.632
2   1709.2  1720.986
3   1710.0  1717.772
4   1699.5  1711.183
5   1712.7  1703.146
6   1727.1  1732.169
7   1719.0  1732.443
8   1717.7  1726.512
9   1719.1  1728.283
# part of the output omitted
50  1645.3  1663.442
51  1623.3  1630.576
52  1626.7  1639.227
53  1660.4  1645.359
54  1658.5  1655.205
55  1662.4  1672.056
56  1692.9  1687.496
57  1721.1  1719.208
58  1711.4  1720.881
59  1711.7  1724.592
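The error metrics reported earlier can be recomputed from a comparison table like this one. The sketch below uses only the first three rows shown above, so its results differ from the MAE and RMSE printed for the full 60-row test set. The helper names mae and rmse are assumptions of this sketch, not functions from the chapter.

```python
from math import sqrt

# First three rows of the comparison table (Actual, Predict)
actual = [1704.5, 1702.4, 1709.2]
predicted = [1713.324, 1714.632, 1720.986]

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
print('MAE: %f' % mae_value)
print('RMSE: %f' % rmse_value)
```

RMSE is never smaller than MAE on the same data, and the gap between them widens when a few predictions miss by much more than the rest.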