After placing the dataset according to the directory structure shown in the figure above, run the following code.
from sklearn.ensemble import RandomForestRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
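# Load ETTm1; the first seven columns are the timestamp and six load features, the eighth is the target.
df = pd.read_csv("./ETTm1.csv")
X = df.iloc[:, 0:7]
Y = df.iloc[:, 7]
# Keep only the four load features with the highest importance.
X_features = X[["MUFL", "MULL", "LUFL", "LULL"]]
X_train, X_test, y_train, y_test = train_test_split(
    X_features, Y, test_size=0.2, random_state=1)
# criterion='mse' was renamed to 'squared_error' in scikit-learn 1.0 and removed in 1.2.
forest = RandomForestRegressor(criterion='squared_error', max_depth=15, n_estimators=1000, n_jobs=-1)
forest.fit(X_train, y_train)
# Evaluate on the held-out 20% of the rows.
y_test_pred = forest.predict(X_test)
print('R^2 test: %.3f' % (r2_score(y_test, y_test_pred)))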
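The parameters of RandomForestRegressor can be tuned with GridSearchCV. The original dataset has eight columns, and the feature_importances_ attribute of the fitted model gives the importance of each feature column. The features and their importances are:
["HUFL","HULL","MUFL","MULL","LUFL","LULL"]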
[0.08494287 0.0578021 0.18068544 0.20789311 0.34723973 0.12143676]
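Importances like the ones above could be obtained roughly as follows. This is only a minimal sketch: it uses randomly generated data in place of ETTm1.csv, and the target formula, sample size, and forest settings are assumptions chosen for illustration, so the printed numbers will not match the values listed above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for ETTm1.csv (assumption): same six feature names, random values.
rng = np.random.default_rng(0)
cols = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
X_all = pd.DataFrame(rng.normal(size=(500, len(cols))), columns=cols)
# Hypothetical target that depends mostly on LUFL and, to a lesser degree, MULL.
y = 3.0 * X_all["LUFL"] + 1.0 * X_all["MULL"] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=100, max_depth=15, random_state=1)
forest.fit(X_all, y)

# feature_importances_ is an attribute of the fitted model, one value per input column.
importances = pd.Series(forest.feature_importances_, index=cols)
importances = importances.sort_values(ascending=False)
print(importances)

# Keep the k most important columns (k = 4 mirrors the choice made in this post).
top_features = importances.head(4).index.tolist()
print(top_features)
```

Pairing the array with the column names in a pandas Series makes it harder to misread which value belongs to which feature.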
The last four columns, which have the higher importances, are selected as features.
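The GridSearchCV tuning mentioned above might look like the sketch below. The parameter grid, the 3-fold cross-validation, and the synthetic data are all assumptions made so the example runs on its own; with the real dataset, X and y would come from ETTm1.csv as in the code earlier.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the four selected ETTm1 features (assumption).
rng = np.random.default_rng(0)
cols = ["MUFL", "MULL", "LUFL", "LULL"]
X = pd.DataFrame(rng.normal(size=(300, len(cols))), columns=cols)
# Hypothetical target, used only so that the search has something to fit.
y = 2.0 * X["LUFL"] - X["MULL"] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, 15]}
search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid,
    scoring="r2",  # same metric as r2_score in the code above
    cv=3,
)
search.fit(X_train, y_train)

# best_params_ holds the best combination; score() uses the refitted best model.
print("best params:", search.best_params_)
print("R^2 test: %.3f" % search.score(X_test, y_test))
```

Each grid point is fitted once per fold, so a grid combined with n_estimators=1000 can take a long time; starting with a coarse grid keeps the search manageable.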