Predicting Data with a scikit-learn Random Forest

SpecialRiot

Original article: https://blog.csdn.net/SpecialRiot/article/details/124701869

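Summary: This article shows how to use RandomForestRegressor for regression and notes that its parameters can be tuned with GridSearchCV. The key steps are loading the dataset, selecting features (keeping the last four columns, which have the highest importances), training and testing the model, and assessing feature importance with feature_importances_.

(Summary generated automatically by CSDN.)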

Dataset: https://download.csdn.net/download/SpecialRiot/85339262

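Place the dataset according to the directory structure shown in the figure above (the code expects ./ETTm1.csv in the working directory), then run the code below.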

from sklearn.ensemble import RandomForestRegressor
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the dataset from the working directory
df = pd.read_csv("./ETTm1.csv")
# Target: the 8th column (index 7)
Y = df.iloc[:, 7]
# Keep only the four features with the highest importance
X_features = df[["MUFL", "MULL", "LUFL", "LULL"]]
# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X_features, Y, test_size=0.2, random_state=1)
# criterion='mse' was renamed 'squared_error' in scikit-learn 1.0 and removed in 1.2
forest = RandomForestRegressor(criterion='squared_error', max_depth=15,
                               n_estimators=1000, n_jobs=-1)
forest.fit(X_train, y_train)
y_test_pred = forest.predict(X_test)
print('R^2 test: %.3f' % r2_score(y_test, y_test_pred))
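The hyperparameters in the code above (max_depth=15, n_estimators=1000) are fixed by hand; GridSearchCV can search over them instead. The sketch below is a minimal illustration under stated assumptions: it uses random synthetic data in place of ETTm1.csv so that it runs standalone, and the grid values are hypothetical and kept small for speed. With the real data you would pass X_train and y_train built from X_features and Y.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for ETTm1.csv (assumption: random data with the same
# four feature names; replace with the real DataFrame in practice)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(300, 4)),
                 columns=["MUFL", "MULL", "LUFL", "LULL"])
y = 2 * X["LUFL"] + X["MULL"] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Hypothetical search grid: small values so the example runs quickly
param_grid = {
    "max_depth": [5, 15],
    "n_estimators": [50, 100],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid,
    scoring="r2",  # same metric the article reports
    cv=3,          # 3-fold cross-validation on the training set
)
search.fit(X_train, y_train)  # refits the best combination on all of X_train
print("best params:", search.best_params_)
print("R^2 test: %.3f" % search.score(X_test, y_test))
```

Every combination in param_grid is trained cv times, so a grid that includes n_estimators=1000 on the full dataset can take a long time; starting with a coarse grid and refining it is usually cheaper.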

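The parameters of RandomForestRegressor can be tuned with GridSearchCV. The original dataset has eight columns, and the feature_importances_ attribute of a fitted model gives the importance of each input column. For the six features ["HUFL","HULL","MUFL","MULL","LUFL","LULL"] the importances are: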

[0.08494287 0.0578021 0.18068544 0.20789311 0.34723973 0.12143676]
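Importances like these are read from the fitted model's feature_importances_ attribute. A minimal sketch of how such a list can be produced, on synthetic data standing in for the six load columns (the target is an assumption, built mainly from LUFL and MULL so that the ranking is easy to check):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the six load columns (assumption)
cols = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 6)), columns=cols)
y = 3 * X["LUFL"] + 1.5 * X["MULL"] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=200, max_depth=15, random_state=1)
forest.fit(X, y)

# feature_importances_ is an attribute, not a method: one value per
# column of X, in column order, normalized to sum to 1
importances = pd.Series(forest.feature_importances_, index=cols)
print(importances.sort_values(ascending=False))
```

Wrapping the array in a pandas Series keeps each value paired with its column name, which avoids having to match positions by hand as in the bare array above.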

The last four columns (MUFL, MULL, LUFL, LULL) have the highest importances, so they are selected as the features.
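Choosing the top four columns by hand works for six features; scikit-learn's SelectFromModel can make the same kind of selection automatically. This is a swapped-in technique, not the article's method, shown as a sketch on synthetic data where the target is (by assumption) driven only by MUFL, MULL, LUFL and LULL:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

cols = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 6)), columns=cols)
# Synthetic target driven by four of the six columns (assumption)
y = (3 * X["LUFL"] + 2.5 * X["MULL"] + 2 * X["MUFL"] + 2 * X["LULL"]
     + rng.normal(scale=0.1, size=500))

# threshold=-np.inf disables the importance cutoff, so exactly
# max_features columns (those with the highest importances) are kept
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=200, random_state=1),
    threshold=-np.inf,
    max_features=4,
)
selector.fit(X, y)
selected = list(X.columns[selector.get_support()])
print(selected)
X_selected = selector.transform(X)  # NumPy array with the 4 kept columns
```

The selected column names can then be used to build X_features before the train/test split.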