1---Introduction
In the previous chapter we talked about underfitting and overfitting, a trade-off that every modeling technique still has to manage today.🤷🏼
But today I'm going to introduce a clever model: the random forest.
A random forest builds many decision trees and predicts by averaging the predictions of the individual trees. Random forests are generally more accurate than a single decision tree, and they work well even with the default parameters.👍👍
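To back up that claim, here is a minimal sketch comparing a single decision tree against a random forest on a synthetic regression dataset (the data and all parameter choices here are illustrative, not from the Melbourne example below):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, noisy regression data stands in for a real dataset
X, y = make_regression(n_samples=500, n_features=8, noise=25.0, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# A single, fully grown tree tends to overfit the noise...
tree = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)
tree_mae = mean_absolute_error(val_y, tree.predict(val_X))

# ...while a forest of trees averages that noise away
forest = RandomForestRegressor(random_state=1).fit(train_X, train_y)
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))

print(f"single tree MAE: {tree_mae:.1f}, forest MAE: {forest_mae:.1f}")
```

On data like this, the forest's validation MAE comes out noticeably lower than the single tree's, with no tuning at all.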
2---The following variables👇🏻
- train_X
- val_X
- train_y
- val_y
3---Coding it
We build a random forest model similarly to how we built a decision tree in scikit-learn, this time using the RandomForestRegressor class instead of DecisionTreeRegressor.
import pandas as pd
# Load data
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
# Filter rows with missing values
melbourne_data = melbourne_data.dropna(axis=0)
# Choose target and features
y = melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea',
'YearBuilt', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
from sklearn.model_selection import train_test_split
# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
forest_model = RandomForestRegressor(random_state=1)
forest_model.fit(train_X, train_y)
melb_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, melb_preds))
191669.7536453626
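As a side note, you can see the averaging at work directly: a fitted RandomForestRegressor exposes its trees via the `estimators_` attribute, and the forest's prediction equals the mean of the individual trees' predictions. A small self-contained check on synthetic data (standing in for the Melbourne data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data so this snippet runs without melb_data.csv
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)

# Average the predictions of every individual tree by hand
tree_preds = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_avg = tree_preds.mean(axis=0)

# This matches forest.predict(X) up to floating-point precision
```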
4---Congrats👏👏👏
This is a big improvement over the roughly $250,000 error we found in "Measure your model validation". What's more, random forests allow parameter tuning, but even without it they perform reasonably well.
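If you do want to tune, a simple place to start is `n_estimators`, the number of trees in the forest. Here is a hedged sketch on synthetic data (the dataset and the candidate values are illustrative; with the real Melbourne data you would reuse train_X, val_X, train_y, val_y from above):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the snippet is self-contained
X, y = make_regression(n_samples=300, n_features=5, noise=20.0, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Fit one forest per candidate size and record the validation MAE
maes = {}
for n in [10, 50, 100]:
    model = RandomForestRegressor(n_estimators=n, random_state=1)
    model.fit(train_X, train_y)
    maes[n] = mean_absolute_error(val_y, model.predict(val_X))
    print(f"n_estimators={n}: validation MAE={maes[n]:.1f}")
```

You would then keep the value with the lowest validation MAE; the same loop works for other parameters such as `max_leaf_nodes` or `max_depth`.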