Learning Python from Scratch: Kaggle Machine Learning 002 Code Reference

岁月无声-往事随风

Article link: https://blog.csdn.net/qy8189/article/details/139092233

1. Basic Data Exploration

Step 1: Loading Data

import pandas as pd
# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
# Fill in the line below to read the file into a variable home_data
home_data = pd.read_csv(iowa_file_path)
# Call line below with no argument to check that you've loaded the data correctly
step_1.check()

Step 2: Review The Data

# Print summary statistics in next line
home_data.describe()

# What is the average lot size (rounded to nearest integer)?
avg_lot_size = round(home_data.LotArea.mean())
# As of today, how old is the newest home (current year - the year in which it was built)?
# Use YearBuilt (construction year), not YrSold (sale year); 2024 is hard-coded as the current year.
newest_home_age = 2024 - home_data.YearBuilt.max()
# Checks your answers
step_2.check()
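
Outside a Kaggle notebook, neither the `../input/...` path nor the `step_N` checker objects are available. The following is a minimal sketch of the same two steps on a tiny made-up table. The three rows are illustrative values, not taken from the Iowa data.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: three made-up rows.
csv_text = """Id,LotArea,YearBuilt
1,8000,1995
2,9500,1978
3,12000,2006
"""

# pd.read_csv accepts a file path or any file-like object.
home_data = pd.read_csv(io.StringIO(csv_text))

# Same calculations as in Step 2, on the toy data.
avg_lot_size = round(home_data.LotArea.mean())
newest_home_age = 2024 - home_data.YearBuilt.max()

print(home_data.describe())
print(avg_lot_size, newest_home_age)
```

`describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum for each numeric column, which is where the average lot size and the newest construction year can also be read off.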

2. Selecting Data for Modeling

Step 1: Specify Prediction Target

# print the list of columns in the dataset to find the name of the prediction target
home_data.columns

y = home_data.SalePrice

# Check your answer
step_1.check()

Step 2: Create X

# Create the list of features below
feature_names = ['LotArea','YearBuilt','1stFlrSF','2ndFlrSF','FullBath','BedroomAbvGr','TotRmsAbvGrd']

# Select data corresponding to features in feature_names
X = home_data[feature_names]

# Check your answer
step_2.check()

# Review data
# print description or statistics from X
print(X.describe())
# print the top few lines
print(X.head())

Step 3: Specify and Fit Model

# from _ import _
from sklearn.tree import DecisionTreeRegressor
#specify the model. 
#For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit the model
iowa_model.fit(X,y)

# Check your answer
step_3.check()

Step 4: Make Predictions

predictions = iowa_model.predict(X)
print(predictions)

# Check your answer
step_4.check()
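
Predicting on the same rows the model was fitted on says little about accuracy on new data. A small sketch with made-up numbers: when every feature row is distinct, an unconstrained decision tree reproduces its training targets exactly, so the in-sample error is zero.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up toy data; the column names mirror two of the features used above.
X_toy = pd.DataFrame({
    "LotArea": [8000, 9500, 12000, 7000, 10500],
    "YearBuilt": [1995, 1978, 2006, 1960, 1988],
})
y_toy = pd.Series([180000, 150000, 240000, 120000, 200000])

toy_model = DecisionTreeRegressor(random_state=1)
toy_model.fit(X_toy, y_toy)

# With no depth or leaf limit, each training row ends up in its own leaf.
in_sample_predictions = toy_model.predict(X_toy)
print(in_sample_predictions)
print(y_toy.tolist())
```

This is why the next section holds out validation data before measuring error.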

3. Model Validation

Step 1: Split Your Data

# Import the train_test_split function and uncomment
# from _ import _
from sklearn.model_selection import train_test_split
# fill in and uncomment
# train_X, val_X, train_y, val_y = ____
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state = 1)
# Check your answer
step_1.check()
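
A quick sketch of what `train_test_split` returns, using made-up data. By default it shuffles the rows and holds out 25% of them for validation. Fixing `random_state` makes the split repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data: 20 rows, one feature, one target.
X_toy = pd.DataFrame({"LotArea": range(1000, 21000, 1000)})
y_toy = pd.Series(range(20))

toy_train_X, toy_val_X, toy_train_y, toy_val_y = train_test_split(
    X_toy, y_toy, random_state=1
)
print(len(toy_train_X), len(toy_val_X))

# Features and targets stay aligned: the row indexes match pairwise.
print(toy_train_X.index.equals(toy_train_y.index))
print(toy_val_X.index.equals(toy_val_y.index))

# The same random_state produces the same split again.
_, toy_val_X_again, _, _ = train_test_split(X_toy, y_toy, random_state=1)
print(toy_val_X.equals(toy_val_X_again))
```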

Step 2: Specify and Fit the Model

# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit iowa_model with the training data.
iowa_model.fit(train_X,train_y)

# Check your answer
step_2.check()

Step 3: Make Predictions with Validation data

# Predict with all validation observations
val_predictions = iowa_model.predict(val_X)

# Check your answer
step_3.check()

Step 4: Calculate the Mean Absolute Error in Validation Data

from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y,val_predictions)

# uncomment following line to see the validation_mae
# print(val_mae)

# Check your answer
step_4.check()
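
Mean absolute error is the average of the absolute differences between actual and predicted values. A hand-rolled sketch with made-up prices, compared against `mean_absolute_error`:

```python
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted sale prices.
actual = [200000, 150000, 320000, 180000]
predicted = [210000, 140000, 300000, 180000]

# Absolute errors are 10000, 10000, 20000 and 0; their average is the MAE.
manual_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

sklearn_mae = mean_absolute_error(actual, predicted)
print(manual_mae, sklearn_mae)
```

Because MAE is expressed in the same units as the target, a value of 10000 here reads as "off by 10,000 on average".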

4. Underfitting and Overfitting

Step 1: Compare Different Tree Sizes

candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
# get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y) is not defined in this post;
# it is assumed to come from the exercise's setup code.
mae_by_size = {}
# Write loop to find the ideal tree size from candidate_max_leaf_nodes
for leaf_size in candidate_max_leaf_nodes:
    my_mae = get_mae(leaf_size, train_X, val_X, train_y, val_y)
    mae_by_size[leaf_size] = my_mae
    print("Max leaf nodes: %d  \t\t Mean Absolute Error:  %d" % (leaf_size, my_mae))

# Store the best value of max_leaf_nodes (it will be either 5, 25, 50, 100, 250 or 500)
best_tree_size = min(mae_by_size, key=mae_by_size.get)

# Check your answer
step_1.check()
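
The loop above calls `get_mae`, which this post does not define. A sketch of such a helper, assuming the signature used in the loop and run here on synthetic data. The data, the `demo_` names, and `random_state=0` are illustrative choices, not part of the original post.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree capped at max_leaf_nodes and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Synthetic data (an assumption for illustration): a noisy linear target.
rng = np.random.default_rng(0)
demo_X = rng.uniform(0, 100, size=(300, 2))
demo_y = 3 * demo_X[:, 0] + 2 * demo_X[:, 1] + rng.normal(0, 10, size=300)
demo_train_X, demo_val_X, demo_train_y, demo_val_y = train_test_split(
    demo_X, demo_y, random_state=1
)

# Validation MAE for each candidate tree size, then the size with the lowest MAE.
demo_scores = {
    n: get_mae(n, demo_train_X, demo_val_X, demo_train_y, demo_val_y)
    for n in [5, 25, 50, 100]
}
demo_best_size = min(demo_scores, key=demo_scores.get)
print(demo_scores)
print(demo_best_size)
```

Too few leaves tend to underfit and too many tend to overfit, so the lowest validation MAE usually sits somewhere between the extremes.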

Step 2: Fit Model Using All Data

# Fill in argument to make optimal size and uncomment
# final_model = DecisionTreeRegressor(____)
final_model = DecisionTreeRegressor(max_leaf_nodes=best_tree_size, random_state=0)
# fit the final model and uncomment the next two lines
# final_model.fit(____, ____)
final_model.fit(X, y)
# Check your answer
step_2.check()

5. Random Forests

Step 1: Use a Random Forest

from sklearn.ensemble import RandomForestRegressor

# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)

# fit your model
rf_model.fit(train_X,train_y)

# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_mae = mean_absolute_error(val_y, rf_model.predict(val_X))

print("Validation MAE for Random Forest Model: {}".format(rf_val_mae))

# Check your answer
step_1.check()
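
A random forest regressor predicts by averaging the predictions of many decision trees, each trained by default on a bootstrap sample of the data. A sketch on synthetic data that checks this averaging directly through the fitted `estimators_` list. The data and `n_estimators=20` are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
demo_X = rng.uniform(0, 100, size=(200, 3))
demo_y = 2 * demo_X[:, 0] + demo_X[:, 1] + rng.normal(0, 5, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=1)
forest.fit(demo_X, demo_y)

# Each fitted tree is available in forest.estimators_.
per_tree = np.array([tree.predict(demo_X[:5]) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)

print(len(forest.estimators_))
print(np.allclose(averaged, forest.predict(demo_X[:5])))
```

Averaging smooths out the errors of individual trees, which is why a forest with default settings often beats a single tuned tree on validation data.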

6. Machine Learning Competitions

Train a model for the competition

# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=1)

# fit rf_model_on_full_data on all data from the training data (X, y), not only the train_X/train_y split
rf_model_on_full_data.fit(X, y)

# path to file you will use for predictions
test_data_path = '../input/test.csv'

# read test data file using pandas
test_data = pd.read_csv(test_data_path)

# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
# (the same list is named feature_names earlier in this post).
test_X = test_data[features]

# make predictions which we will submit. 
test_preds = rf_model_on_full_data.predict(test_X)

# Check your answer (To get credit for completing the exercise, you must get a "Correct" result!)
step_1.check()
# step_1.solution()
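
After predicting, the remaining step is to write the predictions to a CSV file for submission. A sketch of that step with made-up values. The `Id` and `SalePrice` column names and the `submission.csv` file name are assumptions about the competition's expected format, not something this post states.

```python
import io

import pandas as pd

# Made-up stand-ins for test_data.Id and test_preds.
demo_ids = pd.Series([1461, 1462, 1463], name="Id")
demo_preds = [125000.0, 158500.0, 181200.0]

output = pd.DataFrame({"Id": demo_ids, "SalePrice": demo_preds})

# In a notebook this would be output.to_csv("submission.csv", index=False);
# an in-memory buffer is used here so the sketch leaves no file behind.
buffer = io.StringIO()
output.to_csv(buffer, index=False)
submission_text = buffer.getvalue()
print(submission_text)
```

Passing `index=False` keeps the DataFrame's row index out of the file, so the CSV contains only the two named columns.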

7. Wrap-up

This completes all the tutorials and exercises.
