Case Study: Predicting Titanic Survivors with a Decision Tree

Latest recommended article published on 2023-07-17 11:46:01

hantuo001

Category: Case studies. Tags: python, decision tree, machine learning

Article link: https://blog.csdn.net/ZhanZhan1231/article/details/103688990

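This article builds a decision tree model in Python to predict whether Titanic passengers survived. It covers data preprocessing, model training, manual tuning of the parameters max_depth and min_impurity_split to improve the model, and automatic parameter selection with GridSearchCV to counter overfitting and find the best parameter combination.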

This case study covers: (1) data preprocessing, (2) model training, and (3) selecting the best parameter combination (cross-validation).
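As a rough orientation, the three stages can be sketched end to end on synthetic data. This is a minimal sketch and not the author's code: the data comes from `make_classification` as a stand-in for the preprocessed Titanic features, and the `max_depth` grid, `cv=5`, and `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stage 1 stand-in: synthetic numeric features (assumption, not the real Titanic data)
X, y = make_classification(n_samples=300, n_features=7, random_state=0)

# Stage 2: hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Stage 3: 5-fold cross-validated search over an illustrative max_depth grid
param_grid = {'max_depth': list(range(2, 8))}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

best_depth = search.best_params_['max_depth']
test_score = search.score(X_test, y_test)
print('best max_depth:', best_depth)
print('test accuracy: {:.3f}'.format(test_score))
```

`GridSearchCV` refits the best tree on the full training split by default, so `search.score` evaluates that refitted model on the held-out rows.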

1 Data preprocessing

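import pandas as pd

def read_data(path):
    """Load the CSV and convert it to an all-numeric DataFrame."""
    df = pd.read_csv(path, index_col=0)
    # Drop columns that are not used as features
    df = df.drop(['Name', 'Cabin', 'Ticket'], axis=1)
    # Encode sex: male -> 1, female -> 0
    df['Sex'] = (df['Sex'] == 'male').astype('int')
    # Encode Embarked as integer codes in order of first appearance.
    # The replacement is restricted to this column so that values in
    # other columns (including missing ones) are left untouched.
    labels = df['Embarked'].unique().tolist()
    df['Embarked'] = df['Embarked'].replace(labels, list(range(len(labels))))
    # Fill the remaining missing values with 0
    df = df.fillna(0)
    return df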
train=read_data('train.csv')
train.head(3)

             Survived  Pclass  Sex   Age  SibSp  Parch     Fare  Embarked
PassengerId
1                   0       3    1  22.0      1      0   7.2500         0
2                   1       1    0  38.0      1      0  71.2833         1
3                   1       3    0  26.0      0      0   7.9250         0
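To check the encoding steps without train.csv, the same transformations can be tried on a few made-up rows. This is a minimal sketch: the `toy` frame is hypothetical, and `pd.factorize` is swapped in for the `replace` call as one way to get integer codes in order of first appearance. Note that `pd.factorize` assigns -1 to missing values, which differs from the list-based replacement above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows using three of the columns that read_data keeps
toy = pd.DataFrame({
    'Sex': ['male', 'female', 'female', 'male'],
    'Age': [22.0, np.nan, 26.0, 35.0],
    'Embarked': ['S', 'C', 'S', 'Q'],
})

# Encode sex: male -> 1, female -> 0
toy['Sex'] = (toy['Sex'] == 'male').astype('int')

# Integer codes in order of first appearance: S -> 0, C -> 1, Q -> 2
codes, labels = pd.factorize(toy['Embarked'])
toy['Embarked'] = codes

# The missing Age is filled with 0
toy = toy.fillna(0)

print(toy)
print(list(labels))
```

Keeping `labels` around makes it possible to map the integer codes back to the original port letters later.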

2 Model training

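# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

# Features are every column after the first; the label is Survived (column 0)
X = train.iloc[:, 1:]
y = train.iloc[:, 0]
# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)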
print('train dataset:{0};test dataset:{1}'.format