Decision Trees: A Worked Example and How They Work

Latest recommended article published on 2024-07-27 15:52:24

明镜应缺



Category: Machine Learning. Tags: decision tree

Article link: https://blog.csdn.net/weixin_42662126/article/details/96911491


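This article walks through an application of decision trees, predicting which Titanic passengers survived, and then examines how decision trees work, including their main advantages and disadvantages, such as being easy to understand and interpret and needing little data preparation.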


I. Example

Predicting survival on the Titanic

import pandas as pd
from sklearn.tree import DecisionTreeClassifier  # decision tree classifier
from sklearn.feature_extraction import DictVectorizer  # converts lists of feature dicts into vectors
from sklearn.model_selection import train_test_split  # splits a dataset into training and test sets


def decision():
    """
    Predict Titanic passenger survival with a decision tree.
    :return:
    """
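    # Load the data (reading .xls files with pandas requires the xlrd package)
    titan = pd.read_excel(r"http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls")
    # Pick the features and the target: the target is whether the passenger
    # survived; the features are columns related to survival.
    # .copy() makes x an independent frame, so filling ages below does not
    # write into a view of titan.
    x = titan[["pclass", "age", "sex"]].copy()

    y = titan["survived"]

    # Fill missing ages with the mean age
    x["age"] = x["age"].fillna(x["age"].mean())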

    # Split the data into training and test sets (25% held out for testing)
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.25)

    # print(x_train.to_dict(orient="records"))

    # Feature engineering: features -> categories -> one-hot encoding
    dict