Machine Learning: Classification Algorithms, Part 2
Decision Trees
Information Theory Basics
Information Gain: the reduction in the entropy of the target after splitting on a feature, g(D, A) = H(D) - H(D|A)
Information Entropy: H(X) = -Σ p(x) log p(x), the expected amount of information in a random variable
Note: all logarithms here are base 2
Computing Information Gain
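The two definitions above can be sketched in plain Python. This is a minimal illustration, not part of the original notes; the helper names `entropy` and `info_gain` are my own.

```python
import math

def entropy(labels):
    """Shannon entropy of a label sequence, log base 2 (matching the note above)."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def info_gain(features, labels):
    """Information gain g(D, A) = H(D) - H(D|A) when splitting by feature values."""
    n = len(labels)
    groups = {}
    for f, c in zip(features, labels):
        groups.setdefault(f, []).append(c)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# A balanced binary label has 1 bit of entropy
print(entropy([0, 1, 0, 1]))                          # 1.0
# A feature that perfectly separates the classes recovers all of it
print(info_gain(['a', 'b', 'a', 'b'], [0, 1, 0, 1]))  # 1.0
```

A feature whose values are uninformative (the same for every sample) gives a gain of 0; ID3 greedily picks the feature with the largest gain at each node.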
Common decision tree algorithms: ID3 (information gain), C4.5 (information gain ratio), CART (Gini index)
The Gini index yields finer-grained splits
criterion: defaults to 'gini'; 'entropy' is also available
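A small sketch of switching the `criterion` parameter, using scikit-learn's bundled iris data as a stand-in dataset (the dataset choice and `random_state` values are my own, for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# Same data, two split criteria: Gini index (default) vs. information entropy
for criterion in ('gini', 'entropy'):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    clf.fit(x_train, y_train)
    print(criterion, 'accuracy:', clf.score(x_test, y_test))
```

On most datasets the two criteria give similar trees; entropy is slightly more expensive to compute because of the logarithm.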
Decision Tree Structure and Saving It Locally
Random Forest
Ensemble Learning: combine the predictions of multiple models to obtain a stronger model than any single one
Random Forest API
Note: each tree is trained on a random sample drawn with replacement (a bootstrap sample)
Advantages
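The random forest API described above can be sketched as follows, again using iris as a stand-in dataset (the hyperparameter values are illustrative, not from the original notes):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# n_estimators: number of trees; each tree is fit on a bootstrap
# (with-replacement) sample of the rows, as noted above
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
rf.fit(x_train, y_train)
print('Random forest accuracy:', rf.score(x_test, y_test))
```

Averaging many trees trained on different bootstrap samples reduces variance, which is why the forest usually generalizes better than a single deep tree.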
Code Implementation
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import pandas as pd
from sklearn.model_selection import train_test_split


def decision():
    # Titanic dataset (URL from the original notes; may no longer be reachable)
    titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titantic.txt")
    x = titan[['pclass', 'age', 'sex']].copy()
    y = titan['survived']
    # Fill missing ages with the column mean
    x['age'] = x['age'].fillna(x['age'].mean())
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
    # One-hot encode the categorical features (fit on train, transform test)
    dv = DictVectorizer(sparse=False)
    x_train = dv.fit_transform(x_train.to_dict(orient='records'))
    x_test = dv.transform(x_test.to_dict(orient='records'))
    dec = DecisionTreeClassifier(max_depth=5)
    dec.fit(x_train, y_train)
    print('Test accuracy:', dec.score(x_test, y_test))
    # Export the fitted tree as a .dot file for visualization with Graphviz
    export_graphviz(dec, out_file='./tree.dot',
                    feature_names=['age', 'pclass=1st', 'pclass=2nd',
                                   'pclass=3rd', 'sex=female', 'sex=male'])
    return None


if __name__ == '__main__':
    decision()
Decision tree generation and pruning: https://blog.csdn.net/am290333566/article/details/81187562
Meaning of the parameters in sklearn: https://blog.csdn.net/qq_16000815/article/details/80954039