9. Decision Trees in Practice

Tor


Original post: https://blog.csdn.net/qwe863226687/article/details/115915902


Decision tree regression (compared against L2-regularized Ridge regression to highlight how strong the decision tree is):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split, GridSearchCV

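# `data` is assumed to be a pandas DataFrame whose last column is the target (its loading code is not shown in the original post)
# split features x and target y
x = data.iloc[:, :-1]
y = data.iloc[:, -1:]
# hold-out method: split into a training set and a test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
# create the model
model = DecisionTreeRegressor()
# grid search over max_depth with 3-fold cross-validation
param_grid = {'max_depth': [3, 5, 7, 9, 11]}
tree = GridSearchCV(model, param_grid=param_grid, cv=3)
tree.fit(x_train, y_train)
print(tree.best_params_)

tree_score = tree.score(x_test, y_test)  # R^2 on the test set
print(tree_score)
tree_predict = tree.predict(x_test)
# score the predictions (mean squared error on the test set)
print(mean_squared_error(y_test, tree_predict))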

# use grid-search cross-validation to choose alpha for Ridge
model = Ridge()
param_grid = {'alpha': [1, 0.75, 0.5, 0.1, 0.05, 0.01]}
ridge = GridSearchCV(model, param_grid=param_grid, cv=3)
ridge.fit(x_train, y_train)
print(ridge.best_params_)

ridge_score = ridge.score(x_test, y_test)  # R^2 on the test set
print(ridge_score)
ridge_predict = ridge.predict(x_test)

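Ridge regression (linear regression with L2 regularization) serves as the baseline for the decision tree; on this dataset the decision tree achieves the higher test-set score.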
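Which model wins depends on the data. A minimal sketch of the same comparison on a hypothetical nonlinear target (y = sin(x) plus a little noise, not the post's dataset): a linear model such as Ridge cannot follow the curve, while a piecewise-constant tree can. On data that really is linear, the ranking could well reverse.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical nonlinear data: y = sin(x) + small Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Same hyperparameter grids as in the post
tree = GridSearchCV(DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5, 7, 9, 11]}, cv=3)
ridge = GridSearchCV(Ridge(), {"alpha": [1, 0.75, 0.5, 0.1, 0.05, 0.01]}, cv=3)
tree.fit(X_train, y_train)
ridge.fit(X_train, y_train)

tree_score = tree.score(X_test, y_test)    # test-set R^2 of the best tree
ridge_score = ridge.score(X_test, y_test)  # test-set R^2 of the best ridge model
print(f"tree R^2 = {tree_score:.3f}, ridge R^2 = {ridge_score:.3f}")
```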

m, n = y_test.shape  # number of rows and columns
plt.plot(np.arange(m), y_test, 'r-', label='True values')
plt.plot(tree_predict, 'g-', label='Decision tree regression, $R^2$=%.4f' % tree_score)
plt.plot(ridge_predict, 'b-', label='Ridge regression, $R^2$=%.4f' % ridge_score)
plt.grid()
plt.legend()
plt.show()

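The plot (true values, decision tree predictions and ridge predictions) shows the decision tree's predictions following the true values more closely than those of ridge regression.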
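Both models above pick their hyperparameter with `GridSearchCV(..., cv=3)`. Roughly, it averages the cross-validated score of each candidate value, keeps the best one, and refits it on the whole training set. A minimal sketch of the same search done by hand for `max_depth`, assuming synthetic data from `make_regression` (not the post's dataset) and an explicit unshuffled 3-fold `KFold`, which is what an integer `cv` means for a regressor:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): the post's own `data` is not shown
X, y = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

depths = [3, 5, 7, 9, 11]
cv = KFold(n_splits=3)  # identical folds for both approaches

# Manual grid search: mean 3-fold R^2 for every candidate max_depth
mean_scores = {}
for depth in depths:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    mean_scores[depth] = cross_val_score(model, X, y, cv=cv).mean()
best_depth = max(mean_scores, key=mean_scores.get)  # first best value wins ties

# GridSearchCV performs the same search, then refits the best model on all of X, y
search = GridSearchCV(DecisionTreeRegressor(random_state=0), {"max_depth": depths}, cv=cv)
search.fit(X, y)

print(best_depth, search.best_params_["max_depth"])
```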

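Decision tree classification (the decision tree itself does not require the class labels to be encoded; below they are converted to integers with LabelEncoder, which the scatter plot's color mapping needs):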

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
import warnings
from sklearn.metrics import classification_report
warnings.filterwarnings('ignore')
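The claim that a decision tree classifier needs no label encoding can be checked directly: scikit-learn's `DecisionTreeClassifier` accepts string class labels and stores them in `classes_`. A small sketch on a made-up six-sample dataset (the feature values and label names are hypothetical); `LabelEncoder` is shown only to illustrate the integer codes it produces:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny hypothetical dataset with string class labels
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.5, 6.2], [9.0, 1.0], [9.2, 0.8]])
labels = np.array(["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"])

# The classifier is trained on the string labels as-is
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.classes_)                # ['setosa' 'versicolor' 'virginica']
print(clf.predict([[5.2, 6.1]]))   # ['versicolor']

# LabelEncoder maps classes to 0..K-1 in sorted order (useful e.g. for plot colors)
encoder = LabelEncoder()
codes = encoder.fit_transform(labels)
print(codes)                           # [0 0 1 1 2 2]
print(encoder.inverse_transform([2]))  # ['virginica']
```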

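Only two features are used for classification here, which keeps the decision regions plottable in 2D; the drawback of using so few features is that the classifier is typically less accurate than one trained on all features.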
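To put a rough number on that trade-off, a minimal sketch using the Iris dataset bundled with scikit-learn as a stand-in (an assumption, since the post's `data` is not shown). It compares 5-fold cross-validated accuracy of a default decision tree trained on the first two columns against one trained on all four:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: Iris (4 numeric features, 3 classes)
X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)
acc_two = cross_val_score(clf, X[:, :2], y, cv=5).mean()  # first two columns only
acc_all = cross_val_score(clf, X, y, cv=5).mean()         # all four features
print(f"2 features: {acc_two:.3f}, 4 features: {acc_all:.3f}")
```

On Iris the first two columns are the sepal measurements, which separate two of the three species poorly, so the gap here is large; on other data it may be small.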

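# `data` is assumed to be a DataFrame whose first two columns are features and whose last column holds the class labels (its loading code is not shown in the original post)
x = data.iloc[:, :2]
y = data.iloc[:, -1]  # 1-D Series, which is what LabelEncoder expects
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)  # class names -> integers 0..K-1
print(y[:10])

# Training step: not shown in the original post, reconstructed here with default
# hyperparameters so that `model`, `x_test` and `y_test` below refer to a fitted classifier
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
model = DecisionTreeClassifier()
model.fit(x_train, y_train)
print(classification_report(y_test, model.predict(x_test)))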

N, M = 100, 100  # number of sample points along each axis
x1_min, x1_max = x.iloc[:, 0].min(), x.iloc[:, 0].max()  # range of column 0
x2_min, x2_max = x.iloc[:, 1].min(), x.iloc[:, 1].max()  # range of column 1
t1 = np.linspace(x1_min, x1_max, N)
t2 = np.linspace(x2_min, x2_max, M)
x1, x2 = np.meshgrid(t1, t2)  # build the sampling grid
x_show = np.stack((x1.flat, x2.flat), axis=1)  # one row per grid point
cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
y_show_hat = model.predict(pd.DataFrame(x_show, columns=x.columns))  # predicted class of every grid point
y_show_hat = y_show_hat.reshape(x1.shape)  # reshape back to the grid's shape
plt.figure(facecolor='w')
plt.pcolormesh(x1, x2, y_show_hat, cmap=cm_light, shading='auto')  # predicted class regions
# test samples (edgecolors sets the marker edge color)
plt.scatter(x_test.iloc[:, 0], x_test.iloc[:, 1], c=y_test.ravel(), edgecolors='k', s=100, cmap=cm_dark, marker='o')
plt.scatter(x.iloc[:, 0], x.iloc[:, 1], c=y.ravel(), edgecolors='k', s=40, cmap=cm_dark)  # all samples
plt.xlabel('x1', fontsize=15)
plt.ylabel('x2', fontsize=15)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.show()
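The decision-region plot relies on a shape round trip: `np.meshgrid` builds two (M, N) arrays, the grid is flattened into an (M*N, 2) matrix of points for `predict`, and the predictions are reshaped back to (M, N) for `pcolormesh`. A minimal sketch with a deliberately tiny grid, assuming Iris's first two columns as stand-in data and `ravel()` in place of `.flat` (equivalent here):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: first two Iris features (assumption, not the post's dataset)
X, y = load_iris(return_X_y=True)
X2 = X[:, :2]
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X2, y)

N, M = 4, 3  # tiny grid: N points along x1, M points along x2
t1 = np.linspace(X2[:, 0].min(), X2[:, 0].max(), N)
t2 = np.linspace(X2[:, 1].min(), X2[:, 1].max(), M)
x1, x2 = np.meshgrid(t1, t2)                          # both have shape (M, N)
x_show = np.stack((x1.ravel(), x2.ravel()), axis=1)   # (M*N, 2): one row per grid point
y_show_hat = model.predict(x_show).reshape(x1.shape)  # back to (M, N) for pcolormesh

print(x1.shape, x_show.shape, y_show_hat.shape)  # (3, 4) (12, 2) (3, 4)
```

Because `ravel()` and `reshape()` both use row-major order, row k of `x_show` lines up with cell k of the grid, so each prediction lands back on the point it was computed for.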
