Decision Tree Regression: principles and code implementation, with a comparison against an MLP (PyTorch), sklearn, numpy (very detailed, for complete beginners!)

香蕉也是布拉拉

Last modified 2024-08-13 14:10:48

Categories: sklearn, neural network. Tags: decision tree regression, sklearn, python, pytorch, numpy

First published 2024-08-13 14:03:05

Article link: https://blog.csdn.net/m0_62716099/article/details/141140806

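Today we continue with decision trees. This post explains how regression trees work and walks through a worked example. Along the way I revisit a PyTorch implementation of an MLP; it calls very few library functions, so it should be easy to follow. Comparing the two models, we find that for this nonlinear fit the neural network does noticeably better than the decision tree.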

Principles

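As before, the following content (including the screenshots) is a summary of the regression tree video by the YouTube channel StatQuest. Video link: Decision Regression Trees - StatQuest. If you are interested, watch it for the full theory; here I only sketch the idea.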

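Like the classification tree from the previous post, a regression tree makes a sequence of decisions, but it ends in a numeric prediction rather than a class. As shown in the figure below, suppose we have a dataset whose x-axis is Drug Dosage and whose y-axis is Drug Effectiveness; the distribution is clearly nonlinear. We then read the decision tree like this: if the dosage is less than 14.5, the effectiveness is 4.2; otherwise we go on to compare the dosage with 29. This repeats until we reach a leaf node.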
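The walk down the tree can be written as nested conditions. The sketch below is only illustrative: the first split (Dosage < 14.5 gives 4.2) and the second threshold (29) come from the example above, while the function name `predict_effectiveness`, the direction of the second comparison, and the two remaining leaf values are hypothetical placeholders.

```python
def predict_effectiveness(dosage: float) -> float:
    """Walk a tiny hand-written regression tree and return the leaf value."""
    if dosage < 14.5:
        return 4.2    # leaf taken from the example in the text
    if dosage >= 29:
        return 2.5    # hypothetical leaf value, for illustration only
    return 50.0       # hypothetical leaf value, for illustration only


print(predict_effectiveness(10.0))  # ends in the first leaf
print(predict_effectiveness(20.0))  # passes both checks and ends in the last leaf
```

Every prediction is a constant taken from a leaf, which is why the fitted curve of a regression tree looks like a staircase.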

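Of course, just as with classification trees, you may wonder: why do we compare against 14.5 first? Why is the effectiveness simply 4.2 when the dosage is below 14.5? Don't worry, I explain that below; for now the goal is only to get familiar with the workflow.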

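One more remark: if there are several features, we still make decisions on them and then predict a value in the same way. As for how a node's split is chosen, remember the Gini index from classification trees? Regression trees use an analogous error measure. The most common one is MSE (mean squared error), which I use as the example here.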

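Go back to the case with a single feature, Dosage. We pick a candidate threshold x and use Dosage < x as the decision condition, which splits the samples into two groups (less than x, and greater than or equal to x). We compute the mean of each group, m1 and m2, and then sum the squared differences between each sample and its group mean. That gives one error value for this x. We then try other values of x and keep the one with the smallest error as the threshold we actually use.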
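The threshold search described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the library's implementation: the function name `best_split`, the toy data, and the choice of midpoints between neighbouring values as candidate thresholds are all assumptions made for illustration.

```python
import numpy as np


def best_split(x, y):
    """Return (threshold, sse) minimising the summed squared error of the split x < threshold."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.unique(x)                         # sorted distinct feature values
    candidates = (values[:-1] + values[1:]) / 2   # midpoints between neighbours
    best_t, best_sse = None, np.inf
    for t in candidates:
        left, right = y[x < t], y[x >= t]
        # squared distance of every sample to the mean of its own group
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t, best_sse


# made-up toy data: low response for small dosages, high response for large ones
dosage = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
effect = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
threshold, sse = best_split(dosage, effect)
print(threshold, sse)
```

On this toy data the only error-free cut lies between 3 and 10, so the search settles on the midpoint 6.5.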

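Likewise, if we have several features, we first find the best threshold x for each feature, then compare the minimum errors of all the features and use the feature with the smallest error as the decision condition.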
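Extending the search to several features just adds an outer loop over the columns. Again a hedged sketch: `split_sse`, `best_feature_split`, and the toy matrix are hypothetical names and data, and midpoints are assumed as candidate thresholds.

```python
import numpy as np


def split_sse(column, y, t):
    """Summed squared error of the two groups produced by column < t."""
    left, right = y[column < t], y[column >= t]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()


def best_feature_split(X, y):
    """Return (feature_index, threshold, sse) with the smallest error over all columns."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:   # midpoints between neighbours
            sse = split_sse(X[:, j], y, t)
            if sse < best[2]:
                best = (j, t, sse)
    return best


# made-up data: column 1 separates the targets cleanly, column 0 does not
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]])
y = np.array([0.0, 10.0, 0.0, 10.0])
feature, threshold, sse = best_feature_split(X, y)
print(feature, threshold, sse)
```

A full tree repeats this choice recursively inside each of the two groups until a stopping rule such as a maximum depth is reached.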

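That is roughly how a regression tree works. We do not need to dwell on the underlying math; knowing how to use the code is enough. I recommend reading the official documentation: the DecisionTreeRegressor API reference.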

Fitting a quadratic curve (I recommend viewing my ipynb source file on GitHub)

Import the required packages

from sklearn import tree
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F
%matplotlib inline

Generate the dataset and visualize it

# we will fit y = x**2 - 10x in this code
# create dataset
steps = 100     # number of samples
np.random.seed(42)
raw_data = np.array(sorted(np.random.rand(steps) * 10))  # shape: (100, ), values in [0, 10)
raw_data = raw_data[:, np.newaxis]  # shape: (100, 1)

features = np.concatenate([raw_data ** 2, raw_data], axis=1)    # train features, columns are x**2 and x, shape: (100, 2)
y = features[:, 0] - features[:, 1] * 10   # (100, ) noise-free target
label = y + np.random.randn(steps)    # add Gaussian noise with mean 0 and std 1
idx_noise = np.random.randint(0, steps, size=5) # add large noise to a few points
label[idx_noise] += np.random.randn(len(idx_noise)) * 10

# visualize data
plt.figure(figsize=(10, 5))
plt.scatter(raw_data, y, marker='o', s=50, label='real data')
plt.scatter(raw_data, label, marker='x',label='train data')
plt.legend()
plt.show()

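The generated data looks roughly like the following. I deliberately added a few obvious noise points so that overfitting is easy to observe.

Train the decision tree regressors and visualize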

# use decision tree regression to fit data
clf1 = tree.DecisionTreeRegressor(
    max_depth=2
)
clf1 = clf1.fit(features, label)
clf2 = tree.DecisionTreeRegressor(
    max_depth=5
)
clf2 = clf2.fit(features, label)

# visualize the decision tree
plt.figure(figsize=(30, 15))
tree.plot_tree(clf1)
plt.show()

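Here we limit the maximum depth (a pre-pruning strategy) to guard against overfitting, and we pick depths 2 and 5 for comparison. Because the depth-5 tree is fairly complex, we only visualize the depth-2 tree (clf1).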
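A plotted depth-5 tree is hard to read, but a text dump can still be inspected. The sketch below uses `tree.export_text` together with `get_depth` and `get_n_leaves`. It rebuilds a stand-in dataset of the same form with NumPy's `default_rng`, so the numbers will not match the data above; the names `deep` and `report` and the feature names are assumptions for illustration.

```python
import numpy as np
from sklearn import tree

# stand-in data of the same form as above: columns are x**2 and x
rng = np.random.default_rng(42)
x = np.sort(rng.random(100) * 10).reshape(-1, 1)
features = np.concatenate([x ** 2, x], axis=1)
label = features[:, 0] - 10 * features[:, 1] + rng.standard_normal(100)

deep = tree.DecisionTreeRegressor(max_depth=5, random_state=0).fit(features, label)

# size of the fitted tree: depth is capped at 5, so there are at most 2**5 leaves
print(deep.get_depth(), deep.get_n_leaves())

# indented text view of every split and leaf value
report = tree.export_text(deep, feature_names=["x^2", "x"])
print(report)
```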

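MLP (PyTorch implementation); if anything is unclear, see my other GitHub repository

Here I recommend the YouTube course by Karpathy, whom I think of as my online teacher. It really starts from zero: Youtube-Karpathy:NeuralNetwork-from-0-to-hero. If you have time and are interested in NLP or large models, it is well worth watching.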

# try use mlp
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(42)

train_X, train_y = torch.tensor(features, dtype=torch.float32), torch.tensor(label, dtype=torch.float32)    # (100, 2), (100, )
num_epochs = 10000
batch_size = 4
num_hiddens = 256
lr = 0.00005

W1 = torch.randn(2, num_hiddens) ; b1 = torch.zeros(num_hiddens)
W2 = torch.randn(num_hiddens, 1) ; b2 = torch.zeros(1)

parameters = [W1, b1, W2, b2]
for p in parameters:
    p.requires_grad_()

lossi = []
for epoch in range(num_epochs):     # each "epoch" here is one mini-batch update
    idx = torch.randint(0, steps, size=(batch_size, ))
    Xb, yb = train_X[idx], train_y[idx]   # (batch_size, 2), (batch_size, ); named Xb/yb so the global y is not overwritten

    # forward pass
    h = torch.matmul(Xb, W1) + b1    # (batch_size, 2) @ (2, num_hiddens) + (num_hiddens, ) -> (batch_size, num_hiddens)
    h = torch.tanh(h)
    outputs = torch.matmul(h, W2) + b2    # (batch_size, num_hiddens) @ (num_hiddens, 1) + (1, ) -> (batch_size, 1)

    # mean squared error over the mini-batch
    l = ((outputs.squeeze(1) - yb) ** 2).mean()

    # backward pass
    for p in parameters:
        p.grad = None
    l.backward()
    with torch.no_grad():   # plain SGD step, kept out of the autograd graph
        for p in parameters:
            p -= lr * p.grad

    # track stats
    lossi.append(l.item())
    if epoch % 20 == 0:
        print(f'epoch {epoch:7d}, loss {l.item() :10f}')

print(f'epoch {epoch:7d}, loss {l.item() :10f}')
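For comparison, the same 2 -> 256 (tanh) -> 1 architecture can be written with PyTorch's built-in modules. This is a sketch under stated assumptions rather than a drop-in replacement: `nn.Linear` uses its own default initialization instead of `torch.randn`, the data is a regenerated stand-in of the same form, and the loop runs only 200 updates to keep it quick.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# stand-in data of the same form as the article's: columns are x**2 and x
x = torch.rand(100, 1) * 10
train_X = torch.cat([x ** 2, x], dim=1)                               # (100, 2)
train_y = x.squeeze(1) ** 2 - 10 * x.squeeze(1) + torch.randn(100)    # (100, )

model = nn.Sequential(nn.Linear(2, 256), nn.Tanh(), nn.Linear(256, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=5e-5)
loss_fn = nn.MSELoss()

for step in range(200):                       # far fewer updates than the article uses
    idx = torch.randint(0, train_X.shape[0], (4,))
    pred = model(train_X[idx]).squeeze(1)     # (4, )
    loss = loss_fn(pred, train_y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    predictions = model(train_X).squeeze(1)   # (100, )
print(predictions.shape, loss.item())
```

Writing the layers and the update by hand, as the article does, shows what `nn.Linear`, `nn.MSELoss`, and `optimizer.step()` are doing underneath.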

Visualize and compare the results

with torch.no_grad():
    # forward pass over the whole training set at once
    h = torch.tanh(torch.matmul(train_X, W1) + b1)   # (100, 2) @ (2, num_hiddens) + (num_hiddens, ) -> (100, num_hiddens)
    outputs = torch.matmul(h, W2) + b2               # (100, num_hiddens) @ (num_hiddens, 1) + (1, ) -> (100, 1)
    predict_y = outputs.squeeze(1).tolist()          # list of 100 floats
        
plt.figure(figsize=(10, 5))
plt.scatter(raw_data.ravel(), label, marker='.', c='b', label='train data')
plt.plot(raw_data.ravel(), predict_y, label='mlp', c ='y')
plt.plot(raw_data, clf1.predict(features), label='clf1', linewidth=0.8, c='c')
plt.plot(raw_data, clf2.predict(features), label='clf2', linewidth=0.8, c='r')
plt.legend()
plt.show()

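The figure below shows the result. The mlp curve is clearly the best and the smoothest. clf2 (depth=5) is a deep, structurally complex tree and therefore overfits, while clf1 falls between the two. So in general, for fitting problems I would still recommend a neural network, since the gap in quality here is fairly large.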
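One way to put numbers behind the visual comparison of the two trees is to measure the error against both the noisy labels and the noise-free curve. The sketch below is an assumption-laden illustration: it regenerates stand-in data of the same form with NumPy's `default_rng` (so the values differ from the plot above), and the `results` dictionary is a hypothetical helper.

```python
import numpy as np
from sklearn import tree
from sklearn.metrics import mean_squared_error

# stand-in data of the same form as above, including a few large outliers
rng = np.random.default_rng(42)
x = np.sort(rng.random(100) * 10).reshape(-1, 1)
features = np.concatenate([x ** 2, x], axis=1)
clean = features[:, 0] - 10 * features[:, 1]      # noise-free target
label = clean + rng.standard_normal(100)          # noisy training target
outliers = rng.integers(0, 100, size=5)
label[outliers] += rng.standard_normal(5) * 10

results = {}
for depth in (2, 5):
    model = tree.DecisionTreeRegressor(max_depth=depth, random_state=0).fit(features, label)
    pred = model.predict(features)
    results[depth] = (
        mean_squared_error(label, pred),   # error on the noisy labels it was trained on
        mean_squared_error(clean, pred),   # error against the true curve
    )
    print(f"max_depth={depth}: noisy MSE {results[depth][0]:.3f}, clean MSE {results[depth][1]:.3f}")
```

The deeper tree always looks better on the labels it was trained on. Whether it is also closer to the true curve depends on how much of the noise it has memorized, which is what the second number is meant to show.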