Introduction to Decision Trees and How to Use Them

Naturino

Last modified: 2023-01-12 11:17:25

Categories: Fundamentals, Machine Learning. Tags: decision tree algorithm

First published: 2023-01-12 11:03:21

Article link: https://blog.csdn.net/Naturino/article/details/128655106

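The finished decision tree is a structure rooted at a single node: each internal node tests one feature, each outgoing edge corresponds to one value (or range of values) of that feature, and each leaf holds a prediction. To predict for a new sample, start at the root and, at each node, follow the edge that matches the sample's value for the tested feature until a leaf is reached. The prediction stored in that leaf is the prediction for the sample.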
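The traversal described above can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the `predict` helper, and the toy "outlook / windy" tree are hypothetical names made up for this example, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """Internal node if `feature` is set, otherwise a leaf holding `prediction`."""
    feature: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    prediction: Optional[str] = None


def predict(node: Node, sample: Dict[str, str]) -> Optional[str]:
    # Follow the edge that matches the sample's value until a leaf is reached
    while node.feature is not None:
        node = node.children[sample[node.feature]]
    return node.prediction


# Hypothetical toy tree: decide whether to play outside
tree = Node(
    feature="outlook",
    children={
        "sunny": Node(
            feature="windy",
            children={
                "yes": Node(prediction="stay in"),
                "no": Node(prediction="play"),
            },
        ),
        "rainy": Node(prediction="stay in"),
    },
)

print(predict(tree, {"outlook": "sunny", "windy": "no"}))
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))
```

The first call walks root, then the "sunny" edge, then the "no" edge; the second stops at the "rainy" leaf without ever looking at "windy".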

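Decision trees are simple, intuitive, and easy to implement and interpret, and they perform well in many settings. Their main weakness is a tendency to overfit, and they often do poorly on high-dimensional or noisy data. Pruning and limiting the depth of the tree are common ways to mitigate these problems.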
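As a rough sketch of those two remedies, scikit-learn's `DecisionTreeClassifier` accepts `max_depth` (limit the depth while growing) and `ccp_alpha` (cost-complexity pruning after growing). The values `2` and `0.02` below are arbitrary choices for illustration, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until leaves are pure or cannot be split further
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Depth-limited tree
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Cost-complexity pruned tree; 0.02 is an arbitrary illustrative value
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("max_depth=2", shallow), ("ccp_alpha=0.02", pruned)]:
    print(
        name,
        "depth:", model.get_depth(),
        "leaves:", model.get_n_leaves(),
        "train acc:", round(model.score(X_train, y_train), 3),
        "test acc:", round(model.score(X_test, y_test), 3),
    )
```

Comparing the training and test accuracy of the three models is a quick way to see whether the smaller trees give up anything on held-out data.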

Using decision trees in Python

A simple example of building a decision tree with the Python library scikit-learn:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import joblib

iris = load_iris()
X = iris.data
y = iris.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

print("Accuracy on training set: {:.3f}".format(clf.score(X_train, y_train)))
print("Accuracy on test set: {:.3f}".format(clf.score(X_test, y_test)))

# Save the model
joblib.dump(clf, 'decision_tree.pkl')

# Load the model
clf_loaded = joblib.load('decision_tree.pkl')

# Use the loaded model to make predictions on new data
predictions = clf_loaded.predict(X_test)
print(predictions)

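Using decision trees in C++

Using a Python-trained model in C++

sklearn-caffe is presented here as a Python library that converts scikit-learn models to Caffe models. The package name and the sklearn2caffe call below are unverified, so treat this snippet as a sketch of the idea, not a tested recipe.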

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn2caffe import sklearn2caffe  # hypothetical converter module (unverified)

clf = joblib.load('decision_tree.pkl')

# Convert the model
model_def, model_weights = sklearn2caffe(clf, 'input', 'output')

# Save the model to prototxt and caffemodel files
with open("decision_tree.prototxt", "w") as f:
    f.write(model_def)

with open("decision_tree.caffemodel", "wb") as f:
    f.write(model_weights.SerializeToString())

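In this example we first load the decision tree model that was trained and saved earlier and pass it to the sklearn2caffe function to convert it to a Caffe model. We then write the converted model out as a prototxt file and a caffemodel file.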

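A converter of this kind can handle scikit-learn models that map onto network layers. A decision tree does not: it is a sequence of threshold comparisons and has no use for a deep learning framework, so it cannot be converted to a Caffe model.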

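A decision tree is just a tree data structure and needs no network. In C++ it is usually implemented with ordinary data structures and a traversal loop, or prediction is delegated to scikit-learn through its Python interface.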
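One way to port a trained tree without any neural-network framework is to export its node arrays and re-implement the traversal by hand. The sketch below does this in Python using the `tree_` attribute of a fitted scikit-learn classifier (`children_left`, `children_right`, `feature`, `threshold`, `value`). The helper `predict_one` is a name made up for this example, and how the arrays get written to disk and read into C++ vectors is left open.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

t = clf.tree_
# Flat arrays describing the fitted tree; a C++ port would only need these
left, right = t.children_left, t.children_right
feature, threshold = t.feature, t.threshold
leaf_class = t.value[:, 0, :].argmax(axis=1)  # majority class index per node


def predict_one(x):
    # scikit-learn converts inputs to float32 before comparing with thresholds
    x = np.asarray(x, dtype=np.float32)
    node = 0
    while left[node] != -1:  # -1 marks a leaf
        if x[feature[node]] <= threshold[node]:
            node = left[node]
        else:
            node = right[node]
    return clf.classes_[leaf_class[node]]


manual = np.array([predict_one(row) for row in X])
print("matches clf.predict:", bool((manual == clf.predict(X)).all()))
```

The loop inside `predict_one` uses only integer indexing and one comparison per node, so the same logic carries over to a C++ function operating on `std::vector` copies of the five arrays.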

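Loading a model saved from Python into C++ requires additional libraries and tools. For a model that has been converted to Caffe format, Caffe must be installed in the C++ environment: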

#include <vector>

#include <caffe/caffe.hpp>
#include <caffe/proto/caffe.pb.h>

int main() {
    // Load the model
    caffe::Net<float> net("path_to_deploy_prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("path_to_caffemodel");

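    // Prepare input. The shape is (batch, channels, height, width); these
    // values suit an image model and must match your network's input layer.
    std::vector<caffe::Blob<float>* > input_blobs = net.input_blobs();
    input_blobs[0]->Reshape(1, 3, 224, 224);
    net.Reshape();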

    // Fill input with data
    float* input_data = input_blobs[0]->mutable_cpu_data();
    // ... fill input_data ...

    // Run forward pass
    net.Forward();

    // Get the output
    std::vector<caffe::Blob<float>* > output_blobs = net.output_blobs();
    float* output_data = output_blobs[0]->mutable_cpu_data();

    // ... do something with the output ...
    return 0;
}

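Note that moving inference to C++ does not guarantee a speed-up. The Python libraries are themselves heavily optimized (scikit-learn's tree code is compiled), so a naive C++ port can even be slower than calling the model from Python. If high performance matters, measure both and optimize the C++ (or other compiled-language) implementation accordingly.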

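Training and using a decision tree in C++

Several C++ libraries can train a decision tree and run predictions directly, without Python. Their APIs typically follow the same create, train, predict pattern as scikit-learn.

A commonly used one is OpenCV, whose ml module provides the class cv::ml::DTrees for building and using decision trees (CvRTrees is the older OpenCV 2.x random-trees class and is not what the code below uses). The following example shows how to build a decision tree with OpenCV and then train, save, load, and predict with it.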

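#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>

int main() {
    // Placeholder dataset dimensions; replace with the real sizes
    const int n_samples = 100;
    const int n_features = 4;

    // Create decision tree
    cv::Ptr<cv::ml::DTrees> dtree = cv::ml::DTrees::create();

    // Set parameters explicitly. In OpenCV 3.x/4.x the default CVFolds value
    // can make train() fail, because cross-validation pruning is not implemented.
    dtree->setMaxDepth(8);
    dtree->setMinSampleCount(2);
    dtree->setCVFolds(0);

    // Set the training data: float features, integer class labels
    cv::Mat_<float> train_data(n_samples, n_features);
    cv::Mat_<int> train_labels(n_samples, 1);
    // ... fill train_data and train_labels ...

    // Train the decision tree
    dtree->train(train_data, cv::ml::ROW_SAMPLE, train_labels);

    // Prepare input
    cv::Mat_<float> test_data(1, n_features);
    // ... fill test_data ...

    // Run prediction
    float result = dtree->predict(test_data);

    // Save the model
    dtree->save("decision_tree.xml");

    // Load the model
    cv::Ptr<cv::ml::DTrees> dtree_loaded = cv::ml::DTrees::load("decision_tree.xml");

    // Use the loaded model to make predictions on new data
    // (renamed: the original declared `result` twice, which does not compile)
    float result_loaded = dtree_loaded->predict(test_data);

    return 0;
}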