模型快速部署之决策树模型转SQL-case-when概率代码_如何把决策树判断条件转化为sql 的case when-CSDN博客

本文链接：https://blog.csdn.net/yang1015661763/article/details/131363298

众所周知，模型训练阶段是很容易的，把数据预处理好，然后把算法包调用下，调个参数就基本可以了，现实中涉及模型的上线我们就需要把模型转成pwwl进行上线。模型上线工作量大，开发周期长。而对于实际的数据分析人员而言，如果能够开发sql语句却十分的方便，本文提供一个决策树模型转sql代码上线的案例如下：

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 定义随机种子和样本数
random_state = 42
n_samples = 10000

# 生成模拟数据集
X, y = make_classification(n_samples=n_samples, n_features=10, n_informative=5,
                           n_classes=2, weights=[0.8, 0.2], random_state=random_state)

# 将特征和标签转换为 Pandas 数据框
data = pd.DataFrame(X, columns=[f'feature{i}' for i in range(1, 11)])

# 添加几个类别型特征
cat_feature_1 = ['A', 'B', 'C', 'D']
cat_feature_2 = ['E', 'F', 'G', 'H']
data['cat_var1'] = np.random.choice(cat_feature_1, size=n_samples)
data['cat_var2'] = np.random.choice(cat_feature_2, size=n_samples)

# 添加一些噪声特征
noise = np.random.normal(0, 1, size=(n_samples, 3))
data['noise1'] = noise[:, 0]
data['noise2'] = noise[:, 1]
data['noise3'] = noise[:, 2]

data['target'] = y

# 打乱数据集并保存到 CSV 文件中
data = data.sample(frac=1, random_state=random_state).reset_index(drop=True)

data['cat_var1']=data['cat_var1'].replace({'A':1, 'B':2, 'C':3, 'D':4})
data['cat_var2']=data['cat_var2'].replace({'E':1, 'F':2, 'G':3, 'H':4})
X,y=data.drop(['target'],axis=1),data['target']

案例数据准备好以后，我们就开始训练模型

from sklearn import tree
from sklearn.tree import export_text
# 创建并训练决策树回归模型
clf = tree.DecisionTreeClassifier(criterion='entropy',max_depth=3)
clf.fit(X, y)

# 使用export_text函数获取决策树的文本表示
tree_text = export_text(clf,feature_names=list(X.columns))
# 打印决策树的文本表示
print("决策树的节点信息：")
print(tree_text)

决策树的节点信息：
|--- feature5 <= -0.21
|   |--- feature1 <= -1.79
|   |   |--- feature5 <= -2.15
|   |   |   |--- class: 0
|   |   |--- feature5 >  -2.15
|   |   |   |--- class: 1
|   |--- feature1 >  -1.79
|   |   |--- feature7 <= 0.68
|   |   |   |--- class: 0
|   |   |--- feature7 >  0.68
|   |   |   |--- class: 0
|--- feature5 >  -0.21
|   |--- feature1 <= -0.55
|   |   |--- feature5 <= 1.83
|   |   |   |--- class: 1
|   |   |--- feature5 >  1.83
|   |   |   |--- class: 0
|   |--- feature1 >  -0.55
|   |   |--- feature7 <= 0.05
|   |   |   |--- class: 0
|   |   |--- feature7 >  0.05
|   |   |   |--- class: 0

这里我们通过 export_text方法解析决策树模型的结构，然后通过自定义代export_graphviz_to_sql转成sql的case when 模型的概率值

tree.export_graphviz(clf,out_file='tree_rule.dot',
                            feature_names=list(X.columns),
                            class_names=['0','1'],
                            filled=True,
                            node_ids=True,
                            rounded=True,
                            special_characters=True,
                            proportion=True)

# 调用插件  
from export_graphviz_to_sql import to_sql 
to_sql('tree_rule.dot', 'tree_rule.sql')

=====Successfully converted decision tree to SQL======

最终的sql代码如下，这个模式是概率形式的。如果单纯想要target值，建议大家用graphviz_sql即可

select case
when feature5 <= -0.205 and feature1 <= -1.79 and feature5 <= -2.15 then 0.0
when feature5 <= -0.205 and feature1 <= -1.79 and feature5 > -2.15 then 0.981
when feature5 <= -0.205 and feature1 > -1.79 and feature7 <= 0.675 then 0.009
when feature5 <= -0.205 and feature1 > -1.79 and feature7 > 0.675 then 0.425
when feature5 > -0.205 and feature1 <= -0.547 and feature5 <= 1.834 then 0.887
when feature5 > -0.205 and feature1 <= -0.547 and feature5 > 1.834 then 0.07
when feature5 > -0.205 and feature1 > -0.547 and feature7 <= 0.048 then 0.108
when feature5 > -0.205 and feature1 > -0.547 and feature7 > 0.048 then 0.472
else 0 end;

export_graphviz_to_sql包的下载路径我这边也给，大家可以自己去下载使用