Decision Trees: A Hands-On Case Study - Credit Card Precision Marketing Model
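Data
(信用卡精准营销模型.xlsx)
Feature variables: age (年龄), monthly income (月收入), monthly spending (月消费), gender (性别), monthly spending / monthly income (月消费/月收入);
Target variable: response (响应);
Tools
Jupyter Notebook
Modeling
(1) Read “信用卡精准营销模型.xlsx” from the folder that contains the code;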
import pandas as pd
df = pd.read_excel(r'信用卡精准营销模型.xlsx')
df.head()
(2) Extract the feature variables and the target variable;
X = df.drop(columns = '响应')
y = df['响应']
(3) Split the data into training and test sets (with test_size=0.2);
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2, random_state=123)
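When responders are much rarer than non-responders, a purely random split can leave the two subsets with different response rates. The sketch below shows the optional `stratify` argument of `train_test_split`, which keeps the class ratio roughly equal on both sides. The data is a synthetic stand-in generated with `make_classification`, not the credit card file, and the 80/20 class balance is an assumption made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=1000, n_features=5,
                           weights=[0.8, 0.2], random_state=123)

# stratify=y keeps the share of responders roughly equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```

Whether stratification matters for the real data depends on how balanced the 响应 column is, which `df['响应'].value_counts()` would show.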
(4) Build the decision tree model (with default parameters);
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
(5) Predict on the test set and show the first 5 rows in a DataFrame;
y_pred = model.predict(X_test)
a = pd.DataFrame()
a['预测值'] = list(y_pred)  # predicted value
a['实际值'] = list(y_test)  # actual value
a.head(5)
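(6) Predict probabilities and show the first 5 rows in a DataFrame;
y_pred_proba = model.predict_proba(X_test)
# Columns follow model.classes_ ([0, 1]): column 0 is the probability of no response, column 1 the probability of a response
b = pd.DataFrame(y_pred_proba,columns=['不响应概率','响应概率'])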
b.head(5)
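The column order of `predict_proba` is easy to get backwards, so it helps to read it from the fitted model instead of assuming it. The sketch below, on synthetic stand-in data rather than the credit card file, labels the probability columns directly from `model.classes_`. The `P(class=...)` column names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=500, n_features=5, random_state=123)
model = DecisionTreeClassifier(max_depth=3, random_state=123).fit(X, y)

# classes_ lists the labels in the same order as the predict_proba columns
print(model.classes_)

proba = model.predict_proba(X[:5])
proba_df = pd.DataFrame(
    proba, columns=[f'P(class={c})' for c in model.classes_])
print(proba_df)
```

With 0/1 labels the second column, `[:, 1]`, is therefore the response probability, which is the column the ROC and AUC steps use.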
(7) Check the model's prediction accuracy;
from sklearn.metrics import accuracy_score
# accuracy_score expects (y_true, y_pred) in that order
score = accuracy_score(y_test,y_pred)
score
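Accuracy alone hides which kind of error the model makes, and in a marketing setting missed responders and wasted contacts carry different costs. As a minimal sketch on synthetic stand-in data (not the credit card file), the confusion matrix breaks the same predictions into the four outcome counts, and accuracy can be recomputed from them.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=1000, n_features=5,
                           weights=[0.8, 0.2], random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)
model = DecisionTreeClassifier(random_state=123).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions on the diagonal
accuracy = (tp + tn) / cm.sum()
print(cm)
print(accuracy, accuracy_score(y_test, y_pred))
```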
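(8) Plot the model's ROC curve;
from sklearn.metrics import roc_curve
# Column 1 of y_pred_proba is the probability of the positive class (response)
fpr,tpr,thres = roc_curve(y_test,y_pred_proba[:,1])
a = pd.DataFrame()
a['阈值'] = list(thres)  # threshold
a['假警报率'] = list(fpr)  # false alarm rate (FPR)
a['命中率'] = list(tpr)  # hit rate (TPR)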
import matplotlib.pyplot as plt
plt.plot(fpr, tpr)
plt.show()
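Each point on the ROC curve comes from one threshold: every sample whose predicted probability is at least the threshold is classified as a responder, and the resulting false alarm rate and hit rate give the point's coordinates. The sketch below recomputes one point by hand and compares it with the output of `roc_curve`. It uses synthetic stand-in data, and the names `proba_pos`, `tpr_manual`, and `fpr_manual` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=1000, n_features=5, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)
model = DecisionTreeClassifier(max_depth=3, random_state=123)
model.fit(X_train, y_train)
proba_pos = model.predict_proba(X_test)[:, 1]

fpr, tpr, thres = roc_curve(y_test, proba_pos)

# Index 0 is a sentinel threshold at which nothing is predicted positive,
# so the first real threshold sits at index 1
i = 1
pred = proba_pos >= thres[i]

# Hit rate: share of actual positives that are predicted positive
tpr_manual = (pred & (y_test == 1)).sum() / (y_test == 1).sum()
# False alarm rate: share of actual negatives that are predicted positive
fpr_manual = (pred & (y_test == 0)).sum() / (y_test == 0).sum()

print(thres[i], fpr[i], tpr[i], fpr_manual, tpr_manual)
```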
(9) Compute the model's AUC;
from sklearn.metrics import roc_auc_score
score = roc_auc_score(y_test,y_pred_proba[:,1])
score
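One way to read the AUC: it is the probability that a randomly chosen responder receives a higher predicted score than a randomly chosen non-responder, with ties counted as one half. The sketch below checks this pairwise interpretation against `roc_auc_score` on synthetic stand-in data. The pairwise computation is for illustration only and is not how scikit-learn computes the value internally.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=1000, n_features=5, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)
model = DecisionTreeClassifier(max_depth=3, random_state=123)
model.fit(X_train, y_train)
proba_pos = model.predict_proba(X_test)[:, 1]

pos = proba_pos[y_test == 1]  # scores of actual positives
neg = proba_pos[y_test == 0]  # scores of actual negatives

# Compare every positive with every negative; a tie counts as half a win
diff = pos[:, None] - neg[None, :]
auc_pairwise = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

auc_sklearn = roc_auc_score(y_test, proba_pos)
print(auc_pairwise, auc_sklearn)
```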
(10) View the importance of each feature in the current model;
features = X.columns
importances = model.feature_importances_
importances_df = pd.DataFrame()
importances_df['特征名称'] = features  # feature name
importances_df['特征重要性'] = importances  # feature importance
importances_df.sort_values("特征重要性",ascending = False)
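In scikit-learn, `feature_importances_` for a decision tree is impurity-based and normalized, so the values are non-negative and sum to 1. Each value can then be read as that feature's share of the total impurity reduction. The sketch below illustrates this on synthetic stand-in data; the `feature_0` … `feature_4` names are hypothetical placeholders for the real column names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=500, n_features=5, random_state=123)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]  # hypothetical names

model = DecisionTreeClassifier(max_depth=3, random_state=123).fit(X, y)

# One importance per feature, indexed by name for easier reading
importances = pd.Series(model.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
print(importances.sum())
```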
Visualizing the Decision Tree
Using Graphviz
from sklearn.tree import export_graphviz
import graphviz
import os
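# Append the Graphviz bin folder to PATH instead of overwriting PATH (adjust the folder to your own installation)
os.environ['PATH'] += os.pathsep + r'D:\Grapgviz\Graphviz\bin'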
dot_data = export_graphviz(model, out_file=None, class_names = ['0','1'])
graph = graphviz.Source(dot_data)
graph.render("result")
(For how to install Graphviz, see the next blog post.)
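If Graphviz is not installed, the tree can still be inspected as plain text. The sketch below uses `sklearn.tree.export_text`, a different function from the `export_graphviz` call above that needs no external binary. It runs on synthetic stand-in data, with hypothetical `feature_i` names and a shallow `max_depth=2` chosen only to keep the output short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=500, n_features=5, random_state=123)
feature_names = [f'feature_{i}' for i in range(X.shape[1])]  # hypothetical names

model = DecisionTreeClassifier(max_depth=2, random_state=123).fit(X, y)

# Each indented line is a split rule; leaf lines report the predicted class
rules = export_text(model, feature_names=feature_names)
print(rules)
```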
Parameter Tuning
from sklearn.model_selection import GridSearchCV
parameters = {'max_depth': [1,3,5,7,9]}
model = DecisionTreeClassifier()
grid_search = GridSearchCV(model,parameters,scoring='roc_auc',cv=5)
grid_search.fit(X_train,y_train)
grid_search.best_params_
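# Rebuild the model with max_depth=7 (hard-coded; compare with grid_search.best_params_ above)
model = DecisionTreeClassifier(max_depth=7)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
from sklearn.metrics import accuracy_score
# Store the tuned accuracy under its own name so that score still holds the AUC from step (9)
accuracy_new = accuracy_score(y_test,y_pred)
accuracy_new
y_pred_proba = model.predict_proba(X_test)
from sklearn.metrics import roc_auc_score
score_new = roc_auc_score(y_test.values,y_pred_proba[:,1])
score_new
print("AUC before tuning: " + str(score))
print("AUC after tuning: " + str(score_new))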
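Rather than copying the best `max_depth` into a new model by hand, the fitted `GridSearchCV` object can supply it. With the default `refit=True`, `best_estimator_` is a model already retrained on the full training set using the best parameters. The sketch below shows this on synthetic stand-in data, so the selected depth will not necessarily match the one found on the credit card data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit card data (assumption: not the real file)
X, y = make_classification(n_samples=1000, n_features=5, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

parameters = {'max_depth': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=123),
                           parameters, scoring='roc_auc', cv=5)
grid_search.fit(X_train, y_train)

# Read the selected depth from the search instead of hard-coding it
best_depth = grid_search.best_params_['max_depth']

# refit=True (the default) retrains the best configuration on all of X_train
best_model = grid_search.best_estimator_
auc_tuned = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(best_depth, auc_tuned)
```

Fixing `random_state` on the classifier keeps the search reproducible between runs, which the default-parameter model in the steps above does not guarantee.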