Kaggle Case Study: Telecom Customer Churn Prediction, Part Three: Model Building and Visualization

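This article walks through the Kaggle telecom customer churn prediction case. It shows how to predict churn with a range of machine-learning models and techniques (linear models, SMOTE, RFE, univariate feature selection, decision trees, KNN, random forest, naive Bayes, SVM, LightGBM and XGBoost) and visualizes the results of each model. It covers dataset splitting, feature selection and model evaluation.
(Summary generated automatically by CSDN.)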

5 Model Building

Splitting the dataset and building the helper functions

1. Load the required libraries

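import pandas as pd
import plotly.graph_objs as go
import plotly.tools as tls  # tls.make_subplots is deprecated in newer plotly but still works
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
# `scorer` is no longer imported: sklearn.metrics.scorer was a deprecated module
# that has been removed from current scikit-learn, and nothing below uses it
from sklearn.metrics import roc_auc_score, roc_curve, f1_score
from sklearn.metrics import precision_score, recall_score
from yellowbrick.classifier import DiscriminationThreshold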

2. Split the data

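# Split into training and test sets (75% / 25%)
train, test = train_test_split(telcom, test_size=0.25, random_state=111)
# Separate the independent variables (features) from the dependent variable (target)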
cols = [i for i in telcom.columns if i not in Id_col+target_col]
train_X = train[cols]
train_Y = train[target_col]
test_X = test[cols]
test_Y = test[target_col]
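To make the split step concrete, here is a minimal, self-contained sketch on a small synthetic table. The `telcom` frame, its column names (`customerID`, `tenure`, `MonthlyCharges`, `Churn`) and the contents of `Id_col`/`target_col` are hypothetical stand-ins for the variables built in the earlier parts of this series.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed `telcom` table from earlier parts
rng = np.random.default_rng(0)
telcom = pd.DataFrame({
    "customerID": [f"c{i:03d}" for i in range(100)],
    "tenure": rng.integers(0, 72, size=100),
    "MonthlyCharges": rng.uniform(20, 120, size=100).round(2),
    "Churn": rng.integers(0, 2, size=100),
})
Id_col = ["customerID"]   # assumed: list of ID columns
target_col = ["Churn"]    # assumed: list holding the target column

# 25% of the rows go to the test set; random_state makes the split reproducible
train, test = train_test_split(telcom, test_size=0.25, random_state=111)

# Every column that is neither an ID nor the target is a feature
cols = [c for c in telcom.columns if c not in Id_col + target_col]

train_X, train_Y = train[cols], train[target_col]
test_X, test_Y = test[cols], test[target_col]

print(train_X.shape, test_X.shape)  # (75, 2) (25, 2)
print(cols)                         # ['tenure', 'MonthlyCharges']
```

Because `target_col` is a list, `train[target_col]` is a one-column DataFrame rather than a Series; many scikit-learn estimators accept it but emit a `DataConversionWarning` about a column-vector `y`. Selecting `train[target_col[0]]` instead yields a 1-D Series and avoids the warning.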

3. Building the modeling and visualization function

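3.1 Parameters of the modeling function
# Function arguments
# algorithm      - the estimator to fit
# training_x     - feature data used for training
# testing_x      - feature data of the test set, used for prediction
# training_y     - target variable (for training)
# testing_y      - target variable (for testing)
# cols           - feature column names, in the same order as the columns of training_x
# cf             - 'coefficients' (read coef_, e.g. logistic regression) or
#                  'features' (read feature_importances_, e.g. tree-based models)
# threshold_plot - if True, also draw the model's discrimination-threshold plot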
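The `cf` switch exists because linear models and tree ensembles expose feature weights under different attributes. The sketch below isolates that step on synthetic data; the helper name `importance_table` and the feature names `f0`..`f3` are hypothetical, and the weight column is called `coefficients` for both model types to mirror the function that follows.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def importance_table(model, cols, cf):
    """Return a feature/weight table sorted by weight, largest first.

    cf='coefficients' reads `coef_` (linear models such as LogisticRegression);
    cf='features' reads `feature_importances_` (tree ensembles).
    """
    if cf == "coefficients":
        weights = model.coef_.ravel()  # binary case: one weight per feature
    elif cf == "features":
        weights = model.feature_importances_
    else:
        raise ValueError("cf must be 'coefficients' or 'features'")
    table = pd.DataFrame({"coefficients": weights, "features": cols})
    return table.sort_values(by="coefficients", ascending=False).reset_index(drop=True)


# Synthetic binary problem with four features (hypothetical names)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
cols = ["f0", "f1", "f2", "f3"]

lr = LogisticRegression(max_iter=1000).fit(X, y)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

print(importance_table(lr, cols, "coefficients"))
print(importance_table(rf, cols, "features"))
```

Note that logistic-regression coefficients are signed and scale-dependent, while random-forest importances are non-negative and sum to 1, so the two bar charts are not directly comparable.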
3.2 A function that takes an algorithm and the datasets and reports the model's performance metrics
def telecom_churn_prediction(algorithm, training_x, testing_x, training_y, testing_y,
                             cols, cf, threshold_plot):
    # Fit the model and predict on the test set
    algorithm.fit(training_x, training_y)
    preds = algorithm.predict(testing_x)
    prob = algorithm.predict_proba(testing_x)

    # Feature weights: coef_ for linear models, feature_importances_ for tree models
    if cf == 'coefficients':
        coefficients = pd.DataFrame(algorithm.coef_.ravel())
    elif cf == 'features':
        coefficients = pd.DataFrame(algorithm.feature_importances_)
    else:
        raise ValueError("cf must be 'coefficients' or 'features'")

    column_df = pd.DataFrame(cols)
    coef_sumry = pd.merge(coefficients, column_df, left_index=True,
                          right_index=True, how='left')
    coef_sumry.columns = ['coefficients', 'features']
    coef_sumry = coef_sumry.sort_values(by='coefficients', ascending=False)

    print(algorithm)
    print('\n Classification report: \n', classification_report(testing_y, preds))
    print('Accuracy Score: ', accuracy_score(testing_y, preds))

    # Confusion matrix
    conf_matrix = confusion_matrix(testing_y, preds)

    # ROC AUC, computed from the positive-class probability so that it equals
    # the area under the ROC curve plotted below (hard labels give a different number)
    model_roc_auc = roc_auc_score(testing_y, prob[:, 1])
    print('Area under curve: ', model_roc_auc, '\n')
    fpr, tpr, thresholds = roc_curve(testing_y, prob[:, 1])

    # Confusion-matrix heatmap
    trace1 = go.Heatmap(z=conf_matrix, x=['Not Churn', 'Churn'], y=['Not Churn', 'Churn'],
                        showscale=False, colorscale='Picnic', name='Matrix')

    # ROC curve plus the diagonal of a random classifier
    trace2 = go.Scatter(x=fpr, y=tpr, name='Roc: ' + str(model_roc_auc),
                        line=dict(color='rgb(22,96,167)', width=2))
    trace3 = go.Scatter(x=[0, 1], y=[0, 1],
                        line=dict(color='rgb(205, 12, 24)', width=2, dash='dot'))

    # Bar chart of coefficients / feature importances
    trace4 = go.Bar(x=coef_sumry['features'], y=coef_sumry['coefficients'],
                    name='Coefficients',
                    marker=dict(color=coef_sumry['coefficients'],
                                colorscale='Picnic',
                                line=dict(width=0.6, color='black')))

    # Combine into a 2x2 grid; the bar chart spans the whole second row
    fig = tls.make_subplots(rows=2, cols=2, specs=[[{}, {}], [{'colspan': 2}, None]],
                            subplot_titles=('Confusion Matrix', 'Receiver operating characteristic',
                                            'Feature Importances'))

    fig.append_trace(trace1, 1, 1)
    fig.append_trace(trace2, 1, 2)
    fig.append_trace(trace3, 1, 2)
    fig.append_trace(trace4, 2, 1)

    fig['layout'].update(showlegend=False, title='Model performance', autosize=False,
                         height=900, width=800,
                         plot_bgcolor='rgba(240,240,240,0.95)', paper_bgcolor='rgba(240,240,240,0.95)',
                         margin=dict(b=195))
    fig['layout']['xaxis2'].update(dict(title='false positive rate'))
    # The original listing is cut off after the line above; the lines below are a
    # minimal completion (y-axis label, display, optional threshold plot), not the
    # author's exact code.
    fig['layout']['yaxis2'].update(dict(title='true positive rate'))
    fig.show()

    if threshold_plot:
        visualizer = DiscriminationThreshold(algorithm)
        visualizer.fit(training_x, training_y)
        visualizer.show()  # named poof() in yellowbrick versions before 1.0
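For checking results without rendering plotly figures, a plot-free sketch of the same evaluation can help. It also makes visible that `roc_auc_score` returns different numbers for hard labels and for probabilities; only the probability version equals the area under an ROC curve drawn from `prob[:, 1]`. The `threshold_sweep` helper approximates what yellowbrick's `DiscriminationThreshold` visualizes (as I understand it, the visualizer additionally repeats the fit over several shuffled splits and plots a queue rate). The helper names and the synthetic, roughly 3:1 imbalanced dataset are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Headline metrics of a fitted binary classifier, without any plots."""
    preds = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    return {
        "accuracy": accuracy_score(y_test, preds),
        "auc_from_labels": roc_auc_score(y_test, preds),  # one operating point only
        "auc_from_probs": roc_auc_score(y_test, prob),    # area under the full ROC curve
        "confusion_matrix": confusion_matrix(y_test, preds),
    }


def threshold_sweep(model, X_test, y_test, thresholds):
    """Precision, recall and F1 of the positive class at each probability cutoff."""
    prob = model.predict_proba(X_test)[:, 1]
    rows = []
    for t in thresholds:
        preds = (prob >= t).astype(int)
        rows.append({
            "threshold": t,
            "precision": precision_score(y_test, preds, zero_division=0),
            "recall": recall_score(y_test, preds, zero_division=0),
            "f1": f1_score(y_test, preds, zero_division=0),
        })
    return pd.DataFrame(rows)


# Synthetic, imbalanced binary data (about 73% negatives before label noise)
X, y = make_classification(n_samples=1000, weights=[0.73], random_state=111)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=111)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

metrics = evaluate(model, X_test, y_test)
print(metrics["accuracy"], metrics["auc_from_labels"], metrics["auc_from_probs"])
print(threshold_sweep(model, X_test, y_test, np.linspace(0.1, 0.9, 9)))
```

Raising the cutoff trades recall for precision; for churn, where missing a leaving customer is usually costlier than a false alarm, a cutoff below 0.5 is often worth considering.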