Logistic Regression Model: Stock Customer Churn Early-Warning Model (2)

遇鱼语渔

Article link: https://blog.csdn.net/weixin_45451576/article/details/126176316

First, a recap of the previous part: Logistic Regression Model - Stock Customer Churn Early-Warning Model

The code is as follows:

# 1. Read the data
import pandas as pd
df = pd.read_excel('股票客户流失.xlsx')

# 2. Split the features and the target
X = df.drop(['是否流失'], axis=1)  # features: every column except the target '是否流失' (churned: 1 = yes, 0 = no)
y = df['是否流失']

# 3. Split into training and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# 4. Build and fit the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

# 5. Using the model (1): predict class labels
y_pred = model.predict(X_test)
print(y_pred[0:100])  # print the first 100 predictions

# Overall prediction accuracy on the test set
from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_pred)  # true labels first, then predictions
print(score)

# 6. Using the model (2): predict class probabilities
y_pred_proba = model.predict_proba(X_test)
print(y_pred_proba[0:5])  # [P(class 0), P(class 1)] for the first 5 customers
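In the recap above, predict() and predict_proba() are two views of the same model. For binary classification, LogisticRegression computes a linear score z = w·x + b, turns it into a churn probability with the sigmoid 1 / (1 + e^(-z)), and predict() returns 1 exactly when z > 0, i.e. when that probability exceeds 0.5. The plain-Python sketch below illustrates this decision rule and the accuracy calculation; the weights `w`, `b`, the sample rows and the helper names are hypothetical, not taken from the fitted model.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(x, w, b):
    """Linear score z = w . x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_proba_row(x, w, b):
    """Return [P(class 0), P(class 1)], like one row of predict_proba."""
    p1 = sigmoid(linear_score(x, w, b))
    return [1.0 - p1, p1]

def predict_row(x, w, b):
    """Label 1 when the score is positive, i.e. when P(class 1) > 0.5."""
    return 1 if linear_score(x, w, b) > 0 else 0

def accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical weights and samples, for illustration only
w, b = [0.8, -1.2], 0.1
X = [[2.0, 0.5], [0.1, 1.0], [1.0, 1.0]]
y_true = [1, 0, 1]

y_pred = [predict_row(x, w, b) for x in X]
probas = [predict_proba_row(x, w, b) for x in X]
print(y_pred)
print([round(p[1], 3) for p in probas])
print(accuracy(y_true, y_pred))
```

Thresholding the probability at 0.5 and checking the sign of z give the same labels, which is why predict() needs no separate threshold parameter.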

Next, we cover:

Model evaluation methods - the ROC curve and the KS curve

1. Evaluating classification models - the ROC curve

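from sklearn.metrics import confusion_matrix
m = confusion_matrix(y_test, y_pred)  # true labels first, then predicted labels
print(m)

a = pd.DataFrame(m, index=['0 (actual: not churned)', '1 (actual: churned)'], columns=['0 (predicted: not churned)', '1 (predicted: churned)'])
a  # in Jupyter, the last expression of a cell is displayed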
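confusion_matrix(y_test, y_pred) puts the true labels on the rows and the predicted labels on the columns, so for labels 0/1 the layout is [[TN, FP], [FN, TP]]; that is why the DataFrame above labels its rows as actual and its columns as predicted. A minimal plain-Python sketch of the same counting, on hypothetical toy labels (the function name is illustrative):

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count a binary confusion matrix laid out as [[TN, FP], [FN, TP]].

    Rows are the actual class (0, 1) and columns the predicted class (0, 1),
    the same layout as sklearn.metrics.confusion_matrix for labels 0 and 1.
    """
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Hypothetical toy labels (1 = churned, 0 = not churned)
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 1, 0]
m = confusion_matrix_2x2(y_true, y_pred)
print(m)
```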

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))  # true labels first, then predicted labels
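classification_report summarizes the same confusion matrix per class. Treating churn (class 1) as the positive class, precision = TP / (TP + FP), recall = TP / (TP + FN), F1 is their harmonic mean, and support is the number of true samples of the class; the recall of class 1 is the same quantity as the hit rate (TPR) used for the ROC curve. A minimal sketch computing these rows from hypothetical counts TN=4, FP=1, FN=1, TP=2; the function name binary_report is illustrative, and setting a division by zero to 0.0 is an assumption of this sketch.

```python
def binary_report(tn, fp, fn, tp):
    """Per-class precision, recall, F1 and support from 2x2 confusion counts."""
    def safe_div(a, b):
        # Assumption for this sketch: report 0.0 when the denominator is zero
        return a / b if b else 0.0

    report = {}
    # Class 1: churned customers are the positives.
    # Class 0: roles swap, so TN acts as TP, FN as FP and FP as FN.
    for label, (tp_c, fp_c, fn_c) in {1: (tp, fp, fn), 0: (tn, fn, fp)}.items():
        precision = safe_div(tp_c, tp_c + fp_c)
        recall = safe_div(tp_c, tp_c + fn_c)
        f1 = safe_div(2 * precision * recall, precision + recall)
        report[label] = {"precision": precision, "recall": recall,
                         "f1-score": f1, "support": tp_c + fn_c}
    return report

report = binary_report(tn=4, fp=1, fn=1, tp=2)
for label in (0, 1):
    print(label, {k: round(v, 3) for k, v in report[label].items()})
```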

2. Case study - evaluating the stock customer churn early-warning model

y_pred_proba[:, 1]  # second column: each test customer's predicted probability of churning (class 1)
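The [:, 1] selects the second column of predict_proba. In scikit-learn, the columns of predict_proba follow the order of model.classes_, which holds the sorted class labels, so column 1 is the churn probability here only because the labels are 0 and 1. A small plain-Python sketch of selecting the column by label rather than by position; the helper positive_class_column and the probability rows are hypothetical.

```python
def positive_class_column(proba_rows, classes, positive=1):
    """Pick each row's probability for `positive`, using the class order.

    predict_proba columns follow model.classes_ (sorted labels), so with
    classes == [0, 1] this matches y_pred_proba[:, 1].
    """
    j = list(classes).index(positive)
    return [row[j] for row in proba_rows]

# Hypothetical predict_proba output for three customers
classes = [0, 1]
proba_rows = [[0.9, 0.1], [0.35, 0.65], [0.5, 0.5]]
churn_proba = positive_class_column(proba_rows, classes)
print(churn_proba)

# With string labels the sorted order can flip: sorted(["stay", "churn"]) -> ["churn", "stay"]
classes_str = sorted({"stay", "churn"})
print(positive_class_column([[0.2, 0.8]], classes_str, positive="churn"))
```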

# 1. Compute what the ROC curve needs: false positive rate (fpr, the "false alarm rate"), true positive rate (tpr, the "hit rate") and thresholds (thres)
from sklearn.metrics import roc_curve
fpr, tpr, thres = roc_curve(y_test, y_pred_proba[:, 1])

# 2. Inspect the thresholds, fpr and tpr side by side
a = pd.DataFrame()  # start from an empty DataFrame
a['threshold'] = list(thres)
a['FPR'] = list(fpr)  # false alarm rate
a['TPR'] = list(tpr)  # hit rate
a.head()

# 3. Plot the ROC curve
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # CJK-capable font; only needed if the title or labels contain Chinese text
plt.plot(fpr, tpr)  # line plot of TPR against FPR
plt.title('ROC Curve')
plt.xlabel('FPR')  # x-axis label
plt.ylabel('TPR')  # y-axis label
plt.show()

# 4. Compute the model's AUC (area under the ROC curve)
from sklearn.metrics import roc_auc_score
score = roc_auc_score(y_test, y_pred_proba[:, 1])
score
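The AUC returned by roc_auc_score is the area under the ROC curve, and it also equals the probability that a randomly chosen churned customer receives a higher predicted churn probability than a randomly chosen non-churned customer (ties count as one half). The plain-Python sketch below checks this on hypothetical toy scores, computing the area once with the trapezoidal rule over the ROC points and once by comparing every positive/negative pair; all function names are illustrative.

```python
def roc_points(y_true, scores):
    """ROC points (fpr, tpr), starting at (0, 0), one step per distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), reverse=True)  # highest scores first
    fpr, tpr = [0.0], [0.0]
    tp = fp = i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Samples sharing a score cross the threshold together
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr

def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))

def auc_pairwise(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted churn probabilities
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
fpr, tpr = roc_points(y_true, scores)
print(auc_trapezoid(fpr, tpr), auc_pairwise(y_true, scores))
```

The pairwise view explains why AUC depends only on how the probabilities rank customers, not on any particular threshold.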

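Understanding the threshold values

max(y_pred_proba[:, 1])  # highest predicted churn probability; it equals thres[1], the first real threshold from roc_curve

a = pd.DataFrame(y_pred_proba, columns=['P(class 0)', 'P(class 1)'])
a = a.sort_values('P(class 1)', ascending=False)  # most likely churners first
a.head(15)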
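roc_curve uses the distinct predicted scores as candidate thresholds, in decreasing order, and counts a customer as predicted churn when the score is greater than or equal to the threshold. It also prepends one extra threshold that no sample reaches, so the curve starts at (0, 0): in scikit-learn 1.3 and later this first value is np.inf, while older versions used max(score) + 1. That is why the first threshold in the table above is not a real probability and why the KS plot skips it with thres[1:]. With the default drop_intermediate=True, roc_curve may also drop thresholds that do not change the shape of the curve, so not every distinct probability necessarily appears. A plain-Python sketch of the full threshold table on hypothetical scores (the function name is illustrative, and unlike roc_curve's default it keeps every threshold):

```python
import math

def threshold_table(y_true, scores):
    """Rows of (threshold, fpr, tpr), like the DataFrame built from roc_curve.

    Starts with an artificial +inf threshold (no sample predicted as churn),
    then every distinct score in decreasing order; a sample counts as
    predicted churn when its score >= threshold. No thresholds are dropped.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    rows = []
    for thr in [math.inf] + sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= thr and y == 0)
        rows.append((thr, fp / neg, tp / pos))
    return rows

# Hypothetical labels and predicted churn probabilities (note the tie at 0.55)
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.15, 0.40, 0.55, 0.80, 0.55, 0.90]
rows = threshold_table(y_true, scores)
for thr, fpr, tpr in rows:
    print(f"{thr:>5}  fpr={fpr:.2f}  tpr={tpr:.2f}")
```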

3. Plotting the KS curve

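from sklearn.metrics import roc_curve
fpr, tpr, thres = roc_curve(y_test, y_pred_proba[:, 1])
a = pd.DataFrame()  # start from an empty DataFrame
a['threshold'] = list(thres)
a['FPR'] = list(fpr)  # false alarm rate
a['TPR'] = list(tpr)  # hit rate
a.head()

# Skip index 0: the first threshold is an artificial value above every score, not a real probability
plt.plot(thres[1:], tpr[1:])
plt.plot(thres[1:], fpr[1:])
plt.plot(thres[1:], tpr[1:] - fpr[1:])
plt.xlabel('threshold')
plt.legend(['tpr', 'fpr', 'tpr-fpr'])
plt.gca().invert_xaxis()  # reverse the x-axis so thresholds decrease from left to right
plt.show()

# KS value and its threshold
a['TPR-FPR'] = a['TPR'] - a['FPR']
a.head()

a[a['TPR-FPR'] == a['TPR-FPR'].max()]  # row(s) with the largest gap: the KS value and the threshold where it occurs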
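The KS value is the largest vertical gap between the TPR and FPR curves, KS = max over all thresholds of (TPR - FPR), and the threshold where it occurs is one candidate cut-off for flagging customers as likely to churn. A plain-Python sketch on hypothetical toy scores, using the same "score >= threshold" rule as roc_curve; the function name ks_statistic is illustrative.

```python
def ks_statistic(y_true, scores):
    """Return (ks, threshold): the largest TPR - FPR gap and where it occurs.

    Thresholds are the distinct scores in decreasing order; on ties in the
    gap, the first (highest) threshold is kept.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_ks, best_thr = 0.0, None
    for thr in sorted(set(scores), reverse=True):
        tpr = sum(1 for s, y in zip(scores, y_true) if s >= thr and y == 1) / pos
        fpr = sum(1 for s, y in zip(scores, y_true) if s >= thr and y == 0) / neg
        if tpr - fpr > best_ks:
            best_ks, best_thr = tpr - fpr, thr
    return best_ks, best_thr

# Hypothetical labels and predicted churn probabilities
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]
ks, thr = ks_statistic(y_true, scores)
print(round(ks, 3), thr)
```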
