Logistic Regression: Credit Card Fraud Detection

Latest recommended article published on 2022-03-31 21:34:43

孜孜不倦就是我

Category: Machine Learning

Article link: https://blog.csdn.net/qq_36448051/article/details/81568506

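This article applies logistic regression to credit card fraud data and discusses strategies for handling class imbalance, namely undersampling and oversampling, along with building the logistic regression model and choosing the best penalty strength. Cross-validation and the confusion matrix are used to evaluate model performance; the results indicate that the oversampling strategy and threshold adjustment are key to improving recall.

Summary generated automatically by CSDN.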

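Cross-validation, recall, and regularization penalties are all covered here.
Code and data: https://pan.baidu.com/s/1E-n0iCNr4oFr4VxPPFrSBg (password: fjvk)
Easing into machine learning, one step at a time.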

1. Looking at the data

# Credit card fraud detection
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

data = pd.read_csv("creditcard.csv")
print(data.head())

# value_counts tallies how many rows of the Class column are 0 and how many are 1
# (Series.value_counts replaces the deprecated top-level pd.value_counts)
count_classes = data['Class'].value_counts().sort_index()
print(count_classes)
count_classes.plot(kind='bar')  # bar chart, drawn directly through pandas
plt.title("Fraud class histogram")
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.xticks(rotation=0)
plt.show()

The gap between the class 0 and class 1 counts:

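By common sense, we infer that a Class value of 0 marks a normal transaction and a value of 1 marks an abnormal one that may involve credit card fraud. The latter is the class we want to find.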
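To put a number on the imbalance, the class counts can be turned into fractions. The snippet below is a small sketch on a made-up label column (the values are illustrative, not taken from creditcard.csv); on the real data the same two calls would be applied to data['Class'].

```python
import pandas as pd

# Toy stand-in for data['Class']: 8 normal transactions and 2 fraudulent ones
labels = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], name="Class")

counts = labels.value_counts().sort_index()                # absolute count per class
shares = labels.value_counts(normalize=True).sort_index()  # fraction of rows per class

print(counts)
print(shares)
```

Reading the fractions rather than the raw counts makes it easier to judge how skewed the labels are before choosing a resampling strategy.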

Looking at the table, two things stand out:

1.1 Large differences in feature scale

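The V1-V28 columns mostly take values between -2 and 1, whereas the Amount column varies widely: among the first five rows the smallest value is 2.69 and the largest is 378.66. This can mislead a learning algorithm into treating features with large values as more important and features with small values as unimportant. To put the features on a comparable footing, import the StandardScaler class from sklearn's preprocessing module and use fit_transform to transform the data.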


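from sklearn.preprocessing import StandardScaler
# Standardize Amount into a new feature. A pandas Series has no reshape method
# in current pandas, so go through .values to get a NumPy array first.
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1)).ravel()
data = data.drop(['Time', 'Amount'], axis=1)  # drop the features that are no longer needed
print(data.head())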

Printing the first five rows confirms that the transformation has been applied.
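For intuition about what the standardization does, the sketch below computes the same z-score by hand with NumPy: subtract the column mean and divide by the population standard deviation, which is the formula StandardScaler documents. The five amounts are illustrative values only.

```python
import numpy as np

# Illustrative amounts spanning roughly the range mentioned in the text
amounts = np.array([149.62, 2.69, 378.66, 123.50, 69.99])

mean = amounts.mean()
std = amounts.std()              # population std (ddof=0), matching StandardScaler
scaled = (amounts - mean) / std  # z-score: zero mean, unit variance

print(scaled)
print(scaled.mean(), scaled.std())
```

After scaling, the column has mean 0 and standard deviation 1 (up to floating-point error), so its magnitude is comparable to that of the V1-V28 features.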

1.2 Extreme imbalance between normal and fraudulent samples

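Ways to handle class imbalance:

Oversampling:
Generate additional samples for the minority class until both classes are equally numerous.
Undersampling:
Draw from the majority class as many samples as the minority class has, then combine the two, so that both classes are equally few.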
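The two strategies can be compared on a toy label array. This is only a sketch: the oversampling shown here simply redraws existing minority rows with replacement, a plain stand-in for whatever generation method is actually used, and the array `y` and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

y = np.array([0] * 10 + [1] * 2)       # 10 majority labels, 2 minority labels
majority_idx = np.flatnonzero(y == 0)
minority_idx = np.flatnonzero(y == 1)

# Undersampling: keep only as many majority rows as there are minority rows
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
under_idx = np.concatenate([minority_idx, kept_majority])

# Random oversampling: redraw minority rows with replacement until the classes match
extra_minority = rng.choice(minority_idx, size=len(majority_idx), replace=True)
over_idx = np.concatenate([majority_idx, extra_minority])

print(np.bincount(y[under_idx]))  # [2 2]
print(np.bincount(y[over_idx]))   # [10 10]
```

Undersampling discards most of the majority class, while oversampling keeps every original row at the cost of repeated (or, with a generative method, synthetic) minority samples.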

2 Undersampling strategy

2.1 Building the undersampled dataset

# .ix has been removed from pandas; .loc does the same label-based selection
X = data.loc[:, data.columns != 'Class']  # all samples, every column except Class
y = data.loc[:, data.columns == 'Class']  # all samples, only the Class column (our label)

# Number of data points in the minority class
number_records_fraud = len(data[data.Class == 1])      # how many rows have Class == 1
fraud_indices = np.array(data[data.Class == 1].index)  # index labels of the Class == 1 rows
# print(fraud_indices)
# Picking the indices of the normal classes
normal_indices = data[data.Class == 0].index  # index labels of the Class == 0 rows

# Out of the indices we picked, randomly select number_records_fraud of them;
# replace=False samples without replacement
random_normal_indices = np.random.choice(normal_indices, number_records_fraud, replace=False)
random_normal_indices = np.array(random_normal_indices)
# convert the selected indices to an np.array

# Appending the 2 indices
under_sample_indices = np.concatenate([fraud_indices, random_normal_indices])

# Under sample dataset: the rows behind the combined indices
# (the indices are index labels, so select with .loc)
under_sample_data = data.loc[under_sample_indices, :]

X_undersample = under_sample_data.loc[:, under_sample_data.columns != 'Class']
y_undersample
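
The listing stops at `y_undersample`. As a sketch of how the whole step could be wrapped up, the helper below (a hypothetical `undersample` function, not taken from the article) repeats the index-sampling logic and returns both the feature frame and the label frame, assuming the labels are selected the same way as the features and that class 1 is the minority. It runs on a small made-up DataFrame rather than creditcard.csv.

```python
import numpy as np
import pandas as pd


def undersample(df, label_col="Class", seed=0):
    """Return (X, y) with every label-1 row plus an equal number of label-0 rows."""
    rng = np.random.default_rng(seed)
    fraud_idx = df.index[df[label_col] == 1].to_numpy()
    normal_idx = df.index[df[label_col] == 0].to_numpy()
    # Sample without replacement so no normal row is picked twice
    picked_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    balanced = df.loc[np.concatenate([fraud_idx, picked_normal])]
    X = balanced.drop(columns=[label_col])  # features only
    y = balanced[[label_col]]               # label column, kept as a DataFrame
    return X, y


# Toy frame standing in for the credit card data: 3 fraud rows out of 20
toy = pd.DataFrame({
    "V1": np.linspace(-1.0, 1.0, 20),
    "normAmount": np.linspace(0.0, 2.0, 20),
    "Class": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
})

X_under, y_under = undersample(toy)
print(X_under.shape, y_under.shape)
print(y_under["Class"].value_counts())
```

Fixing the seed makes the sampled subset reproducible, which helps when comparing models trained on the undersampled data.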
