Anomaly Detection (Credit Card Fraud): A Hands-On Logistic Regression Case Study

Latest recommended article published on 2023-09-10 22:06:50

Amanda_ABAP_Python

Article link: https://blog.csdn.net/Amanda_python/article/details/109365990

1. Reading the data

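import warnings

import numpy as np               # array and matrix computation
import pandas as pd              # data loading and analysis
import matplotlib.pyplot as plt  # plotting

warnings.filterwarnings('ignore')  # silence library warnings in the notebook output
# Render plots inline in the notebook. The comment sits on its own line because
# a line magic receives the rest of its line as its argument.
%matplotlib inline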

The Amount column needs to be standardized because its values are on a very different scale from the V1-V28 features.
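To make the scale difference concrete, here is a minimal sketch on synthetic data. The values below are made up for illustration and are not taken from creditcard.csv; the variable names v1, amount and norm_amount are hypothetical. It only shows that standardizing (subtract the mean, divide by the standard deviation) brings a large-valued column onto the same scale as a small-valued one.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical values: a small-scale feature and a large-scale transaction amount
v1 = rng.normal(loc=0.0, scale=1.0, size=1000)
amount = rng.exponential(scale=90.0, size=1000)

# Standardization: subtract the mean, divide by the (population) standard deviation.
# This is the same transform StandardScaler applies with its default settings.
norm_amount = (amount - amount.mean()) / amount.std()

print('V1 range:        ', v1.min(), v1.max())
print('Amount range:    ', amount.min(), amount.max())
print('normAmount range:', norm_amount.min(), norm_amount.max())
```

After the transform the column has zero mean and unit variance, so it no longer dominates features that live near zero.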

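# Load the data
import os
os.chdir('C:/Users/Liu/Desktop')  # folder containing creditcard.csv; adjust to your machine
data = pd.read_csv('creditcard.csv')
data.head()
# Count samples per class (0 = normal, 1 = fraud) to see how imbalanced the data is
count_class = data['Class'].value_counts(sort=True)
#count_class.plot(kind = 'bar') # plot to inspect the distribution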
 
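# Standardize the data
from sklearn.preprocessing import StandardScaler
# .values returns the column as a NumPy array; reshape(-1, 1) makes it the 2-D column that StandardScaler expects
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1,1)).ravel()
data = data.drop(['Time','Amount'],axis = 1)  # drop Time and the raw Amount column
data.head()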

2. Solutions to the class imbalance problem

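Solution: equalize the number of label-0 and label-1 samples before training the model.
Option 1 (undersampling): shrink the larger class (label 0) until it is as small as the smaller class (label 1).
Option 2 (oversampling): grow the smaller class (label 1) until it is as large as the larger class (label 0).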
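The code in this section follows Option 1. For comparison, Option 2 can be sketched as below on a toy label array. This is only one simple way to oversample (random duplication of minority indices, drawn with replacement); other techniques exist, and the names y, over_sample_idx and y_over are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical labels: 990 normal (0) and 10 fraud (1) samples
y = np.array([0] * 990 + [1] * 10)

normal_idx = np.flatnonzero(y == 0)
fraud_idx = np.flatnonzero(y == 1)

# Draw fraud indices WITH replacement until they match the normal count
resampled_fraud_idx = rng.choice(fraud_idx, size=len(normal_idx), replace=True)

over_sample_idx = np.concatenate([normal_idx, resampled_fraud_idx])
y_over = y[over_sample_idx]

print(np.bincount(y_over))  # [990 990]
```

Note the trade-off: undersampling discards most normal records, while random oversampling repeats the same few fraud records many times.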

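# Features: all rows, every column except 'Class'
x = data.loc[:,data.columns !='Class']
# Labels: all rows, only the 'Class' column (kept as a one-column DataFrame)
y = data.loc[:,data.columns == 'Class']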


# Number of fraud samples and their index labels
number_records_fraud = len(data[data.Class ==1])
#fraud_indices = np.array(data[data.Class == 1].index)
fraud_indices = data[data.Class == 1].index


# Index labels of all normal samples
#number_records_normal = len(data[data.Class == 0])
#normal_indices = np.array(data[data.Class == 0].index)
normal_indices = data[data.Class == 0].index

# Randomly draw as many normal samples as there are fraud samples (without replacement) and keep their indices
random_normal_indices = np.random.choice(normal_indices,number_records_fraud,replace = False)
random_normal_indices = np.array(random_normal_indices) # already an ndarray; kept for explicitness

# Combine the fraud indices and the sampled normal indices
under_sample_indices = np.concatenate([fraud_indices,random_normal_indices])

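# Select the undersampled rows. The indices are index labels, so use .loc;
# .iloc only gives the same rows while the frame keeps its default 0..n-1 index.
under_sample_data = data.loc[under_sample_indices,:]
print(len(under_sample_data))  # twice the number of fraud samples
X_undersample = under_sample_data.loc[:,under_sample_data.columns !='Class']
Y_undersample = under_sample_data.loc[:,under_sample_data.columns =='Class']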
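
The undersampling steps above can be wrapped in a small helper, sketched here on a toy label array. The function name undersample_indices and its seed parameter are hypothetical and not part of the original code; the sketch assumes label 1 is the smaller class. Passing a seed makes the random draw repeatable, and the final shuffle avoids having all fraud rows first.

```python
import numpy as np

def undersample_indices(labels, seed=None):
    """Return shuffled positions giving equal numbers of label-0 and label-1 samples."""
    labels = np.asarray(labels).ravel()
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(labels == 1)
    normal_idx = np.flatnonzero(labels == 0)
    # Sample without replacement so no normal record is picked twice
    sampled_normal_idx = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    idx = np.concatenate([fraud_idx, sampled_normal_idx])
    rng.shuffle(idx)  # mix fraud and normal positions
    return idx

# Toy labels: 95 normal, 5 fraud
labels = np.array([0] * 95 + [1] * 5)
idx = undersample_indices(labels, seed=0)
print(len(idx), labels[idx].sum())  # 10 5
```

The helper returns integer positions rather than index labels, so on the real frame it would be used with positional indexing, for example data.iloc[undersample_indices(data['Class'], seed=0)].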