Machine Learning: Credit Card Fraud Detection (for beginners, with data and detailed Python code, 2022, TensorFlow 2)

Link to this article: https://blog.csdn.net/soga235/article/details/123648081

The data and the first part of the code come from this post: "Machine Learning Project in Practice: Credit Card Fraud Detection (for beginners, with data and detailed Python code)"

西南交大-Liu_z's blog on CSDN: https://blog.csdn.net/qq_40683479/article/details/89221558

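The referenced post selects columns with the pandas .ix indexer, which now raises an error (.ix was deprecated in pandas 0.20 and removed in pandas 1.0), so the code below uses iloc instead. For background on .ix, see: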

"Python: A Detailed Explanation of ix in pandas", anshuai_aw1's blog on CSDN: https://blog.csdn.net/anshuai_aw1/article/details/82801435
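As a rough illustration of what replaces .ix, here is a minimal sketch on a made-up three-row frame (the frame and its values are invented for illustration and are not taken from creditcard.csv). .ix accepted both labels and positions, which was ambiguous when the index held integers; .loc is strictly label-based and .iloc is strictly position-based.

```python
import pandas as pd

# Toy frame with a non-default integer index (invented for illustration).
toy = pd.DataFrame({"V1": [0.1, 0.2, 0.3], "Class": [0, 1, 0]}, index=[10, 20, 30])

# .loc selects by index label, .iloc by integer position.
row_by_label = toy.loc[10]      # the row whose label is 10
row_by_position = toy.iloc[0]   # the first row, whatever its label is

# Column selection follows the same split: names for .loc, positions for .iloc.
v1_by_name = toy.loc[:, "V1"]
v1_by_position = toy.iloc[:, 0]
```

With a default RangeIndex (0, 1, 2, ...), labels and positions coincide, which is why the two indexers often appear interchangeable on a freshly loaded CSV.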

The reference code is as follows:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('creditcard.csv')
data.head()

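# Count how many samples fall into each class (0 = normal, 1 = fraud).
# Series.value_counts() is used because the top-level pd.value_counts is deprecated in recent pandas.
count_classes = data['Class'].value_counts().sort_index()
# Plot the counts as a bar chart
count_classes.plot(kind = 'bar')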
plt.title("Fraud class histogram")
plt.xlabel("Class")
plt.ylabel("Frequency")
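The bar chart makes the imbalance visible; the same information can also be read as proportions. A small sketch on an invented label column (the 8:2 split is made up, and real fraud data is usually far more skewed than this):

```python
import pandas as pd

# Invented labels: 8 normal (0) and 2 fraud (1) transactions.
labels = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 0, 0], name="Class")

# Absolute counts per class, ordered by class label.
counts = labels.value_counts().sort_index()

# Share of each class; normalize=True divides each count by the total.
shares = labels.value_counts(normalize=True).sort_index()

print(counts.to_dict())
print(shares.to_dict())
```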

from sklearn.preprocessing import StandardScaler
# StandardScaler removes the mean and scales to unit variance.
# It works per feature (column), not per sample (row).
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
# Drop the Time and Amount columns
data = data.drop(['Time','Amount'],axis=1)
data.head()
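To make the scaling step concrete, the sketch below reproduces by hand what StandardScaler does to a single column: subtract the column mean and divide by the population standard deviation (ddof=0). The amounts are invented, and only NumPy is used so that the arithmetic stays visible.

```python
import numpy as np

# Invented transaction amounts, shaped as one feature column (n_samples, 1).
amounts = np.array([10.0, 20.0, 30.0, 40.0]).reshape(-1, 1)

# Standardize per column: zero mean, unit variance.
mean = amounts.mean(axis=0)
std = amounts.std(axis=0)  # population std (ddof=0), as StandardScaler uses
norm_amounts = (amounts - mean) / std

print(norm_amounts.ravel())
```

The reshape(-1, 1) mirrors the call in the code above: scikit-learn transformers expect a 2-D array, so a single column has to be passed with shape (n_samples, 1).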

The code above is identical to the first linked post; the code below uses iloc in place of .ix.

X = data.iloc[:, data.columns != 'Class']
y = data.iloc[:, data.columns == 'Class']
print(X)
print(y)

# X holds all feature columns, i.e. every column except Class

# y holds only the Class column
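What data.columns != 'Class' evaluates to is easiest to see on a tiny frame. In this sketch (column values invented), the comparison yields one boolean per column, and indexing with that array keeps the columns marked True. Selecting with a mask returns a DataFrame even when only one column survives, whereas data['Class'] would return a Series.

```python
import pandas as pd

# Toy frame (values invented) with two features and the label column.
toy = pd.DataFrame({"V1": [0.5, -1.2], "normAmount": [0.3, 0.9], "Class": [0, 1]})

feature_mask = toy.columns != "Class"   # one boolean per column
label_mask = toy.columns == "Class"

X = toy.iloc[:, feature_mask]  # every column except Class
y = toy.iloc[:, label_mask]    # only Class, still a one-column DataFrame

print(feature_mask.tolist())   # [True, True, False]
print(X.shape, y.shape)        # (2, 2) (2, 1)
```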

# Count the rows with Class == 1 (fraudulent transactions)
number_records_fraud = len(data[data.Class == 1])
# Row index labels of the Class == 1 rows
fraud_indices = np.array(data[data.Class == 1].index)

# Row index labels of the Class == 0 rows
normal_indices = data[data.Class == 0].index

# Randomly pick as many Class == 0 rows as there are Class == 1 rows, without replacement
random_normal_indices = np.random.choice(normal_indices, number_records_fraud, replace = False)
# Make sure the result is a NumPy array
random_normal_indices = np.array(random_normal_indices)

# Concatenate the index labels of the Class == 1 and the sampled Class == 0 rows
under_sample_indices = np.concatenate([fraud_indices,random_normal_indices])
 
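# Version from the referenced post, kept for comparison; it fails because .ix has been removed:
# # The undersampled dataset
# under_sample_data = data.iloc[under_sample_indices,:]
# # Features of the undersampled dataset
# X_undersample = under_sample_data.ix[:, under_sample_data.columns != 'Class']
# # Labels of the undersampled dataset
# y_undersample = under_sample_data.ix[:, under_sample_data.columns == 'Class']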
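# The undersampled dataset (iloc works here because data still has its default 0..n-1 index)
under_sample_data = data.iloc[under_sample_indices,:]
# Features of the undersampled dataset
X_undersample = under_sample_data.iloc[:, under_sample_data.columns != 'Class']
# Labels of the undersampled dataset
y_undersample = under_sample_data.iloc[:, under_sample_data.columns == 'Class']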
 
# Print the share of Class == 0 (normal) samples, as a fraction between 0 and 1
print("Percentage of normal transactions: ", len(under_sample_data[under_sample_data.Class == 0])/len(under_sample_data))
# Print the share of Class == 1 (fraud) samples, as a fraction between 0 and 1
print("Percentage of fraud transactions: ", len(under_sample_data[under_sample_data.Class == 1])/len(under_sample_data))
# Print the total number of rows in the undersampled data
print("Total number of transactions in resampled data: ", len(under_sample_data))
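The undersampling steps above can be collected into one helper. This is only a sketch under stated assumptions: the function name undersample and its seed parameter are hypothetical and do not come from the referenced post. It uses numpy.random.default_rng so that the draw is reproducible, and .loc rather than .iloc because the sampled values are index labels, which coincide with positions only when the frame has a default index.

```python
import numpy as np
import pandas as pd

def undersample(df, label_col="Class", seed=0):
    """Keep all rows labeled 1 plus an equal-sized random draw of rows labeled 0."""
    rng = np.random.default_rng(seed)
    fraud_idx = df.index[df[label_col] == 1].to_numpy()
    normal_idx = df.index[df[label_col] == 0].to_numpy()
    # Draw as many normal rows as there are fraud rows, without replacement.
    sampled_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    keep = np.concatenate([fraud_idx, sampled_normal])
    # .loc because `keep` holds index labels, not positions.
    return df.loc[keep]

# Invented data: 3 fraud rows among 20, deliberately with a non-default index.
demo = pd.DataFrame(
    {
        "V1": np.arange(20, dtype=float),
        "Class": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0] + [0] * 9 + [1],
    },
    index=range(100, 120),
)
balanced = undersample(demo)
print(balanced["Class"].value_counts().to_dict())
```

On the demo frame, data.iloc[keep] would raise an out-of-bounds error, since the labels 100 to 119 are not valid positions in a 20-row frame; .loc avoids that dependency on the index layout.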