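The dataset comes from Kaggle (https://www.kaggle.com/mlg-ulb/creditcardfraud) and is used to predict whether a credit card transaction belongs to the fraud class. This post is an archived copy of the Chinese-language version of my write-up.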
Initial data exploration
First, load the data
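import pandas as pd

# Load the Kaggle CSV from the local path used in this post
dat = pd.read_csv("E:/study/machine learning/credit card fraud/creditcard.csv")
df = pd.DataFrame(dat)  # read_csv already returns a DataFrame, so this wrapper is optional
df.describe()  # summary statistics for each numeric column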
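The preview below includes an Amount_log column that the loading code does not create. The values shown there are consistent with the natural logarithm of Amount plus a small offset of 0.01 (for example, ln(4.99 + 0.01) = ln 5 ≈ 1.609438). The following is a minimal sketch under that assumption; the formula is inferred from the printed numbers, not confirmed by the original code, and the name `sample` is only illustrative.

```python
import numpy as np
import pandas as pd

# Amount values copied from the first five rows of the preview in this post.
sample = pd.DataFrame({"Amount": [149.62, 2.69, 378.66, 123.50, 69.99]})

# Assumption: Amount_log = ln(Amount + 0.01).
# The 0.01 offset keeps the logarithm finite for zero-amount transactions.
sample["Amount_log"] = np.log(sample["Amount"] + 0.01)

print(sample)
```

On the full data the same line would be applied to `df["Amount"]`. A log transform of this kind is a common way to compress a heavily right-skewed amount column before modelling.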
Time V1 V2 ... Amount Class Amount_log
0 0.0 -1.359807 -0.072781 ... 149.62 0 5.008166
1 0.0 1.191857 0.266151 ... 2.69 0 0.993252
2 1.0 -1.358354 -1.340163 ... 378.66 0 5.936665
3 1.0 -0.966272 -0.185226 ... 123.50 0 4.816322
4 2.0 -1.158233 0.877737 ... 69.99 0 4.248495
5 2.0 -0.425966 0.960523 ... 3.67 0 1.302913
6 4.0 1.229658 0.141004 ... 4.99 0 1.609438
7 7.0 -0.644269 1.417964 ... 40.80 0 3.708927
8 7.0 -0.894286 0.286157 ... 93.20 0 4.534855
9 9.0 -0.338262 1.119593 ... 3.68 0 1.305626
10 10.0 1.449044 -1.176339 ... 7.80 0 2.055405
11 10.0 0.384978 0.616109 ... 9.99 0 2.302585
12 1