This reads more nicely written up in Jupyter:
https://douzujun.github.io/page/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/%E7%B1%BB%E4%B8%8D%E5%B9%B3%E8%A1%A1%E9%97%AE%E9%A2%98_%E4%BF%A1%E7%94%A8%E5%8D%A1%E6%AC%BA%E8%AF%88%E6%A3%80%E6%B5%8B.html
Out[35]:

|   | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class |
|---|------|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|--------|-------|
| 0 | 0.0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 149.62 | 0 |
| 1 | 0.0 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 2.69 | 0 |
| 2 | 1.0 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 378.66 | 0 |
| 3 | 1.0 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 123.50 | 0 |
| 4 | 2.0 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | 0.095921 | 0.592941 | -0.270533 | 0.817739 | ... | -0.009431 | 0.798278 | -0.137458 | 0.141267 | -0.206010 | 0.502292 | 0.219422 | 0.215153 | 69.99 | 0 |

5 rows × 31 columns
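The table above is the head of the fraud-detection DataFrame, where `Class` = 1 marks a fraudulent transaction. Before choosing any resampling strategy, the first step is to count the classes. A minimal sketch, using a tiny fabricated frame as a stand-in for the real `creditcard.csv` (which is assumed, not included here):

```python
import pandas as pd

# Stand-in for pd.read_csv("creditcard.csv"): a tiny frame with the
# same kind of Class imbalance (many 0s, few 1s).
data = pd.DataFrame({
    "Amount": [149.62, 2.69, 378.66, 123.50, 69.99, 520.12],
    "Class":  [0, 0, 0, 0, 0, 1],   # 1 = fraud, 0 = normal
})

# Count how many samples each class has -- this is what reveals
# the imbalance that the next section addresses.
counts = data["Class"].value_counts()
print(counts)   # class 0 heavily outnumbers class 1
```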
Solutions for class imbalance
- Oversampling (make the two classes equally numerous)
  - Apply a generation strategy to the class-1 samples until there are as many of them as class-0 samples
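The oversampling idea above can be sketched with plain pandas: randomly resample the minority class with replacement until it matches the majority class. (Generating genuinely new synthetic samples, e.g. SMOTE from `imbalanced-learn`, is the common refinement; this sketch only shows the balancing step itself, on a small made-up frame.)

```python
import pandas as pd

# Illustrative frame: 6 majority (Class 0) vs 2 minority (Class 1) rows.
df = pd.DataFrame({
    "V1":    [0.1, -0.3, 1.2, 0.5, -0.8, 0.0, 2.1, -1.5],
    "Class": [0, 0, 0, 0, 0, 0, 1, 1],
})

majority = df[df["Class"] == 0]
minority = df[df["Class"] == 1]

# Resample the minority class WITH replacement up to the majority size,
# then recombine and shuffle the rows.
minority_up = minority.sample(n=len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=0)

print(balanced["Class"].value_counts())   # both classes now have 6 samples
```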
Standardization
- Amount, for example, spans a very wide range of values, which can mislead the model into treating "bigger numbers as more important"; so the feature needs to be normalized or standardized
- i.e. map the values into the [-1, 1] or [0, 1] range
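A sketch of how the `normAmount` column in the next table is produced: z-score the `Amount` column, then drop `Time` and the raw `Amount`. Here the z-score is computed by hand on a tiny frame; sklearn's `StandardScaler` does the equivalent (it uses the population std, `ddof=0`). The actual `normAmount` values below come from the full dataset's mean and std, so this toy frame will not reproduce them exactly.

```python
import pandas as pd

df = pd.DataFrame({
    "Time":   [0.0, 0.0, 1.0, 1.0, 2.0],
    "Amount": [149.62, 2.69, 378.66, 123.50, 69.99],
    "Class":  [0, 0, 0, 0, 0],
})

# z-score: (x - mean) / std, matching StandardScaler's ddof=0 convention.
mean, std = df["Amount"].mean(), df["Amount"].std(ddof=0)
df["normAmount"] = (df["Amount"] - mean) / std

# Time and the raw Amount carry no comparable scale, so drop them;
# the standardized copy replaces them.
df = df.drop(["Time", "Amount"], axis=1)
print(df.head())
```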
Out[37]:

|   | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | ... | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Class | normAmount |
|---|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-------|------------|
| 0 | -1.359807 | -0.072781 | 2.536347 | 1.378155 | -0.338321 | 0.462388 | 0.239599 | 0.098698 | 0.363787 | 0.090794 | ... | -0.018307 | 0.277838 | -0.110474 | 0.066928 | 0.128539 | -0.189115 | 0.133558 | -0.021053 | 0 | 0.244964 |
| 1 | 1.191857 | 0.266151 | 0.166480 | 0.448154 | 0.060018 | -0.082361 | -0.078803 | 0.085102 | -0.255425 | -0.166974 | ... | -0.225775 | -0.638672 | 0.101288 | -0.339846 | 0.167170 | 0.125895 | -0.008983 | 0.014724 | 0 | -0.342475 |
| 2 | -1.358354 | -1.340163 | 1.773209 | 0.379780 | -0.503198 | 1.800499 | 0.791461 | 0.247676 | -1.514654 | 0.207643 | ... | 0.247998 | 0.771679 | 0.909412 | -0.689281 | -0.327642 | -0.139097 | -0.055353 | -0.059752 | 0 | 1.160686 |
| 3 | -0.966272 | -0.185226 | 1.792993 | -0.863291 | -0.010309 | 1.247203 | 0.237609 | 0.377436 | -1.387024 | -0.054952 | ... | -0.108300 | 0.005274 | -0.190321 | -1.175575 | 0.647376 | -0.221929 | 0.062723 | 0.061458 | 0 | 0.140534 |
| 4 | -1.158233 | 0.877737 | 1.548718 | 0.403034 | -0.407193 | … | … | … | … | … | ... | … | … | … | … | … | … | … | … | … | … |