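[One-Week Algorithm Advancement] Task 1: Data Preprocessing

Task 1: Data Preprocessing

Note: the dataset comes from the financial domain, and the task is to predict whether a loan customer will become overdue. The "status" column is the target label: 0 means not overdue, 1 means overdue.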

1. Import packages & load the data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
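from sklearn.preprocessing import LabelBinarizer, OneHotEncoder
# Imputer was removed in scikit-learn 0.22; SimpleImputer is its replacement
from sklearn.impute import SimpleImputer as Imputer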

%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
data_original=pd.read_csv('data.csv',skipinitialspace=True)

The CSV file must be saved in UTF-8 encoding before it can be read this way.
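A minimal sketch of the encoding point above. The post does not say which encoding the original file used, so GBK is an assumption here, and the file names and toy columns are hypothetical. It shows two ways around the problem: pass the real encoding to read_csv, or re-save the file as UTF-8 once.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for data.csv
toy = pd.DataFrame({
    "reg_preference_for_trad": ["一线城市", "三线城市"],
    "status": [1, 0],
})

with tempfile.TemporaryDirectory() as tmp:
    gbk_path = os.path.join(tmp, "data_gbk.csv")
    utf8_path = os.path.join(tmp, "data_utf8.csv")

    # Simulate a file saved in a non-UTF-8 encoding (GBK is an assumption)
    toy.to_csv(gbk_path, index=False, encoding="gbk")

    # Option 1: tell pandas the actual encoding when reading
    df_gbk = pd.read_csv(gbk_path, encoding="gbk", skipinitialspace=True)

    # Option 2: re-save as UTF-8 once, then read with the default encoding
    df_gbk.to_csv(utf8_path, index=False, encoding="utf-8")
    df_utf8 = pd.read_csv(utf8_path, skipinitialspace=True)
```

Either route should yield the same DataFrame; re-saving as UTF-8 simply avoids having to remember the encoding argument later.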

data=data_original.copy()
data.head(5)
   Unnamed: 0   custid                          trade_no bank_card_no  low_volume_percent  middle_volume_percent  ...  latest_query_day  loans_latest_day
0           5  2791858  20180507115231274000000023057383          卡号1                0.01                   0.99  ...              12.0              18.0
1          10   534047  20180507121002192000000023073000          卡号1                0.02                   0.94  ...               4.0               2.0
2          12  2849787  20180507125159718000000023114911          卡号1                0.04                   0.96  ...               2.0               6.0
3          13  1809708  20180507121358683000000388283484          卡号1                0.00                   0.96  ...               2.0               4.0
4          14  2499829  20180507115448545000000388205844          卡号1                0.01                   0.99  ...              22.0             120.0

[5 rows x 90 columns]

2. Exploratory data analysis (EDA)

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

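When a DataFrame has many rows or columns, printing it truncates the output.
# Show all columns: pd.set_option('display.max_columns', None)
# Show all rows: pd.set_option('display.max_rows', None)
# Set the maximum displayed width of a value to 100 (the default is 50): pd.set_option('display.max_colwidth', 100)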
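As a hedged alternative to changing the options globally, pandas also provides pd.option_context, which applies options only inside a with-block and restores the previous values afterwards. The 30-column toy frame below is hypothetical and only serves to make the effect visible.

```python
import pandas as pd

# Hypothetical wide frame: one row, 30 columns
wide = pd.DataFrame([[0] * 30], columns=[f"col_{i}" for i in range(30)])

default_max_cols = pd.get_option("display.max_columns")

# The options below apply only inside the with-block
with pd.option_context("display.max_columns", None,
                       "display.max_colwidth", 100):
    full_text = repr(wide)  # every column is rendered

# Outside the block the previous setting is back in effect
restored_max_cols = pd.get_option("display.max_columns")
```

This keeps the rest of a notebook from printing very long outputs by accident.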

data.describe()
         Unnamed: 0        custid  low_volume_percent  middle_volume_percent  ...  latest_query_day  loans_latest_day
count   4754.000000  4.754000e+03         4752.000000            4752.000000  ...       4450.000000       4457.000000
mean    6008.414178  1.690993e+06            0.021806               0.901294  ...         24.112809         55.181512
std     3452.071428  1.034235e+06            0.041527               0.144856  ...         37.725724         53.486408
min        5.000000  1.140000e+02            0.000000               0.000000  ...         -2.000000         -2.000000
25%     3106.000000  7.593358e+05            0.010000               0.880000  ...          5.000000         10.000000
50%     6006.500000  1.634942e+06            0.010000               0.960000  ...         14.000000         36.000000
75%     8999.000000  2.597905e+06            0.020000               0.990000  ...         24.000000         91.000000
max    11992.000000  4.004694e+06            1.000000               1.000000  ...        360.000000        323.000000

[8 rows x 83 columns]

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4754 entries, 0 to 4753
Data columns (total 90 columns):
Unnamed: 0                                    4754 non-null int64
custid                                        4754 non-null int64
trade_no                                      4754 non-null object
bank_card_no                                  4754 non-null object
low_volume_percent                            4752 non-null float64
middle_volume_percent                         4752 non-null float64
take_amount_in_later_12_month_highest         4754 non-null int64
trans_amount_increase_rate_lately             4751 non-null float64
trans_activity_month                          4752 non-null float64
trans_activity_day                            4752 non-null float64
transd_mcc                                    4752 non-null float64
trans_days_interval_filter                    4746 non-null float64
trans_days_interval                           4752 non-null float64
regional_mobility                             4752 non-null float64
student_feature                               1756 non-null float64
repayment_capability                          4754 non-null int64
is_high_user                                  4754 non-null int64
number_of_trans_from_2011                     4752 non-null float64
first_transaction_time                        4752 non-null float64
historical_trans_amount                       4754 non-null int64
historical_trans_day                          4752 non-null float64
rank_trad_1_month                             4752 non-null float64
trans_amount_3_month                          4754 non-null int64
avg_consume_less_12_valid_month               4752 non-null float64
abs                                           4754 non-null int64
top_trans_count_last_1_month                  4752 non-null float64
avg_price_last_12_month                       4754 non-null int64
avg_price_top_last_12_valid_month             4650 non-null float64
reg_preference_for_trad                       4752 non-null object
trans_top_time_last_1_month                   4746 non-null float64
trans_top_time_last_6_month                   4746 non-null float64
consume_top_time_last_1_month                 4746 non-null float64
consume_top_time_last_6_month                 4746 non-null float64
cross_consume_count_last_1_month              4328 non-null float64
trans_fail_top_count_enum_last_1_month        4738 non-null float64
trans_fail_top_count_enum_last_6_month        4738 non-null float64
trans_fail_top_count_enum_last_12_month       4738 non-null float64
consume_mini_time_last_1_month                4728 non-null float64
max_cumulative_consume_later_1_month          4754 non-null int64
max_consume_count_later_6_month               4746 non-null float64
railway_consume_count_last_12_month           4742 non-null float64
pawns_auctions_trusts_consume_last_1_month    4754 non-null int64
pawns_auctions_trusts_consume_last_6_month    4754 non-null int64
jewelry_consume_count_last_6_month            4742 non-null float64
status                                        4754 non-null int64
source                                        4754 non-null object
first_transaction_day                         4752 non-null float64
trans_day_last_12_month                       4752 non-null float64
id_name                                       4478 non-null object
apply_score                                   4450 non-null float64
apply_credibility                             4450 non-null float64
query_org_count                               4450 non-null float64
query_finance_count                           4450 non-null float64
query_cash_count                              4450 non-null float64
query_sum_count                               4450 non-null float64
latest_query_time                             4450 non-null object
latest_one_month_apply                        4450 non-null float64
latest_three_month_apply                      4450 non-null float64
latest_six_month_apply                        4450 non-null float64
loans_score                                   4457 non-null float64
loans_credibility_behavior                    4457 non-null float64
loans_count                                   4457 non-null float64
loans_settle_count                            4457 non-null float64
loans_overdue_count                           4457 non-null float64
loans_org_count_behavior                      4457 non-null float64
consfin_org_count_behavior                    4457 non-null float64
loans_cash_count                              4457 non-null float64
latest_one_month_loan                         4457 non-null float64
latest_three_month_loan                       4457 non-null float64
latest_six_month_loan                         4457 non-null float64
history_suc_fee                               4457 non-null float64
history_fail_fee                              4457 non-null float64
latest_one_month_suc                          4457 non-null float64
latest_one_month_fail                         4457 non-null float64
loans_long_time                               4457 non-null float64
loans_latest_time                             4457 non-null object
loans_credit_limit                            4457 non-null float64
loans_credibility_limit                       4457 non-null float64
loans_org_count_current                       4457 non-null float64
loans_product_count                           4457 non-null float64
loans_max_limit                               4457 non-null float64
loans_avg_limit                               4457 non-null float64
consfin_credit_limit                          4457 non-null float64
consfin_credibility                           4457 non-null float64
consfin_org_count_current                     4457 non-null float64
consfin_product_count                         4457 non-null float64
consfin_max_limit                             4457 non-null float64
consfin_avg_limit                             4457 non-null float64
latest_query_day                              4450 non-null float64
loans_latest_day                              4457 non-null float64
dtypes: float64(70), int64(13), object(7)
memory usage: 3.3+ MB

The dataset contains three dtypes: float64 (70 columns), int64 (13 columns), and object (7 columns). Some features have missing values.
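To see at a glance which features are missing and by how much, a small summary table can help. The sketch below uses a hypothetical toy frame with column names borrowed from the dataset; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative only
toy = pd.DataFrame({
    "student_feature": [1.0, np.nan, np.nan, np.nan],
    "apply_score": [583.0, 653.0, np.nan, 595.0],
    "status": [1, 0, 1, 0],
})

# Count and share of missing values per column
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "missing_ratio": toy.isnull().mean(),
})

# Keep only columns that actually have gaps, worst first
missing = missing[missing["n_missing"] > 0].sort_values(
    "missing_ratio", ascending=False
)
```

Applied to the real data, the info() output above implies that student_feature (1756 non-null out of 4754) would come out on top at roughly 63% missing.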

(1) Drop useless features

data.nunique()
Unnamed: 0                                    4754
custid                                        4754
trade_no                                      4754
bank_card_no                                     1
low_volume_percent                              40
middle_volume_percent                           90
take_amount_in_later_12_month_highest          166
trans_amount_increase_rate_lately              782
trans_activity_month                            84
trans_activity_day                             512
transd_mcc                                      41
trans_days_interval_filter                     147
trans_days_interval                            114
regional_mobility                                5
student_feature                                  2
repayment_capability                          2390
is_high_user                                     2
number_of_trans_from_2011                       70
first_transaction_time                        1693
historical_trans_amount                       4524
historical_trans_day                           476
rank_trad_1_month                               20
trans_amount_3_month                          3524
avg_consume_less_12_valid_month                 12
abs                                           1697
top_trans_count_last_1_month                     8
avg_price_last_12_month                        330
avg_price_top_last_12_valid_month               20
reg_preference_for_trad                          5
trans_top_time_last_1_month                     28
trans_top_time_last_6_month                     97
consume_top_time_last_1_month                   28
consume_top_time_last_6_month                   94
cross_consume_count_last_1_month                19
trans_fail_top_count_enum_last_1_month          15
trans_fail_top_count_enum_last_6_month          25
trans_fail_top_count_enum_last_12_month         26
consume_mini_time_last_1_month                1971
max_cumulative_consume_later_1_month           863
max_consume_count_later_6_month                 29
railway_consume_count_last_12_month              6
pawns_auctions_trusts_consume_last_1_month     572
pawns_auctions_trusts_consume_last_6_month    2730
jewelry_consume_count_last_6_month               7
status                                           2
source                                           1
first_transaction_day                         1693
trans_day_last_12_month                        132
id_name                                       4309
apply_score                                    205
apply_credibility                               41
query_org_count                                 46
query_finance_count                             25
query_cash_count                                17
query_sum_count                                 74
latest_query_time                              207
latest_one_month_apply                          36
latest_three_month_apply                        56
latest_six_month_apply                          65
loans_score                                    247
loans_credibility_behavior                      25
loans_count                                    134
loans_settle_count                             123
loans_overdue_count                             26
loans_org_count_behavior                        41
consfin_org_count_behavior                      19
loans_cash_count                                32
latest_one_month_loan                           14
latest_three_month_loan                         31
latest_six_month_loan                           67
history_suc_fee                                171
history_fail_fee                               151
latest_one_month_suc                            19
latest_one_month_fail                           41
loans_long_time                                202
loans_latest_time                              232
loans_credit_limit                              54
loans_credibility_limit                         33
loans_org_count_current                         32
loans_product_count                             32
loans_max_limit                                 91
loans_avg_limit                                961
consfin_credit_limit                           327
consfin_credibility                             24
consfin_org_count_current                       19
consfin_product_count                           20
consfin_max_limit                              175
consfin_avg_limit                             1677
latest_query_day                               210
loans_latest_day                               235
dtype: int64

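First look at the number of unique values per column. This mainly helps decide which features to one-hot encode and which to drop (chiefly those that take the same value in every row). 'source' and 'bank_card_no' each have only one value, so they can be dropped outright. 'Unnamed: 0', 'custid', and 'trade_no' each have 4754 unique values, one per row, and their names show they are identifiers, so they can be dropped too. The code below also drops 'id_name', presumably because a customer's name is not a meaningful predictor.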
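The same screening can be sketched programmatically. The toy frame below is hypothetical; the idea is simply to flag columns with a single value and columns with one distinct value per row.

```python
import pandas as pd

# Hypothetical toy frame mirroring the situation described above
toy = pd.DataFrame({
    "custid": [101, 102, 103, 104],        # unique per row: identifier-like
    "source": ["xs", "xs", "xs", "xs"],    # single value: no information
    "regional_mobility": [3.0, 4.0, 1.0, 3.0],
    "status": [1, 0, 1, 0],
})

n_unique = toy.nunique()

# Columns that never vary
constant_cols = n_unique[n_unique <= 1].index.tolist()

# Columns with a distinct value in every row (candidates, not a verdict)
id_like_cols = n_unique[n_unique == len(toy)].index.tolist()

cleaned = toy.drop(columns=constant_cols + id_like_cols)
```

The identifier check is only a heuristic: a continuous numeric feature can also be unique in every row, so the flagged names should still be reviewed by hand.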

data.drop(['Unnamed: 0', 'custid', 'trade_no', 'bank_card_no', 'source','id_name'], axis=1, inplace=True)

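If dropping columns from the DataFrame raises a "column not found" error, pass skipinitialspace=True to read_csv when loading the file. The option skips spaces after each delimiter, so header names no longer carry a leading space.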
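A small illustration of why the error can occur, using a hypothetical three-column CSV with a space after each comma:

```python
import io

import pandas as pd

# Hypothetical CSV text with a space after every delimiter
raw = "custid, status, source\n1, 0, xs\n2, 1, xs\n"

plain = pd.read_csv(io.StringIO(raw))
clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

plain_cols = list(plain.columns)  # names keep the leading space
clean_cols = list(clean.columns)  # names are stripped of it

# Works on `clean`; on `plain` the same call raises KeyError
# because the column is actually named " source"
dropped = clean.drop(["source"], axis=1)
```

Checking list(df.columns) right after loading is a quick way to spot stray whitespace in header names.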

data.shape
(4754, 84)

(2) Data type conversion

Converting object columns

object_cols = [col for col in data.columns if data[col].dtypes == 'O']
object_cols

#data.select_dtypes(include=[object]).columns
['reg_preference_for_trad', 'latest_query_time', 'loans_latest_time']
data[object_cols].head(5)
  reg_preference_for_trad latest_query_time loans_latest_time
0                    一线城市        2018-04-25        2018-04-19
1                    一线城市        2018-05-03        2018-05-05
2                    一线城市        2018-05-05        2018-05-01
3                    三线城市        2018-05-05        2018-05-03
4                    一线城市        2018-04-15        2018-01-07

data_obj = data[object_cols].copy()  # copy so later in-place edits do not act on a slice of data
data_num=data.drop(object_cols,axis=1)
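An equivalent split can be sketched with select_dtypes, which selects columns by dtype family. The toy frame is hypothetical; selecting on "number" rather than on object is a choice made here so the split does not depend on how a given pandas version labels string columns.

```python
import pandas as pd

# Hypothetical toy frame with two text columns and two numeric columns
toy = pd.DataFrame({
    "reg_preference_for_trad": ["一线城市", "三线城市"],
    "latest_query_time": ["2018-04-25", "2018-05-03"],
    "apply_score": [583.0, 653.0],
    "status": [1, 0],
})

# Numeric columns (ints and floats)
num_part = toy.select_dtypes(include="number").copy()

# Everything else (here: the text and date-string columns)
obj_part = toy.select_dtypes(exclude="number").copy()
```

The .copy() calls keep the two parts independent of the original frame, which avoids chained-assignment surprises when they are modified later.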

(3) Filling missing values

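Missing values can be handled in two broad ways: deletion and filling. Deletion means dropping either rows (samples) or columns (features). So far only the useless features above have been dropped, and no rows have been removed (the frame still has 4754 rows). Most of the remaining features have few missing values, so we fill them rather than delete anything. The exception is student_feature, which is present in only 1756 of 4754 rows; it is filled like the other numeric columns below.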
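For completeness, the deletion route that this post does not take could look like the sketch below. The thresholds max_missing_ratio and min_present are hypothetical values chosen for the toy data, not recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    "c": [1.0, np.nan, 3.0, 4.0],
    "d": [1.0, np.nan, 3.0, 4.0],
})

# Drop features (columns) whose missing ratio exceeds the threshold
max_missing_ratio = 0.5
fewer_cols = toy.loc[:, toy.isnull().mean() <= max_missing_ratio]

# Drop samples (rows) with fewer than `min_present` non-missing values
min_present = 2
fewer_rows = fewer_cols.dropna(thresh=min_present)
```

Here column b is removed for being mostly empty, and the second row is removed because only one of its remaining values is present.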

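There are many ways to fill missing values, and the right one depends on the feature. Common choices include mean fill, mode fill, median fill, and forward fill (copying the previous value).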
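The four strategies named above can be compared on a single toy series. This is only an illustration with made-up numbers; note how the outlier 10.0 pulls the mean away from the median and mode.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one gap and one outlier
s = pd.Series([1.0, 2.0, 2.0, np.nan, 10.0])

mean_filled = s.fillna(s.mean())          # mean of 1, 2, 2, 10 is 3.75
median_filled = s.fillna(s.median())      # median is 2.0
mode_filled = s.fillna(s.mode().iloc[0])  # most frequent value is 2.0
forward_filled = s.ffill()                # copies the previous value, 2.0
```

For skewed features the median is often the more robust choice, while mode or forward fill suit categorical or ordered data better than the mean.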

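from sklearn.impute import SimpleImputer  # Imputer was removed in scikit-learn 0.22
# Fill numeric columns with the column mean
imputer = SimpleImputer(strategy='mean')
mean_num = imputer.fit_transform(data_num)
data_num = pd.DataFrame(mean_num, columns=data_num.columns)
# Forward-fill the object (category and date) columns
data_obj = data_obj.ffill()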

Converting object columns

# One-hot encode the city-tier preference; LabelBinarizer expects a 1-D array of labels
encoder = LabelBinarizer()
reg_preference_1hot = encoder.fit_transform(data_obj['reg_preference_for_trad'])
data_obj = data_obj.drop(['reg_preference_for_trad'], axis=1)
reg_preference_df = pd.DataFrame(reg_preference_1hot, columns=encoder.classes_)
data_obj = pd.concat([data_obj, reg_preference_df], axis=1)
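As a hedged alternative to LabelBinarizer, the same one-hot encoding can be sketched with pd.get_dummies, which works directly on a pandas column. The toy values below are hypothetical and cover only three of the five categories.

```python
import pandas as pd

# Hypothetical toy column with three distinct categories
toy = pd.DataFrame({
    "reg_preference_for_trad": ["一线城市", "三线城市", "一线城市", "境外"],
})

# One indicator column per category, stored as 0/1 integers
one_hot = pd.get_dummies(toy["reg_preference_for_trad"], dtype=int)

# Replace the original column with its indicator columns
encoded = pd.concat(
    [toy.drop(columns=["reg_preference_for_trad"]), one_hot], axis=1
)
```

Each row has exactly one indicator set to 1. One design difference worth keeping in mind: a fitted LabelBinarizer remembers its classes and can be reused on new data, whereas get_dummies only sees the categories present in the frame it is given.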

data_obj['latest_query_time'] = pd.to_datetime(data_obj['latest_query_time'])
data_obj['latest_query_time_month'] = data_obj['latest_query_time'].dt.month
data_obj['latest_query_time_weekday'] = data_obj['latest_query_time'].dt.weekday

data_obj['loans_latest_time'] = pd.to_datetime(data_obj['loans_latest_time'])
data_obj['loans_latest_time_month'] = data_obj['loans_latest_time'].dt.month
data_obj['loans_latest_time_weekday'] = data_obj['loans_latest_time'].dt.weekday

data_obj = data_obj.drop(['latest_query_time', 'loans_latest_time'], axis=1)
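The date handling above can be checked on a few known dates. In pandas, dt.weekday counts from Monday=0 to Sunday=6. The sketch also derives a "days before a reference date" feature; the reference date 2018-05-07 is an assumption made for illustration. In the first rows shown earlier, a latest_query_time of 2018-04-25 comes with latest_query_day 12, which is consistent with such a difference, though the post does not state how that column was built.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2018-04-25", "2018-05-05", "2018-01-07"]))

features = pd.DataFrame({
    "month": dates.dt.month,
    "weekday": dates.dt.weekday,  # Monday=0 ... Sunday=6
})

# Days elapsed up to a reference date (hypothetical choice)
reference = pd.Timestamp("2018-05-07")
features["days_before_reference"] = (reference - dates).dt.days
```

Month and weekday discard the year and the ordering of dates, so an elapsed-days feature like the last column is a common complement.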

data_obj.head(5)
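   一线城市  三线城市  二线城市  其他城市  境外  latest_query_time_month  latest_query_time_weekday  loans_latest_time_month  loans_latest_time_weekday
0     1     0     0     0   0                        4                          2                        4                          3
1     1     0     0     0   0                        5                          3                        5                          5
2     1     0     0     0   0                        5                          5                        5                          1
3     0     1     0     0   0                        5                          5                        5                          3
4     1     0     0     0   0                        4                          6                        1                          6

(The first five column names are the category values themselves: 一线城市 = first-tier city, 三线城市 = third-tier city, 二线城市 = second-tier city, 其他城市 = other cities, 境外 = overseas.)
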
data=pd.concat([data_num,data_obj],axis=1)
data.shape
(4754, 90)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4754 entries, 0 to 4753
Data columns (total 90 columns):
low_volume_percent                            4754 non-null float64
middle_volume_percent                         4754 non-null float64
take_amount_in_later_12_month_highest         4754 non-null float64
trans_amount_increase_rate_lately             4754 non-null float64
trans_activity_month                          4754 non-null float64
trans_activity_day                            4754 non-null float64
transd_mcc                                    4754 non-null float64
trans_days_interval_filter                    4754 non-null float64
trans_days_interval                           4754 non-null float64
regional_mobility                             4754 non-null float64
student_feature                               4754 non-null float64
repayment_capability                          4754 non-null float64
is_high_user                                  4754 non-null float64
number_of_trans_from_2011                     4754 non-null float64
first_transaction_time                        4754 non-null float64
historical_trans_amount                       4754 non-null float64
historical_trans_day                          4754 non-null float64
rank_trad_1_month                             4754 non-null float64
trans_amount_3_month                          4754 non-null float64
avg_consume_less_12_valid_month               4754 non-null float64
abs                                           4754 non-null float64
top_trans_count_last_1_month                  4754 non-null float64
avg_price_last_12_month                       4754 non-null float64
avg_price_top_last_12_valid_month             4754 non-null float64
trans_top_time_last_1_month                   4754 non-null float64
trans_top_time_last_6_month                   4754 non-null float64
consume_top_time_last_1_month                 4754 non-null float64
consume_top_time_last_6_month                 4754 non-null float64
cross_consume_count_last_1_month              4754 non-null float64
trans_fail_top_count_enum_last_1_month        4754 non-null float64
trans_fail_top_count_enum_last_6_month        4754 non-null float64
trans_fail_top_count_enum_last_12_month       4754 non-null float64
consume_mini_time_last_1_month                4754 non-null float64
max_cumulative_consume_later_1_month          4754 non-null float64
max_consume_count_later_6_month               4754 non-null float64
railway_consume_count_last_12_month           4754 non-null float64
pawns_auctions_trusts_consume_last_1_month    4754 non-null float64
pawns_auctions_trusts_consume_last_6_month    4754 non-null float64
jewelry_consume_count_last_6_month            4754 non-null float64
status                                        4754 non-null float64
first_transaction_day                         4754 non-null float64
trans_day_last_12_month                       4754 non-null float64
apply_score                                   4754 non-null float64
apply_credibility                             4754 non-null float64
query_org_count                               4754 non-null float64
query_finance_count                           4754 non-null float64
query_cash_count                              4754 non-null float64
query_sum_count                               4754 non-null float64
latest_one_month_apply                        4754 non-null float64
latest_three_month_apply                      4754 non-null float64
latest_six_month_apply                        4754 non-null float64
loans_score                                   4754 non-null float64
loans_credibility_behavior                    4754 non-null float64
loans_count                                   4754 non-null float64
loans_settle_count                            4754 non-null float64
loans_overdue_count                           4754 non-null float64
loans_org_count_behavior                      4754 non-null float64
consfin_org_count_behavior                    4754 non-null float64
loans_cash_count                              4754 non-null float64
latest_one_month_loan                         4754 non-null float64
latest_three_month_loan                       4754 non-null float64
latest_six_month_loan                         4754 non-null float64
history_suc_fee                               4754 non-null float64
history_fail_fee                              4754 non-null float64
latest_one_month_suc                          4754 non-null float64
latest_one_month_fail                         4754 non-null float64
loans_long_time                               4754 non-null float64
loans_credit_limit                            4754 non-null float64
loans_credibility_limit                       4754 non-null float64
loans_org_count_current                       4754 non-null float64
loans_product_count                           4754 non-null float64
loans_max_limit                               4754 non-null float64
loans_avg_limit                               4754 non-null float64
consfin_credit_limit                          4754 non-null float64
consfin_credibility                           4754 non-null float64
consfin_org_count_current                     4754 non-null float64
consfin_product_count                         4754 non-null float64
consfin_max_limit                             4754 non-null float64
consfin_avg_limit                             4754 non-null float64
latest_query_day                              4754 non-null float64
loans_latest_day                              4754 non-null float64
一线城市                                          4754 non-null int64
三线城市                                          4754 non-null int64
二线城市                                          4754 non-null int64
其他城市                                          4754 non-null int64
境外                                            4754 non-null int64
latest_query_time_month                       4754 non-null int64
latest_query_time_weekday                     4754 non-null int64
loans_latest_time_month                       4754 non-null int64
loans_latest_time_weekday                     4754 non-null int64
dtypes: float64(81), int64(9)
memory usage: 3.3 MB

