Check-in 2
Inspecting the overall distribution of the data
# Import the required libraries
import seaborn as sns
import pandas as pd
import pandas_profiling as pp
import matplotlib.pyplot as plt
import warnings
import time
import datetime
import numpy as np
import missingno
warnings.filterwarnings('ignore')
%matplotlib inline
np.set_printoptions(suppress=True)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
pd.set_option('display.max_rows',None)
pd.set_option('display.max_columns',None)
plt.rcParams['font.sans-serif']='SimHei'
%%time
data_train = pd.read_csv('./train.csv')
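The training set has 800,000 rows and 47 columns. employmentLength (years of employment) and the anonymous features n0-n14 have many missing values; annualIncome, dti, delinquency_2years, revolBal, revolUtil, totalAcc, title, and other fields contain many outliers.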
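Observations like these about missing values and outliers can be backed by a quick numeric check. The following is only a minimal sketch, not necessarily how the figures above were obtained: the helper names missing_ratio and iqr_outlier_counts are hypothetical, the 1.5 * IQR rule is one common convention for flagging outliers, and the small demo frame is made-up data standing in for data_train.

```python
import numpy as np
import pandas as pd

def missing_ratio(df):
    """Fraction of missing values per column, largest first."""
    return df.isnull().mean().sort_values(ascending=False)

def iqr_outlier_counts(df, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] per numeric column."""
    num = df.select_dtypes(include=np.number)
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    # Compare each column against its own bounds; NaN never counts as an outlier
    below = num.lt(q1 - k * iqr, axis='columns')
    above = num.gt(q3 + k * iqr, axis='columns')
    return (below | above).sum().sort_values(ascending=False)

# Made-up data for illustration only; replace demo with data_train in the notebook
demo = pd.DataFrame({
    'annualIncome': [50000.0, 52000.0, 48000.0, 51000.0, 49000.0, 5000000.0],
    'n0': [0.0, np.nan, 1.0, np.nan, np.nan, 2.0],
})

missing = missing_ratio(demo)
outliers = iqr_outlier_counts(demo)
print(missing)
print(outliers)
```

On the real data, missing_ratio(data_train) gives the per-column share of missing values, and iqr_outlier_counts(data_train) gives a rough count of extreme values per numeric column. Heavily skewed fields such as income will always show many points beyond the IQR bounds, so the counts are a prompt for closer inspection rather than proof of bad data.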