Purpose:
Analyze the features further and process the data accordingly.
1. Check how many categories each categorical variable has:
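# Assumes `data` is the training DataFrame loaded earlier
cate_feature = ['grade', 'subGrade', 'employmentTitle', 'homeOwnership',
                'verificationStatus', 'purpose', 'regionCode', 'postCode',
                'applicationType', 'initialListStatus', 'title', 'policyCode']
for f in cate_feature:
    # Number of distinct non-null values in this column
    print(f, 'number of categories:', data[f].nunique())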
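The loop above can also be written as a single vectorized call on the column subset. A minimal sketch, using a small hypothetical DataFrame in place of the real `data` (the values are made up for illustration). Note that `nunique()` ignores NaN unless `dropna=False` is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the loan dataset
data = pd.DataFrame({
    "grade": ["A", "B", "A", "C", np.nan],
    "homeOwnership": [0, 1, 1, 0, 2],
    "policyCode": [1.0, 1.0, 1.0, 1.0, 1.0],
})
cate_feature = ["grade", "homeOwnership", "policyCode"]

# Distinct values per categorical column (NaN excluded by default)
n_categories = data[cate_feature].nunique()
print(n_categories)

# Same count, but treating NaN as its own category
n_categories_with_nan = data[cate_feature].nunique(dropna=False)
print(n_categories_with_nan)
```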
Next, list the categories that each of these variables contains:
# Reuses cate_feature from the previous step
for f in cate_feature:
    # Distinct values of this column (NaN included if present)
    print(f, 'categories:', data[f].unique())
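`unique()` shows which categories exist but not how often each occurs. A minimal sketch, on a hypothetical toy column, of using `value_counts(dropna=False)` to see both the categories and their frequencies, with NaN kept as its own row:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for a real categorical feature
data = pd.DataFrame({"grade": ["A", "B", "A", "C", np.nan, "A"]})

# unique() lists the distinct values (NaN included) in order of appearance
print(data["grade"].unique())

# value_counts() also reports how often each category occurs;
# dropna=False keeps NaN as a separate entry
counts = data["grade"].value_counts(dropna=False)
print(counts)
```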
2. Count the features that contain missing values:
print(data.isnull().any().sum())  # number of columns with at least one missing value
22
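`data.isnull().any().sum()` counts the columns that have at least one missing value, not the missing cells themselves, so the 22 above is a number of features. A minimal sketch on a hypothetical toy DataFrame contrasting the three common counts:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values
data = pd.DataFrame({
    "a": [1, np.nan, 3, np.nan],
    "b": [np.nan, 2, 3, 4],
    "c": [1, 2, 3, 4],
})

# Number of columns that contain at least one missing value
n_cols_with_null = data.isnull().any().sum()
# Total number of missing cells across the whole DataFrame
n_null_cells = data.isnull().sum().sum()
# Missing count for each column
per_column = data.isnull().sum()

print(n_cols_with_null, n_null_cells)
print(per_column)
```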
3. Find the features whose missing-value ratio exceeds 0.5:
# Missing-value ratio of every feature
have_null_fea_dict = (data.isnull().sum() / len(data)).to_dict()
fea_null_moreThanHalf = {}
for key, value in have_null_fea_dict.items():
    if value > 0.5:
        fea_null_moreThanHalf[key] = value
print(fea_null_moreThanHalf)
{}
The result is empty, so no feature has more than half of its values missing.
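The dictionary-building loop in step 3 is equivalent to taking the column-wise mean of the missing-value mask and filtering it. A minimal sketch on hypothetical toy data (the column names are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one column is mostly missing
data = pd.DataFrame({
    "mostly_null": [np.nan, np.nan, np.nan, 1.0],
    "some_null": [1.0, np.nan, 3.0, 4.0],
    "complete": [1, 2, 3, 4],
})

# Fraction of missing values per column (mean of the boolean mask)
null_ratio = data.isnull().mean()

# Keep only the columns where more than half the values are missing
fea_null_moreThanHalf = null_ratio[null_ratio > 0.5].to_dict()
print(fea_null_moreThanHalf)
```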
4. Find the features in the training set that take only one value:
one_value_fea=[col for col in data.columns if data[col].nunique()<=1]
print(one_value_fea)
['policyCode']
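Because `nunique()` skips NaN, the check above also flags a column holding one value plus missing entries; passing `dropna=False` gives a stricter test. A common follow-up is to drop truly constant columns, since they carry no information for a model. A minimal sketch on hypothetical toy data (the column names other than `policyCode` are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
data = pd.DataFrame({
    "policyCode": [1.0, 1.0, 1.0, 1.0],
    "only_one_plus_nan": [5.0, np.nan, 5.0, np.nan],
    "loanAmnt": [1000, 2000, 1500, 3000],
})

# Columns with at most one distinct non-null value
one_value_fea = [col for col in data.columns if data[col].nunique() <= 1]

# Stricter check: NaN counts as a value too
one_value_fea_strict = [col for col in data.columns
                        if data[col].nunique(dropna=False) <= 1]

# Drop the strictly constant columns
data_reduced = data.drop(columns=one_value_fea_strict)
print(one_value_fea, one_value_fea_strict, list(data_reduced.columns))
```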