Ensemble Learning (Part 2) 03: Happiness Prediction in Practice
Project and open-source code: datawhale
1. Approach to the competition task
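- First, analyze the dimensionality and the features of the data. The competition data has 139 variables, all of them discrete variables/features. Then clean the training set: based on the coded values that the index defines for each variable, remove outliers, fill in missing values, and correct erroneous values. The missing-value fills used here are listed in the code that follows.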
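Before filling column by column, it can help to check which columns actually contain missing values. The following is a minimal sketch on a toy DataFrame with made-up values, not the competition data. Only the column names `work_status`, `work_yr`, `s_hukou` and their fill values (9, 0, 8) come from this section; `gender` is a hypothetical stand-in for a column with no gaps.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the competition data; all values are made up.
data = pd.DataFrame({
    "work_status": [1.0, np.nan, 3.0],
    "work_yr": [np.nan, 5.0, np.nan],
    "s_hukou": [1.0, 2.0, np.nan],
    "gender": [1, 2, 1],  # hypothetical column without missing values
})

# Count missing values per column and keep only the columns that have any.
missing_counts = data.isnull().sum()
missing_cols = missing_counts[missing_counts > 0].index.tolist()

# Column -> fill value, mirroring the per-column choices in this section.
fill_values = {"work_status": 9, "work_yr": 0, "s_hukou": 8}
data = data.fillna(value=fill_values)

# Total number of missing cells left after filling.
remaining_missing = int(data.isnull().sum().sum())
```

Passing a dict to `fillna` applies a different fill value to each named column in one call, which is equivalent to the one-assignment-per-column style and keeps the choices in a single place.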
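# Fill missing values: 25 columns in total, 4 dropped, 21 filled
# The columns below all contain missing values; fill them case by case
data['work_status'] = data['work_status'].fillna(9)  # recode as "other"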
data['work_yr'] = data['work_yr'].fillna(0)
data['work_manage'] = data['work_manage'].fillna(0)
data['work_type'] = data['work_type'].fillna(0)
data['edu_yr'] = data['edu_yr'].fillna(0)
data['edu_status'] = data['edu_status'].fillna(0)
data['s_work_type'] = data['s_work_type'].fillna(0)
data['s_work_status'] = data['s_work_status'].fillna(9)  # recode as "other"
data['s_political'] = data['s_political'].fillna(0)
data['s_hukou'] = data['s_hukou'].fillna(8)
data['s_income'] = data['s_income'].fillna(0)
data['s_birth'] = data['s_birth'].fillna(0)
data['s_edu'] = data['s_edu'].fillna(14)
data['s_work_exper'] = data['s_work_exper'].fillna(0)
data['minor_child'] = data['minor_child'].fillna(0)
data['marital_now'] = data['marital_now'].fillna(0)
data['marital_1st'] = data['marital_1st'].fillna(0)
data['social_neighbor']