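Converting numerically coded nominal data into categories
For example, sex is coded as 0 and 1, where 1 is male and 0 is female.
These codes can be converted to male and female.
This article uses the UCI heart disease dataset.
Convert the integer codes to the strings they stand for: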
# Assumes df already holds the dataset with the column names used below.
# Series.replace avoids chained assignment (df[col][mask] = value), which pandas
# may apply to a temporary copy; values missing from a mapping stay unchanged.
df['sex'] = df['sex'].replace({0: 'female', 1: 'male'})
df['chest_pain_type'] = df['chest_pain_type'].replace(
    {0: 'typical angina', 1: 'atypical angina', 2: 'non-anginal pain', 3: 'asymptomatic'})
df['fasting_blood_sugar'] = df['fasting_blood_sugar'].replace(
    {0: 'lower than 120mg/dl', 1: 'greater than 120mg/dl'})
df['resting_electrocardiographic'] = df['resting_electrocardiographic'].replace(
    {0: 'normal', 1: 'ST-T wave abnormality', 2: 'left ventricular hypertrophy'})
df['exercise_induced_angina'] = df['exercise_induced_angina'].replace({0: 'no', 1: 'yes'})
df['ST_slope'] = df['ST_slope'].replace({0: 'upsloping', 1: 'flat', 2: 'downsloping'})
df['thal'] = df['thal'].replace(
    {0: 'unknown', 1: 'normal', 2: 'fixed defect', 3: 'reversible defect'})
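Result
In pandas,
discrete nominal and ordinal feature columns should have the object dtype,
and continuous interval and ratio feature columns should have a numeric dtype, int64 or float64.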
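One way to check this split is to look at the dtypes after the conversion. The sketch below uses a small made-up frame (the rows and the names toy, categorical_cols and numeric_cols are assumptions for illustration, not part of the dataset), so treat it as a rough check rather than the exact code used here.

```python
import pandas as pd

# Toy frame; the values are illustrative only, not rows from the dataset.
toy = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": [1, 1, 0],
    "oldpeak": [2.3, 3.5, 1.4],
})

# Replace the integer codes with labels; the column stops being numeric.
toy["sex"] = toy["sex"].replace({0: "female", 1: "male"})

# Inspect the dtype of every column.
print(toy.dtypes)

# Split the columns by dtype: numeric ones vs. everything else.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, categorical_cols)
```

If a column that should be categorical still shows up in numeric_cols, its codes have not been converted yet.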
Convert the nominal and ordinal feature columns to one-hot encoding:
df=pd.get_dummies(df)
df.columns
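To see what pd.get_dummies does to the column list, here is a minimal sketch on a made-up frame (toy and encoded are hypothetical names, and the rows are not from the dataset). Numeric columns are kept as they are, and each string column is expanded into one indicator column per label, named column_label.

```python
import pandas as pd

# Toy frame with one numeric column and two label columns (illustrative values).
toy = pd.DataFrame({
    "age": [63, 37, 41],
    "sex": ["male", "male", "female"],
    "ST_slope": ["flat", "upsloping", "flat"],
})

# One-hot encode every non-numeric column; numeric columns pass through unchanged.
encoded = pd.get_dummies(toy)

print(list(encoded.columns))
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers; passing dtype=int to pd.get_dummies forces integers if a model needs them.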