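get_dummies: one-hot encoding. For a column with n categories it generates n dummy variables by default; pass drop_first=True to get n-1.
e.g. pd.get_dummies(data)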
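A minimal sketch of the n versus n-1 difference. The `city` column and its values are hypothetical toy data, not part of the original notes.

```python
import pandas as pd

# Hypothetical toy data with 3 distinct categories.
data = pd.DataFrame({"city": ["paris", "tokyo", "amsterdam", "paris"]})

# Default: one indicator column per category (n columns).
full = pd.get_dummies(data)

# drop_first=True drops the first (alphabetically sorted) category,
# leaving n-1 columns and avoiding perfectly collinear features.
reduced = pd.get_dummies(data, drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

The n-1 form is usually what linear models want; tree-based models generally work fine with all n columns.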
LabelEncoder:
e.g.
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])  # classes are sorted: amsterdam=0, paris=1, tokyo=2
le.transform(["tokyo", "tokyo", "paris"])  # array([2, 2, 1])
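To check which integer belongs to which label, and to map codes back, something like the following should work. It reuses the same city list; the variable names `classes`, `codes`, and `decoded` are only for illustration.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# classes_ holds the sorted unique labels; the index is the encoded value.
classes = list(le.classes_)

# Encode, then decode back to the original strings.
codes = le.transform(["tokyo", "tokyo", "paris"])
decoded = list(le.inverse_transform(codes))

print(classes, list(codes), decoded)
```

Note that the integer codes follow alphabetical order, not any meaningful ranking, so they suit targets or tree models better than linear features.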
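map:
e.g.
title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}
# combine is assumed to be a list of DataFrames, e.g. [train_df, test_df]
for dataset in combine:
    dataset['Title'] = dataset['Title'].map(title_mapping)
    dataset['Title'] = dataset['Title'].fillna(0)  # titles missing from the mapping become NaN, then 0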
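A small self-contained check of how map handles a value that is not in the dictionary. The DataFrame `df` and the title "Dr" are made up for illustration.

```python
import pandas as pd

title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}

# Hypothetical sample; "Dr" is deliberately absent from the mapping.
df = pd.DataFrame({"Title": ["Mr", "Miss", "Dr", "Master"]})

# map() yields NaN for unmapped keys, fillna(0) replaces them,
# and astype(int) undoes the float conversion that NaN forces.
df["Title"] = df["Title"].map(title_mapping).fillna(0).astype(int)

print(df["Title"].tolist())
```

Without the final astype(int) the column stays float, which is harmless for most models but easy to trip over when comparing values.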
For interval bins, convert to ordinal codes:
dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0
dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1
dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2
dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3
dataset.loc[ dataset['Age'] > 64, 'Age'] = 4
train_df.head()
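The same banding can probably be written more compactly with pd.cut. This is an alternative sketch, not the approach used above; the `ages` frame and the `AgeBand` column name are hypothetical.

```python
import pandas as pd

# Hypothetical ages, including values that sit exactly on bin edges.
ages = pd.DataFrame({"Age": [5.0, 16.0, 17.0, 32.0, 40.0, 64.0, 80.0]})

# Right-closed intervals: (-inf, 16], (16, 32], (32, 48], (48, 64], (64, inf)
bins = [-float("inf"), 16, 32, 48, 64, float("inf")]

# labels=False returns the integer bin index instead of Interval objects.
ages["AgeBand"] = pd.cut(ages["Age"], bins=bins, labels=False)

print(ages["AgeBand"].tolist())
```

Writing to a separate column also keeps the raw Age values, whereas the .loc version overwrites them in place.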