1. When a variable is categorical and has only a few categories, consider converting it into dummy variables.
In [34]:
embark_dummies = pd.get_dummies(train_data.Embarked)
# drop the original column
train_data.drop('Embarked', axis=1, inplace=True)
train_data = train_data.join(embark_dummies)
In [35]:
sex_dummies = pd.get_dummies(train_data.Sex)
# drop the original column
train_data.drop('Sex', axis=1, inplace=True)
train_data = train_data.join(sex_dummies)
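The two cells above repeat the same encode, drop, join pattern for each column. As a minimal sketch of an alternative, `pd.get_dummies` can take the whole DataFrame plus a `columns` list and do all three steps in one call. The `toy` frame and its values below are hypothetical stand-ins for `train_data`, not the real dataset. Note that this form prefixes the new columns with the source column name (`Embarked_C` rather than `C`), so the result differs from the cells above in naming.

```python
import pandas as pd

# Hypothetical toy data standing in for train_data; the values are illustrative only.
toy = pd.DataFrame({
    "Embarked": ["S", "C", "Q", "S"],
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.75, 8.05],
})

# Encode both categorical columns in one call.
# The original columns are removed and the indicator columns are appended.
# dtype=int requests 0/1 values instead of booleans.
encoded = pd.get_dummies(toy, columns=["Embarked", "Sex"], dtype=int)

print(encoded.columns.tolist())
```

Keeping the column-name prefix can make the encoded frame easier to read when several categorical variables share category labels.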
In [36]:
train_data.head()
Out[36]:
| | PassengerId | Survived | Pclass | Name | Age | SibSp | Parch | Ticket | Fare | Cabin | C | Q | S | female | male |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | 1 | 0 |