First, an example that behaves as expected and produces a one-hot encoding:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit_transform(range(0, 3))
Output:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
Now a surprising case. Change the 3 in the code above to 2:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit_transform(range(0, 2))
Output:
array([[0],
       [1]])
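The result is not a one-hot encoding! This is actually LabelBinarizer's documented behavior for binary targets: when there are exactly two classes, it returns a single column (a 0/1 indicator for the second class) instead of one column per class.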
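To see what the single column means, inspect `lb.classes_`: the column marks the second entry of `classes_`. If you must keep LabelBinarizer but want two columns, one possible workaround (a minimal sketch, assuming NumPy is available; the variable names are illustrative, not part of any API) is to stack the complement column next to it:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
y = lb.fit_transform(range(0, 2))  # binary case: one column, shape (2, 1)
print(lb.classes_)  # [0 1]
print(y.shape)      # (2, 1)

# Workaround (not a library feature): with exactly two classes,
# prepend the complement column so column 0 marks classes_[0]
if len(lb.classes_) == 2:
    y = np.hstack([1 - y, y])
print(y)
# [[1 0]
#  [0 1]]
```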
Solution: use OneHotEncoder, which always outputs one column per category, even when there are only two:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit_transform([['yes'], ['no'], ['no'], ['yes']]).toarray()  # input is 2-D (one sample per row); the result is sparse, so convert with toarray()
Output (categories are sorted, so column 0 is 'no' and column 1 is 'yes'):
array([[0., 1.],
       [1., 0.],
       [1., 0.],
       [0., 1.]])
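Applied to the exact two-label input that tripped up LabelBinarizer, OneHotEncoder still gives two columns. This is a minimal sketch assuming a recent scikit-learn and NumPy; the only extra step is reshaping the 1-D labels into a single-feature 2-D column, since OneHotEncoder expects a 2-D input:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder expects 2-D input: one row per sample, one column per feature
labels = np.arange(0, 2).reshape(-1, 1)

enc = OneHotEncoder()
onehot = enc.fit_transform(labels).toarray()  # sparse by default, densify
print(enc.categories_)  # categories of the single feature, in column order
print(onehot)
# [[1. 0.]
#  [0. 1.]]
```

`enc.categories_` holds one array per input feature, and its order matches the output columns, which is handy when you need to map columns back to labels.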