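Using scikit-learn
Note: pass sparse=False to OneHotEncoder. Otherwise it returns a SciPy sparse matrix, which prints as (row, column) index/value pairs instead of a dense array. In scikit-learn 1.2 and later this parameter is named sparse_output, and the old sparse name was removed in 1.4.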
from numpy import array
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
# define example
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = array(data)
print(values)
# integer encode
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)
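# one-hot encode
onehot_encoder = OneHotEncoder(sparse=False)  # on scikit-learn >= 1.2, use sparse_output=False
values = values.reshape(len(values), 1)  # required: the encoder expects a 2-D array of shape (n_samples, n_features)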
onehot_encoded = onehot_encoder.fit_transform(values)
print(onehot_encoded)
Output:
['cold' 'cold' 'warm' 'cold' 'hot' 'hot' 'warm' 'cold' 'warm' 'hot']
[0 0 2 0 1 1 2 0 2 1]
[[ 1. 0. 0.]
 [ 1. 0. 0.]
 [ 0. 0. 1.]
 [ 1. 0. 0.]
 [ 0. 1. 0.]
 [ 0. 1. 0.]
 [ 0. 0. 1.]
 [ 1. 0. 0.]
 [ 0. 0. 1.]
 [ 0. 1. 0.]]
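For comparison, here is a minimal sketch of what happens when the encoder is left at its default settings: fit_transform returns a SciPy sparse matrix, and toarray() converts it to the dense form shown above. The variable names sparse_encoded and dense_encoded are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = np.array(data).reshape(-1, 1)  # 2-D: one column, one row per sample

# Default settings: the result is a SciPy sparse matrix, not a NumPy array.
sparse_encoded = OneHotEncoder().fit_transform(values)
print(type(sparse_encoded))

# Convert to a dense NumPy array when a plain 2-D array is needed.
dense_encoded = sparse_encoded.toarray()
print(dense_encoded[:3])
```

Since this does not pass sparse or sparse_output, it should not depend on which of the two names the installed scikit-learn version accepts.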
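Converting the one-hot encoding back to the label encoding
# invert the first example: the position of the 1 in each row is the integer label
int_encoded = np.argmax(onehot_encoded, axis=1)
print(int_encoded)
Output: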
[0 0 2 0 1 1 2 0 2 1]
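To go one step further and recover the original strings, the integer codes can be passed to LabelEncoder.inverse_transform. The sketch below is self-contained, so it rebuilds a one-hot matrix with np.eye as a stand-in for onehot_encoded. The names onehot, recovered_int and recovered_labels are illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']

label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(np.array(data))

# Stand-in for onehot_encoded: row i of the identity matrix is the one-hot vector for class i.
onehot = np.eye(len(label_encoder.classes_))[integer_encoded]

# one-hot -> integer codes -> original string labels
recovered_int = np.argmax(onehot, axis=1)
recovered_labels = label_encoder.inverse_transform(recovered_int)
print(recovered_labels)
```

This only works with the same fitted label_encoder, because the mapping between integers and strings is stored in its classes_ attribute.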
Using Keras
Suppose the label encoding [0 0 2 0 1 1 2 0 2 1] is already available. keras.utils.to_categorical() converts it to a one-hot encoding.
from keras.utils import to_categorical
encoded = to_categorical(integer_encoded)
print(integer_encoded)
print(encoded)
Output:
[0 0 2 0 1 1 2 0 2 1]
[[ 1. 0. 0.]
 [ 1. 0. 0.]
 [ 0. 0. 1.]
 [ 1. 0. 0.]
 [ 0. 1. 0.]
 [ 0. 1. 0.]
 [ 0. 0. 1.]
 [ 1. 0. 0.]
 [ 0. 0. 1.]
 [ 0. 1. 0.]]
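To make the transformation concrete, here is a rough NumPy sketch that produces the same kind of matrix for 1-D integer labels. The helper one_hot is hypothetical and is not how Keras implements to_categorical. It only assumes the labels are non-negative integers starting at 0.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: one-hot encode 1-D non-negative integer labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Assumption: classes are numbered 0..max, so the count is max + 1.
        num_classes = labels.max() + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

integer_encoded = np.array([0, 0, 2, 0, 1, 1, 2, 0, 2, 1])
encoded = one_hot(integer_encoded)
print(encoded.shape)

# Fixing the number of classes keeps the width stable even if a class is absent from a batch.
padded = one_hot(integer_encoded, num_classes=4)
print(padded.shape)
```

to_categorical also accepts a num_classes argument, which is worth setting explicitly when a given batch of labels might not contain every class.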
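In deep learning classification tasks, the target y (that is, the labels) is typically one-hot encoded in this way before training.
Reference: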
https://blog.csdn.net/gdh756462786/article/details/79161525