[sklearn] Data Preprocessing with LabelEncoder() and OneHotEncoder()

一起来学深度学习鸭

Article link: https://blog.csdn.net/weixin_69722030/article/details/127871374

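Using scikit-learn

Note: construct the encoder as OneHotEncoder(sparse=False). Otherwise fit_transform returns a SciPy sparse matrix, which prints as (row, column) index/value pairs instead of a dense array. In scikit-learn 1.2 and later this parameter is named sparse_output, and the old sparse name was removed in 1.4.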

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Define example data
data = ['cold', 'cold', 'warm', 'cold', 'hot', 'hot', 'warm', 'cold', 'warm', 'hot']
values = np.array(data)
print(values)

# Integer encode: each distinct label becomes an integer (classes are sorted)
label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(values)
print(integer_encoded)

# One-hot encode
# sparse_output=False needs scikit-learn >= 1.2; on older versions use sparse=False
onehot_encoder = OneHotEncoder(sparse_output=False)
# This reshape is required: OneHotEncoder expects a 2-D array (n_samples, n_features)
values = values.reshape(len(values), 1)
onehot_encoded = onehot_encoder.fit_transform(values)
print(onehot_encoded)

Output:

['cold' 'cold' 'warm' 'cold' 'hot' 'hot' 'warm' 'cold' 'warm' 'hot']
[0 0 2 0 1 1 2 0 2 1]
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
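
The note about sparse output can be checked directly. The following is a minimal sketch, assuming SciPy is installed alongside scikit-learn and using a shortened four-element sample that is not part of the original example. It leaves OneHotEncoder at its defaults and converts the result with .toarray(), which avoids the version-dependent parameter name altogether.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Shortened sample (an assumption for illustration), reshaped to (n_samples, 1)
values = np.array(['cold', 'warm', 'hot', 'cold']).reshape(-1, 1)

# With default settings, fit_transform returns a SciPy sparse matrix
encoder = OneHotEncoder()
encoded_sparse = encoder.fit_transform(values)
print(sparse.issparse(encoded_sparse))

# .toarray() converts the sparse matrix to a dense NumPy array
encoded_dense = encoded_sparse.toarray()
print(encoded_dense)

# Column order follows the sorted categories learned during fit
print(encoder.categories_[0])
```

The columns appear in the order cold, hot, warm, which is why 'warm' maps to the third column in the output above.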

Converting the one-hot encoding back to label encoding

# Invert the one-hot encoding: the index of the 1 in each row is the integer label
int_encoded = np.argmax(onehot_encoded, axis=1)
print(int_encoded)

Output:

[0 0 2 0 1 1 2 0 2 1]
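
The argmax step recovers integer codes only. To get back to the original strings, the fitted LabelEncoder can finish the round trip with inverse_transform. A minimal sketch, assuming a shortened five-element sample and building the one-hot matrix with np.eye instead of OneHotEncoder so that it does not depend on the scikit-learn version:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Shortened sample (an assumption for illustration)
data = np.array(['cold', 'cold', 'warm', 'cold', 'hot'])

label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(data)

# Build a one-hot matrix by picking rows of the identity matrix
onehot = np.eye(len(label_encoder.classes_))[integer_encoded]

# Step 1: one-hot rows back to integer codes
recovered_ints = np.argmax(onehot, axis=1)

# Step 2: integer codes back to the original string labels
recovered_labels = label_encoder.inverse_transform(recovered_ints)
print(recovered_ints)
print(recovered_labels)
```

This only works with the same fitted label_encoder, because the integer-to-string mapping lives in its classes_ attribute.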

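Using Keras

Suppose the label encoding [0 0 2 0 1 1 2 0 2 1] is already available. keras.utils.to_categorical() converts it to a one-hot encoding.

from keras.utils import to_categorical  # or: from tensorflow.keras.utils import to_categorical

# integer_encoded is the label-encoded array from the scikit-learn example above
encoded = to_categorical(integer_encoded)
print(integer_encoded)
print(encoded)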

Output:

[0 0 2 0 1 1 2 0 2 1]
[[ 1.  0.  0.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]
 [ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]]
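
One detail worth knowing: when num_classes is not given, to_categorical infers the number of columns from the largest label it sees, so a batch that happens to lack the highest class produces a narrower matrix. The sketch below illustrates this with a hypothetical NumPy helper, one_hot, that mimics that behaviour for 1-D integer labels. It is a stand-in written for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical NumPy stand-in for to_categorical on 1-D integer labels."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        # Infer the width from the largest label, as to_categorical does by default
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set a single 1 per row, in the column given by that row's label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

integer_encoded = np.array([0, 0, 2, 0, 1, 1, 2, 0, 2, 1])
encoded = one_hot(integer_encoded)
print(encoded.shape)

# A batch that never contains class 2 yields only two columns by default
batch = np.array([0, 1, 1, 0])
narrow = one_hot(batch)
fixed = one_hot(batch, num_classes=3)
print(narrow.shape, fixed.shape)
```

With the real function, passing num_classes=3 to to_categorical fixes the width in the same way.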