LabelEncoder and OneHotEncoder

Latest recommended article published on 2022-04-20 15:38:03

lixin051435


Categories: python, linux · Tags: deep learning

Article link: https://blog.csdn.net/tsfx051435adsl/article/details/85679916


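# LabelEncoder converts discrete values into integers from 0 to n-1, where n is the number of distinct values in the list
# (for a feature, the number of distinct values it takes). The distinct values are sorted first, so the smallest gets 0.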
def testLabelEncoder():
    from sklearn.preprocessing import LabelEncoder

    labelencoder = LabelEncoder()

    x = labelencoder.fit_transform([1, 1, 100, 67, 5])

    # [0 0 3 2 1]
    print(x)
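The mapping LabelEncoder learns can be reproduced by hand, which makes the [0 0 3 2 1] output above easier to read: the distinct values are sorted, and each value's code is its position in that sorted list. This is a minimal sketch assuming a recent NumPy and scikit-learn; the names `values`, `classes`, `code_of` and `manual` are illustrative only. It also shows `classes_` and `inverse_transform`, which LabelEncoder provides for going from codes back to the original values.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

values = [1, 1, 100, 67, 5]

# Manual version: sort the distinct values; each value's code is its index in that list
classes = sorted(set(values))                     # [1, 5, 67, 100]
code_of = {v: i for i, v in enumerate(classes)}
manual = [code_of[v] for v in values]             # [0, 0, 3, 2, 1]

# NumPy gives the same codes in one call: sorted uniques plus each element's index into them
uniques, inverse = np.unique(values, return_inverse=True)

# scikit-learn: classes_ stores the sorted distinct values learned by fit
le = LabelEncoder()
codes = le.fit_transform(values)
print(le.classes_.tolist())                       # [1, 5, 67, 100]
print(codes.tolist())                             # [0, 0, 3, 2, 1]
# inverse_transform maps codes back to the original values
print(le.inverse_transform(codes).tolist())       # [1, 1, 100, 67, 5]
```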

# Discrete features are usually one-hot encoded: a feature with k distinct values is represented by k dimensions, exactly one of which is 1
def testOneHotEncoder():
    from sklearn.preprocessing import OneHotEncoder

    oneHotEncoder = OneHotEncoder()

    x = oneHotEncoder.fit_transform([[2], [1], [3], [4]]).toarray()

    # [[0. 1. 0. 0.]
    #  [1. 0. 0. 0.]
    #  [0. 0. 1. 0.]
    #  [0. 0. 0. 1.]]
    print(x)
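One-hot encoding a single column amounts to picking rows of an identity matrix. The sketch below (illustrative variable names; NumPy and a recent scikit-learn assumed) builds the same 4 x 4 matrix with `np.unique(..., return_inverse=True)` and `np.eye`, then checks it against OneHotEncoder, whose `categories_` attribute lists the sorted categories that define the column order.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

column = np.array([2, 1, 3, 4])

# np.unique returns the sorted categories and, for each sample, the index of its category
categories, idx = np.unique(column, return_inverse=True)
# Row i of the identity matrix is the one-hot vector for category i
manual = np.eye(len(categories))[idx]

# OneHotEncoder expects a 2-D input: one row per sample, one column per feature
enc = OneHotEncoder()
sk = enc.fit_transform(column.reshape(-1, 1)).toarray()

print(enc.categories_[0].tolist())   # [1, 2, 3, 4]
print(np.array_equal(manual, sk))    # True
```

Because the categories are sorted, the value 2 lands in the second output column, which is why the first row above is [0. 1. 0. 0.].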

def testOneHotEncoder2():
    from sklearn import preprocessing
    enc = preprocessing.OneHotEncoder()
    enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])    # fit learns the categories of each column
    x = enc.transform([[0, 1, 3]]).toarray()    # encode a new sample
    # [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]
    print(x)

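    '''
    The data matrix is 4 x 3: 4 samples, each with 3 feature dimensions.

        0 0 3
        1 1 0
        0 2 1
        1 0 2

    The first column (first feature) takes two values, 0 and 1, so its codes are 10 and 01.
    The second column (second feature) takes three values, 0, 1 and 2, so its codes are 100, 010 and 001.
    The third column (third feature) takes four values, 0, 1, 2 and 3, so its codes are 1000, 0100, 0010 and 0001.

    Now encode the sample [0, 1, 3]: 0 as the first feature becomes 10, 1 as the second feature becomes 010,
    and 3 as the third feature becomes 0001. Concatenated, the result is 1 0 0 1 0 0 0 0 1.
    '''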

if __name__ == "__main__":
    testOneHotEncoder2()
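Since the title pairs the two encoders, a short contrast may help. In scikit-learn 0.20 and later, OneHotEncoder accepts string categories directly, while LabelEncoder is documented as a transformer for target labels y rather than input features. The sketch assumes a recent scikit-learn; the `colors` and `labels` data are made up for illustration, and `handle_unknown="ignore"` is shown as one option for categories not seen during fit.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = [["red"], ["green"], ["blue"], ["green"]]   # one string feature, 4 samples (illustrative)
labels = ["cat", "dog", "cat", "bird"]               # target labels (illustrative)

# OneHotEncoder handles strings directly; categories are sorted: blue, green, red
ohe = OneHotEncoder(handle_unknown="ignore")
X = ohe.fit_transform(colors).toarray()
print(ohe.categories_[0].tolist())   # ['blue', 'green', 'red']
print(X.astype(int).tolist())        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]

# With handle_unknown="ignore", an unseen category is encoded as an all-zero row
unseen = ohe.transform([["purple"]]).toarray()
print(unseen.astype(int).tolist())   # [[0, 0, 0]]

# LabelEncoder is intended for the target vector y, not for input features
le = LabelEncoder()
y = le.fit_transform(labels)
print(le.classes_.tolist())          # ['bird', 'cat', 'dog']
print(y.tolist())                    # [1, 2, 1, 0]
```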
