[Multi-label classification] MultiLabelBinarizer: from one-hot to multi-hot

Last modified on 2023-08-30 09:07:58

License: CC 4.0 BY-SA

Tags: #classification #python #numpy

First published on 2023-08-29 19:00:47

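This article introduces three Scikit-learn tools: OneHotEncoder for one-hot encoding of labels, LabelEncoder for building a mapping between labels and indices, and MultiLabelBinarizer for multi-hot encoding of multi-label targets. It walks through the fit, transform, and inverse_transform methods of each and how to use them.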

Background
One hot encoder
LabelEncoder
MultiLabelBinarizer
Summary
References

Background

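Multi-class classification: the label space contains at least 3 labels, and each sample is assumed to have exactly one label. Its counterpart is binary classification, where the label space contains only two labels.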

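Multi-label classification: each sample has one or more labels. Multi-label problems are usually also multi-class, so the task is sometimes called multi-label multi-class classification.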
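To make the two definitions concrete, here is a minimal sketch in plain Python. The label space `classes`, the sample targets, and the helper `to_indicator` are all hypothetical names chosen for illustration and are not part of Scikit-learn. It shows that a multi-class target becomes a one-hot row (exactly one 1), while a multi-label target becomes a multi-hot row (one or more 1s).

```python
# Hypothetical label space, used only for illustration
classes = ['blue', 'green', 'red']

# Multi-class: exactly one label per sample
multiclass_y = ['red', 'green', 'blue']
# Multi-label: one or more labels per sample
multilabel_y = [{'red'}, {'green', 'blue'}, {'red', 'green', 'blue'}]

def to_indicator(label_sets, classes):
    """Return one 0/1 row per sample, with one column per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]

# Wrap each single label in a set so both cases share the same helper
one_hot = to_indicator([{y} for y in multiclass_y], classes)
multi_hot = to_indicator(multilabel_y, classes)

print([sum(row) for row in one_hot])    # each row sums to exactly 1
print([sum(row) for row in multi_hot])  # rows may sum to more than 1
```

The only structural difference between the two encodings is the row sum, which is why multi-hot encoding can be seen as a generalization of one-hot encoding.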

One hot encoder

Scikit-learn provides one-hot encoding through the OneHotEncoder class:

from sklearn.preprocessing import OneHotEncoder

The following example one-hot encodes a list of labels with OneHotEncoder and then decodes the result:

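import numpy as np

encoder = OneHotEncoder()
labels = ['red', 'green', 'blue', 'blue', 'red']
# OneHotEncoder expects a 2D array: one row per sample, one column per feature
data = np.array(labels).reshape(-1, 1)  # shape: (n, 1)
encoder.fit(data)
# categories_ holds the sorted unique values found in each column
print(f'encoder.categories_: {encoder.categories_}')
# transform returns a sparse matrix by default; toarray() makes it dense
ans = encoder.transform(data).toarray()
# inverse_transform maps the one-hot rows back to the original labels
ans_rev = encoder.inverse_transform(ans)
print(f'ans: {ans}')
print(f'ans_rev: {ans_rev}')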
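To see what the encoder computes, the same mapping can be reproduced by hand with NumPy. This is only a sketch of the idea, not how Scikit-learn implements it internally. The variable names are illustrative, and it assumes a single column of string labels.

```python
import numpy as np

labels = ['red', 'green', 'blue', 'blue', 'red']

# np.unique returns the sorted unique values; return_inverse gives, for each
# sample, the index of its value in that sorted array
categories, indices = np.unique(labels, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i
one_hot = np.eye(len(categories))[indices]

# Inverse mapping: the position of the 1 in each row selects the category
recovered = categories[one_hot.argmax(axis=1)]

print(categories)
print(one_hot)
print(recovered)
```

Because the categories are sorted, the column order is blue, green, red, so 'red' maps to the last column even though it appears first in the input.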