Background
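Given a set of concrete text labels, how can they be converted to one-hot form? For example, take the label set '体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经' (sports, entertainment, home, real estate, education, fashion, current affairs, games, technology, finance).
In a multiclass task, how do we convert the label of each sample in the training set to a one-hot vector?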
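Method 1: map each label to an integer id, then set the matching entries of a zero matrix with fancy indexing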
import numpy as np
label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
label_dict = { l: i for i, l in enumerate(label_list)}
data_labels = np.array(["娱乐", "体育", "房产", "科技", "财经"])
data_label_ids = list(map(label_dict.get, data_labels))
one_hot = np.zeros((data_labels.size, len(label_list)), dtype=np.int8)
one_hot[np.arange(data_labels.size), data_label_ids] = 1
print(one_hot)
Output:
[[0 1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]]
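Method 1 assumes every label is present in `label_dict`: `label_dict.get` returns `None` for an unknown label, and NumPy then rejects the index list with an error that does not say which label was bad. Below is a minimal sketch of the same fancy-indexing approach wrapped in a reusable function, assuming you would rather fail with a clear message. The function name `labels_to_one_hot` is hypothetical and not part of any library.

```python
import numpy as np

def labels_to_one_hot(labels, label_list, dtype=np.int8):
    """Hypothetical helper: map string labels to one-hot rows via fancy indexing."""
    label_dict = {l: i for i, l in enumerate(label_list)}
    # Look up each label explicitly so an unknown label raises a readable KeyError
    try:
        ids = np.array([label_dict[l] for l in labels], dtype=np.intp)
    except KeyError as e:
        raise KeyError(f"unknown label: {e.args[0]!r}") from None
    one_hot = np.zeros((len(ids), len(label_list)), dtype=dtype)
    # Row r gets a 1 in column ids[r]; every other entry stays 0
    one_hot[np.arange(len(ids)), ids] = 1
    return one_hot

label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
one_hot = labels_to_one_hot(["娱乐", "体育", "房产", "科技", "财经"], label_list)
print(one_hot.argmax(axis=1))  # [1 0 3 8 9]
```

Calling `labels_to_one_hot(["未知"], label_list)` raises `KeyError: "unknown label: '未知'"` instead of a generic indexing error.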
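Method 2:
Use np.eye: row i of the identity matrix is exactly the one-hot vector for class i, so indexing its rows with the label ids gives the result directly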
import numpy as np
label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
label_dict = { l: i for i, l in enumerate(label_list)}
data_labels = np.array(["娱乐", "体育", "房产", "科技", "财经"])
data_label_ids = list(map(label_dict.get, data_labels))
# Method 2: select the identity-matrix rows that correspond to the label ids
one_hot = np.eye(len(label_list), dtype=np.int8)[data_label_ids]
print(one_hot)
Output:
[[0 1 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]]
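Since each row of `np.eye(n)` is the one-hot vector of one class, the inverse operation is `argmax` along axis 1. A short sketch, assuming you also need to recover the string labels from one-hot rows (for example, from model predictions):

```python
import numpy as np

label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
label_dict = {l: i for i, l in enumerate(label_list)}
data_labels = ["娱乐", "体育", "房产", "科技", "财经"]

# Encode: row i of the identity matrix is the one-hot vector of class i
ids = [label_dict[l] for l in data_labels]
one_hot = np.eye(len(label_list), dtype=np.int8)[ids]

# Decode: the column holding the 1 (or the highest score) is the class id
decoded_ids = one_hot.argmax(axis=1)
decoded_labels = [label_list[i] for i in decoded_ids]
print(decoded_labels)  # ['娱乐', '体育', '房产', '科技', '财经']
```

Note that `np.eye` first allocates a full n × n matrix; that is negligible for 10 classes but worth keeping in mind when the label set is very large, where Method 1 avoids the extra allocation.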
Method 3:
Use sklearn.preprocessing.LabelBinarizer, fitted on the integer ids 0 to 9 so that column i corresponds to label_list[i]
import numpy as np
label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
label_dict = { l: i for i, l in enumerate(label_list)}
data_labels = np.array(["娱乐", "体育", "房产", "科技", "财经"])
data_label_ids = list(map(label_dict.get, data_labels))
# Method 3
import sklearn.preprocessing
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(len(label_list)))
one_hot = label_binarizer.transform(data_label_ids)
print(one_hot)
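LabelBinarizer can also be fitted on the string labels directly, skipping the manual `label_dict`. One caveat: it stores `classes_` in sorted order, so the column order follows sorted (Unicode code point) order rather than the order of `label_list`. In this particular list the strings happen to already be in code-point order, so the columns match the methods above; for an arbitrary list they may not. A minimal sketch:

```python
from sklearn.preprocessing import LabelBinarizer

label_list = ['体育', '娱乐', '家居', '房产', '教育', '时尚', '时政', '游戏', '科技', '财经']
data_labels = ["娱乐", "体育", "房产", "科技", "财经"]

lb = LabelBinarizer()
lb.fit(label_list)                 # classes_ is stored sorted, not in label_list order
one_hot = lb.transform(data_labels)

print(lb.classes_)                     # the column order actually used
print(lb.inverse_transform(one_hot))   # maps one-hot rows back to the original strings
```

Also note that with exactly two classes LabelBinarizer returns a single 0/1 column rather than two one-hot columns, which differs from Methods 1 and 2.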