Reference: http://scikit-learn.org/stable/modules/preprocessing_targets.html
There is not much here that needs translating, so the examples are given directly.
1、Label binarization
LabelBinarizer is a utility class to help create a label indicator matrix from a list of multi-class labels:
>>> from sklearn import preprocessing
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit([1, 2, 6, 4, 2])
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([1, 2, 4, 6])
>>> lb.transform([1, 6])
array([[1, 0, 0, 0],
       [0, 0, 0, 1]])
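The indicator matrix can also be mapped back to the original labels. The following is a minimal sketch of that round trip with inverse_transform; the score values are made up purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
# One row per sample, one column per class in lb.classes_ (1, 2, 4, 6).
indicator = lb.fit_transform([1, 2, 6, 4, 2])

# inverse_transform maps each indicator row back to its class label.
recovered = lb.inverse_transform(indicator)

# For multi-class targets it also accepts real-valued scores and picks
# the class whose column has the highest score in each row.
scores = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.2, 0.1, 0.1, 0.6]])  # illustrative values only
predicted = lb.inverse_transform(scores)

print(recovered)
print(predicted)
```

This is convenient when a model outputs one score per class and the predictions need to be reported in the original label space.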
Binary targets transform to a single column vector:
>>> lb = preprocessing.LabelBinarizer()
>>> lb.fit_transform(['yes', 'no', 'no', 'yes'])
array([[1],
       [0],
       [0],
       [1]])
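The 0/1 values in the column come from the neg_label and pos_label parameters shown in the repr above. As a small sketch, assuming a downstream model that expects -1/+1 targets, they can be changed like this:

```python
from sklearn.preprocessing import LabelBinarizer

# Encode the negative class as -1 instead of the default 0.
lb = LabelBinarizer(neg_label=-1, pos_label=1)
y = lb.fit_transform(['yes', 'no', 'no', 'yes'])

# Classes are sorted, so 'no' is the negative class and 'yes' the positive one.
print(lb.classes_)
print(y)
```

The result is still a single column; only the values used for the two classes change.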
Passing a 2D indicator matrix for multilabel classification (the classes are then the column indices):
>>> import numpy as np
>>> lb.fit(np.array([[0, 1, 1], [1, 0, 0]]))
LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
>>> lb.classes_
array([0, 1, 2])
>>> lb.transform([0, 1, 2, 1])
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0]])
For multiple labels per instance, use MultiLabelBinarizer:
>>> lb = preprocessing.MultiLabelBinarizer()
>>> lb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> lb.classes_
array([1, 2, 3])
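MultiLabelBinarizer is not limited to integer labels. The sketch below uses sets of string tags (the tag names are only illustrative) and shows that inverse_transform returns one tuple of labels per row.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
tags = [{"sci-fi", "thriller"}, {"comedy"}]  # illustrative label sets
indicator = mlb.fit_transform(tags)

# Columns follow the sorted classes: comedy, sci-fi, thriller.
print(list(mlb.classes_))
print(indicator)

# Each indicator row is converted back to a tuple of its labels.
restored = mlb.inverse_transform(indicator)
print(restored)
```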
2、Label encoding
LabelEncoder is a utility class to help normalize labels such that they contain only values between 0 and n_classes-1. LabelEncoder can be used as follows:
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2])
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
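One point worth knowing when using the encoder above: transform only accepts labels that were seen during fit. A minimal sketch of what happens otherwise, where the label 3 is chosen simply because it is absent from the fitted data:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit([1, 2, 2, 6])

try:
    # 3 was not present when the encoder was fitted.
    le.transform([1, 3])
    unseen_rejected = False
except ValueError as exc:
    unseen_rejected = True
    print("transform failed:", exc)
```

In practice this means the encoder should be fitted on the full set of labels that can occur, not only on those in one data split.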
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels:
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1])
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
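Because the integer code of a label is just its position in the sorted classes_ array, the whole mapping can be read off directly. A small sketch (the name mapping is only an illustrative variable):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# The code assigned to each label is its index in the sorted classes_ array.
mapping = {label: code for code, label in enumerate(le.classes_.tolist())}
print(mapping)
```

Such a dictionary is handy for logging or for checking which code corresponds to which label.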