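The encoder operates on a 2-D array, treating each row as a sample and each column as a feature.
The distinct values that appear in each column are collected and sorted (for example, in ascending order). If the data passed to fit has N columns, the fitted encoder's categories_ will contain N arrays, one per column.
For an array passed to transform, the value in the first column is compared against the first categories array: the position of the matching category is set to 1 and every other position is set to 0. Likewise, the value in the N-th column is compared against the N-th categories array. The per-column results are then joined horizontally within the row (the same effect as np.c_[]) and returned.
The same procedure is then applied to each column value of the next row.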
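The procedure described above can be sketched in plain Python. This is only an illustration of the logic, not scikit-learn's actual implementation: the helper names fit_categories and transform_rows are made up for this sketch, and it simply raises an error for a value that was not seen during fitting.

```python
def fit_categories(rows):
    """Collect the sorted distinct values of each column (one list per feature)."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def transform_rows(rows, categories):
    """One-hot encode every row against the per-column categories."""
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            if value not in cats:
                raise ValueError(f"unknown category {value!r}")
            # 1.0 at the position of the matching category, 0.0 elsewhere
            out.extend(1.0 if value == c else 0.0 for c in cats)
        encoded.append(out)
    return encoded


data = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
categories = fit_categories(data)
encoded = transform_rows([[0, 1, 1]], categories)

print(categories)  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]
print(encoded)     # [[1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
```

Here list.extend plays the role of the horizontal join: the indicator vectors of the three columns (lengths 2, 3 and 4) end up side by side in one row of length 9.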
An example with OneHotEncoder from scikit-learn:
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> data = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]
>>> enc.fit(data)
Warning (from warnings module):
  File "/Users/bnz/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/_encoders.py", line 368
    warnings.warn(msg, FutureWarning)
FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values.
If you want the future behaviour and silence this warning, you can specify "categories='auto'".
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.
OneHotEncoder(categorical_features=None, categories=None,
       dtype=<class 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)
>>> enc.categories_
[array([0., 1.]), array([0., 1., 2.]), array([0., 1., 2., 3.])]
>>> enc.transform([[0, 1, 1]]).toarray()
array([[1., 0., 0., 1., 0., 0., 1., 0., 0.]])
>>> data2 = [[0, 0, 3], [3, 1, 0], [0, 2, 1], [1, 0, 2]]
>>> enc.fit(data2)
OneHotEncoder(categorical_features=None, categories=None,
       dtype=<class 'numpy.float64'>, handle_unknown='error',
       n_values=None, sparse=True)
>>> enc.categories_
[array([0., 1., 3.]), array([0., 1., 2.]), array([0., 1., 2., 3.])]
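The FutureWarning above suggests passing categories='auto'. A minimal sketch of that, assuming a scikit-learn version that accepts the categories parameter (0.20 or later); the variable names are only illustrative:

```python
from sklearn.preprocessing import OneHotEncoder

data2 = [[0, 0, 3], [3, 1, 0], [0, 2, 1], [1, 0, 2]]

# categories='auto' derives each column's categories from its unique values
enc = OneHotEncoder(categories="auto")
enc.fit(data2)

# One array of sorted unique values per column, converted to plain lists
categories = [c.tolist() for c in enc.categories_]
print(categories)

# transform returns a sparse matrix by default; toarray() makes it dense
encoded = enc.transform([[0, 1, 1]]).toarray()
print(encoded)
```

Because the first column of data2 contains the three distinct values 0, 1 and 3, the encoded row has 3 + 3 + 4 = 10 columns rather than the 9 obtained from data.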
Reference: https://blog.csdn.net/lanchunhui/article/details/72794317