OrdinalEncoder
sklearn.preprocessing.OrdinalEncoder(*, categories='auto', dtype=<class 'numpy.float64'>, handle_unknown='error', unknown_value=None, encoded_missing_value=nan)
Encode categorical features as an integer array.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers, which results in a single column of integers (0 to n_categories - 1) per feature.
Parameters
categories
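'auto' or a list of array-like, default='auto'
Value | Description |
---|---|
'auto' | Determine the categories automatically from the training data |
list | categories[i] holds the categories expected in the i-th column |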
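As a rough illustration of the list form, the sketch below passes an explicit category order for a single column. The size labels and variable names are made-up sample data, not taken from the text above.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data: one column of clothing sizes.
sizes = [["medium"], ["small"], ["large"], ["small"]]

# categories[0] lists the expected categories of column 0, in the
# order that should map to the codes 0, 1, 2.
size_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = size_encoder.fit_transform(sizes)

print(size_codes.ravel().tolist())  # [1.0, 0.0, 2.0, 0.0]
```

With 'auto', the same column would instead be coded in sorted order (large, medium, small), so an explicit list is the way to preserve a meaningful ranking.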
dtype
number type, default=np.float64
Desired dtype of the output.
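A minimal sketch of requesting integer output instead of the default float64. The sample rows are illustrative and contain no missing values.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Ask for int64 codes instead of the default float64.
int_encoder = OrdinalEncoder(dtype=np.int64)
int_codes = int_encoder.fit_transform([["Male"], ["Female"], ["Female"]])

print(int_codes.dtype)             # int64
print(int_codes.ravel().tolist())  # [1, 0, 0]
```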
handle_unknown
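{'error', 'use_encoded_value'}, default='error'
When set to 'error', an error is raised if an unknown categorical feature is encountered during transform. When set to 'use_encoded_value', unknown categories are encoded with the value given by unknown_value.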
unknown_value
int or np.nan, default=None
Required when handle_unknown is set to 'use_encoded_value'; it sets the encoded value of unknown categories.
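The interaction between handle_unknown and unknown_value can be sketched as follows. The colour data and the choice of -1 as the placeholder code are assumptions made for illustration.

```python
from sklearn.preprocessing import OrdinalEncoder

train = [["red"], ["green"], ["blue"]]
test = [["green"], ["purple"]]  # "purple" was not seen during fit

# Default behaviour: an unknown category raises an error in transform.
try:
    OrdinalEncoder().fit(train).transform(test)
    default_raised = False
except ValueError:
    default_raised = True

# With use_encoded_value, unknown categories get the code in unknown_value.
unknown_encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
unknown_encoder.fit(train)
test_codes = unknown_encoder.transform(test)

print(default_raised)               # True
print(test_codes.ravel().tolist())  # [1.0, -1.0]
```

The placeholder should not collide with a real code (0 to n_categories - 1), which is why a negative value is a common choice.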
encoded_missing_value
int or np.nan, default=np.nan
Encoded value of missing categories. If set to np.nan, the dtype parameter must be a float dtype.
Attributes
categories_
list of arrays
The categories of each feature determined during fit (in order of the features in X and corresponding with the output of transform). This does not include categories that weren't seen during fit.
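To make the link between categories_ and the output codes concrete, this sketch fits an encoder on a small sample and lists each column's categories. The variable names are illustrative.

```python
from sklearn.preprocessing import OrdinalEncoder

fitted = OrdinalEncoder().fit([["Male", 1], ["Female", 3], ["Female", 2]])

# One array per input column; the position of a category in its
# array is the ordinal code that transform assigns to it.
learned = [list(column_categories) for column_categories in fitted.categories_]

print(learned)  # [['Female', 'Male'], [1, 2, 3]]
```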
n_features_in_
int
Number of features seen during fit.
feature_names_in_
ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X has feature names that are all strings.
Methods
fit(X[, y])
Fit the OrdinalEncoder to X.
fit_transform(X[, y])
Fit to data, then transform it.
get_feature_names_out([input_features])
Get output feature names for transformation.
get_params([deep])
Get parameters for this estimator.
inverse_transform(X)
Convert the data back to the original representation.
set_params(**params)
Set the parameters of this estimator.
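get_params and set_params can be sketched together: read the current configuration, then change it in place. The new values chosen here are arbitrary.

```python
from sklearn.preprocessing import OrdinalEncoder

params_encoder = OrdinalEncoder()
before = params_encoder.get_params()["handle_unknown"]

# set_params updates the estimator in place and returns the estimator itself.
returned = params_encoder.set_params(handle_unknown="use_encoded_value", unknown_value=-1)
after = params_encoder.get_params()

print(before)                   # error
print(after["handle_unknown"])  # use_encoded_value
print(after["unknown_value"])   # -1
```

New settings only take effect on the next call to fit.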
transform(X)
Transform X to ordinal codes.
Usage example
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder()
x = [['Male', 1], ['Female', 3], ['Female', 2]]

# Categories are sorted per column: ['Female', 'Male'] and [1, 2, 3]
x_transform = encoder.fit_transform(x)
x_transform
# array([[1., 0.],
#        [0., 2.],
#        [0., 1.]])

# Map the ordinal codes back to the original values
encoder.inverse_transform(x_transform)
# array([['Male', 1],
#        ['Female', 3],
#        ['Female', 2]], dtype=object)