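Binarization
The Binarizer class provided by scikit-learn binarizes data: values greater than the threshold become 1, and all other values become 0.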
from sklearn.preprocessing import Binarizer
X = [[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3],
[1,1,1,1,1]]
print("before transform:",X)
binarizer=Binarizer(threshold=2.5)
print("after transform :", binarizer.transform(X))
The threshold is set to 2.5. The output is:
before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
after transform : [[0 0 1 1 1]
[1 1 1 0 0]
[1 1 1 1 1]
[0 0 0 0 0]]
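The thresholding rule can also be written out by hand. The following is a minimal sketch in plain Python, not scikit-learn's implementation; the helper name `binarize` is hypothetical and only illustrates the comparison against the threshold.

```python
# Hand-written sketch of the thresholding rule (not scikit-learn's own code).
def binarize(rows, threshold=0.0):
    """Return 1 where a value is strictly greater than threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in rows]

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

binarized = binarize(X, threshold=2.5)
print(binarized)
```

A value exactly equal to the threshold maps to 0, because the comparison is strict.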
One-hot encoding
from sklearn.preprocessing import OneHotEncoder
X = [[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3],
[1,1,1,1,1]]
print("before transform:",X)
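# Note: active_features_, feature_indices_ and n_values belong to the legacy
# OneHotEncoder API (deprecated in scikit-learn 0.20, removed in 0.22). Newer
# releases expose categories_ instead, and sparse was later renamed sparse_output.
encoder=OneHotEncoder(sparse=False)
encoder.fit(X)
print("active_features_:",encoder.active_features_)
print("feature_indices_",encoder.feature_indices_)
print("n_values_",encoder.n_values)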
print("after transform:",encoder.transform([[1,2,3,4,5]]))
before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
active_features_: [ 1 3 5 7 8 9 10 12 14 16 17 18 19 21 23 25]
feature_indices_ [ 0 6 11 15 20 26]
n_values_ auto
after transform: [[1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1.]]
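The first original feature has a maximum value of 5, so it has 6 possible categories (0, 1, 2, 3, 4, 5), and each value is encoded as a 6-tuple:
- 0 is encoded as (1,0,0,0,0,0)
- 1 is encoded as (0,1,0,0,0,0)
The second original feature has a maximum value of 4, so it has 5 possible categories (0, 1, 2, 3, 4), and each value is encoded as a 5-tuple:
- 0 is encoded as (1,0,0,0,0)
- 1 is encoded as (0,1,0,0,0)
Across the five features this gives 6+5+4+5+6 = 26 candidate columns, which matches the boundaries in feature_indices_. Only the columns listed in active_features_ (values that actually occur in the training data) are kept, so the transformed row has 16 columns.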
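To see why the transformed row above has 16 columns, the encoding can be sketched by hand using only the values that actually appear in each column. This is an illustrative plain-Python sketch, not scikit-learn's implementation; the helper names `fit_categories` and `one_hot` are hypothetical.

```python
# Hand-written sketch of one-hot encoding over the values seen in each column.
def fit_categories(rows):
    """Collect the sorted distinct values of each column."""
    return [sorted(set(column)) for column in zip(*rows)]

def one_hot(row, categories):
    """Concatenate one indicator vector per feature."""
    encoded = []
    for value, cats in zip(row, categories):
        # Exactly one position is 1.0: the one matching the value's category.
        encoded.extend(1.0 if value == cat else 0.0 for cat in cats)
    return encoded

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

categories = fit_categories(X)
encoded = one_hot([1, 2, 3, 4, 5], categories)
print(categories)
print(len(encoded), encoded)
```

The columns contain 3, 4, 2, 4 and 3 distinct values respectively, so the encoded row has 3+4+2+4+3 = 16 entries.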
Standardization
MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
X = [[1,2,3,4,5],
[5,4,3,2,1],
[3,3,3,3,3],
[1,1,1,1,1]]
print("before transform:",X)
scaler=MinMaxScaler(feature_range=(0,2))
scaler.fit(X)
print("min_is:", scaler.min_)
print("scale is",scaler.scale_)
print("data_max_ is",scaler.data_max_)
print("data_min_ is",scaler.data_min_)
print("data_range_ is",scaler.data_range_)
print("after transform is",scaler.transform(X))
before transform: [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [3, 3, 3, 3, 3], [1, 1, 1, 1, 1]]
min_is: [-0.5 -0.66666667 -1. -0.66666667 -0.5 ]
scale is [0.5 0.66666667 1. 0.66666667 0.5 ]
data_max_ is [5. 4. 3. 4. 5.]
data_min_ is [1. 1. 1. 1. 1.]
data_range_ is [4. 3. 2. 3. 4.]
after transform is [[0. 0.66666667 2. 2. 2. ]
[2. 2. 2. 0.66666667 0. ]
[1. 1.33333333 2. 1.33333333 1. ]
[0. 0. 0. 0. 0. ]]
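The numbers above follow from the min-max formula x_scaled = (x - min) / (max - min) * (high - low) + low, applied per column. Below is a minimal plain-Python sketch of that arithmetic, not scikit-learn's implementation; the helper name `min_max_scale` is hypothetical, and the sketch assumes no column is constant (otherwise the division would fail).

```python
# Hand-written sketch of per-column min-max scaling.
def min_max_scale(rows, feature_range=(0.0, 1.0)):
    """Scale each column linearly so its min maps to low and its max to high."""
    low, high = feature_range
    columns = list(zip(*rows))
    col_min = [min(column) for column in columns]
    col_max = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumes c_max != c_min for every column.
            (value - c_min) / (c_max - c_min) * (high - low) + low
            for value, c_min, c_max in zip(row, col_min, col_max)
        ])
    return scaled

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

scaled = min_max_scale(X, feature_range=(0, 2))
for row in scaled:
    print(row)
```

With feature_range=(0, 2), the first column has range 4, so its scale factor is 2/4 = 0.5, matching the first entry of scale_ in the output above.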
Other scalers include:
MaxAbsScaler
sklearn.preprocessing.MaxAbsScaler(copy=True)
StandardScaler(z-score)
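As a rough illustration of what these two scalers compute, the sketch below divides each column by its maximum absolute value (the MaxAbsScaler idea) and subtracts the column mean before dividing by the population standard deviation (the z-score used by StandardScaler). It is plain Python written for this note, not scikit-learn's implementation; the helper names `max_abs_scale` and `z_score` are hypothetical, and the sketch assumes no column is all zeros or constant.

```python
import math

# Hand-written sketches of max-abs scaling and z-score standardization.
def max_abs_scale(rows):
    """Divide each column by its largest absolute value."""
    columns = list(zip(*rows))
    max_abs = [max(abs(v) for v in column) for column in columns]
    return [[v / m for v, m in zip(row, max_abs)] for row in rows]

def z_score(rows):
    """Subtract the column mean and divide by the population standard deviation."""
    columns = list(zip(*rows))
    n = len(rows)
    means = [sum(column) / n for column in columns]
    stds = [
        math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        for column, mean in zip(columns, means)
    ]
    return [
        [(v - mean) / std for v, mean, std in zip(row, means, stds)]
        for row in rows
    ]

X = [[1, 2, 3, 4, 5],
     [5, 4, 3, 2, 1],
     [3, 3, 3, 3, 3],
     [1, 1, 1, 1, 1]]

max_abs_scaled = max_abs_scale(X)
standardized = z_score(X)
print(max_abs_scaled[0])
print(standardized[0])
```

After standardization every column has mean 0 and unit variance, whereas max-abs scaling only bounds values to the range [-1, 1] without shifting them.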
Normalization