- sklearn.preprocessing:
This blogger's post covers it in great detail:
https://blog.csdn.net/pipisorry/article/details/52247679
The preprocessing module further provides a utility class StandardScaler that implements the Transformer API to compute the mean and standard deviation on a training set so as to be able to later reapply the same transformation on the testing set. This class is hence suitable for use in the early steps of a sklearn.pipeline.Pipeline:
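import numpy as np
from sklearn import preprocessing
X = np.array([[ 1., -1.,  2.],
              [ 2.,  0.,  0.],
              [ 0.,  1., -1.]])
scaler = preprocessing.StandardScaler().fit(X)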
scaler
StandardScaler(copy=True, with_mean=True, with_std=True)
scaler.mean_
array([ 1. …, 0. …, 0.33…])
scaler.scale_
array([ 0.81…, 0.81…, 1.24…])
scaler.transform(X)
array([[ 0. …, -1.22…, 1.33…],
       [ 1.22…, 0. …, -0.26…],
       [-1.22…, 1.22…, -1.06…]])
The scaler instance can then be used on new data to transform it the same way it did on the training set:
scaler.transform([[-1., 1., 0.]])
array([[-2.44…, 1.22…, -0.26…]])
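Author: -柚子皮-
Source: CSDN
Original: https://blog.csdn.net/pipisorry/article/details/52247679
Copyright notice: This is an original article by the blogger; please include a link to the post when reposting.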
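To make concrete what StandardScaler learns, here is a minimal sketch assuming NumPy and scikit-learn are installed: mean_ is the column-wise mean, scale_ is the population standard deviation (ddof=0), and transform is the per-column z-score (x - mean_) / scale_. It also puts the scaler at the start of a Pipeline, as the quoted docs suggest; the LogisticRegression step and the labels y_train are hypothetical, added only to give the pipeline a final estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

scaler = StandardScaler().fit(X_train)

# mean_ is the column-wise mean; scale_ is the population std (NumPy's default ddof=0).
manual_mean = X_train.mean(axis=0)
manual_std = X_train.std(axis=0)
assert np.allclose(scaler.mean_, manual_mean)
assert np.allclose(scaler.scale_, manual_std)

# transform() is the per-column z-score computed with those statistics.
assert np.allclose(scaler.transform(X_train), (X_train - manual_mean) / manual_std)

# Hypothetical labels, only so the scaler can serve as the first Pipeline step.
y_train = np.array([0, 1, 0])
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)  # the scaler inside the pipeline is fit on X_train only
pred = pipe.predict(np.array([[-1., 1., 0.]]))  # new data is scaled, then classified
```

make_pipeline names each step after its lowercased class name, so the fitted scaler is reachable as pipe.named_steps["standardscaler"].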
Binarization
sklearn.preprocessing.Binarizer
binarizer = preprocessing.Binarizer().fit(x)  # here fit does nothing
binarizer
Out[136]: Binarizer(copy=True, threshold=0.0)
binarizer.transform(x)
Out[137]:
array([[1, 0, 1],
       [1, 0, 0],
       [0, 1, 0]])
x
Out[138]:
array([[ 1, -1,  2],
       [ 2,  0,  0],
       [ 0,  1, -1]])
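Binarizer maps every value strictly greater than threshold to 1 and everything else (including values equal to the threshold) to 0, so it is equivalent to an elementwise comparison. A minimal sketch assuming a recent scikit-learn; threshold=1.0 is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

x = np.array([[1, -1, 2],
              [2, 0, 0],
              [0, 1, -1]])

# Default threshold=0.0: values > 0 become 1, values <= 0 become 0.
default_out = Binarizer().fit_transform(x)
assert np.array_equal(default_out, (x > 0).astype(int))

# Custom threshold: only values strictly greater than 1.0 become 1.
custom_out = Binarizer(threshold=1.0).fit_transform(x)
assert np.array_equal(custom_out, (x > 1.0).astype(int))
print(custom_out)
```

Because fit learns nothing, calling fit_transform (or transform on an unfitted-but-constructed instance after fit) gives the same result as the session above.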
For SimpleImputer usage, see the scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
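SimpleImputer fills values per feature, i.e. it imputes the missing entries of each column using a statistic computed on that column. Compared with the old Imputer class (deprecated in scikit-learn 0.20 and removed in later versions), it therefore has no axis parameter: statistics are always computed column-wise (the old default, axis=0), and row-wise imputation (the old axis=1) is no longer offered.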
import numpy as np
from sklearn.impute import SimpleImputer
# Build the imputer model imp (fit learns one fill value per column)
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit([[1, 2], [np.nan, 3], [7, 6]])
Out[163]:
SimpleImputer(copy=True, fill_value=None, missing_values=nan, strategy='mean',
              verbose=0)
x = [[np.nan, 2], [6, np.nan], [7, 6]]
# Use the fitted imp to fill the missing values in x and transform it
print(imp.transform(x))
[[4.         2.        ]
 [6.         3.66666667]
 [7.         6.        ]]
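To illustrate the per-column behaviour described above, here is a sketch assuming scikit-learn 0.20 or later (where sklearn.impute.SimpleImputer exists). It fits the other built-in strategies on the same training data and collects the learned statistics_; for most_frequent, ties are broken by taking the smallest value. The fill_value=0 for the constant strategy is an arbitrary choice.

```python
import numpy as np
from sklearn.impute import SimpleImputer

train = np.array([[1, 2],
                  [np.nan, 3],
                  [7, 6]])
test = np.array([[np.nan, 2],
                 [6, np.nan],
                 [7, 6]])

# Each strategy computes one statistic per column, ignoring the NaNs.
stats = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(missing_values=np.nan, strategy=strategy).fit(train)
    stats[strategy] = imp.statistics_
    print(strategy, imp.statistics_)

# strategy="constant" ignores the data and always uses fill_value.
const = SimpleImputer(strategy="constant", fill_value=0).fit(train)
filled = const.transform(test)
print(filled)
```

Column 0 only has the observed values 1 and 7, so its mean and median are both 4, while most_frequent falls back to the smaller of the two tied values.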
For the difference between fit_transform and transform, see this blog post:
https://blog.csdn.net/quiet_girl/article/details/72517053
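As a hedged sketch of the general rule (not a summary of the linked post): fit_transform(X) fits on X and transforms X in one call, giving the same result as fit(X).transform(X), while transform only applies statistics that were already learned. So the usual pattern is fit_transform on the training set and transform on the test set. The example uses StandardScaler and small made-up arrays.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
X_test = np.array([[-1., 1., 0.]])

scaler = StandardScaler()
# Fit on the training data and transform it in one step.
train_scaled = scaler.fit_transform(X_train)
# Same result as fitting and transforming separately.
assert np.allclose(train_scaled, StandardScaler().fit(X_train).transform(X_train))

# For the test set, only transform: this reuses the training mean_ and scale_.
test_scaled = scaler.transform(X_test)

# Anti-pattern: fit_transform on the test set learns new statistics from it.
# With a single test row every column has zero variance, StandardScaler then
# uses a scale of 1, and the result collapses to all zeros.
wrong = StandardScaler().fit_transform(X_test)
print(test_scaled, wrong)
```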