1. class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
Transform features by scaling each feature to a given range.
This estimator scales and translates each feature individually so that it falls within the given range on the training set, e.g. between zero and one.
Parameters of the MinMaxScaler class:
feature_range : tuple (min, max), default=(0, 1) Desired range of the transformed data. (A short sketch using a non-default range follows this parameter list.)
copy : boolean, optional, default True Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
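For illustration, a minimal sketch (with toy data of my own, not from the scikit-learn docs) showing how a non-default feature_range changes the output range:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# Toy data: a single feature with values 0, 5, 10
X = np.array([[0.0], [5.0], [10.0]])
# Scale into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
print(scaler.fit_transform(X))
# [[-1.]
#  [ 0.]
#  [ 1.]]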
Attributes of the MinMaxScaler class:
data_min_ : ndarray, shape (n_features,) Per feature minimum seen in the data
data_max_ : ndarray, shape (n_features,) Per feature maximum seen in the data
data_range_ : ndarray, shape (n_features,) Per feature range (data_max_ - data_min_) seen in the data
from sklearn.preprocessing import MinMaxScaler
data = [[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]]
scaler = MinMaxScaler()
print(scaler.fit(data))
print('----------------')
# Maximum value of each column (feature) seen in the data
print(scaler.data_max_)
print('----------------')
# Scale each column into the [0, 1] range
print(scaler.transform(data))
print('----------------')
# Apply the fitted scaler to new data ([2, 2, 2]); 'fitted' means the per-feature min and max learned above are reused
print(scaler.transform([[2, 2, 2]]))
Output:
MinMaxScaler(copy=True, feature_range=(0, 1))
----------------
[18. 7. 10.]
----------------
[[0.22727273 0.625 0. ]
[1. 0. 0. ]
[0.36363636 1. 0.75 ]
[0. 0.375 1. ]]
----------------
[[0.27272727 0.375 0. ]]
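As a sanity check on the numbers above, the default feature_range=(0, 1) transform can be reproduced by hand from the fitted attributes (a minimal sketch reusing the same data):
import numpy as np
from sklearn.preprocessing import MinMaxScaler
data = np.array([[1, 4, 2], [18, -1, 2], [4, 7, 8], [-4, 2, 10]], dtype=float)
scaler = MinMaxScaler().fit(data)
# With the default feature_range=(0, 1) the transform is (X - data_min_) / data_range_
manual = (data - scaler.data_min_) / scaler.data_range_
print(np.allclose(manual, scaler.transform(data)))  # True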
Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler
2. class sklearn.preprocessing.StandardScaler(copy=True, with_mean=True, with_std=True)
Standardize features by removing the mean and scaling to unit variance
That is, each feature is transformed so that it has zero mean and unit variance.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
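As a quick check of this formula, a minimal sketch (with toy data of my own) comparing StandardScaler against the same computation done directly in NumPy:
import numpy as np
from sklearn.preprocessing import StandardScaler
# Toy single-feature data
x = np.array([[1.0], [2.0], [3.0], [4.0]])
z_sklearn = StandardScaler().fit_transform(x)
# StandardScaler divides by the population standard deviation (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)
print(np.allclose(z_sklearn, z_manual))  # True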
Parameters:
with_mean : boolean, True by default
If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory. (A small sketch of the sparse-matrix case follows this parameter list.)
with_std : boolean, True by default
If True, scale the data to unit variance (or equivalently, unit standard deviation).
copy : boolean, optional, default True
If False, try to avoid a copy and do inplace scaling instead. This is not guaranteed to always work inplace; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
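To make the with_mean note concrete, here is a minimal sketch (toy data of my own): centering is disabled so that a scipy.sparse matrix can be scaled without densifying it:
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler
X = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]))
# Centering would densify the sparse matrix, so only scale to unit variance
ss = StandardScaler(with_mean=False)
print(ss.fit_transform(X).toarray())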
Attributes:
n_samples_seen_ : int The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls. (See the partial_fit sketch after the example output below.)
mean_ : array of floats with shape [n_features] The mean value for each feature in the training set.
var_ : array of floats with shape [n_features] The variance for each feature in the training set. Used to compute scale_.
scale_ : ndarray, shape (n_features,) Per feature relative scaling of the data; with the default settings this is the per-feature standard deviation.
from sklearn.preprocessing import StandardScaler
import numpy as np
x=np.arange(10).reshape(5,2)
ss=StandardScaler()
ss.fit(x)
print(x)
print('----------------------')
print(ss.n_samples_seen_)
print('----------------------')
print(ss.mean_)  # mean of each feature
print('----------------------')
print(ss.var_)  # variance of each feature
print('----------------------')
print(ss.scale_)
x=ss.transform(x)  # the scaler is already fitted above, so transform() is enough
print(x)
Output:
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
----------------------
5
----------------------
[4. 5.]
----------------------
[8. 8.]
----------------------
[2.82842712 2.82842712]
[[-1.41421356 -1.41421356]
[-0.70710678 -0.70710678]
[ 0. 0. ]
[ 0.70710678 0.70710678]
[ 1.41421356 1.41421356]]
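The n_samples_seen_ value of 5 above came from a single call to fit; as noted in the attribute description, it also accumulates across partial_fit calls. A minimal sketch reusing the same 5x2 data, split into two batches:
import numpy as np
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
x = np.arange(10).reshape(5, 2)
# Feed the data in two batches; n_samples_seen_ accumulates across partial_fit calls
ss.partial_fit(x[:3])
ss.partial_fit(x[3:])
print(ss.n_samples_seen_)  # 5
print(ss.mean_)            # [4. 5.], same as a single fit on the full data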
Official scikit-learn documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler