1. Data standardization and normalization (zero-mean normalization): class sklearn.preprocessing.StandardScaler(*, copy=True, with_mean=True, with_std=True)
-
standard score(z) of a sample x: z = (x - u) / s
u: the mean of training samples (u = 0 if with_mean = False)
s: the standard deviation of the training samples (s = 1 if with_std = False)
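To make the formula concrete, here is a minimal sketch that computes z by hand for one feature column, using only the Python standard library. The values in `column` are illustrative. It assumes the population standard deviation (dividing by n rather than n - 1), which is the estimator StandardScaler uses.

```python
import math

# One feature column from a toy training set (illustrative values).
column = [0.0, 0.0, 1.0, 1.0]

# u: mean of the training samples.
u = sum(column) / len(column)

# s: population standard deviation (divide by n, not n - 1).
s = math.sqrt(sum((x - u) ** 2 for x in column) / len(column))

# z = (x - u) / s for every sample.
z_scores = [(x - u) / s for x in column]

print(u, s)      # 0.5 0.5
print(z_scores)  # [-1.0, -1.0, 1.0, 1.0]
```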
-
Parameters and Attributes:
Example:
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit(data))
output: StandardScaler()
print(scaler.mean_)
print(scaler.var_)
output:
[0.5 0.5]
[0.25 0.25]
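Here scaler.fit(data), i.e. StandardScaler.fit(data), computes the mean and standard deviation of the data and stores them on the scaler object for later use.
The attributes mean_ and var_ then give the per-feature mean and variance of the data.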
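As a rough illustration of "stored for later use", the sketch below fits the scaler once and then reuses the stored statistics to scale both the training data and a new sample. The new sample `[[2, 2]]` is an illustrative choice. The `scale_` attribute holds the per-feature standard deviation, i.e. the square root of `var_`.

```python
from sklearn.preprocessing import StandardScaler

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler().fit(data)

# scale_ is the per-feature standard deviation used for scaling.
print(scaler.scale_)  # [0.5 0.5]

# transform() reuses the stored mean_ and scale_; it does not refit.
scaled_train = scaler.transform(data)
print(scaled_train)

# A new sample is scaled with the training statistics: (2 - 0.5) / 0.5 = 3.
new_sample = scaler.transform([[2, 2]])
print(new_sample)  # [[3. 3.]]
```

Scaling new data with the training mean and standard deviation, rather than refitting, keeps training and test features on the same scale.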
Besides fit(), StandardScaler() has many other methods:
- Popular Methods:
- fit(): compute the mean and std to be used for later scaling
- fit_transform(): fit to data, then transform it