以下是基于Python_sklearn来实现的:
No1.标准化(Standardization or Mean Removal and Variance Scaling)
变换后各维特征有0均值,单位方差。也叫z-score规范化(零均值规范化)。计算方式是将特征值减去均值,除以标准差。
>>> from sklearn import preprocessing
>>> X=[[1.,-1.,2.],
[2.,0.,0.],
[0.,1.,-1.]]
>>> X_scaled = preprocessing.scale(X)
>>> X_scaled
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])
scale处理之后为零均值和单位方差:
>>> X_scaled.mean(axis=0)
array([ 0., 0., 0.])
>>> X_scaled.std(axis=0)
array([ 1., 1., 1.])
同样我们也可以通过preprocessing模块提供的StandardScaler 工具类来实现这个功能:
>>> scaler = preprocessing.StandardScaler().fit(X)
>>> scaler
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> scaler.mean_
array([ 1. , 0. , 0.33333333])
>>> scaler.std_
array([ 0.81649658, 0.81649658, 1.24721913])
>>> scaler.transform(X)
array([[ 0. , -1.22474487, 1.33630621],
[ 1.22474487, 0. , -0.26726124],
[-1.22474487, 1.22474487, -1.06904497]])
简单来说,我们一般会把train和test集放在一起做标准化,或者在train集上做标准化后,用同样的标准化器去标准化test集,此时可以用scaler :
>>> scaler = sklearn.preprocessing.StandardScaler().fit(train)
>>> scaler.transform(train)
>>> scaler.transform(test)
No2.最小-最大规范化(Scaling features to a range&#x