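①sklearn.preprocessing.Normalizer(norm='l2', copy=True)
norm: one of l1, l2, or max; the default is l2
If l1, each feature value of a sample is divided by the sum of the absolute values of that sample's features
If l2, each feature value of a sample is divided by the square root of the sum of the squares of that sample's features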
If max, each feature value of a sample is divided by the largest absolute feature value in that sample
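The three norms can be checked by hand. The sketch below computes each per-row norm with NumPy and compares the result with preprocessing.normalize. The variable names are illustrative, and the max case assumes a recent scikit-learn release, where the largest absolute value is used.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# Per-row (per-sample) norms computed by hand
l1_norms = np.abs(X).sum(axis=1, keepdims=True)           # sum of absolute values
l2_norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))   # square root of sum of squares
max_norms = np.abs(X).max(axis=1, keepdims=True)          # largest absolute value

manual = {"l1": X / l1_norms, "l2": X / l2_norms, "max": X / max_norms}

# Compare each hand-computed result with scikit-learn's normalize
for name, expected in manual.items():
    print(name, np.allclose(normalize(X, norm=name), expected))
```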
In [8]: from sklearn import preprocessing
...: X = [[ 1., -1., 2.], [ 2., 0., 0.],[ 0., 1., -1.]]
...: normalizer = preprocessing.Normalizer().fit(X)  # fit does nothing: Normalizer is stateless
...: normalizer
...:
Out[8]: Normalizer(copy=True, norm='l2')
In [9]: normalizer.transform(X)
Out[9]:
array([[ 0.40824829, -0.40824829,  0.81649658],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.70710678]])
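The Normalizer estimator is stateless: its fit method learns nothing from the data, so the same fitted object can be applied directly to new samples, each of which is scaled by its own norm.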
In [11]: import numpy as np
In [12]: X_test = np.array([[1,1,6],[2,3,5],[4,1,2]]).astype(float)
...: normalizer.transform(X_test)
...:
Out[12]:
array([[ 0.16222142,  0.16222142,  0.97332853],
       [ 0.32444284,  0.48666426,  0.81110711],
       [ 0.87287156,  0.21821789,  0.43643578]])
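One way to see the statelessness is to fit two Normalizer objects on different data and confirm that they transform the same test samples identically. This is only a sketch; X_a and X_b are made-up arrays used for illustration.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two unrelated "training" sets (X_b is hypothetical illustration data)
X_a = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_b = np.array([[10., 20., 30.], [5., 5., 5.], [7., 0., 1.]])
X_test = np.array([[1., 1., 6.], [2., 3., 5.], [4., 1., 2.]])

out_a = Normalizer().fit(X_a).transform(X_test)
out_b = Normalizer().fit(X_b).transform(X_test)

# The data passed to fit has no effect on the transformed output
print(np.allclose(out_a, out_b))

# Every transformed row has unit L2 norm
print(np.linalg.norm(out_a, axis=1))
```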
②preprocessing.normalize(X, norm='l2', axis=1, copy=True, return_norm=False)
Normalizing with the normalize function:
In [13]: from sklearn import preprocessing
...: X = [[ 1., -1., 2.], [ 2., 0., 0.],[ 0., 1., -1.]]
...: X_normalized = preprocessing.normalize(X, norm='l2')
...: X_normalized
...:
Out[13]:
array([[ 0.40824829, -0.40824829,  0.81649658],
       [ 1.        ,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.70710678]])
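The signature above also lists axis and return_norm, which the session does not exercise. The following sketch shows one plausible use of each on a dense array: return_norm=True to get the per-sample norms back, and axis=0 to normalize each feature column instead of each sample row.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

# return_norm=True also returns the norm of each row
X_rows, row_norms = normalize(X, norm="l2", axis=1, return_norm=True)
print(row_norms)

# Multiplying back by the norms recovers the original rows
print(np.allclose(X_rows * row_norms[:, np.newaxis], X))

# axis=0 normalizes each column (feature) rather than each row (sample)
X_cols = normalize(X, norm="l2", axis=0)
print(np.linalg.norm(X_cols, axis=0))
```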