Several Methods for Data Preprocessing
1. Standardization, also known as mean removal and variance scaling
import numpy as np
from sklearn import preprocessing

data = np.array([[3, -1.5, 2, -5.4],
                 [0, 4, -0.3, 2.1],
                 [1, 3.3, -1.9, -4.3]])

# Mean removal: center each column at zero and scale it to unit variance
data_standardized = preprocessing.scale(data)
print("\nMean =", data_standardized.mean(axis=0))
print("Std deviation =", data_standardized.std(axis=0))
The output is:
Mean = [ 5.55111512e-17 -1.11022302e-16 -7.40148683e-17 -7.40148683e-17]
Std deviation = [1. 1. 1. 1.]
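For comparison, the same standardization can be reproduced by hand with plain NumPy. This is a minimal sketch rather than scikit-learn's actual implementation; the helper name `standardize` is hypothetical, and it assumes the population standard deviation (`ddof=0`, NumPy's default), which is consistent with the unit standard deviations printed above.

```python
import numpy as np

data = np.array([[3, -1.5, 2, -5.4],
                 [0, 4, -0.3, 2.1],
                 [1, 3.3, -1.9, -4.3]])


def standardize(x):
    """Column-wise z-score (hypothetical helper, not part of scikit-learn).

    Subtracts each column's mean and divides by its population
    standard deviation (ddof=0).
    """
    return (x - x.mean(axis=0)) / x.std(axis=0)


manual = standardize(data)

# Compare against 0 and 1 with a tolerance instead of exact equality,
# because floating-point arithmetic leaves tiny residuals.
print(np.allclose(manual.mean(axis=0), 0.0))
print(np.allclose(manual.std(axis=0), 1.0))
```

Comparing with `np.allclose` rather than `==` avoids being misled by rounding residue on the order of 1e-16.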
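The Mean values are printed in scientific notation. They are effectively zero: the tiny nonzero magnitudes (around 1e-16) are floating-point rounding error, which is easier to see when the values are printed in fixed-point notation: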
# Column means of the standardized data
mean_values = data_standardized.mean(axis=0)
# Print each mean in fixed-point notation with 20 decimal places
for value in mean_values:
    print('{:.20f}'.format(value))
The output is: