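- Module `preprocessing`: covers almost everything related to data preprocessing
- Module `impute`: dedicated to filling in missing values
- Module `feature_selection`: implements a range of feature selection methods
- Module `decomposition`: contains the dimensionality reduction algorithms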
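preprocessing.MinMaxScaler: rescales each feature to [0, 1] by default ("normalization")
preprocessing.StandardScaler: rescales each feature to zero mean and unit variance ("standardization"); it does not change the shape of the distribution, so the result is standard normal only if the feature was normally distributed to begin with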
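- StandardScaler is the usual choice for feature scaling, because MinMaxScaler is very sensitive to outliers.
- MinMaxScaler is widely used when no distance metrics, gradients, or covariance calculations are involved and the data has to be compressed into a specific interval, for example when quantizing pixel intensities in digital image processing.
- It is fine to try MinMaxScaler first and see how it performs.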
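
The outlier sensitivity mentioned above can be seen on a small example. The sketch below uses a made-up one-column array (`x` is hypothetical, not taken from these notes) with four ordinary values and one extreme value. Both scalers are affected by the outlier, but with MinMaxScaler the output range is fixed entirely by the minimum and maximum, so the ordinary values end up squeezed into a tiny slice of [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy feature: four ordinary values and one extreme outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-max scaling: (x - min) / (max - min), computed per column.
x_minmax = MinMaxScaler().fit_transform(x)

# Standardization: (x - mean) / std, computed per column.
x_std = StandardScaler().fit_transform(x)

# How far apart the four ordinary points are after each transform.
minmax_spread = x_minmax[3, 0] - x_minmax[0, 0]
std_spread = x_std[3, 0] - x_std[0, 0]

print("min-max scaled:", x_minmax.ravel())
print("standardized:  ", x_std.ravel())
print("spread of ordinary points:", minmax_spread, std_spread)
```

On this toy data the four ordinary points occupy roughly 3% of the [0, 1] interval after min-max scaling. This is only an illustration of the effect, not a general rule for choosing between the two scalers.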
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Use the California housing dataset purely as example data for standardization.
from sklearn.datasets import fetch_california_housing as fch
housevalue = fch()
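# Keep the feature names as column labels so the table is easier to read.
X = pd.DataFrame(housevalue.data, columns=housevalue.feature_names)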
y = housevalue.target
X.head(3)
# Import StandardScaler from sklearn.preprocessing and instantiate it.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
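# fit_transform returns a NumPy array; wrap it in a DataFrame with the same column labels as X.
X_std = pd.DataFrame(X_std, columns=X.columns)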
X_std.head(3)
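
A quick way to confirm that standardization did what it should is to check the column means and standard deviations afterwards. The sketch below uses synthetic data (`X_demo` is a hypothetical stand-in for the housing features, chosen so that the example runs without downloading anything) and also shows that `inverse_transform` recovers the original values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 200 rows, 3 features on very different scales.
X_demo = rng.normal(loc=[5.0, 300.0, -2.0], scale=[1.0, 50.0, 0.1], size=(200, 3))

scaler_demo = StandardScaler()
X_demo_std = scaler_demo.fit_transform(X_demo)

# After standardization each column should have mean ~0 and std ~1.
col_means = X_demo_std.mean(axis=0)
col_stds = X_demo_std.std(axis=0)

# The fitted scaler stores the per-column statistics it used.
print("learned means: ", scaler_demo.mean_)
print("learned scales:", scaler_demo.scale_)
print("column means after scaling:", col_means)
print("column stds after scaling: ", col_stds)

# inverse_transform maps the standardized values back to the original scale.
X_restored = scaler_demo.inverse_transform(X_demo_std)
print("round trip ok:", np.allclose(X_restored, X_demo))
```

The same checks can be applied to `X_std` from the housing example by calling `.mean()` and `.std(ddof=0)` on the DataFrame; `ddof=0` matters because pandas defaults to the sample standard deviation while StandardScaler uses the population one.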