Transformers for data preprocessing:
- DictVectorizer -- converts lists of feature dicts into vectors; string-valued (categorical) features get one-hot encoded
- StandardScaler -- scaled data has zero mean and unit variance. Note: the fitted scaler object can be saved, so the same operation can be applied to test data later (see the persistence sketch after this list)
- MinMaxScaler, MaxAbsScaler: #parameters: feature_range=(min, max) (MinMaxScaler only; MaxAbsScaler instead divides each feature by its maximum absolute value)
- RobustScaler -- centers on the median and scales by the interquartile range, so it is robust to outliers. #parameters: with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True
- QuantileTransformer and quantile_transform provide a non-parametric transformation, based on the quantile function, that maps the data to a uniform distribution with values between 0 and 1 (contrasted with PowerTransformer in a sketch below)
- PowerTransformer: Power transforms are a family of parametric, monotonic transformations that aim to map data from any distribution to as close to a Gaussian distribution as possible in order to stabilize variance and minimize skewness. PowerTransformer currently provides two such power transformations, the Yeo-Johnson transform and the Box-Cox transform.
- Encoders: OrdinalEncoder and OneHotEncoder for categorical features, LabelEncoder for target labels (sketch below)
- Discretization: KBinsDiscretizer, Binarizer (sketch below)
- Missing-value (NaN) imputation: SimpleImputer, MissingIndicator (sketch below)
- Generating polynomial features: PolynomialFeatures (sketch below)
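
A minimal sketch of the point noted for StandardScaler above: fit on training data, persist the fitted object (here with joblib, which is installed alongside scikit-learn; the filename is arbitrary), and reapply the identical transformation to test data.

```python
import numpy as np
import joblib
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[1.5, 15.0]])

scaler = StandardScaler().fit(X_train)          # learns per-feature mean and std
print(scaler.transform(X_train).mean(axis=0))   # ~[0. 0.]: zero mean after scaling

joblib.dump(scaler, "scaler.joblib")            # save the fitted transformer
reloaded = joblib.load("scaler.joblib")
print(reloaded.transform(X_test))               # same operation applied to test data
```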
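
A short sketch contrasting QuantileTransformer and PowerTransformer on an assumed synthetic log-normal sample: the former maps values into [0, 1] non-parametrically, the latter fits a Yeo-Johnson transform to pull the distribution toward Gaussian.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer, PowerTransformer

rng = np.random.RandomState(0)
X = rng.lognormal(size=(1000, 1))            # strongly right-skewed data

qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform")
X_uni = qt.fit_transform(X)                  # non-parametric map into [0, 1]

pt = PowerTransformer(method="yeo-johnson")  # "box-cox" needs strictly positive data
X_gau = pt.fit_transform(X)                  # roughly zero-mean Gaussian output

print(X_uni.min(), X_uni.max(), round(float(X_gau.mean()), 3))
```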
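
A sketch of the three encoders on toy data. Note the split in intended use: OrdinalEncoder and OneHotEncoder expect a 2-D feature matrix X, while LabelEncoder is meant for the 1-D target y. (OneHotEncoder's sparse_output flag was named sparse before scikit-learn 1.2.)

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder, LabelEncoder

X = np.array([["red"], ["green"], ["blue"], ["green"]])

print(OrdinalEncoder().fit_transform(X))                    # one integer per category
print(OneHotEncoder(sparse_output=False).fit_transform(X))  # one 0/1 column per category

y = ["cat", "dog", "cat"]
print(LabelEncoder().fit_transform(y))                      # integer-encoded target labels
```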
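
A sketch of both discretizers: KBinsDiscretizer buckets a continuous feature into bins, while Binarizer thresholds it to 0/1 (the threshold value here is arbitrary).

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, Binarizer

X = np.array([[-2.0], [0.5], [1.0], [3.0]])

kbd = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
print(kbd.fit_transform(X))                       # bin index (0, 1 or 2) per sample

print(Binarizer(threshold=0.7).fit_transform(X))  # 1.0 where x > 0.7, else 0.0
```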
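
A sketch of both imputation tools: SimpleImputer replaces NaNs with a column statistic, and MissingIndicator returns a boolean mask of where the missing values were (useful as extra features).

```python
import numpy as np
from sklearn.impute import SimpleImputer, MissingIndicator

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 6.0]])

imp = SimpleImputer(strategy="mean")        # also "median", "most_frequent", "constant"
print(imp.fit_transform(X))                 # NaNs replaced by each column's mean

print(MissingIndicator().fit_transform(X))  # True where a value was originally missing
```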
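
A sketch of the degree-2 expansion of two features; get_feature_names_out shows which polynomial term each output column corresponds to.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])
poly = PolynomialFeatures(degree=2)   # adds bias, x0, x1, x0^2, x0*x1, x1^2

print(poly.fit_transform(X))          # [[1. 2. 3. 4. 6. 9.]]
print(poly.get_feature_names_out())   # ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
```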
Dimensionality reduction (combined sketch after this list):
- sklearn.decomposition.PCA
- sklearn.random_projection.GaussianRandomProjection, SparseRandomProjection -- project the data onto a lower-dimensional random subspace
- sklearn.cluster.FeatureAgglomeration -- merges features that behave similarly
- Kernel methods, e.g. sklearn.decomposition.KernelPCA
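
A combined sketch of the reducers listed above on an assumed random 20-feature matrix; each one shrinks the data to 5 output dimensions (KernelPCA with an RBF kernel stands in as the kernel-based example).

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)
X = rng.rand(100, 20)  # 100 samples, 20 features

print(PCA(n_components=5).fit_transform(X).shape)                       # (100, 5)
print(GaussianRandomProjection(n_components=5).fit_transform(X).shape)  # (100, 5)
print(FeatureAgglomeration(n_clusters=5).fit_transform(X).shape)        # (100, 5)
print(KernelPCA(n_components=5, kernel="rbf").fit_transform(X).shape)   # (100, 5)
```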