Feature Extractors (feature extraction)
TF
TF-IDF
Word2Vec
CountVectorizer
Feature Transformers (feature transformation)
Tokenizer (word segmentation)
StopWordsRemover (stop-word removal)
n-gram
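Binarizer (binarization)
PCA (principal component analysis)
PolynomialExpansion (polynomial expansion)
Discrete Cosine Transform (DCT)
StringIndexer (string-to-index mapping)
IndexToString (index-to-string mapping)
OneHotEncoder (one-hot encoding)
VectorIndexer (indexing of vector-typed features)
Normalizer (p-norm normalization)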
StandardScaler
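MinMaxScaler (min-max scaling)
MaxAbsScaler (max-absolute-value scaling)
Bucketizer (binning)
ElementwiseProduct (Hadamard product)
SQLTransformer (SQL transformation)
VectorAssembler (feature vector assembly)
QuantileDiscretizer (quantile discretization)
Feature Selectors (feature selection)
VectorSlicer (vector slicing)
RFormula (R model formula)
ChiSqSelector (chi-squared feature selection)
References: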
https://vimsky.com/article/2049.html#Extracting,transformingandselectingfeatures-FeatureExtractors