Preface
A feature union combines several transformer objects into a single new transformer. A FeatureUnion object takes a list of transformer objects as its input.
FeatureUnion
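FeatureUnion applies every transformer in the list to the data in parallel and then concatenates their outputs horizontally (column-wise). This merges several feature extraction mechanisms into one transformer whose output is a single, wider feature matrix. The result usually has more columns than the original data.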
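from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import PolynomialFeatures
# Build a FeatureUnion; each transformer must be given a name
fea_un = FeatureUnion([('pca', PCA(n_components=4)),
                       ('poly', PolynomialFeatures(degree=3))])
# With 8 input features, PolynomialFeatures(degree=3) outputs 165 columns:
# 164 polynomial terms plus the bias column (include_bias defaults to True)
# data: the feature matrix X of the training set
data_new = fea_un.fit_transform(data)
# Compare the shapes before and after the transformation
print(data.shape)
print(data_new.shape)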
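To make the "parallel, then concatenate horizontally" behavior concrete, the sketch below compares the output of a FeatureUnion with a manual `np.hstack` of the two transformers applied separately. The random array `X_demo` is a hypothetical stand-in for `data` (assumed here to have 8 features); it is not the dataset used in this post.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for `data`: 100 samples with 8 random features
rng = np.random.RandomState(0)
X_demo = rng.rand(100, 8)

union = FeatureUnion([("pca", PCA(n_components=4)),
                      ("poly", PolynomialFeatures(degree=3))])
X_union = union.fit_transform(X_demo)

# Apply the same two transformers separately and stack the columns by hand
X_pca = PCA(n_components=4).fit_transform(X_demo)
X_poly = PolynomialFeatures(degree=3).fit_transform(X_demo)
X_manual = np.hstack([X_pca, X_poly])

print(X_demo.shape)   # (100, 8)
print(X_union.shape)  # (100, 169): 4 PCA columns + 165 polynomial columns
same = np.allclose(X_union, X_manual)
print(same)
```

The columns appear in the order in which the transformers are listed, so the first 4 columns come from PCA and the remaining ones from PolynomialFeatures.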
Building a FeatureUnion with make_union
from sklearn.pipeline import make_union
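# make_union builds a FeatureUnion without naming each transformer;
# the names are generated from the lowercased class names
fea_un_make = make_union(PCA(n_components=4), PolynomialFeatures(degree=3))
data_new = fea_un_make.fit_transform(data)
# Inspect the auto-generated name of each transformer
fea_un_make.transformer_list
# Sample output below. It shows n_components=3, degree=2 and include_bias=False,
# so it was captured with different settings than the call above; the exact
# repr also depends on the scikit-learn version.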
'''
[('pca',
PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)),
('polynomialfeatures',
PolynomialFeatures(degree=2, include_bias=False, interaction_only=False))]
'''
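The auto-generated names matter because they are the prefixes used to read and set nested parameters. The following is a minimal sketch of that relationship, using only `transformer_list`, `get_params` and `set_params`; the variable names are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_union
from sklearn.preprocessing import PolynomialFeatures

union = make_union(PCA(n_components=4), PolynomialFeatures(degree=3))

# Names are derived from the lowercased class names
names = [name for name, _ in union.transformer_list]
print(names)  # ['pca', 'polynomialfeatures']

# Nested parameters are exposed under <name>__<parameter> keys
params = union.get_params()
print(params["pca__n_components"])           # 4
print(params["polynomialfeatures__degree"])  # 3

# set_params accepts the same keys
union.set_params(pca__n_components=2)
n_after = union.get_params()["pca__n_components"]
print(n_after)  # 2
```

These are the same keys that a grid search uses, with the name of the enclosing pipeline step added in front.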
Using a feature union in a pipeline
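from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# solver="liblinear" supports both the "l1" and "l2" penalties searched below
pipeline = Pipeline([("fea", fea_un_make), ("lr", LogisticRegression(solver="liblinear", random_state=10))])
# Nested parameters are addressed as <step>__<transformer>__<parameter>
param_grid = {"fea__pca__n_components": [1, 2, 3, 4, 5, 6], "lr__C": [0.01, 0.1, 0.2, 0.5, 1],
              "lr__class_weight": [None, "balanced"], "lr__penalty": ["l1", "l2"]}
grid_search = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
grid_search.score(X_test, y_test)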
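Since `X_train`, `X_test`, `y_train` and `y_test` are not defined in this post, here is a self-contained sketch of the same idea that can be run end to end. The synthetic dataset from `make_classification`, the reduced parameter grid, and `degree=2` are all assumptions chosen to keep the run short; they are not the settings or data used above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the data (assumption: 8 features, binary target)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

features = make_union(PCA(n_components=4), PolynomialFeatures(degree=2))
pipe = Pipeline([("fea", features),
                 ("lr", LogisticRegression(solver="liblinear", random_state=10))])

# Reduced grid; keys follow <step>__<transformer>__<parameter>
small_grid = {"fea__pca__n_components": [2, 4],
              "lr__C": [0.1, 1],
              "lr__penalty": ["l1", "l2"]}
search = GridSearchCV(pipe, param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
test_accuracy = search.score(X_test, y_test)
print(test_accuracy)
```

Because the feature union is a pipeline step, it is refit on each cross-validation training fold, so the PCA components are learned from the training part of every split only.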