Adding Data Preprocessing to the Model Selection Process

炼丹师666

Article link: https://blog.csdn.net/wj1298250240/article/details/103719777

# Add data preprocessing to the model selection process
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Set the random seed
np.random.seed(0)

# load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create a preprocessing object that combines standardized features
# (StandardScaler) with PCA components
preprocess = FeatureUnion([("std", StandardScaler()), ("pca", PCA())])

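# Create a pipeline: preprocessing followed by the classifier.
# solver="saga" and max_iter are assumptions added here: saga supports both
# the "l1" and "l2" penalties in the search space, which the current default
# solver (lbfgs) does not.
pipe = Pipeline([("preprocess", preprocess),
                 ("classifier", LogisticRegression(solver="saga", max_iter=5000))
                ])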

# Create the space of candidate hyperparameter values
search_space = [{
    "preprocess__pca__n_components": [1, 2, 3],
    "classifier__penalty": ["l1", "l2"],
    "classifier__C": np.logspace(0, 4, 10)
}]

# Create the grid search (5-fold cross-validation, all CPU cores)
clf = GridSearchCV(pipe, search_space, cv=5, verbose=0, n_jobs=-1)

# Fit the grid search
best_model = clf.fit(features, target)

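# View the number of PCA components chosen for the best model
print(best_model.best_estimator_.get_params()["preprocess__pca__n_components"])
# Output reported in the original post: 2
# (the selected value may vary with the scikit-learn version and solver)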
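
The search space above addresses nested settings with double-underscore names such as `preprocess__pca__n_components`. The following is a minimal sketch, not part of the original post, of how those names map onto the pipeline and what the `FeatureUnion` actually produces. It rebuilds the same pipeline on the iris data; `max_iter=1000` and the fixed values passed to `set_params` are assumptions chosen only for illustration.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

features, target = datasets.load_iris(return_X_y=True)

# Same structure as the pipeline in the post (max_iter is an assumption).
preprocess = FeatureUnion([("std", StandardScaler()), ("pca", PCA())])
pipe = Pipeline([("preprocess", preprocess),
                 ("classifier", LogisticRegression(max_iter=1000))])

# Every tunable setting is exposed under a "<step>__<substep>__<param>" key.
param_names = sorted(pipe.get_params().keys())
print("preprocess__pca__n_components" in param_names)  # True
print("classifier__C" in param_names)                  # True

# set_params accepts the same keys that GridSearchCV uses in its search space.
pipe.set_params(preprocess__pca__n_components=2, classifier__C=1.0)

# FeatureUnion concatenates its outputs column-wise:
# 4 standardized features + 2 principal components = 6 columns.
combined = pipe.named_steps["preprocess"].fit_transform(features)
print(combined.shape)  # (150, 6)
```

Because the grid search sets `n_components` through the pipeline, the PCA step is refit inside each cross-validation fold, so the preprocessing choice is evaluated together with the classifier's hyperparameters.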