[autosklearn mini-series] Implementation of the pipeline module

weixin_33962621

Tags: artificial intelligence, python, shell

Original link: https://my.oschina.net/kakablue/blog/3045634

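Code package

From: sklearn.pipeline

Overview

Not quite the pipeline concept of Linux or of programming languages in general; it is closer to a pipe in the shell, where the output of one step is the input of the next.
The implementation is not elegant.
	There is a lot of filtering based on definitions such as the following: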
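```python
from itertools import product

# Excerpt as quoted from the source. `classifiers` and `preprocessors`
# (the components actually available) are defined elsewhere and not shown.
classifiers_ = ["adaboost", "decision_tree", "extra_trees",
                "gradient_boosting", "k_nearest_neighbors",
                "libsvm_svc", "random_forest", "gaussian_nb",
                "decision_tree", "xgradient_boosting"]
feature_learning = ["kitchen_sinks", "kernel_pca", "nystroem_sampler"]
for c, f in product(classifiers_, feature_learning):
    # Skip any combination whose components are not available.
    if c not in classifiers:
        continue
    if f not in preprocessors:
        continue
    ...
```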

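Content

Builds a composite of several estimators, made up of transforms and a final estimator.
In practice it is a simple loop over a series of steps.
  	- Format of the steps parameter:
		a list of (name, object) pairs, i.e. key:value = step name:step object
		- step name: the name given to an algorithm
		- step object: the implementation of that algorithm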
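
The steps format described above can be sketched as follows. This is a minimal illustration, not code from the original article: the step names ("scale", "reduce", "clf"), the chosen algorithms, and the iris dataset are all assumptions picked only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Each step is a (name, object) pair; the names are arbitrary labels
# chosen for this example.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)

step_names = [name for name, _ in pipe.steps]
predictions = pipe.predict(X)
```

The step names matter later: they are what `named_steps` and parameter addressing use to reach an individual step.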

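Form

A series of transforms ending with one estimator (the estimator can be of any type: transformer, classifier, regressor)
    - transforms must implement: fit(), transform()
        - fitted transforms can be cached with the memory parameter
    - the estimator must implement: fit()
The type of the estimator determines the type of the pipeline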
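
A small sketch of these rules, under stated assumptions: `ClipOutliers` is a hypothetical transformer written for this example (it is not part of sklearn or autosklearn), and the toy data is made up. It shows a transform that implements fit() and transform(), and how the final step decides which methods the pipeline offers.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clips each column to the range seen in fit()."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.low_ = X.min(axis=0)
        self.high_ = X.max(axis=0)
        return self  # returning self lets fit() and transform() be chained

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
y = np.array([0, 0, 1, 1])

# Ends in a classifier, so the pipeline behaves like a classifier.
clf_pipe = Pipeline([
    ("clip", ClipOutliers()),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
clf_pipe.fit(X, y)
labels = clf_pipe.predict(X)

# Ends in a transformer, so the pipeline behaves like a transformer.
tf_pipe = Pipeline([("clip", ClipOutliers()), ("scale", StandardScaler())])
Xt = tf_pipe.fit(X).transform(X)
```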

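Purpose:

- Cross-validation: assign different parameters and compare the validated results
    - parameters are addressed through a naming convention: <step name>__<parameter name>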
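
As a hedged illustration of that convention, the sketch below runs a grid search over a two-step pipeline. The step names ("reduce", "svc"), the candidate values, and the iris dataset are assumptions chosen for the example; only the `<step name>__<parameter name>` key format is the point.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([("reduce", PCA()), ("svc", SVC())])

# Keys follow the "<step name>__<parameter name>" convention, so one grid
# can tune parameters that belong to different steps.
param_grid = {
    "reduce__n_components": [2, 3],
    "svc__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
best = search.best_params_
```

Because the whole pipeline is refit inside each fold, the transforms never see the validation data during fitting.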

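Open questions:

I did not see much in the implementation aimed at efficiency; does it merely reduce the overhead of wrapping and of repeated back-and-forth calls?
	- There is transformer/estimator caching, but that is not where the main efficiency concern lies
Presumably the emphasis is on presenting a clean interface to the caller.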
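
For reference, the caching mentioned above is switched on through the `memory` parameter. The following is a minimal sketch that assumes a temporary directory as the cache location and uses made-up step names; it does not measure whether caching actually helps in any given workload.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

cache_dir = mkdtemp()  # temporary cache location, assumed for this example
pipe = Pipeline(
    [("reduce", PCA(n_components=2)), ("clf", LogisticRegression(max_iter=1000))],
    memory=cache_dir,
)
pipe.fit(X, y)
pipe.fit(X, y)  # same data and parameters: the fitted PCA can come from the cache
score = pipe.score(X, y)
components_shape = pipe.named_steps["reduce"].components_.shape

rmtree(cache_dir)  # remove the cache when it is no longer needed
```

Only the transforms are cached; the final estimator is always refit.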

Code details

pipeline/pipeline.py
    - fit_transform():
        runs the transforms, then calls fit_transform() on the final estimator in steps
    - fit()
        --> self._fit(): runs the transforms
        --> self._final_estimator.fit: runs the estimator
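
The call flow of fit() can be sketched in plain Python. This is a deliberately simplified re-creation, not the library's code: `MiniPipeline`, `Doubler`, and `MeanEstimator` are hypothetical names invented here, and the real implementation additionally handles cloning, caching, and parameter validation.

```python
class Doubler:
    """Hypothetical transformer: doubles every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x * 2 for x in X]


class MeanEstimator:
    """Hypothetical final estimator: remembers the mean of its input."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self


class MiniPipeline:
    """Simplified stand-in for a pipeline; not the real class."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def _fit(self, X, y=None):
        # Fit each transform in order and feed its output to the next one.
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        return X

    def fit(self, X, y=None):
        Xt = self._fit(X, y)
        self.steps[-1][1].fit(Xt, y)  # the final estimator sees transformed data
        return self


pipe = MiniPipeline([
    ("double", Doubler()),
    ("again", Doubler()),
    ("mean", MeanEstimator()),
])
pipe.fit([1, 2, 3])
final_mean = pipe.steps[-1][1].mean_  # mean of [4, 8, 12]
```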

Reposted from: https://my.oschina.net/kakablue/blog/3045634