Exploring Automated Machine Learning (AutoML): Reading the TPOT Documentation

Blog: 小帅的私人空间 (Xiaoshuai's Private Space)

Category: Machine Learning

Article link: https://blog.csdn.net/joshuajinxiaoshuai/article/details/80400203

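http://epistasislab.github.io/tpot
I spent half a day exploring automated machine learning toolkits, mainly TPOT. Other well-known options include auto-sklearn, DataRobot (paid), and Auto-WEKA, which is Java-based and has a graphical interface. More here:
https://www.evget.com/article/2017/10/30/27128.html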

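Overview:
- TPOT uses genetic programming: it evolves candidate pipelines over successive generations.
- By comparison, auto-sklearn (which I still have not managed to install on Mac, haha) is based on Bayesian optimization.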
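
To make the evolutionary idea concrete, here is a toy sketch of a selection-plus-mutation loop in plain Python. It is not TPOT's implementation (TPOT evolves tree-structured scikit-learn pipelines and scores them by cross-validation); the `fitness`, `mutate`, and `evolve` functions and the quadratic objective are made up purely for illustration.

```python
import random


def fitness(candidate):
    # Toy objective standing in for a cross-validated score; best at x=3, y=-1.
    x, y = candidate
    return -((x - 3) ** 2 + (y + 1) ** 2)


def mutate(candidate, rng, scale=0.5):
    # Perturb each gene with Gaussian noise.
    return tuple(gene + rng.gauss(0, scale) for gene in candidate)


def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(-10, 10), rng.uniform(-10, 10))
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the better half survives unchanged, the best score never gets worse from one generation to the next; `generations` and `population_size` play the same role here as the TPOT parameters of the same names.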
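How are models evaluated (we usually want several metrics rather than a single criterion)?
- TPOT uses cross-validation via sklearn.model_selection.cross_val_score.
- You can pick a built-in scoring method or define your own, e.g. tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, scoring='roc_auc'). Pass a scorer name such as 'roc_auc', or a callable scorer for a custom metric.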
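
The scoring options above can be tried out with scikit-learn alone. The sketch below is my own illustration, not taken from the TPOT docs: it picks a built-in scorer by name, builds a custom one with `make_scorer`, and collects several metrics at once with `cross_validate`. The choice of `DecisionTreeClassifier` and macro-averaged F1 is arbitrary.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Built-in scorer selected by name.
acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Custom scorer: wrap a metric so it can be called as scorer(estimator, X, y).
macro_f1 = make_scorer(f1_score, average="macro")
f1 = cross_val_score(model, X, y, cv=5, scoring=macro_f1)

# Several metrics in one pass, for a multi-criteria view of the same model.
results = cross_validate(model, X, y, cv=5, scoring=["accuracy", "f1_macro"])

print(acc.mean(), f1.mean())
print(sorted(k for k in results if k.startswith("test_")))
```

A scorer object like `macro_f1` should be usable as the `scoring` argument of TPOTClassifier as well, assuming TPOT accepts callables with the `scorer(estimator, X, y)` signature as its documentation describes.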
Several versions (built-in configurations):
- Default TPOT: the standard version.
- TPOT light: does not search the full space; it restricts the search to simpler, fast-running operators.
- TPOT MDR: TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS), and is described in detail online. In short, it is intended for genomics tasks.
- TPOT sparse: designed for sparse matrices.
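Notable points about the API:
- The first few parameters (such as generations and population_size) control the genetic algorithm.
- scoring: 'accuracy', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'neg_log_loss', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc'. You can also write your own.
- n_jobs: supports parallel processing.
- max_time_mins: caps the total optimization time, so a run with many generations does not take forever, haha. The related max_eval_time_mins caps the time spent evaluating a single pipeline.
- random_state: the random seed.
- config_dict: selects the version; I found 'TPOT light' works well. Possible inputs are: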

    - Python dictionary: TPOT will use your custom configuration, or
    - string 'TPOT light': TPOT will use a built-in configuration with only fast models and preprocessors, or
    - string 'TPOT MDR': TPOT will use a built-in configuration specialized for genomic studies, or
    - string 'TPOT sparse': TPOT will use a configuration dictionary with a one-hot encoder and the operators normally included in TPOT that also support sparse matrices, or
    - None: TPOT will use the default TPOTClassifier configuration.
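
For the first option, a custom dictionary maps operator import paths to the hyperparameter values TPOT may try. The sketch below follows the naive Bayes example I remember from the TPOT docs and assumes the classic `TPOTClassifier` API used in this post; `build_tpot` is a hypothetical helper of mine, added only so the dictionary can be inspected without TPOT installed.

```python
# Keys are import paths of scikit-learn operators; values map each
# hyperparameter name to the list of candidate values for the search.
tpot_config = {
    'sklearn.naive_bayes.GaussianNB': {},
    'sklearn.naive_bayes.BernoulliNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0],
        'fit_prior': [True, False],
    },
    'sklearn.naive_bayes.MultinomialNB': {
        'alpha': [1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0],
        'fit_prior': [True, False],
    },
}


def build_tpot(config):
    # Hypothetical helper: TPOT is imported lazily, only when this is called.
    from tpot import TPOTClassifier
    return TPOTClassifier(generations=5, population_size=20, verbosity=2,
                          config_dict=config)
```

Calling `build_tpot(tpot_config)` and then `fit` would restrict the search to these three naive Bayes models, which keeps runs short when you already know which model family you want.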

Attributes

fitted_pipeline_: the result, i.e. the best pipeline found, fitted on the training data
pareto_front_fitted_pipelines_
evaluated_individuals_
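
As a rough sketch of how evaluated_individuals_ might be used: as far as I recall, it is a dictionary keyed by pipeline string, and I am assuming each entry stores its cross-validation score under an 'internal_cv_score' key. The `top_pipelines` helper and the example dictionary below are hypothetical, with made-up scores, so the snippet runs without fitting TPOT.

```python
def top_pipelines(evaluated_individuals, k=3):
    # Rank pipeline strings by their recorded cross-validation score, best first.
    ranked = sorted(evaluated_individuals.items(),
                    key=lambda item: item[1]["internal_cv_score"],
                    reverse=True)
    return [(name, info["internal_cv_score"]) for name, info in ranked[:k]]


# Hand-written stand-in for tpot.evaluated_individuals_ (scores are invented).
example = {
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.81},
    "LogisticRegression(input_matrix)": {"internal_cv_score": 0.95},
    "KNeighborsClassifier(input_matrix)": {"internal_cv_score": 0.97},
}

print(top_pipelines(example, k=2))
```

After a real run you would pass `tpot.evaluated_individuals_` instead of `example`, provided the key name matches your TPOT version.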

Functions:

fit(features, classes[, sample_weight, groups]): Run the TPOT optimization process on the given training data.
predict(features): Use the optimized pipeline to predict the classes for a feature set.
predict_proba(features): Use the optimized pipeline to estimate the class probabilities for a feature set.
score(testing_features, testing_classes): Returns the optimized pipeline's score on the given testing data using the user-specified scoring function.
export(output_file_name): Export the optimized pipeline as Python code.

Example: see the website

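import time

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the digits dataset and hold out 25% of it for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

s = time.time()  # start timing the search

# Search with the fast 'TPOT light' configuration.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT light')
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
# Export the best pipeline as standalone Python code.
tpot.export('tpot_mnist_pipeline.py')

print(time.time() - s)  # elapsed time in seconds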
