[Machine Learning] AutoML: An Introduction

舒克与贝克

What is AutoML?

https://www.automl.org/

Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.

Machine learning (ML) has achieved considerable successes in recent years and an ever-growing number of disciplines rely on it. However, this success crucially relies on human machine learning experts to perform the following tasks:

Preprocess and clean the data.
Select and construct appropriate features.
Select an appropriate model family.
Optimize model hyperparameters.
Postprocess machine learning models.
Critically analyze the results obtained.

As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.

Examples of AutoML

Research in Automated Machine Learning is very diverse and has produced packages and methods aimed at both researchers and end users.

AutoML systems

In recent years, several off-the-shelf packages that provide automated machine learning have been developed. While there are more packages than the ones listed below, we restrict ourselves to a subset of the most well-known ones:

AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets.
Auto-sklearn extends the AutoWEKA approach to the Python library scikit-learn and is a drop-in replacement for regular scikit-learn classifiers and regressors; it improves over AutoWEKA by using meta-learning to increase search efficiency and post-hoc ensemble building to combine the models generated during the hyperparameter optimization process.
TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming.
H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform.
Google Cloud AutoML is a cloud-based machine learning service which so far provides the automated generation of computer vision pipelines.
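
The joint selection of an algorithm and its hyperparameters, as described for AutoWEKA and Auto-sklearn above, can be illustrated with a much simpler stand-in. The sketch below uses plain random search, not the model-based optimization those systems rely on, and runs on pure Python. The synthetic dataset, the two toy model families (`knn_predict`, `threshold_predict`) and the `SEARCH_SPACE` layout are all assumptions made for illustration and are not part of any of the packages listed.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, with about 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Toy k-nearest-neighbour classifier; ties are resolved as class 0."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0

def threshold_predict(train, x, t):
    """Toy threshold classifier; it ignores the training data on purpose."""
    return 1 if x > t else 0

# Hypothetical search space: each model family comes with its own
# hyperparameter sampler, so the search covers both choices at once.
SEARCH_SPACE = {
    "knn": (knn_predict, lambda rng: rng.randint(1, 15)),
    "threshold": (threshold_predict, lambda rng: rng.random()),
}

def validation_error(predict, param, train, valid):
    wrong = sum(predict(train, x, param) != y for x, y in valid)
    return wrong / len(valid)

def random_search_cash(train, valid, budget, rng):
    """Sample (algorithm, hyperparameter) pairs and keep the best one."""
    best = None
    for _ in range(budget):
        name = rng.choice(sorted(SEARCH_SPACE))
        predict, sample = SEARCH_SPACE[name]
        param = sample(rng)
        err = validation_error(predict, param, train, valid)
        if best is None or err < best[0]:
            best = (err, name, param)
    return best

rng = random.Random(0)
train = make_data(60, rng)
valid = make_data(40, rng)
best_err, best_name, best_param = random_search_cash(train, valid, budget=30, rng=rng)
print(best_name, best_param, best_err)
```

A real AutoML system would replace the random sampler with a model-based optimizer and the toy families with full learning algorithms, but the structure of the search space (a choice of algorithm plus hyperparameters conditional on that choice) stays the same.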

AutoML to advance and improve research

Making a science of model search argues that the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, and that it is sometimes difficult to know whether a given technique is genuinely better or simply better tuned. To improve the situation, Bergstra et al. propose reporting results obtained by tuning all algorithms with the same hyperparameter optimization toolkit. Sculley et al.’s recent ICLR workshop paper Winner’s Curse argues in the same direction and gives recent examples in which correct hyperparameter optimization of baselines improved over the latest state-of-the-art results and newly proposed methods.
Hyperparameter optimization and algorithm configuration provide methods to automate the tedious, time-consuming and error-prone process of tuning hyperparameters for new tasks, and there are software packages that implement the suggestion from Bergstra et al.’s Making a science of model search. These include:
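
The "genuinely better or simply better tuned" problem can be made concrete with a small sketch. Everything in it is hypothetical: the two "methods" are synthetic loss functions of a single hyperparameter, and `tune` is a plain random search standing in for a shared hyperparameter optimization toolkit.

```python
import random

# Two hypothetical methods, each reduced to a synthetic validation loss
# as a function of one hyperparameter h in [0, 1].
def method_a_loss(h):
    return 0.20 + (h - 0.9) ** 2        # best achievable loss 0.20, at h = 0.9

def method_b_loss(h):
    return 0.25 + 0.1 * (h - 0.5) ** 2  # best achievable loss 0.25, at h = 0.5

def tune(loss_fn, budget, seed):
    """Stand-in tuner: random search with a fixed budget and seed."""
    rng = random.Random(seed)
    return min(loss_fn(rng.random()) for _ in range(budget))

DEFAULT_H = 0.5
BUDGET = 50

# Unequal treatment: method A is run at a default setting, method B is tuned.
untuned_a = method_a_loss(DEFAULT_H)
tuned_b = tune(method_b_loss, BUDGET, seed=0)

# Equal treatment: both methods get the same tuner and the same budget.
tuned_a = tune(method_a_loss, BUDGET, seed=0)

print("A untuned vs B tuned:", untuned_a, tuned_b)
print("A tuned   vs B tuned:", tuned_a, tuned_b)
```

In this toy setup method B appears better when only B is tuned, while method A comes out ahead once both receive the same tuning effort, which is the kind of reversal the same-toolkit protocol is meant to expose.
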
- Hyperopt, including the TPE algorithm
- Sequential Model-based Algorithm Configuration (SMAC)
- Spearmint
We also provide two packages for hyperparameter optimization:
- RoBO – Robust Bayesian Optimization framework
- SMAC3 – a Python re-implementation of the SMAC algorithm
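
The packages above share a sequential, model-based loop: fit a surrogate to the evaluations seen so far, use it to pick the next configuration, evaluate, and repeat. The following is a heavily simplified sketch of that loop in pure Python. It does not use the API of any listed package. The surrogate here is a kernel-smoothed average with a distance-based exploration bonus, chosen only to keep the example short, whereas the real tools use models such as Gaussian processes, random forests or density estimators. The `objective` function and all parameter values are assumptions.

```python
import math
import random

def objective(x):
    """Hypothetical validation loss of one hyperparameter x in [0, 1]."""
    return (x - 0.3) ** 2

def surrogate_mean(x, history, bandwidth=0.1):
    """Kernel-weighted average of observed losses (a crude surrogate model)."""
    weights = [math.exp(-((x - xi) / bandwidth) ** 2) for xi, _ in history]
    total = sum(weights)
    if total == 0.0:  # guard against underflow far away from all observations
        return sum(y for _, y in history) / len(history)
    return sum(w * y for w, (_, y) in zip(weights, history)) / total

def acquisition(x, history, kappa=0.5):
    """Lower is better: predicted loss minus a bonus for unexplored regions."""
    nearest = min(abs(x - xi) for xi, _ in history)
    return surrogate_mean(x, history) - kappa * nearest

def smbo(objective, n_init, budget, seed):
    rng = random.Random(seed)
    # Start from a few random evaluations.
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_init))]
    while len(history) < budget:
        # Optimize the acquisition over a random candidate set.
        candidates = [rng.random() for _ in range(200)]
        x_next = min(candidates, key=lambda x: acquisition(x, history))
        history.append((x_next, objective(x_next)))
    return history

history = smbo(objective, n_init=3, budget=20, seed=0)
best_x, best_loss = min(history, key=lambda p: p[1])
print(best_x, best_loss)
```

The point of the loop is that each expensive evaluation of `objective` is chosen using all previous results, rather than independently as in grid or random search.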