Looking Towards the Future of Automated Machine Learning

I recently gave a presentation at Venture Cafe describing how I see automation changing Python machine-learning workflows in the near future.

In this post, I highlight the presentation’s main points. You can find the slides here.

From Ray Kurzweil’s excitement about a technological singularity to Elon Musk’s warnings about an AI apocalypse, automated machine learning evokes strong feelings. Neither of these futures is likely to arrive in the near term, but where will automation fit in your machine-learning workflows?

Our existing machine-learning workflows might look a little like the following (please forgive the drastic oversimplification of a purely sequential progression across stages!).

Where does automation exist in this workflow? Where can automation improve this workflow?

Not all of these stages are within the scope of machine learning. For instance, while you should automate data gathering, I view that as a data engineering problem. In the image below, I mark the stages that I consider ripe for automation and the stages I consider a poor fit for it. For example, data cleaning is too idiosyncratic to each dataset for true automation. I also “X” out model evaluation as a poor fit. In retrospect, I believe model evaluation is a great place for automation, but I don’t know of any existing Python packages that handle it.

I depict feature engineering and model selection as the most promising areas for automation. I consider feature engineering the stage where advances in automation can have the largest impact on model performance. In the presentation, I include a strong quote from a Quora user saying that hyper-parameter tuning (a part of model selection) “hardly matters at all.” I agree with the sentiment of this quote, but it is not literally true. Choosing roughly correct hyper-parameter values is very important, and choosing the very best values can be equally important depending on how your model is used. I highlight feature engineering over model selection because automated model selection is largely solved. For example, grid search automates model selection by trying every combination of candidate hyper-parameter values. It’s not a fast solution, but given infinite time and a fine enough grid, it will find the best hyper-parameter values!
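
To make the grid-search idea concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation; the toy k-nearest-neighbors model, the data, and every function name below are assumptions made up for illustration. It scores every combination in a small grid on held-out points and keeps the one with the lowest error.

```python
from itertools import product


def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    if not weighted:
        return sum(y for _, y in nearest) / len(nearest)
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def validation_error(train, valid, k, weighted):
    """Mean squared error on the held-out validation points."""
    errors = [(knn_predict(train, x, k, weighted) - y) ** 2 for x, y in valid]
    return sum(errors) / len(errors)


def grid_search(train, valid, grid):
    """Score every combination in the grid; return the best one and all results."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((validation_error(train, valid, **params), params))
    best_error, best_params = min(results, key=lambda r: r[0])
    return best_params, best_error, results


# Hypothetical toy data: y = x^2 on a coarse grid, validated on points in between.
train = [(x / 2, (x / 2) ** 2) for x in range(0, 11)]
valid = [(x / 2 + 0.25, (x / 2 + 0.25) ** 2) for x in range(0, 10)]

grid = {"k": [1, 2, 3, 5], "weighted": [False, True]}
best_params, best_error, results = grid_search(train, valid, grid)
print(best_params, best_error)
```

The cost is the product of the grid sizes (here 4 x 2 = 8 model fits), which is why grid search gets slow quickly, and the winner can only ever be a combination that appears in the grid.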

There are many Python libraries that automate these parts of the workflow. I highlight three libraries that automate feature engineering.

The first is TPOT. TPOT (more or less) takes all the different operations and models available in scikit-learn and allows you to assemble these operations into a pipeline. Many of these operations (e.g., PCA) are forms of feature engineering. TPOT measures which operations lead to the best model performance. Because TPOT lets users combine so many different operations, it uses a genetic search algorithm to search through the possibilities more efficiently than grid search would.
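
The genetic search described above can be sketched in plain Python. This is only a toy under stated assumptions, not the library's actual algorithm or API: the four transforms in `OPS`, the fixed pipeline length, the single-feature linear model, and all function names are hypothetical. Each candidate pipeline is a sequence of feature transforms, and its fitness is the negative validation error of a linear fit on the transformed feature.

```python
import math
import random

# Hypothetical building blocks; each maps a non-negative number to another one.
OPS = {
    "identity": lambda v: v,
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "log1p": math.log1p,
}
PIPELINE_LENGTH = 4


def transform(pipeline, x):
    """Apply the pipeline's operations to x in order."""
    for name in pipeline:
        x = OPS[name](x)
    return x


def fitness(pipeline, train, valid):
    """Negative validation MSE of a least-squares line on the transformed feature."""
    xs = [transform(pipeline, x) for x, _ in train]
    ys = [y for _, y in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    errors = [(my + slope * (transform(pipeline, x) - mx) - y) ** 2 for x, y in valid]
    return -sum(errors) / len(errors)


def genetic_search(train, valid, generations=8, population_size=10, seed=0):
    rng = random.Random(seed)
    names = list(OPS)

    def score(pipeline):
        return fitness(pipeline, train, valid)

    population = [tuple(rng.choice(names) for _ in range(PIPELINE_LENGTH))
                  for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[: population_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, PIPELINE_LENGTH)  # one-point crossover
            child = list(mother[:cut] + father[cut:])
            if rng.random() < 0.3:  # mutation: replace one step with a random op
                child[rng.randrange(PIPELINE_LENGTH)] = rng.choice(names)
            children.append(tuple(child))
        population = parents + children
    best = max(population, key=score)
    return best, score(best), history


# Hypothetical target that a "square then log1p" feature fits exactly.
train = [(float(x), 3 * math.log1p(x * x) + 1) for x in range(10)]
valid = [(x + 0.5, 3 * math.log1p((x + 0.5) ** 2) + 1) for x in range(9)]

best, best_score, history = genetic_search(train, valid)
print(best, best_score)
```

With these settings the search generates at most 10 + 8 x 5 = 50 candidate pipelines, while an exhaustive grid over four operations in four slots would need 4^4 = 256. The trade-off is that a genetic search offers no guarantee of finding the single best pipeline.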

The second is auto_ml. With auto_ml, users simply pass a dataset to the software, and it performs model selection and hyper-parameter tuning for them. Users can also ask the software to train a deep learning model that learns new features from the dataset. The authors claim this approach can improve model accuracy by 5%.
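
As a rough illustration of what "pass in a dataset and get a model back" involves, the sketch below picks among three candidate model families with k-fold cross-validation. It does not use or imitate auto_ml's API; the three toy models, the data, and the function names are assumptions invented for this example.

```python
def mean_model(train):
    """Baseline: always predict the average target."""
    average = sum(y for _, y in train) / len(train)
    return lambda x: average


def linear_model(train):
    """Least-squares line through one feature."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)


def nearest_neighbor_model(train):
    """Predict the target of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def cross_val_mse(make_model, data, folds=5):
    """Average held-out mean squared error over interleaved folds."""
    total = 0.0
    for i in range(folds):
        held_out = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        model = make_model(train)
        total += sum((model(x) - y) ** 2 for x, y in held_out) / len(held_out)
    return total / folds


def select_model(candidates, data):
    """Return the name of the candidate with the lowest cross-validated error."""
    scores = {name: cross_val_mse(make, data) for name, make in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical dataset: a noisy line, with noise alternating between +0.5 and -0.5.
data = [(float(x), 2.0 * x + 1.0 + (0.5 if x % 2 else -0.5)) for x in range(20)]
candidates = {
    "mean": mean_model,
    "linear": linear_model,
    "nearest_neighbor": nearest_neighbor_model,
}
best_name, scores = select_model(candidates, data)
print(best_name, scores)
```

A real tool would also tune each family's hyper-parameters inside the same loop, which is where most of the running time goes.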

The third is Featuretools. Featuretools is the piece of automation software whose future I am most excited about. I find it exciting because users can feed it data that has not yet been aggregated. Most machine-learning models expect that, for each value of the response variable, you supply a single vector of explanatory variables. This is an example of aggregated data. The first two libraries both expect users to supply aggregated data. Lots of important information is lost in the aggregation process, and allowing automation to thoroughly explore different aggregations will lead to predictive features that we would not have created otherwise (and many believe this is why deep learning is so effective). Featuretools explores different aggregations while creating easily interpreted variables (in contrast to deep learning).

While I am excited about the future of Featuretools, it is a new piece of software and has a way to go before I use it in my workflows. Like most automated machine-learning software, it is very slow and resource intensive. Also, the software is not very intuitive. That said, I created a Binder notebook demoing Featuretools, so check it out yourself!
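
To show what "exploring different aggregations" means, here is a small sketch in plain Python. It does not call the library; the transaction table, the column names, and `aggregate_features` are hypothetical. It starts from one row per transaction and produces one feature vector per customer, which is the aggregated shape most models expect.

```python
from statistics import mean

# Hypothetical un-aggregated data: one row per transaction, several rows per customer.
transactions = [
    {"customer_id": "a", "amount": 20.0},
    {"customer_id": "a", "amount": 15.0},
    {"customer_id": "b", "amount": 5.0},
    {"customer_id": "b", "amount": 7.5},
    {"customer_id": "b", "amount": 2.5},
]

# Candidate aggregations to try; each turns a list of values into one number.
AGGREGATIONS = {"count": len, "sum": sum, "mean": mean, "max": max, "min": min}


def aggregate_features(rows, key, column, aggregations):
    """Group rows by `key` and apply every aggregation to `column`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[column])
    return {
        entity: {f"{name.upper()}({column})": fn(values)
                 for name, fn in aggregations.items()}
        for entity, values in groups.items()
    }


features = aggregate_features(transactions, "customer_id", "amount", AGGREGATIONS)
for customer, row in features.items():
    print(customer, row)
```

Each generated column keeps a readable name such as `MEAN(amount)`, which is what keeps the resulting variables easy to interpret. Deciding which of these columns are actually predictive is then a job for the model selection step.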

Original post: https://www.pybloggers.com/2018/11/looking-towards-the-future-of-automated-machine-learning/
