Watson AutoAI
Surprised! Pleasantly surprised, that is, in this sixth installment of my autoML series. While I used IBM’s data science cloud platform a few years ago, I had no prior exposure to Watson Studio or its AutoAI offering. I wasn’t sure what I was going to find, but I was pleased with what I saw. As someone who wants (needs) constant feedback on progress during training runs, I loved the wealth of information provided. The accuracy isn’t as tuned as DataRobot or H2O Driverless AI, but that is to be expected given the extreme difference in price (tens of thousands of dollars a year).
Why IBM Watson Studio AutoAI?
Big Blue isn’t ‘just’ mainframes and SPSS. I used the IBM data science cloud notebook environment a few years ago. Watson has a storied history at IBM. While you aren’t running ‘on’ Watson, the reputation brings authority.
The Cost
I was able to run this experiment for FREE. The pricing tiers are very reasonable for an individual data scientist.
The Setup
“These capabilities are available as part of a fully managed starter set of Cloud Pak for Data services on the IBM Cloud. Provision the integrated Lite versions of Watson Studio and Watson Machine Learning for free today as part of Cloud Pak for Data as a Service.”
The link above will take you to a Try AutoAI on Watson Studio button. AutoAI is a hosted cloud solution, so the process of setting up a new project is pretty straightforward.
Once you are in Watson Studio, you add an AutoAI asset.
From there, you can set up a new experiment.
The Data
To keep parity across the tools in this series, I will stick to the Kaggle training file from “Contradictory, My Dear Watson: Detecting contradiction and entailment in multilingual text using TPUs.” In this Getting Started competition, we classify pairs of sentences (each consisting of a premise and a hypothesis) into three categories: entailment, contradiction, or neutral.
6 columns x 13k+ rows (Stanford NLP documentation)
- id
- premise
- hypothesis
- lang_abv
- language
- label
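Before uploading the file, it can help to confirm that it really has those six columns and to see how the labels are distributed. The following is a minimal sketch using only the Python standard library, not part of the AutoAI workflow itself. The `summarize` helper and the tiny sample rows are made up for illustration, and the label mapping (0 = entailment, 1 = neutral, 2 = contradiction) is an assumption taken from my reading of the competition's data description, so verify it against the Kaggle page.

```python
import csv
import io
from collections import Counter

# Column names as listed above.
EXPECTED_COLUMNS = ["id", "premise", "hypothesis", "lang_abv", "language", "label"]

# ASSUMPTION: label encoding per the competition's data description; verify before use.
LABEL_NAMES = {"0": "entailment", "1": "neutral", "2": "contradiction"}


def summarize(csv_file):
    """Return (row_count, label_counts) for a file-like object holding the training CSV."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = Counter()
    rows = 0
    for row in reader:
        rows += 1
        # Fall back to the raw value if the label is not in the assumed mapping.
        counts[LABEL_NAMES.get(row["label"], row["label"])] += 1
    return rows, counts


# Made-up sample rows (NOT from the real dataset) so the sketch runs on its own.
SAMPLE_CSV = (
    "id,premise,hypothesis,lang_abv,language,label\n"
    "a1,It is raining.,The ground is wet.,en,English,1\n"
    "a2,He is asleep.,He is awake.,en,English,2\n"
    "a3,She bought a car.,She owns a car.,en,English,0\n"
)

rows, counts = summarize(io.StringIO(SAMPLE_CSV))
print(rows, dict(counts))
```

With the real file, you would pass an open file handle instead, for example `with open("train.csv", newline="", encoding="utf-8") as f: summarize(f)`, where the file name is whatever you saved the Kaggle download as.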
Loading the data
It couldn’t be simpler.
Training your model
The interface to configure and run your experiment is minimal. There are some other options under Experiment Settings, but I wanted to keep the run as basic as I could. Pick your label and hit Run Experiment.
This is where it gets fun! There is an interactive visualization that allows you to see where in the process you are with the experiment. I love this! Leaderboards are also available for you to review during the training.
Evaluate Training Results
There is a small indicator that the experiment has completed. I would have expected something more eye-catching, but I am happy it finished in a reasonable amount of time: 22 minutes.
The leaderboards provide information on accuracy and other success metrics as well as the model type and the enhancements made (such as feature engineering). I didn’t see a wide variety of models attempted, hence the short training time.
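When reading an accuracy figure on a three-class problem like this one, it helps to compare it against what you would get by always guessing the most common label. The sketch below shows that comparison in plain Python. The functions `accuracy` and `majority_baseline` are illustrative helpers, not anything AutoAI exposes, and the label lists are made-up toy values rather than results from this experiment.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    if not y_true:
        raise ValueError("y_true must be non-empty")
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Toy labels for illustration only (NOT output from the AutoAI run).
y_true = [0, 1, 2, 0, 1, 2, 0, 0]
y_pred = [0, 1, 1, 0, 2, 2, 0, 1]

print(accuracy(y_true, y_pred))    # 0.625
print(majority_baseline(y_true))   # 0.5
```

A leaderboard entry that barely clears the majority baseline is doing little more than guessing, which is worth keeping in mind when comparing pipelines.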
For Pipeline 3, I looked at the engineered features. You have to look into the details because ‘NewFeature_2’ isn’t very descriptive.
Conclusions
Of interest: after the training, I got a popup introducing feature engineering on multiple datasets. Definitely something to be investigated! If they can identify relationships between datasets and create new features, that would be amazing.
Overall, I enjoyed the Watson AutoAI experience itself. The process ran FAST (22 minutes versus H2O Driverless AI’s 4+ hours), but at the expense of model variety and accuracy right out of the box. Additional experimental setup would be needed. But at a price $78k lower than DataRobot and H2O Driverless AI, that might be an acceptable trade-off.
I would encourage you to consider trying AutoAI. This IBM offering is a reasonably priced autoML tool.
Translated from: https://towardsdatascience.com/watson-autoai-7fb1e82471ea