Create Reproducible Machine Learning Experiments Using Sacred

Every experiment is sacred
Every experiment is great
If an experiment is wasted
God gets quite irate

Sacred lets you configure, organize, log and reproduce experiments. It was designed for ML experiments specifically, but can actually be used for any kind of experiment.

To give an example of how to use this powerful framework, I am going to use the dataset from a Kaggle competition, Real or Not? NLP with Disaster Tweets. This competition is a binary classification problem where you are supposed to decide whether a tweet is describing an actual disaster or not. Here are two examples:

Real disaster tweet:

Forest fire near La Ronge Sask. Canada

Not a disaster tweet:

I love fruits

Let’s say that we want to run some experiments where we build a model to classify these tweets and measure the classifier’s F1-score using k-fold cross-validation. Most data scientists would probably fire up a Jupyter notebook and start to explore the data (which indeed is always the right thing to do, by the way), run some ad-hoc experiments, and build and evaluate models. Sooner or later, the data scientist will notice that the performance of the models depends heavily on specific configurations and countless modifications of the data. This is where the power of reproducibility starts to pay off.

Why Sacred?

The following are the main features and advantages of using Sacred:

  • Easily define and encapsulate the configuration of each experiment

  • Automatically collect metadata of each run

  • Log custom metrics

  • Collect logs in various places using observers

  • Ensure deterministic runs with automatic seeding

How to set up a Sacred Experiment

We start off by creating a base experiment in Sacred as follows:

from sacred import Experiment
logreg_experiment = Experiment('logreg')

A Sacred experiment is defined by a configuration, so let’s create one:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

@logreg_experiment.config
def baseline_config():
    max_features = None
    classifier = Pipeline([
        ('tfidf', TfidfVectorizer(max_features=max_features)),
        ('clf', LogisticRegression())
    ])

Notice that the config attribute of the experiment object is used as a function decorator. This enables Sacred to automatically detect that the function should be used to configure the experiment.

This very simple config defines a scikit-learn pipeline with two steps: compute the TF-IDF representation of all tweets and then classify them using Logistic Regression. I added a variable for one of the hyperparameters, max_features, to showcase how you can easily create new experiments by modifying the config.
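
Sacred also offers named configs as another way to define such variants. The following is a minimal sketch (not from the original article; the variant name and value are my own), assuming the baseline_config above:

@logreg_experiment.named_config
def small_vocab():
    # Hypothetical variant: cap the TF-IDF vocabulary at 1000 terms
    max_features = 1000

A named config can then be activated per run, e.g. logreg_experiment.run(named_configs=['small_vocab']).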

Now, before you can run this experiment, a main function must be defined:

from pathlib import Path
import pandas as pd
from sklearn.model_selection import cross_val_score

@logreg_experiment.automain
def main(classifier):
    datadir = Path('../data')
    train_df = pd.read_csv(datadir / 'train.csv')
    scores = cross_val_score(classifier, train_df['text'],
                             train_df['target'], cv=5, scoring='f1')
    mean_clf_score = scores.mean()
    logreg_experiment.log_scalar('f1_score', mean_clf_score)

As you can see, we once again use an attribute of the experiment object as a decorator, in this case automain. This lets the main function automatically access any variables defined within this experiment’s config. In this case, we only pass classifier, which will be evaluated on how well it can classify the Twitter data using 5-fold cross-validation on the training set. In the last line of code, the metric that we want to measure is logged using the log_scalar method.
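
This config injection is not limited to the main function: any helper decorated with the experiment’s capture attribute gets missing arguments filled in from the config as well. A minimal sketch (the helper below is my own illustration, not part of the original code):

@logreg_experiment.capture
def describe_setup(max_features):
    # max_features is injected from the experiment config when the function
    # is called without arguments (e.g. from within main())
    print(f'Using max_features={max_features}')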

Run the Experiment

To run the experiment, simply call its run() method. To run it with different parameter values, you can conveniently pass a config_updates dict specifying the exact configuration for that particular run. Pretty neat!

# Run with default values
logreg_experiment.run()

# Run with config updates
logreg_experiment.run(config_updates={'max_features': 1000})

I usually put the experiments themselves in different files, and then have a separate script which runs all of the experiments at once.
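
For reference, a hypothetical layout (the file and folder names are my own, chosen to match the imports in the runner script shown later in this article) might look like this:

experiments/
    logreg.py          # defines logreg_experiment (the snippets above)
    randforest.py      # defines rand_forest_experiment (shown below)
run_experiments.py     # attaches observers and calls run() on each experiment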

Log your results

If you run the above, you will not see a lot of results. You first need to attach an observer to the experiment. The observer will then send the logs to some destination, usually a database. For local and non-production usage, you can use the FileStorageObserver to simply write to disk.

from sacred.observers import FileStorageObserver
logreg_experiment.observers.append(FileStorageObserver('logreg'))

If you include this line in the runner script above and run it, a new folder logreg is created with one sub-folder per run: one for the default run, and one for the run with the updated max_features value. Each run produces four separate files, with the following content (a quick way to read them back programmatically is sketched after this list):

  • config.json: The state of each object in the configuration, and the seed parameter which is automatically used in all non-deterministic functions to ensure reproducibility.

  • cout.txt: All standard output produced during the run.

  • metrics.json: Custom metrics that were logged during the run, e.g. the F1-score in our case.

  • run.json: Metadata e.g. about the source code (git repo, files, dependencies, etc.), the running host, start/stop time, etc.
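
Since all of these files are plain JSON, you can also read them back programmatically to compare runs. A minimal sketch (the observer directory 'logreg' and the run id '1' are assumptions that depend on your own setup):

import json
from pathlib import Path

# Load the metrics logged for the first 'logreg' run; Sacred assigns an id per run
metrics = json.loads((Path('logreg') / '1' / 'metrics.json').read_text())
print(metrics)  # contains the logged f1_score entry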

Putting it all together

For the sake of completeness, I will create a final example to show how you can run multiple experiments from the same runner script:

from pathlib import Path

import pandas as pd
from sacred import Experiment
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rand_forest_experiment = Experiment('randforest')

@rand_forest_experiment.config
def baseline_config():
    n_estimators = 100
    classifier = Pipeline([
        ('tfidf', TfidfVectorizer()),
        ('clf', RandomForestClassifier(n_estimators=n_estimators))
    ])

@rand_forest_experiment.automain
def main(classifier):
    datadir = Path('../data')
    train_df = pd.read_csv(datadir / 'train.csv')
    scores = cross_val_score(classifier, train_df['text'],
                             train_df['target'], cv=5, scoring='f1')
    mean_clf_score = scores.mean()
    rand_forest_experiment.log_scalar('f1_score', mean_clf_score)

Now, let’s run both experiments with some config updates…

from sacred.observers import FileStorageObserver
from experiments.logreg import logreg_experiment
from experiments.randforest import rand_forest_experiment

logreg_experiment.observers.append(FileStorageObserver('logreg'))
rand_forest_experiment.observers.append(FileStorageObserver('randforest'))

# Run with default values
logreg_experiment.run()
# Run with config updates
logreg_experiment.run(config_updates={'max_features': 1000})
# Run a different experiment
rand_forest_experiment.run()
rand_forest_experiment.run(config_updates={'n_estimators': 500})

By looking at the metrics.json file of each run, we can conclude that the default logistic regression model was the best performing, with an F1-score of ~0.66, while the random forest with 100 estimators was the worst one, with an F1-score of ~0.53.

Of course, all of that JSON-formatted output is not very appealing to look at, but there are several visualization tools you can use with Sacred. That is outside the scope of this article, however, so do have a look here: https://github.com/IDSIA/sacred#Frontends

Experiment safely!

This article is part of a series on best practices when building and designing machine learning systems. Read the first part here: https://medium.com/analytics-vidhya/how-to-get-data-science-to-truly-work-in-production-bed80e6bcfee

Translated from: https://medium.com/analytics-vidhya/create-reproducible-machine-learning-experiments-using-sacred-f8176ea3d42d
