Introduction to Hydra.cc: A Powerful Framework to Configure Your Data Science Projects

Motivation

It is fun to play with different feature engineering methods and machine learning models, but you will most likely need to adjust your feature engineering methods and tune your machine learning models before getting a good result.

For example, in the speed dating data, you might want to drop iid, id, idg, wave, and career because they do not look like important features. But after doing more research about the data, you realize that career would be an important feature for predicting whether two people will have a next date. So you decide not to drop the career column.

If you hard-code these columns, that is, embed the data directly in the source code of a script, and your file is long, it might take a while to find the code that specifies which columns to drop. Wouldn’t it be great if you could instead change the columns in a simple text file that contains only information about the data, with no other Python code?

This is where a configuration file comes in.
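As a minimal sketch of the difference, the snippet below drops columns from a few rows of data twice: once with the column names hard-coded inside the function, and once with the names passed in from a separate settings object. The sample rows, the function names, and the `config` dictionary are all hypothetical; they stand in for a real dataset and a real config file.

```python
# Hypothetical sample rows standing in for the speed dating data.
rows = [
    {"iid": 1, "id": 1, "wave": 1, "career": "lawyer", "match": 0},
    {"iid": 2, "id": 2, "wave": 1, "career": "teacher", "match": 1},
]


def drop_hard_coded(records):
    """Hard-coded version: the column names are buried in the code."""
    return [
        {key: value for key, value in record.items()
         if key not in ("iid", "id", "wave", "career")}
        for record in records
    ]


def drop_from_config(records, config):
    """Config-driven version: the column names come from outside the code."""
    to_drop = set(config["drop_columns"])
    return [
        {key: value for key, value in record.items() if key not in to_drop}
        for record in records
    ]


# Deciding to keep `career` now means editing one setting, not the function.
config = {"drop_columns": ["iid", "id", "wave"]}

print(drop_hard_coded(rows)[0])
print(drop_from_config(rows, config)[0])
```

In the second version the function never changes; only the settings do, which is the property a config file gives you.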

What Is a Configuration File?

A configuration file contains plain-text parameters that define the settings for running a program. It is good practice to avoid hard-coding in your Python scripts and instead keep all information related to the data, such as which columns to drop and which variables are categorical, in your config file.

This practice not only saves you from wasting time searching for a specific variable in your scripts but also makes your scripts more reproducible.

For example, I could reuse this code for an entirely different dataset because no column names are specified in the code. To make the code work for the new data, all I need to do is change the column names in my config file!

A common language for config files is YAML. YAML is a human-friendly data serialization standard for all programming languages. The syntax is easy to read and, like Python, uses indentation to express structure. The YAML documentation covers the syntax in more detail.

Introduction to Hydra.cc

I hope the short explanation above gives you a sense of why a config file matters. But how do we access the parameters inside a config file?

There are several tools for reading a config file, such as PyYAML, but my favorite is Hydra.cc. Why? Because it allows me to:

  • Seamlessly change my default parameters in the terminal
  • Switch between different config groups
  • Automatically log the results

Let’s see how to get started with Hydra.cc and explore the benefits of this powerful tool.

Get Started

Install Hydra.cc with


pip install hydra-core --upgrade

Let’s start with a concrete example. Suppose you have a config file that holds all the specific information about the data path, the encoding, the kind of pipeline, and the target column.

All you need to do is add the decorator @hydra.main(config_path='path/to/config.yaml') to the function that will use the config file. Make sure the function takes config as a parameter so that it can access the contents of the config file. (This is the form used by the Hydra release that was current when this article was written; since Hydra 1.0, config_path names the directory that holds the config and a separate config_name argument names the file.)

Now you are all set to use any parameter in your config file! For example, if the config file contains the line

target: match

all you need to do is read config.target to get the string ‘match’!

Notice that you don’t need to put quotation marks around the word ‘match’. YAML treats an unquoted word as a string, with a few exceptions such as true, false, and null.

Simple Command Line Application

Hydra.cc allows you to override the default parameters in your config file from the terminal. For example, if you want to switch the machine learning model from a decision tree to logistic regression, you don’t need to rewrite the config file. Instead, pass the new value for the parameter on the command line when you run the file:

python file.py model=logisticregression

With that, the model switches to logistic regression!
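To make the idea of a command-line override concrete, here is a plain-Python sketch of merging `key=value` arguments into a set of defaults. It illustrates the concept only and is not Hydra's implementation: `apply_overrides` and the `defaults` dictionary are hypothetical, values are kept as plain strings, and Hydra's own override syntax is richer (for example, it supports nested keys).

```python
def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` updated with `key=value` strings."""
    config = dict(defaults)
    for item in overrides:
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        config[key] = value
    return config


# Hypothetical defaults, as they might appear in a config file.
defaults = {"model": "decisiontree", "target": "match"}

# Rough equivalent of: python file.py model=logisticregression
config = apply_overrides(defaults, ["model=logisticregression"])
print(config["model"])
```

The defaults stay untouched, so running the script again without the override falls back to the decision tree.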

Better yet, if your config file is complex, Hydra.cc also makes it easier to reach the parameters in the file with tab completion. The Hydra documentation describes how to set up tab completion.

Switch Between Different Config Groups

To keep your config files short and structured, you might want to create a separate file for each model, holding that model's parameters.

You can specify the config file of the model you want to train on the command line


python file.py model=logistic

Now you can switch between different models and access their hyperparameters effortlessly!
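The selection step can be sketched in plain Python as picking one option from a group and merging it into the base settings. This is an illustration of the idea, not Hydra's code: the `compose` helper, the option names, and every hyperparameter name and value below are made up for the example.

```python
# Hypothetical config group "model": one entry per file in a model folder.
# The hyperparameter names and values are invented for illustration.
model_group = {
    "decisiontree": {"name": "decisiontree", "max_depth": 5},
    "logistic": {"name": "logistic", "C": 1.0},
}


def compose(base, group, choice):
    """Merge the chosen group option into a copy of the base config."""
    if choice not in group:
        raise KeyError(f"unknown option {choice!r}; choose from {sorted(group)}")
    config = dict(base)
    config["model"] = dict(group[choice])
    return config


base = {"target": "match"}

# Rough equivalent of: python file.py model=logistic
config = compose(base, model_group, "logistic")
print(config["model"])
```

Because each model keeps its own hyperparameters in its own entry, switching models never leaves stale settings from the previous one behind.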

Automatic Logging

Logging is important if you want to keep track of the results of your runs. But many people don’t use Python logging because of the setup cost. Hydra.cc makes it easy by automatically creating a folder named ‘outputs’ and saving all of your results there, grouped by day.

Each day’s folder is organized by the time of the run (hours, minutes, and seconds). You will see all the logs associated with each run as well as the config files used for that run!

If you have changed your config file and don’t remember what the config file that produced a certain output looked like, you can look in that day’s folder to find out!
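The folder layout described here, one folder per day with one subfolder per run time, can be reproduced with the standard library. The sketch below only illustrates the naming scheme; `run_directory` is a hypothetical helper, not part of Hydra, and Hydra creates these folders for you.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_directory(started_at, root="outputs"):
    """Build outputs/<YYYY-MM-DD>/<HH-MM-SS> for a run's start time."""
    day = started_at.strftime("%Y-%m-%d")
    time_of_day = started_at.strftime("%H-%M-%S")
    return PurePosixPath(root) / day / time_of_day


print(run_directory(datetime(2019, 10, 23, 10, 53, 3)))
# prints: outputs/2019-10-23/10-53-03
```

Sorting these folder names alphabetically also sorts the runs chronologically, which makes an old run easy to find.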

The Hydra documentation has more about logging.

Because the function wrapped by the Hydra decorator runs inside the ‘outputs’ directory, make sure to use hydra.utils.to_absolute_path('path/to/file') whenever you need to access files relative to the directory you launched the script from.

Current working directory  : /Users/khuyentran/dev/hydra/outputs/2019-10-23/10-53-03
Original working directory : /Users/khuyentran/dev/hydra
to_absolute_path('foo')    : /Users/khuyentran/dev/hydra/foo
to_absolute_path('/foo')   : /foo
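The rule behind that output can be sketched in plain Python: a relative path is resolved against the directory the script was launched from, and an absolute path is returned unchanged. This is an illustration of the behaviour shown above, not Hydra's own code; `to_absolute` and `original_cwd` are hypothetical names, and in a real script you would call `hydra.utils.to_absolute_path` instead.

```python
import posixpath


def to_absolute(path, original_cwd):
    """Resolve `path` against the directory the script was launched from."""
    if posixpath.isabs(path):
        return path
    return posixpath.join(original_cwd, path)


original_cwd = "/Users/khuyentran/dev/hydra"
print(to_absolute("foo", original_cwd))   # /Users/khuyentran/dev/hydra/foo
print(to_absolute("/foo", original_cwd))  # /foo
```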

Conclusion

Congratulations! You have learned why configuration files are important and how to configure your data science projects seamlessly. I find my work much more organized when all the information related to the data lives in a separate file. I also find it easier to experiment with different parameters when all I need to do is run python file.py variable=new_value. I hope you gain the same benefits by bringing both configuration files and Hydra.cc into your data science practice.

The example project that accompanies this article uses Hydra.cc and a config file.

I like to write about basic data science concepts and play with different algorithms and data science tools. You can connect with me on LinkedIn and Twitter.

Star this repo if you want to check out the code for all of the articles I have written. Follow me on Medium to stay informed about my latest data science articles.

Translated from: https://towardsdatascience.com/introduction-to-hydra-cc-a-powerful-framework-to-configure-your-data-science-projects-ed65713a53c6
