Essential Software Tools for Data Science Projects

In this article, we will look into setting up some of the tools that allow for a more reproducible and collaborative workflow. This guide will be helpful for people starting on their first ML project, or for data science veterans setting up their MacBook again after, for example, an accidental coffee spill ☕️ 🤦‍♂️. Given that this article is being drafted on a Friday, however, a beer spillage is far more likely.

Disclaimer: the “essential software” described here is based on the author’s personal preference, and a better set of tools is bound to exist; please add your recommendations in the comments. This article in no way tries to advertise or promote any specific software, and all programs can be obtained free of charge.

Requirements: the instructions here are based on macOS (v10.15); however, installation on Linux or Windows is just as feasible.

Outline:

The sections here are essentially independent, so feel free to skip ahead!

  1. Git: version control and code back-up are a must. GitHub allows you to create an appealing README for your project, and can even host your website for free.
  2. Python: set up a virtual environment for package installation and define a default, more appealing plotting style.
  3. VSCode: a more advanced way to use Python notebooks.
  4. Notion: create and share beautiful notes about your project.
  5. Grammarly: enhance your writing skills.

1. Git

Here, we will assume some familiarity with git and GitHub. If you do want to learn more about version control, or need a refresher, head over to this amazing article by Anne Bonner. The most efficient way to interact with your GitHub repository is through a terminal app like iTerm2, which provides amazing git integration, auto-suggestions and syntax highlighting, as described here. Moreover, it allows opening images (and gifs) directly in the terminal.

Your project deserves more than a bland README page, so write an appealing project description as described here, and throw in some badges (shields), as shown below.

Figure: README page of the https://github.com/badges/shields repository. (Image by author)

Finally, if you need a website to host your project, or need an online portfolio to display multiple projects, GitHub provides tools to do that easily, as described here by Emile Gill. A good way to pick up some HTML and CSS skills is to download a free website template from HTML5 UP and start tinkering.

Figure: the free HTML5 template Strata from https://html5up.net/strata. (Image by author)

2. Python

Here we will proceed with a local installation of Python packages with pip and virtualenv.

If you need the latest version of Python, use brew install python (see here if you don’t have brew on your laptop). This installs the latest versions of python and pip. If you already have an older version of Python installed (e.g. v2) with the python command linked to it (check with python --version), brew makes python3 and pip3 available.
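
To see which of these commands are currently on your PATH, and where they resolve to, a short check along the following lines may help. This is a minimal sketch using only the standard library; the helper name find_python_commands is made up for illustration and is not part of any tool mentioned in this article.

```python
import shutil
import sys


def find_python_commands(names=("python", "python3", "pip", "pip3")):
    """Map each command name to the executable found on PATH (None if absent)."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Version of the interpreter that is running this script.
    print("running Python", ".".join(map(str, sys.version_info[:3])))
    # Where each command resolves to, if it is installed at all.
    for name, path in find_python_commands().items():
        print(f"{name:8s} -> {path or 'not found'}")
```

If python resolves to a v2 interpreter, use the python3 and pip3 commands in the steps that follow.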

First, install the environment manager, virtualenv, with pip (use pip3 if python --version reports v2).

Next, create a new directory and use virtualenv to instantiate a new Python environment in that directory.

This will allow us to install Python packages directly in that environment, as opposed to the ‘global’ installation. The environment needs to be activated in each new terminal session by sourcing its activate script.
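
To confirm from inside Python that the activated environment is the one actually in use, a check like the one below can be run. It is a sketch rather than an official recipe: the function name in_virtual_environment is hypothetical, and the logic relies on the interpreter attributes sys.prefix and sys.base_prefix (plus sys.real_prefix, which older virtualenv releases set).

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases mark the environment with sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Otherwise sys.prefix points at the environment, while sys.base_prefix
    # points at the interpreter the environment was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtual_environment())
    print("packages will be installed under:", sys.prefix)
```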

You should see (my_project_env) in your terminal. If you have many packages to install, simply list them in a file, requirements.txt, where you can also specify versions, and then ask pip to install them all in one go into my_project_env.
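
The original terminal commands for these steps are not reproduced here, so the following is only a sketch of the same workflow written as a Python script. Note the swap: it uses the standard-library venv module rather than virtualenv, and the helper names (create_env, write_requirements, install_requirements) are made up for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import Dict, Optional


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path."""
    venv.EnvBuilder(with_pip=with_pip).create(str(env_dir))
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def write_requirements(path: Path, packages: Dict[str, Optional[str]]) -> None:
    """Write one package per line, pinned as name==version when a version is given."""
    lines = [f"{name}=={version}" if version else name
             for name, version in packages.items()]
    path.write_text("\n".join(lines) + "\n")


def install_requirements(env_python: Path, requirements: Path) -> None:
    """Run the environment's own pip so packages land inside that environment."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,  # raise an error if pip reports a failure
    )
```

A typical call sequence would be create_env(Path("my_project_env")), then write_requirements(Path("requirements.txt"), {"numpy": None, "matplotlib": "3.3.0"}), then install_requirements with the returned interpreter path and the requirements file. The package names and the version pin are placeholders, and the install step needs network access.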

Default Python Matplotlib style

If you want your plots to look more appealing than the ones with the default Matplotlib options, you can set a custom Matplotlib style. Here is an example:

Figure: data courtesy of Aurelien Geron’s “Hands-On Machine Learning” book. (Image by author)

To achieve this style, download this macro file ml_style.mplstyle (feel free to modify/distribute the file) and load it at the top of your Python/Jupyter code, for example with Matplotlib’s plt.style.use.
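
As a rough illustration of how a style file is applied, the sketch below writes a small stand-in style file and loads it with plt.style.use. The settings in it are invented for this example and are not the contents of ml_style.mplstyle; with the real file, only the path passed to plt.style.use would change.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for ml_style.mplstyle (the real file is not shown here).
STYLE_TEXT = """\
axes.grid: True
axes.titlesize: 16
axes.labelsize: 14
lines.linewidth: 2
figure.figsize: 8, 5
"""

work_dir = Path(tempfile.mkdtemp())
style_path = work_dir / "example_style.mplstyle"
style_path.write_text(STYLE_TEXT)

# Apply the style; every figure created afterwards picks up these settings.
plt.style.use(str(style_path))

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Plot with a custom style")
ax.legend()

out_path = work_dir / "styled_plot.png"
fig.savefig(str(out_path))
```

In a Jupyter notebook the matplotlib.use("Agg") line can be dropped, since the notebook selects its own backend.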

3. VSCode

Switching tabs between large notebooks in Jupyter can be frustratingly unresponsive. Since transitioning to VSCode, there is nothing to miss about Jupyter: all the functionality of Python notebooks is there. Moreover, VSCode is a fast and powerful editor for Python, C, LaTeX and other files. Additionally, VSCode can be set up to have the same shortcuts and behaviour as other, more familiar, editors like Sublime or Atom. Bingo!

Bikash Sundaray wrote a great article on setting up VSCode for Python notebooks. Moreover, you can connect to a remote Jupyter session running on, for example, your GPU Ubuntu server (more on this here) for neural network training.

4. Notion

A powerful platform to make notes and create documentation is a must. It serves two purposes: 1) keeping yourself organised, and 2) allowing you to share your notes with others easily. Notion allows you to keep your documentation organised, and makes tasks, templates, meetings, and code embeddings a breeze. Sergio Ruiz wrote a comprehensive guide on utilising many of Notion’s features.

An academic-looking email (i.e. ending in .edu, .ac.uk, etc.) will grant you a Pro version for free. If you used Evernote before, Notion provides a migration tool, which worked as expected for me.

N.B. As of August 2020, only US English and Korean are supported, with British English (spell-checker) and other languages in the works.

5. Grammarly

Getting a high accuracy score on your model is great; however, a successful data science project involves effective communication of your findings and methods. Grammarly — an AI-enabled grammar, tone and style assistant — allows you to enhance your writing skills.

N.B. Hopefully, Notion will implement Grammarly integration (with the standalone app) soon. In the meantime, Grammarly works only if Notion is opened in a browser.

Afterword

I hope you found this article useful in helping you to start your data science project. Please let me know if you have any comments or suggestions. The upcoming follow-up article will describe other robust tools and methods (DevOps) to improve your workflow.

Translated from: https://towardsdatascience.com/essential-software-tools-for-data-science-projects-32c86ac54ca6
