Top Data Science Tools and Languages

Opinion

Table of Contents

  1. Introduction
  2. Python, R, SAS, and SQL
  3. Matplotlib, Seaborn, tqdm
  4. sklearn, NumPy, and pandas
  5. Jupyter nbextensions
  6. Tableau and Google Data Studio
  7. Summary

Introduction

The goal of this article is to give a general overview of the top Data Science tools and languages. These are the ones I have used most frequently myself, or that people I have worked with commonly use. There are also a few lesser-known tools that are quite beneficial, which I will discuss later on. I will give some use cases so you can see why these tools and languages are so valuable. I have previously written about some of them, so in this article I will expand on that material and add new information.

Keep on reading if you want to know more about the top tools and languages for Data Scientists, as well as why you should be using them.

Python, R, SAS, and SQL

Photo by Katy Wilkens on Unsplash [2].

I have used Python the most, with R and SAS tied for second place. I used both R and SAS mainly in academic settings, and I also got good use out of them in my early Data Science roles. I have used SQL in nearly every role, for both Data Analytics and Data Science purposes.

Python

With each role, I see more and more Data Scientists using Python. I think the reason for this growth, and for the focus on one particular language, is that Python scales well within a business and also opens up more opportunities for collaboration.

For example, when I work with Python, I am more likely to be able to collaborate with not just other Data Scientists, but Machine Learning and Software Engineers as well.

If I were to use R or SAS, I would most likely have to refactor my code so that it could be interpreted and ingested by the software processes and systems that already exist in the business. Of course, each company is different, and some prefer R or SAS over Python in their stack. It is also worth noting that in most of my interviews, the interviewers preferred Python code.

R

I got the most use out of R when I was first practicing Data Science in academic and research settings. When I later worked with R professionally, I benefited greatly from its focus on statistics and testing. If you want to work on significance testing, unique visualizations, and A/B testing, then R may be the better choice.

SAS

My experience with SAS was similar to that of R. I benefitted from SAS when conducting experiments and significance testing. It also has some incredibly unique and powerful visualizations with summary statistics, like Q-Q plots, residuals, histograms, and Cook’s D.

SQL

SQL is widely used by Data Analysts, Data Scientists, Business Intelligence Analysts, Software Engineers, and Data Engineers. I highly recommend having at least a basic understanding of it for your Data Science career. Like Python, it is a cross-functional way of querying and communicating metrics for your business.
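
To make the idea of a business metric in SQL concrete, here is a minimal sketch that runs a query from Python using the standard-library sqlite3 module. The orders table, its columns, and its values are made up for illustration; in a real role the query would run against your company's database.

```python
import sqlite3

# Hypothetical in-memory table of orders (names and values are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0)],
)

# A typical metric: number of orders and total revenue per region.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue "
    "FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, n_orders, revenue in rows:
    print(region, n_orders, revenue)
```

The same SELECT ... GROUP BY pattern is what analysts, engineers, and Data Scientists can all read, which is a large part of why SQL works so well across teams.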

Matplotlib, Seaborn, tqdm

Photo by M. B. M. on Unsplash [3].

For visualization in Python, these are the tools I use the most. Matplotlib and seaborn go hand-in-hand, while tqdm is a simple way to track your code loops by showing a progress bar.

Matplotlib

Matplotlib [4] is a popular library for creating useful visualizations from your data, and they can even be interactive. I usually reach for it when developing charts, plots, and histograms for quick analysis. Sometimes you need a visualization only for yourself rather than for other people, such as stakeholders, so a more complex tool like Tableau is not necessary in these situations.
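
As a rough illustration of that kind of quick, for-yourself analysis, the sketch below draws a histogram of randomly generated (made-up) values. The Agg backend line is only there so the script also runs without a display; in a Jupyter Notebook you would likely drop it and let the figure render inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data standing in for a column you want to inspect quickly.
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

fig, ax = plt.subplots(figsize=(6, 4))
counts, bin_edges, _ = ax.hist(values, bins=20)
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("Quick look at a distribution")

# Saved to an in-memory buffer here; pass a filename such as "hist.png" to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```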

Seaborn

If I want to create more visually appealing graphics, I use Seaborn [5]. This library offers more options, and you can describe your data with more statistical rigor. The graphics made in Seaborn also remind me of some of the SAS graphics that are created automatically when you call certain functions, like the diagnostic plots for regression models.
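
One Seaborn function that comes close to a regression diagnostic plot is residplot, which plots the residuals of a simple linear fit. The sketch below uses synthetic data, so the column names and the linear relationship are assumptions for illustration rather than anything from a real project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: y depends linearly on x, plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
df = pd.DataFrame({"x": x, "y": 2.0 * x + rng.normal(0, 1.5, size=200)})

# Residuals of a linear fit of y on x, scattered against x.
ax = sns.residplot(data=df, x="x", y="y")
ax.set_title("Residuals of y ~ x")
```

If the residuals scatter evenly around zero, as they should for this synthetic data, a linear model is a reasonable fit.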

tqdm

If you are as curious and obsessed with checking the status of your Python loops as I am, tqdm [6] is for you, too. You can see live updates of your progress, so you know roughly how much time remains before the loop is done and how long you can focus on something else.
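
Using tqdm usually amounts to wrapping the iterable of a loop. In this minimal sketch the sleep call is just a stand-in for real work, and the desc label is an arbitrary name chosen for illustration.

```python
import time

from tqdm import tqdm

results = []
# Wrapping the iterable in tqdm() prints a live progress bar with an estimated time remaining.
for i in tqdm(range(50), desc="processing"):
    time.sleep(0.01)  # placeholder for real work
    results.append(i * i)
```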

sklearn, NumPy, and pandas

Photo by Diana Parkhouse on Unsplash [7].

These tools are widely used for good reason. They are essential to Data Scientists who use Python.

sklearn

For simple and powerful Machine Learning, sklearn [8], also known as scikit-learn, is the way to go. Here are some of the highlights of sklearn:

  • classification
  • regression
  • clustering
  • dimensionality reduction
  • model selection
  • preprocessing
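
Several of these highlights can be combined in a few lines. The sketch below is one possible workflow, not the only one: it uses the iris dataset bundled with scikit-learn, splits it (model selection), scales the features (preprocessing), and fits a logistic regression (classification). The choice of model and parameters is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Model selection: hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classification chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Putting the scaler inside the pipeline means it is fit only on the training rows, which avoids leaking information from the test set.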

NumPy

NumPy [9] is also simple to use and can be utilized for array computation, which is the foundation of a lot of Data Science analysis (pandas dataframes are built on top of NumPy arrays).

Common functions of NumPy include, but are not limited to:

  • arrays
  • indexing
  • slicing
  • copying
  • iterating
  • splitting
  • sorting
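
Each of the operations listed above is a one-liner in NumPy. The small array below is made up purely to show them side by side.

```python
import numpy as np

a = np.array([[3, 1, 2], [9, 7, 8]])        # array creation
first_row = a[0]                            # indexing: first row
last_two_cols = a[:, 1:]                    # slicing: all rows, columns 1 onward
b = a.copy()                                # copying: b is independent of a
b[0, 0] = 100                               # changing the copy leaves a untouched
row_sums = [int(row.sum()) for row in a]    # iterating over rows
left, right = np.split(a, [1], axis=1)      # splitting columns into [0] and [1, 2]
sorted_rows = np.sort(a, axis=1)            # sorting each row

print(sorted_rows)
```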

pandas

I would say most Data Scientists use pandas because it comes in at the start of a project, when you perform Data Analytics on your data. The most important part is that you can build a pandas dataframe. Dataframes are accepted by several other libraries, so you can apply Machine Learning algorithms to them more easily.

Some useful parts of pandas [10] include, but are not limited to:

  • merging
  • grouping
  • reshaping
  • head/tail
  • calculations
  • descriptive statistics
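
The sketch below walks through those pandas features on two tiny, made-up dataframes. The column names and the 10% markup used in the calculation step are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Two small, made-up tables.
orders = pd.DataFrame(
    {"customer_id": [1, 1, 2, 3], "amount": [20.0, 35.0, 15.0, 50.0]}
)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "segment": ["a", "b", "a"]})

merged = orders.merge(customers, on="customer_id", how="left")   # merging
per_segment = merged.groupby("segment")["amount"].sum()          # grouping
wide = merged.pivot_table(                                       # reshaping
    index="segment", columns="customer_id", values="amount", aggfunc="sum"
)
print(merged.head(2))                                            # head/tail
merged["amount_with_markup"] = merged["amount"] * 1.1            # calculations
summary = merged["amount"].describe()                            # descriptive statistics
print(summary)
```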

Jupyter nbextensions

Photo by Alexis Antonio on Unsplash [11].

If you use Jupyter Notebooks frequently, you can add nbextensions [12] that make your notebook more organized. In your Jupyter Notebook, you can install nbextensions with the following code:

!pip install jupyter_contrib_nbextensions
!jupyter contrib nbextension install

Nbextensions tab. Screenshot by Author [13].

After installing, exit Jupyter in your terminal and run jupyter notebook again to relaunch it; the Nbextensions tab will then be available. The screenshot above shows all of the extensions that you can enable by clicking on them. You will also need to close your notebook and reopen it to see the changes applied. I recommend Codefolding and Codefolding in Editor. Here is an example of what my notebook looks like before the codefolding extension:

Before codefolding. Screenshot by Author [14].

As you can see, this function looks normal. The following image shows the same function after codefolding is applied.

Codefolding applied. Screenshot by Author [15].

But what happens after you click on those arrows? They collapse the function. Now imagine you have a long function or dictionary that is several hundred lines long and you have to scroll up and down, making it stressful to look at your own notebook. This codefolding extension will surely give your notebook a much cleaner look. Here is what the function looks like collapsed:

Collapsed function. Screenshot by Author [16].

I hope this tool is new to you. I know the others can be somewhat over-discussed, but there are always incoming Data Scientists who may have never heard of any of these tools. This specific tool was new to me until recently, even after years of coding in Jupyter Notebook.

Tableau and Google Data Studio

Tableau [17] is also a well-known tool; however, some Data Scientists may only be using custom code to create their visualizations. Google Data Studio [18] is less commonly used, but it is even easier to use than Tableau, and it is free.

Tableau

This tool is incredibly useful, especially for Data Analysts and Product Managers. It can also benefit Data Scientists who want to perform exceptionally well-visualized exploratory data analysis or create dashboards that display Data Science model metrics.

Google Data Studio

Google Data Studio is easy to use, and you can start by connecting a data source, whether that is a live SQL platform or a CSV file. Here are five charts I previously made, using dummy data, as examples of what you could create with your own data:

Google Data Studio. Screenshot by Author [19].

Google Data Studio includes plenty of ways that you can visualize your data whether it is before your model or after your model.

Here is another group of visualizations that you can make as well:

Google Data Studio. Screenshot by Author [20].

Another great feature of this tool is that dashboards are interactive: you can scroll over charts and graphs and click on dropdown menus within your dashboard. These dashboards are also easy to share and save. As a Data Scientist, that can prove beneficial when sharing your dashboards with stakeholders who will later use them on their own.

Summary

We discussed some top tools and languages for Data Scientists. I hope that you have learned something new about them, whether you already use them or they are completely new to you. I have also included some more unique or lesser-known tools like tqdm, Jupyter nbextensions, and Google Data Studio.

As a Data Scientist, you will encounter various projects that will require a wide set of tools and languages. It is critical that you learn some, and become an expert in others. Your toolkit is what will set you apart from other Data Scientists, as well as make you a more competitive applicant for interviews and a more collaborative employee.

To summarize, here are the top tools and languages for Data Scientists:

  • Python, R, SAS, and SQL
  • Matplotlib, Seaborn, tqdm
  • sklearn, NumPy, and pandas
  • Jupyter nbextensions
  • Tableau and Google Data Studio

Thank you for reading! I appreciate your time. Feel free to comment down below and start a discussion on common tools and languages you practice as a Data Scientist or Machine Learning Engineer. Please see the references as well for quick links to documentation on these Data Science tools.

Translated from: https://towardsdatascience.com/top-data-science-tools-and-languages-b38b88c7669d
