How to Become a Data Scientist: 10 Things You Need to Know

Introduction

If you have been browsing job ads lately, you will have noticed a huge number of positions available for Data Scientists. The demand seems to be much larger than the supply, which means that there is a huge opportunity here. However, there appears to be a catch: most of these positions require some experience or knowledge in the field of Data Science. So if you are midway through your career, how can you skill up to become a Data Scientist?

Well, today I will attempt to answer this question.

What is Data Science

Before we jump into how one can become a Data Scientist, let’s first take a quick look at what exactly Data Science is.

We are all aware of the so-called “explosion of data”. More and more data is gathered through the web, mobile apps, fitness devices and the like. This is collectively known as Big Data. Big data refers not only to the volume of data, however, but also to its high velocity and high variety.

Data Science is the set of skills and techniques required to make sense of all this data, including advanced analytics, data mining, machine learning, data visualization and statistics. It is the ability to draw insights from raw data to solve real-world problems.

According to the 2015 Gartner report “Critical Capabilities for Operational Database Management Systems”:

“By 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform.”


We can already see this in SQL Server 2016, which now includes:

  • R Services

    R Services allows data scientists and analysts to run statistical programming queries directly on their database. It supports extremely fast computations using multiple cores, processors and threads.

  • PolyBase

    PolyBase acts as a gateway between SQL Server and Hadoop or Azure Blob storage, so you can use Transact-SQL to query non-relational data in the same way you would query relational data on your database.

  • Power BI

    Power BI is tightly integrated with SQL Server, allowing for easy analysis and sharing of data insights and the creation of rich visualizations.

  • Cortana Intelligence Suite on Azure

    The Cortana Intelligence Suite combines big data and advanced analytics, allowing you to get actionable intelligence from your data. You can create models with Azure Machine Learning, and analyze data in Azure Data Lake or SQL Data Warehouse using Azure Data Lake Analytics or Azure Stream Analytics, to mention but a few of the powerful tools which can be used with Cortana.

Keeping this in mind, a SQL Server professional will already have access to the tools required to become a Data Scientist.

Here is what Azure Machine Learning Studio looks like. You can try it out for free by going to this link and clicking on the Start Studio button.

A myriad of helpful resources, including an interactive tutorial, is available here to help you get started.



Figure 1: Microsoft Azure Machine Learning in Action

What do I need to know to be a Data Scientist

  1. You need to understand data: know how to explore it and how to use statistical and analytical techniques.

  2. You need to be able to query and manipulate data sets into required formats using Transact-SQL.

  3. You need to be able to present data in a meaningful way by using tools such as Excel or Power BI.

  4. You need to understand statistics and its role in gaining insights from data.

  5. You need to know how to use a statistical programming language such as R or Python.

  6. You need to be able to perform data transformation, cleansing and some statistical analysis.

  7. You must understand data science concepts such as machine learning, algorithms and conditional probability.

  8. You must be able to create machine learning models and know how to evaluate them.

  9. You must be able to use machine learning to generate predictions and solve problems.

  10. You must learn how to use tools such as Microsoft Azure HDInsight, Scala and Spark.
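To make items 6 to 9 a little more concrete, here is a minimal sketch in plain Python of what cleansing data, fitting a model, evaluating it and generating a prediction can look like. Everything in it is an assumption made for illustration: the toy data set of study hours and exam scores is invented, the helper names (cleanse, fit_line, mean_absolute_error, predict) are hypothetical, and a simple least-squares line stands in for whatever model a real project would use.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score). None marks a missing value.
raw = [(1, 52), (2, 55), (None, 60), (3, 61), (4, None),
       (5, 68), (6, 71), (7, 75), (8, 80)]


def cleanse(rows):
    """Data cleansing: drop every row that contains a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def predict(x, slope, intercept):
    """Generate a prediction for a new input value."""
    return slope * x + intercept


def mean_absolute_error(rows, slope, intercept):
    """Evaluate the model: average absolute gap between prediction and truth."""
    return mean(abs(y - predict(x, slope, intercept)) for x, y in rows)


clean = cleanse(raw)

# Hold out every second row so the model is scored on data it has not seen.
train, test = clean[::2], clean[1::2]

slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)

print(f"model: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"mean absolute error on held-out rows: {mae:.2f}")
print(f"predicted score after 4 hours: {predict(4, slope, intercept):.1f}")
```

The part worth noticing is the split between train and test: the model is scored only on rows it did not see during fitting, which is the basic idea behind evaluating any machine learning model. In practice you would reach for R or a Python library rather than hand-written formulas.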

I know this is quite daunting, but it is achievable with some hard work and dedication. Luckily, there are now multiple resources available to help you on your quest to become a Data Scientist.

So how do I prove to a prospective employer that I am now a Data Scientist?

Microsoft recognizes that there is an extreme shortage of data scientists and has therefore embarked on a mission to facilitate the study of Data Science for those who want to embrace this exciting new career opportunity.

To that end, they have launched the Microsoft Professional Degree in Data Science, which will run for the first time on the 22nd of August 2016.

These courses have been designed by employers in collaboration with top universities such as Columbia and Harvard, and will be available at edX.com.

The degree program, which is available on edX.com, consists of 4 units:

  • The Fundamentals

    This is where you will learn the basics, such as querying data and visualizing it. There are 3 compulsory courses in this unit and 1 elective, where you can choose between Excel and Power BI.

  • Core Data Science

    In this unit you will learn how to use a statistical programming language. You can choose between Python and R.

  • Applied Data Science

    In this unit you will learn more advanced techniques, using Python or R, for extracting meaningful insights from your data.

  • A Cortana Intelligence Competition

    Finally, you get to prove your newly acquired skills by completing a real-world project, which will be scored and graded and will ultimately earn you your degree in Data Science.

Conclusion

Microsoft estimates that there are in the region of 1.5 million jobs available for Data Scientists. Looking at the skills required to become a Data Scientist can take the wind out of your sails. But luckily, various universities and companies have recognized the shortage of skills and have started programs to bridge this gap.

Microsoft itself is offering a degree program, developed by industry experts and academics, that will open doors for many who aspire to become data scientists.

References:

Translated from: https://www.sqlshack.com/10-things-need-know-become-data-scientist/
