

Introduction

If you have been browsing job ads lately, you will have noticed a huge number of positions available for Data Scientists. The demand seems to be much larger than the supply, which means that there is a huge opportunity here. However, there appears to be a catch: most of these positions require some experience or knowledge in the field of Data Science. So if you are midway through your career, how can you skill up to become a Data Scientist?

Well, today I will attempt to answer this question.


What is Data Science

Before we jump into how one can become a Data Scientist, let’s first have a quick look at what exactly Data Science is.


We are all aware of the so-called “explosion of data”. More and more data is gathered through the web, mobile apps, fitness devices and the like. This is collectively known as Big Data. But Big Data refers not only to the volume of data, but also to data that arrives at high velocity and in high variety.

Data Science is the set of skills and techniques required to make sense of all this data, including advanced analytics, data mining, machine learning, data visualization and statistics. It is the ability to draw insights from raw data to solve real-world problems.

According to the 2015 Gartner report “Critical Capabilities for Operational Database Management Systems”:

“By 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform.”


We can already see this in SQL Server 2016, which now includes:

  • R Services


    R Services allows data scientists and analysts to run statistical programming queries directly on their database. It supports extremely fast computations using multiple cores, processors and threads.

  • PolyBase


    PolyBase acts as a gateway between SQL Server and Hadoop or Azure blob storage, so you can use Transact-SQL to query non-relational data in the same way you would query relational data on your database.


  • Power BI


    Power BI is tightly integrated with SQL Server, allowing for easy analysis and sharing of data insights and the creation of rich visualizations.

  • Cortana Intelligence Suite on Azure

    The Cortana Intelligence Suite combines big data and advanced analytics, allowing you to get actionable intelligence from your data. You can create models with Azure Machine Learning, and analyze data in Azure Data Lake or SQL Data Warehouse using Azure Data Lake Analytics or Azure Stream Analytics, to mention but a few of the powerful tools which can be used with Cortana.

Keeping this in mind, a SQL Server professional will already have access to the tools required to become a Data Scientist.

Here is what Azure Machine Learning Studio looks like. You can try it out for free by going to this link and clicking on the Start Studio button.

A myriad of helpful resources is available here to help you get started, including an interactive tutorial.

Figure 1: Microsoft Azure Machine Learning in Action

What do I need to know to be a Data Scientist

  1. You need to understand data: know how to explore it and how to use statistical and analytical techniques.

  2. You need to be able to query and manipulate data sets into required formats using Transact-SQL.


  3. You need to be able to present data in a meaningful way by using tools such as Excel or Power BI.


  4. You need to understand statistics, and its role in gaining insights from data.


  5. You need to know how to use a statistical programming language such as R or Python.


  6. You need to be able to perform data transformation, cleansing and some statistical analysis.


  7. You must understand data science concepts such as machine learning, algorithms and conditional probability.


  8. You must be able to create machine learning models and know how to evaluate them.


  9. You must be able to use machine learning to generate predictions and solve problems.


  10. You must learn how to use tools such as Microsoft Azure HDInsight, Scala and Spark.

I know this is quite daunting. But it is achievable with some hard work and dedication. And luckily there are now multiple resources available to help you on your quest to become a Data Scientist.
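To make a few of the items in the list more concrete, here is a minimal Python sketch of cleansing a small data set, fitting a simple model and evaluating it on held-out rows. Everything in it is an assumption made for illustration: the sample data is invented, the function names (cleanse, fit_line, mean_absolute_error) are hypothetical, and the straight-line model is just one simple choice, not a method prescribed by any of the tools mentioned above.

```python
import statistics

# Invented sample records: (hours_studied, exam_score) as raw text.
# Two rows are deliberately dirty (an empty field and "n/a").
raw_rows = [
    ("1", "52"), ("2", "55"), ("3", "61"), ("4", ""), ("5", "70"),
    ("6", "74"), ("n/a", "80"), ("7", "79"), ("8", "85"), ("9", "88"),
]


def cleanse(rows):
    """Keep only rows where both fields parse as numbers."""
    clean = []
    for x, y in rows:
        try:
            clean.append((float(x), float(y)))
        except ValueError:
            continue  # skip rows with missing or non-numeric values
    return clean


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between predicted and actual values."""
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in points)


data = cleanse(raw_rows)

# Simple ordered split, purely for illustration: fit on the first six
# clean rows and evaluate on the remaining ones.
train, test = data[:6], data[6:]
slope, intercept = fit_line(train)
mae = mean_absolute_error(test, slope, intercept)

print(f"clean rows: {len(data)}")
print(f"model: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"mean absolute error on held-out rows: {mae:.2f}")
```

In practice you would more likely reach for a library such as pandas or scikit-learn, or for R, but the steps stay the same: cleanse, explore, fit, then evaluate on data the model has not seen.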

So how do I prove to a prospective employer that I am now a Data Scientist?

Microsoft recognizes that there is an extreme shortage of data scientists and as such has embarked on a mission to facilitate the study of Data Science for those who want to embrace this new exciting career opportunity.


To that end, they have launched the Microsoft Professional Degree in Data Science, which will run for the first time on the 22nd of August 2016.

These courses have been designed by employers in collaboration with top universities such as Columbia and Harvard, and will be available at edX.com.


The degree program, which is available on edX.com, consists of 4 units:


  • The Fundamentals


    This is where you will learn the basics, such as querying data and visualizing it. There are 3 compulsory courses in this unit and 1 elective, where you can choose between using Excel and Power BI.

  • Core Data Science


    In this unit you will learn how to use a statistical programming language. You can choose between Python and R.

  • Applied Data Science


    In this unit you will learn more advanced techniques using Python or R to be able to extract meaningful insights from your data.


  • A Cortana Intelligence Competition


    Finally, you get to prove your recently acquired skills by completing a real-world project, which will be scored and graded and will ultimately earn you your degree in Data Science.


Conclusion

Microsoft estimates that there are in the region of 1.5 million jobs available for Data Scientists. Looking at the skills required to become a Data Scientist can take the wind out of your sails. But luckily various universities and companies have recognized the shortage of skills and have started programs to bridge this gap.

Microsoft themselves are offering a degree program which has been developed by experts and academics in the industry, which will open the doors for many who aspire to become data scientists.


References:

Original article: https://www.sqlshack.com/10-things-need-know-become-data-scientist/






