[Data Science: Essential Hard and Soft Skills] What hard and soft skills should data scientists at Google and Facebook have?

A data scientist is a better statistician than any software engineer and a better engineer than any statistician. The role has been called the “sexiest job of the 21st century.” Let’s discuss how to become a data scientist and which skills are needed.


1. What are the Roles and Responsibilities of a Data Scientist?

Data scientists are big data wranglers. They take huge amounts of messy data points (unstructured and structured) and clean, massage, and organize them using their skills in math, statistics, and programming. They then apply their analytic powers to uncover hidden solutions to business challenges and present those solutions to the business. In other words, data scientists use their knowledge of statistics and modelling to convert data into actionable insights about everything from product development to customer retention to new business opportunities.

Data scientists need both technical and non-technical skills to do their job effectively. Technical skills come into play at three stages of data science:

  1. Data Capture & pre-processing
  2. Data Analysis & pattern recognition
  3. Presentation & visualization
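
The three stages can be sketched end to end in a few lines of Python. This is only a minimal illustration using the standard library; the function names (`clean`, `analyze`, `present`) and the sample records are hypothetical and do not come from any particular tool.

```python
from statistics import mean


def clean(raw_records):
    """Stage 1: normalize names and drop records whose value is not numeric."""
    cleaned = []
    for name, value in raw_records:
        try:
            cleaned.append((name.strip().lower(), float(value)))
        except (TypeError, ValueError):
            continue  # skip missing or malformed values
    return cleaned


def analyze(records):
    """Stage 2: compute the average value per name."""
    groups = {}
    for name, value in records:
        groups.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in groups.items()}


def present(summary):
    """Stage 3: render one text bar per group, sorted by name."""
    return [f"{name:<8}{'#' * round(avg)} {avg:.1f}"
            for name, avg in sorted(summary.items())]


# Hypothetical messy input: stray whitespace, mixed case, bad values.
raw = [(" North", "4"), ("south", "2.0"), ("north", None),
       ("South ", "n/a"), ("north", "6")]
summary = analyze(clean(raw))
report = present(summary)
print("\n".join(report))
```

In practice each stage would be handled by the dedicated tools discussed in the rest of the article; the sketch only shows how the stages hand data to one another.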

Carrying out these three stages requires three categories of tools: tools for pulling data, tools for analyzing it, and tools for presenting the results. The tools available for each category are described below.

2. Tools for data pulling & pre-processing

a. SQL

SQL is a must-have skill for every data scientist, whether you work with structured or unstructured data. Companies now use recent SQL engines such as Apache Hive, Spark SQL, Flink SQL, and Impala.
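
As a small illustration of the kind of query a data scientist writes every day, the sketch below runs a `GROUP BY` aggregation against an in-memory SQLite database through Python's built-in `sqlite3` module. SQLite stands in for the engines named above purely for convenience; the `orders` table, its columns, and the `top_customers` helper are hypothetical.

```python
import sqlite3


def top_customers(rows, limit=2):
    """Return (customer, total) pairs ordered by total spend, highest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders "
            "GROUP BY customer "
            "ORDER BY total DESC "
            "LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_orders = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0)]
result = top_customers(sample_orders)
print(result)
```

The same `SELECT ... GROUP BY ... ORDER BY` pattern carries over to larger engines, although each has its own dialect details.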

b. Big Data Technologies

This is another must-have skill for data scientists. A data scientist needs to know the different big data technologies: first-generation technologies such as Apache Hadoop and its ecosystem (Hive, Pig, Flume, etc.) and next-generation engines such as Apache Spark and Apache Flink. Flink has been gaining ground on Spark because it is a general-purpose big data engine that can also handle real-time streams. For more details about Flink, follow this comprehensive tutorial.

c. UNIX

Most raw data is stored on a UNIX or Linux server before it is loaded into a data store, so it is useful to be able to access the raw data without depending on a database. UNIX knowledge is therefore valuable for data scientists. Follow this command guide to practice Linux commands.
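
To show what working on raw data without a database can look like, here is a minimal Python sketch that tallies the last field of each line in a raw log, roughly what `awk '{print $NF}' | sort | uniq -c` does at the shell. The log format and the `count_last_field` helper are assumptions made for illustration; an in-memory `StringIO` stands in for a real file on the server.

```python
import io
from collections import Counter


def count_last_field(lines):
    """Count occurrences of the last whitespace-separated field per line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # ignore blank lines
            counts[fields[-1]] += 1
    return counts


# Hypothetical raw log: method, path, HTTP status.
raw_log = io.StringIO("GET /home 200\nGET /cart 500\nPOST /login 200\n\n")
status_counts = count_last_field(raw_log)
print(status_counts.most_common())
```

With a real file, `open(path)` could replace the `StringIO` object, since both yield lines when iterated.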

d. Python

Python is the most popular language among data scientists. It is an interpreted, object-oriented, high-level programming language with dynamic semantics, meaning dynamic typing and dynamic binding.
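
A short sketch of what dynamic typing and dynamic binding mean in practice: a name can be rebound to values of different types, and the method that runs is chosen at call time from the object's actual class. The `Circle` and `Square` classes and the helper functions are hypothetical examples.

```python
import math


def describe(value):
    """Return the name of the value's runtime type."""
    return type(value).__name__


x = 42
first = describe(x)       # x currently refers to an int
x = "forty-two"           # the same name is rebound to a str
second = describe(x)


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    """Sum areas; the right area() is looked up on each object at call time."""
    return sum(shape.area() for shape in shapes)


print(first, second, total_area([Square(2), Square(3)]))
```

Note that `total_area` never checks types: any object with an `area()` method works, which is what makes quick exploratory code easy to write in Python.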

3. Tools for Data Analysis & pattern matching

This depends on your level of statistical knowledge. Some tools are used for more advanced statistics and some for more basic statistics.

a. SAS

Many companies use SAS, so a basic understanding of it is useful. It lets you manipulate equations easily.

b. R

R is the most popular tool in the statistical world. It is an open-source, object-oriented language and environment, so you can use it anywhere. It is a first choice for many data scientists because most statistical methods are already implemented in R.

c. Machine Learning

Machine learning is the most in-demand and most useful tool a data scientist can have. Machine learning algorithms are used for advanced analytics, predictive analytics, and advanced pattern matching. Many machine learning tools are available, such as Weka and NLTK, but the ones built on top of big data technologies are drawing the most industry attention: Mahout (on top of Hadoop), MLlib (on top of Spark), and FlinkML (on top of Flink).
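
To make "predictive analytics" concrete, the sketch below fits a straight line to a handful of points with ordinary least squares and uses it to predict a new value. It is a deliberately tiny, pure-Python illustration, not how Mahout, MLlib, or FlinkML are used; the `fit_line` and `predict` helpers and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data that lies on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
prediction = predict(model, 5)
print(model, prediction)
```

Libraries add the parts this sketch omits, such as multiple features, regularization, evaluation on held-out data, and distributed execution.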

4. Tools for Visualization

a. Tableau

It is a popular tool, especially in Silicon Valley.

b. JMP (SAS subsidiary)

JMP has some nice visualization.

c. R

R also has great visualization support, including ggplot2, lattice, rCharts, Google Charts, Shiny for web apps, and slidify for presentations.

Apart from the tools mentioned above, the following are also popular: JasperSoft, SAP BI, QlikView, MicroStrategy, etc.

5. Non-Technical Skills

a. Business Acumen

A data scientist needs a solid understanding of the industry they work in and of the issues the organization faces. They should be able to determine which problems are critical and which are not, and to identify new ways in which the data can be leveraged.

b. Communication Skills

Companies are looking for data scientists who can clearly and confidently translate their insights from the data for other teammates. A good data scientist arms colleagues with quantified insights.

c. Analytical Problem-Solving

Analytical problem-solving skills are in high demand for data scientists, so that the right approach can be chosen to get the most out of the available time and resources.
