

Back in the summer of 2018, I was just starting my first internship as a Data Analyst.


Data science was all the rage back then, with data scientist heralded as the sexiest job of the 21st century. I remember reading articles featuring the famous Venn diagram that described what a data scientist was.

In hindsight, the Venn diagram wasn’t very descriptive, but it provided a starting point to pick up the tools and knowledge that would eventually help me launch my career as a data analyst.


After fumbling around with a bunch of different tools and working multiple stints in MNCs and startups over the years, I’ve developed a pretty good idea of the key skills and competencies companies look for in data analysts. This guide is designed to prime you with the bare essentials so you can get your foot in the door of the data world and land that first entry-level data analyst job.

Before that, let’s first cover some major shifts in computing that are important for an aspiring data analyst to know if they plan on jumping into the field.


The Rise of Cloud Services and Serverless Computing

With the advent of cloud services and the growth of new support and tooling for data practitioners, the barriers to accessing data have fallen significantly.

Collecting data, storing it, and sharing it with others within the organization is now cheap and frictionless.


Big 3 Cloud Computing Providers

Companies are hosting computing workloads on GCP, AWS and Azure, so their data engineers can focus on building data pipelines that pipe data into a centralized data warehouse, instead of worrying about maintaining servers and hardware upgrades.


In today’s age, it is trivial to spin up a cluster of nodes in a remote data centre, and let your cloud provider manage the allocation of machine resources.


This shift towards a new model of building applications is called serverless computing, and it makes the days of building an on-premise data centre feel like a distant memory.


Democratizing Data Access for Data Analysts

Frictionless data access means that now more than ever, organizations are finding it easy to build data science departments to take advantage of data assets, and are hiring analysts to crunch data, in the hopes that they will discover insights.


With this in mind, the goal of this article is to help you take advantage of this rising demand, and start your journey as an aspiring data analyst.


So what exactly do data analysts do? And what are the most important skills to start learning right now to get started on your journey?


What to Expect as a Data Analyst

Among the different roles in data science, the data analyst has by far the gentlest learning curve.

You don’t have to code as much as a data engineer, nor do you need to know statistics as deeply as a data scientist or machine learning engineer.


Referring to the data science hierarchy of needs below, you can see the data engineer is usually the person responsible for building data pipelines that move data from databases into a centralized data store called a data warehouse.


This is essentially the collect and move / store layers in the pyramid.


Data Science Hierarchy of Needs

The data is then further transformed by data analysts to discover insights that are used by business users to influence decision-making.


Analysis can be presented in the form of a dashboard, a slide deck, or whatever tool is best suited to present insights and recommendations.


To illustrate this further, here are common requirements curated from data analyst job descriptions on LinkedIn:


  1. Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.

  2. Experience partnering with business and using data to influence stakeholders and provide actionable recommendations.

  3. Ability to conduct rigorous analysis and communicate conclusions to both technical and non-technical audiences.

  4. Proficiency in SQL with experience in querying large, complex data sets. Strong Excel skills, Python and R.

  5. Proficiency in Tableau, or similar data visualization tools is a plus.


In summary, data analysts work closely with business users to make sure they are satisfied with the insights generated from the data.


They are also the ones responsible for ensuring the analysis is accurate, communicated clearly, and stored in an accessible place business users can refer to.


Essential Skills and The Learning Journey

Now that you know what value a data analyst brings to an organization, let us move on to the skills you need to become one.

I have condensed this down to 5 key points, and will not cover Microsoft Excel, as I am assuming anyone who is interested in becoming a data analyst already has the basic knowledge to crunch data using standard Excel functions.

I will also not be covering soft skills like stakeholder management and communication, although do note that the effectiveness of a data analyst’s output depends heavily on the stakeholders’ ability to understand it.

Therefore, data analysts have to be clear communicators, presenting their thoughts in a persuasive manner, in order to effectively influence stakeholders to take the correct course of action.


1. Master SQL

SQL will be the most helpful language you learn in your journey as a data analyst.


SQL is human-readable and declarative, meaning that you do not have to tell the SQL query engine the exact steps to execute the query to pull data. The engine has free rein to explore and figure out the most efficient method to return the output to you.

Contrast this with an imperative programming language like Python, where you, the programmer, have to tell the program in what order to execute the data transformation steps to get the output you want. This is why many people do not consider SQL a traditional programming language.
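To make the contrast concrete, here is a minimal sketch that computes the same per-customer totals twice: once as a SQL query, and once as hand-written Python steps. The `orders` table, its columns, and the sample rows are all made up for illustration, and SQLite (via Python's built-in `sqlite3` module) is used only because it needs no setup.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) rows for an "orders" table.
orders = [
    ("alice", 30.0),
    ("bob", 20.0),
    ("alice", 50.0),
    ("carol", 10.0),
    ("bob", 5.0),
]

# Declarative: describe the result you want and let the engine plan the work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()

# Imperative: spell out each step (loop, accumulate, sort) yourself.
totals = {}
for customer, amount in orders:
    totals[customer] = totals.get(customer, 0.0) + amount
python_totals = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(sql_totals)
print(python_totals)
```

Both approaches produce the same list of `(customer, total)` pairs. The SQL version says nothing about how to scan, group, or sort the rows, which is exactly the part the query engine is free to optimize.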


Aside from its declarative nature, there is also increasing support for SQL in big data tools. SQL abstraction layers such as HiveQL and Spark SQL have been built on top of big data processing frameworks such as Hadoop and Spark.

SparkSQL Framework

This means that if you run into limitations with your current data processing engine, you can apply the same SQL knowledge to powerful big data processing frameworks instead of learning a new language from scratch.


The SQL language has been relevant for over 40 years, and will continue to be the primary way data analysts query data for the foreseeable future. In that regard, it provides the best return on investment in your career.


2. Pick up a programming language. (I recommend Python)

Although it is not compulsory for entry-level data analyst jobs, I highly recommend that data analysts pick up a programming language and learn the basics of data structures and algorithms.


At some point in your career, you will inevitably reach the limits of what you can do with SQL and need a programming language to help you interact with APIs to pull data, automate A/B tests, or conduct sentiment analysis.
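As a rough illustration of the first of those tasks, the sketch below parses a JSON payload of the kind an API might return and totals revenue per country. The payload, its field names (`results`, `country`, `revenue`), and the `revenue_by_country` helper are all hypothetical; a real API will have its own response shape, and the text would come from an HTTP request rather than a hard-coded string.

```python
import json
from collections import defaultdict

# Hypothetical payload standing in for the body of an API response.
# In practice this text would come from an HTTP call to a real endpoint.
response_body = """
{
  "results": [
    {"user_id": 1, "country": "SG", "revenue": 12.5},
    {"user_id": 2, "country": "SG", "revenue": 7.5},
    {"user_id": 3, "country": "US", "revenue": 20.0}
  ]
}
"""


def revenue_by_country(body: str) -> dict:
    """Parse a JSON response body and total the revenue per country."""
    records = json.loads(body)["results"]
    totals = defaultdict(float)
    for record in records:
        totals[record["country"]] += record["revenue"]
    return dict(totals)


summary = revenue_by_country(response_body)
print(summary)
```

Nested, semi-structured data like this is awkward to handle in SQL alone, which is one reason a general-purpose language earns its place in the toolkit.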


A programming language adds an essential tool to your arsenal, one that provides flexible options for manipulating and creating value with data.

If you have no language preference, Python is a great language to start with. Many popular data science libraries such as NumPy and pandas are built for Python, and there is increasing support for Python in big data processing frameworks, for example Spark through PySpark.

Python also has a syntax that is easy to comprehend for someone new to programming.


Python Data Science Libraries

3. Learn a tool for data visualization.

Visualization tools enable data analysts to disseminate their findings to business users in the form of automated dashboards, with an intuitive drag-and-drop interface that lets less technical folks slice and dice data.

Tools like Tableau, Power BI, and Looker are ubiquitous in organizations.


While these tools provide similar features across the board, there are minor nuances when comparing different tools. The good news is that learning one will allow you to transfer the knowledge you have obtained to other tools.


Visualization tools are relatively intuitive to learn compared to SQL or Python, so pick one and roll with it. Tableau and Looker are both good choices with widespread adoption in many organizations.


Data Visualization Tools

4. Pick a problem space you are passionate about.

A lot of a data analyst’s day-to-day involves breaking down the business problem into a set of questions that can be answered using data.


The kinds of problems you face at an eCommerce company will be vastly different from those faced by a manufacturing company, for example. In other words, being a good data analyst in an eCommerce company does not mean you’ll be able to come in and excel as a data analyst in a bank.

Domain knowledge and expertise matter. I would argue they are just as important as your technical abilities because they allow you to narrow down the scope of data to analyze. Your experience will guide you to the best areas in the data to mine for insights, enabling you to be more efficient at your job.

If there is a certain problem space that compels you to explore the underlying dataset, go ahead and build an awesome personal project showing the insights and actionable recommendations you found for that particular domain.


You can then host your project on GitHub. Alternatively, Tableau also allows users to post their own data visualization projects on its site gallery.


This is the most effective way to land a job if you have no prior working experience, as it demonstrates initiative and skill to hiring managers and recruiters, and shows that you are passionate about making an impact in that particular industry or domain.


5. Revise basic statistical knowledge

Finally, a data analyst should have a basic understanding of statistics, A/B testing, and online experiments.

Any company with a web or mobile app will almost certainly run experiments to validate whether new product features have improved its key metrics, for example ARPU (average revenue per user).



Often, product managers will look to data analysts to help design such experiments. To do so, it is essential to understand hypothesis testing, significance levels, p-values, sample size, Type I / Type II errors, and the various factors that could invalidate your test results.
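To give a feel for how a few of these concepts fit together, here is a minimal sketch of a two-sided, pooled two-proportion z-test on conversion rates, using only Python's standard library. The conversion counts, sample sizes, and the 0.05 significance level are made-up assumptions, and this is just one common way to analyze an A/B test rather than the only valid one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns the z statistic and the p-value, using the pooled
    conversion rate to estimate the standard error under H0.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200 of 5,000 control users converted,
# versus 250 of 5,000 users who saw the new feature.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)

alpha = 0.05  # significance level: the Type I error rate we accept
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the difference is statistically significant.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

The null hypothesis (H0) here is that both groups convert at the same rate. A small p-value only says the observed gap would be unlikely under H0; it does not protect against issues such as stopping the test early or an underpowered sample size.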


Peep Laja has written an excellent primer on A/B testing that covers these topics in depth.

Final Thoughts and Advice

Phew, and that’s it! That was a lot to take in. With some tenacity and a bit of luck, you will be well on your way to landing that first data analyst job in no time.

It must be mentioned that the world of data changes rapidly, with new tools coming out every few months. If you relish a challenge, data is an extremely dynamic field in which to build your career, and I’m sure you won’t regret your decision.

Just remember, no one had it easy, and learning these things takes time, so don’t try to learn everything at once, otherwise you’ll find yourself getting overwhelmed. Instead, focus on reaching proficiency with one or two tools first and build an awesome project with what you’ve learned.

Practice makes perfect, so it is crucial to apply the skills you have learned to real-world problems to internalize how they work. Once you’ve reached proficiency with a programming language or a visualization tool, the next one becomes much easier to learn.

A piece of parting advice is to be flexible, keep an open mind, and experiment with new tools often to keep up with the pace of change in the data world. With that, I wish you all the best in your journey!


