Data Analyst Primer: The Essential Guide

Back in the summer of 2018, I was just starting my first internship as a Data Analyst.

Data science was all the rage back then, with the data scientist being heralded as the sexiest job of the 21st century. I remember reading articles featuring the famous Venn diagram that described what a data scientist was.

In hindsight, the Venn Diagram wasn’t very descriptive, but it provided a starting point to pick up the tools and knowledge that would eventually help me launch my career as a data analyst.

After fumbling around with a bunch of different tools and working multiple stints in MNCs and startups over the years, I’ve developed a pretty good idea of the key skills and competencies companies look for in data analysts. This guide is designed to prime you with the bare essentials, help you get your foot in the door of the data world, and land that first entry-level data analyst job.

Before that, let’s cover some major shifts in computing that any aspiring data analyst should know about before jumping into the field.

The Rise of Cloud Services and Serverless Computing

With the advent of cloud services and the growth of new support and tooling for data practitioners, the barriers to accessing data have fallen significantly.

Collecting and storing data, and disseminating it to others within the organization, is now cheap and frictionless.

Big 3 Cloud Computing Providers

Companies are hosting computing workloads on GCP, AWS and Azure, so their data engineers can focus on building data pipelines that pipe data into a centralized data warehouse, instead of worrying about maintaining servers and hardware upgrades.

In today’s age, it is trivial to spin up a cluster of nodes in a remote data centre, and let your cloud provider manage the allocation of machine resources.

This shift towards a new model of building applications is called Serverless Computing, and it makes the days of building an on-premise data centre feel like a distant dream.

Democratizing Data Access for Data Analysts

Frictionless data access means that now more than ever, organizations are finding it easy to build data science departments to take advantage of data assets, and are hiring analysts to crunch data, in the hopes that they will discover insights.

With this in mind, the goal of this article is to help you take advantage of this rising demand, and start your journey as an aspiring data analyst.

So what exactly do data analysts do? And what are the most important skills to start learning right now to get started on your journey?

What to Expect as a Data Analyst

Among the different roles in data science, the data analyst has by far the gentlest learning curve.

You don’t have to code as much as a data engineer, nor do you have to know statistics as deeply as a data scientist or machine learning engineer.

Referring to the data science hierarchy of needs below, you can see the data engineer is usually the person responsible for building data pipelines that move data from databases into a centralized data store called a data warehouse.

This is essentially the collect and move / store layers in the pyramid.

Data Science Hierarchy of Needs

The data is then further transformed by data analysts to discover insights that are used by business users to influence decision-making.

Analysis can be presented in the form of a dashboard, a slide deck, or whatever tool is best suited to present insights and recommendations.

To illustrate this further, here are common requirements curated from data analyst job descriptions on LinkedIn:

  1. Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.
  2. Experience partnering with business and using data to influence stakeholders and provide actionable recommendations.
  3. Ability to conduct rigorous analysis and communicate conclusions to both technical and non-technical audiences.
  4. Proficiency in SQL with experience in querying large, complex data sets. Strong Excel, Python, and R skills.
  5. Proficiency in Tableau or similar data visualization tools is a plus.

In summary, data analysts work closely with business users to make sure they are satisfied with the insights generated from the data.

They are also the ones responsible for ensuring the analysis is accurate, communicated clearly, and stored in an accessible place business users can refer to.

Essential Skills and The Learning Journey

Now that you know what value a data analyst brings to an organization, let us move on to the skills you need to become one.

I have condensed this down to 5 key points. I will not cover Microsoft Excel, as I am assuming anyone who is interested in becoming a data analyst already has the basic knowledge to crunch data using standard Excel functions.

I will also not be covering soft skills like stakeholder management and communication, although do note that the effectiveness of a data analyst’s output depends heavily on the stakeholders’ ability to understand it.

Therefore, data analysts have to be clear communicators, presenting their thoughts in a persuasive manner, in order to effectively influence stakeholders to take the correct course of action.

1. Master SQL

SQL will be the most helpful language you learn in your journey as a data analyst.

SQL is human-readable and declarative, meaning that you do not have to tell the SQL query engine the exact steps to execute the query to pull data. The engine has free rein to explore and figure out the most efficient method to return the output to you.

Contrast this with an imperative programming language like Python, where you, the programmer, have to tell the program in what order to execute the data transformation steps to get the output you want. This is why many do not consider SQL a traditional programming language.
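To make the contrast concrete, here is a minimal sketch that computes the same per-customer totals twice: once with a declarative SQL query and once with explicit Python steps. The `orders` rows and column names are hypothetical sample data, and SQLite (via Python’s built-in `sqlite3` module) merely stands in for whatever query engine you actually use.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) order rows.
orders = [
    ("alice", 30.0),
    ("bob", 15.0),
    ("alice", 20.0),
    ("carol", 50.0),
    ("bob", 5.0),
]

# Declarative: describe the result you want and let the engine plan the work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC, customer
    """
).fetchall()
conn.close()

# Imperative: spell out each step (accumulate, then sort) yourself.
totals = {}
for customer, amount in orders:
    totals[customer] = totals.get(customer, 0.0) + amount
py_totals = sorted(totals.items(), key=lambda item: (-item[1], item[0]))

print(sql_totals)
print(py_totals)
```

Both approaches produce the same list of `(customer, total)` pairs. The SQL version only states what is wanted, while the Python version fixes the order of operations by hand.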

Aside from its declarative nature, there is also increasing support for SQL in big data tools. SQL abstraction layers such as HiveQL and Spark SQL have been built on top of big data processing frameworks such as Hadoop and Spark.

SparkSQL Framework

This means that if you run into limitations with your current data processing engine, you can reuse the same SQL knowledge to tap into powerful big data processing frameworks, without having to learn a new language from scratch.

The SQL language has been relevant for over 40 years, and will continue to be the primary way data analysts query data for the foreseeable future. In that regard, it provides the best return on investment in your career.

2. Pick up a programming language. (I recommend Python)

Although it is not compulsory for entry-level data analyst jobs, I highly recommend that data analysts pick up a programming language and learn the basics of data structures and algorithms.

At some point in your career, you will inevitably reach the limits of what you can do with SQL and need a programming language to help you interact with APIs to pull data, automate A/B tests, or conduct sentiment analysis.

A programming language adds an essential tool to your arsenal, giving you flexible options for manipulating data and creating value with it.
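As a rough illustration of the kind of work that falls outside SQL, the sketch below parses a JSON payload of the sort an API might return and summarizes it using only the Python standard library. The payload, its field names, and the helper `average_session_by_country` are all hypothetical; in practice the text would come from an HTTP request to a real endpoint.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical payload standing in for the body of an API response.
response_body = """
[
  {"user_id": 1, "country": "SG", "session_minutes": 12.5},
  {"user_id": 2, "country": "SG", "session_minutes": 7.5},
  {"user_id": 3, "country": "US", "session_minutes": 20.0},
  {"user_id": 4, "country": "US", "session_minutes": null}
]
"""


def average_session_by_country(body):
    """Parse a JSON array of records and average session length per country."""
    records = json.loads(body)
    minutes_by_country = defaultdict(list)
    for record in records:
        minutes = record.get("session_minutes")
        if minutes is None:  # skip records with a missing measurement
            continue
        minutes_by_country[record["country"]].append(minutes)
    return {
        country: mean(values)
        for country, values in minutes_by_country.items()
    }


averages = average_session_by_country(response_body)
print(averages)
```

Handling the missing value explicitly is the sort of small decision that is easy to express in a general-purpose language and awkward to bolt onto a query after the fact.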

If you are language agnostic, Python is a great language to start with. Many popular data science libraries such as NumPy and pandas are built for Python, and big data processing frameworks increasingly support it, for example through PySpark, the Python API for Apache Spark.

Python also has a syntax that is easy to comprehend for someone new to programming.

Python Data Science Libraries

3. Learn a tool for data visualization.

Visualization tools enable data analysts to disseminate their findings to business users in the form of automated dashboards, with an intuitive drag-and-drop interface that allows less technical folks to slice and dice data.

Tools like Tableau, Power BI, and Looker are ubiquitous in organizations.

While these tools provide similar features across the board, there are minor nuances when comparing different tools. The good news is that learning one will allow you to transfer the knowledge you have obtained to other tools.

Visualization tools are relatively intuitive to learn compared to SQL or Python, so pick one and roll with it. Tableau and Looker are both good choices with widespread adoption in many organizations.

Data Visualization Tools

4. Pick a problem space you are passionate about.

A lot of a data analyst’s day-to-day involves breaking down the business problem into a set of questions that can be answered using data.

The kinds of problems you face at an eCommerce company will be vastly different from those faced by a manufacturing company, for example. In other words, being a good data analyst in an eCommerce company does not mean you’ll be able to come in and excel as a data analyst in a bank.

Domain knowledge and expertise matter. I would argue they are just as important as your technical abilities because they allow you to narrow down the scope of data to analyze. Your experience will guide you to the best areas in the data to mine for insights, enabling you to be more efficient at your job.

If there is a certain problem space that compels you to explore the underlying dataset, go ahead and build an awesome personal project showing the insights and actionable recommendations you found for that particular domain.

You can then host your project on GitHub. Alternatively, Tableau also allows users to post their own data visualization projects on its site gallery.

This is the most effective way to land a job if you have no prior working experience, as it demonstrates initiative and skill to hiring managers and recruiters, and shows that you are passionate about making an impact in that particular industry or domain.

5. Revise basic statistical knowledge

Finally, a data analyst should have a basic understanding of statistics, A/B testing, and online experiments.

Any company with a web or mobile app will likely run experiments to validate whether new product features have improved key metrics for the company, for example average revenue per user (ARPU).
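For reference, ARPU is simply total revenue divided by the number of users over the same period. The sketch below computes it for two experiment groups and the relative lift between them. All figures and the helper name `arpu` are made up for illustration.

```python
def arpu(total_revenue, user_count):
    """Average revenue per user: total revenue divided by number of users."""
    if user_count <= 0:
        raise ValueError("user_count must be positive")
    return total_revenue / user_count


# Hypothetical figures for a control group and a group that saw a new feature.
control = {"revenue": 12_000.0, "users": 4_000}
variant = {"revenue": 13_000.0, "users": 4_000}

control_arpu = arpu(control["revenue"], control["users"])
variant_arpu = arpu(variant["revenue"], variant["users"])

# Relative lift of the variant over the control.
relative_lift = (variant_arpu - control_arpu) / control_arpu
print(control_arpu, variant_arpu, round(relative_lift, 4))
```

A raw difference like this is only a starting point. Whether it reflects a real improvement or just noise is a question for a statistical test.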

Hypothesis testing visualization

Often, product managers will look to data analysts to help design such experiments. To do so, an understanding of hypothesis testing, significance levels, p-values, sample size, Type I / Type II errors, and the various factors that could invalidate your test results is essential.
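To show how a few of these ideas fit together, here is a minimal sketch of a two-sided, two-proportion z-test on conversion rates, one common way to evaluate an A/B test. The conversion counts and the 0.05 significance level are hypothetical, and the normal approximation it relies on assumes reasonably large samples in both groups.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Probability of a result at least this extreme if the null were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 200/5000 conversions in A, 250/5000 in B.
alpha = 0.05  # significance level, i.e. the accepted Type I error rate
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

The significance level should be chosen before looking at the data. Checking results repeatedly and stopping as soon as the p-value dips below it is one of the factors that can invalidate a test.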

Peep Laja has written an excellent primer on A/B testing that covers these topics in depth.

Final Thoughts and Advice

Phew, that’s it! That was a lot to take in. With some tenacity and a bit of luck, you will be well equipped to land that first data analyst job.

It must be mentioned that the world of data changes rapidly, with new tools coming out every few months. If you relish a challenge, data is an extremely dynamic field in which to build your career, and I’m sure you won’t regret your decision.

Just remember, no one had it easy, and learning these things takes time, so don’t try to learn everything at once, otherwise you’ll find yourself getting overwhelmed. Instead, focus on reaching proficiency with one or two tools first and build an awesome project with what you’ve learned.

Practice makes perfect. It is crucial to apply the skills you have learned to real-world problems to internalize how they work. Once you’ve reached proficiency with a programming language or a visualization tool, the next one becomes much easier to learn.

A piece of parting advice is to be flexible, keep an open mind, and experiment with new tools often to keep up with the pace of change in the data world. With that, I wish you all the best in your journey!

Translated from: https://medium.com/swlh/data-analyst-primer-the-essential-guide-26bd7e9c2297
