

In 2019, LinkedIn ranked “Data Scientist” as the most promising profession in the US, with a 56% increase in job demand, and the role has topped Glassdoor’s list of the best jobs in America for three years straight.

Sure, the COVID pandemic might have heavily affected the job landscape, but in the midst of businesses suffering enormous cuts lies a more pressing need for a data-driven culture. Having a strong data capability can reinforce better decision-making, with business goals and targets monitored and optimized.

For me and thousands of other aspiring data scientists, knowing what it takes to successfully break into the field would especially matter now, given the growing competition. So I ask, what do we need to do to stand out?

To answer this, I looked at the 2019 Kaggle Machine Learning and Data Science Survey results. Each year the “world’s largest data science community” conducts a survey among its users to obtain insights about the state of the Data Science and Machine Learning Industry. Looking at how Kagglers’ practices and characteristics influence salary should be a good starting point to answer my question.

I focused on the 2,013 data science professionals in the USA earning more than $30K annually, knowing that a lot of variation in salary can occur across geographies. Also, I excluded the professionals earning less than $30K to capture only those likely working full-time.
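The filtering step described above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the code used for the analysis: the field names `country` and `salary`, the label formats, and the sample records are all assumptions made up for the example, and "more than $30K" is read here as "the salary bracket starts at $30,000 or higher".

```python
# Hypothetical records: the field names and label formats are assumptions,
# not the survey's actual column names.
respondents = [
    {"country": "United States of America", "salary": "100,000-124,999"},
    {"country": "United States of America", "salary": "10,000-14,999"},
    {"country": "India", "salary": "150,000-199,999"},
    {"country": "United States of America", "salary": "> $500,000"},
    {"country": "United States of America", "salary": "$0-999"},
]


def bracket_floor(label):
    """Return the lower bound of a salary-range label as whole dollars."""
    cleaned = label.replace("$", "").replace(",", "").replace(">", "").strip()
    return int(cleaned.split("-")[0])


def us_full_time(rows, country="United States of America", minimum=30_000):
    """Keep rows from one country whose salary bracket starts at or above the minimum."""
    return [
        row
        for row in rows
        if row["country"] == country and bracket_floor(row["salary"]) >= minimum
    ]


sample = us_full_time(respondents)
print(len(sample), "respondents kept")
```

Because the salaries come as ranges, the filter can only work at bracket boundaries, which is why the lower bound of each bracket is compared against the threshold.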

Most of the data science professionals have salaries within the $100K-200K range. This is way above the US median salary of $40K for 2019.

[Figure: Salary distribution of US data science professionals on Kaggle (<$30K removed)]

Are there major differences in salary among the different data science roles?

The sample is a mix of various data science professionals. The “Data Scientist” role tops the list (34%), followed by “Software Engineers” (13%) and “Data Analysts” (12%).

Which job roles have higher pay potential? I intend to answer this visually using heatmaps (and I will do this for most of this article). I figured that it’s a better way to show the distribution by role, given that salary data was reported in ranges rather than actual numbers.

From the heatmap we can see an obvious discrepancy between Data Scientists and Data Analysts, with the former concentrated more heavily in the $100K-200K range and the latter somewhere within $60K-125K. It seems that data scientists are paid much more than analysts.

Other professions such as Statisticians and Database Engineers tend to have more variation in pay, while Data Engineers are more concentrated in the $120K-125K range.

Note: The heatmaps use row percentages to account for differences in group size among data roles.
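To make the row-percentage idea concrete, here is a small sketch of how such a table could be computed before plotting. It is not the code behind the heatmaps; the function name `row_percentages` and the observations are invented for illustration.

```python
from collections import Counter, defaultdict


def row_percentages(pairs):
    """Turn (role, salary_bracket) pairs into {role: {bracket: percent of that role}}."""
    counts = defaultdict(Counter)
    for role, bracket in pairs:
        counts[role][bracket] += 1

    table = {}
    for role, brackets in counts.items():
        total = sum(brackets.values())
        # Each cell is a share of its own row, so every row sums to 100.
        table[role] = {b: 100 * n / total for b, n in brackets.items()}
    return table


# Made-up observations, only to show the shape of the result.
pairs = [
    ("Data Scientist", "$100K-125K"),
    ("Data Scientist", "$100K-125K"),
    ("Data Scientist", "$150K-200K"),
    ("Data Scientist", "$60K-70K"),
    ("Data Analyst", "$60K-70K"),
]

table = row_percentages(pairs)
for role, shares in table.items():
    print(role, shares)
```

Normalizing within each row means a role with only a handful of respondents is shaded on the same scale as a role with hundreds, so the colors compare distributions rather than headcounts.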


What are the essential technical skills to do well in data science?

Do great data scientists need to be good at coding? A Glassdoor study suggests that it’s worthwhile because 9 out of 10 data scientist positions require at least one of Python, R, or SQL as a skill.

In this particular survey, almost all Kagglers use at least one programming language to do data science, and only 0.6% do not code at all. This isn’t surprising given the nature of Kaggle, where notebooks are the main way to share content.

But which particular programming languages are the most important to learn? The survey says that Python is the most popular with 30% using it on a regular basis. It is then followed by SQL (22%) and R (15%).

Looking at the salary heatmap, we can see that while all the programming languages tend to bunch up in the $100K-200K range, software engineering-oriented languages such as Java, C++, and C are more densely represented in the $150K-200K range. Other noteworthy languages associated with higher pay are MATLAB, TypeScript, and Bash.

On average, Kagglers use 2–3 programming languages on a regular basis. Does the number of languages used matter? Plotting the number of languages used against salary range, we see that the number tends to increase as pay increases, up to the $125K-150K bracket. So yes, it may be worth learning more than one.
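The comparison above boils down to averaging a per-respondent language count within each salary bracket. A minimal sketch, assuming each respondent is represented as a (bracket, list of languages) pair; the function name and the sample rows are hypothetical and not taken from the survey:

```python
from collections import defaultdict
from statistics import mean


def mean_languages_by_bracket(rows):
    """Average number of distinct languages per respondent, grouped by salary bracket."""
    counts = defaultdict(list)
    for bracket, languages in rows:
        # A set guards against the same language being listed twice.
        counts[bracket].append(len(set(languages)))
    return {bracket: mean(values) for bracket, values in counts.items()}


# Made-up respondents for illustration only.
rows = [
    ("$60K-70K", ["Python"]),
    ("$60K-70K", ["Python", "SQL"]),
    ("$125K-150K", ["Python", "SQL", "Java"]),
    ("$125K-150K", ["Python", "R", "SQL", "Bash", "C++"]),
]

result = mean_languages_by_bracket(rows)
for bracket, average in result.items():
    print(bracket, average)
```

A rising average across brackets shows an association only; it does not establish that learning more languages causes higher pay.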

Apart from programming, what other skills matter? From the salary heatmap we see a strong case for learning cloud-based data software and APIs. Those who use them appear to have higher earning potential, most often in the $150K-200K range, with a notable concentration of professionals earning more than $300K.

Does educational background play a huge part?

Data science professionals tend to be a highly educated group, with 77% having either a Master’s Degree or a PhD. The heatmaps do not really show anything remarkable, except that Professional Degrees have a high concentration in the $150K-250K bracket. This group only constitutes 1.2% of the sample, hence I would say this is inconclusive.

How much does continuous learning on online platforms help?

Aside from formal education, upskilling can be done through tons of online content, like what Massive Open Online Courses (MOOCs) and online bootcamps offer. A huge majority (83%) of Kagglers use these platforms to learn data science. Coursera is by far the most popular, followed by Datacamp, Udemy, and Kaggle Courses.

Interestingly, Fast.ai users skew heavily toward the higher income levels ($125K-150K). DataQuest users, on the other hand, are spread much more over the lower and middle income levels, which suggests that beginners tend to use this site more.

Apart from MOOCs and courses, data science media can also be good sources of skills and industry knowledge. Blogs such as Medium and Analytics Vidhya are the most popular, followed by Kaggle.

Not much of a pattern can be observed in the salary heatmap; most sources are bunched within the $100K-200K range. Curiously, Hacker News appears to have more followers on the higher end, with $150K-200K salaries.

Key takeaways:


To win in the data science field (AND if you define winning as earning high pay):


  1. Code! Learning more languages will probably help. Apart from Python and R, consider adding other non-data science languages such as C++, Java, and TypeScript to your toolkit.

  2. Cloud-based technologies are worth learning. Get ready to explore those AWS, GCP, and Azure platforms for big data.

  3. Continuously upskill and update through MOOCs and online courses, and through media such as blogs and technology news.

A few disclaimers:


  • While technical skills are required to succeed in data science, soft skills are definitely indispensable. Unfortunately we cannot get a measure of that using just this dataset.

  • The Kaggle community is not representative of the entire data science industry, and might be geared towards a more specialized sample of the population (i.e. machine learning enthusiasts). Also, this is US-focused, and it will be interesting to see how the results would look across various countries.

  • While these are primarily visualizations driving storytelling, it will be interesting to go deeper and build a predictive model to quantify the drivers of salary.


Thanks for reading! All the code for the analysis and visualizations can be accessed through my GitHub.

Original article: https://medium.com/@noemiramiro/how-to-win-in-the-data-science-world-fdb0bd9b4ce7




