

As data science enthusiasts know, there’s a lot more to excelling in the field than technical ability. Data professionals need a wide range of skills that extend well beyond data manipulation and analysis.

This week’s episode of the Alter Everything podcast showcases Carlene Jones, data and analytics consultant, and Nynne Haagensen, a data enthusiast who worked with Carlene. Their conversation reinforces that people skills, communication abilities and business savvy are all critical to success in data science and analytics.

What are all those skills? To explore online conversations around this skill set, I decided to gather and analyze some data, naturally, inspired by this fantastic topic modeling trilogy (part 3 is coming soon!). This seemed like a fun opportunity to apply topic modeling with Alteryx Designer to what folks have discussed out there on the interwebz about the data science skill set. (Topic Modeling is part of the Alteryx Intelligence Suite, which includes some new text mining tools.)

Gathering Opinions

I built a workflow in Designer that scraped 64 articles from the data science site KDnuggets tagged “skills” and cleaned up the text. I also used Text Pre-processing to quickly prep the remaining text before sending it into the Topic Modeling and Word Cloud tools. The word cloud below gives you a preview of some of the prominent ideas, but topic modeling lets us dig a little deeper.

Word cloud of terms related to data science skills
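
For readers who don't use Designer, the preparation step behind a word cloud can be approximated in a few lines of Python. This is only a rough sketch of the idea, not what the Text Pre-processing or Word Cloud tools do internally: the stop-word list, the sample sentences and the function names `tokenize` and `term_frequencies` are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real one is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "that", "with", "it"}

def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def term_frequencies(documents):
    """Count how often each remaining word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts

# Made-up sentences standing in for the scraped article text.
sample_docs = [
    "Communication is a skill for the data scientist.",
    "The data scientist needs business skill and communication.",
]
freqs = term_frequencies(sample_docs)
print(freqs.most_common(3))
```

A word cloud is essentially a picture of these counts, with the most frequent terms drawn largest.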

I asked the Topic Modeling tool to identify three dominant topics in the text of these articles. You should definitely read all the details on how this process works, but in a nutshell: This is an unsupervised approach, meaning that I’m not specifying what I want the model to find in advance, but rather letting it identify on its own the key ideas in the text of the articles. This tool assumes that each chunk of text I feed it is a mixture of those three different topics, since I asked for three. It figures out how those topics are represented in each chunk based on the probability that certain words occur together. It doesn’t give a name to the topics it finds, though; it needs us to figure out what its groupings of words mean.
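
To make the "mixture of topics" idea concrete, here is a small sketch in plain Python. It assumes the model is latent Dirichlet allocation (LDA) fitted with collapsed Gibbs sampling, which is a common approach to this kind of problem but is my assumption rather than a description of the tool's internals. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy chunks are all hypothetical.

```python
import random

def fit_lda(docs, num_topics=3, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word_counts, vocab): a topic mixture per
    document, per-topic word counts, and the vocabulary list.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment before resampling it.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][wid] -= 1
                topic_totals[k] -= 1
                # Weight each topic by how common it is in this document
                # and how often it already produces this word.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][wid] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][wid] += 1
                topic_totals[k] += 1

    # Turn the counts into a topic mixture for each document.
    doc_topic = []
    for d, doc in enumerate(docs):
        total = len(doc) + num_topics * alpha
        doc_topic.append([(c + alpha) / total for c in doc_topic_counts[d]])
    return doc_topic, topic_word_counts, vocab

# Toy chunks of pre-processed text (invented for this example).
chunks = [
    "python sql hadoop market value demand".split(),
    "learning machine team question understand".split(),
    "problem solve think model code math".split(),
    "python sql value market".split(),
]
doc_topic, topic_word, vocab = fit_lda(chunks, num_topics=3)
for mixture in doc_topic:
    print([round(p, 2) for p in mixture])
```

Each printed row is one chunk's mixture over the three topics, and the rows sum to 1. The topics come back as numbered groups of word counts with no labels attached, which is why the interpretation step is left to us.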

Technical Skills and More

The topic model that results from this analysis is open to interpretation, but here’s what I see. Topic 1 appears to describe the role of the data analyst or data scientist within an organization, with some technical terms mentioned (Python, SQL, Hadoop). However, it also includes concepts like “value,” “market” and “demand” that could reflect the business expertise a skilled data professional brings to the organization. Some of the chunks of original text that scored highly for the presence of Topic 1 include:

  • “… a data scientist doesn’t just possess technical skills, they also have domain expertise”

  • “Knowing the basic principles of data science and machine learning is still required, but knowing how to apply them to your problem is even more valuable”

  • “Remember, my goal wasn’t to invent a new machine learning algorithm; it was to demonstrate to a client the potential machine learning had or didn’t have for their business”


Topic 2 has “learning” as its most relevant term and “machine” in second place, so a quick conclusion would be that Topic 2 reflects the prominence of machine learning skills for data science. However, a closer review suggests that maybe “learning” could also be interpreted in another way. Some of the chunks of text that scored highly for Topic 2 include:

  • “Apart from classroom learning, you can practice what you learned in the classroom by building an app, starting a blog, or exploring data analysis to enable you to learn more”

  • “Communication problems are harder than technical problems”

  • “If you’re stuck on a problem, sitting and staring at code may solve it or may not. Instead talk it out in language with a teammate”

Some of the other terms included in this topic are “question,” “understand,” “team,” “approach” and “offer.” This topic seems to have a theme of ongoing learning and skill development for the data professional.

Finally, Topic 3 looks like it represents the intersection of technical skills and problem-solving, with terms “problem,” “solve,” “think,” “model,” and “code” showing up as highly relevant. “Math” also appears here, as do “research” and “concept,” suggesting some of the more specific intellectual skills useful in the data fields.

  • “Machine learning can seem magical. And in some cases it is. But in the cases it’s not, it’s important to acknowledge it.”

  • “There are too many data points for a human to make sense of it. It is a textbook case of death by information overload”

  • “Communication skills” and “data visualization”

  • “Spend time thinking about the products of the company, how your job impacts the core of the business, and a few ideas of how you would do your job to solve an important problem”

  • “It’s perfectly fine if you’re overwhelmed by the skills needed (So am I!)”


The Human Context for Analysis

Yes, it is a lengthy list of skills indeed! This quick analysis suggests that in discussions of data science skills, there is a recurring emphasis not just on technical skills, but on the capabilities that put data analyses into human and business contexts. The best model or analysis doesn’t mean much without humans empowered to figure out the right problem-solving strategy, the questions to ask, the methods to use and the interpretation of their results.

Learn more about how Carlene and Nynne view the skills needed for a data-driven company culture and professional success in this week’s Alter Everything episode.

Originally published on the Alteryx Community.

Translated from: https://towardsdatascience.com/sources-agree-data-science-skills-go-beyond-data-4cd9057960c4






