by Rachael Tatman

The four data science skills I didn’t learn in grad school (and how to learn them!)

Before I get to the meat of this post, I want to make one thing super clear: you do not need a graduate degree to be a data scientist. Unless you’re doing cutting-edge machine learning research (which, let’s be honest, doesn’t describe 99.9% of data scientists — including me!), a degree in how to do research just isn’t necessary. Anyone who tells you differently is trying to sell you something — probably a data science graduate degree.

That said, I did learn a lot of valuable skills in grad school. I learned how to deal with messy data, ask good questions, determine which statistical tool to use in a specific situation, write code for statistical computing and machine learning and, last but not least, clearly communicate technical concepts.

These are all skills that every data scientist needs. What they are not is the only skills a data scientist needs. Two of the roughest parts of the transition from grad school to industry for me were 1) identifying the skillsets I was missing and 2) figuring out the best way for me to get up to speed on them.

Fortunately, if you’re in the same place I was, I’ve got you covered. Without further ado, here are four data science skills I didn’t learn in grad school, along with some practical tips on how you can learn them.

SQL

I’ve found that most graduate students who are exploring data science as a career are already familiar with R or Python (or both!). On the other hand, far fewer folks in this position know SQL. And that can be a problem when you’re ready to go on the data science job market: after Python and R, SQL is the third most widely used tool in data science.

SQL (usually pronounced like “sequel”) is a programming language specifically for interacting with databases. It’s fairly rare to see it used in an academic context, but it’s ubiquitous in the industry. Fortunately, the basics are relatively easy to learn and there are a lot of educational resources out there to help you get started.
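
To give a feel for what those basics look like, here is a minimal sketch that runs a SQL query from Python using the standard-library `sqlite3` module. The `orders` table, its columns, and its rows are all made up for illustration; a real project would query an existing database instead of building one in memory.

```python
import sqlite3

# Hypothetical toy data: an in-memory database with one made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ana", "west", 3750.0),
        ("ana", "east", 3900.0),
        ("bo", "west", 5000.0),
        ("bo", "west", 5400.0),
        ("cy", "east", 3800.0),
    ],
)

# Core SQL vocabulary: SELECT columns, aggregate with GROUP BY,
# filter the groups with HAVING, and sort with ORDER BY.
query = """
    SELECT customer, COUNT(*) AS n_orders, AVG(amount) AS mean_amount
    FROM orders
    GROUP BY customer
    HAVING COUNT(*) >= 2
    ORDER BY mean_amount DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, mean_amount in rows:
    print(customer, n_orders, mean_amount)
```

If you already think in terms of data frames, `GROUP BY` plays roughly the role of a group-by-and-summarize step in R or Python, which is one reason the basics tend to come quickly.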

How to learn SQL:

  • Take a course. There are a lot of online options out there, including courses by Khan Academy, DataCamp, Stanford and Udemy. In-person courses are a bit harder to find, but if you check a local university, community college or code camp you might get lucky.

  • Develop a SQL portfolio. Having examples of your ability to write queries on real databases is good evidence that you’re familiar with the language. One option is to write kernels (i.e. hosted R or Python notebooks) on BigQuery datasets on Kaggle. I’ve written a quick how-to to get you started. (Full disclosure: I work for Kaggle. :) HackerRank and SQLZOO also have quite a few SQL exercises.

Being a Generalist

Grad school is great! Your day-to-day work is expanding the borders of human knowledge, which is pretty rad. As you work through your degree, you really drill down into one specific topic, asking increasingly precise questions in a narrower and narrower domain. Eventually, you’re the most knowledgeable person on the planet about your little sub-sub-sub-niche. There’s nothing wrong with this: it’s just how scholarly inquiry works.

It is not how data science works. Unless you’re very lucky and end up working on the precise thing you wrote your dissertation or thesis on, you’ll be expected to work on problems outside your field pretty much immediately. And not just things from outside your field: problems from fields you’ve never even heard of. You’re going to have to get used to working on things you’re not an expert on very quickly.

Here are some ways to get better at being a generalist:

  • Read outside your discipline. Academic disciplines tend to use a specialized set of statistical tools. In sociolinguistics, for example, we work a lot with mixed-effects regression, but there are a lot of other statistical approaches out there. Reading work in different disciplines will expose you to a wide range of different techniques and problems and help you get comfortable jumping feet-first into a new topic.

  • Practice analyzing new types of data. Data scientists need to work with all sorts of data. You probably already have deep experience with one type of data, but consider branching out. Have you worked with time series? Text? Images? Video? Audio? Pre-trained models? Relational databases? Figure out what gaps there are in your knowledge and try your hand at working with some new and different sources. (Obligatory plug: Kaggle has more than 10k public datasets from a huge variety of sources. You can also check out Zenodo or the Dataverse project.)

  • Talk about technical concepts with people outside your field. Not only will you learn a lot, you’ll also have a chance to practice explaining technical concepts to people who don’t share your specific academic background.

Source/Version Control

This one is a little bit of a cheat for me: I actually did learn source control in grad school, thanks to a Software Carpentry workshop. It’s so, so, so valuable, though, and I know that a lot of my peers in grad school weren’t exposed to it.

Source control, also called version control, is a way to manage making changes to a single centralized document or code base. The basic idea is that you do your work on a copy of whatever-you’re-working-on, and every so often you use that copy to update the original. It’s helpful for individual projects (it lets you roll back to that one version that actually worked and figure out what you broke) and pretty much mandatory for technical collaboration.
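
The copy-then-update idea can be sketched in a few lines of Python. This is a toy model only: the `TinyHistory` class and its method names are invented for illustration, and real tools such as Git store history very differently and add branching, merging and collaboration on top.

```python
import copy


class TinyHistory:
    """Toy model of version control: an append-only list of snapshots."""

    def __init__(self, initial):
        self._snapshots = [(copy.deepcopy(initial), "initial version")]

    def checkout(self):
        """Return a private working copy of the latest snapshot."""
        return copy.deepcopy(self._snapshots[-1][0])

    def commit(self, working_copy, message):
        """Record the working copy as the new latest version."""
        self._snapshots.append((copy.deepcopy(working_copy), message))
        return len(self._snapshots) - 1

    def rollback(self, version):
        """Make an earlier snapshot the latest one again, keeping history."""
        content, message = self._snapshots[version]
        return self.commit(content, f"roll back to version {version} ({message})")

    def log(self):
        """List the messages of every recorded version, oldest first."""
        return [message for _, message in self._snapshots]


# Hypothetical project: a single file stored as a string.
history = TinyHistory({"analysis.py": "print('it works')"})

# Work on a copy, then use the copy to update the original.
work = history.checkout()
work["analysis.py"] = "print('it works'"  # oops: a broken edit
broken = history.commit(work, "refactor plotting")

# Roll back to the one version that actually worked.
history.rollback(0)
restored = history.checkout()["analysis.py"]
print(restored)
print(history.log())
```

Note that the broken version is never thrown away; it stays in the log, which is what lets you go back later and figure out what you broke.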

How to learn to use version control:

  • Use version control on every single research project and paper from here on out. I’m 100% serious. My entire dissertation was version controlled and it saved my butt so many times I lost count.

  • Use GitHub for your personal projects (if you have any) or research you can share. This is optional, but helpful if you end up joining a team that uses GitHub. In addition, an active GitHub profile is one way to demonstrate your workflow to potential employers.

Stopping at “Good Enough”

When you’re working in an academic setting, you really do need to make sure everything is as good as it can be. Your work is going to be closely evaluated by experts and, if it passes muster, it will be added to the scholarly literature permanently. When you’re working in an industry setting, on the other hand, it’s far better to have something useful now than something very polished eventually.

One of the first new terms I learned working in an industry setting was MVP, or “Minimum Viable Product”. The idea is that you share something when it’s just good enough to satisfy some portion of the people that will interact with it. In a data science setting that means not answering every single question you could with the data, or having a model that’s less accurate than it could be with additional tuning. You may have time for deeper analysis or additional tuning later, but you should be ready to share projects the moment they get to “good enough”.

How to get better at seeing what’s good enough:

  • Work on identifying “done for now”. The next time you work on a project, stop every so often, maybe before you wrap up every day, and think about whether you’ve already created something valuable (you probably have!). Take a minute to practice how you might describe what’s useful or interesting about what you’ve already done.

  • Consider sharing intermediate stages of your research. If you can, consider sharing the intermediate stages of your next research project, maybe in a blog or to a lab mate. It may not be ready for the limelight, but is this piece of your analysis novel? Did you learn something worth sharing during the data collection? What have you made that’s already good enough that someone else might find it valuable?

And there you have it, four key skills that I use more-or-less every day that grad school didn’t teach me. Other data folks: feel free to chime in with necessary skills you picked up after you were finished with your degree!

Translated from: https://www.freecodecamp.org/news/the-four-data-science-skills-i-didnt-learn-in-grad-school-and-how-to-learn-them-f2b039fc0f59/
