

A short guide on how to get started in the wonderful world of data science


Machine Learning, Artificial Intelligence, Neural Networks, Python, … If you’re somewhat interested in the world of data, you have undoubtedly come across some of the aforementioned terms. While some of these terms sound scarier than others, in the end, it just comes down to taking one step at a time and slowly but surely you’ll understand the bigger picture.


If you want to skip the wise words, scroll down to the section titled “The key fundamentals”.


Figure out how YOU learn best (yes, self-awareness can also help in data science)

Learning is a whole topic on its own and many study methods exist. However, in my opinion, learning is a personal thing. Hence, it is important to understand for yourself how you learn best. I, for example, learn best in a visual manner and I like to draw the bigger picture in my mind (and on paper). For me, videos therefore work pretty well. Others argue reading requires more brain activity and as a result, improves information retention. Medium has an entire channel devoted to articles that explore this topic. Either way, it will save you a lot of time if you figure out what works best for you.


“Different things work for different people” — Renee Teate


What is data science really? (A brief description)

Okay, so there has been quite some discussion about what exactly data science is and what it is not. There are many buzzwords flying around and some of them aren’t even used consistently. It leaves people frowning…


Photo by Ian Noble on Unsplash

In reality, data science is a field (I know, some call it an industry rather than a field 🤷‍♂️) that contains many sub-parts. Moreover, it is a field that expands at an incredible pace, thereby making the definition dynamic rather than static.


But in general, one can state that data science focuses on making discoveries in data that result in actionable insights. To get to these insights, tools such as machine learning models and statistical methods are typically used. Communication skills are just as much a part of data science as the technical skills. Therefore, data visualization is often also considered part of data science, as a way to effectively report and communicate the technical details. Those actionable insights are either presented to humans (technically minded or not) or fed into another system. In the latter case, this often results in automated decisions, which are then typically referred to as artificial intelligence. I can recommend this article by Dr. Hugo Bowne-Anderson if you want to dig deeper into the demystification of all these terms.
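To make the chain “data, statistical method, actionable insight, automated decision” a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the order counts, the function names `find_unusual_days` and `decide`, and the choice of a simple two-standard-deviation rule as the “statistical method”. Real projects would typically use richer data and models.

```python
from statistics import mean, stdev

# Hypothetical daily order counts for a small web shop (made-up numbers).
daily_orders = [102, 98, 110, 105, 97, 101, 180, 99, 104, 100]


def find_unusual_days(values, threshold=2.0):
    """Return (index, value) pairs that lie more than `threshold`
    sample standard deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]


def decide(unusual_days):
    """Turn the insight into an automated decision (the 'fed into another system' case)."""
    return "alert the operations team" if unusual_days else "no action needed"


# The "discovery": which days look unusual?
unusual = find_unusual_days(daily_orders)
print("Unusual days:", unusual)

# The "actionable" part: a rule that acts on the discovery.
print("Decision:", decide(unusual))
```

The first print is the kind of result you would communicate to a human (ideally with a chart), while the second shows the same insight being consumed by a rule instead of a person.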


Let’s get to it, how to get started in this information jungle? 🌳

Photo by Eutah Mizushima on Unsplash

First of all, you have to start somewhere (thanks, Captain Obvious 🦸‍♂️). For real though, I’ve seen people get paralyzed by the abundance of available online courses. In the end, they stick to the idea that they should first refresh their math skills (or stats skills, or programming skills, or any other skill that can serve as an excuse) before they can start that particular course. In the next sections, I will describe the path I followed, together with what did and didn’t work (for me).


The key fundamentals

I know I just ranted about the basic-skill excuse, but you will need basic math, stats and programming skills. However, don’t use that as an excuse not to start (these are two different things). Rather than approaching the journey as a modular process (i.e. you first refresh math for 4 weeks, then stats for 4 weeks, …), I went for a more adventurous approach. That is, I closed my eyes, jumped off the cliff and learnt to fly 🕊 while I was falling (maybe that metaphor is too far-fetched). The point is, you cannot perfectly prepare for this journey, as there will always be concepts that are taken for granted that you don’t actually remember or have never even heard of. As Renee Teate pointed out in a podcast with Dataframed: “it is perfectly okay to take a step back in the middle of the process when something is not clear”. That being said, there certainly will be moments where you have to reverse direction, but do this when the time comes rather than trying to prepare perfectly beforehand. Oh, and don’t fall into the valley of despair…


So, some resources that helped me when I had to take a step back:


I think it’s always a good idea to have a solid reference point to rely on when necessary. Importantly, be honest with yourself when something is not clear, otherwise you might end up like Icarus (nerdy reference? Here it is if you don’t know 🤓). I used An Introduction to Statistical Learning, published by Springer (if you buy this book through the link, it will support this article), as my main resource for any basic statistics concept that I wanted to refresh or learn from scratch. It is quite a well-known book in the data science community (especially in the self-taught community) and is a good conversation starter with any data scientist (don’t take that last bit of advice for granted). If you want to go more in depth, its more advanced companion, The Elements of Statistical Learning, is also an excellent book.


For mathematics, I’m a huge fan of Grant Sanderson’s YouTube channel 3Blue1Brown, which covers most mathematical concepts used in data science. Btw, there is an excellent podcast with Grant Sanderson, hosted by Lex Fridman (another one of my inspirations), if you want to take a break from reading this. If, however, something is missing on this YT channel or you want further (in-depth) background information on a certain topic, Khan Academy did the trick for me most of the time. There hasn’t been one specific physical resource that I relied on for mathematics. Regardless, if you prefer some good ol’ paper books, I’ve heard that Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville is one of the go-to resources if you want to get down to the mathematical details. Even better: it’s available for free 💸.


For coding, in my opinion, the best resource to fall back on when something is not clear is either GitHub or Stack Overflow. I know there are many books that walk you through the basics (O’Reilly has a huge list of them), but what I like about these online resources is that they are on the computer and “always up to date” (most of the time; if not, someone has most likely pointed out that the answer is outdated). They often include interesting discussions that can give even deeper insight into what’s going on. In the end, you write code on your computer and typically not on a piece of paper (except maybe for some drafts or general logic), so it makes sense to use a virtual resource here. But preferences may differ, so feel free to explore O’Reilly’s Data Science books.


Online courses (The Good Stuff 🤘)

As the heading might give away, I’m a big fan of online courses. They’ve been my main source of inspiration in data science over the past years and I’ve learnt so many things from listening to hours of videos. However, listening is only one part of the job. (Check out this “Scary Guy podcast” episode with a friend of mine if you want to learn more about what it means to really listen.)


Here is some advice to get the most out of online video classes:


  • Listen actively, give it your full undivided attention 📵
  • Take notes/drawings/…, it increases brain activity* 🧠
  • Put a class on pause to recap or wind back if needed** ⏸
  • Self-explain what you heard at the end of the video 🦜
  • Put the knowledge into practice (and don’t wait too long) 🎬


*Taking notes doesn’t necessarily need to be fully structured. It’s simply the fact that you note down what you hear that increases your active participation.


**You’ll never get a chance in a real-life class to put the class on pause and go back in time, so exploit that feature.


And don’t forget, be disciplined and consistent with your online classes. If you wait too long before you move on or put things into practice, your hard-earned skills will slowly fade away (trust me, I’ve been there).


I’m a big fan of DataCamp’s platform to get started, here’s why

Before diving in, a brief section on ethics


DataCamp as a company has been the subject of heated discussion, both recently and in the past, following a misconduct incident near the end of 2017 and the way DataCamp’s leadership handled it afterwards. Given the attention this has received again recently, it’s a good opportunity to include a section on ethics in data science, a very important topic for any aspiring data scientist. While DataCamp’s (late) actions following the incident certainly leave some questions, they have taken multiple steps (especially more recently) to try and make this right.


I believe it is the duty of companies (whether start-ups or well-established) and individuals to take a very active role when it comes to ethical issues. Specifically in data science, a field that is inherently sensitive to bias and ethics, companies should take the lead and pull the narrative instead of being pushed by it.


Now, I don’t believe that an entire organization and platform should suffer from unfavorable actions taken in the past. Instead, I hope that DataCamp will continue to grow into a narrative-puller (next to their mission of democratizing data science). Therefore, I happily share the amazing platform (and amazing instructors) that DataCamp as an organization offers. Without their platform and instructors, my journey into data science would, without a doubt, have been a lot harder.


Now, let’s dive in


DataCamp’s interactive classes (taught by amazing instructors) are a perfect starting point to explore what you like, and what you want to focus on. They have multiple skill tracks and even entire career tracks that are frequently updated to stay relevant. If that’s not enough, they have an app with easily digestible short exercises to keep your newly acquired knowledge fresh. What’s also great, they listen to their users’ feedback and implement it (e.g. skill assessments).


If you subscribe to DataCamp through the above link, it will support this article.


The track I like most is their “Data Scientist with Python” career track, although the “Machine Learning with Python” career track also looks pretty cool (in reality there is quite some overlap, but recent updates differentiate the two tracks more and more). What’s nice about DataCamp is that they offer much more than Python courses. There are also excellent courses to learn Git (an essential skill if you ever work as a data scientist in a team setting), Shell, R (Python or R, you wonder? Read this), SQL, all the way up to Scala.


Wait, there is more. DataCamp also offers guided projects. These projects allow you to put your skills into practice which also helps you to better remember what you’ve learnt.


So, IMHO, DataCamp is one of the few platforms out there that offers a 360° experience. Many other sources offer some of the components that DataCamp has, but then you have to look elsewhere for the rest, e.g. your projects (Kaggle is a good place to find your own projects).


However, the one thing that has made me look into other online courses is that there is a limit to how deep the DataCamp lectures go in terms of theory (especially when you get to a more advanced stage). You can supplement the courses with the resources mentioned above, but I chose to supplement DataCamp with another platform. Diversification is not only useful in stock portfolios, it can actually be good to have different sources discussing the same subject because they often use different terminology, and emphasize other aspects. Below, I’ll dive into my other favorite online platform.


Coursera combined with DataCamp is a perfect combo

I think you can perfectly get started with DataCamp alone, but after a while you might find yourself wanting to get deeper on certain topics. For me, I wanted to get a deep understanding of deep learning (that’s deep). For that, I enrolled in the Deep Learning Specialization from Coursera.


Yes, there are many other platforms that offer similar specializations, so why this one?


For me, it all comes down to Andrew Ng, the teacher of this specialization. That guy is simply GREAT. He co-founded Google Brain, he co-founded Coursera, he’s an Adjunct Professor at Stanford, … if you want more, go check this out. On top of that, and most importantly, he is a great teacher. He manages to explain the most complicated concepts with ease and makes you feel comfortable, even when things get fairly complicated. After each “week of class”, he provides interviews with thought leaders from the field, which help you broaden your perspective on deep learning. There are also projects at the end of almost every week, but they do not cover the variety of projects available at DataCamp. If you want further confirmation of why Coursera is a good choice of platform, check out the data-driven strategy of Coursera.


So how do I use them in combination? I used DataCamp to get the basics right (Python programming, both imperative and OOP, working with and understanding the most common packages and data structures, …) up to the point where I wanted to go really in depth (although DataCamp is also expanding its offerings there). Then, I added Coursera to my weekly to-do list. Currently, I’m nearing the end of the Deep Learning Specialization and I’m considering taking their latest specialization in NLP. However, I’m still using DataCamp on a weekly basis to broaden my basic skills (e.g. Shell, Scala, SQL, …) and to improve my Python and R skills.
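If the terms “imperative” and “OOP” are new to you, the small sketch below shows the same task, averaging a few scores, written in both styles. The numbers and the `ScoreBook` class are hypothetical and made up for this illustration; they are not taken from any DataCamp or Coursera course.

```python
# Made-up example data.
scores = [72, 88, 95, 61]

# Imperative style: spell out the steps one by one and update a variable.
total = 0
for score in scores:
    total += score
imperative_average = total / len(scores)


# Object-oriented style: bundle the data and the operations on it in a class.
class ScoreBook:
    """A hypothetical container for scores."""

    def __init__(self, scores):
        self.scores = list(scores)  # copy, so the caller's list is not modified

    def add(self, score):
        self.scores.append(score)

    def average(self):
        return sum(self.scores) / len(self.scores)


book = ScoreBook(scores)
oop_average = book.average()

print(imperative_average, oop_average)
```

Both versions compute the same number. The imperative one is quicker to write for a one-off calculation, while the class starts to pay off once you need to reuse the logic or keep extending the data, for example with `book.add(100)`.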


Conclusion

Can you become proficient in data science overnight? No. Will one course make you a data science expert? Also no. But, do you need a Master’s degree in data science or a PhD to become proficient in data science? Here, the answer is also no. And that’s the beauty of today’s abundance of online learning opportunities. Find which method is the most fun for you, and then, just ENJOY THE JOURNEY!


“Wisdom is not a product of schooling but of the lifelong attempt to acquire it.” — Albert Einstein


Before you go, check out the podcasts below…


Extra special: podcasts

I didn’t want to make the article too long, but I do get a lot of value out of podcasts too. I think podcasts are a great way to stay up to date with the field of data science in general. Below are some of my favorite podcast channels:


That’s it, bye 🙋‍♂️! — Boje Deforce


