

After 80+ hours of watching course videos, doing quizzes and assignments, and reading reviews on various aggregators and forums, I’ve narrowed down the best data science courses available to the list below.



The best data science courses:


  1. Data Science Specialization — JHU @ Coursera

  2. Introduction to Data Science — Metis

  3. Applied Data Science with Python Specialization — UMich @ Coursera

  4. Dataquest


  5. Statistics and Data Science MicroMasters — MIT @ edX

  6. CS109 Data Science — Harvard

  7. Python for Data Science and Machine Learning Bootcamp — Udemy

Criteria

The selections here are geared more towards individuals getting started in data science, so I’ve filtered courses based on the following criteria:


  • The course goes over the entire data science process

  • The course uses popular open-source programming tools and libraries

  • The instructors cover the basic, most popular machine learning algorithms

  • The course has a good combination of theory and application

  • The course needs to either be on-demand or available every month or so

  • There are hands-on assignments and projects

  • The instructors are engaging and personable

  • The course has excellent ratings — generally, greater than or equal to 4.5/5

There are a lot more data science courses than when I first started this page four years ago, so there now needs to be a substantial filter to determine which courses are the best. I hope you feel confident that the courses below are truly worth your time and effort, because it will take several months (or more) of learning and practice to become a data science practitioner.

In addition to the top general data science course picks, I have included a separate section for more specific data science interests, like Deep Learning, SQL, and other relevant topics. These are courses with a more specialized approach, and don’t cover the whole data science process, but they are still the top choices for that topic. These extra picks are good for supplementing before, after, and during the main courses.

Resources you should use when learning

When learning data science online it’s important to not only get an intuitive understanding of what you’re actually doing, but also to get sufficient practice using data science on unique problems.


In addition to the courses listed below, I would suggest reading two books:


  1. Introduction to Statistical Learning — available for free — one of the most widely recommended books for beginners in data science. It explains the fundamentals of machine learning and how everything works behind the scenes

  2. Applied Predictive Modeling — a breakdown of the entire modeling process on real-world datasets with incredibly useful tips each step of the way

These two textbooks are incredibly valuable and provide a much better foundation than taking courses alone. The first book is very effective at teaching the intuition behind much of the data science process, and if you are able to understand almost everything in there, then you’re better off than most entry-level data scientists.


Use Video Speed Controller for Chrome to speed up any video. I usually choose between 1.5x and 2.5x speed depending on the content, and use the “s” (slow down) and “d” (speed up) key shortcuts that come with the extension.

Now to an overview and review of each course.


1. Data Science Specialization — JHU @ Coursera

This course series is one of the most enrolled in and highly rated course collections in this list. JHU did an incredible job with the balance of breadth and depth in the curriculum. One thing included in this series that’s usually missing from many data science courses is a complete section on statistics, which is the backbone of data science.

Overall, the Data Science specialization is an ideal mix of theory and application using the R programming language. As far as prerequisites go, you should have some programming experience (it doesn’t have to be R) and a good understanding of algebra. Previous knowledge of linear algebra and/or calculus isn’t necessary, but it is helpful.

Price — Free or $49/month for certificate and graded materials

Provider — Johns Hopkins University



  1. The Data Scientist’s Toolbox

  2. R Programming

  3. Getting and Cleaning Data

  4. Exploratory Data Analysis

  5. Reproducible Research

  6. Statistical Inference

  7. Regression Models

  8. Practical Machine Learning

  9. Developing Data Products

  10. Data Science Capstone


If you’re rusty with statistics and/or want to learn more R first, check out the Statistics with R Specialization as well.


2. Introduction to Data Science — Metis

An extremely highly rated course (4.9/5 on SwitchUp and 4.8/5 on CourseReport) that is taught live by a data scientist from a top company. This is a six-week data science course that covers the entire data science process, and it’s the only live online course in this list. Furthermore, not only will you get a certificate upon completion, but since this course is also accredited, you’ll receive continuing education units as well.

Two nights per week, you’ll join the instructor and other students to learn data science as if it were an online college course. Not only are you able to ask questions, but the instructor also holds extra office hours to further help students who might be struggling.

Price — $750

The curriculum:


  1. Computer Science, Statistics, Linear Algebra Short Course

  2. Exploratory Data Analysis and Visualization

  3. Data Modeling: Supervised/Unsupervised Learning and Model Evaluation

  4. Data Modeling: Feature Selection, Engineering, and Data Pipelines

  5. Data Modeling: Advanced Supervised/Unsupervised Learning

  6. Data Modeling: Advanced Model Evaluation and Data Pipelines | Presentations

For prerequisites, you’ll need to know Python, some linear algebra, and some basic statistics. If you need to work on any of these areas, Metis also has Beginner Python and Math for Data Science, a separate live online course just for learning the Python, statistics, probability, linear algebra, and calculus needed for data science.

3. Applied Data Science with Python Specialization — UMich @ Coursera

The University of Michigan, which also launched an online data science Master’s degree, produced this fantastic specialization focused on the applied side of data science. This means you’ll get a strong introduction to commonly used data science Python libraries, like matplotlib, pandas, nltk, scikit-learn, and networkx, and learn how to use them on real data.

This series doesn’t include the statistics needed for data science or the derivations of various machine learning algorithms, but it does provide a comprehensive breakdown of how to use and evaluate those algorithms in Python. Because of this, I think it would be more appropriate for someone who already knows R and/or is learning the statistical concepts elsewhere.

If you’re rusty with statistics, consider the Statistics with Python Specialization first. You’ll learn many of the most important statistical skills needed for data science.

Price — Free or $49/month for certificate and graded materials

Provider — University of Michigan



  1. Introduction to Data Science in Python

  2. Applied Plotting, Charting & Data Representation in Python

  3. Applied Machine Learning in Python

  4. Applied Text Mining in Python

  5. Applied Social Network Analysis in Python


To take these courses, you’ll need to know some Python or programming in general, and there are actually a couple of great lectures in the first course dealing with some of the more advanced Python features you’ll need to process data effectively.


4. Dataquest

Dataquest is a fantastic resource on its own, but even if you take other courses on this list, Dataquest serves as a superb complement to your online learning.


Dataquest forgoes video lessons and instead teaches through an interactive textbook of sorts. Every topic in the data science track is accompanied by several in-browser, interactive coding steps that guide you through applying the exact topic you’re learning.

Video-based learning is more “passive” — it’s very easy to think you understand a concept after watching a 2-hour long video, only to freeze up when you actually have to put what you’ve learned in action. — Dataquest FAQ

To me, Dataquest stands out from the rest of the interactive platforms because the curriculum is very well organized, you get to learn by working on full-fledged data science projects, and there’s a super active and helpful Slack community where you can ask questions.


The platform has one main data science learning curriculum for Python:


Data Scientist In Python Path

This track currently contains 31 courses, which cover everything from the very basics of Python, to Statistics, to the math for Machine Learning, to Deep Learning, and more. The curriculum is constantly being improved and updated for a better learning experience.

Price — 1/3 of content is Free, $29/month for Basic, $49/month for Premium

Here’s a condensed version of the curriculum:


  1. Python — Basic to Advanced

  2. Python data science libraries — Pandas, NumPy, Matplotlib, and more

  3. Visualization and Storytelling

  4. Effective data cleaning and exploratory data analysis

  5. Command line and Git for data science

  6. SQL — Basic to Advanced

  7. APIs and Web Scraping

  8. Probability and Statistics — Basic to Intermediate

  9. Math for Machine Learning — Linear Algebra and Calculus

  10. Machine Learning with Python — Regression, K-Means, Decision Trees, Deep Learning and more

  11. Natural Language Processing

  12. Spark and Map-Reduce


Additionally, there are entire data science projects scattered throughout the curriculum. Each project’s goal is to get you to apply everything you’ve learned up to that point and to get you familiar with what it’s like to work on an end-to-end data science strategy.

Lastly, if you’re more interested in learning data science with R, then definitely check out Dataquest’s new Data Analyst in R path. The Dataquest subscription gives you access to all paths on their platform, so you can learn R or Python (or both!).

5. Statistics and Data Science MicroMasters — MIT @ edX

MicroMasters from edX are advanced, graduate-level courses that carry real credits you can apply to a select number of graduate degrees. The inclusion of probability and statistics courses makes this series from MIT a very well-rounded curriculum for being able to understand data intuitively.

Due to its advanced nature, you should have experience with single-variable and multivariable calculus, as well as Python programming. There isn’t any introduction to Python or R like in some of the other courses in this list, so before starting the ML portion, they recommend taking Introduction to Computer Science and Programming Using Python to get familiar with Python.

Price — Free or $1,350 for credential and graded materials

Provider — Massachusetts Institute of Technology



  1. Probability — The Science of Uncertainty and Data

  2. Data Analysis in Social Science — Assessing Your Knowledge

  3. Fundamentals of Statistics

  4. Machine Learning with Python: from Linear Models to Deep Learning

  5. Capstone Exam in Statistics and Data Science


The ML course has several interesting projects you’ll work on, and at the end of the whole series you’ll focus on one exam to wrap everything up.


6. CS109 Data Science — Harvard

With a great mix of theory and application, this course from Harvard is one of the best for getting started as a beginner. It’s not on an interactive platform, like Coursera or edX, and doesn’t offer any sort of certification, but it’s definitely worth your time and it’s totally free.



  • Web Scraping, Regular Expressions, Data Reshaping, Data Cleanup, Pandas

  • Exploratory Data Analysis

  • Pandas, SQL and the Grammar of Data

  • Statistical Models

  • Storytelling and Effective Communication

  • Bias and Regression

  • Classification, kNN, Cross Validation, Dimensionality Reduction, PCA, MDS

  • SVM, Evaluation, Decision Trees and Random Forests, Ensemble Methods, Best Practices

  • Recommendations, MapReduce, Spark

  • Bayes Theorem, Bayesian Methods, Text Data

  • Clustering

  • Effective Presentations

  • Experimental Design

  • Deep Networks

  • Building Data Science


Python is used in this course, and there are many lectures going through the intricacies of the various data science libraries to work through real-world, interesting problems. This is one of the only data science courses around that actually touches on every part of the data science process.

7. Python for Data Science and Machine Learning Bootcamp — Udemy

Also available using R.


A very reasonably priced course for the value. The instructor does an outstanding job explaining the Python, visualization, and statistical learning concepts needed for all data science projects. The assignments are a huge benefit of this course over other Udemy courses. Throughout the course you’ll break away and work on Jupyter notebook workbooks to solidify your understanding, then the instructor follows up with a solutions video to thoroughly explain each part.



  • Python Crash Course

  • Python for Data Analysis — Numpy, Pandas

  • Python for Data Visualization — Matplotlib, Seaborn, Plotly, Cufflinks, Geographic plotting

  • Data Capstone Project

  • Machine learning — Regression, kNN, Trees and Forests, SVM, K-Means, PCA

  • Recommender Systems

  • Natural Language Processing

  • Big Data and Spark

  • Neural Nets and Deep Learning


This course focuses more on the applied side, and one thing missing is a section on statistics. If you plan on taking this course it would be a good idea to pair it with a separate statistics and probability course as well.

An honorable mention goes out to another Udemy course: Data Science A-Z. I do like Data Science A-Z quite a bit due to its complete coverage, but since it uses other tools outside of the Python/R ecosystem, I don’t think it fits the criteria as well as Python for Data Science and Machine Learning Bootcamp.

Other top data science courses for specific skills

Deep Learning Specialization — Coursera

Created by Andrew Ng, maker of the famous Stanford Machine Learning course, this is one of the highest rated data science courses on the internet. This course series is for those interested in understanding and working with neural networks in Python.

SQL for Data Science — Coursera

Pair this with Mode Analytics SQL Tutorial for a very well-rounded introduction to SQL, an important and necessary skill for data science.

Mathematics for Machine Learning — Coursera

This is one of the most highly rated courses dedicated to the specific mathematics used in ML. Take this course if you’re uncomfortable with the linear algebra and calculus required for machine learning, and you’ll save some time over other, more generic math courses.

How to Win a Data Science Competition — Coursera

One of the courses in the Advanced Machine Learning Specialization. Even if you’re not looking to participate in data science competitions, this is still an excellent course for bringing together everything you’ve learned up to this point. This is more of an advanced course that teaches you the intuition behind why you should pick certain ML algorithms, and even goes over many of the algorithms that have been winning competitions lately.

Bayesian Statistics: From Concept to Data Analysis — Coursera

Bayesian, as opposed to Frequentist, statistics is an important subject to learn for data science. Many of us learned Frequentist statistics in college without even knowing it, and this course does a great job comparing and contrasting the two to make it easier to understand the Bayesian approach to data analysis.
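To give a feel for the contrast, here is a tiny sketch of my own (not taken from the course) that estimates a coin's bias both ways. The frequentist estimate is just the observed proportion of heads, while the Bayesian version starts from a prior belief and updates it with the data. The function names, the uniform Beta(1, 1) prior, and the coin-flip counts are all assumptions picked for illustration.

```python
def frequentist_estimate(heads, flips):
    """Maximum likelihood estimate: the observed proportion of heads."""
    return heads / flips


def bayesian_posterior(heads, flips, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with binomial coin-flip data.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is Beta(prior_a + heads, prior_b + tails).
    Returns the posterior parameters and the posterior mean.
    """
    post_a = prior_a + heads
    post_b = prior_b + (flips - heads)
    posterior_mean = post_a / (post_a + post_b)
    return post_a, post_b, posterior_mean


# Hypothetical data: 7 heads out of 10 flips.
heads, flips = 7, 10
mle = frequentist_estimate(heads, flips)
post_a, post_b, post_mean = bayesian_posterior(heads, flips)

print("Frequentist estimate:", mle)
print("Posterior: Beta(%s, %s), mean %.4f" % (post_a, post_b, post_mean))
```

With only ten flips, the uniform prior pulls the Bayesian estimate slightly toward 0.5. As the number of flips grows, the two estimates move closer together.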

Spark and Python for Big Data with PySpark — Udemy

From the same instructor as the Python for Data Science and Machine Learning Bootcamp in the list above, this course teaches you how to leverage Spark and Python to perform data analysis and machine learning on an AWS cluster. The instructor makes this course really fun and engaging by giving you mock consulting projects to work on, then going through a complete walkthrough of the solution.

Learning Guide

How to actually learn data science

When joining any of these courses you should make the same commitment to learning as you would towards a college course. One goal for learning data science online is to maximize mental discomfort. It’s easy to get caught in the habit of signing in to watch a few videos and feel like you’re learning, but you’re not really learning much unless it hurts your brain.

Vik Paruchuri (from Dataquest) produced this helpful video on how to approach learning data science effectively:

Essentially, it comes down to doing what you’re learning, i.e. when you take a course and learn a skill, apply it to a real project immediately. Working through real-world projects that you are genuinely interested in helps solidify your understanding and provides you with proof that you know what you’re doing.

One of the most uncomfortable things about learning data science online is that you never really know when you’ve learned enough. Unlike in a formal school environment, when learning online you don’t have many good barometers for success, like passing or failing tests or entire courses. Projects help remediate this by first showing you what you don’t know, and then serving as a record of knowledge when it’s done.

All in all, the project should be the main focus, and courses and books should supplement that.


When I first started learning data science and machine learning, I began (as a lot of people do) by trying to predict stocks. I found courses, books, and papers that taught the things I wanted to know, and then I applied them to my project as I was learning. I learned so much in such a short period of time that it would seem like an improbable feat if laid out as a curriculum.

Working on something I was passionate about turned out to be extremely powerful. It was easy to work hard and learn nonstop because predicting the market was something I really wanted to accomplish.

Essential knowledge and skills

There’s a base skill set and level of knowledge that all data scientists must possess, regardless of what industry they’re in. For hard skills, you not only need to be proficient with the mathematics of data science, but you also need the skills and intuition to understand data.


The mathematics you should be comfortable with:


  • Algebra

  • Statistics (Frequentist and Bayesian)

  • Probability

  • Linear Algebra

  • Basic calculus

  • Optimization
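To show how the last two items fit together, here is a minimal sketch of my own (not taken from any of the courses above): gradient descent uses derivatives from basic calculus to minimize the mean squared error of a straight-line fit. The function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current fit.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step downhill along the negative gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data that lies exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print("slope:", round(w, 3), "intercept:", round(b, 3))
```

The recovered slope and intercept should land very close to 2 and 1. Libraries like scikit-learn hide this loop from you, but knowing what it does makes it much easier to reason about why a model is or isn't converging.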


Furthermore, these are the basic programming skills you should be comfortable with:


  • Python or R

  • SQL

  • Extracting data from various sources, like SQL databases, JSON, CSV, XML, and text files

  • Cleaning and transforming unstructured, messy data

  • Effective data visualization

  • Machine learning — Regression, Clustering, kNN, SVM, Trees and Forests, Ensembles, Naive Bayes
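As a rough illustration of the extraction and cleaning items in that list, here is a small sketch using only Python's standard library. It parses CSV and JSON text, drops a row with a missing value, loads both into an in-memory SQLite database, and joins them with SQL. The sample data, table names, and helper functions are all hypothetical. In a real project you would more likely read files with pandas, but the steps are the same.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data; in practice these would be files or a real database.
CSV_TEXT = "name,score\nada,91\ngrace,\nlinus,78\n"
JSON_TEXT = '[{"name": "ada", "team": "analytics"}, {"name": "linus", "team": "platform"}]'


def load_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_scores(rows):
    """Drop rows with a missing score and convert the rest to integers."""
    cleaned = []
    for row in rows:
        raw = (row.get("score") or "").strip()
        if raw:
            cleaned.append({"name": row["name"], "score": int(raw)})
    return cleaned


def load_into_sqlite(scores, teams):
    """Load both datasets into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.execute("CREATE TABLE teams (name TEXT, team TEXT)")
    conn.executemany("INSERT INTO scores VALUES (:name, :score)", scores)
    conn.executemany("INSERT INTO teams VALUES (:name, :team)", teams)
    return conn


scores = clean_scores(load_csv_rows(CSV_TEXT))
teams = json.loads(JSON_TEXT)
conn = load_into_sqlite(scores, teams)

# Join the two sources with SQL and sort by score, highest first.
query = """
    SELECT s.name, t.team, s.score
    FROM scores AS s
    JOIN teams AS t ON t.name = s.name
    ORDER BY s.score DESC
"""
result = conn.execute(query).fetchall()
print(result)
```

The row for "grace" is dropped during cleaning because its score is empty, and the inner join keeps only names that appear in both sources.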


Lastly, it’s not all about the hard skills; there are also many soft skills that are extremely important, and many of them aren’t taught in courses. These are:

  • Curiosity and creativity

  • Communication skills — speaking and presenting in front of groups, and being able to explain complex topics to non-technical team members

  • Problem solving — coming up with analytical solutions for business problems


Python vs. R

After going through the list you might have noticed that each course is dedicated to one language: Python or R. So which one should you learn?


Short answer: just learn Python, or learn both.


Python is an incredibly versatile language, and it has a huge amount of support in data science, machine learning, and statistics. Not only that, but you can also do things like build web apps, automate tasks, scrape the web, create GUIs, build a blockchain, and create games.

Because Python can do so many things, I think it should be the language you choose. Ultimately, it doesn’t matter that much which language you choose for data science since you’ll find many jobs looking for either. So why not pick the language that can do almost anything?

In the long run, though, I think learning R is also very useful since many statistics/ML textbooks use R for examples and exercises. In fact, both books I mentioned at the beginning use R, and unless someone translates everything to Python and posts it to GitHub, you won’t get the full benefit of those books. Once you learn Python, you’ll be able to learn R pretty easily.

Check out this StackExchange answer for a great breakdown of how the two languages differ in machine learning.

Are certificates worth it?

One big difference between Udemy and other platforms, like edX, Coursera, and Metis, is that the latter offer certificates upon completion and are usually taught by instructors from universities.


Some certificates, like those from edX and Metis, even carry continuing education credits. Other than that, many of the real benefits, like accessing graded homework and tests, are only accessible if you upgrade. If you need to stay motivated to complete the entire course, committing to a certificate also puts money on the line so you’ll be less likely to quit. I think there’s definitely personal value in certificates, but, unfortunately, not many employers value them that much.

Coursera and edX vs. Udemy

Udemy does not currently have a way to offer certificates, so I generally find Udemy courses to be good for more applied learning material, whereas Coursera and edX are usually better for theory and foundational material.

Whenever I’m looking for a course about a specific tool, whether it be Spark, Hadoop, Postgres, or Flask web apps, I tend to search Udemy first since the courses favor an actionable, applied approach. Conversely, when I need an intuitive understanding of a subject, like NLP, Deep Learning, or Bayesian Statistics, I’ll search edX and Coursera first.

Wrapping Up

Data science is a vast, interesting, and rewarding field to study and be a part of. You’ll need many skills, a wide range of knowledge, and a passion for data to become an effective data scientist that companies want to hire, and it’ll take longer than the hyped-up YouTube videos claim.

If you’re more interested in the machine learning side of data science, check out the Top 5 Machine Learning Courses for 2019 as a supplement to this article.


If you have any questions or suggestions, feel free to leave them in the comments below.


Thanks for reading and have fun learning!


Originally published at learndatasci.com.


Translated from: https://www.freecodecamp.org/news/top-7-online-data-science-courses-for-2019-e4afdc4693e7/






