New Course: Learn Data Cleaning with Python and Pandas

Data cleaning might not be the reason you got interested in data science, but if you’re going to be a data scientist, no skill is more crucial. Working data scientists spend at least 60% of their time cleaning data, and dirty data is often ranked the single biggest barrier data scientists face at work.

That’s why we’ve just added a brand new course to our Python Data Analyst and Data Scientist paths called Data Cleaning and Analysis. If you’re a Dataquest Premium subscriber, you can start learning right now.

Why Learn Data Cleaning?

Data scientists can end up doing a wide variety of things across a wide variety of industries, but almost every data science job shares at least one thing in common: data cleaning. The real world is messy, after all, and that means real-world datasets tend to be messy, too. Incomplete entries, inconsistent formatting, entry errors – these are things you’ll encounter in almost every dataset you work with.

Even if you’re working with perfect data, though, data cleaning skills are still necessary. You’ll almost always want to make changes to your data and its formatting to facilitate your analysis, and that means doing the same sorts of things you do to messy data: dropping irrelevant entries, reformatting columns, etc.
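
As a rough illustration of the kinds of changes mentioned above (dropping irrelevant entries and reformatting columns), here is a minimal pandas sketch. The table, its column names, and its values are all hypothetical and are not taken from the course material.

```python
import pandas as pd

# Hypothetical data: every name and value here is made up for illustration.
df = pd.DataFrame({
    "Country Name": ["Norway", "Denmark", "Test Row"],
    "Happiness Score": ["7.5", "7.6", "0"],
})

# Reformat the column labels: lowercase, underscores instead of spaces.
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

# Drop an entry that is irrelevant to the analysis.
df = df[df["country_name"] != "Test Row"].reset_index(drop=True)

# Reformat a column: store the scores as numbers rather than strings.
df["happiness_score"] = df["happiness_score"].astype(float)

print(df)
```

Even though nothing in this small table is "dirty" in the usual sense, the same operations used on messy data are what make it convenient to analyze.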

Learning data cleaning is particularly important if you aspire to work with any kind of machine learning. As the Harvard Business Review put it:

Poor data quality is enemy number one to the widespread, profitable use of machine learning. […] The quality demands of machine learning are steep, and bad data can rear its ugly head twice — first in the historical data used to train the predictive model and second in the new data used by that model to make future decisions.

Simply put, there’s no doing data science without doing data cleaning.

What Does This Course Cover?

In Data Cleaning and Analysis, you’ll learn key data cleaning techniques in Python using the popular pandas data analysis library (if you’d like to learn data cleaning in R, we have a separate R data cleaning course). Throughout the course you’ll work with real-world data from the World Happiness Report, cleaning and analyzing a large dataset that includes a variety of metrics for world nations like GDP and average life expectancy.

In the first three missions of Data Cleaning and Analysis, you’ll learn to aggregate, combine, and transform data efficiently using pandas to get it ready for analysis. Then you’ll dig into slightly more complex topics, like how to work with strings in pandas, how to use regular expressions, and how to handle missing and duplicate data.
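
To give a feel for the topics listed above, the sketch below touches each one briefly: string cleanup, a regular expression, duplicate and missing rows, combining tables, a simple transformation, and an aggregation. It is only an illustrative outline, not the course's own code; the two small tables and all of their names and values are invented.

```python
import pandas as pd

# Hypothetical tables: all names and values are invented for illustration.
scores = pd.DataFrame({
    "country": ["Norway", "norway ", "Chile", "Chile", "Kenya"],
    "year": ["Y2015", "Y2016", "Y2015", "Y2015", "Y2016"],
    "score": [7.5, 7.6, 6.7, 6.7, None],
})
regions = pd.DataFrame({
    "country": ["Norway", "Chile", "Kenya"],
    "region": ["Europe", "Americas", "Africa"],
})

# Strings: trim whitespace and normalize capitalization.
scores["country"] = scores["country"].str.strip().str.title()

# Regular expressions: pull the four-digit year out of labels like "Y2015".
scores["year"] = scores["year"].str.extract(r"(\d{4})", expand=False).astype(int)

# Duplicates and missing data: drop repeated rows, then rows with no score.
scores = scores.drop_duplicates().dropna(subset=["score"])

# Combine: attach each country's region from the second table.
merged = scores.merge(regions, on="country", how="left")

# Transform: derive a new column from an existing one.
merged["band"] = merged["score"].apply(lambda s: "high" if s >= 7 else "lower")

# Aggregate: mean score per region.
mean_by_region = merged.groupby("region")["score"].mean()

print(merged)
print(mean_by_region)
```

The order of the steps matters here: the country names are normalized before merging so that "norway " matches "Norway" in the lookup table.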

Once you’ve worked through the teaching missions, you’ll put your new data cleaning skills to the test in a new guided project. In it, you’ll clean and analyze real-world employee exit survey datasets from two Australian government bureaus, picking up some additional pandas and data presentation skills along the way.

And of course, all the material is presented in Dataquest’s split-screen presentation style so that you can get your hands dirty and start coding right off the bat.

Grab Your Mop

Data cleaning may not sound as sexy as machine learning, but the often-ignored reality of data science is that your analysis can only ever be as good as your data. If your data’s a mess, your analysis is going to be a mess, too.

Thankfully, with the power of Python and pandas, you don’t have to let that happen, so grab your mop and dive into our new Data Cleaning and Analysis course right now!

Translated from: https://www.pybloggers.com/2019/02/new-course-learn-data-cleaning-with-python-and-pandas/
