python 插补数据_python 2020中缺少数据插补技术的快速指南

python 插补数据

Most machine learning algorithms expect complete and clean noise-free datasets, unfortunately, real-world datasets are messy and have multiples missing cells, in such cases handling missing data becomes quite complex.

大多数机器学习算法期望完整且干净的无噪声数据集,但不幸的是,现实世界的数据集比较杂乱,缺少多个单元格,在这种情况下,处理丢失的数据变得相当复杂。

Therefore in today’s article, we are going to discuss some of the most effective and indeed easy-to-use data imputation techniques which can be used to deal with missing data.

因此,在今天的文章中,我们将讨论一些最有效且确实易于使用的数据插补技术,这些技术可用于处理丢失的数据。

So without any further delay, let’s get started.

因此,没有任何进一步的延迟,让我们开始吧。

Image for post
Copyright: © Ampcool22 | Dreamstime.com
版权:©Ampcool22 | Dreamstime.com

什么是数据归因? (What is Data Imputation?)

Data Imputation is a method in which the missing values in any variable or data frame(in Machine learning) is filled with some numeric values for performing the task. Using this method the sample size remains the same, only the blanks which were missing are now filled with some values. This method is easy to use but the variance of the dataset is reduced.

数据插补是一种方法,其中(在机器学习中)任何变量或数据框中的缺失值都填充有一些数字值,以执行任务。 使用此方法,样本大小保持不变,现在仅将缺少空白 填充一些值这种方法易于使用,但数据集的方差减小了。

为什么要进行数据插补? (Why Data Imputation?)

There can be various reasons for imputing data, many real-world datasets(not talking about CIFAR or MNIST) containing missing values which can be in any form such as blanks, NaN, 0s, any integers or any categorical symbol. Instead of just dropping the Rows or Columns containing the missing values which come at the price of losing data which may be valuable, a better strategy is to impute the missing values.

插补数据可能有多种原因,许多现实世界的数据集(不涉及CIFAR或MNIST)包含缺失值,这些缺失值可以采用任何形式,例如空格,NaN,0,任何整数或任何分类符号更好的策略是估算缺失值 ,而不是仅仅删除包含缺失值的行或列,而这些缺失值会以丢失可能有价值的数据代价

Having a good theoretical knowledge is amazing but implementing them in code in a real-time machine learning project is a completely different thing. You might get different and unexpected results based on different problems and datasets. So as a Bonus,I am also adding the links to the various courses which has helped me a lot in my journey to learn Data science and ML, experiment and compare different data imputations strategies which led me to write this article on comparisons between different data imputations methods.

拥有良好的理论知识是惊人的,但是在实时机器学习项目中以代码实现它们是完全不同的。 根据不同的问题和数据集,您可能会得到不同且出乎意料的结果。 因此,作为奖励,我还添加了到各种课程的链接,这些链接对我学习数据科学和ML,实验和比较不同的数据归因策略有很大帮助,这使我撰写了有关不同数据之间的比较的本文。归因方法。

I am personally a fan of DataCamp, I started from it and I am still learning through DataCamp and keep doing new courses. They seriously have some exciting courses. Do check them out.

我个人是 DataCamp 的粉丝 ,我从此开始,但仍在学习 DataCamp 并继续 学习 新课程。 他们认真地开设了一些令人兴奋的课程。 请检查一下。

1.处理缺少输入的数据 (

  • 2
    点赞
  • 9
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值