missforest_missforest最佳丢失数据插补算法

missforest

Missing data often plagues real-world datasets, and hence there is tremendous value in imputing, or filling in, the missing values. Unfortunately, standard ‘lazy’ imputation methods like simply using the column median or average don’t work well.

丢失的数据通常困扰着现实世界的数据集,因此,估算或填写丢失的值具有巨大的价值。 不幸的是,标准的“惰性”插补方法(例如仅使用列中位数或平均值)效果不佳。

On the other hand, KNN is a machine-learning based imputation algorithm that has seen success but requires tuning of the parameter k and additionally, is vulnerable to many of KNN’s weaknesses, like being sensitive to being outliers and noise. Additionally, depending on circumstances, it can be computationally expensive, requiring the entire dataset to be stored and computing distances between every pair of points.

另一方面,KNN是一种基于机器学习的插补算法,它已经取得了成功,但需要调整参数k,而且容易受到KNN的许多弱点的影响,例如对异常值和噪声敏感。 另外,根据情况,计算可能会很昂贵,需要存储整个数据集并计算每对点之间的距离。

MissForest is another machine learning-based data imputation algorithm that operates on the Random Forest algorithm. Stekhoven and Buhlmann, creators of the algorithm, conducted a study in 2011 in which imputation methods were compared on datasets with randomly introduced missing values. MissForest outperformed all other algorithms in all metrics, including KNN-Impute, in some cases by over 50%.

MissForest是基于随机森林算法的另一种基于机器学习的数据插补算法。 该算法的创建者Stekhoven和Buhlmann于2011年进行了一项研究,该研究在具有随机引入的缺失值的数据集上比较了插补方法。 在所有指标上,MissForest的性能均优于其他所有算法,包括KNN-Impute,在某些情况下超过50%。

First, the missing values are filled in using median/mode imputation. Then, we mark the missing values as ‘Predict’ and the others as training rows, which are fed into a Random Forest model trained to predict, in this case, Age based on

  • 6
    点赞
  • 20
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值