Undersampling and Oversampling

Introduction

The imbalanced classification problem is what we face when there is a severe skew in the class distribution of our training data. Okay, the skew may not always be extreme (it can vary), but we treat imbalanced classification as a problem because it can hurt the performance of our Machine Learning algorithms.

One way the imbalance can affect our Machine Learning algorithm is that the algorithm ends up completely ignoring the minority class. This is an issue because the minority class is often the class we are most interested in. For instance, when building a classifier to separate fraudulent from non-fraudulent transactions, the data is likely to contain far more non-fraudulent transactions than fraudulent ones. I mean, think about it: it would be very worrying if we had as many fraudulent transactions as non-fraudulent ones.
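
To make the "ignores the minority class" point concrete, here is a minimal sketch in plain Python. The labels are made up for illustration (990 non-fraud, 10 fraud), and the "classifier" is a hypothetical stand-in that always predicts the majority class; neither comes from a real dataset or model.

```python
# Hypothetical, made-up labels: 1 = fraud (minority), 0 = non-fraud (majority).
y_true = [0] * 990 + [1] * 10

# A degenerate "classifier" that ignores the minority class entirely:
# it predicts non-fraud for every transaction.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, fraud recall={rec:.2f}")
# prints: accuracy=0.99, fraud recall=0.00
```

The accuracy looks great, yet not a single fraudulent transaction is caught, which is exactly why a skewed class distribution can be misleading.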

Figure 1: Example of class distribution for Fraud detection Problem

An approach to combat this challenge is random resampling. There are two main ways to perform random resampling, both of which have their pros and cons:

Oversampling — Duplicating samples from the minority class

Undersampling — Deleting samples from the majority class.
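
The two strategies above can be sketched with nothing but the Python standard library. This is an illustrative sketch, not a reference implementation: the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy data (95 majority samples, 5 minority samples) are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        extra = rng.choices(pool, k=target - n)  # sampling WITH replacement
        X_out.extend(extra)
        y_out.extend([label] * len(extra))
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    is as small as the smallest one (the rest is deleted)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        kept = rng.sample(pool, k=target)  # sampling WITHOUT replacement
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out


# Toy, made-up data: 95 majority samples (label 0), 5 minority samples (label 1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(Counter(y_over))   # both classes now have 95 samples
print(Counter(y_under))  # both classes now have 5 samples
```

Note the trade-off the sketch makes visible: oversampling repeats the same five minority samples many times, while undersampling throws away 90 majority samples. In practice you would likely reach for a library instead, for example the `RandomOverSampler` and `RandomUnderSampler` classes in imbalanced-learn, and resample only the training split.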

In other words, both oversampling and undersampling involve introducing a bias to select more samples from one class than from another, to compensate for an imbalance that is either already present in the data, or likely to develop if a purely random sample were taken (Source:
