I Love Machine Learning

[Reading Guide] Learning from Imbalanced Classes

Original article: Learning from Imbalanced Classes

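Class imbalance is a classic problem that comes up constantly in data mining, computational advertising, NLP, and similar work. The article summarizes approaches that may help and is worth consulting: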

  • Do nothing. Sometimes you get lucky and nothing needs to be done. You can train on the so-called natural (or stratified) distribution and sometimes it works without need for modification.
  • Balance the training set in some way:
    • Oversample the minority class.
    • Undersample the majority class.
    • Synthesize new minority examples.
  • Throw away minority examples and switch to an anomaly detection framework.
  • At the algorithm level, or after it:
    • Adjust the class weight (misclassification costs).
    • Adjust the decision threshold.
    • Modify an existing algorithm to be more sensitive to rare classes.
  • Construct an entirely new algorithm to perform well on imbalanced data.
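The three rebalancing options in the list above (oversampling, undersampling, synthesizing new minority examples) can be sketched in plain Python. This is a minimal illustration, not the article's code: it assumes features are tuples of numbers and labels are hashable, `random_oversample`, `random_undersample`, and `smote_like` are hypothetical helper names, and `smote_like` is a simplified SMOTE-style interpolation rather than a reference SMOTE implementation.

```python
import random
from collections import Counter


def random_oversample(X, y, minority, seed=0):
    """Duplicate randomly chosen minority rows (with replacement) until the
    minority class is as large as the largest other class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(c for label, c in counts.items() if label != minority)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(minority_rows) for _ in range(target - counts[minority])]
    return list(X) + extra, list(y) + [minority] * len(extra)


def random_undersample(X, y, majority, seed=0):
    """Randomly drop majority rows (without replacement) until the majority
    class is no larger than the largest other class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(c for label, c in counts.items() if label != majority)
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    keep = set(rng.sample(majority_idx, min(target, len(majority_idx))))
    rows = [i for i, label in enumerate(y) if label != majority or i in keep]
    return [X[i] for i in rows], [y[i] for i in rows]


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points, each on the segment between a
    random minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        base = X_min[i]
        # Indices of the other minority points, nearest first (squared Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, X_min[j])),
        )[:k]
        other = X_min[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1)]
y = [1, 1, 0, 0, 0, 0]
X_over, y_over = random_oversample(X, y, minority=1)
X_under, y_under = random_undersample(X, y, majority=0)
X_new = smote_like([x for x, label in zip(X, y) if label == 1], n_new=3, k=1)
```

Resampling is meant for the training split only; validation and test data should keep the natural class distribution so that evaluation reflects real class frequencies.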
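For the anomaly-detection option in the list above, one simple sketch fits a model to the majority class alone and flags points that lie far from it. This assumes numeric features and uses a per-feature z-score model, which is only one of many possible detectors; `fit_gaussian_profile`, `anomaly_score`, and `flag_anomalies` are hypothetical names, and `z_threshold=3.0` is an illustrative default, not a recommendation from the article.

```python
import statistics


def fit_gaussian_profile(X_majority):
    """Per-feature (mean, std) of the majority class only; minority examples are not used."""
    columns = list(zip(*X_majority))
    # A constant feature would have std 0; fall back to 1.0 to avoid dividing by zero.
    return [(statistics.fmean(col), statistics.pstdev(col) or 1.0) for col in columns]


def anomaly_score(x, profile):
    """Largest absolute z-score across features; large values mean 'unlike the majority'."""
    return max(abs(v - mu) / sd for v, (mu, sd) in zip(x, profile))


def flag_anomalies(X, profile, z_threshold=3.0):
    """Label a point 1 (rare/anomalous) if any feature is more than z_threshold std devs away."""
    return [1 if anomaly_score(x, profile) > z_threshold else 0 for x in X]


normal = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.0, 2.0)]
profile = fit_gaussian_profile(normal)
flags = flag_anomalies([(1.0, 2.0), (5.0, 2.0)], profile)
```

The design choice here is that the rare class never enters training: the detector only learns what "normal" looks like, which is why this route tolerates having very few minority examples.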
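Adjusting class weights and the decision threshold, two of the algorithm-level options in the list above, can be illustrated for a binary classifier that outputs probabilities. The sketch below assumes labels 0/1, uses the `n_samples / (n_classes * class_count)` weighting heuristic (the same formula scikit-learn uses for `class_weight="balanced"`), and picks the threshold that maximizes F1 on held-out scores; all function names and the toy scores are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights):
    """Mean binary cross-entropy with each example scaled by its class weight,
    so mistakes on the rare class cost more."""
    eps = 1e-12
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -weights[t] * (t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def predict_with_threshold(p_pred, threshold):
    """Predict the positive class when the score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in p_pred]


def best_threshold(y_true, p_pred):
    """Try every distinct score as a threshold and keep the one with the best F1
    for the positive (rare) class."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(p_pred)):
        pred = predict_with_threshold(p_pred, t)
        tp = sum(1 for a, b in zip(y_true, pred) if a == 1 and b == 1)
        fp = sum(1 for a, b in zip(y_true, pred) if a == 0 and b == 1)
        fn = sum(1 for a, b in zip(y_true, pred) if a == 1 and b == 0)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


y_val = [0] * 8 + [1] * 2
p_val = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.4]
weights = balanced_class_weights(y_val)
t, f1 = best_threshold(y_val, p_val)
```

On this toy data the default 0.5 cutoff predicts no positives at all, while the tuned threshold separates the two classes. In practice the threshold should be chosen on a validation split rather than the test set, otherwise the reported score is optimistic.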
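
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. You are welcome to follow our website (https://www.52ml.net); anyone interested in machine learning is welcome to join our QQ group: 252085834. https://blog.csdn.net/machinelearning_net/article/details/52368183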