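Several Classic Sampling Methods for Imbalanced Regression
Preface
As is well known, imbalanced regression receives far less attention than imbalanced classification. For my own needs, I have collected some data-level methods for handling imbalanced regression.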
SMOGN
Original paper:
Branco, P., Torgo, L., Ribeiro, R. (2017). SMOGN: A Pre-Processing Approach for Imbalanced Regression. Proceedings of Machine Learning Research, 74:36-50. http://proceedings.mlr.press/v74/branco17a/branco17a.pdf.
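The official implementation of this method is written in R. The method is also available in the Python package smogn, which can be installed with the following command: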
pip install smogn
Project page: https://github.com/nickkunz/smogn
SMOTE
Original paper:
Chawla N V, Bowyer K W, Hall L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of artificial intelligence research, 2002, 16: 321-357. https://www.jair.org/index.php/jair/article/download/10302/24590
A large collection of implementations of SMOTE and its variants is available in the project: https://github.com/analyticalmindsltd/smote_variants
Papers applying SMOTE to regression:
- Torgo L, Ribeiro R P, Pfahringer B, et al. Smote for regression[C]//Progress in Artificial Intelligence: 16th Portuguese Conference on Artificial Intelligence, EPIA 2013, Angra do Heroísmo, Azores, Portugal, September 9-12, 2013. Proceedings 16. Springer Berlin Heidelberg, 2013: 378-389.
- Camacho L, Douzas G, Bacao F. Geometric SMOTE for regression[J]. Expert Systems with Applications, 2022: 116387.
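To give a feel for what SMOTE-style oversampling looks like when the target is continuous, here is a minimal sketch in plain Python. It is not the code from either paper above: it assumes the rare cases have already been identified, interpolates the features between a rare case and one of its nearest rare neighbours, and sets the new target to a distance-weighted average of the two parents' targets. The function names `smote_regression_sample` and `oversample_rare`, the parameter `k`, and the toy data are all made up for illustration.

```python
import math
import random


def smote_regression_sample(seed_case, neighbour, rng):
    """Create one synthetic (features, target) pair between two rare cases."""
    x_a, y_a = seed_case
    x_b, y_b = neighbour
    t = rng.random()  # position on the segment from seed_case to neighbour
    x_new = [a + t * (b - a) for a, b in zip(x_a, x_b)]
    d_a = math.dist(x_new, x_a)
    d_b = math.dist(x_new, x_b)
    if d_a + d_b == 0.0:
        # Both parents share the same features: fall back to a plain mean.
        y_new = (y_a + y_b) / 2.0
    else:
        # The closer parent gets the larger weight on its target.
        y_new = (d_b * y_a + d_a * y_b) / (d_a + d_b)
    return x_new, y_new


def oversample_rare(rare_cases, n_new, k=3, seed=0):
    """Generate n_new synthetic cases from a list of (features, target) pairs."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rare_cases)
        # Nearest rare neighbours of the chosen case (excluding itself).
        others = [c for c in rare_cases if c is not base]
        others.sort(key=lambda c: math.dist(c[0], base[0]))
        neighbour = rng.choice(others[:k])
        synthetic.append(smote_regression_sample(base, neighbour, rng))
    return synthetic


# Hypothetical rare cases: (features, target)
rare = [([1.0, 2.0], 10.0), ([1.5, 2.5], 12.0), ([2.0, 2.0], 11.0), ([1.2, 3.0], 13.0)]
new_cases = oversample_rare(rare, n_new=5)
```

Because both the features and the target are interpolated, every synthetic case stays inside the range spanned by its two parents. How the rare cases are selected in the first place (for example via a relevance function over the target) is left out of this sketch.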
DA-WR (Data Augmentation - Weighted Resampling)
Paper: Data Augmentation for Imbalanced Regression, AISTATS 2023.
Code: https://github.com/sstocksieker/DAIR
REBAGG: REsampled BAGGing for Imbalanced Regression
Paper: REBAGG: REsampled BAGGing for Imbalanced Regression, LIDTA 2018.
Basic idea: it combines resampling with the ensemble learning method Bagging.
Thesis:
Thesis, Re-sampling Approaches for Regression Tasks under Imbalanced Domains, 2014.
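The idea of pairing resampling with Bagging can be sketched roughly as follows. This is a simplified illustration, not the REBAGG implementation: it assumes rare cases are defined by a plain threshold on the target (a hypothetical `is_rare` function), draws a balanced bootstrap sample for each ensemble member, and uses a tiny k-nearest-neighbour regressor on one feature as the base learner. All names and the toy data are made up for this example.

```python
import random
import statistics


def knn_predict(train, x, k=3):
    """Predict the mean target of the k nearest training cases (1-D feature)."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def balanced_resample(data, is_rare, rng):
    """Bootstrap sample with equal numbers of rare and normal cases."""
    rare = [c for c in data if is_rare(c[1])]
    normal = [c for c in data if not is_rare(c[1])]
    half = len(data) // 2
    # Sampling with replacement oversamples the rare cases.
    return rng.choices(rare, k=half) + rng.choices(normal, k=half)


def rebagg_fit(data, is_rare, n_models=10, seed=0):
    """Each 'model' is just its resampled training set, since k-NN is lazy."""
    rng = random.Random(seed)
    return [balanced_resample(data, is_rare, rng) for _ in range(n_models)]


def rebagg_predict(models, x):
    """Average the predictions of all ensemble members."""
    return statistics.fmean(knn_predict(sample, x) for sample in models)


# Toy data following y = x**2, where large targets are rare.
data = [(x / 10.0, (x / 10.0) ** 2) for x in range(20)] + [(3.0, 9.0), (3.2, 10.24)]
models = rebagg_fit(data, is_rare=lambda y: y > 5.0)
pred = rebagg_predict(models, 3.1)
```

With only two rare cases among 22, a single k-NN model trained on the raw data would pull the prediction at `x = 3.1` toward the frequent low targets. In this sketch every member sees the rare cases as half of its training sample, so the averaged prediction stays between the two rare targets.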
ImbalancedLearningRegression
Original paper:
Branco P. ImbalancedLearningRegression-A Python Package to Tackle the Imbalanced Regression Problem[J]. 2022. https://2022.ecmlpkdd.org/wp-content/uploads/2022/09/sub_1456.pdf
The methods are available in the Python package ImbalancedLearningRegression, which can be installed with the following command:
pip install ImbalancedLearningRegression
Official project page: https://github.com/paobranco/ImbalancedLearningRegression
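Summary
This list is short and there are surely more methods; I will add them later.
The methods mentioned above are mostly applied to datasets with hand-crafted features. How to apply them to end-to-end deep learning methods deserves further study.
For work in this direction, see:
Dablain D, Krawczyk B, Chawla N V. DeepSMOTE: Fusing deep learning and SMOTE for imbalanced data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022. https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9694621. Published in the top journal IEEE TNNLS, hats off.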