Learning from Imbalanced Data Streams: Data-Level Algorithms
Undersampling Naïve Bayes
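Paper: Nguyen, H.M., Cooper, E.W., Kamei, K.: Online learning from imbalanced data streams. In: Third International Conference of Soft Computing and Pattern Recognition, SoCPaR 2011, Dalian, 14–16 Oct 2011, pp. 347–352 (2011)
Idea: during training, every minority-class instance is used to update the classifier, while each majority-class instance is used for an update only with a certain probability (the imbalance rate).
Drawback: it assumes the minority class always remains the minority, so it does not handle dynamic changes in the class relationship.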
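The update rule above can be sketched in a few lines. This is only a minimal illustration, not the paper's implementation: the class name `UndersamplingNaiveBayes`, the fixed `imbalance_rate` parameter (taken here as the probability of using a majority instance), the binary categorical features, and the Laplace smoothing are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


class UndersamplingNaiveBayes:
    """Online categorical naive Bayes that subsamples the majority class."""

    def __init__(self, minority_label, imbalance_rate, seed=0):
        self.minority_label = minority_label
        # Assumption: probability of using a majority-class instance.
        self.imbalance_rate = imbalance_rate
        self.rng = random.Random(seed)
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )

    def _update(self, x, y):
        self.class_counts[y] += 1
        for i, value in enumerate(x):
            self.feature_counts[y][i][value] += 1

    def learn_one(self, x, y):
        """Return True if the instance was used to update the model."""
        # Minority instances always update; majority instances only sometimes.
        if y == self.minority_label or self.rng.random() < self.imbalance_rate:
            self._update(x, y)
            return True
        return False

    def predict_one(self, x):
        total = sum(self.class_counts.values())
        if total == 0:
            return None
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            if count == 0:
                continue
            score = math.log(count / total)
            for i, value in enumerate(x):
                # Laplace smoothing, assuming each feature takes two values.
                seen = self.feature_counts[label][i].get(value, 0)
                score += math.log((seen + 1) / (count + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Because `minority_label` is fixed at construction time, the sketch inherits the drawback noted above: if the class proportions flip during the stream, it keeps undersampling the wrong class.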
Generalized Over-sampling Based Online Imbalanced Learning Framework (GOS-IL)
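Paper: Barua, S., Islam, M.M., Murase, K.: GOS-IL: a generalized over-sampling based online imbalanced learning framework. In: Neural Information Processing – 22nd International Conference, ICONIP 2015, Proceedings, Part I, Istanbul, 9–12 Nov 2015, pp. 680–687 (2015)
Idea: three parameters are maintained for each class:
- the instances of the class that the current classifier misclassified
- the number of instances of the class received so far
- the number of instances of the class used for updates so far
The algorithm oversamples the misclassified instances, and it does so only when the imbalance rate reaches a certain level and the classifier's error rate reaches a certain threshold.
Drawback: it does not handle concept drift or dynamic changes in the class relationship.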
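The per-class bookkeeping and the two-condition trigger described above can be sketched as follows. This is a rough illustration under several assumptions, not the GOS-IL algorithm itself: the names `ClassState` and `OversamplingTrigger`, the two threshold values, and the way the imbalance rate and error rate are computed are hypothetical, and "oversampling" is reduced here to replaying the stored misclassified instances rather than whatever generation scheme the paper uses.

```python
from dataclasses import dataclass, field


@dataclass
class ClassState:
    misclassified: list = field(default_factory=list)  # instances the classifier got wrong
    n_received: int = 0  # instances of this class received so far
    n_used: int = 0      # instances of this class used for updates so far


class OversamplingTrigger:
    def __init__(self, imbalance_threshold=0.5, error_threshold=0.3):
        # Both thresholds are illustrative values, not taken from the paper.
        self.imbalance_threshold = imbalance_threshold
        self.error_threshold = error_threshold
        self.states = {}

    def observe(self, x, y, predicted):
        """Record one instance; return extra instances to train on (may be empty)."""
        state = self.states.setdefault(y, ClassState())
        state.n_received += 1
        state.n_used += 1  # the incoming instance itself is always used
        if predicted != y:
            state.misclassified.append(x)
        if not self._should_oversample(y):
            return []
        replay = list(state.misclassified)
        state.misclassified.clear()
        state.n_used += len(replay)
        return replay

    def _should_oversample(self, y):
        state = self.states[y]
        largest = max(s.n_used for s in self.states.values())
        # Assumed imbalance measure: 1.0 means this class matches the largest.
        imbalance = state.n_used / largest
        # Assumed error measure: buffered mistakes over instances received.
        error_rate = len(state.misclassified) / state.n_received
        return (imbalance <= self.imbalance_threshold
                and error_rate >= self.error_threshold)
```

In use, the caller would first ask its classifier for a prediction, pass the instance and prediction to `observe`, train on the instance, and then also train on whatever `observe` returned.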
Sequential SMOTE
Papers:
- Mao, W., Wang, J., Wang, L.: Online sequential classification of imbalanced data by combining extreme learning machine and improved SMOTE algorithm. In: 2015 International Joint Conference on Neural Networks, IJCNN 2015, Killarney, 12–17 July 2015, pp. 1–8 (2015)
- Mao, W., Jiang, M., Wang, J., Li, Y.: Online extreme learning machine with hybrid sampling strategy for sequential imbalanced data. Cogn. Comput. 9(6), 780–800 (2017)