Abstract
To address the limited effectiveness of traditional single classifiers on imbalanced data, a new classification algorithm for binary-class imbalanced data sets, Generative Adversarial Nets-Adaptive Boosting-Decision Tree (GAN-AdaBoost-DT), was proposed by combining Generative Adversarial Nets (GAN) with ensemble learning. First, a GAN was trained to obtain a generative model, which produced minority class samples to reduce the imbalance of the data. Then, the generated minority class samples were brought into the Adaptive Boosting (AdaBoost) learning framework with modified weights, improving the AdaBoost model and the classification performance of AdaBoost with Decision Tree (DT) as the base classifier. The Area Under the receiver operating characteristic Curve (AUC) was used as the evaluation metric for the imbalanced classification problem. Experimental results on a credit card fraud data set show that, compared with the synthetic minority over-sampling ensemble learning method, the proposed algorithm increased accuracy by 4.5% and AUC by 6.5%; compared with the modified synthetic minority over-sampling ensemble learning method, it increased accuracy by 4.9% and AUC by 5.9%; and compared with the random under-sampling ensemble learning method, it increased accuracy by 4.5% and AUC by 5.4%. Experimental results on other data sets from UCI and KEEL show that the proposed algorithm improves the overall accuracy on imbalanced binary classification problems and the performance of the classifier.
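The two stages described above (adding generated minority samples, then boosting with adjusted sample weights and scoring by AUC) can be sketched in plain Python. This is a minimal illustration under several stated assumptions, not the authors' implementation. `generate_minority` is a hypothetical placeholder that jitters real minority rows with Gaussian noise instead of sampling from a trained GAN generator. Depth-1 decision stumps stand in for full decision trees. The abstract does not say how the weights of generated samples are changed, so the sketch lets the caller pass initial weights `w0` and gives synthetic rows a smaller, hypothetical `synth_weight`. Labels are in {-1, +1}, with +1 as the minority class.

```python
import math
import random

def generate_minority(minority, n, noise=0.1, seed=0):
    """Hypothetical stand-in for a trained GAN generator (NOT a GAN):
    returns n rows, each a randomly chosen real minority row plus Gaussian noise."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, noise) for x in rng.choice(minority)] for _ in range(n)]

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best depth-1 tree."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] <= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost(X, y, w0, rounds=10):
    """Discrete AdaBoost (labels in {-1, +1}) starting from caller-supplied weights w0."""
    total = sum(w0)
    w = [wi / total for wi in w0]
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid log(0) when a stump is perfect
        if err >= 0.5:
            break  # base learner no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def score(model, row):
    """Weighted vote; larger means more likely minority (+1)."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)

def auc(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    rng = random.Random(42)

    def blob(mu, k):
        return [[rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)] for _ in range(k)]

    majority, minority = blob(0.0, 100), blob(2.0, 10)
    synthetic = generate_minority(minority, n=60)
    X = majority + minority + synthetic
    y = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))
    synth_weight = 0.5  # hypothetical: synthetic rows start at half weight
    w0 = [1.0] * (len(majority) + len(minority)) + [synth_weight] * len(synthetic)
    model = adaboost(X, y, w0, rounds=10)
    test_X = blob(0.0, 50) + blob(2.0, 10)
    test_y = [-1] * 50 + [1] * 10
    print("held-out AUC:", auc([score(model, r) for r in test_X], test_y))
```

Replacing the placeholder with a real GAN generator would only change `generate_minority`; the boosting loop and AUC computation stay the same. The AUC is computed here as the probability that a randomly chosen minority sample scores above a randomly chosen majority sample, which equals the area under the ROC curve.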