# Citation
## LaTeX
@article{NAYAK2017,
title = "Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure",
journal = "Journal of King Saud University - Computer and Information Sciences",
year = "2017",
issn = "1319-1578",
doi = "https://doi.org/10.1016/j.jksuci.2017.08.001",
url = "http://www.sciencedirect.com/science/article/pii/S1319157817301003",
author = "Subrat Kumar Nayak and Pravat Kumar Rout and Alok Kumar Jagadev and Tripti Swarnkar",
keywords = "Multi-objective, Feature selection, Differential Evolution, Filter approach, Correlation coefficient, Mutual information"
}
## Normal
Subrat Kumar Nayak, Pravat Kumar Rout, Alok Kumar Jagadev, Tripti Swarnkar,
Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure,
Journal of King Saud University - Computer and Information Sciences,
2017,
ISSN 1319-1578,
https://doi.org/10.1016/j.jksuci.2017.08.001.
(http://www.sciencedirect.com/science/article/pii/S1319157817301003)
Keywords: Multi-objective; Feature selection; Differential Evolution; Filter approach; Correlation coefficient; Mutual information
# Abstract
Real-world data - complex
many features -> even more complex
features - redundant, erroneous -> selection needed
dimensionality reduction -> less classification time
-> eliminates misleading features -> higher classification accuracy
Proposed algorithm:
Filter Approach using Elitism based Multi-objective Differential Evolution algorithm for feature selection (FAEMODE)
Novelty: the objective formulation
-> linear and non-linear dependency among features - handles redundant and unwanted features
selects a feature subset
23 benchmark datasets
10-fold cross validation
4 well-known classifiers
7 filter methods
2 conventional and 3 metaheuristic wrapper approaches
# 1 Introduction
## 1.1 Context
Classification problems:
a large number of features - not all are necessary and relevant for classification
- redundant
- erroneous in nature
-> reduces the efficiency of classification algorithms
Represent the data with the fewest useful, relevant features:
Feature Selection (FS)
select only the relevant features
remove redundant and irrelevant features
Broadly, features fall into 4 main categories:
- irrelevant
- weakly relevant and redundant
- weakly relevant but non-redundant
- strongly relevant
Relevant features:
carry the maximum information about the corresponding dataset
Redundant features:
provide the same information as the relevant features
Irrelevant features:
negatively affect classification
Complex interactions among features -> hard to select the most relevant ones
A relevant feature may:
work well - on its own
behave abnormally - when grouped with other features
The FS search space is large - even harder
as the number of features grows - the search space grows exponentially
exhaustive search - practically impossible
Search algorithms:
- random search
- complete search
- heuristic search
- greedy search
Drawbacks:
prone to local optima
high computational cost
Efficient global search method:
Evolutionary Computation (EC)
Based on the evaluation criterion, FS algorithms are divided into:
- filter approaches
- wrapper approaches
Differences:
- wrapper methods use a learning algorithm to evaluate feature subsets
  high computational complexity
  but effective - filter approaches are not as effective
- filter approaches are more general and need less computation time
## 1.2 Related work and motivation
Data mining FS - algorithms
Common evaluation measures for filter feature selection:
- Information Measure (the most common, e.g. mutual information)【1】
- Correlation Measure - more general, insensitive to noisy or outlying data【2】
- Distance Measure
- Consistency Measure
- Fuzzy Set Theory
- Rough Set Theory
Mutual-information-based algorithms:
- minimal-redundancy-maximal-relevance criterion (mRMR)【3】
- mutual information feature selector (MIFS)【4】
- normalized mutual information feature selection (NMIFS)【5】
- mutual information feature selector under uniform information distribution (MIFS-U)【6】
- mutual information-based feature selection with class-dependent redundancy (MIFS-CR)【1】
Drawback:
prone to local optima <- greedy search
A common filter method: the Relief algorithm【7】:
weights -> features - relevance <- distance measure
drawback - ignores redundant features
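The Relief weighting idea above can be sketched as follows; `relief_weights` is an illustrative helper (not code from the paper) using the standard near-hit/near-miss update, which rewards features that separate classes but, as noted, never penalizes redundancy.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Basic Relief: reward features that differ across classes
    (near-miss) and penalize features that vary within a class
    (near-hit). Redundant copies of a good feature score just as well."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                      # random probe instance
        dists = np.abs(X - X[i]).sum(axis=1)     # Manhattan distances
        dists[i] = np.inf
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        near_hit = same[np.argmin(dists[same])]  # closest same-class instance
        near_miss = diff[np.argmin(dists[diff])] # closest other-class instance
        w += np.abs(X[i] - X[near_miss]) - np.abs(X[i] - X[near_hit])
    return w / n_iter
```

A feature perfectly aligned with the class gets a high weight; a feature that is merely a duplicate of it gets the same high weight, which is exactly the redundancy blindness criticized above.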
wrapper methods:
- sequential forward selection (SFS)【8】
- sequential backward selection (SBS)【9】
Problem: once a selection step is done, already selected features cannot be removed in later evaluation stages.
To address this:
the "plus-l-take-away-r" method【10】 - combines SFS and SBS
Details:
- first, l steps of forward selection
- then, r steps of backward elimination
Determining the parameters l and r is the main task - difficult【11】
-> hence two floating FS methods:【12】
- Sequential Forward Floating Selection (SFFS)
- Sequential Backward Floating Selection (SBFS)
-> they determine l and r automatically, but still suffer from local optima
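The plus-l-take-away-r procedure can be sketched generically; `score` stands in for whatever subset-evaluation criterion is used (a hypothetical placeholder, not defined in the paper), and l > r is assumed so the loop terminates.

```python
def plus_l_take_away_r(features, score, k, l=2, r=1):
    """Alternate l greedy forward-selection steps with r greedy
    backward-removal steps until k features are selected (assumes l > r)."""
    selected = []
    while len(selected) < k:
        for _ in range(l):  # plus-l: add the feature that helps most
            remaining = [f for f in features if f not in selected]
            if not remaining:
                break
            best = max(remaining, key=lambda f: score(selected + [f]))
            selected.append(best)
        for _ in range(r):  # take-away-r: drop the feature whose removal hurts least
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            selected.remove(worst)
    return selected[:k]
```

Unlike plain SFS, a feature added early can later be taken away, but a good (l, r) still has to be chosen by hand, which is the difficulty【11】that motivated the floating variants.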
Difficulties:
- very large search space
- feature interactions
Advantages of EC:
- needs no domain knowledge
- makes no assumptions about the search space
filter approaches + EC:
- Genetic Algorithm (GA) + information theory【11,13,14】
- rough set theory and fuzzy set theory along with GA【15,16】
- Particle Swarm Optimization (PSO)【17】
wrapper FS process + EC:
- genetic programming + a tree-based classifier design algorithm【18】
- a Binary Bat Algorithm for FS (BBA-FS) + optimum forest classifier【19】
- two PSO-based single-objective FS algorithms: the commonly used PSO algorithm (ErFS) + PSO with a two-stage fitness function (2SFS)【20】
two conventional approaches for performance comparison:
- linear forward selection (LFS)【21】
- greedy stepwise backward selection (GSBS)【22】
【23】for FS, DE outperforms GA, PSO, Ant Colony Optimization (ACO), and Harmony Search
Multi-objective FS algorithms:
filter FS:
- multi-objective PSO【24】 - information theory
- rough set theory along with multi-objective PSO【25,26】
- a multi-objective filter FS based on ACO + rough set theory【27】
- using NSGA-II【28】
- a multi-objective evolutionary algorithm with class-dependent redundancy for FS (MECY-FS)【1】 - predicts compact feature subsets more effectively; plus a Multi-objective Evolutionary FS algorithm based on the redundancy measure used in MIFS-U (MEFS-U), for comparison purposes only - drawback: the feature subset size is fixed at half the total number of features and cannot be determined automatically
wrapper FS:
- PSO-based【20】 - two objectives: maximize classification accuracy + minimize the number of features
- Differential Evolution (DE)【29】 - minimize classification error + minimize the number of features - showed superiority over single-objective algorithms
- 【30】MOEA/D: maximizing the between-class distance and minimizing the within-class distance
The proposed FAEMODE algorithm uses DE
Two measures are considered: correlation coefficient and mutual information
-> not only the linear dependency but also the non-linear dependency among features is considered
# 2 Materials and methods
## 2.1. Differential Evolution (DE)【31】
## 2.2. Multi-objective optimization
## 2.3. Elitism based Multi-objective Differential Evolution
A few modifications over conventional DE:
the elitism principle of NSGA-II【32】
in the mutation step of DE: random selection -> replaced by the elitism principle
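A minimal sketch of how the elitism principle can replace random base-vector selection in DE mutation, assuming a DE/rand/1-style scheme and a precomputed set of elite (non-dominated) indices; `elitist_mutation` and its parameters are illustrative, not the paper's exact formulation.

```python
import numpy as np

def elitist_mutation(pop, elite_idx, F=0.5, rng=None):
    """DE/rand/1-style mutation where the base vector is drawn from the
    elite (non-dominated) solutions instead of the whole population."""
    rng = np.random.default_rng(rng)
    NP, D = pop.shape
    mutants = np.empty_like(pop)
    for i in range(NP):
        base = pop[rng.choice(elite_idx)]               # elite base vector
        r1, r2 = rng.choice(NP, size=2, replace=False)  # random difference pair
        mutants[i] = base + F * (pop[r1] - pop[r2])
    return np.clip(mutants, 0.0, 1.0)  # keep genes in the [0, 1] search range
```

Biasing the base vector toward the current Pareto set pulls trial vectors toward good regions while the random difference pair preserves DE's exploration.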
## 2.4. Correlation measuring tools
Selection of optimal features with the least redundancy among them:
- how relevant the selected features are to the target class - maximize
- how different the selected features are from one another - minimize redundancy
Correlation or dependency among features - mutual information and correlation coefficient -> the most widely used measures
### 2.4.1. Correlation coefficient
Pearson correlation coefficient (PCC) - quantifies the linear dependency between two variables
$cov$ - covariance
$\sigma$ - standard deviation
$PCC(X, Y) = \dfrac{cov(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]$
completely independent -> 0
perfectly positively (negatively) correlated -> 1 (-1)
Main limitation: only sensitive to linear relationships
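The PCC defined above can be computed directly; this is a generic NumPy illustration, not code from the paper.

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()  # population covariance
    return cov / (x.std() * y.std())
```

A perfectly linear relation gives ±1, while a symmetric non-linear one such as y = x² over a symmetric range gives 0, which is exactly the stated limitation.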
### 2.4.2. Mutual information
The most commonly used measure in information theory; it quantifies the amount of information shared by two variables.
Unlike the PCC, Mutual Information (MI) is sensitive to non-linear relationships as well, making it more general and robust.
$X, Y$ - two discrete random variables
$p(x, y)$ - joint probability distribution function of $X$ and $Y$
$P(x), P(y)$ - marginal distribution functions
$MI(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \dfrac{p(x, y)}{P(x) P(y)}$
completely independent -> 0
completely dependent -> 1
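MI between two discrete variables can be estimated from their empirical joint distribution; a minimal sketch using base-2 logarithms, a convention chosen here (not necessarily the paper's) under which two identical fair binary variables share exactly 1 bit.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Estimate MI(X;Y) = sum_{a,b} p(a,b) * log2(p(a,b) / (p(a) p(b)))
    from the empirical distributions of two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))          # empirical joint distribution
    px, py = Counter(x), Counter(y)   # empirical marginals
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        p_a, p_b = px[a] / n, py[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi
```

Independent variables contribute log(1) = 0 in every term, so the estimate is 0; full dependence concentrates the joint mass and maximizes the sum.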
# 3 Concepts in FAEMODE
## 3.1. Objective formulation
Features:
- maximum relevance
- minimum redundancy
$x_i$ - an individual feature
$c$ - the class
$D$ - relevance between a feature subset Fs and the class $c$; to be maximized
$R$ - linear and non-linear dependency (redundancy) among the features; to be minimized
## 3.2. Selecting the optimal solution
one validity index -> good compact clusters
Drawback: it may not account for the importance of all objectives
-> a fuzzy concept【33】 - acts as the Decision Maker
membership function - membership value
$\mu_i$ - membership value of objective $i$
$PF_i^{max}, PF_i^{min}$ - maximum and minimum values of objective function $i$
$k$ - a non-dominated solution
$\mu^k$ - normalized membership value: the larger the value, the better the compromise solution
$N_{Pareto}$ - number of non-dominated solutions in the Pareto front
$M$ - number of objective functions
Example:
3 non-dominated solutions - $N_{Pareto} = 3$
2 objectives - $M = 2$
Find the maximum and minimum of each objective function - $PF_1^{max}, PF_2^{max}, PF_1^{min}, PF_2^{min}$
Then compute the membership values with Eq. 13
Normalize with Eq. 14; e.g., for $k = 2$, $\mu^k$ is computed from the membership values of solution 2.
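The formulas referenced as Eq. 13 and Eq. 14 are not reproduced in these notes; in【33】, which this selection scheme follows, they take the standard form below (reconstructed here, so verify against the paper itself):

```latex
\mu_i^{k} =
\begin{cases}
1, & PF_i^{k} \le PF_i^{min} \\[4pt]
\dfrac{PF_i^{max} - PF_i^{k}}{PF_i^{max} - PF_i^{min}}, & PF_i^{min} < PF_i^{k} < PF_i^{max} \\[4pt]
0, & PF_i^{k} \ge PF_i^{max}
\end{cases}
\qquad \text{(Eq. 13)}

\mu^{k} = \frac{\sum_{i=1}^{M} \mu_i^{k}}{\sum_{k=1}^{N_{Pareto}} \sum_{i=1}^{M} \mu_i^{k}}
\qquad \text{(Eq. 14)}
```

Each solution's memberships are summed over the $M$ objectives and normalized over all $N_{Pareto}$ non-dominated solutions; the solution with the largest $\mu^k$ is the best compromise.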
# 4. The FAEMODE algorithm for feature selection
## 4.1. The proposed algorithm
## 4.2. Population representation and initialization
Population:
$P_{gn} = \{ X_{1,gn}, \ldots, X_{NP,gn} \}$
Individual:
$X_{i,gn} = \{ x^1_{i,gn}, \ldots, x^D_{i,gn} \}$
Uniform initialization - covers the whole search space
Lower and upper bounds:
$X_{LB} = \{ x^1_{LB}, \ldots, x^D_{LB} \}$
$X_{UB} = \{ x^1_{UB}, \ldots, x^D_{UB} \}$
here $[0, 1]$
$D$ - total number of features
each solution encodes one combination of features
threshold - $0.5$
$> 0.5$ - the feature is selected
## 4.3. Fitness evaluation
Example:
$DATA$ - dataset
$D = 5$
$n$ - number of instances
If $x^1_{i,gn}, x^3_{i,gn}, x^4_{i,gn}$ are greater than the threshold $0.5$, then features 1, 3, and 4 are selected.
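The decoding convention above (a gene above the 0.5 threshold selects the corresponding feature) can be sketched as follows; `decode` is an illustrative helper, with 1-indexed output to match the $x^j$ notation.

```python
def decode(individual, threshold=0.5):
    """Map a real-valued individual in [0,1]^D to the selected feature
    indices (1-indexed, matching the x^j notation)."""
    return [j + 1 for j, gene in enumerate(individual) if gene > threshold]

x = [0.9, 0.2, 0.7, 0.6, 0.1]  # D = 5
print(decode(x))               # -> [1, 3, 4]
```

The columns of the dataset picked out this way form the candidate subset on which the two objectives (relevance D and redundancy R) are then evaluated.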
## 4.4. Termination condition
the maximum number of generations (GN)
# 5. Experiments on datasets and discussion of results
- 5 filter-based methods: mRMR, MIFS, NMIFS, MIFS-U, MIFS-CR
- multi-objective EA + filter methods: MECY-FS, MEFS-U
- 2 conventional wrapper methods + single-objective and bi-objective metaheuristic wrappers
## 5.1. Datasets and parameter settings of FAEMODE
23 benchmark datasets
9 of them selected
Validation method:
10 Fold Cross Validation (FCV)
Performance evaluation:
Waikato Environment for Knowledge Analysis (Weka)【34】
The most commonly used classification algorithms, used for comparison:
- K-nearest neighbor (KNN)【35】
- Naïve Bayes (NB)【36】
- Radial Basis Function Neural Network (RBFNN)【37】
- C4.5【38】
1NN - for comparison with filter algorithms
5NN - for comparison with wrapper algorithms
To evaluate the filter algorithms:
the area under the curve (AUC) of the receiver operating characteristics (ROC)
this metric is considered better than accuracy【39】
wrapper methods evaluated by - accuracy
## 5.2. Experimental results and analysis
40 independent runs
results of the other algorithms taken from【1,30】
10-fold cross validation
The selected feature subsets are validated with the classifiers KNN, NB, RBFNN, and C4.5; additionally, the percentage of feature reduction is compared.
For each dataset, the result of one run is chosen and visualized.
Tables 3-10: comparison with filter FS methods; AUC-based rank in parentheses; also mean AUC, number of times the best AUC was obtained, and standard deviation (the number after $\pm$)
Tables 11-14: comparison with wrapper methods - mean classification accuracy
Table 15 - sizes of the feature subsets obtained by FAEMODE
### 5.2.1. Comparison with conventional filter FS methods
KNN, NB, RBFNN, and C4.5
Tables 3, 4, 5, 6
based on the obtained feature subsets
### 5.2.2. Comparison with multi-objective FS methods
MECY-FS and MEFS-U
Tables 7, 8, 9, 10
### 5.2.3. Comparison with wrapper FS methods
conventional methods: SFS and SBS - Table 11
single-objective: BBA-FS - Table 12
multi-objective: DEMOFS and MOEA/D-FS - Tables 13, 14
mean classification accuracy
10 FCV with the KNN (K = 5) classifier
### 5.2.4. Comparison based on the percentage of feature reduction
Table 15
# 6 Conclusion
a Filter Approach using Elitism based Multi-objective Differential Evolution for feature selection (FAEMODE)
two objectives
linear and non-linear dependency
# References
【1】Wang, Z., Li, M., Li, J., 2015. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inf. Sci. 307, 73–88.
http://dx.doi.org/10.1016/j.ins.2015.02.031.
【2】Huang, J., Cai, Y., Xu, X., 2007. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28, 1825–1844.
http://dx.doi.org/10.1016/j.patrec.2007.05.011.
【3】Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226–1238.
http://dx.doi.org/10.1109/TPAMI.2005.159.
【4】Battiti, R., 1994. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5, 537–550.
http://dx.doi.org/10.1109/72.298224.
【5】Estevez, P.A., Tesmer, M., Perez, C.A., Zurada, J.M., 2009. Normalized mutual information feature selection. IEEE Trans. Neural Netw. 20, 189–201.
http://dx.doi.org/10.1109/TNN.2008.2005601.
【6】Kwak, N., Choi, Chong.-Ho., 2002. Input feature selection for classification problems. IEEE Trans. Neural Netw. 13, 143–159.
http://dx.doi.org/10.1109/72.977291.
【7】Kira, K., Rendell, L.A., 1992. A practical approach to feature selection. Proc. Ninth Int. Workshop Mach. Learn., 249–256
【8】Whitney, A.W., 1971. A direct method of nonparametric measurement selection. IEEE Trans. Comput. 100, 1100–1103.
http://dx.doi.org/10.1109/TC.1971.223410.
【9】Marill, T., Green, D., 1963. On the effectiveness of receptors in recognition systems. IEEE Trans. Inf. Theory 9, 11–17.
http://dx.doi.org/10.1109/TIT.1963.1057810.
【10】Stearns, S.D., 1976. On selecting features for pattern classifiers, in: Proceedings of the 3rd International Joint Conference on Pattern Recognition. pp. 71–75.
【11】Xue, B., Zhang, M., Browne, W.N., Yao, X., 2016. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20, 606–626.
http://dx.doi.org/10.1109/TEVC.2015.2504420.
【12】Pudil, P., Novovičová, J., Kittler, J., 1994. Floating search methods in feature selection. Pattern Recogn. Lett. 15, 1119–1125.
http://dx.doi.org/10.1016/0167-8655(94)90127-9.
【13】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2013b. Multi-objective evolutionary algorithms for filter based feature selection in classification. Int. J. Artif. Intell. Tools 22, 1350024.
http://dx.doi.org/10.1142/S0218213013500243.
【14】Spolaôr, N., Lorena, A.C., Lee, H.D., 2011. Multi-objective Genetic Algorithm Evaluation in Feature Selection. Springer, Berlin, Heidelberg, pp. 462–476.
doi:10.1007/978-3-642-19893-9_32.
【15】Banerjee, M., Mitra, S., Banka, H., 2007. Evolutionary rough feature selection in gene expression data. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 37, 622–632.
doi:10.1109/TSMCC.2007.897498.
【16】Chakraborty, B., 2002. Genetic algorithm with fuzzy fitness function for feature selection, in: Proceedings of the IEEE International Symposium on Industrial Electronics ISIE-02. IEEE, pp. 315–319. vol. 1.
doi:10.1109/ISIE.2002.1026085.
【17】Chakraborty, B., 2008. Feature subset selection by particle swarm optimization with fuzzy fitness function, in: 2008 3rd International Conference on Intelligent System and Knowledge Engineering. IEEE, pp. 1038–1042.
doi:10.1109/ISKE.2008.4731082.
【18】Muni, D.P., Pal, N.R., Das, J., 2006. Genetic programming for simultaneous feature selection and classifier design. IEEE Trans. Syst. Man Cybern. Part B 36, 106–117.
http://dx.doi.org/10.1109/TSMCB.2005.854499.
【19】Nakamura, R.Y.M., Pereira, L.A.M., Costa, K.A., Rodrigues, D., Papa, J.P., Yang, X.-S., 2012. BBA: A Binary Bat Algorithm for Feature Selection, in: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images. IEEE, pp. 291–297.
doi:10.1109/SIBGRAPI.2012.47.
【20】Xue, B., Zhang, M., Browne, W.N., 2013a. Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans. Cybern. 43, 1656–1671.
http://dx.doi.org/10.1109/TSMCB.2012.2227469.
【21】Gutlein, M., Frank, E., Hall, M., Karwath, A., 2009. Large-scale attribute selection using wrappers, in: 2009 IEEE Symposium on Computational Intelligence and Data Mining. IEEE, pp. 332–339.
doi:10.1109/CIDM.2009.4938668.
【22】Caruana, R., Freitag, D., 1994. Greedy attribute selection. Proc. Eighth Int. Conf. Mach. Learn., 28–36
【23】Wang, L., Ni, H., Yang, R., Pappu, V., Fenn, M.B., Pardalos, P.M., 2014. Feature selection based on meta-heuristics for biomedicine. Optim. Meth. Softw. 29, 703–719.
http://dx.doi.org/10.1080/10556788.2013.834900.
【24】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2012. A multi-objective particle swarm optimisation for filter-based feature selection in classification problems. Connection Sci. 24, 91–116. http://dx.doi.org/10.1080/09540091.2012.737765.
【25】Cervante, L., Xue, B., Shang, L., Zhang, M., 2013. A Multi-objective Feature Selection Approach Based on Binary PSO and Rough Set Theory. Springer, Berlin, Heidelberg, pp. 25–36.
doi:10.1007/978-3-642-37198-1_3.
【26】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2014. Binary PSO and rough set theory for feature selection: a multi-objective filter based approach. Int. J. Comput. Intell. Appl. 13, 1450009.
http://dx.doi.org/10.1142/S1469026814500096.
【27】Ke, L., Feng, Z., Xu, Z., Shang, K., Wang, Y., 2010. A multiobjective ACO algorithm for rough feature selection, in: 2010 Second Pacific-Asia Conference on Circuits, Communications and System. IEEE, pp. 207–210.
doi:10.1109/PACCS.2010.5627071.
【28】Hamdani, T.M., Won, J.-M., Alimi, A.M., Karray, F., 2007. Multi-objective Feature Selection with NSGA II, in: Adaptive and Natural Computing Algorithms. Springer, Berlin Heidelberg, Berlin, Heidelberg, pp. 240–247.
doi:10.1007/978-3-540-71618-1_27.
【29】Xue, B., Fu, W., Zhang, M., 2014b. Multi-objective Feature Selection in Classification: A Differential Evolution Approach. Springer International Publishing, pp. 516–528.
doi:10.1007/978-3-319-13563-2_44.
【30】Paul, S., Das, S., 2015. Simultaneous feature selection and weighting – an evolutionary multi-objective optimization approach. Pattern Recogn. Lett. 65, 51–59.
http://dx.doi.org/10.1016/j.patrec.2015.07.007.
【31】Storn, R., Price, K., 1997. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11, 341–359.
http://dx.doi.org/10.1023/A:1008202821328.
【32】Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197.
http://dx.doi.org/10.1109/4235.996017.
【33】Abido, M.A., 2003. A novel multiobjective evolutionary algorithm for environmental/economic power dispatch. Electr. Pow. Syst. Res. 65, 71–81.
http://dx.doi.org/10.1016/S0378-7796(02)00221-3.
【35】Aha, D.W., Kibler, D., Albert, M.K., 1991. Instance-based learning algorithms. Mach. Learn. 6, 37–66.
http://dx.doi.org/10.1007/BF00153759.
【36】John, G.H., Langley, P., 1995. Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, pp. 338–345.
【37】Moody, J., Darken, C.J., 1989. Fast learning in networks of locally-tuned processing units. Neural Comput. 1, 281–294.
http://dx.doi.org/10.1162/neco.1989.1.2.281.
【38】Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kauffman, San Mateo, CA.
【39】Ling, C.X., Huang, J., Zhang, H., 2003. AUC: A Better Measure than Accuracy in Comparing Learning Algorithms, in: Advances in Artificial Intelligence. Springer, Berlin, Heidelberg, pp. 329–341.
doi:10.1007/3-540-44886-1_25.