# Citation
## LaTeX
@article{NAYAK2017,
title = "Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure",
journal = "Journal of King Saud University - Computer and Information Sciences",
year = "2017",
issn = "1319-1578",
doi = "https://doi.org/10.1016/j.jksuci.2017.08.001",
url = "http://www.sciencedirect.com/science/article/pii/S1319157817301003",
author = "Subrat Kumar Nayak and Pravat Kumar Rout and Alok Kumar Jagadev and Tripti Swarnkar",
keywords = "Multi-objective, Feature selection, Differential Evolution, Filter approach, Correlation coefficient, Mutual information"
}
## Normal
Subrat Kumar Nayak, Pravat Kumar Rout, Alok Kumar Jagadev, Tripti Swarnkar,
Elitism based Multi-Objective Differential Evolution for feature selection: A filter approach with an efficient redundancy measure,
Journal of King Saud University - Computer and Information Sciences,
2017,
ISSN 1319-1578,
https://doi.org/10.1016/j.jksuci.2017.08.001.
(http://www.sciencedirect.com/science/article/pii/S1319157817301003)
Keywords: Multi-objective; Feature selection; Differential Evolution; Filter approach; Correlation coefficient; Mutual information
# Abstract
Real-world data - complex
many features -> even more complex
features - redundant, erroneous -> selection needed
dimensionality reduction -> less classification time
-> eliminates misleading features -> higher classification accuracy
Proposed algorithm:
Filter Approach using Elitism based Multi-objective Differential Evolution algorithm for feature selection (FAEMODE)
Novelty: the objective formulation
-> linear and non-linear dependency among features - handles redundant and unwanted features
selects a feature subset
23 benchmark datasets
10-fold cross validation
4 well-known classifiers
7 filter methods
2 conventional and 3 metaheuristic wrapper approaches
# 1 Introduction
## 1.1 Context
Classification problems:
a large number of features - not all are necessary and relevant for classification
- redundant
- erroneous in nature
-> reduces the efficiency of classification algorithms
Represent the data with the fewest useful, relevant features:
Feature Selection (FS)
select only the relevant features
remove redundant and irrelevant features
Broadly, features fall into 4 main categories:
- irrelevant
- weakly relevant and redundant
- weakly relevant but non-redundant
- strongly relevant
Relevant features:
carry the maximum information about the corresponding dataset
Redundant features:
provide the same information as the relevant features
Irrelevant features:
negatively affect classification
Complex interactions among features -> hard to select the most relevant ones
A relevant feature may:
work well - on its own
behave abnormally - when grouped with other features
The FS search space is large - even harder
as the number of features grows - the search space grows exponentially
exhaustive search - practically impossible
Search algorithms:
- random search
- complete search
- heuristic search
- greedy search
Drawbacks:
prone to local optima
high computational cost
Efficient global search method:
Evolutionary Computation (EC)
Based on the evaluation criterion, FS algorithms are divided into:
- filter approaches
- wrapper approaches
Differences:
- wrapper methods use a learning algorithm to evaluate feature subsets
  high computational complexity
  but effective - filter approaches are not as effective
- filter approaches are more general and need less computation time
## 1.2 Related work and motivation
Data mining FS - algorithms
Common evaluation measures for filter feature selection:
- Information Measure (the most common, e.g. mutual information)【1】
- Correlation Measure - more general, insensitive to noisy or outlying data【2】
- Distance Measure
- Consistency Measure
- Fuzzy Set Theory
- Rough Set Theory
Mutual-information-based algorithms:
- minimal-redundancy-maximal-relevance criterion (mRMR)【3】
- mutual information feature selector (MIFS)【4】
- normalized mutual information feature selection (NMIFS)【5】
- mutual information feature selector under uniform information distribution (MIFS-U)【6】
- mutual information-based feature selection with class-dependent redundancy (MIFS-CR)【1】
Drawback:
prone to local optima <- greedy search
A common filter method: the Relief algorithm【7】:
weights -> features - relevance <- distance measure
drawback - ignores redundant features
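The Relief weighting idea above can be sketched as follows; `relief_weights` is an illustrative helper (not code from the paper) using the standard near-hit/near-miss update, which rewards features that separate classes but, as noted, never penalizes redundancy.

```python
import numpy as np

def relief_weights(X, y, n_iter=100, rng=None):
    """Basic Relief: reward features that differ across classes
    (near-miss) and penalize features that vary within a class
    (near-hit). Redundant copies of a good feature score just as well."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)                      # random probe instance
        dists = np.abs(X - X[i]).sum(axis=1)     # Manhattan distances
        dists[i] = np.inf
        same = np.where(y == y[i])[0]
        same = same[same != i]
        diff = np.where(y != y[i])[0]
        near_hit = same[np.argmin(dists[same])]  # closest same-class instance
        near_miss = diff[np.argmin(dists[diff])] # closest other-class instance
        w += np.abs(X[i] - X[near_miss]) - np.abs(X[i] - X[near_hit])
    return w / n_iter
```

A feature perfectly aligned with the class gets a high weight; a feature that is merely a duplicate of it gets the same high weight, which is exactly the redundancy blindness criticized above.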
wrapper methods:
- sequential forward selection (SFS)【8】
- sequential backward selection (SBS)【9】
Problem: once a selection step is done, already selected features cannot be removed in later evaluation stages.
To address this:
the "plus-l-take-away-r" method【10】 - combines SFS and SBS
Details:
- first, l steps of forward selection
- then, r steps of backward elimination
Determining the parameters l and r is the main task - difficult【11】
-> hence two floating FS methods:【12】
- Sequential Forward Floating Selection (SFFS)
- Sequential Backward Floating Selection (SBFS)
-> they determine l and r automatically, but still suffer from local optima
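The plus-l-take-away-r procedure can be sketched generically; `score` stands in for whatever subset-evaluation criterion is used (a hypothetical placeholder, not defined in the paper), and l > r is assumed so the loop terminates.

```python
def plus_l_take_away_r(features, score, k, l=2, r=1):
    """Alternate l greedy forward-selection steps with r greedy
    backward-removal steps until k features are selected (assumes l > r)."""
    selected = []
    while len(selected) < k:
        for _ in range(l):  # plus-l: add the feature that helps most
            remaining = [f for f in features if f not in selected]
            if not remaining:
                break
            best = max(remaining, key=lambda f: score(selected + [f]))
            selected.append(best)
        for _ in range(r):  # take-away-r: drop the feature whose removal hurts least
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            selected.remove(worst)
    return selected[:k]
```

Unlike plain SFS, a feature added early can later be taken away, but a good (l, r) still has to be chosen by hand, which is the difficulty【11】that motivated the floating variants.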
Difficulties:
- very large search space
- feature interactions
Advantages of EC:
- needs no domain knowledge
- makes no assumptions about the search space
filter approaches + EC:
- Genetic Algorithm (GA) + information theory【11,13,14】
- rough set theory and fuzzy set theory along with GA【15,16】
- Particle Swarm Optimization (PSO)【17】
wrapper FS process + EC:
- genetic programming + a tree-based classifier design algorithm【18】
- a Binary Bat Algorithm for FS (BBA-FS) + optimum forest classifier【19】
- two PSO-based single-objective FS algorithms: the commonly used PSO algorithm (ErFS) + PSO with a two-stage fitness function (2SFS)【20】
two conventional approaches for performance comparison:
- linear forward selection (LFS)【21】
- greedy stepwise backward selection (GSBS)【22】
【23】for FS, DE outperforms GA, PSO, Ant Colony Optimization (ACO), and Harmony Search
Multi-objective FS algorithms:
filter FS:
- multi-objective PSO【24】 - information theory
- rough set theory along with multi-objective PSO【25,26】
- a multi-objective filter FS based on ACO + rough set theory【27】
- using NSGA-II【28】
- a multi-objective evolutionary algorithm with class-dependent redundancy for FS (MECY-FS)【1】 - predicts compact feature subsets more effectively; plus a Multi-objective Evolutionary FS algorithm based on the redundancy measure used in MIFS-U (MEFS-U), for comparison purposes only - drawback: the feature subset size is fixed at half the total number of features and cannot be determined automatically
wrapper FS:
- PSO-based【20】 - two objectives: maximize classification accuracy + minimize the number of features
- Differential Evolution (DE)【29】 - minimize classification error + minimize the number of features - showed superiority over single-objective algorithms
- 【30】MOEA/D: maximizing the between-class distance and minimizing the within-class distance
The proposed FAEMODE algorithm uses DE
Two measures are considered: correlation coefficient and mutual information
-> not only the linear dependency but also the non-linear dependency among features is considered
# 2 Materials and methods
## 2.1. Differential Evolution (DE)【31】
## 2.2. Multi-objective optimization
## 2.3. Elitism based Multi-objective Differential Evolution
A few modifications over conventional DE:
the elitism principle of NSGA-II【32】
in the mutation step of DE: random selection -> replaced by the elitism principle
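A minimal sketch of how the elitism principle can replace random base-vector selection in DE mutation, assuming a DE/rand/1-style scheme and a precomputed set of elite (non-dominated) indices; `elitist_mutation` and its parameters are illustrative, not the paper's exact formulation.

```python
import numpy as np

def elitist_mutation(pop, elite_idx, F=0.5, rng=None):
    """DE/rand/1-style mutation where the base vector is drawn from the
    elite (non-dominated) solutions instead of the whole population."""
    rng = np.random.default_rng(rng)
    NP, D = pop.shape
    mutants = np.empty_like(pop)
    for i in range(NP):
        base = pop[rng.choice(elite_idx)]               # elite base vector
        r1, r2 = rng.choice(NP, size=2, replace=False)  # random difference pair
        mutants[i] = base + F * (pop[r1] - pop[r2])
    return np.clip(mutants, 0.0, 1.0)  # keep genes in the [0, 1] search range
```

Biasing the base vector toward the current Pareto set pulls trial vectors toward good regions while the random difference pair preserves DE's exploration.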
## 2.4. Correlation measuring tools
Selection of optimal features with the least redundancy among them:
- how relevant the selected features are to the target class - maximize
- how different the selected features are from one another - minimize redundancy
Correlation or dependency among features - mutual information and correlation coefficient -> the most widely used measures
### 2.4.1. Correlation coefficient
Pearson correlation coefficient (PCC) - quantifies the linear dependency between two variables
$cov$ - covariance
$\sigma$ - standard deviation
$PCC(X, Y) = \dfrac{cov(X, Y)}{\sigma_X \sigma_Y} \in [-1, 1]$
completely independent -> 0
perfectly positively (negatively) correlated -> 1 (-1)
Main limitation: only sensitive to linear relationships
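The PCC defined above can be computed directly; this is a generic NumPy illustration, not code from the paper.

```python
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = ((x - x.mean()) * (y - y.mean())).mean()  # population covariance
    return cov / (x.std() * y.std())
```

A perfectly linear relation gives ±1, while a symmetric non-linear one such as y = x² over a symmetric range gives 0, which is exactly the stated limitation.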
### 2.4.2. Mutual information
The most commonly used measure in information theory; it quantifies the amount of information shared by two variables.
Unlike the PCC, Mutual Information (MI) is sensitive to non-linear relationships as well, making it more general and robust.
$X, Y$ - two discrete random variables
$p(x, y)$ - joint probability distribution function of $X$ and $Y$
$P(x), P(y)$ - marginal distribution functions
$MI(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \dfrac{p(x, y)}{P(x) P(y)}$
completely independent -> 0
completely dependent -> 1
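MI between two discrete variables can be estimated from their empirical joint distribution; a minimal sketch using base-2 logarithms, a convention chosen here (not necessarily the paper's) under which two identical fair binary variables share exactly 1 bit.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Estimate MI(X;Y) = sum_{a,b} p(a,b) * log2(p(a,b) / (p(a) p(b)))
    from the empirical distributions of two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))          # empirical joint distribution
    px, py = Counter(x), Counter(y)   # empirical marginals
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        p_a, p_b = px[a] / n, py[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi
```

Independent variables contribute log(1) = 0 in every term, so the estimate is 0; full dependence concentrates the joint mass and maximizes the sum.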
# 3 Concepts in FAEMODE
## 3.1. Objective formulation
Features:
- maximum relevance
- minimum redundancy
$x_i$ - an individual feature
$c$ - the class
$D$ - relevance between a feature subset Fs and the class $c$; to be maximized
$R$ - linear and non-linear dependency (redundancy) among the features; to be minimized
## 3.2. Selecting the optimal solution
one validity index -> good compact clusters
Drawback: it may not account for the importance of all objectives
-> a fuzzy concept【33】 - acts as the Decision Maker
membership function - membership value
$\mu_i$ - membership value of objective $i$
$PF_i^{max}, PF_i^{min}$ - maximum and minimum values of objective function $i$
$k$ - a non-dominated solution
$\mu^k$ - normalized membership value: the larger the value, the better the compromise solution
$N_{Pareto}$ - number of non-dominated solutions in the Pareto front
$M$ - number of objective functions
Example:
3 non-dominated solutions - $N_{Pareto} = 3$
2 objectives - $M = 2$
Find the maximum and minimum of each objective function - $PF_1^{max}, PF_2^{max}, PF_1^{min}, PF_2^{min}$
Then compute the membership values with Eq. 13
Normalize with Eq. 14; e.g., for $k = 2$, $\mu^k$ is computed from the membership values of solution 2.
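The formulas referenced as Eq. 13 and Eq. 14 are not reproduced in these notes; in【33】, which this selection scheme follows, they take the standard form below (reconstructed here, so verify against the paper itself):

```latex
\mu_i^{k} =
\begin{cases}
1, & PF_i^{k} \le PF_i^{min} \\[4pt]
\dfrac{PF_i^{max} - PF_i^{k}}{PF_i^{max} - PF_i^{min}}, & PF_i^{min} < PF_i^{k} < PF_i^{max} \\[4pt]
0, & PF_i^{k} \ge PF_i^{max}
\end{cases}
\qquad \text{(Eq. 13)}

\mu^{k} = \frac{\sum_{i=1}^{M} \mu_i^{k}}{\sum_{k=1}^{N_{Pareto}} \sum_{i=1}^{M} \mu_i^{k}}
\qquad \text{(Eq. 14)}
```

Each solution's memberships are summed over the $M$ objectives and normalized over all $N_{Pareto}$ non-dominated solutions; the solution with the largest $\mu^k$ is the best compromise.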
# 4. The FAEMODE algorithm for feature selection
## 4.1. The proposed algorithm
## 4.2. Population representation and initialization
Population:
$P_{gn} = \{ X_{1,gn}, \ldots, X_{NP,gn} \}$
Individual:
$X_{i,gn} = \{ x^1_{i,gn}, \ldots, x^D_{i,gn} \}$
Uniform initialization - covers the whole search space
Lower and upper bounds:
$X_{LB} = \{ x^1_{LB}, \ldots, x^D_{LB} \}$
$X_{UB} = \{ x^1_{UB}, \ldots, x^D_{UB} \}$
here $[0, 1]$
$D$ - total number of features
each solution encodes one combination of features
threshold - $0.5$
$> 0.5$ - the feature is selected
## 4.3. Fitness evaluation
Example:
$DATA$ - dataset
$D = 5$
$n$ - number of instances
If $x^1_{i,gn}, x^3_{i,gn}, x^4_{i,gn}$ are greater than the threshold $0.5$, then features 1, 3, and 4 are selected.
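The decoding convention above (a gene above the 0.5 threshold selects the corresponding feature) can be sketched as follows; `decode` is an illustrative helper, with 1-indexed output to match the $x^j$ notation.

```python
def decode(individual, threshold=0.5):
    """Map a real-valued individual in [0,1]^D to the selected feature
    indices (1-indexed, matching the x^j notation)."""
    return [j + 1 for j, gene in enumerate(individual) if gene > threshold]

x = [0.9, 0.2, 0.7, 0.6, 0.1]  # D = 5
print(decode(x))               # -> [1, 3, 4]
```

The columns of the dataset picked out this way form the candidate subset on which the two objectives (relevance D and redundancy R) are then evaluated.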
## 4.4. Termination condition
the maximum number of generations (GN)
# 5. Experiments on datasets and discussion of results
- 5 filter-based methods: mRMR, MIFS, NMIFS, MIFS-U, MIFS-CR
- multi-objective EA + filter methods: MECY-FS, MEFS-U
- 2 conventional wrapper methods + single-objective and bi-objective metaheuristic wrappers
## 5.1. Datasets and parameter settings of FAEMODE
23 benchmark datasets
9 of them selected
Validation method:
10 Fold Cross Validation (FCV)
Performance evaluation:
Waikato Environment for Knowledge Analysis (Weka)【34】
The most commonly used classification algorithms, used for comparison:
- K-nearest neighbor (KNN)【35】
- Naïve Bayes (NB)【36】
- Radial Basis Function Neural Network (RBFNN)【37】
- C4.5【38】
1NN - for comparison with filter algorithms
5NN - for comparison with wrapper algorithms
To evaluate the filter algorithms:
the area under the curve (AUC) of the receiver operating characteristics (ROC)
this metric is considered better than accuracy【39】
wrapper methods evaluated by - accuracy
## 5.2. Experimental results and analysis
40 independent runs
results of the other algorithms taken from【1,30】
10-fold cross validation
The selected feature subsets are validated with the classifiers KNN, NB, RBFNN, and C4.5; additionally, the percentage of feature reduction is compared.
For each dataset, the result of one run is chosen and visualized.
Tables 3-10: comparison with filter FS methods; AUC-based rank in parentheses; also mean AUC, number of times the best AUC was obtained, and standard deviation (the number after $\pm$)
Tables 11-14: comparison with wrapper methods - mean classification accuracy
Table 15 - sizes of the feature subsets obtained by FAEMODE
### 5.2.1. Comparison with conventional filter FS methods
KNN, NB, RBFNN, and C4.5
Tables 3, 4, 5, 6
based on the obtained feature subsets
### 5.2.2. Comparison with multi-objective FS methods
MECY-FS and MEFS-U
Tables 7, 8, 9, 10
### 5.2.3. Comparison with wrapper FS methods
conventional methods: SFS and SBS - Table 11
single-objective: BBA-FS - Table 12
multi-objective: DEMOFS and MOEA/D-FS - Tables 13, 14
mean classification accuracy
10 FCV with the KNN (K = 5) classifier
### 5.2.4. Comparison based on the percentage of feature reduction
Table 15
# 6 Conclusion
a Filter Approach using Elitism based Multi-objective Differential Evolution for feature selection (FAEMODE)
two objectives
linear and non-linear dependency
# References
【1】Wang, Z., Li, M., Li, J., 2015. A multi-objective evolutionary algorithm for feature selection based on mutual information with a new redundancy measure. Inf. Sci. 307, 73–88.
http://dx.doi.org/10.1016/j.ins.2015.02.031.
【2】Huang, J., Cai, Y., Xu, X., 2007. A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recogn. Lett. 28, 1825–1844.
http://dx.doi.org/10.1016/j.patrec.2007.05.011.
【3】Peng, H., Long, F., Ding, C., 2005. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226–1238.
http://dx.doi.org/10.1109/TPAMI.2005.159.
【4】Battiti, R., 1994. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 5, 537–550.
http://dx.doi.org/10.1109/72.298224.
【5】Estevez, P.A., Tesmer, M., Perez, C.A., Zurada, J.M., 2009. Normalized mutual information feature selection. IEEE Trans. Neural Netw. 20, 189–201.
http://dx.doi.org/10.1109/TNN.2008.2005601.
【6】Kwak, N., Choi, Chong.-Ho., 2002. Input feature selection for classification problems. IEEE Trans. Neural Netw. 13, 143–159.
http://dx.doi.org/10.1109/72.977291.
【7】Kira, K., Rendell, L.A., 1992. A practical approach to feature selection. Proc. Ninth Int. Workshop Mach. Learn., 249–256
【8】Whitney, A.W., 1971. A direct method of nonparametric measurement selection. IEEE Trans. Comput. 100, 1100–1103.
http://dx.doi.org/10.1109/TC.1971.223410.
【9】Marill, T., Green, D., 1963. On the effectiveness of receptors in recognition systems. IEEE Trans. Inf. Theory 9, 11–17.
http://dx.doi.org/10.1109/TIT.1963.1057810.
【10】Stearns, S.D., 1976. On selecting features for pattern classifiers, in: Proceedings of the 3rd International Joint Conference on Pattern Recognition. pp. 71–75.
【11】Xue, B., Zhang, M., Browne, W.N., Yao, X., 2016. A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20, 606–626.
http://dx.doi.org/10.1109/TEVC.2015.2504420.
【12】Pudil, P., Novovičová, J., Kittler, J., 1994. Floating search methods in feature selection. Pattern Recogn. Lett. 15, 1119–1125.
http://dx.doi.org/10.1016/0167-8655(94)90127-9.
【13】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2013b. Multi-objective evolutionary algorithms for filter based feature selection in classification. Int. J. Artif. Intell. Tools 22, 1350024.
http://dx.doi.org/10.1142/S0218213013500243.
【14】Spolaôr, N., Lorena, A.C., Lee, H.D., 2011. Multi-objective Genetic Algorithm Evaluation in Feature Selection. Springer, Berlin, Heidelberg, pp. 462–476.
doi:10.1007/978-3-642-19893-9_32.
【15】Banerjee, M., Mitra, S., Banka, H., 2007. Evolutionary rough feature selection in gene expression data. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 37, 622–632.
doi:10.1109/TSMCC.2007.897498.
【16】Chakraborty, B., 2002. Genetic algorithm with fuzzy fitness function for feature selection, in: Proceedings of the IEEE International Symposium on Industrial Electronics ISIE-02. IEEE, pp. 315–319. vol. 1.
doi:10.1109/ISIE.2002.1026085.
【17】Chakraborty, B., 2008. Feature subset selection by particle swarm optimization with fuzzy fitness function, in: 2008 3rd International Conference on Intelligent System and Knowledge Engineering. IEEE, pp. 1038–1042.
doi:10.1109/ISKE.2008.4731082.
【18】Muni, D.P., Pal, N.R., Das, J., 2006. Genetic programming for simultaneous feature selection and classifier design. IEEE Trans. Syst. Man Cybern. Part B 36, 106–117.
http://dx.doi.org/10.1109/TSMCB.2005.854499.
【19】Nakamura, R.Y.M., Pereira, L.A.M., Costa, K.A., Rodrigues, D., Papa, J.P., Yang, X.-S., 2012. BBA: A Binary Bat Algorithm for Feature Selection, in: 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images. IEEE, pp. 291–297.
doi:10.1109/SIBGRAPI.2012.47.
【20】Xue, B., Zhang, M., Browne, W.N., 2013a. Particle swarm optimization for feature selection in classification: a multi-objective approach. IEEE Trans. Cybern. 43, 1656–1671.
http://dx.doi.org/10.1109/TSMCB.2012.2227469.
【21】Gutlein, M., Frank, E., Hall, M., Karwath, A., 2009. Large-scale attribute selection using wrappers, in: 2009 IEEE Symposium on Computational Intelligence and Data Mining. IEEE, pp. 332–339.
doi:10.1109/CIDM.2009.4938668.
【22】Caruana, R., Freitag, D., 1994. Greedy attribute selection. Proc. Eighth Int. Conf. Mach. Learn., 28–36
【23】Wang, L., Ni, H., Yang, R., Pappu, V., Fenn, M.B., Pardalos, P.M., 2014. Feature selection based on meta-heuristics for biomedicine. Optim. Meth. Softw. 29, 703–719.
http://dx.doi.org/10.1080/10556788.2013.834900.
【24】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2012. A multi-objective particle swarm optimisation for filter-based feature selection in classification problems. Connection Sci. 24, 91–116. http://dx.doi.org/10.1080/09540091.2012.737765.
【25】Cervante, L., Xue, B., Shang, L., Zhang, M., 2013. A Multi-objective Feature Selection Approach Based on Binary PSO and Rough Set Theory. Springer, Berlin, Heidelberg, pp. 25–36.
doi:10.1007/978-3-642-37198-1_3.
【26】Xue, B., Cervante, L., Shang, L., Browne, W.N., Zhang, M., 2014. Binary PSO and rough set theory for feature selection: a multi-objective filter based approach. Int. J. Comput. Intell. Appl. 13, 1450009.
http://dx.doi.org/10.1142/S1469026814500096.
【27】Ke, L., Feng, Z., Xu, Z., Shang, K., Wang, Y., 2010. A multiobjective ACO algorithm for rough feature selection, in: 2010 Second Pacific-Asia Conference on Circuits, Communications and System. IEEE, pp. 207–210.
doi:10.1109/PACCS.2010.5627071.
【28】Hamdani, T.M., Won, J.-M., Alimi, A.M., Karray, F., 2007. Multi-objective Feature Selection with NSGA II, in: Adaptive and Natural Computing Algorithms. Springer, Berlin Heidelberg, Berlin, Heidelberg, pp. 240–247.
doi:10.1007/978-3-540-71618-1_27.
【29】Xue, B., Fu, W., Zhang, M., 2014b. Multi-objective Feature Selection in Classification: A Differential Evolution Approach. Springer International Publishing, pp. 516–528.
doi:10.1007/978-3-319-13563-2_44.
【30】Paul, S., Das, S., 2015. Simultaneous feature selection and weighting – an evolutionary multi-objective optimization approach. Pattern Recogn. Lett. 65, 51–59.
http://dx.doi.org/10.1016/j.patrec.2015.07.007.
【31】Storn, R., Price, K., 1997. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11, 341–359.
http://dx.doi.org/10.1023/A:1008202821328.
【32】Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182–197.
http://dx.doi.org/10.1109/4235.996017.
【33】Abido, M.A., 2003. A novel multiobjective evolutionary algorithm for environmental/economic power dispatch. Electr. Pow. Syst. Res. 65, 71–81.
http://dx.doi.org/10.1016/S0378-7796(02)00221-3.
【35】Aha, D.W., Kibler, D., Albert, M.K., 1991. Instance-based learning algorithms. Mach. Learn. 6, 37–66.
http://dx.doi.org/10.1007/BF00153759.
【36】John, G.H., Langley, P., 1995. Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, pp. 338–345.
【37】Moody, J., Darken, C.J., 1989. Fast learning in networks of locally-tuned processing units. Neural Comput. 1, 281–294.
http://dx.doi.org/10.1162/neco.1989.1.2.281.
【38】Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kauffman, San Mateo, CA.
【39】Ling, C.X., Huang, J., Zhang, H., 2003. AUC: A Better Measure than Accuracy in Comparing Learning Algorithms, in: Advances in Artificial Intelligence. Springer, Berlin, Heidelberg, pp. 329–341.
doi:10.1007/3-540-44886-1_25.