[Computer Science] [2017.05] Feature Selection with Deep Neural Networks

This is a master's thesis by Nicolas Vecoven at the University of Liège, Belgium; it runs to 77 pages.

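Variable and feature selection have become the focus of much research, especially in bioinformatics, where applications abound. Machine learning is a powerful tool for selecting features, but not all machine learning algorithms are on an equal footing when it comes to feature selection. Many methods have been proposed for feature selection with random forests, which has made them the current go-to model in bioinformatics.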

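Meanwhile, thanks to so-called deep learning, neural networks have seen a huge resurgence of interest over the past few years. Neural networks are black-box models, however, and very few attempts have been made to analyse their underlying process. Quite a few articles cover feature extraction with neural networks (which does not require understanding the underlying input-output process), while very few tackle feature selection.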

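In this work, we propose new algorithms for feature selection with deep neural networks. To assess our results, we generate regression and classification problems that let us compare each algorithm on several fronts: performance, computation time, and constraints. The results are promising: our methods match or surpass random forests, our chosen state-of-the-art baseline, in every case. Encouraged by these results on artificial datasets, we also tackle the DREAM4 challenge. Because very few samples are available in its datasets, this challenge is supposedly ill-suited to neural networks; we nevertheless achieve near state-of-the-art results.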

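Finally, extensions are given for most of the methods we study. The algorithms discussed are highly modular and can be adapted to the problem at hand. For example, we explain how one of our algorithms can be adapted to prune neural networks without losing accuracy.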

English abstract: Variable and feature selection have become the focus of much research, especially in bioinformatics where there are many applications. Machine learning is a powerful tool to select features, however not all machine learning algorithms are on an equal footing when it comes to feature selection. Indeed, many methods have been proposed to carry out feature selection with random forests, which makes them the current go-to model in bioinformatics. On the other hand, thanks to the so-called deep learning, neural networks have benefited from a huge resurgence of interest in the past few years. However, neural networks are black-box models and very few attempts have been made to analyse the underlying process. Indeed, quite a few articles can be found about feature extraction with neural networks (for which the underlying input-output process does not need to be understood), while very few tackle feature selection. In this document, we propose new algorithms in order to carry out feature selection with deep neural networks. To assess our results, we generate regression and classification problems which allow us to compare each algorithm on multiple fronts: performance, computation time and constraints. The results obtained are really promising since we manage to achieve our goal by surpassing (or equaling) random forests' performance in every case (which was set to be our state-of-the-art comparison). Due to the promising results obtained on artificial datasets we also tackle the DREAM4 challenge. Due to the very small number of samples available in the datasets, this challenge is supposedly an ill-suited problem for neural networks. We were nevertheless able to achieve near state-of-the-art results. Finally, extensions are given for most of our methods. Indeed, the algorithms discussed are very modular and can be adapted to the problem faced. For example, we explain how one of our algorithms can be adapted in order to prune neural networks without losing accuracy.
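The abstract does not say how the artificial problems are built or scored, so the following is only a rough sketch of the general idea of benchmarking feature selection on synthetic data where the relevant features are known in advance. Everything here is an assumption made for illustration: the generator `make_regression_problem`, the `precision_at_k` metric, and the correlation-based scorer are hypothetical and are not the thesis's algorithms or its hypercube datasets. The scorer is a deliberately simple stand-in; a random forest or neural-network importance measure would take its place in a real comparison.

```python
import random


def make_regression_problem(n_samples=500, n_features=20, n_relevant=3,
                            noise=0.1, seed=0):
    """Toy generator (hypothetical): only the first n_relevant features drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    # The target is the sum of the relevant features plus Gaussian noise.
    y = [sum(row[:n_relevant]) + rng.gauss(0.0, noise) for row in X]
    return X, y, set(range(n_relevant))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / (var_a * var_b) ** 0.5


def correlation_scores(X, y):
    """Stand-in scorer: |correlation| between each feature column and y."""
    return [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]


def precision_at_k(scores, relevant, k):
    """Fraction of the k highest-scoring features that are truly relevant."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sum(1 for j in ranked[:k] if j in relevant) / k


if __name__ == "__main__":
    X, y, relevant = make_regression_problem()
    scores = correlation_scores(X, y)
    print("precision@3:", precision_at_k(scores, relevant, 3))
```

Because the ground-truth relevant set is known, any scoring method can be dropped in place of `correlation_scores` and compared on the same metric, which is the kind of side-by-side comparison the abstract describes.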

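1 Introduction
2 Review of deep neural networks and motivation for feature selection
3 Methods and explanations
4 Application: inference of gene regulation
5 Conclusion and future work
Appendix A Hypercube dataset generation
Appendix B Hardware and software details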

Download the original English text:

http://page5.dfpan.com/fs/elc4j2e21f2951667a7/
