Deep Neural Networks for High Dimension, Low Sample Size Data
Publication: IJCAI’17: Proceedings of the 26th International Joint Conference on Artificial Intelligence, August 2017
Code
GBFS algorithm: http://www.cse.wustl.edu/˜xuzx/research/code/GBFS.zip (link no longer reachable)
HSIC-Lasso code: http://www.makotoyamada-ml.com/software.html (the page has expired)
Dataset
Biological datasets: http://featureselection.asu.edu/datasets.php
Introduction
In bioinformatics, gene expression data suffers from the growing challenges of high dimensionality and low sample size. This kind of high dimension, low sample size (HDLSS) data is also vital for scientific discoveries in other areas such as chemistry and financial engineering [Fan and Li, 2006]. When processing this kind of data, severe overfitting and high-variance gradients are the major challenges for the majority of machine learning algorithms [Friedman et al., 2000].
Feature selection has been widely regarded as one of the most powerful tools to analyze the HDLSS data. However, selecting the optimal subset of features is known to be NP-hard [Amaldi and Kann, 1998]. Instead, a large body of compromised methods for feature selection have been proposed.
- Lasso [Tibshirani, 1996] pursues sparse linear models: sparse linear models ignore the nonlinear input-output relations and the interactions among features.
- Nonlinear feature selection via kernel methods [Li et al., 2005; Yamada et al., 2014] or gradient boosted trees: these address the curse of dimensionality only under the blessing of a large sample size.
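To make the sparsity point above concrete, here is a minimal scikit-learn sketch (not the paper's code; the synthetic data and the `alpha` value are made-up assumptions) showing Lasso zeroing out most weights in an HDLSS regime:

```python
# Minimal sketch (assumed setup, not from the paper): Lasso on HDLSS data
# with d=100 features but only n=20 samples, where only 2 features matter.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n, d = 20, 100                      # low sample size, high dimension
X = rng.randn(n, d)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(n)  # 2 true features

model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)   # indices of nonzero weights
print(len(selected))                     # sparse: far fewer than d=100
```

The l_1 penalty drives most of the 100 coefficients exactly to zero, which is why Lasso is usable here at all, but the model remains linear in the inputs.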
Deep neural network (DNN) methods light up new scientific discoveries. DNNs have achieved breakthroughs in modeling nonlinearity across a wide range of applications. The deeper the architecture of a DNN is, the more complex the relations it can model. DNNs have harvested initial successes in bioinformatics for modeling splicing [Xiong et al., 2015] and sequence specificity [Alipanahi et al., 2015]. However, estimating the huge number of parameters of a DNN may suffer from severe overfitting even with abundant samples, not to mention in the HDLSS setting.
To address the challenges of the HDLSS data, we propose an end-to-end DNN model called Deep Neural Pursuit (DNP). DNP simultaneously selects features and learns a classifier to alleviate the severe overfitting caused by high dimensionality. By averaging over multiple dropouts, DNP is robust against the high-variance gradients resulting from the small sample size. From the perspective of feature selection, the DNP model selects features greedily and incrementally, similar to matching pursuit [Pati et al., 1993]. More concretely, starting from an empty subset of features and a bias, the proposed DNP method incrementally selects an individual feature according to the backpropagated gradients. Meanwhile, whenever new features are selected, DNP is updated using the backpropagation algorithm.
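The greedy loop described above can be sketched as follows. This is a simplified, illustrative NumPy version, not the paper's implementation: a plain logistic model stands in for the deep network, and the dropout rate, learning rate, and feature counts are arbitrary assumed values. Gradients with respect to all input weights are averaged over multiple dropout masks, the unselected feature with the largest gradient magnitude is added, and the selected weights are then refit by gradient descent.

```python
# Simplified DNP-style greedy selection (illustrative sketch only; a logistic
# model replaces the deep network, hyperparameters are assumed).
import numpy as np

rng = np.random.RandomState(0)
n, d, k = 30, 200, 5               # HDLSS: n << d; select k features
X = rng.randn(n, d)
y = (X[:, [3, 50, 120]].sum(axis=1) > 0).astype(float)  # 3 relevant features

def grads(w, b, mask):
    """Logistic-loss gradients with an (inverted) dropout mask on the inputs."""
    Xd = X * mask
    z = np.clip(Xd @ w + b, -30.0, 30.0)   # clip logits for numerical safety
    err = 1.0 / (1.0 + np.exp(-z)) - y
    return Xd.T @ err / n, err.mean()

selected, w, b = [], np.zeros(d), 0.0      # start from an empty subset + bias
for _ in range(k):
    # 1) average input-weight gradients over multiple dropout draws
    g = np.zeros(d)
    for _ in range(10):
        mask = (rng.rand(d) < 0.5) / 0.5   # keep prob. 0.5, inverted scaling
        g += grads(w, b, mask)[0] / 10
    # 2) greedily add the unselected feature with the largest |gradient|
    g[np.asarray(selected, dtype=int)] = 0.0
    selected.append(int(np.argmax(np.abs(g))))
    # 3) refit the weights of the selected features by gradient descent
    for _ in range(200):
        gw, gb = grads(w, b, np.ones(d))
        w[selected] -= 0.5 * gw[selected]
        b -= 0.5 * gb
print(sorted(selected))
```

The key design choice mirrored here is that candidate features are ranked by their backpropagated gradient magnitudes (averaged over dropout draws to tame variance) while only the already-selected weights are ever updated.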
The main contribution of this paper is to tailor the DNN for the HDLSS setting using feature selection and multiple dropouts.
Related Work
We discuss feature selection methods used to analyze HDLSS data, including linear, nonlinear, and incremental methods.
- Sparsity-inducing regularizers are among the dominant feature selection methods for HDLSS data.
Lasso [Tibshirani, 1996] minimizes an objective function penalized by the l_1 norm of the feature weights, leading to a sparse model. Unfortunately, Lasso ignores nonlinearity and interactions among features.
- (1) Kernel methods are often used for nonlinear feature selection:
Feature Vector Machine (FVM) [Li et al., 2005];
HSIC-Lasso [Yamada et al., 2014] improves FVM by allowing different kernel functions for features and labels;
LAND [Yamada et al., 2016] further accelerates HSIC-Lasso for data with a large sample size via kernel approximation and distributed computation.
(2) Decision tree models are also qualified for modeling nonlinear input-output relations.
random forests [Breiman, 2001]
Gradient boosted feature selection (GBFS) [Xu et al., 2014]
The aforementioned nonlinear methods, including FVM, random forests, and GBFS, require training data with a large sample size.
HSIC-Lasso and LAND fit the HDLSS setting. However, compared to the proposed DNP model, which is end-to-end, HSIC-Lasso and LAND are two-stage algorithms that separate feature selection from classification.
- Besides the DNP method, there exist other greedy and incremental feature selection algorithms.
**SpAM:** sequentially selects an individual feature in an additive manner, thereby missing important interactions among features.
**Grafting & convex neural networks:** only consider a single hidden layer, and differ from DNP in motivation (Grafting focuses on accelerating algorithms, while convex neural networks focus on the theoretical understanding of neural networks).
- **Deep feature selection (DFS):** selects features in the context of DNNs. However, according to our experiments, DFS fails to achieve sparse connections when facing HDLSS data.
DNP Model
Introduce notations: $F \in \mathbb{R}^{d}$ — the d-dimensional input feature space;
$X = (X_{1}, X_{2}, \dots, X_{n})$, $y = (y_{1}, y_{2}, \dots, y_{n})^{T}$