Paper title: Recent Advances of Deep Learning in Bioinformatics and Computational Biology
Scholar citations: 2
Pages: 10
Published: 2019.03
Journal: Frontiers in Genetics
Authors: Binhua Tang, Zixiang Pan, Kang Yin and Asif Khateeb
Keywords: computational biology, bioinformatics, application, algorithm, deep learning
Abstract:
Extracting inherent valuable knowledge from omics big data remains a daunting problem in bioinformatics and computational biology. Deep learning, as an emerging branch of machine learning, has exhibited unprecedented performance in quite a few applications from academia and industry. We highlight the differences and similarities among widely utilized models in deep learning studies by discussing their basic structures and reviewing diverse applications and disadvantages. We anticipate that the work can serve as a meaningful perspective for further development of its theory, algorithms and applications in bioinformatics and computational biology.
Conclusions:
- How to decipher and characterize data features is an important task in the deep-learning workflow.
- Integrating or embedding deep learning with other conventional algorithms to tackle complex geometric transform tasks is a new direction.
- Deep learning has achieved pervasive success in bioinformatics and computational biology.
- Deep learning should be neither misinterpreted nor overestimated, whether in academia or in the AI industry.
Introduction:
- Deep learning is the emerging generation of artificial intelligence techniques, specifically within machine learning.
- The basic concepts and models of deep learning derive from the artificial neural network.
- This review summarizes the essential concepts and recent applications of deep learning, and highlights its key achievements and future directions, especially from the perspectives of bioinformatics and computational biology.
Structure of the main text:
1. Essential concepts in deep neural network
- basic structure of neural network
- learning by training, validation and testing
- activation and loss function
2. Typical algorithms and applications
- recurrent neural network
- convolutional neural network
- autoencoder
- deep belief network
- transfer learning in deep learning
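Excerpts from the main text:
- The figures in this paper do not look very clean, probably because too much background color fill was used? It gets in the way of reading a little.
- A neural network is a class of information processing modules, frequently utilized in machine learning.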
- the input raw datasets are usually separated into two or three groups.
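- In the paper, validation sets do not seem to be required, but as I understand it, without a validation set, would hyperparameters be tuned based on the loss function alone? The "validation dataset" is mainly used to evaluate the model while tuning hyperparameters and preparing data, whereas the "test dataset" is mainly used for evaluation when comparing several final models. When a resampling method such as k-fold cross-validation is used, especially when the resampling is already nested inside model validation, the distinction between "validation dataset" and "test dataset" may blur. https://blog.csdn.net/JNingWei/article/details/78170171 This post also says that the validation set is required.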
- model parameters and their characteristics normally can be tuned by various learning paradigms.
- Graphics Processing Unit (GPU)
- An RNN has only one hidden layer, but it can be unfolded horizontally, and the resulting vertical groups can reuse most of the previous results, namely "using memory".
- RNN is suitable for long, sequential data, such as DNA arrays and genomic sequences.
- However, an RNN cannot interact with hidden neurons far from the current one.
- To construct an efficient framework of recalling deep memory, many improved algorithms have been proposed, like BRNN in protein secondary structure prediction, and MDRNN in analyzing electron microscopy and MRIs of breast cancer samples.
- LSTM and GRU are two recently improved derivatives of RNN that address the long-term dependency issue.
- DeepDiff can effectively predict cell-type-specific gene expression.
- CNNs are suitable for processing information in the form of multiple arrays.
- Recently, CNN has been adopted rapidly in biomedical imaging studies.
- DeepChrome is proposed to predict gene expression by feature extraction from histone modification.
- The autoencoder is another typical artificial neural network, designed to precisely extract coding or representation features in an unsupervised manner using data-driven learning.
- An autoencoder compresses and encodes information from the input layer into a short code and then, after specific processing, decodes it into an output closely matching the original input.
- It is similar to traditional PCA in dimension reduction to some extent, but an autoencoder is more robust and effective in extracting data features thanks to the non-linear transformations in its hidden layers.
- stacked sparse autoencoder (SSAE) was proposed to analyze high-resolution histopathological images in breast cancer
- The applications in bioinformatics of the various machine learning methods covered in this paper can be organized into a table.
- Besides the above deep learning models, transfer learning is frequently utilized in specific cases without sufficient labeling information or dimensionality.
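To make the earlier note on training, validation and test sets more concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the function names `train_val_test_split` and `k_fold_indices`, the default fractions and the seed are all assumptions chosen for illustration. The first function produces a simple three-way hold-out split; the second yields the index pairs used in k-fold cross-validation, where every sample serves as validation data exactly once.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and split them into (train, val, test) lists.

    The fractions and seed are illustrative defaults, not values from the paper.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size
```

In this sketch the test list is set aside once and never used for tuning, while the validation list (or each validation fold) is what hyperparameter choices would be compared on. In practice the indices would normally be shuffled before calling `k_fold_indices`.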
Typical Algorithms | Specific methods | Applications |
Recurrent Neural Network (RNN) | - | DNA arrays and genomics sequence |
Recurrent Neural Network (RNN) | BRNN | protein secondary structure prediction |
Recurrent Neural Network (RNN) | MDRNN | analyzing electron microscopy and MRIs of breast cancer |
Recurrent Neural Network (RNN) | DeepDiff | predict cell-type-specific gene expression |
Convolutional Neural Network (CNN) | - | biomedical imaging studies |
Convolutional Neural Network (CNN) | - | CT scans and MRI images from head trauma, stroke diagnosis and brain EPV (enlarged perivascular space) detection |
Convolutional Neural Network (CNN) | DeepChrome | predict gene expression |
Convolutional Neural Network (CNN) | CNN and RNN | predict imaging content |
Autoencoder | SSAE (stacked sparse autoencoder) | analyze high-resolution histopathological images in breast cancer |
Autoencoder | SAE (sparse autoencoder) | prediction of protein secondary structure, local backbone angles, and solvent accessible surface area |
Autoencoder | denoising autoencoder (DAE) | predict features from a large scale of electronic health records (EHR) |
Autoencoder | randomized denoising autoencoder marker (rDAm) | predict future cognitive and neural decline for Alzheimer diseases |
Deep Belief Network (DBN) | - | classified schizophrenia patients based on brain MRIs |
Deep Belief Network (DBN) | - | perform quantitative structure activity relationship (QSAR) study |
Deep Belief Network (DBN) | - | study the combination of resting-state fMRI (rs-fMRI), gray matter, and white matter data by exploiting the latent and abstract high-level features |
Deep Belief Network (DBN) | - | medical image diagnosis |
Transfer Learning in Deep Learning | Ensembled with CNN | attain greater prediction performance of interstitial lung disease CT scans |
Transfer Learning in Deep Learning | multi-layer LSTM and conditional random field (CRF) | target datasets |
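
The excerpts above describe an RNN as a single hidden layer unfolded over the sequence, which "uses memory" but struggles to reach states far from the current one. The sketch below illustrates that idea with a deliberately simplified one-unit recurrent cell in plain Python. It is an assumption-laden toy, not the formulation used in the paper: the function name `rnn_forward` and the scalar parameters `w_in`, `w_rec` and `bias` are hypothetical, and a real RNN would use weight matrices and vector-valued states.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a one-unit recurrent cell over a sequence and return every hidden state.

    The same three parameters are reused at every step ("unfolding"):
        h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    """
    h = h0
    states = []
    for x in inputs:
        # The previous hidden state h carries the "memory" into this step.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# A single impulse at the first step followed by zeros: the hidden state
# still remembers the impulse, but its trace shrinks at every step.
impulse_trace = rnn_forward([1.0, 0.0, 0.0, 0.0], w_in=1.0, w_rec=0.5, bias=0.0)
```

With these illustrative weights, each later entry of `impulse_trace` is smaller than the one before it, which is a small-scale picture of why distant inputs have little influence on the current state and why gated variants such as LSTM and GRU were introduced.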