2021-09-09 Literature reading: Predicting circRNA-disease associations based on autoencoder and graph embedding

Predicting circRNA-disease associations based on autoencoder and graph embedding
Article link: https://doi.org/10.1016/j.ins.2021.04.073

Abstract

Circular RNAs (circRNAs) are a special kind of non-coding RNA.
They play an important regulatory role in diseases through interactions with disease-associated miRNAs.
Due to their insensitivity to nucleases, they are more stable than linear RNAs.
It is thus imperative to integrate available information for predicting circRNA-disease associations in humans.

Here, we propose a computational model to predict circRNA-disease associations based on the accelerated attributed network embedding (AANE) algorithm and an autoencoder (AE).

  • First, we use the AANE algorithm to extract low-dimensional features of circRNAs and diseases,
  • and then a stacked autoencoder (SAE) to automatically extract in-depth features.
  • The features obtained by the AANE algorithm and the SAE are integrated,
  • and XGBoost is used as a binary classifier to produce the predicted results.

The proposed model has an average area under the receiver operating characteristic curve value of 0.8800 in 5-fold cross validation and 0.8988 in 10-fold cross validation.
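
To make the evaluation metric concrete, here is a minimal Python sketch of how an AUC value and a k-fold split can be computed. It is not the authors' evaluation code: the function names `auc` and `k_fold_indices` are hypothetical, and the AUC uses the standard rank-sum identity (the fraction of positive/negative pairs that the scores order correctly).

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal test folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]
```

In k-fold cross-validation, each fold is held out once as the test set while the model is trained on the rest, and the per-fold AUC values are averaged.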

  • The factors that can affect the performance of the model are discussed
  • and some common diseases are used as case studies.
  • Results indicate that the model performs well in predicting circRNA-disease associations.

[image]

db : circRNAs and diseases

Several databases have been constructed [41] and contain associations between circRNAs and diseases.
Some of these are

  • circRNADisease [50],
  • circR2Disease [7],
  • circ2Disease [43],
  • Circ2Traits [15],
  • circFunBase [32].

Furthermore, the similarity matrices of circRNAs and diseases can be computed from such databases as

  • OMIM (Online Mendelian Inheritance in Man), which is an online catalog of human genes and genetic disorders [16] and others like it.
  • Similarly, DO (Disease Ontology) [33] semantically integrates disease and medical vocabularies through extensive cross mapping of DO terms to
    • MeSH (Medical Subject Headings) [23],
    • ICD (International Classification of Diseases),
    • NCI's thesaurus,
    • SNOMED (the Systematized NOmenclature of MEDicine),
    • OMIM.
  • DisGeNET, a database of disease-gene associations [28],
  • MEDIC, a practical disease vocabulary used at the Comparative Toxicogenomics Database [5],
  • CIRCpedia V2, an updated database for comprehensive circular RNA annotation and expression comparison, and so on [6].

Most of these databases contain manually collected and integrated circRNA-disease associations that have been verified in the biomedical literature. Several methods have been proposed to predict circRNA-disease associations based on one hypothesis: similar circRNAs are likely to be associated with the same diseases.

previous work

With developments in high-throughput sequencing technology, more and more relationships between circRNAs and diseases have been found.
Many models now predict circRNA-disease associations. For example,

  • DWNN-RLS [42], proposed by Yan et al., is based on regularized least squares with a Kronecker product kernel;
  • GCNCDA, proposed by Wang et al., is based on a graph convolutional network algorithm [38], and so on.

In this study, we propose a computational model based on the AANE algorithm and deep learning for predicting circRNA-disease associations. The AANE algorithm and a deep neural network extract features of circRNAs and diseases; these features are fed into a binary classifier, and the predicted results are compared with those of previous models. The influence of some hyper-parameters on model performance is also analyzed.

model advantages

The proposed model has the following advantages:

  • (1) It makes full use of known circRNA-disease associations and direct information on circRNAs and diseases;
  • (2) It integrates different feature-extraction methods, with the SAE used to extract advanced features of circRNAs and diseases;
  • (3) Because the XGBoost algorithm uses parallel optimization, the running time is greatly reduced;
  • (4) The input layer consists of two one-hot encoding vectors, which describe the disease and the circRNA.
  • The fusion of AANE and SAE produces a model that has both linear and non-linear learning ability. Results show that the AUC value of the proposed model reaches 0.8800 and 0.8988 in 5-fold cross-validation and 10-fold cross-validation, respectively.
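
As an illustration of point (4), the sketch below shows one way a circRNA-disease pair could be turned into two one-hot vectors. The function names and the choice to concatenate the two vectors into a single input row are assumptions for illustration; the notes do not say how the authors arrange the input layer.

```python
def one_hot(index, size):
    """Return a list of zeros of length `size` with a single 1 at `index`."""
    if not 0 <= index < size:
        raise IndexError("index out of range")
    vec = [0] * size
    vec[index] = 1
    return vec

def encode_pair(circ_idx, n_circ, dis_idx, n_dis):
    """Concatenate the circRNA and disease one-hot vectors (assumed layout)."""
    return one_hot(circ_idx, n_circ) + one_hot(dis_idx, n_dis)
```

For example, with 3 circRNAs and 2 diseases, the pair (circRNA 1, disease 0) becomes a vector of length 5 with ones at positions 1 and 3.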

model

[image]

dataset

Known circRNA-disease associations are taken from the circR2Disease database.
All unlabeled samples are treated as negatives.
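
A minimal sketch of this labelling scheme follows, assuming the known associations are available as (circRNA, disease) pairs. The function name `build_samples` and the identifiers in the usage example are hypothetical and not taken from circR2Disease.

```python
def build_samples(circrnas, diseases, known_pairs):
    """Label every circRNA-disease pair: 1 if known, 0 if unlabeled."""
    known = set(known_pairs)
    samples = []
    for c in circrnas:
        for d in diseases:
            # Unlabeled pairs are treated as negatives, as in the notes above.
            samples.append((c, d, 1 if (c, d) in known else 0))
    return samples

# Hypothetical identifiers, for illustration only.
samples = build_samples(["circA", "circB"], ["d1", "d2"], [("circA", "d1")])
```

Because unlabeled pairs may include true associations that have not been discovered yet, the negatives obtained this way are noisy.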

density calculation

[image]
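
The original formula image is not available. Assuming "density" here means the usual sparsity measure of an association matrix (known associations divided by all possible circRNA-disease pairs), it can be sketched as below. Both this definition and the function name are assumptions, not confirmed by the notes.

```python
def association_density(n_known, n_circrnas, n_diseases):
    """Fraction of all possible circRNA-disease pairs that are known associations."""
    total = n_circrnas * n_diseases
    if total == 0:
        raise ValueError("need at least one circRNA and one disease")
    return n_known / total
```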

features

Construct similarity matrices of circRNAs and diseases

  • Construct circRNA similarity network

    • The exoRBase database is used to construct the expression profile similarity of circRNAs, as the database provides the expression level in original tissues. For a given circRNA, a feature vector with dimension 32 is obtained if it has expression profile data.
      [image]
    • GIP kernel similarity, which is widely applied to construct molecular interaction similarity matrices, is another component of the multi-source information on circRNAs.
      [image]
  • Construct disease similarity network

    • The MeSH (Medical Subject Headings) database is used to construct the disease semantic similarity network. MeSH is the NLM (National Library of Medicine) controlled vocabulary thesaurus used for indexing articles for PubMed. A directed acyclic graph (DAG) is used to describe a disease according to its tree numbers and semantic terms.
    • GIP kernel similarity network of disease
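
Two of the similarity measures above can be sketched in Python. The formula images are missing from these notes, so both functions follow formulations that are common in the literature rather than the authors' exact definitions. The GIP kernel is taken as exp(-γ‖row_i − row_j‖²) with γ normalised by the mean squared row norm, and the DAG-based semantic similarity uses a decay factor (here 0.5) for each step towards an ancestor term. The function names, `gamma_prime = 1.0`, `delta = 0.5`, and the toy `parents` mapping are all assumptions.

```python
import numpy as np

def gip_kernel(assoc, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix."""
    assoc = np.asarray(assoc, dtype=float)
    sq_norms = (assoc ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Bandwidth normalised by the mean squared norm of the interaction profiles.
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of rows.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * assoc @ assoc.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of each term in the disease's DAG (the disease plus ancestors)."""
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        term = stack.pop()
        for parent in parents.get(term, ()):
            value = delta * contrib[term]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)
    return contrib

def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared contributions of two diseases divided by their total semantic values."""
    c1 = semantic_contributions(d1, parents, delta)
    c2 = semantic_contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    return sum(c1[t] + c2[t] for t in shared) / (sum(c1.values()) + sum(c2.values()))

# Toy DAG with hypothetical terms: A and C are both children of B, which is a child of root.
parents = {"A": ["B"], "C": ["B"], "B": ["root"]}
```

Passing the circRNA-by-disease association matrix to `gip_kernel` gives a circRNA similarity matrix; passing its transpose gives the disease similarity matrix.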

Multi-source data fusion

In this study, we extract features of circRNAs and diseases in two ways: low-dimensional features with the AANE algorithm, and latent features with the SAE. To make full use of this information, the two sets of features are fused in a very simple way.
[image]
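
The fusion formula itself is in the missing image. One simple fusion that matches the description is to concatenate the two feature vectors of each sample, sketched below. Concatenation is an assumption here, and `fuse_features` is a hypothetical name.

```python
def fuse_features(aane_rows, sae_rows):
    """Concatenate the AANE and SAE feature vectors of each sample (assumed fusion)."""
    if len(aane_rows) != len(sae_rows):
        raise ValueError("both feature sets must describe the same samples")
    return [list(a) + list(s) for a, s in zip(aane_rows, sae_rows)]
```

Each fused row can then be passed to the binary classifier as the feature vector of one circRNA-disease pair.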

result

[image]

[image]

case study

We chose the top 20 candidate circRNAs for each cancer as the predicted disease-related circRNAs, then used recent literature to verify the predictions.
[image]
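
The selection step can be sketched as follows, assuming the model outputs one score per circRNA for a given disease. Excluding circRNAs already known to be associated with the disease is an assumption, and the function name and identifiers are hypothetical.

```python
def top_candidates(scores, known, k=20):
    """Rank not-yet-known circRNAs for one disease by predicted score, best first."""
    candidates = [(c, s) for c, s in scores.items() if c not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [c for c, _ in candidates[:k]]

# Hypothetical scores for one disease, for illustration only.
scores = {"circ1": 0.9, "circ2": 0.8, "circ3": 0.95, "circ4": 0.1}
ranked = top_candidates(scores, known={"circ3"}, k=2)
```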

conclusion

Existing studies have shown that circRNAs play an important role in gene regulation and expression at the transcriptional and post-transcriptional levels. A growing understanding of the structure and function of circRNAs indicates that they are very important in the occurrence and development of diseases. Identifying or predicting circRNA-disease associations can thus improve our understanding of complex human diseases.

Compared with biological experiments, machine learning methods can greatly reduce costs.

In this study, we proposed a new method based on graph embedding and deep learning.

  • The graph embedding method was used to extract linear and shallow features of circRNAs and diseases, and the deep learning technique was used to extract non-linear and in-depth features.
  • These two sets of features were then combined into one and fed into the XGBoost binary classifier.
  • We also compared our model with four other methods (KATZHCDA, NCPCDA, SIMCLDA and CD-LNLP); our model's AUC reached 0.8800 in 5-fold cross-validation and 0.8988 in 10-fold cross-validation.
  • For case studies, we used two common diseases (Gastric Cancer and Esophageal Squamous Cell Carcinoma) to evaluate our model.
  • For the top 20 circRNAs related to each disease, most predicted associations had been reported in the literature.
  • Thus our model predicted the associations between circRNAs and diseases effectively.

In future research, the number of hidden layers in the SAE could be increased to extract deeper features, and more biological information could be added to the algorithm to obtain a more accurate and robust model.
