Multi-level reproducibility of signature hubs in human in...

Network topology consistency of the hub protein lists

We first searched for signature hubs whose co-expression with their interacting partners differed significantly between patients labelled non-metastatic and metastatic. We applied the method proposed by Taylor et al. [13], described briefly in Methods, to the dataset compiled by Wang et al. [28] (the Wang dataset) and to the dataset compiled by Desmedt et al. [29] (the Desmedt dataset). We did not apply FDR control at the step of finding signature hubs, because the statistical power of most multiple-test adjustment methods is reduced in the presence of the wide and correlated expression changes of genes in cancers [30, 31]. Instead, we used a P value threshold of 0.01 to find candidate signature hubs, as in the work by Taylor et al. [13]. With P < 0.01, we identified 65 and 72 signature hubs in the Wang dataset and the Desmedt dataset, respectively (see Additional file 1-Table S1 for the signature hubs). Only 4 signature hubs appeared in both datasets, and the percentage of overlaps (PO) score of the hub lists was only 5.9%. Thus, at the level of individual proteins, the signature hubs detected in different studies were extremely inconsistent, although the PO score was significantly larger than expected by chance alone (hypergeometric test, P = 0.027).
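The PO score and the hypergeometric test mentioned above can be illustrated with a minimal sketch. It assumes that the PO score is the mean of the shared fraction in each list (the exact definition is given in Methods), which is consistent with the reported 5.9% for 4 shared hubs out of 65 and 72. The function names, the placeholder hub identifiers and the background size of 2000 candidate hubs are hypothetical, so the printed P value is not expected to match the reported 0.027.

```python
from math import comb

def po_score(list_a, list_b):
    """Assumed PO definition: mean of the shared fraction in each list."""
    set_a, set_b = set(list_a), set(list_b)
    shared = len(set_a & set_b)
    return 0.5 * (shared / len(set_a) + shared / len(set_b))

def hypergeom_tail(k, n_a, n_b, background):
    """P(X >= k): chance that two lists of sizes n_a and n_b, drawn from
    `background` candidates, share at least k items."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, i) * comb(background - n_a, n_b - i)
               for i in range(k, upper + 1))
    return hits / comb(background, n_b)

# Placeholder identifiers reproducing the counts in the text:
# 65 and 72 hubs, 4 of them shared.
shared_hubs = ["s1", "s2", "s3", "s4"]
wang = [f"w{i}" for i in range(61)] + shared_hubs
desmedt = [f"d{i}" for i in range(68)] + shared_hubs

print(round(100 * po_score(wang, desmedt), 1))  # 5.9

# The background size is a hypothetical value, not taken from the paper.
print(hypergeom_tail(4, len(wang), len(desmedt), background=2000))
```

The tail probability depends strongly on the chosen background, so the actual number of candidate hubs in the PPI network would have to be substituted to compare with the reported value.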

We then evaluated the reproducibility of two lists of signature hubs using the POT score, which measures the percentage of overlapped interaction neighbours of signature hubs extracted from different studies (see Methods). First, using the hypergeometric distribution model with FDR < 0.05, we tested whether the interaction neighbours of a hub in one list overlapped significantly with the neighbours of at least one of the hubs in the other list. Then, considering that signature hubs with significant neighbourhood overlaps might have similar functional roles, we calculated the POT score for the two lists of signature hubs. The POT score between the lists of signature hubs extracted from the Wang dataset and the Desmedt dataset was as high as 73%.
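A rough sketch of how such a score could be computed is given here. It is an illustration under stated assumptions, not the procedure in Methods: the directional score is taken as the fraction of hubs in one list whose neighbour set overlaps significantly with that of at least one hub in the other list, the POT score is taken as the mean of the two directions, and the FDR is controlled with the Benjamini-Hochberg procedure over all hub pairs. All function names and protein identifiers are hypothetical.

```python
from math import comb

def hypergeom_tail(k, n_a, n_b, background):
    """P(X >= k) for the overlap of two neighbour sets of sizes n_a and n_b."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, i) * comb(background - n_a, n_b - i)
               for i in range(k, upper + 1))
    return hits / comb(background, n_b)

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; True marks a rejected test."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

def directional_pot(hubs_a, hubs_b, neighbours, background, alpha=0.05):
    """Fraction of hubs in A matched by at least one hub in B (assumed definition)."""
    owners, pvals = [], []
    for a in hubs_a:
        for b in hubs_b:
            na, nb = neighbours[a], neighbours[b]
            owners.append(a)
            pvals.append(hypergeom_tail(len(na & nb), len(na), len(nb), background))
    matched = {a for a, ok in zip(owners, bh_reject(pvals, alpha)) if ok}
    return len(matched) / len(hubs_a)

def pot_score(hubs_a, hubs_b, neighbours, background, alpha=0.05):
    """Mean of the two directional scores (assumed definition)."""
    return 0.5 * (directional_pot(hubs_a, hubs_b, neighbours, background, alpha)
                  + directional_pot(hubs_b, hubs_a, neighbours, background, alpha))

# Toy network of 100 proteins: A1 and B1 share 9 neighbours, A2 and B2 share none.
neighbours = {
    "A1": {f"p{i}" for i in range(10)},
    "B1": {f"p{i}" for i in range(9)} | {"p20"},
    "A2": {f"p{i}" for i in range(30, 35)},
    "B2": {f"p{i}" for i in range(50, 55)},
}
score = pot_score(["A1", "A2"], ["B1", "B2"], neighbours, background=100)
print(score)  # 0.5
```

In the toy network only one hub per list finds a partner with a significantly overlapping neighbourhood, so each directional score is 0.5 even though the two hub lists share no protein.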

Next, we did three random experiments to test whether the increased overlap might be introduced by some factors irrelevant to the disease status. First, for each dataset, we assigned phenotype labels randomly to patients to generate expression data with the same correlation structure as the original dataset, and then searched for signature hubs in the PPI network by the approach used with the real data. Because the phenotype information was randomised, the detected signature hubs should be irrelevant to disease status. Repeating this process 1000 times, we found the average of the POT scores for the random pairs of protein lists was 41%, which was significantly smaller than the score (73%) observed with the real data (P < 0.005). Second, we tested whether the increased reproducibility might be due to the network topology. From the same PPI network, we randomly selected 1000 pairs of protein lists with the same lengths as the signature hub lists and then computed their POT scores. The average of the POT scores for these random pairs of protein lists was 44%, which was significantly smaller than that observed (P < 0.005). Third, we tested whether the high level of reproducibility might be due to the high degrees (numbers of interaction partners) of signature hubs. Using a local rewiring algorithm [32], we produced 1000 random PPI networks in each of which all proteins had exactly the same connectivity as in the original PPI network and the choice of their interaction partners was random. Then, from each random network we selected the pairs of hub lists that had exactly the same lengths and degree distributions as the two lists of signature hubs extracted from the actual PPI network. Then, we recalculated the POT score for this random pair of hub lists. This process was repeated 1000 times. The average POT score for 1000 pairs of random hub lists was 42%, significantly smaller than that observed (P < 0.005).
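The degree-preserving randomisation and the empirical significance test can be sketched as follows. This is a generic double-edge-swap scheme offered as an illustration; whether it matches the local rewiring algorithm of [32] in every detail is an assumption. The function names, the toy network and the add-one correction in the empirical P value are choices made for this sketch.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def rewire(edges, n_swaps, seed=0):
    """Randomise partners by double-edge swaps (a-b, c-d -> a-d, c-b),
    which leaves the degree of every node unchanged."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible swap orientations
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Skip swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

def empirical_p(observed, null_scores):
    """One-sided empirical P value with an add-one correction."""
    hits = sum(s >= observed for s in null_scores)
    return (hits + 1) / (len(null_scores) + 1)

# Toy network: a ring of 12 nodes plus four chords.
edges = [(i, (i + 1) % 12) for i in range(12)] + [(i, (i + 3) % 12) for i in range(0, 12, 3)]
rewired = rewire(edges, n_swaps=50, seed=1)
print(degrees(edges) == degrees(rewired))  # True
```

With 1000 random scores that all fall below the observed score, `empirical_p` returns 1/1001, which is consistent with reporting P < 0.005.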

Both false negatives and false positives are concerns for PPI data quality [33, 34]. To tackle the low coverage introduced by false negatives, we integrated 8 databases to generate a large PPI network for our study. To reduce the effect of false positives, we also used a small PPI network that contained only the hand-curated interaction data from OPHID [35] and MINT [36]. Based on this smaller PPI network, the POT score decreased slightly to 62%, owing to the smaller network size. However, this POT score was still significantly higher than those (20%, 29% and 17%) obtained in each of the three random experiments described above (P < 0.005). The two PPI networks generated similar POT scores, suggesting that our results were rather robust against false negatives and false positives in the PPI data.

Pathway consistency of the hub protein lists

If two signature hubs share many interaction neighbour proteins, then they might participate in the same or similar functions [26, 27]. To reveal the consistency of signature hub lists at the pathway level, for each signature hub identified from each dataset, we analysed the enrichment of its interaction neighbours in pathways collected in the Kyoto Encyclopaedia of Genes and Genomes (KEGG) [37] (see Methods). With FDR < 0.01, we found that 34 pathways were enriched significantly with the neighbours of at least one of the signature hubs detected in the Desmedt dataset, among which 26 pathways were included in the 38 significant pathways detected in the Wang dataset (See Additional file 1-Table S2 for the list of 26 pathways.). Notably, among the other 12 pathways detected in the Wang dataset but not in the Desmedt dataset, 11 were marginally significant in the Desmedt dataset with P < 0.05. Similarly, among the 8 pathways detected in the Desmedt dataset but not in the Wang dataset, 6 were marginally significant in the Wang dataset with P < 0.05. Thus, some inconsistency between the two datasets might come from a reduction of the statistical power by using the stringent FDR control for adjusting multiple tests when the multiple tests are not independent of each other [30, 31].

We did a random experiment to test the significance of the high concordance of pathway enrichment (see Methods). First, we took the 38 pathways identified from the Wang dataset as the gold standard. From each of the random networks produced by a local rewiring algorithm [32], we extracted a random hub list with the same length and degree distribution as the list of signature hubs identified from the Desmedt dataset. Then, we detected the pathways enriched with the neighbours of the random hubs and compared them with the gold standard. Repeating this process 1000 times, we found that the average number of overlapping pathways was 1, significantly fewer than the 26 overlaps observed in the real data (P < 0.001). The result was the same when taking the pathways detected from the Desmedt dataset as the gold standard.

The 26 pathways detected in both datasets included many pathways known to be deregulated in breast cancer pathogenesis, such as cell cycle, apoptosis, Jak-STAT, MAPK, ErbB, Wnt and P53 signalling pathways [38]. Among these 26 pathways, there were 191 and 238 interaction neighbours of the signature hubs identified from the Wang and Desmedt datasets, respectively, and they shared 114 proteins, which was significantly more than expected by chance alone (hypergeometric test P < 2.2 × 10^-16). These common interaction neighbour proteins might have important roles in cancer. To test this, we assembled a list of 427 cancer susceptibility genes from the Cancer Gene Census database [39] and found 50 out of 114 neighbour proteins were known cancer proteins (hypergeometric test P = 6 × 10^-4). When using the 685 genes collected in our F-census database [25], 100 out of 114 neighbour proteins were included (hypergeometric test P < 2.2 × 10^-16).

The above results suggested that the two lists of signature hubs might affect the same pathways. In one situation, in different patient cohorts, a cancer-associated pathway could be affected by the co-expression changes of different signature hubs with the same set of neighbours enriched in this pathway. For example (Figure 1a), the interleukins IL2 and IL6 were identified as signature hubs from the Wang and Desmedt datasets separately and their overlapped neighbours were enriched in the Jak-STAT signalling pathway. Thus, changes of co-expression of these shared neighbours with either IL2 or IL6 might disrupt the Jak-STAT signalling pathway and contribute to the progression of cancer [40]. As another example (Figure 1b), 6 signature hubs identified from the Wang dataset and another 3 signature hubs identified from the Desmedt dataset are all subunits of a ribosome complex for protein biosynthesis. They share other subunits as interaction neighbours and their deregulation might be associated with cell growth and proliferation [41]. In another situation, a cancer-associated pathway could be affected by changes of different signature hubs interacting with different sets of neighbours that were separately enriched in this pathway. For example (Figure 1c), proteins DUSP3 with degree 18 and CAD with degree 39 were identified as signature hubs in the Wang and Desmedt datasets separately. The neighbours of each of these two proteins were enriched in the MAPK signalling pathway associated with cancer metastasis [42], but their neighbours shared only 1 protein. It has been suggested that DUSP3 can negatively regulate members of the MAP kinase superfamily (MAPK) [43], while the deregulation of CAD proteins might be associated with activation of the MAPK cascade [44]. Notably, this functional relation between two signature hubs was not reflected by the POT score, which considers only overlapping neighbours between the signature hubs (see Discussion).

Figure 1

Examples of pathways shared by a signature hub from the Wang dataset and a signature hub from the Desmedt dataset. (a) JAK-STAT signaling pathway; (b) Ribosome complex; (c) MAPK signaling pathway. The yellow and red colors represent proteins (both hubs and their neighbors) identified from the Wang and Desmedt datasets, respectively. The orange colors represent the overlapped neighbors of these two hub proteins. Please see the main text for detailed explanation.

Co-expression consistency of the hub protein lists

Considering that a signature hub disturbs functions through differential co-expression with its interaction neighbours [13], we further assumed that two functionally similar hubs should display consistent co-expression changes with their overlapping neighbours across different datasets [45, 46]. Therefore, for two hubs detected from two datasets separately, we additionally tested the consistency of the directions of their correlations with the shared neighbours across the datasets by the Bernoulli distribution model (see Methods).
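One way to picture such a direction-consistency test is the sketch below. It assumes, for illustration only, that under the null hypothesis each shared neighbour agrees in the direction of its correlation change with probability 0.5, so that the number of agreeing neighbours follows a binomial distribution; the test actually used is described in Methods. The function names, neighbour identifiers and correlation changes are hypothetical.

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def direction_consistency(delta_a, delta_b):
    """Count shared neighbours whose correlation change with the hub has the
    same sign in both datasets, and test the count against chance (p = 0.5).

    delta_a, delta_b: dicts mapping a neighbour to the change in its correlation
    with the hub (metastatic minus non-metastatic) in each dataset.
    """
    shared = sorted(set(delta_a) & set(delta_b))
    same = sum((delta_a[g] > 0) == (delta_b[g] > 0) for g in shared)
    return same, len(shared), binom_tail(same, len(shared))

# Hypothetical correlation changes for six shared neighbours of two hubs.
hub_in_dataset_1 = {"n1": -0.8, "n2": -0.5, "n3": -0.6, "n4": -0.9, "n5": -0.4, "n6": -0.7}
hub_in_dataset_2 = {"n1": -0.6, "n2": -0.7, "n3": -0.3, "n4": -0.5, "n5": -0.8, "n6": -0.2}

same, total, p_value = direction_consistency(hub_in_dataset_1, hub_in_dataset_2)
print(same, total, p_value)  # 6 6 0.015625
```

With all six shared neighbours changing in the same direction, the chance probability under this simple null is 0.5^6, which illustrates why even a small set of shared neighbours can give a significant result.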

With the co-expression restriction, the POT score for the Wang and Desmedt datasets (denoted as the POT-e score) decreased slightly from 73% to 67%, which is largely explained by the fact that any extra restriction may miss some true relations. In contrast, the random POT-e score decreased greatly from 44% to 26%. The results suggested that signature hubs sharing neighbours were significantly consistent in the directions of change of their correlations with the shared neighbours. For example, the interleukins IL2 and IL6 were identified as signature hubs from the Wang and Desmedt datasets separately, and their 6 overlapped neighbours were enriched in the Jak-STAT signaling pathway. In both the Wang and Desmedt datasets, the expressions of IL2 and IL6 were positively correlated with the expressions of these shared neighbours in non-metastatic patients, but negatively correlated with them in metastatic patients. These results suggest that the Jak-STAT signaling pathway could be perturbed by the disruption of co-expression of either IL2 or IL6 with the shared neighbours during breast cancer metastasis.

Validation in three independent breast cancer datasets

We validated our results by analyzing three other independent datasets for breast cancer metastasis [2, 47, 48]. For the lists of signature hubs extracted from each pair of breast cancer datasets, the PO score was less than 4%. However, the corresponding POT scores ranged from 61% to 75%, all significantly larger than expected by chance according to the three random experiments described in Methods. Similar results were observed based on the POT-e score (P < 0.005; see Additional file 1-Table S3 for details).

For example, 80 signature hubs were identified from the Vijver dataset, among which only 4 and 1 overlapped with the signature hubs found in the Wang and Desmedt datasets, respectively. However, the corresponding POT scores were 64% and 75%, respectively, and they were both significantly larger than expected by chance (P < 0.005), according to each of the three random experiments as described in Methods. Notably, although the average POT score between the Wang and Vijver datasets was only 64%, the POT score for the signature hub list extracted from the Vijver dataset to the signature hub list extracted from the Wang dataset was 71%, suggesting that many of the signature hubs detected from the Vijver dataset could be represented by the signature hubs from the Wang dataset in terms of neighbourhood similarity. The score in the opposite direction was only 57%, indicating that the samples used in the Vijver dataset might be insufficient for capturing enough signature hubs to cover the signature hubs extracted from the Wang dataset.

According to pathway enrichment analysis, the signature hubs extracted from the Vijver dataset were highly consistent with those from both the Wang dataset and the Desmedt dataset. Among the 26 pathways shared by the Wang and Desmedt datasets, 19 were included in the 34 pathways identified from the Vijver dataset, significantly more than expected by chance alone (hypergeometric test P = 5.2 × 10^-5). All the other 7 pathways detected in both the Wang and Desmedt datasets were marginally significant in the Vijver dataset with P < 0.05. These results indicated that these pathways, such as the MAPK signaling and apoptosis pathways, which were also found in other studies [11, 49], might be disturbed during breast cancer metastatic progression.

The above results confirmed that signature hubs detected from different datasets for breast cancer metastasis were reproducible in terms of neighbourhood protein overlap and, more generally, pathway overlap. Notably, approximately half of the patients in the Vijver dataset were lymph node-positive and underwent adjuvant therapy before expression profiling, whereas all patients in the Wang dataset had lymph node-negative breast cancer [11]. However, our results indicated that the two types of samples might have similar molecular changes at the pathway level.
