Identification of a robust gene signature that predicts b...

Expression microarray technology promises to change phenotypic characterization of tumors, leading to better diagnosis, prognosis, and ultimately treatment of cancer. Expression profiling has been used to identify predictive gene sets in a diverse set of cancers, including lymphomas [7, 23, 24], prostate cancer [25], and breast cancer [3, 5, 10–12, 26]. These gene signatures are more robust than individual prognostic genes that have been identified. Unlike single gene predictors, gene sets are less likely to be influenced by variation in expression of one or two genes when classifying tumor specimens, since they use the entire set of genes to classify samples, not just one or two. Furthermore, tumors that are indistinguishable using traditional clinical parameters can be classified into good and poor outcome groups using predictive gene sets, and thus these sets may have the ability to outperform the traditional markers. However, the identities of genes within classifiers differ widely even for the same tumor types, despite the fact that the association of specific expression patterns with tumor phenotypes is clear. Robust gene selection techniques and extensive validation are required to identify the gene sets which best predict patient outcome.

We identified multiple gene sets based on several predictive models, then validated them with an independent tumor set. We used three methods to identify potential predictive gene sets, all of which resulted in significant differences in survival of the predicted good and poor prognosis groups in both the test and validation tumor sets. We reasoned that the most robust gene set would be the one consisting of the genes common to all three selection methods, which comprised 21 genes. As with the other three gene sets, this predictive set resulted in significant differences in survival between the good and poor prognosis groups.
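The consensus step described above, keeping only genes selected by every method, amounts to a set intersection. The following is a minimal sketch under stated assumptions: the function name `consensus_genes` and the gene labels are hypothetical placeholders, not the genes or code used in this study.

```python
from functools import reduce


def consensus_genes(*gene_lists):
    """Return, in sorted order, the genes present in every input list."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))


# Hypothetical outputs of three selection methods (placeholder labels only).
method_a = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
method_b = ["GENE_B", "GENE_C", "GENE_E"]
method_c = ["GENE_C", "GENE_B", "GENE_F"]

consensus = consensus_genes(method_a, method_b, method_c)
print(consensus)
```

Requiring a gene to be selected by all methods is a conservative choice: it discards genes that only one model favors, at the cost of a smaller set.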

Multivariate analysis of the 21-gene set indicated that it was not an independent prognostic marker when used in combination with nodal status or stage for prediction of outcome in our tumor set (data not shown). However, because of the small number of events in the data set, we had limited power for determining the predictive ability of the gene set in a multivariate model.

We tested the performance of our gene sets in the tumor sets of van't Veer et al. and Sotiriou et al. There are several caveats with such an approach. First, it has been observed that there is often disagreement between microarray results profiled on different platforms [22, 27]. Part of this may be attributable to different probe regions represented (i.e., splice variants), different hybridization, washing, imaging, and data normalization and analysis methods, and the use of different (or no) reference samples. Another problem in comparing microarray results is that there are often genes missing from one platform that are identified as being predictive using the second gene set. Obviously, these cannot be included in the comparative analysis, so the model is weakened by the absence of core predictive genes. Finally, as in our data, there is a difference in the way tumors were selected, so that genes that are predictive in a diverse set of tumors such as ours may not perform as well in a more homogeneous tumor set such as that of van't Veer. Indeed, we found that the performance of our gene sets was not as good in the tumor sets of van't Veer et al. and Sotiriou et al., but still resulted in significant differences in survival with Kaplan-Meier analysis. Furthermore, the performance of our predictive gene sets was comparable to those of van't Veer and Sotiriou in our tumor sets, although none of their gene sets resulted in significant differences in survival with Kaplan-Meier analysis when used on our tumor set. In particular, the failure of the van't Veer predictive set to correctly classify our tumors likely reflects the fact that more than half of their predictive genes were missing from our microarrays.
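For readers unfamiliar with the Kaplan-Meier analysis referred to above, the product-limit estimate can be sketched in a few lines. This is an illustration only: the function `kaplan_meier` and the follow-up times are hypothetical, it is not the software used in this study, and it omits the significance test (typically a log-rank test) needed to compare the good and poor prognosis groups.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data for one predicted prognosis group.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Censored patients contribute to the number at risk until their last follow-up but never lower the curve themselves, which is why a data set with few events gives wide uncertainty.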

It is worth noting that each predictive gene set performed best within the tumor set from which it was derived. This is not surprising, since there is an inherent bias introduced by testing genes on the tumors from which the genes were selected. Thus, it is important to validate the predictive utility of genes in independent data sets, which requires making such data sets publicly available. Ideally, a common platform will come into use so that investigators will be able to make easy comparisons between experiments, without having to exclude potentially important genes from their validation analyses. This would also allow a comprehensive meta-analysis of genes in common to the predictive gene lists under investigation to identify those with the strongest prediction of breast cancer outcome. To date, the best solution to these problems has been the development of more rigorous statistical techniques and better laboratory practices, which improve concordance in cross-platform comparisons [28].

One striking observation is the minimal overlap between genes in the predictive gene sets developed by us and those of van't Veer et al. and Sotiriou et al. Sotiriou reported that their predictive gene set had 15 genes in common with the van't Veer set [10]. We had 4 genes in common with the van't Veer predictive gene set, and 11 genes in common with the Sotiriou predictive gene set. All three sets had 1 gene in common (MAD2L1). However, it is interesting to note that of 16 predictive genes utilized in a more recent study [22], seven were also found to be predictive in at least one of our gene sets. While the overlapping genes are likely to be important in outcome prediction, it may be inappropriate to focus entirely on these genes. The comparisons between the three different microarrays are by no means comprehensive. For example, almost 100 of the genes identified by van't Veer as being predictive were not present on our arrays. Thus, the lack of overlap between the predictive sets may reflect the lack of overlap of the arrays in general. A recent study examining poor overlap in predictive gene sets derived from separate studies in breast cancer [29] indicated that the poor gene overlap was due to the fact that a number of genes showed correlation with survival, but that these associations vary greatly between subsets of patients.
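The point that signature overlap is capped by array content can be made concrete with a small sketch. The function `signature_overlap` and all gene labels below are hypothetical placeholders and do not reproduce the gene lists compared in this study.

```python
def signature_overlap(sig_a, sig_b, platform_b_genes):
    """Compare signature A with signature B, which was derived on platform B.

    Returns the shared genes, the genes of A that platform B does not carry,
    and the largest overlap that platform B's content would allow.
    """
    a = set(sig_a)
    b = set(sig_b)
    platform = set(platform_b_genes)
    return {
        "shared": sorted(a & b),
        "not_on_platform": sorted(a - platform),
        "max_possible_overlap": len(a & platform),
    }


# Placeholder labels only.
result = signature_overlap(
    sig_a=["G1", "G2", "G3", "G4"],
    sig_b=["G2", "G5"],
    platform_b_genes=["G1", "G2", "G5", "G6"],
)
print(result)
```

Reporting the observed overlap alongside the maximum possible overlap separates disagreement between classifiers from simple absence of probes on one array.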

Interestingly, while the overlap between the gene sets is minimal, there is some evidence that similar families of genes are found within the different classifiers. Thus, it may be possible to identify a set of genes, each of which is interchangeable with the other members of that gene set with respect to their predictive abilities. For example, we found that high levels of ESR1 and the MYB gene, which has been shown to be coordinately expressed with ESR1 [5, 26], were both predictive of outcome in our data set. Neither the van't Veer nor Sotiriou predictive gene sets contained these genes, but the predictive set of Sotiriou et al. [10] included GATA3, a gene which has also been shown to be coordinately expressed with ESR1. Thus, it is possible that ESR1, GATA3, or MYB may be surrogates for one another in predicting outcome.

Unsupervised clustering of the entire data set resulted in separation of the tumors based primarily on their ER status. This has been observed previously by several groups [5, 26], and is a strong factor driving unsupervised clustering in breast cancer. There was a strong correlation between ESR1 expression and ER status as measured by IHC. While promoter methylation and chromatin condensation of the ESR1 gene seem to be the predominant mechanisms for ablation of ER protein expression [30], the finding that a number of ER negative tumors had higher than average levels of ESR1 gene expression suggests that some tumors may be ER negative due to post-transcriptional events.

The putative transcription factor ZNF217 was identified within our predictive gene sets, and overexpression was associated with poor outcome in our breast cancer patients. This gene was originally identified as a potential target oncogene from the 20q13 region, which is commonly amplified in breast and other cancers and has been associated with poor prognosis [31, 32]. Subsequent analysis of ZNF217 has shown that it is capable of immortalizing human mammary epithelial cells [33]. Interestingly, in addition to ZNF217, we observed that high levels of expression of several genes from the 20q13 region were associated with poor prognosis in the breast cancer patients. Included in this region were STK15, which has recently been suggested to be a candidate low-penetrance tumor susceptibility gene in breast cancer [34], and MYBL2, which has been shown to be overexpressed along with STK15 and ZNF217 in prostate cancer [35]. Overexpression of STK15 and MYBL2 was found to be associated with metastases in these prostate tumors [35], and in our data set high levels of expression of all three genes were associated with poor prognosis. Interestingly, both MYBL2 and STK15 were found to be predictive by Paik et al. [22], and STK15 was found to be predictive by van't Veer et al. [12].

Generation of predictive gene sets has been done primarily in mixed tumor sets (e.g., ones that include both ER negative and ER positive tumors and, with the exception of van't Veer et al., both node negative and node positive samples). By examining the classification rate within the tumor subgroups, it is evident that our predictive gene sets tend to perform better in ER positive, Stage I and II, and node negative tumors. Similarly, the gene prediction classifier tended to perform better in patients who did not receive chemotherapy or radiation. Together, these results are encouraging, since these tumors tend to have better prognosis in general and thus it is difficult to determine which patients are at highest risk. Future studies to identify predictive gene sets within clinically homogeneous subgroups of breast cancer may further improve outcome prediction based on genetic signatures.
