Abstract:
Clustering approaches that rely on a large number of variables, such as the expression levels of thousands of genes, are often poorly suited to the complexity and heterogeneity of tumors, where small sets of genes may drive multiple cellular processes associated with carcinogenesis. Biclustering algorithms, which perform local clustering on subsets of genes and conditions, help address this problem. We propose a Tunable Biclustering Algorithm (TuBA) based on a novel proximity measure for gene pairs, which examines the relationship between the samples at the extremes of the genes' expression profiles to identify similarly altered signatures. The identified pairwise associations are represented as graphs in which nodes are genes and edges are the samples they share. Robust biclusters are then identified iteratively in these graphs. The consistency of TuBA's predictions was tested by comparing biclusters in 3,940 Breast Invasive Carcinoma (BRCA) samples from three independent sources that used different gene expression technologies (RNA-seq and microarray). Over 60% of the biclusters identified independently in each dataset showed significant agreement in their associated genes and had similar clinical implications. About 50% of the biclusters were enriched in the ER-/HER2- (or basal-like) subtype, while more than 50% were associated with transcriptionally active copy number changes. Biclusters associated with gene expression patterns in non-malignant tissue were also found in tumor specimens. Overall, our method identifies a multitude of altered transcriptional profiles associated with the tremendous heterogeneity of disease states in breast cancer, both within and across tumor subtypes. This is an important advance in understanding disease heterogeneity and a necessary first step toward individualized therapy.
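The idea of linking genes through the samples at the extremes of their expression profiles can be illustrated with a minimal sketch. This is not TuBA's implementation: the function names, the `fraction` and `min_overlap` parameters, the use of a plain overlap count as the proximity measure, and the toy data are all assumptions made for illustration. The abstract does not specify how proximity is scored or thresholded.

```python
from itertools import combinations


def extreme_sample_sets(expr, fraction=0.25):
    """Map each gene to the set of sample indices with its highest expression.

    expr: dict of gene name -> list of expression values, one per sample.
    fraction: assumed cutoff defining the "extreme" of a profile (hypothetical).
    """
    sets = {}
    for gene, values in expr.items():
        k = max(1, int(len(values) * fraction))
        # Rank sample indices from highest to lowest expression.
        ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        sets[gene] = set(ranked[:k])
    return sets


def proximity_graph(expr, fraction=0.25, min_overlap=2):
    """Build a graph as a dict: (gene1, gene2) -> shared extreme samples.

    An edge is kept when two genes share at least min_overlap extreme samples.
    A raw overlap count stands in for the proximity measure (assumption).
    """
    sets = extreme_sample_sets(expr, fraction)
    edges = {}
    for g1, g2 in combinations(sorted(sets), 2):
        shared = sets[g1] & sets[g2]
        if len(shared) >= min_overlap:
            edges[(g1, g2)] = shared
    return edges


# Toy data (made up): geneA and geneB peak in samples 0 and 1, geneC in 6 and 7.
expr = {
    "geneA": [9.0, 8.5, 1.0, 0.5, 1.2, 0.8, 1.1, 0.9],
    "geneB": [8.0, 9.5, 0.7, 1.0, 0.9, 1.3, 0.6, 1.1],
    "geneC": [0.5, 1.0, 0.9, 1.1, 0.8, 1.2, 9.0, 8.0],
}
edges = proximity_graph(expr, fraction=0.25, min_overlap=2)
print(edges)
```

In this toy case the only edge joins geneA and geneB, labeled with the two samples in which both are highly expressed. A bicluster would then correspond to a densely connected group of genes together with the samples on their edges. Both parameters act as tuning knobs in this sketch: a smaller `fraction` or a larger `min_overlap` yields a sparser graph.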