Tumor tissues
Three sets of frozen breast cancer tissues were used: Training set (n = 712) was collected at M. D. Anderson Cancer Center (MDACC), Hospital Clinico Universitario de Valencia, Spain, University of British Columbia, Vancouver, BC, and Baylor College of Medicine, Houston, TX. Complete clinical information was available for 541 patients. Test set (n = 168) was obtained from an independent group of patients enrolled in the Danish DBCG 82 b and c breast cancer studies [13, 14]. All tumors in the training and test sets were collected by excision during their primary surgery. Tumor content was verified by histopathology. The third set consisted of 256 FNAs obtained from primary breast cancers prior to NST of which 132 belonged to patients who subsequently received uniform taxane and anthracycline-based NST at MDACC (12 cycles of weekly paclitaxel or 4 cycles of every 3-week docetaxel, followed by 4 cycles of FAC or FEC100). All tissues were collected under Institutional Review Board-approved laboratory protocols.
Tumors were characterized for ER and PR status by immunohistochemistry (IHC), ligand-binding dextran-coated charcoal assay or reverse phase protein lysate array (RPPA). ER/PR positivity was designated when nuclear staining occurred in ≥10% of tumor cells, with ligand binding of ≥ 10 fmol/mg, or with a log2 mean centered cutoff of -1.48(ER) or +0.52(PR) by RPPA. Hormone receptor (HR) positivity was designated when either ER or PR was positive. HER2 status was assessed by IHC, fluorescent in situ hybridization (FISH) or RPPA. HER2 positivity was designated when 3+ membranous staining occurred in ≥10% of tumor cells, with a HER2/CEP17 ratio of > 2.0 or with a log2 mean centered cutoff of +0.82 by RPPA [15].
Reverse phase protein lysate microarray (RPPA)
RPPA was completed independently and at different time points for training and tests sets using individual arrays. Protein was extracted from human tumors and RPPA was performed as described previously [16–19]. Lysis buffer was used to lyse frozen tumors by homogenization (excised tumors) or sonication (FNAs). Tumor lysates were normalized to 1 μg/μL concentration as assessed by bicinchoninic acid assay (BCA) and boiled with 1% SDS. Supernatants were manually diluted in five-fold serial dilutions with lysis buffer. An Aushon Biosystems 2470 arrayer (Burlington, MA) created 1,056 sample arrays on nitrocellulose-coated FAST slides (Schleicher & Schuell BioScience, Inc.). Slides were probed with 146 validated primary antibodies (Additional File 1, Table S1) and signal amplified using a DakoCytomation-catalyzed system. Secondary antibodies were used as a starting point for amplification. Slides were scanned, analyzed, and quantified using Microvigene software (VigeneTech Inc., Carlisle, MA) to generate spot signal intensities, which were processed by the R package SuperCurve (version 1.01) [18], available at "http://bioinformatics.mdanderson.org/OOMPA". A fitted curve ("supercurve") was plotted with the signal intensities on the Y-axis and the relative log2 concentration of each protein on the X-axis using the non-parametric, monotone increasing B-spline model [18]. Protein concentrations were derived from the supercurve for each lysate by curve-fitting and normalized by median polish. Protein measurements were corrected for loading as described [15–17, 19]. For the selection of the 146 antibody set, we focused on markers currently used for breast cancer classification due to their value in treatment decisions (ER, PR, HER2). We then added additional antibodies to targets implicated in breast cancer pathophysiology, followed by antibodies to targets implicated in the pathophysiology of other cancer lineages. Final selection of antibodies was also driven by the availability of their high quality that could pass a strict validation process as previously described [20].
Statistical Methods
Detailed statistical methods are described in Additional File 2.
Identification of Prognostic Groups
To develop a set of markers for breast cancer classification and outcomes prediction, we used a hypothesis-driven approach, selecting markers according to their functional assignments and subsequently performing supervised proteomic clustering analysis to optimize the selection of groups with the most distinct recurrence-free survival (RFS) outcomes. We hypothesized that three functions would strongly affect the behavior and therapy responsiveness in breast cancer: ER function, grade/proliferation, and receptor tyrosine kinase activity. From the initial 146 antibodies, we selected markers within these three functional categories. We tested multiple combinations requiring that a minimum of one marker per functional category remain in each model. Unsupervised clustering analysis, using the uncentered correlation distance metric [21] and Ward's linkage rule [22], was applied to the training set to define groups and allow correlation with previously defined breast cancer subtypes. We then visualized the RFS curves to select the marker set that was associated with the clearest differences in RFS between the groups identified in the training set. Because of multiple testing and the possibility of false discovery, this model was locked and then applied to an independent test set to which the statistical analysis team was kept blinded. The selected protein groups were as follows: ER function (ER, ERpS118, ERpS167, PR, AR, EIG121, Bcl2, GATA3, IGF1R, and IGFBP2), grade/proliferation (CCNB1, CCND1, CCNE1, CCNE2, and PCNA), and receptor tyrosine kinase activity (cKit, EGFR, EGFRp1045, EGFRp922, HER2, HER2p1248, FGFR1, FGFR2, IGF1R, IGFRpY1135/Y1136).
RFS was estimated according to the Kaplan-Meier method and compared between groups using the log-rank statistic. Cox proportional Hazard Models were fitted using proteomic subgroups, selected markers and clinical variables.
Decision trees
We constructed a statistical model to predict the classes discovered by hierarchical clustering using a binary decision tree with a logistic regression model at each node. The split at each node was a union of two of the classes. Protein-by-protein two-sample t-tests between the two halves of the split were computed. The proteins were ordered by p-value and then added one at a time into a logistic regression model until the desired prediction accuracy was achieved. In order to avoid overfitting data, a default precision accuracy of 95% was set for each node. Finally, the Akaike Information Criterion (AIC) was used to eliminate redundant terms from the logistic regression model [23].
Validation of Prognostic Groups for RFS
The coefficients of the model, which used logistic regression at each node of a decision tree to place samples in one of six classes (or prognostic groups) were finalized and locked. An implementation of the model was provided to an independent analyst, along with the class predictions. The independent analyst was provided with the unblinded clinical data after implementation of the model. Cox proportional hazards models were then constructed using the predicted classes as covariates to test their association with RFS.
Validation of Prognostic Groups for pCR
We applied the algorithm to the last sample set (132 FNAs) and correlated the groups with response to NST. We clustered the samples as above and compared these clusters to the class labels predicted by the decision tree model with Cohen's kappa statistic [24, 25]. Using the predicted prognostic groups, we developed a Bayesian model to estimate the posterior probability of pCR in each group. We modeled the pCR rates as coming from a beta-binomial distribution [26].
Development of a Prognostic Score and its Application to Prediction of pCR
We next converted the six prognostic groups into a continuous prognostic score (PS) by fitting an ordinal regression model on the training set [27]. PS is a weighted linear combination of the relative protein concentration of the markers:
PS = -0.2841*ER - 1.3038*PR + 0.0826*Bcl2 -0.6876*GATA3 + 0.5169*CCNB1 + 0.1000*CCNE1 + 0.4321*EGFR + 0.5564*HER2 + 0.8284*HER2p1248 +
0.2424*EIG121.
We used this formula to compute PS on the test set; PS was associated with RFS estimates by the Cox proportional hazards model. We also used the same formula to compute PS on the NST treated FNA set. We fitted a logistic regression model using the NST response as the binary response variable (pCR vs. residual disease) and PS as a predictor. The prediction of response was evaluated by a receiver operating characteristics (ROC) curve.
Models for Recurrence-Free Survival and Likelihood of Pathologic Complete Response
A Cox proportional hazards model to estimate association with RFS was fit using each of the following covariates: prognostic group, tumor size, histologic grade, node status, each of the 10 protein markers, and PS. Using the same covariates, a logistic regression model was fit to estimate the association of each covariate with pCR. Stepwise multivariate model selection [28, 29] was used to determine the combination of covariates for the multivariate models.
All statistical analysis was performed in R 2.8.1. (R Development Core Team (2008). R: A language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria). http://www.R-project.org.