Combining discovery and targeted proteomics reveals a pro...

Clinical tissue sample collection

This retrospective cohort study was approved by the Research Ethics Committee of the Faculty of Medical Sciences at the University of Campinas (Campinas-SP, Brazil) through Plataforma Brasil protocol CAAE 23163113.5.1001.5404 and the Research Ethics Committee of the Piracicaba Dental School through Plataforma Brasil protocol CAAE 71351517.7.0000.5418. The methods and experimental protocols of the present study were performed in accordance with the approved guidelines, and informed consent was obtained from all human participants. Two sets of tissue samples were used: the 20-patient cohort 1 (Supplementary Data 1), with samples for mass spectrometry analysis, and the 125- and 96-patient cohorts (Supplementary Data 19, including approximately 14 cases used in cohort 1 for the discovery MS), with surgical samples for IHC staining. The first set comprised 20 cases of primary tongue squamous cell carcinoma, retrieved from two reference hospitals in Cascavel, Paraná, Brazil, the Oncology Center of Cascavel (CEONC) and the UOPECCAN Cancer Hospital, collected over a 10-year period (from 1998 to 2008). The inclusion criteria were as follows: (a) demographic and complete clinicopathological data; (b) location of the tumor in the tongue; (c) type of treatment based on radical surgery with or without postoperative radiotherapy and/or chemotherapy; and (d) availability of surgical specimens in FFPE blocks. The presence of small and large neoplastic islands in the ITF and the inner tumor in hematoxylin and eosin (H&E)-stained histological sections was evaluated and used as an inclusion criterion. As proposed by Bryne et al.

The information collected from the medical records and follow-up of patients included the following: sex; age; habits, such as smoking and alcohol consumption; tumor location; TNM stage; status of surgical margins; local recurrence and lymph node metastasis; distant metastasis; the presence of second primary tumors; treatment; and survival. After treatment, the patients were monitored for at least 5 years, and disease recurrence was histologically confirmed. Outcomes were classified as specific survival (SS) for the disease, the time from the beginning of treatment until death as a result of OSCC or the last follow-up information when the patient was alive, and DFS, the time from the beginning of treatment until diagnosis of the first recurrence (local, regional or distant) or last follow-up information for patients without recurrence. Stained H&E histological slides were evaluated according to four histopathologic classification systems, the WHO classification system

Sample preparation and LMD

Four histological slides were prepared for each of the 20 OSCC samples. From each paraffin block, one section was cut at a thickness of 5 μm using a microtome, and the other three were cut at 10 μm. The 5 μm sections were stained with H&E to guide the LMD. The three 10 μm sections were prepared on specific membrane slides (PEN Arcturus® Membrane, Life Technologies, Foster City, CA, USA) for LMD, deparaffinized in xylol, hydrated with decreasing concentrations (100, 90, 70, and 50%) of ethanol, washed in water and stained with hematoxylin blue for 8 min prior to drying for LMD.

Samples were processed using Leica Laser Microdissection Systems. The microdissected areas were as follows: (1) small neoplastic islands from the ITF; (2) large neoplastic islands from the ITF; (3) small neoplastic islands from the inner tumor; (4) large neoplastic islands from the inner tumor; (5) stroma from the ITF; and (6) stroma from the inner tumor (Fig. 1b). Neoplastic islands from the ITF were collected from the farthest island in the invasive surface of the tumor, up to a depth of one millimeter in the histological section, and neoplastic cell islands from the inner tumor were cut from the farthest island inside the tumor, up to a depth of one millimeter (Fig. 1a). Average microdissected tissue areas of 100,000 µm² and 1,000,000 µm² were isolated for small neoplastic islands and large neoplastic islands, respectively, and an average area of 1,000,000 µm² for stroma (Supplementary Fig. 1).

All samples were collected in 600 μL microtubes and stored at −80 °C. LMD was standardized for FFPE tissues with a thickness of 10 μm. The adjustable parameters in the LMD Laser Microdissection Leica software are considered optimum for these samples, and the area (µm²) cut for each patient was recorded.

Protein extraction and trypsin digestion

For protein extraction and digestion, samples were treated with 8 M urea, followed by protein reduction with dithiothreitol (5 mM for 25 min at 56 °C) and alkylation with iodoacetamide (14 mM for 30 min at room temperature in the dark). For protein digestion, urea was diluted to a final concentration of 1.6 M with 50 mM ammonium bicarbonate, and 1 mM calcium chloride was added to the samples for trypsin digestion for 16 h at 37 °C (2 µg of trypsin).
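The dilution from 8 M to 1.6 M urea is a fivefold dilution, which follows from C1·V1 = C2·V2. The short sketch below works through that arithmetic; the 20 µL starting volume and the function name are assumptions made for illustration and are not stated in the protocol.

```python
def dilution_volumes(c_initial_m, c_final_m, v_initial_ul):
    """Return (final volume, diluent volume to add) from C1*V1 = C2*V2."""
    if not 0 < c_final_m <= c_initial_m:
        raise ValueError("final concentration must be positive and not exceed the initial one")
    v_final_ul = c_initial_m * v_initial_ul / c_final_m
    return v_final_ul, v_final_ul - v_initial_ul


# Hypothetical 20 µL of sample in 8 M urea, diluted to 1.6 M with ammonium bicarbonate.
start_volume_ul = 20.0
v_final, v_added = dilution_volumes(8.0, 1.6, start_volume_ul)
dilution_factor = v_final / start_volume_ul
print(f"final volume {v_final:.1f} µL, buffer added {v_added:.1f} µL, factor {dilution_factor:.1f}x")
```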

Mass spectrometry analysis using DDA

The peptide mixture (4.5 µL) was analyzed using an LTQ Orbitrap Velos (Thermo Fisher Scientific) mass spectrometer coupled to nanoflow liquid chromatography on an EASY-nLC system (Proxeon Biosystems) with a Proxeon nanoelectrospray ion source. Peptides were separated in a 2–90% acetonitrile gradient in 0.1% formic acid using a PicoFrit analytical column (20 cm × 75 µm ID, 5 µm particle size, New Objective) at a flow rate of 300 nL/min over 212 min, in which 35% acetonitrile was reached at 175 min. The nanoelectrospray voltage was set to 2.2 kV, and the source temperature was set to 275 °C. The instrument methods employed for the LTQ Orbitrap Velos were set up in DDA mode. Full scan MS spectra (m/z 300–1600) were acquired in the Orbitrap analyzer after accumulation to a target value of 1e6. Resolution in the Orbitrap was set to r = 60,000, and the 20 most intense peptide ions (top 20) with charge states ≥2 were sequentially isolated to a target value of 5000 and fragmented in the high-pressure linear ion trap by CID (collision-induced dissociation) with a normalized collision energy of 35%. Dynamic exclusion was enabled with an exclusion list size of 500 peptides, an exclusion duration of 60 s and a repeat count of 1. An activation Q of 0.25 and an activation time of 10 ms were used. The run order of the samples is described in Supplementary Data 2.

Proteomic data analysis

One hundred and twenty LC-MS/MS runs were performed, 20 for each of the six microdissected regions (Fig. 1e; Supplementary Data 2). Raw data were processed using MaxQuant v1.3.0.3 software.

Protein abundance, which was calculated based on the normalized spectrum intensity (LFQ intensity), was log2-transformed, and the dataset was filtered by minimum valid values in at least one group (ten valid values for neoplastic island samples and eight valid values for tumor stroma samples). Missing values for the LFQ intensity were imputed with random numbers from a normal distribution, the mean and standard deviation of which were selected to best simulate low abundance values close to the noise level (imputation width = 0.3, shift = 1.8). A t test was used to identify differentially expressed proteins between the ITF and inner areas (P value 
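The preprocessing described above (log2 transformation, valid-value filtering, and imputation from a down-shifted normal distribution) can be sketched as follows. This is a minimal illustration, not the software actually used: it assumes imputation is done per sample column, with the imputed distribution centered at mean − shift × SD and with spread width × SD. The toy matrix, group layout, and threshold of two valid values are hypothetical (the study used ten and eight).

```python
import math
import random
import statistics


def log2_or_none(value):
    """log2-transform a positive LFQ intensity; zero or missing becomes None."""
    return math.log2(value) if value and value > 0 else None


def passes_valid_filter(row, groups, min_valid):
    """Keep a protein if at least one group has >= min_valid observed values."""
    return any(sum(row[i] is not None for i in idx) >= min_valid for idx in groups.values())


def impute_column(column, width=0.3, shift=1.8, rng=random):
    """Replace None with draws from a down-shifted, narrowed normal distribution."""
    observed = [v for v in column if v is not None]
    mean, sd = statistics.mean(observed), statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mean - shift * sd, width * sd) for v in column]


# Hypothetical intensities: rows are proteins, columns are samples (0 = not quantified).
raw = [
    [1024.0, 2048.0, 0.0, 512.0],
    [0.0, 256.0, 0.0, 0.0],
    [4096.0, 0.0, 8192.0, 2048.0],
    [512.0, 1024.0, 2048.0, 0.0],
]
groups = {"ITF": [0, 1], "inner": [2, 3]}  # column indices per group (assumed layout)

log_rows = [[log2_or_none(v) for v in row] for row in raw]
kept = [row for row in log_rows if passes_valid_filter(row, groups, min_valid=2)]

rng = random.Random(1)  # fixed seed so the sketch is reproducible
columns = [impute_column(list(col), rng=rng) for col in zip(*kept)]
imputed = [list(row) for row in zip(*columns)]
```

The second protein is dropped because neither group reaches two valid values; the remaining gaps are filled with values below each sample's observed mean, mimicking signal near the noise level.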

GO annotation of biological processes for the proteome of neoplastic islands and tumor stroma was performed using the BiNGO plugin. P value 

Discovery proteomics and clinicopathological data correlation

Linear regression analysis was performed in R to evaluate the linear relationship between protein expression and the following clinicopathological variables: age (>40 or <40 years old for island samples, >50 or <50 years old for stroma), sex, smoking habits, tumor size, lymph node metastasis at diagnosis, clinical stage, type of treatment (surgery, surgery and radiotherapy, or a combination of surgery, radiation and chemotherapy), disease-free survival, second primary tumors (from different histological origins), local recurrence, lymph node recurrence, presence of the worst pattern of invasion. P value 
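As an illustration of regressing protein expression on a dichotomized clinical variable, the sketch below fits a one-predictor least-squares line. It is written in Python rather than R and is not the authors' script. The expression values and the 0/1 coding are hypothetical, and no P value is computed here.

```python
import statistics


def simple_ols(x, y):
    """Least-squares slope and intercept of y on a single predictor x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: 0 = no lymph node metastasis, 1 = metastasis at diagnosis.
node_status = [0, 0, 0, 1, 1, 1]
log2_lfq = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]

slope, intercept = simple_ols(node_status, log2_lfq)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With a 0/1 predictor, the slope equals the difference between the two group means and the intercept equals the mean of the group coded 0.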

IHC of OSCC tissues

Slides of the OSCC cases (125 cases for the evaluation of neoplastic island proteins and 96 cases for tumor stroma) were incubated with anti-PGK1 (SAB1300102, Sigma) diluted 1:50, anti-NDRG1 (HPA006881, Sigma) diluted 1:100, anti-LTA4H (HPA008399, Sigma) diluted 1:100, anti-CSTB (HPA017380, Sigma) diluted 1:250, anti-COL6A1 (HPA019142, Sigma) diluted 1:1000, anti-ITGAV (HPA004856, Sigma) diluted 1:300 and anti-MB (HPA003123, Sigma) diluted 1:75 according to The Human Protein Atlas and the manufacturers’ instructions and were assessed using the Envision detection system (Dako). The control reactions were performed by exclusion of the primary antibodies. The specificities of the antibodies employed were shown by western blot with seven cell line extracts.

Western blot analysis

Protein extracts were obtained from BJ-5ta (ATCC CRL-4001, an immortalized fibroblast cell line), CAF (primary oral cancer-associated fibroblasts, α-SMA positive), SCC-9 (ATCC CRL-1629, a tongue cancer cell line), HSC3 (JCRB 0623, a human tongue squamous cell carcinoma cell line, Osaka National Institute of Health Sciences, Japan), SK-MEL-28 (ATCC HTB-72, a malignant skin-derived melanoma cell line), MCF7 (ATCC HTB-22, a breast cancer cell line) and A549 (BCRJ 0033, an epithelial lung cancer cell line) and subjected to western blot (Supplementary Fig. 7). The antibodies were anti-CSTB (1:500; Sigma, HPA017380), anti-LTA4H (1:500; Sigma, HPA008399), anti-NDRG1 (1:500; Sigma, HPA006881), anti-PGK1 (1:500; Sigma, SAB1300102), anti-COL6A1 (1:1000; Sigma, HPA019142), anti-ITGAV (1:1000; Sigma, HPA004856), and anti-MB (Sigma, HPA003123). Western blots were performed using 40 μg of total lysate protein and analyzed on a 10–15% SDS-PAGE gel. After incubation with secondary antibodies, visualization of the proteins was achieved by chemiluminescence with the ECL kit (Amersham Biosciences). Anti-ACTB (1:2000; Sigma, A1978) antibody was used as a loading control. Uncropped scans of all blots are shown in Supplementary Fig. 7 in the Supplementary Information. The cell lines were tested for mycoplasma contamination.

IHC data analysis

Histological slides were independently evaluated by three pathologists in a blinded manner. The examiners were instructed to reach a consensus on discordant cases. The ITF and the inner tumor were scanned under a low power field to select the correct tumor area.

Correlations between the immunostaining and clinical parameters of the tumors were assessed via crosstabulation and the chi-square test. Survival curves were estimated using the Kaplan−Meier method and compared with the log-rank test. In the univariate survival analysis, cases with greater expression in the ITF were compared with cases with lower expression in the ITF. For the multivariate survival analysis, the Cox proportional hazard model with a stepwise method was used. For this analysis, cases with equal expression in the ITF and the inner tumor were grouped with the low-expression cases. P value 
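For readers unfamiliar with the Kaplan−Meier estimate, a minimal product-limit sketch is given below. The follow-up times and event flags are hypothetical, and the log-rank comparison and Cox model are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. recurrence or death) occurred, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months; 0 marks patients censored at last contact.
months = [6, 12, 12, 20, 30, 36]
event = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(months, event)
for t, s in curve:
    print(f"t = {t:>2} months  S(t) = {s:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at the three event times.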

Saliva sample collection

This study was approved by the Ethics Review Board of the Cancer Institute of São Paulo (ICESP), Octavio Frias de Oliveira, ICESP, São Paulo, SP, Brazil, and Plataforma Brasil through protocol CAAE 30658014.1.1001.0065. Informed consent was obtained from all patients. Saliva was collected from patients without lymph node metastasis (designated as N0, n = 14) and patients with lymph node metastasis (designated as N+, n = 26). The detailed clinical and pathological information for the 40 patients enrolled in this study is summarized in Supplementary Data 26. Individuals who had not eaten for at least 1 h first rinsed their mouths with 5 mL of drinking water, and saliva was subsequently harvested without stimulation into a glass receptacle. The saliva samples were aliquoted into 15 mL tubes and frozen at −80 °C for long-term storage until use.

Proteotypic peptides and transition selection

The proteins CSTB, LTA4H, NDRG1, PGK1, COL6A1, ITGAV, and MB were selected for verification in saliva from patients without (N0) and with (N+) lymph node metastasis. Briefly, three proteotypic peptides per protein were selected based on the number of residues, hydrophobicity, and DDA and/or SRMAtlas evidence. Eight or nine peptides with their respective 24 or 27 transitions of the internal retention time standard (Pierce™ Peptide Retention Time Calibration Mixture, Thermo Fisher Scientific) were monitored as a control for retention time shifts in liquid chromatography. A detailed description of the peptide and transition selection is provided (Supplementary Data 27) as follows:

Peptide selection: (1) Peptides were selected based on previous DDA data from the same samples that were used in the discovery phase. With our own DDA data, we generated spectral libraries using Skyline software; (2) In the cases we could not retrieve three proteotypic peptides per protein, considering the rules for proteotypic peptide selection

Transition selection: (1) All transitions monitored in saliva were initially obtained from spectral libraries built from our own DDA data and human plasma DDA data (obtained from the PeptideAtlas repository); (2) Transitions that were not confidently detected were excluded from the method and replaced with transitions from the SRMAtlas during the refinement of the SRM method; (3) The transitions were selected based on the rank of intensity identified in the spectral libraries or the SRMAtlas spectra. The three most intense transitions were preferably selected for further SRM analysis.
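The intensity-rank rule for choosing transitions can be sketched as below. The peptide sequence, fragment labels, and intensities are invented for illustration; in the study the ranking came from Skyline spectral libraries or SRMAtlas spectra.

```python
def pick_top_transitions(library, n=3):
    """Return the n most intense fragment ions for each peptide."""
    return {
        peptide: [
            ion
            for ion, _ in sorted(ions.items(), key=lambda kv: kv[1], reverse=True)[:n]
        ]
        for peptide, ions in library.items()
    }


# Hypothetical library entry: fragment ion -> library intensity.
library = {
    "PEPTIDEK": {"y3": 1200.0, "y4": 5300.0, "y5": 8100.0, "y6": 2500.0, "b2": 900.0},
}
selected = pick_top_transitions(library)
print(selected)
```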

Saliva sample preparation for SRM-MS

The saliva samples were centrifuged at 1500 × g for 5 min at 4 °C to pellet the debris. The resulting supernatant was collected and quantified using the Bradford assay (Bio-Rad, Hercules, CA, USA). A volume that corresponded to 10 µg of total protein was used for sample preparation, and the sample volumes were adjusted. Ten micrograms of total protein were denatured in urea buffer (100 mM Tris-HCl pH 7.5, 8 M urea, 2 M thiourea, 5 mM EDTA, 1 mM PMSF, and 1 mM DTT) that contained Protease Inhibitor Cocktail Complete Mini Tablets (Roche, Auckland, New Zealand). The samples were sonicated for 10 min and subsequently centrifuged at 10,000 × g for 5 min. The supernatant was collected, and the proteins were reduced with 5 mM DTT and alkylated with 14 mM iodoacetamide. Prior to the addition of trypsin, all samples were diluted 1:5 in 50 mM ammonium bicarbonate. Proteins were digested overnight at 37 °C using 1.8 µg of trypsin. After digestion, the reaction was terminated by the addition of trifluoroacetic acid. Desalting was performed by solid-phase extraction using Stage-tips C18 resin

To prevent bias during the measurements, the data collection was blocked and randomized for each group (N0 and N+). Samples from both the N0 and N+ groups were randomized using the R (v3.4.0) environment. Randomization was applied for each set of technical replicates (Supplementary Data 28). Each sample was analyzed in three technical replicates using the same instrument parameters as described below.
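One possible reading of this scheme is that each set of technical replicates forms a block, and the run order of all 40 samples is shuffled independently within each block. The sketch below implements that reading in Python rather than R. The sample identifiers and the seed are hypothetical, and the actual run order is the one given in Supplementary Data 28.

```python
import random


def randomized_run_order(sample_ids, n_replicates=3, seed=0):
    """Shuffle the sample order independently within each technical-replicate set."""
    rng = random.Random(seed)
    order = []
    for replicate in range(1, n_replicates + 1):
        block = list(sample_ids)
        rng.shuffle(block)
        order.extend((replicate, sample) for sample in block)
    return order


# Hypothetical identifiers for 14 N0 and 26 N+ saliva samples.
samples = [f"N0_{i:02d}" for i in range(1, 15)] + [f"Nplus_{i:02d}" for i in range(1, 27)]
run_order = randomized_run_order(samples)
print(run_order[:5])
```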

SRM-MS

Samples were analyzed on a Xevo TQ-XS triple quadrupole mass spectrometer (Waters, Milford, MA, USA) equipped with an electrospray ion source (Ion Key, Waters, Milford, MA, USA) with MassLynx software (version 4.2).

An aliquot that contained 1 µg of saliva peptide mixture was separated on a trap column (Waters Acquity UPLC BEH C18 130A, 5 µm, 300 µm × 50 mm) and a BEH Shield C18 IonKey column (10 cm × 150 µm ID packed with 1.7 µm C18 particles, Waters, USA) heated to 40 °C. Peptides were maintained at 4 °C in the sample manager and loaded onto the column from an Acquity UPLC M-Class LC autosampler (Waters, Milford, MA, USA). Chromatographic conditions were as follows: 60-min gradient at a flow rate of 1.2 µL/min starting with 98% A (water), reaching 40% B (ACN) at 45 min, followed by a step increase to 85% B until 47 min and 2% B at 60 min. Targeted acquisition of eluting ions was performed using the mass spectrometer operated in SRM-MS mode with Q1 and Q3 analyzers set to 0.7 Th FWHM and a cycle time of 3 s. For all SRM-MS runs, multiple scheduled injections with a 3-min elution window were used, each targeting three transitions per peptide. The optimal collision energy was determined for each peptide by Skyline

Quantitative and statistical analysis

Visualization and inspection of peaks were manually performed in Skyline. Both the light and heavy peptides were checked regarding the quality of the data by observing the alignment of light and heavy peptide elution times, the co-elution of all three transitions, relative intensity correlation with the spectral library (dotp, close to 1), relative intensity correlation of light with heavy transitions (rdotp close to 1), proximity to the predicted retention time and reproducibility in terms of retention time and intensity between technical replicates (Supplementary Fig. 8−9). All monitored peptides were detected and presented a measured retention time close to the predicted retention time among the replicates (r = 0.9955 for set 1 and r = 0.9994 for set 2, Supplementary Fig. 8).

The reproducibility within each sample group was assessed by Pearson correlation analysis (Supplementary Fig. 10), and visualization of sample grouping was assessed by unsupervised hierarchical clustering (Supplementary Fig. 11). Each peptide was quantified in a sample by dividing the intensity of the light peptide (sum of light transitions) by the intensity of each reference SIL peptide (sum of heavy transitions) to obtain the light-to-heavy peptide ratio.
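The light-to-heavy ratio is a simple quotient of summed transition intensities. A minimal sketch, with hypothetical peak areas for three transitions:

```python
def light_to_heavy_ratio(light_areas, heavy_areas):
    """Sum of light transition areas divided by sum of heavy (SIL) transition areas."""
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("heavy (reference) signal must be positive")
    return sum(light_areas) / heavy_total


# Hypothetical peak areas for the three monitored transitions of one peptide.
ratio = light_to_heavy_ratio([1500.0, 900.0, 600.0], [3000.0, 1800.0, 1200.0])
print(f"L/H ratio = {ratio:.2f}")
```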

Comparison of the levels of the monitored peptides between the groups of patients was performed using the Mann−Whitney U test (data not log-transformed). P values were adjusted for multiple comparisons using the Benjamini−Hochberg FDR method. P values less than 0.05 were considered statistically significant.
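The Benjamini−Hochberg adjustment can be sketched in a few lines. The raw P values below are hypothetical stand-ins for per-peptide Mann−Whitney results, and the test itself is not reimplemented here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-peptide P values
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p, significant)
```

In this toy example only the first peptide remains below 0.05 after adjustment, although three of the four raw P values were below that threshold.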

Targeted proteomics and clinicopathological data correlation

Correlations between the protein abundance in SRM and the clinicopathological data of the tumors were performed by crosstabulation and the chi-square test (Supplementary Data 32). P value 

Machine learning to predict power of prognostic signatures

To analyze the prediction power of combinations of peptides and proteins to distinguish OSCC patients by the presence of lymph node metastasis, the peptide and protein abundances of 14 patients (N0) and 26 patients (N+) were used (Supplementary Data 33). We split the dataset (40 samples) into a training set (80% of the dataset) and an independent test set (20% of the dataset).

The training set was used to perform filtering of the variables, cross-validation and running of the SES tool method. Variables were filtered with the Mann−Whitney U test to obtain only variables with significant differences between the groups (P value 

To define the best classifier to evaluate the predictive power of signatures, we performed a repeated cross-validation (100 repetitions of stratified tenfold cross-validation) on the training set and employed seven different types of machine-learning algorithms (Linear SVM, RBF SVM, Decision Tree, Logistic Regression, Random Forest, Perceptron, and Naive Bayes). For both levels (protein and peptide), Random Forest had the highest performance (Supplementary Data 34−35).
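To make the resampling scheme concrete, the sketch below generates the index splits for 100 repetitions of stratified tenfold cross-validation using only the standard library. It is not the authors' pipeline: classifier fitting is omitted, and the class composition of the 32-sample training set (11 N0, 21 N+) is an assumption.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits, rng):
    """Assign sample indices to n_splits folds, spreading each class evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for index in indices:
            folds[position % n_splits].append(index)  # deal out round-robin
            position += 1
    return folds


def repeated_stratified_cv(labels, n_splits=10, n_repeats=100, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        for test in stratified_folds(labels, n_splits, rng):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, sorted(test)


# Assumed training-set composition (not stated in the text): 11 N0 and 21 N+.
labels = ["N0"] * 11 + ["N+"] * 21
splits = list(repeated_stratified_cv(labels))
print(len(splits), "train/test splits")
```

Each of the 1000 splits holds out three or four samples, and every held-out fold contains both classes, so per-fold metrics are always defined.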

The steps used to obtain and validate the signatures are detailed in Fig. 7a. We performed a repeated cross-validation (100 repetitions of stratified tenfold cross-validation) on the training set using Random Forest to classify samples as N0 versus N+, thus creating a list of potential signatures with the highest accuracy scores (Fig. 7b–d; Supplementary Data 36−37). Furthermore, we performed the SES method. We generated ROC curves considering the 100 repetitions of stratified tenfold cross-validation on the selected signatures (Fig. 7d). We compared the AUC of each signature, as well as their accuracy, specificity, sensitivity, and precision, as shown in Supplementary Data 38, to identify the most relevant results regarding the pairs of accuracy and ROC AUC. Finally, we validated the models using the independent test set (Supplementary Fig. 12−13). Groups of signatures with both the highest accuracy (≥74% for peptides, >68% for proteins) and the highest AUC were selected as the most relevant features. Moreover, we applied a technique to oversample the training subsets and balance the number of samples in each class, using the Synthetic Minority Over-sampling Technique (SMOTE; Fig. 7e). The 1000 training subsets were synthetically oversampled, and the associated 1000 test subsets were composed of only original samples (nonsynthetic).
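The core idea of SMOTE is to create each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea with the standard library only. It is a simplified stand-in for the implementation used in the study, and the feature values, k, and seed are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward nearest neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (s for s in minority if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature training subset: 4 minority (N0) vs 7 majority (N+) samples.
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
n_majority = 7
new_samples = smote(minority, n_synthetic=n_majority - len(minority), k=3)
print(new_samples)
```

As in the text, oversampling of this kind would be applied to the training subset only, so that each test subset contains original samples exclusively.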
