Microscaled proteogenomic methods for precision oncology

Patient-derived xenografts and drug treatment

For PDX studies, all animal procedures were approved by the Institutional Animal Care and Use Committee at Baylor College of Medicine (Houston, TX, USA) (protocol# AN-6934). Tumor pieces (2–3 mm) from PDX tumors were engrafted into the cleared mammary fat pads of 3–4-week-old SCID/bg mice (Envigo) and allowed to grow without exogenous estrogen supplementation until tumors reached 200–250 mm³. The human tissue used for PDX generation was collected at Washington University in St. Louis with appropriate patient consent. For the core and bulk comparison experiment, two non-adjacent cores were first obtained from the PDX tumors, immediately embedded in optimal cutting temperature (OCT) medium, and snap-frozen in liquid nitrogen. Following coring, tumors were surgically resected, and the bulk tumor tissue was snap-frozen in liquid nitrogen. For treatment experiments, mice were randomized into 4 groups receiving (i) vehicle or control; (ii) everolimus (5 mg per kilogram (kg) body weight daily in chow); (iii) trastuzumab (30 mg per kg body weight weekly by intraperitoneal injection); or (iv) a combination of trastuzumab and everolimus (administered as described in (ii) and (iii)). There were n = 15 mice per arm. Tumor volumes were measured by caliper every 3–4 days. For all animal experiments, tumor volumes were calculated as \(V = \frac{4}{3} \pi \left(\frac{\mathrm{Length}}{2}\right)^2 \left(\frac{\mathrm{Width}}{2}\right)\). Baseline samples were collected on the day of randomization (the treatment start date), followed by sample collection at 1 week and 4 weeks post-treatment. Animals were sacrificed when tumors reached 1500 mm³ or at the study end point.
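As a quick numerical check of the volume formula, the Python sketch below applies it exactly as written (Length squared, Width to the first power) and compares the result with the 200–250 mm³ randomization window and the 1500 mm³ endpoint stated above. The caliper readings and helper names are hypothetical and chosen only for illustration.

```python
import math


def tumor_volume(length_mm: float, width_mm: float) -> float:
    """Tumor volume (mm^3) using the formula stated in the text:
    V = 4/3 * pi * (Length/2)^2 * (Width/2)."""
    return (4.0 / 3.0) * math.pi * (length_mm / 2.0) ** 2 * (width_mm / 2.0)


def ready_for_randomization(volume_mm3: float) -> bool:
    """Tumors were randomized once they reached 200-250 mm^3."""
    return 200.0 <= volume_mm3 <= 250.0


def reached_endpoint(volume_mm3: float) -> bool:
    """Animals were sacrificed when tumors reached 1500 mm^3."""
    return volume_mm3 >= 1500.0


# Hypothetical caliper readings in mm (not study data).
v = tumor_volume(10.0, 8.0)
print(f"{v:.1f} mm^3")                                  # 418.9 mm^3
print(ready_for_randomization(v))                       # False: past the window
print(ready_for_randomization(tumor_volume(8.0, 7.0)))  # True (about 234.6 mm^3)
print(reached_endpoint(v))                              # False
```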

DP1 Clinical Data

Following informed consent, patients diagnosed as ERBB2-positive by diagnostic breast biopsy were enrolled in the National Surgical Adjuvant Breast and Bowel Project (NSABP) Biospecimen Discovery Project (DP1) for ERBB2+ breast cancer (https://clinicaltrials.gov/ct2/show/study/NCT01850628). In addition to samples obtained during regular cancer care, optional additional 14-gauge needle biopsies embedded in optimal cutting temperature (OCT) compound were collected, in accordance with consent, at diagnostic breast biopsy and 48 to 72 h following chemotherapy and anti-ERBB2 therapy. Blood samples were also collected and processed into frozen pellets before the start of standard treatment, up to 3 weeks after the first dose but before the second dose, and at the time of surgery, and were sent to Washington University (St. Louis, MO) for research purposes.

Biopsy samples, blood samples, and medical information (including pathology reports) were collected and labeled with a study number, a unique code assigned to samples and medical information. This code, which is linked to the patient’s name, was kept separate from other sample information. Upon enrollment in the study, each patient was also assigned a separate unique BCN number (i.e., BCN“XXXX”), and all subsequent sample derivatives were associated with the corresponding BCN number.

Patients were able to withdraw samples without any penalty or loss of benefits to which they were entitled. However, to protect anonymity, DNA sequences or other information derived from samples were not removed once entered into databases, to prevent the risk of identification.

Biopsy Trifecta Extraction (BioTExt) embedding and sectioning

14-gauge needle human biopsies were embedded in OCT compound and stored at −80 °C. Using a cryostat maintained between −15 and −23 °C, each biopsy was sectioned at 50 microns. Fifty-micron curls were alternated among three (3) 1.5 mL microcentrifuge tubes assigned for denatured protein-DNA, native protein-DNA, or RNA extraction. At the start of sectioning and after every six (6) curls, a 5-micron section was mounted on a slide for hematoxylin and eosin (H&E) staining and histopathological confirmation. This process was repeated until six (6) 50-micron curls had been collected in each tube per sample. The samples were then shipped from Washington University (St. Louis, MO) to Baylor College of Medicine (Houston, TX) for subsequent processing. The tumor content percentage of each H&E slide from a biopsy (TC1, TC2, TC3, and TC4) was recorded, and these values were averaged to give the mean tumor content (avgTC) for that biopsy. Biopsies with an avgTC below 50% were excluded from further processing.
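The sectioning scheme and the avgTC filter can be sketched as follows. This is a minimal Python illustration, assuming that "alternated" means strict round-robin order across the three tubes; under that assumption the plan yields 18 curls and four H&E slides, matching TC1 to TC4. The tumor content values are hypothetical.

```python
from statistics import mean

TUBES = ("denatured protein-DNA", "native protein-DNA", "RNA")


def sectioning_plan(curls_per_tube: int = 6, he_interval: int = 6) -> list:
    """Ordered list of sections cut from one biopsy.

    Assumes round-robin alternation of 50-um curls across the three tubes,
    with a 5-um H&E section at the start and after every `he_interval` curls.
    """
    plan = ["H&E 5 um"]
    total = curls_per_tube * len(TUBES)
    for i in range(total):
        plan.append(f"50 um -> {TUBES[i % len(TUBES)]}")
        if (i + 1) % he_interval == 0:
            plan.append("H&E 5 um")
    return plan


def passes_tumor_content(tc_values: list, cutoff: float = 50.0) -> bool:
    """avgTC is the mean of the per-slide tumor content percentages;
    biopsies with avgTC below the cutoff are excluded."""
    return mean(tc_values) >= cutoff


plan = sectioning_plan()
print(sum(s.startswith("H&E") for s in plan))   # 4 slides -> TC1..TC4
print(passes_tumor_content([60, 55, 40, 50]))   # True (avgTC = 51.25)
```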

Immunohistochemistry

Tissue sections were cut at 5 µm onto charged glass slides, deparaffinized in xylene, and rehydrated through a graded ethanol series. Peroxidase blocking, heat-induced antigen retrieval, and primary antibody incubation were performed per standard protocol under the following abbreviated conditions: ERBB2 (SP3, Neomarkers) 1:100, Tris pH 9.0; AR (441, sc-7305, Santa Cruz) 1:50, Tris pH 9.0; Muc1 (sc-7313, Santa Cruz) 1:150, Citrate pH 6.0; CD3 (polyclonal, A0452, Dako) 1:100, Tris pH 9.0. All primary antibodies were incubated at room temperature for 1 h, followed by standard chromogenic staining with the EnVision Polymer-HRP anti-mouse/3,3′-diaminobenzidine (DAB; Dako) system. Immunohistochemistry scoring was performed using established guidelines, when appropriate. All IHC results were evaluated against positive and negative controls.

BioTExt denatured protein extraction

1 mL of cold 70% ethanol (EtOH) was added to each tube assigned for denatured protein. Each tube was pulse-vortexed for 30 s and centrifuged at 20,000 × g for 5 min at 4 °C, after which the 70% EtOH was carefully aspirated. 1 mL of cold NanoPure water was then added, and the tube was pulse-vortexed for 30 s and centrifuged at 20,000 × g for 5 min at 4 °C, after which the water was carefully aspirated. Finally, 1 mL of cold 100% EtOH was added, and the tube was pulse-vortexed for 30 s and centrifuged at 20,000 × g for 5 min at 4 °C, after which the 100% EtOH was carefully aspirated. 100 µL of denatured protein lysis buffer (8 M urea, 75 mM NaCl, 1 mM EDTA, 50 mM Tris-Cl pH 8.0, 10 mM NaF, Phosphatase inhibitor cocktail 2 (Sigma; P5726), Phosphatase inhibitor cocktail 3 (Sigma; P0044), Aprotinin (Sigma; A6103), Leupeptin (Roche; 11017101001), PMSF (Sigma; 78830)) was added to each sample, which was then transferred to a micro-sonicator vial. All samples were incubated on ice for 10 min. Following incubation, samples were individually sonicated in the S220 Ultrasonicator for 2 min at peak power: 100.0, duty factor: 10.0, cycles per burst: 500. Lysates were transferred to labeled 1.5 mL tubes and centrifuged at 4 °C at maximum speed (20,000 × g) for 30 min. Lysate supernatants containing denatured proteins were transferred to new labeled tubes. The remaining precipitated pellets were snap-frozen for subsequent DNA isolation. Quality control of the denatured protein was performed by mass spectrometry analysis.

BioTExt DNA extraction

DNA was isolated using the QIAamp DNA Mini Kit (Qiagen; 51306). The pellets retained for DNA isolation were equilibrated to room temperature. 100 µL of Buffer ATL and then 20 µL of proteinase K were added to each sample and mixed by vortexing. Samples were then incubated at 56 °C for 3 h in a shaking heat block. Following incubation, samples were briefly centrifuged. 20 µL of RNase A (20 mg/mL) was added to each sample, which was then pulse-vortexed for 15 s and incubated for 2 min at room temperature. Samples were briefly centrifuged, pulse-vortexed for 15 s, and incubated at 70 °C for 10 min. Following a brief centrifugation, 200 µL of Buffer AL was added to each sample, which was then pulse-vortexed for 15 s and incubated at 70 °C for an additional 10 min. Following another brief centrifugation, samples were carefully applied to a corresponding QIAamp Mini spin column placed in a collection tube, without wetting the rim. The spin columns were centrifuged at 6,000 × g for 1 min and then placed in a new collection tube, and the original filtrate was discarded. 500 µL of Buffer AW2 was added to each spin column without wetting the rim, and the columns were centrifuged at maximum speed (20,000 × g) for 3 min. The spin columns were then placed in new collection tubes and centrifuged again at maximum speed for 1 min, after which they were transferred to new 1.5 mL microcentrifuge tubes. 100 µL of Buffer AE was added to each spin column, followed by incubation at room temperature for 5 min in a shaking heat block. The final DNA isolates were collected in the corresponding 1.5 mL tubes by centrifugation at 6,000 × g for 1 min. DNA quality control was performed by PicoGreen analysis.

BioTExt RNA extraction

1 mL of TRIzol Reagent (Thermo Fisher Scientific; 15596026) was added to each RNA-designated tube of cryosectioned curls, which was immediately inverted three times, and its contents were transferred to a sonicator vial. Samples were individually sonicated in the S220 Ultrasonicator for 2 min at peak power: 100.0, duty factor: 10.0, cycles per burst: 500. All samples were then incubated for 5 min and transferred to 1.5 mL microcentrifuge tubes. Following addition of 200 µL of chloroform, each sample was incubated for 3 min and then centrifuged at 12,000 × g for 15 min at 4 °C. The supernatants were discarded. The pellet was air-dried in the microcentrifuge tube for 10 min, resuspended in 20 µL of RNase-free water, and incubated at 56–60 °C in a heat block for 10–15 min. RNA was then isolated using the RNeasy Mini Kit (Qiagen; 74106). 10 µL of Buffer RDD and 2.5 µL of DNase I (Qiagen; 79254) were added to each sample, the volume was brought up to 100 µL with RNase-free water, and the sample was incubated at room temperature for 10 min. 350 µL of Buffer RLT was added and mixed well with each sample. Then, 250 µL of 100% EtOH was mixed with each sample, and the mixture was immediately transferred to an RNeasy MinElute spin column (Qiagen; 74106) placed in a 2 mL collection tube and centrifuged at 12,000 × g for 15 s. The flow-through was discarded, and 500 µL of 80% EtOH was added to each spin column. The columns were centrifuged at 12,000 × g for 2 min. The flow-through was discarded, and each column was placed in a new 2 mL collection tube and centrifuged at full speed for 5 min with the lid of the spin column open. The spin column was then placed in a 1.5 mL microcentrifuge tube, and 14 µL of RNase-free water was added directly to the center of the spin column membrane. The RNA was eluted by centrifugation at maximum speed for 1 min. RNA quality control was performed by PicoGreen analysis.

BioTExt native protein extraction

100 µL of native protein lysis buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 0.5% Triton X-100, 1 mM EDTA, 1 mM EGTA, 10 mM NaF, 2.5 mM Na3VO4, protease inhibitor cocktail, phosphatase inhibitor cocktail) was added to each native protein sample, which was then transferred to a micro-sonicator vial. Each lysate tube was assigned a trackable mass spectrometry label. Lysate protein concentration was measured by Bradford assay: 10 µL of each lysate was mixed with 800 µL of deionized water, 200 µL of Bradford reagent was added to each mixture, and each sample was inverted and transferred to its assigned cuvette. Absorbance was measured on a spectrophotometer against a corresponding blank sample.

Genomic data generation and QC analysis

DNA from core biopsies and germline blood samples was quantified by PicoGreen. Samples that met the minimum input requirements (≥300 ng DNA, preferred concentration 10 ng/μL) proceeded into the Somatic Whole Exome workflow, in which DNA was processed for somatic whole exome sequencing. This workflow included library preparation, hybrid capture, sequencing with 76 bp paired-end reads, and a sample identification QC check; it used ligation-based library preparation followed by hybrid capture with the Illumina Rapid Capture Exome enrichment kit (38 Mb target territory).

For tumor samples, all libraries were sequenced with the goal of covering 85% of targets at greater than 50× (±5%), as measured by the Laboratory Picard bioinformatics pipeline. All sequencing was performed by the Laboratory on Illumina instruments with 76 bp paired-end reads. The Laboratory Picard pipeline aggregated all data from a given sample into a single BAM file that included all reads, all bases from all reads, and original/vendor-assigned quality scores.
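The coverage goal reduces to a simple fraction-of-targets calculation. In practice the Picard pipeline reports this metric; the Python sketch below only illustrates the arithmetic on hypothetical per-target mean coverages, and reading "±5%" as 5 percentage points is an assumption.

```python
def fraction_targets_above(mean_coverages, threshold=50.0):
    """Fraction of capture targets whose mean coverage exceeds `threshold`."""
    if not mean_coverages:
        raise ValueError("no targets supplied")
    return sum(c > threshold for c in mean_coverages) / len(mean_coverages)


def meets_coverage_goal(mean_coverages, goal=0.85, tolerance=0.05, threshold=50.0):
    """Goal: 85% of targets above 50x, with a +/-5% allowance
    (interpreted here as 5 percentage points, which is an assumption)."""
    return fraction_targets_above(mean_coverages, threshold) >= goal - tolerance


# Hypothetical per-target mean coverages for one tumor sample.
tumor = [62, 48, 75, 55, 90, 51, 52, 66, 80, 58]
print(fraction_targets_above(tumor))   # 0.9
print(meets_coverage_goal(tumor))      # True
```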

DNA samples were additionally processed for Fluidigm Fingerprint Checks. By genotyping a panel of highly polymorphic SNPs (including SNPs on chromosomes X and Y), a unique genetic ‘fingerprint’ is generated for each sample. These genotypes are stored in the sample tracking database and compared automatically to genotypes from the production pipeline to ensure the integrity of sample tracking.
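The fingerprint comparison amounts to measuring genotype concordance at SNPs called on both platforms. The Python sketch below is a simplified illustration, not the production tracking system: SNP identifiers, genotypes, and the pass threshold are hypothetical, and genotypes are treated as unordered allele pairs.

```python
def normalize(genotype):
    """Treat genotypes as unordered allele pairs, e.g. 'GA' == 'AG'."""
    return "".join(sorted(genotype)) if genotype else None


def concordance(fp_a: dict, fp_b: dict) -> float:
    """Fraction of SNPs called in both fingerprints whose genotypes agree."""
    shared = [s for s in fp_a if normalize(fp_a[s]) and normalize(fp_b.get(s))]
    if not shared:
        raise ValueError("no SNPs genotyped in both fingerprints")
    agree = sum(normalize(fp_a[s]) == normalize(fp_b[s]) for s in shared)
    return agree / len(shared)


# Hypothetical SNP IDs and genotypes (None = no call).
fluidigm = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": None}
pipeline = {"snp1": "GA", "snp2": "CC", "snp3": "TC", "snp4": "AA"}
score = concordance(fluidigm, pipeline)
print(round(score, 3))            # 0.667
MIN_CONCORDANCE = 0.9             # hypothetical pass/fail threshold
print(score >= MIN_CONCORDANCE)   # False -> flag a possible sample swap
```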

Identification of mutations by whole exome sequencing

VarScan2 was used to identify germline mutations (SNPs and INDELs) from the germline BAM files, and somatic mutations by comparing the tumor BAM file with the germline BAM file for each patient. Annovar was then used to annotate the SNP and INDEL VCF files from VarScan separately for germline and somatic mutations from each patient. Mutations with “nonsynonymous SNV”, “stopgain”, “stoploss”, and “splicing” annotations that affect the protein-coding sequences of genes were extracted from the resulting SNP multianno files and combined into a single text file for all patients. Similarly, INDELs annotated as occurring in the exons of genes were extracted from each INDEL multianno file and combined into a single file. The somatic SNPs and INDELs were combined into a single mutation-by-patient table (unique mutations; Supplementary Data 3A) and a single mutated-gene-by-patient table (Supplementary Data 3B).
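The filtering and table-building steps can be sketched in Python. This assumes tab-delimited Annovar multianno tables with the refGene columns `Func.refGene`, `Gene.refGene`, and `ExonicFunc.refGene`; treat the exact column set as an assumption. The patient ID, gene names, and variants are hypothetical.

```python
import csv
import io
from collections import defaultdict

# ExonicFunc.refGene values retained for SNVs, as listed in the text.
SNV_EXONIC_KEEP = {"nonsynonymous SNV", "stopgain", "stoploss"}


def keep_snv(row):
    funcs = row["Func.refGene"].split(";")
    return row["ExonicFunc.refGene"] in SNV_EXONIC_KEEP or "splicing" in funcs


def keep_indel(row):
    return "exonic" in row["Func.refGene"].split(";")


def add_patient(patient, snv_tsv, indel_tsv, by_mutation, by_gene):
    """Filter one patient's multianno tables and record hits in both tables."""
    for text, keep in ((snv_tsv, keep_snv), (indel_tsv, keep_indel)):
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            if not keep(row):
                continue
            gene = row["Gene.refGene"]
            key = f'{gene}:{row["Chr"]}:{row["Start"]}:{row["Ref"]}>{row["Alt"]}'
            by_mutation[key].add(patient)
            by_gene[gene].add(patient)


def tsv(rows):
    """Build a small hypothetical multianno table as tab-delimited text."""
    header = ["Chr", "Start", "Ref", "Alt", "Func.refGene",
              "Gene.refGene", "ExonicFunc.refGene"]
    return "\n".join("\t".join(r) for r in [header] + rows) + "\n"


by_mutation, by_gene = defaultdict(set), defaultdict(set)
snvs = tsv([["chr17", "100", "C", "T", "exonic", "GENE_A", "nonsynonymous SNV"],
            ["chr1", "200", "G", "A", "exonic", "GENE_B", "synonymous SNV"],
            ["chr3", "300", "A", "G", "splicing", "GENE_C", "."]])
indels = tsv([["chr5", "400", "AT", "-", "exonic", "GENE_D", "frameshift deletion"],
              ["chr6", "500", "G", "GC", "intronic", "GENE_E", "."]])
add_patient("BCN0001", snvs, indels, by_mutation, by_gene)
print(sorted(by_gene))   # ['GENE_A', 'GENE_C', 'GENE_D']
```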

Analysis of copy number alterations

We used the R package CopywriteR (version 1.18.0) to infer copy number from the exome sequencing data (Supplementary Data 3C) and to call significant copy number alterations in the cohort (integer calls). Genes were defined as having a copy number aberration when their integer call met the stringent threshold (2 or −2).
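The stringent-threshold rule can be expressed as a small classification step. A minimal Python sketch, assuming gene-level integer calls on a −2 to 2 scale; the "gain"/"loss" labels and the example calls are hypothetical, and the sketch does not reproduce the copy number inference itself.

```python
def cna_status(integer_call: int) -> str:
    """Classify a gene-level integer copy number call.

    Assumes calls on a -2..2 scale; only the stringent values (2 or -2)
    count as aberrations, as described in the text.
    """
    if integer_call == 2:
        return "gain"
    if integer_call == -2:
        return "loss"
    return "none"


# Hypothetical gene-level integer calls for one sample.
calls = {"GENE_A": 2, "GENE_B": 1, "GENE_C": -2, "GENE_D": 0}
aberrant = {g: cna_status(c) for g, c in calls.items() if cna_status(c) != "none"}
print(aberrant)   # {'GENE_A': 'gain', 'GENE_C': 'loss'}
```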

RNA-sequencing data generation and analysis

RNA was quantified using RiboGreen, and RNA quality was measured by the RNA Quality Score (RQS). Samples that did not meet the minimum input requirements (≥500 ng RNA, preferred concentration 10 ng/μL, RQS > 5.5) were held for further evaluation. RNA samples of sufficient quality were processed for Long-Insert Strand-Specific Transcriptome Sequencing. Library preparation used a high-throughput, low-input process based on the Illumina TruSeq RNA protocol, which generates poly-A mRNA libraries from total RNA using oligo-dT beads. Library construction included poly-A selection, cDNA synthesis, and strand-specific library construction using the Illumina TruSeq protocol. Each RNA sample entering library construction received an aliquot of ERCC spike-in controls (Thermo Fisher). All libraries were sequenced on the Illumina platform with the goal of 50 M reads aligned in pairs (±5%) at 101 bp read length, as measured by our Picard bioinformatics pipeline. The Picard pipeline aggregated all data from a given sample into a single demultiplexed, aligned BAM file that included all reads, all bases from all reads, and original/vendor-assigned quality scores.

The SamToFastq function from Picard tools was used to convert each sample's BAM file to FASTQ files. RSEM was then used to calculate both estimated read counts (RSEM) and fragments per kilobase of transcript per million mapped reads (FPKM) for each gene from the FASTQ files.
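For reference, the FPKM definition can be written in a few lines. The Python sketch below uses plain gene lengths and takes the total as the sum of the supplied counts; RSEM instead uses effective lengths and expected counts, so its values will differ somewhat. Counts and lengths are hypothetical.

```python
def fpkm(counts: dict, lengths_bp: dict) -> dict:
    """FPKM_g = count_g * 1e9 / (length_g * total mapped fragments).

    Simplification: plain gene lengths and the sum of the supplied counts
    as the library size (RSEM uses effective lengths and expected counts).
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


# Hypothetical fragment counts and gene lengths.
counts = {"G1": 500, "G2": 1500, "G3": 0}
lengths = {"G1": 1000, "G2": 3000, "G3": 2000}
print(fpkm(counts, lengths))   # {'G1': 250000.0, 'G2': 250000.0, 'G3': 0.0}
```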

Experimental design for MiProt

For the CPTAC workflow, the 4 PDX models were analyzed in process replicates (8 TMT channels) along with 2 common reference (CR) samples in a TMT 10-plex format. The first common reference (CR1) was constructed from equal proportions of peptides derived from the 4 cryopulverized PDX bulk tumors. The second common reference (CR2) had been used in a prior proteogenomic breast cancer PDX study that included these four models (https://cptac-data-portal.georgetown.edu/cptac/study/disclaimer?accNum=S039). For this manuscript, all ratios were calculated relative to CR4. For both PDX and clinical core analyses, samples within a TMT11 plex were randomized to reduce batch effects (Supplementary Data 1).
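Randomizing samples within a plex can be sketched as a seeded shuffle onto reporter channels. This Python illustration uses the standard TMT10plex channel names; reserving the last channels for the common references, the sample names, and the seed are all assumptions, not the layout in Supplementary Data 1.

```python
import random

# Standard TMT10plex reporter channel names.
TMT10 = ["126", "127N", "127C", "128N", "128C",
         "129N", "129C", "130N", "130C", "131"]


def assign_channels(samples, references, channels=TMT10, seed=0):
    """Randomly place samples on channels, reserving the last channels for
    the common reference(s). Placing references last is an assumption."""
    if len(samples) + len(references) != len(channels):
        raise ValueError("samples plus references must fill the plex exactly")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_sample = len(shuffled)
    layout = dict(zip(channels[:n_sample], shuffled))
    layout.update(zip(channels[n_sample:], references))
    return layout


# Hypothetical names: 4 PDX models x 2 process replicates, plus CR1 and CR2.
samples = [f"PDX{m}_rep{r}" for m in "ABCD" for r in (1, 2)]
layout = assign_channels(samples, ["CR1", "CR2"])
for channel, sample in layout.items():
    print(channel, sample)
```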

Proteomic sample preparation for MiProt analysis

Protein lysates in 8 M urea were reduced with 1 mM DTT for 45 min and then alkylated with 2 mM iodoacetamide (IAA) for an additional 45 min. The urea was diluted to a final concentration of 2 M with 50 mM Tris-HCl pH 8.5. Protein lysates were incubated with endopeptidase LysC (Promega) at an enzyme-to-protein ratio of 1:50 (μg of LysC to μg of protein) for 2 h, followed by overnight incubation with trypsin (Promega) at 1:30 (μg of trypsin to μg of protein). Both digestions were performed at room temperature. Following digestion, peptides were acidified to a final concentration of 1% formic acid and purified on a 50 mg Sep-Pak cartridge (Waters). Peptides were eluted from the Sep-Pak cartridge with 50% acetonitrile and 0.1% formic acid. Peptide concentration was measured by absorbance at 280 nm on a NanoDrop (Thermo Scientific). For qualitative assessment, 0.5 μg of peptides was run on an nLC1200 coupled to a Q Exactive Plus LC-MS setup (Thermo Scientific). Eluted peptides were snap-frozen and dried in a SpeedVac. For the CPTAC workflow, a total of 300 μg of peptides was labeled with 800 μg of TMT reagent as described previously.
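The digestion steps involve a few fixed ratios: diluting 8 M urea to 2 M requires adding three volumes of buffer, the enzymes are dosed at 1:50 and 1:30 (w/w), and TMT reagent is used at 800 μg per 300 μg of peptides. A minimal Python sketch of that arithmetic; it ignores the small DTT and IAA volumes, assumes peptide yield equals protein input when scaling the TMT reagent, and uses a hypothetical 300 μg / 100 μL input.

```python
UREA_START_M, UREA_FINAL_M = 8.0, 2.0
LYSC_RATIO, TRYPSIN_RATIO = 50.0, 30.0      # enzyme:protein = 1:50 and 1:30 (w/w)
TMT_UG_PER_UG_PEPTIDE = 800.0 / 300.0       # 800 ug TMT per 300 ug peptides


def digestion_plan(protein_ug: float, lysate_ul: float) -> dict:
    """Amounts needed for the digestion steps described above.

    Ignores the small volumes added with DTT and IAA, and assumes peptide
    yield equals protein input when scaling the TMT reagent.
    """
    return {
        # Adding 3 volumes of 50 mM Tris-HCl takes 8 M urea to 2 M.
        "tris_buffer_ul": lysate_ul * (UREA_START_M / UREA_FINAL_M - 1.0),
        "lysc_ug": protein_ug / LYSC_RATIO,
        "trypsin_ug": protein_ug / TRYPSIN_RATIO,
        "tmt_ug": protein_ug * TMT_UG_PER_UG_PEPTIDE,
    }


# Hypothetical input: 300 ug protein in 100 uL of 8 M urea lysis buffer.
print(digestion_plan(300.0, 100.0))
```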

Basic reversed-phase fractionation and phosphopeptide enrichment

For basic reversed-phase (bRP) fractionation, ~250 μg of peptides were dissolved in 500 μL of 5 mM ammonium formate and 5% acetonitrile. An offline Agilent 1260 LC system with a 30 cm × 2.1 mm column, run at a flow rate of 200 μL per minute, was used for bRP fractionation. Peptides were separated into 72 fractions, which were concatenated into 24 fractions. For each fraction, 2 μg of peptides was transferred into a mass spectrometer vial for whole proteome analysis, of which only 0.5 μg was injected. The 24 fractions were further concatenated (by pooling of every 6th fraction) into 4 fractions (~62 μg of peptides per fraction) for phosphopeptide enrichment.
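Fraction concatenation can be sketched as pooling fractions spaced a fixed stride apart. The exact pooling pattern is an assumption: this Python sketch uses the common stride scheme, in which each of the 4 final pools combines 6 of the 24 fractions spaced 4 apart, and it also reproduces the ~62 μg per pool estimate.

```python
def concatenate(n_fractions: int, n_pools: int) -> list:
    """Stride-based concatenation: pool k receives fractions k, k+n_pools, ...
    (1-based fraction numbers)."""
    if n_fractions % n_pools:
        raise ValueError("fractions must divide evenly into pools")
    return [list(range(k, n_fractions + 1, n_pools)) for k in range(1, n_pools + 1)]


first = concatenate(72, 24)    # 72 bRP fractions -> 24 proteome fractions
second = concatenate(24, 4)    # 24 fractions -> 4 phosphoenrichment pools
print(first[0])    # [1, 25, 49]
print(second[0])   # [1, 5, 9, 13, 17, 21]
print(250 / 4)     # 62.5 ug of peptides per pool from ~250 ug input
```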

The CPTAC workflow has been described previously.

Phosphopeptide enrichment was performed using Fe3+ immobilized metal affinity chromatography (IMAC). For this, Ni-NTA beads (Qiagen) were washed three times with HPLC-grade water and incubated with 100 mM EDTA (Sigma) for 30 min to strip Ni2+ from the beads. The beads were washed three times with HPLC-grade water and then incubated with FeCl3 (Sigma) for 45 min. The beads were again washed with HPLC-grade water, and the Fe3+-loaded agarose beads were resuspended in a buffer of methanol, acetonitrile, and 0.01% acetic acid at a 1:1:1 ratio. For both CPTAC and MiProt workflows, dried peptides were resuspended to a final volume of 500 μL in 50% acetonitrile and 0.1% trifluoroacetic acid (TFA) and supplemented with 97% acetonitrile and 0.1% TFA to a final concentration of 80% acetonitrile and 0.1% TFA. A total of 20 μL of 50% bead slurry was used per fraction for phosphopeptide enrichment. IMAC beads and peptides were incubated at room temperature for 30 min on a tumble-top rotator. The beads were spun down, resuspended in 200 μL of 80% acetonitrile and 0.1% TFA, and transferred directly onto conditioned C18 StageTips. Phosphopeptides were eluted from the beads onto the C18 StageTip using 500 mM K2HPO4, pH 7, washed with 1% formic acid, and finally eluted into a mass spectrometer LC vial with 50% acetonitrile and 0.1% FA.
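The volume of 97% acetonitrile needed to bring 500 μL of 50% acetonitrile to 80% follows from a simple mass balance, \(V_0 c_0 + x c_{\mathrm{add}} = (V_0 + x) c_{\mathrm{target}}\). A minimal Python sketch using the values from the text; it assumes volumes are additive, which is only approximately true for acetonitrile/water mixtures. Because both solutions contain 0.1% TFA, the TFA concentration stays at 0.1%.

```python
def volume_to_add(v0_ul: float, c0: float, c_add: float, c_target: float) -> float:
    """Volume of a concentrated stock needed to bring a solution to c_target.

    Mass balance: v0*c0 + x*c_add = (v0 + x)*c_target, solved for x.
    Assumes volumes are additive (only approximately true for ACN/water).
    """
    if not c0 < c_target < c_add:
        raise ValueError("target must lie between starting and stock concentrations")
    return v0_ul * (c_target - c0) / (c_add - c_target)


# Values from the text: 500 uL at 50% ACN, topped up with 97% ACN to reach 80%.
added = volume_to_add(500.0, 50.0, 97.0, 80.0)
print(f"add {added:.1f} uL of 97% ACN; final volume {500 + added:.1f} uL")
# add 882.4 uL of 97% ACN; final volume 1382.4 uL
```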

Proteomic data acquisition and processing

A Proxeon nLC-1200 coupled to a Thermo Lumos mass spectrometer was used for proteome and phosphoproteome data acquisition. Peptides were separated on a 110 min gradient with 86 min of effective gradient (6 to 30% buffer B containing 90% ACN and 0.1% FA). For phosphoproteomic analysis of cores, a second injection was performed and analyzed over a 145 min gradient with 120 min of effective gradient (6 to 30% buffer B containing 90% ACN and 0.1% FA). The acquisition parameters were as follows: MS1 resolution, 60,000; MS1 injection time, 50 ms; MS2 resolution, 50,000; MS2 injection time, 110 ms; AGC target, 5E4. Data acquisition was performed with a cycle time of 2 s.

Raw files were searched against the human (clinical samples) or human and mouse (PDX samples) RefSeq protein databases, complemented with 553 small open reading frames (smORFs) and common contaminants (Human: RefSeq.20111003_Human_ucsc_hg38_cpdb_mito_259contamsnr_553smORFS; Human and Mouse: RefSeq.20160914_Human_Mouse_ucsc_hg19_mm10_customProDBnr_mito_150contams), using Spectrum Mill suite vB.06.01.202 (Broad Institute and Agilent Technologies) as previously described in detail. […] N-termini was used. Carbamidomethylation of cysteines was set as a fixed modification, and N-terminal protein acetylation, oxidation of methionine (Met-ox), deamidation of asparagine, and cyclization of peptide N-terminal glutamine and carbamidomethylated cysteine to pyroglutamic acid (pyroGlu) and pyro-carbamidomethyl cysteine, respectively, were set as variable modifications. For phosphoproteome analysis, phosphorylation of serine, threonine, and tyrosine was allowed as an additional variable modification, while deamidation of asparagine was disabled. Trypsin Allow P was specified as the proteolytic enzyme, with up to 4 missed cleavage sites allowed. For proteome analysis, the allowed precursor mass shift range was −18 to 64 Da to allow for pyroGlu and up to 4 Met-ox per peptide. For phosphoproteome analysis, the range was expanded to −18 to 272 Da to allow for up to 3 phosphorylations and 2 Met-ox per peptide. Precursor and product mass tolerances were set to ±20 ppm, and the peptide FDR was set to 1% using a target-decoy approach with reversed protein sequences.
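The precursor mass shift ranges follow from sums of modification masses. A short Python check, using standard monoisotopic mass deltas (common reference values such as those in Unimod, not values given in the text), shows why −18 to 64 Da accommodates pyroGlu and up to 4 Met-ox, and why −18 to 272 Da accommodates 3 phosphorylations plus 2 Met-ox.

```python
# Standard monoisotopic mass deltas (Da), e.g. as listed in Unimod.
MET_OX = 15.994915              # oxidation of methionine
PHOSPHO = 79.966331             # phosphorylation (HPO3)
PYRO_FROM_N_TERM = -17.026549   # loss of NH3: Gln -> pyroGlu, CAM-Cys -> pyro-CAM-Cys


def within(delta: float, low: float, high: float) -> bool:
    """True if a total mass shift falls inside the allowed search window."""
    return low <= delta <= high


# Proteome search window: -18 to 64 Da.
print(within(PYRO_FROM_N_TERM, -18, 64))            # True
print(within(4 * MET_OX, -18, 64))                  # True (63.98 Da)
print(within(5 * MET_OX, -18, 64))                  # False (79.97 Da)
# Phosphoproteome window: -18 to 272 Da.
print(within(3 * PHOSPHO + 2 * MET_OX, -18, 272))   # True (271.89 Da)
print(within(4 * PHOSPHO, -18, 272))                # False (319.87 Da)
```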

For generation of protein and phosphopeptide ratios, reporter ion signals were corrected for isotope impurities, and relative abundances of proteins and phosphorylation sites were determined as the median of TMT reporter ion intensity ratios from all PSMs matching the protein or phosphorylation site. PSMs lacking a TMT label, having a precursor ion purity <50%, or having a negative delta forward-reverse score (half of all false-positive identifications) were excluded. To normalize quantitative data across TMT10/11-plex experiments, TMT intensities for each phosphosite and protein were divided by those of the specified common reference. Log2 TMT ratios were further normalized by median centering and median absolute deviation (MAD) scaling.
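The ratio and normalization steps can be sketched in Python for a single sample channel. The PSM dictionary keys are hypothetical, the isotope-impurity correction is omitted, and MAD scaling here divides by the raw MAD without the 1.4826 consistency constant, which is an assumption since the text does not specify it.

```python
import math
from statistics import median


def keep_psm(psm: dict) -> bool:
    """Exclusion rules from the text; the dictionary keys are hypothetical."""
    return (psm["tmt_labeled"]
            and psm["precursor_purity"] >= 0.50
            and psm["delta_fwd_rev"] >= 0)


def log2_ratio(psms: list) -> float:
    """Median of per-PSM sample/common-reference ratios, on the log2 scale."""
    ratios = [p["sample"] / p["reference"] for p in psms
              if keep_psm(p) and p["sample"] > 0 and p["reference"] > 0]
    return math.log2(median(ratios))


def center_and_scale(values: list) -> list:
    """Median-center one sample's log2 ratios and divide by their MAD."""
    med = median(values)
    centered = [v - med for v in values]
    mad = median([abs(c) for c in centered])
    return [c / mad for c in centered]


psms = [
    {"sample": 200, "reference": 100, "tmt_labeled": True, "precursor_purity": 0.9, "delta_fwd_rev": 2.1},
    {"sample": 220, "reference": 100, "tmt_labeled": True, "precursor_purity": 0.8, "delta_fwd_rev": 1.5},
    {"sample": 180, "reference": 100, "tmt_labeled": True, "precursor_purity": 0.7, "delta_fwd_rev": 0.4},
    {"sample": 900, "reference": 100, "tmt_labeled": True, "precursor_purity": 0.3, "delta_fwd_rev": 1.0},
]
print(log2_ratio(psms))                              # 1.0 (4th PSM fails purity)
print(center_and_scale([1.0, 2.0, 3.0, 4.0, 5.0]))   # [-2.0, -1.0, 0.0, 1.0, 2.0]
```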

Parallel reaction monitoring

Two unique peptides of ERBB2 (VLQGLPR and GLQSLPTHDPSPLQR) were used for PRM analysis. Peptides prepared for proteome analysis were analyzed on an Orbitrap Fusion Lumos mass spectrometer coupled to an EASY-nLC1200 system (Thermo Fisher Scientific) for PRM analysis. 1 μg of peptides was loaded onto a trap column (150 μm × 2 cm, particle size 1.9 μm) at a maximum pressure of 280 bar using solvent A (0.1% formic acid in water) and then separated on a silica microcolumn (150 μm × 5 cm, particle size 1.9 μm) with a gradient of 4–28% mobile phase B (90% acetonitrile and 0.1% formic acid) at a flow rate of 750 nL/min over 75 min. Data-dependent acquisition (DDA) and PRM modes were used in parallel. For DDA scans, a precursor scan was performed in the Orbitrap over m/z 300–1200 at a resolution of 120,000 at m/z 200. The 20 most intense ions were isolated by the quadrupole with a 2 m/z window, fragmented by higher-energy collisional dissociation (HCD) with a normalized collision energy of 32%, and detected in the ion trap at the rapid scan rate. Automatic gain control targets were 5 × 10⁵ ions with a maximum injection time of 50 ms for precursor scans and 10⁴ ions with a maximum injection time of 50 ms for MS2 scans. Dynamic exclusion time was 20 s (±7 ppm). For PRM scans, preselected peptides were isolated by the quadrupole with a 0.7 m/z window followed by HCD with a normalized collision energy of 32%, and product ions (MS2) were scanned in the Orbitrap at a resolution of 30,000 at m/z 200. Scan windows were set to 4 min for each peptide. For relative quantification, the raw spectrum file was converted to MGF format with Proteome Discoverer 2.0 (Thermo Fisher Scientific) and then imported into Skyline along with the raw data file. Each result was validated by deleting non-identified spectra and adjusting the AUC range. Finally, the summed peak areas of at least the six strongest product ions of each peptide were used for quantification.
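The final quantification step, summing the strongest product-ion areas per peptide, can be sketched on an exported table of fragment areas. This Python illustration is not Skyline's own calculation; it fixes the count at six (parameterized as `n_top`), and the fragment labels and areas are hypothetical.

```python
def peptide_area(fragment_areas: dict, n_top: int = 6) -> float:
    """Sum the peak areas of the n_top strongest product ions of one peptide."""
    if len(fragment_areas) < n_top:
        raise ValueError(f"need at least {n_top} product ions")
    return sum(sorted(fragment_areas.values(), reverse=True)[:n_top])


# Hypothetical exported product-ion areas for VLQGLPR.
areas = {"y3": 10.0, "y4": 50.0, "y5": 40.0, "y6": 30.0,
         "b2": 20.0, "b3": 5.0, "y2": 60.0, "b4": 1.0}
print(peptide_area(areas))   # 210.0
```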

Network-based gene function prediction

Co-expression network construction using mRNA and protein expression data and network-based gene function prediction for KEGG pathways were performed as previously described (https://github.com/bzhanglab/OmicsEV).

Outlier analysis

The data for each gene or protein from the baseline samples of patients who achieved pathological complete response (pCR) were used to establish a normal distribution for that gene/protein. For each gene, a Z-score was calculated for each baseline sample from non-pCR cases as the number of standard deviations by which its expression value deviated from the mean of this distribution. Genes/proteins with low variance (variance …
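The outlier score is a standard Z-score against the pCR baseline distribution. A minimal Python sketch, assuming the sample standard deviation; `min_variance` is a hypothetical placeholder because the low-variance cutoff is not available in this text, and the example values are hypothetical.

```python
from statistics import mean, stdev


def outlier_z(pcr_values, non_pcr_value, min_variance):
    """Z-score of one non-pCR baseline value against the pCR baseline
    distribution; returns None for low-variance features.

    min_variance is a placeholder: the original cutoff is not available here.
    """
    if len(pcr_values) < 2:
        raise ValueError("need at least two pCR baseline samples")
    sd = stdev(pcr_values)              # sample standard deviation (assumption)
    if sd ** 2 < min_variance:
        return None
    return (non_pcr_value - mean(pcr_values)) / sd


print(outlier_z([1.0, 2.0, 3.0], 4.5, min_variance=0.1))   # 2.5
print(outlier_z([1.0, 1.0, 1.0], 4.5, min_variance=0.1))   # None
```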

Differential analysis using limma

The limma R package was used to analyze the patients with both on-treatment and pre-treatment cores, in order to compare on-treatment vs. pre-treatment expression in pCR and non-pCR patients separately in each dataset (RNA, protein, phosphoprotein (mean phosphosite level for each protein), and phosphosite datasets) and to compare the on-treatment vs. pre-treatment changes in pCR patients with those in non-pCR patients. Samples from BCN1368 and BCN1369 were excluded from this analysis because these patients did not receive the full treatment regimen (they did not receive pertuzumab). Phosphosite-level data for this analysis were first processed by taking the mean of all peptides containing each fully localized site, as determined by Spectrum Mill. Duplicate cores from a given patient were included, and the limma duplicateCorrelation function was used to derive a consensus for each patient in the differential analysis. Each gene (or site) in each dataset was fitted to a linear model with coefficients for each group (on-treatment pCR, pre-treatment pCR, on-treatment non-pCR, and pre-treatment non-pCR) and each plex (to account for batch effects), and moderated t-tests for each comparison were carried out by limma using the residual variances estimated from the linear models. PTM-SEA was applied to signed, log10-transformed p-values from this analysis using the parameters described below.
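The signed, log10-transformed p-values used as input to PTM-SEA combine significance and direction in a single score. A minimal Python sketch, assuming the sign is taken from the log fold change of the contrast, which is the usual convention but is not spelled out in the text.

```python
import math


def signed_log10_p(log_fc: float, p_value: float) -> float:
    """-log10(p) carrying the sign of the fold change.

    Positive values: higher in the first group of the contrast
    (e.g. on-treatment); negative values: lower.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return math.copysign(-math.log10(p_value), log_fc)


print(signed_log10_p(1.2, 0.01))
print(signed_log10_p(-0.5, 0.001))
```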

Gene set enrichment and PTM-signature enrichment analyses

Pathway analysis was performed using single-sample Gene Set Enrichment Analysis (ssGSEA) and post-translational modification signature enrichment analysis (PTM-SEA). Protein and phosphosite measurements from technical replicates were combined by averaging across replicates before subsequent analysis. Pathway-level comparisons of bulk and core material were based on signed, log10-transformed p-values derived from a moderated two-sample t-test using the limma R package, comparing luminal and basal tumors separately for bulk and core samples. For proteome data, we first applied the two-sample moderated t-test to each protein, and the resulting transformed p-values were collapsed to the gene-centric level for ssGSEA by retaining the most significant p-value per gene symbol. Phosphosite-level data were subjected to limma analysis to derive transformed p-values for each phosphorylation site. Sequence windows flanking the phosphorylation site by 7 amino acids in both directions were used as unique site identifiers. For PTM-SEA, only phosphorylation sites fully localized by Spectrum Mill were taken into consideration. Phosphorylation sites on multiply phosphorylated peptides were resolved using methods described in Krug et al.
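Two of the preprocessing steps above, collapsing protein-level scores to genes and building site identifiers from flanking sequence, can be sketched in Python. The protein IDs, the sequence, and the `_` padding character used at protein termini are hypothetical assumptions; the sketch keeps the score with the largest magnitude per gene, which corresponds to the most significant p-value when scores are signed −log10 p-values.

```python
def collapse_to_gene(protein_scores: dict) -> dict:
    """protein_id -> (gene_symbol, signed -log10 p). Keep, for each gene, the
    score with the largest magnitude, i.e. the most significant p-value."""
    best = {}
    for gene, score in protein_scores.values():
        if gene not in best or abs(score) > abs(best[gene]):
            best[gene] = score
    return best


def site_window(protein_seq: str, site_pos: int, flank: int = 7, pad: str = "_") -> str:
    """15-residue window centered on a 1-based site position; sequence ends
    are padded with `pad` (the padding character is an assumption)."""
    if not 1 <= site_pos <= len(protein_seq):
        raise ValueError("site position outside the protein sequence")
    padded = pad * flank + protein_seq + pad * flank
    start = site_pos - 1   # (site index in padded) - flank
    return padded[start:start + 2 * flank + 1]


scores = {"prot1": ("ERBB2", 2.0), "prot2": ("ERBB2", -3.5), "prot3": ("GRB7", 1.1)}
print(collapse_to_gene(scores))                        # {'ERBB2': -3.5, 'GRB7': 1.1}
print(site_window("MSTNPKPQRKTKRNTNRRPQDVKFPGG", 2))   # ______MSTNPKPQR
```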

Pubmed crawling

An in-house Python script was used to submit queries through NCBI's E-utilities, and the resulting freely available information (title, abstract, keywords) was saved to a local SQL database. For each publication, a case-insensitive text search for (“resist” OR “recur”) AND “breast cancer” was performed, and positive hits were retained and tallied for each gene. Publications associated with more than 100 different genes were excluded to avoid false positives from high-throughput studies. The results are available in Supplementary Data 8.
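The text-matching and tallying step can be sketched as follows, leaving out the E-utilities retrieval and the SQL storage. This Python illustration assumes each stored record carries a list of associated genes; how genes were linked to publications is not described here, and all records are hypothetical.

```python
from collections import Counter


def matches_query(text: str) -> bool:
    """Case-insensitive ("resist" OR "recur") AND "breast cancer" substring test."""
    t = text.lower()
    return ("resist" in t or "recur" in t) and "breast cancer" in t


def tally_genes(publications: list, max_genes: int = 100) -> Counter:
    """Count matching publications per gene, skipping high-throughput papers."""
    counts = Counter()
    for pub in publications:
        if len(pub["genes"]) > max_genes:
            continue
        text = " ".join([pub["title"], pub["abstract"], pub["keywords"]])
        if matches_query(text):
            counts.update(set(pub["genes"]))
    return counts


# Hypothetical records, standing in for rows of the local database.
pubs = [
    {"title": "ERBB2 and trastuzumab resistance", "abstract": "... breast cancer ...",
     "keywords": "", "genes": ["ERBB2"]},
    {"title": "Recurrence after therapy", "abstract": "Breast Cancer cohort",
     "keywords": "", "genes": ["ERBB2", "ESR1"]},
    {"title": "Lung cancer resistance", "abstract": "", "keywords": "", "genes": ["EGFR"]},
    {"title": "Screen", "abstract": "breast cancer resistance", "keywords": "",
     "genes": [f"GENE{i}" for i in range(150)]},
]
print(tally_genes(pubs))   # Counter({'ERBB2': 2, 'ESR1': 1})
```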

Additional statistical analyses and R code

T-tests, Shapiro-Wilk tests, and Wilcoxon rank-sum and signed-rank tests were performed using base R (http://www.R-project.org/). Spearman correlation analyses were performed using the R Hmisc package (https://CRAN.R-project.org/package=Hmisc). Heatmaps were generated using the heatmap.2 function in the gplots R package (https://CRAN.R-project.org/package=gplots) and Morpheus (https://github.com/cmap/morpheus.R). R code will be made available upon request.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.
