Module 2 Assignment: Identification and Quantification of Proteins & Metabolites
Question 1. Study Design (30% of assignment)
Background
Biomarkers indicate associations between biological measurements and states of health and wellness at the subcellular, cellular, system, and organism levels. Advances in proteomics allow us to detect and quantify thousands of proteins at a time, leading to the discovery of an avalanche of potential biomarkers. However, critical steps in experimental design are often neglected, and the failure to validate candidate biomarkers results in a lack of experimental rigor, data reproducibility, and confirmed biomarkers. To standardize biomarker studies, experimental designs now include two fundamental steps: 1) identification of proteins as candidate biomarkers and 2) verification of the identity and differential expression of the candidate biomarkers. Implementing methodical experimental design at each step ensures robust, reproducible data and allows meaningful biological interpretation of candidate biomarkers, increasing the likelihood of discovering relevant, actionable biomarkers.
Task
You are designing a biomarker study using mass-spectrometry-based proteomic analysis. Please outline your approach for the 1) biomarker identification and 2) biomarker verification steps in your study. Describe the key steps in your workflow, and include your rationale and any considerations when using these workflows for such studies.
You may include diagrams/tables and references. References should be in the style of the Journal of Biochemistry and Molecular Science. Maximum 400 words; references and figures do not count towards the word limit.
Question 2. Sample Preparation (30% of assignment)
Background
Metabolites have complex chemical structures, often requiring chemical modifications for suitable analysis. Derivatization is commonly used for metabolomics studies to enable large scale metabolite analysis on GC-MS.
Task
Discuss the importance, purpose and types of derivatizations used for GC-MS. Please detail the biochemical mechanisms, target groups and usage of these derivatizations, highlighting their advantages and disadvantages.
You may include diagrams/tables and references. References should be in the style of the Journal of Biochemistry and Molecular Science. Maximum 400 words; references and figures do not count towards the word limit.
Question 3. Data visualization and analysis (40% of assignment)
Proteomics and metabolomics studies produce large datasets which require sophisticated data processing, visualization, and analysis. Data analysis pipelines can be completed efficiently in programs such as MS-DIAL.
First, download and set up the MS-DIAL program and the sample dataset using the instructions provided, and take some time exploring the program.
‘Assignment 2 MSDial_setup_instructions’
Next, following the ‘MS-DIAL Tutorial’, you will analyze and visualize the dataset and describe your results.
Please follow the instructions below and answer the corresponding questions with complete but short answers. You may include graphics from your analysis.
1. Explore the features.
Question a) How many compounds were detected?
Question b) How many compounds were identified matching the library?
2. Compound search.
• Here you can change the result for each compound of interest.
• Sort by dot prod. (This is a type of confidence.)
Question c) What do ion matches with very low and very high dot product scores look like?
Question d) Compare and describe the EI spectra between the measurement (Mass spectrum acquired) and the reference (database).
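As a rough illustration of what a dot product score measures, the sketch below computes a plain cosine similarity between a measured and a reference spectrum. This is an assumption made for illustration only: MS-DIAL's own scoring may weight or scale peaks differently, and the function name `dot_product_score` and all m/z and intensity values are hypothetical.

```python
import math


def dot_product_score(measured, reference):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Returns a value between 0 (no shared ions) and 1 (identical peak pattern).
    """
    shared_ions = set(measured) & set(reference)
    numerator = sum(measured[mz] * reference[mz] for mz in shared_ions)
    norm_measured = math.sqrt(sum(i * i for i in measured.values()))
    norm_reference = math.sqrt(sum(i * i for i in reference.values()))
    if norm_measured == 0 or norm_reference == 0:
        return 0.0
    return numerator / (norm_measured * norm_reference)


# Hypothetical spectra: nominal m/z mapped to relative intensity.
reference_spectrum = {73: 100, 147: 60, 217: 30}
good_match = {73: 95, 147: 62, 217: 28}    # same ions, similar ratios
poor_match = {73: 10, 129: 100, 205: 80}   # only one weak shared ion

good_score = dot_product_score(good_match, reference_spectrum)
poor_score = dot_product_score(poor_match, reference_spectrum)
print(f"good match: {good_score:.3f}, poor match: {poor_score:.3f}")
```

A high score means the measured ions and their relative intensities line up with the reference; a low score means few ions are shared or their intensity ratios disagree.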
3. Click on Show ion table (this lists all the compounds that were detected and identified from the database search).
• Here you can sort the data by:
o p-value (how significant the difference is)
o Fold change (how large the difference is)
o S/N – signal to noise (how large the peak in the chromatogram is relative to the background noise)
Question e) How many compounds were significantly different (p<0.05) between the two Groups?
Question f) What was the largest fold change, in what direction and for what metabolite?
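To make the p-value and fold change columns concrete, here is a minimal sketch for a single compound measured in two groups. The intensities and the helper names `fold_change` and `permutation_p_value` are hypothetical. The p-value comes from an exact permutation test, chosen here only because it needs nothing beyond the Python standard library; the statistical test MS-DIAL applies is not stated in this handout and may give different values.

```python
import math
from itertools import combinations
from statistics import mean


def fold_change(group_a, group_b):
    """Ratio of mean intensities, group B relative to group A."""
    return mean(group_b) / mean(group_a)


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means.

    Counts how many relabellings of the pooled samples give a difference
    at least as large as the one observed.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        perm_a = [pooled[i] for i in chosen]
        perm_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical peak intensities for one compound, four samples per group.
group_1 = [100, 110, 95, 105]
group_2 = [210, 190, 205, 220]

fc = fold_change(group_1, group_2)
log2_fc = math.log2(fc)   # positive: higher in group 2; negative: lower
p_value = permutation_p_value(group_1, group_2)
print(f"fold change = {fc:.2f}, log2 = {log2_fc:.2f}, p = {p_value:.4f}")
```

Note that with four samples per group there are only 70 possible relabellings, so the smallest p-value this test can return is 2/70 (about 0.029). A large fold change and a small p-value are separate pieces of evidence: the first describes the size of the difference, the second how unlikely it is to arise by chance.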
4. Data visualisation
• Select ‘Data Visualisation’
• Click ‘normalisation’
• Select ‘Total Ion Chromatogram (TIC)’
• Now work through each of the data visualisation methods
Question g) Perform a PCA. Describe whether you used all compounds detected and why.
Question h) Describe the PCA plot and state what % of variation is explained by PC1 and PC2.
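The sketch below illustrates two ideas from this step under simplifying assumptions: TIC normalisation (dividing each compound's intensity by the summed intensity of its sample) and the share of variation explained by PC1 and PC2. The intensity table is hypothetical, and the eigenvalues are found with a basic power iteration so that only the Python standard library is needed. MS-DIAL may apply additional scaling or transformation before PCA, so its percentages will not necessarily match this calculation.

```python
import math
import random


def tic_normalise(intensities):
    """Scale one sample so that its intensities sum to 1."""
    total = sum(intensities)
    return [value / total for value in intensities]


def covariance_matrix(samples):
    """Compound-by-compound covariance (rows = samples, columns = compounds)."""
    n = len(samples)
    means = [sum(column) / n for column in zip(*samples)]
    centred = [[v - m for v, m in zip(row, means)] for row in samples]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=1000, seed=0):
    """Power iteration: largest eigenvalue and its unit-length eigenvector."""
    rng = random.Random(seed)
    vector = [rng.random() for _ in matrix]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(a * b for a, b in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            return 0.0, vector
        vector = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vector


def explained_variance(samples, components=2):
    """Fraction of total variance captured by each leading component."""
    matrix = covariance_matrix(samples)
    size = len(matrix)
    total_variance = sum(matrix[i][i] for i in range(size))
    fractions = []
    for _ in range(components):
        value, vec = leading_eigenpair(matrix)
        fractions.append(value / total_variance)
        # Deflation: remove the component just found before the next pass.
        matrix = [[matrix[i][j] - value * vec[i] * vec[j] for j in range(size)]
                  for i in range(size)]
    return fractions


# Hypothetical raw intensities: 4 samples (rows) x 3 compounds (columns).
raw = [
    [200, 400, 1400],
    [120, 210, 670],
    [400, 300, 300],
    [760, 660, 580],
]
normalised = [tic_normalise(sample) for sample in raw]
fractions = explained_variance(normalised)
print([f"PC{i + 1}: {100 * f:.1f}%" for i, f in enumerate(fractions)])
```

The percentages reported for PC1 and PC2 are these fractions: each principal component's variance divided by the total variance across all compounds.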
5. Create a partial least squares (PLS) plot
Question i) Does the distribution of samples change when you include/exclude unknown compounds? Describe any change and state whether it differs from the PCA result.