OpenSWATH Scoring Principles

OpenSwath Algorithm

Principle

Adapted from the OpenSWATH paper.

Step 1 Data conversion

mzML/mzXML/TraML:
Input file formats. The library contains precursor and fragment-ion m/z values, relative fragment-ion intensities and normalized peptide retention times.
Decoy:
Decoys are used later for classification and error rate estimation.

Step 2 Retention-time alignment

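Retention times are not the same from one acquisition to the next, but the same peptide shows up at the same relative position in the window, so retention times have to be normalized for the analysis.
Outlier detection is run at the same time, to remove erroneous peptide data and to serve as a quality check.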
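
The normalization idea can be sketched as follows, assuming a plain least-squares line from measured retention time to library iRT and a simple "drop the worst point and refit" outlier rule. The function names and the `max_residual` threshold are made up for this note; this is not the RTNormalizer code itself.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_rt_normalization(measured_rt, library_irt, max_residual=5.0):
    """Fit measured RT -> iRT, removing the worst outlier until all residuals are small.

    max_residual (in iRT units) is a hypothetical threshold chosen for the example.
    Returns (slope, intercept, points_kept).
    """
    points = list(zip(measured_rt, library_irt))
    while True:
        xs, ys = zip(*points)
        slope, intercept = fit_line(xs, ys)
        residuals = [abs(slope * x + intercept - y) for x, y in points]
        worst = max(range(len(points)), key=residuals.__getitem__)
        # Stop when the fit is clean or too few points remain to keep fitting.
        if residuals[worst] <= max_residual or len(points) <= 3:
            return slope, intercept, points
        del points[worst]
```

With reference peptides that mostly follow one line, a single wrongly assigned peptide is dropped on the first pass and the refit recovers the line.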

Step 3 Chromatogram extraction

Extraction methods: Top-hat / Bartlett

Step 4 Peak-group scoring
Step 5 Statistical analysis

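Scoring is first done with the help of the decoys, and the FDR is computed (FDR = false positives / (true positives + false positives)). The developers use the Kolmogorov-Smirnov statistic (d, see Appendix 1) to measure how similar the target and decoy distributions are (the d score), and the shuffle method to check the FDR.
To verify accuracy on SWATH-MS data, the developers built a gold standard data set (SGS) containing 422 chemically synthesized, isotope-labeled standard peptides.
The developers report that OpenSWATH's misidentification rate is below 0.7%. In other words: if your error rate is too high, blame the peak groups that you, the operator, produced. The program identified them just fine!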
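
As a rough sketch of the FDR formula above, assume there are as many decoys as targets, so the number of decoys passing a score cutoff stands in for the number of false-positive targets. The function name and the score lists are invented for illustration; the real statistical step is more involved.

```python
def estimate_fdr(target_scores, decoy_scores, cutoff):
    """Estimate FDR = FP / (TP + FP) among targets scoring >= cutoff.

    Assumption: equal numbers of targets and decoys, so decoys above the
    cutoff approximate the false positives among the accepted targets.
    """
    accepted_targets = sum(s >= cutoff for s in target_scores)  # TP + FP
    accepted_decoys = sum(s >= cutoff for s in decoy_scores)    # ~ FP
    if accepted_targets == 0:
        return 0.0
    return min(1.0, accepted_decoys / accepted_targets)
```

Raising the cutoff lets fewer decoys through, which lowers the estimated FDR at the cost of fewer accepted targets.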

Software

1. OpenSwath-DecoyGenerator

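Input: TraML file. Output: decoys, generated by one of the methods "shuffle", "pseudo-reverse", "reverse" or "shift".
All target peptides are scanned and a decoy sequence is generated for each one with the chosen method. For each peptide, every fragment ion in the library is matched to its most likely origin, and an equivalent ion is added to the decoy sequence (e.g. if the target peptide sequence has a b5 ion with a normalized intensity of 200, an equivalent b5 ion for the decoy sequence is created and assigned the intensity 200). The decoy m/z values are the theoretical ones.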
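
A small sketch of the sequence-level methods, under my own assumptions: "pseudo-reverse" and "shuffle" keep the C-terminal residue in place (the usual choice for tryptic peptides), and "shift" is left out because it changes m/z values rather than the sequence. `make_decoy` is a made-up helper, not the tool's API.

```python
import random

def make_decoy(sequence, method="reverse", seed=0):
    """Return a decoy peptide sequence for the given target sequence."""
    if method == "reverse":
        return sequence[::-1]
    if method == "pseudo-reverse":
        # Reverse everything except the C-terminal residue.
        return sequence[:-1][::-1] + sequence[-1]
    if method == "shuffle":
        # Shuffle everything except the C-terminal residue (assumption).
        core = list(sequence[:-1])
        random.Random(seed).shuffle(core)
        return "".join(core) + sequence[-1]
    raise ValueError(f"unsupported method: {method}")
```

For example, `make_decoy("PEPTIDEK", "pseudo-reverse")` gives `"EDITPEPK"`. The b5 ion of the target would then be replaced by the b5 ion of this decoy sequence, with the same intensity but a newly computed theoretical m/z.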

2. OpenSwath-ChromatogramExtractor

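Input: mzML files plus the TraML transition list. Output: the extracted chromatograms (mzML).
A Top-hat or Bartlett convolution function is applied around each transition's m/z: within every window, the signal at the same m/z is collected across all spectra, which gives one chromatogram per transition.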
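
To make the two extraction functions concrete, here is a minimal sketch. It assumes each spectrum is given as `(rt, [(mz, intensity), ...])` and that `half_width` is the extraction half-window in m/z units; both the data layout and the names are assumptions for this note.

```python
def extraction_weight(mz, target_mz, half_width, kernel="tophat"):
    """Weight of a peak at mz when extracting around target_mz."""
    distance = abs(mz - target_mz)
    if distance > half_width:
        return 0.0
    if kernel == "tophat":
        return 1.0                          # flat weight inside the window
    if kernel == "bartlett":
        return 1.0 - distance / half_width  # triangular weight, 1 at the center
    raise ValueError(f"unknown kernel: {kernel}")

def extract_chromatogram(spectra, target_mz, half_width=0.05, kernel="tophat"):
    """Return [(rt, summed weighted intensity)] for one transition."""
    return [
        (rt, sum(i * extraction_weight(mz, target_mz, half_width, kernel)
                 for mz, i in peaks))
        for rt, peaks in spectra
    ]
```

Top-hat counts every peak inside the window equally, while Bartlett down-weights peaks that sit further from the expected m/z.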

3. OpenSwath-RTNormalizer

4. OpenSwath-Analyzer

4.1 Peak picking
Smoothing algorithm: Gaussian (Gaussian blur) or Savitzky-Golay
This step is implemented in the MRMTransitionGroupPicker class.
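
The "smooth first, then pick" idea can be sketched like this. It is a simplified illustration with a truncated Gaussian kernel and a naive local-maximum picker, not what MRMTransitionGroupPicker actually does; `sigma` is measured in data points and all names are my own.

```python
import math

def gaussian_smooth(values, sigma=1.0):
    """Smooth a chromatogram with a Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, round(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = weight_sum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize at the edges instead of padding
                acc += w * values[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed

def pick_peaks(values):
    """Indices of local maxima (higher than the left neighbor, not lower than the right)."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] >= values[i + 1]]
```

On a noisy trace such as `[0, 1, 0, 2, 5, 9, 5, 2, 0, 1, 0]`, picking on the raw data reports the two small bumps as peaks, while picking on the smoothed trace keeps only the main apex.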
4.2 Peak grouping
4.3 Peak Group scoring

The scores are grouped into chromatography-based scores, library-based scores and spectra-based scores.
While the chromatography-based scores only use chromatographic information and can thus be applied to SWATH-MS and SRM data, the spectra-based scores take a full MS2 spectrum as input and can only be used with SWATH-MS data.

4.3.1 Chromatography-based scores:
4.3.1.1 Cross-correlation Score:
S3: Score s3 describes the co-elution of two signals. It retrieves the delay j at which the cross-correlation is maximal, which indicates by how much the two signals are shifted in retention time.
S4: Score s4 describes the shape similarity of two peaks. It retrieves the maximal cross-correlation value from x[].
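
A sketch of both quantities for one pair of transitions, assuming the two traces are sampled on the same retention-time grid and standardized (mean 0, standard deviation 1) before correlating. How the pairwise values are combined over all transitions is not shown; names and the sign convention of the delay are my assumptions.

```python
from statistics import mean, pstdev

def standardize(signal):
    """Scale a signal to mean 0 and standard deviation 1 (signal must not be constant)."""
    m, s = mean(signal), pstdev(signal)
    return [(v - m) / s for v in signal]

def cross_correlation(x, y, max_delay):
    """Return {delay: correlation}; a positive delay means y lags behind x."""
    xs, ys = standardize(x), standardize(y)
    n = len(xs)
    result = {}
    for delay in range(-max_delay, max_delay + 1):
        total = 0.0
        for i in range(n):
            j = i + delay
            if 0 <= j < n:
                total += xs[i] * ys[j]
        result[delay] = total / n
    return result

def coelution_and_shape(x, y, max_delay=5):
    """Delay at the cross-correlation maximum (s3-like) and the maximum itself (s4-like)."""
    xcorr = cross_correlation(x, y, max_delay)
    best_delay = max(xcorr, key=xcorr.get)
    return best_delay, xcorr[best_delay]
```

Two transitions from the same peptide should give a delay near 0 and a maximum near 1; an interfering signal typically shows a shifted maximum or a lower value.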
4.3.1.2 Intensity Score:
S5: s5 describes the fraction of the intensity of the individual peak group compared to the total chromatographic area over all time
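
As a tiny sketch of this ratio, assume every transition's chromatogram is a list of intensities on a shared grid and that the peak group spans the indices `left` to `right` (inclusive). Summed intensity is used as a stand-in for the area; the names are hypothetical.

```python
def intensity_fraction(chromatograms, left, right):
    """Fraction of the total signal that falls inside the peak group boundaries."""
    peak_area = sum(sum(c[left:right + 1]) for c in chromatograms)
    total_area = sum(sum(c) for c in chromatograms)
    return peak_area / total_area if total_area else 0.0
```

A value close to 1 means the peak group carries almost all of the signal in the chromatograms.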
4.3.1.3 Signal-to-noise Score: S6
4.3.1.4 EMG Score: Score s7 is a fit score of the peak group to an exponentially modified Gaussian (EMG) distribution.
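
For reference, this is what an EMG peak shape looks like in code, using the standard density of a Gaussian (`mu`, `sigma`) convolved with an exponential decay of rate `lam`. The residual function is only a guess at what a "fit score" could be built from; the actual fitting procedure is not described in these notes.

```python
import math

def emg_pdf(t, mu, sigma, lam):
    """Exponentially modified Gaussian density (numerically naive, fine for moderate inputs)."""
    exponent = 0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * t)
    tail = math.erfc((mu + lam * sigma ** 2 - t) / (math.sqrt(2.0) * sigma))
    return 0.5 * lam * math.exp(exponent) * tail

def emg_fit_residual(times, intensities, mu, sigma, lam, height):
    """Sum of squared differences between the data and a scaled EMG curve (hypothetical helper)."""
    return sum((height * emg_pdf(t, mu, sigma, lam) - y) ** 2
               for t, y in zip(times, intensities))
```

The exponential part produces the tailing on the right side of the peak that is typical for chromatographic signals; the mean of the distribution is `mu + 1 / lam`.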
4.3.2 Library-based scores:
4.3.2.1 Relative Intensity Score: For this score, the experimental relative intensities are compared to the relative intensities stored in the spectral library. A Pearson correlation score (s1) is computed between the two intensity vectors.
4.3.2.2 Retention Time Score: For this score, the normalized experimental retention time (fitted using the linear correlation computed earlier) is compared to the assay normalized retention time (iRT). The score s2 simply consists of the absolute value of the difference.
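
Both library-based scores are simple enough to write down directly. The sketch below assumes the retention-time normalization is a line with a known `slope` and `intercept`; the function names are my own.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equally long intensity vectors (s1-like)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def rt_score(measured_rt, slope, intercept, library_irt):
    """Absolute difference between the normalized measured RT and the library iRT (s2-like)."""
    return abs(slope * measured_rt + intercept - library_irt)
```

Measured intensities of 100, 50 and 25 against library intensities of 200, 100 and 50 give a correlation of 1, because only the relative pattern matters, not the absolute scale.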

4.3.3 Spectra-based scores:
4.3.3.1 Isotope scores:
S8: Score s8 is based on the isotopic distribution of the fragment ion signal in the MS2 spectrum at the chromatographic peak apex. A putative charge state can be provided with each fragment ion; if no charge is given, charge 1 is assumed.
S9: Score s9 describes the isotopic overlap of interfering fragment ions.
4.3.3.2 Mass accuracy score:
S10: Score s10 describes the mass accuracy of the fragment ion peak. For each fragment ion signal, an integration of the signal at the theoretically expected m/z is performed and the weighted average is used as an estimate of the peak m/z. This is then compared to the theoretical, assay-derived m/z and the difference in ppm is used as score. The weighted or non-weighted average of these differences in ppm over all fragment ions is used as the final score s10.
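
The steps just described can be sketched as follows, assuming the spectrum is a list of `(mz, intensity)` pairs, a fixed integration half-window, and the non-weighted average over fragment ions. The window size and all names are assumptions for this note.

```python
def weighted_mz(peaks, theoretical_mz, half_width):
    """Intensity-weighted mean m/z of the peaks near the expected position, or None."""
    window = [(mz, i) for mz, i in peaks if abs(mz - theoretical_mz) <= half_width]
    total = sum(i for _, i in window)
    if total == 0:
        return None
    return sum(mz * i for mz, i in window) / total

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def mass_accuracy_score(spectrum, theoretical_mzs, half_width=0.025):
    """Non-weighted mean of the absolute ppm errors over all fragment ions found."""
    errors = []
    for theoretical in theoretical_mzs:
        observed = weighted_mz(spectrum, theoretical, half_width)
        if observed is not None:
            errors.append(abs(ppm_error(observed, theoretical)))
    return sum(errors) / len(errors) if errors else None
```

A fragment expected at m/z 500.0 and observed at 500.005 is off by 10 ppm.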
4.3.3.3 Ion series score:
S11: Score s11 counts the number of b- and y-ions present in the full MS2 spectrum by computing the theoretical locations of the b- and y-ions based on the amino acid sequence of the peptide in question.
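
A minimal sketch of the counting, assuming unmodified peptides, singly charged fragments, monoisotopic residue masses and a fixed m/z tolerance. The tolerance and the function names are assumptions for this note.

```python
# Monoisotopic residue masses of the 20 standard amino acids (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565

def by_ions(sequence):
    """Theoretical m/z of the singly charged b- and y-ions (b1..b(n-1), y1..y(n-1))."""
    masses = [MONO[aa] for aa in sequence]
    b_ions = [sum(masses[:k]) + PROTON for k in range(1, len(masses))]
    y_ions = [sum(masses[-k:]) + WATER + PROTON for k in range(1, len(masses))]
    return b_ions, y_ions

def ion_series_score(spectrum_mzs, sequence, tolerance=0.02):
    """Count how many theoretical b- and y-ions have a peak within the tolerance."""
    b_ions, y_ions = by_ions(sequence)

    def present(mz):
        return any(abs(mz - peak) <= tolerance for peak in spectrum_mzs)

    return sum(present(m) for m in b_ions), sum(present(m) for m in y_ions)
```

For the peptide PEPTIDE, the b2 ion is expected near m/z 227.103 and the y1 ion near m/z 148.060.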

Appendix 1 KS test

Reposted from https://www.cnblogs.com/arkenstone/p/5496761.html
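
For a quick feel of the two-sample statistic d, here is a minimal sketch that compares the empirical cumulative distributions of two score lists (for example target and decoy scores). The function name is my own.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for value in a + b:  # the maximum is reached at one of the observed values
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

d is 0 for identical samples and 1 for samples that do not overlap at all, so a larger d means the target and decoy distributions are easier to tell apart.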

Appendix 2 How the Gaussian smoothing (blur) algorithm works

Reposted from https://blog.csdn.net/wei375653972/article/details/88713980
Code implementation: https://blog.csdn.net/qq_36359022/article/details/80188873

Appendix 3 Savitzky-Golay filter

Reposted from: https://blog.csdn.net/liyuanbhu/article/details/9094945
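
As a small illustration, here is the classic 5-point Savitzky-Golay smoother for a quadratic/cubic fit, whose weights are (-3, 12, 17, 12, -3) / 35. Leaving the two points at each edge untouched is my own simplification.

```python
SG_WEIGHTS = (-3, 12, 17, 12, -3)  # 5-point quadratic/cubic weights, to be divided by 35

def savitzky_golay_5(values):
    """Smooth interior points with the 5-point Savitzky-Golay filter."""
    smoothed = list(values)  # the first and last two points are left as they are
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG_WEIGHTS, window)) / 35.0
    return smoothed
```

Unlike a plain moving average, this filter reproduces a quadratic curve exactly, which is why it tends to preserve peak height and shape better.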
