Why is nobody looking at such good single-cell proteomics data analysis? | The DeepSCP tool

Day 21 of keeping a learning log. As the saying goes, 21 days is enough to form a habit.

Paper: https://academic.oup.com/bib/article/23/4/bbac214/6598882?login=false

Source code: https://github.com/XuejiangGuo/DeepSCP

Today I continue the series on open-source FDR-calculation tools. I previously covered the Dart-ID algorithm; today's protagonist is DeepSCP. A few days ago I asked the code's author for the relevant data, so today I can actually run the whole thing. Below is the workflow diagram from the author's repository. It is quite clear: the pipeline is divided into five analysis modules, namely Raw data processing, SampleRT, DeepSpec, LgbBayes, and FDR Estimation. The code is organized around this same framework; have a look if you are interested.
[Figure: DeepSCP workflow diagram from the author's repository, showing the five analysis modules]

The goal of DeepSCP: "utilizing deep learning to boost SCP coverage" — DeepSCP identifies more confident peptides and proteins by controlling the q-value at 0.01 using the target-decoy competition method. In short, it uses deep learning to increase the number of peptides and proteins identified in single-cell proteomics (SCP) data.
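Target-decoy competition in a nutshell: at any score threshold, the number of decoy hits passing it estimates the number of false target hits, so FDR ≈ #decoys / #targets. Here is a minimal sketch of q-value estimation by this method — my simplification for illustration, not DeepSCP's actual implementation:

```python
import pandas as pd

def tdc_qvalues(scores, is_target):
    """Estimate PSM q-values by target-decoy competition:
    FDR at a score threshold is approximated by #decoys / #targets above it."""
    df = pd.DataFrame({'score': scores, 'target': is_target})
    df = df.sort_values('score', ascending=False).reset_index(drop=True)
    n_decoys = (~df['target']).cumsum()
    n_targets = df['target'].cumsum().clip(lower=1)
    fdr = n_decoys / n_targets
    # q-value: the lowest FDR achievable at this threshold or any looser one
    df['qvalue'] = fdr[::-1].cummin()[::-1]
    return df
```

Filtering PSMs at `qvalue < 0.01` then corresponds to the 1% FDR control mentioned above.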

Code excerpt:

Only part of the code is shown here; if you are interested in the full source, go take a look at the repository.

##### Imports the excerpt needs (MQ_SampleRT, DeepSpec, LgbBayes and PSM2ProPep
##### are classes/functions defined elsewhere in DeepSCP.py)
import argparse
from copy import deepcopy
from time import time
import pandas as pd

##### main() is defined first so the entry point below can call it;
##### it runs the analysis modules in order: SampleRT -> DeepSpec -> LgbBayes -> FDR filtering
def main(evidence_file, msms_file, lbmsms_file):
    print('###################SampleRT###################')
    evidence = pd.read_csv(evidence_file, sep='\t', low_memory=False)
    sampleRT = MQ_SampleRT()
    dfRT = sampleRT.fit_tranform(evidence)  # (sic: method name as spelled in the DeepSCP source)
    del evidence
    print('###################DeepSpec###################')
    msms = pd.read_csv(msms_file, sep='\t', low_memory=False)
    lbmsms = pd.read_csv(lbmsms_file, sep='\t', low_memory=False)
    deepspec = DeepSpec()
    deepspec.fit(lbmsms)  # train on the high-confidence LibrarySet
    dfSP = deepspec.predict(dfRT, msms)
    del msms, lbmsms
    print('###################LgbBayes###################')
    dfdb = deepcopy(dfSP)
    del dfSP
    feature_columns = ['Length', 'Acetyl (Protein N-term)', 'Oxidation (M)', 'Missed cleavages',
                       'Charge', 'm/z', 'Mass', 'Mass error [ppm]', 'Retention length', 'PEP',
                       'MS/MS scan number', 'Score', 'Delta score', 'PIF', 'Intensity',
                       'Retention time', 'RT(*|rev)', 'RT(*|tag)', 'DeltaRT', 'PEPRT', 'ScoreRT',
                       'Cosine', 'PEPCosine', 'ScoreCosine']
    target_column = 'label'
    file_column = 'Experiment'
    protein_column = 'Leading razor protein'
    lgs = LgbBayes()
    data_set = lgs.fit_tranform(data=dfdb,
                                feature_columns=feature_columns,
                                target_column=target_column,
                                file_column=file_column,
                                protein_column=protein_column)

    # Keep only target PSMs with q-value < 0.01 at both the PSM and protein level
    data = data_set[(data_set.psm_qvalue < 0.01) & (data_set.protein_qvalue < 0.01) &
                    (data_set.label == 1)]

    peptide_column = 'Sequence'
    intensity_columns = [i for i in data.columns if 'Reporter intensity corrected' in i]

    # Aggregate the filtered PSMs into protein- and peptide-level quantification tables
    df_pro, df_pep = PSM2ProPep(data, file_column=file_column,
                                protein_column=protein_column,
                                peptide_column=peptide_column,
                                intensity_columns=intensity_columns)

    data_set.to_csv('DeepSCP_evidence.txt', sep='\t', index=False)
    df_pro.to_csv('DeepSCP_pro.csv')
    df_pep.to_csv('DeepSCP_pep.csv', index=False)

##### Entry point: parse the command-line arguments, then run main()
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="DeepSCP: utilizing deep learning to boost single-cell proteome coverage")
    parser.add_argument("-e", "--evidence", dest='e', type=str,
                        help="SCP SampleSet, evidence.txt, which records information about the identified "
                             "peptides by MaxQuant with FDR set to 1 at both PSM and protein levels")
    parser.add_argument("-m", "--msms", dest='m', type=str,
                        help="SCP SampleSet, msms.txt, which records fragment ion information about the "
                             "identified peptides by MaxQuant with FDR set to 1 at both PSM and protein levels")
    parser.add_argument("-lbm", "--lbmsms", dest='lbm', type=str,
                        help="LibrarySet, msms.txt, which records fragment ion information about the "
                             "identified peptides by MaxQuant with FDR set to 0.01 at both PSM and protein levels")
    args = parser.parse_args()
    evidence_file = args.e
    msms_file = args.m
    lbmsms_file = args.lbm
    t0 = time()
    main(evidence_file, msms_file, lbmsms_file)
    print('Finished in %.1f s' % (time() - t0))
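PSM2ProPep comes from the DeepSCP package itself; conceptually it rolls the filtered PSM table up to peptide- and protein-level intensity matrices. A rough sketch of the protein-level roll-up — my simplification, under the assumption that reporter intensities of PSMs mapping to the same protein are summed (the real function also handles the peptide level and per-file grouping):

```python
import pandas as pd

def psm_to_protein(psms, protein_column, intensity_columns):
    # Sum reporter intensities over all PSMs assigned to the same protein
    return psms.groupby(protein_column)[intensity_columns].sum()

# Toy example with made-up values
psms = pd.DataFrame({
    'Leading razor protein': ['P1', 'P1', 'P2'],
    'Reporter intensity corrected 1': [100.0, 50.0, 30.0],
})
pro = psm_to_protein(psms, 'Leading razor protein',
                     ['Reporter intensity corrected 1'])
```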

Environment setup:

numpy
pandas
scipy
lightgbm
networkx
matplotlib
joblib
bayes_opt
triqler
scikit-learn
torch

(copy, time, argparse and warnings are Python standard-library modules and need no installation.)
Python environment: I ran into problems testing under Python 3.9.
Error message:
Error: `np.float_` was removed in the NumPy 2.0 release. Use `np.float64` instead.
Fix:
The error comes from NumPy 2.x, so force-downgrade NumPy (and pandas to match): pin NumPy to a 1.x release of at least 1.23.3 (e.g. numpy==1.23.3) together with pandas of at least 1.4.0 (e.g. pandas==1.4.0), then rerun the analysis.
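A requirements pin along these lines should reproduce a working setup; the exact upper bounds are my guess based on the error above, not versions stated by the DeepSCP authors:

```text
numpy>=1.23.3,<2.0
pandas>=1.4.0,<2.0
```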

Data preparation:

Prepare the input files the tool asks for. Of course, you can also contact me and I will send you example data — provided you follow me first!!!
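Before running, it can save time to check that your MaxQuant tables actually contain the columns the script's feature list relies on. A small hypothetical helper (not part of DeepSCP) for that sanity check:

```python
import pandas as pd

# A few of the evidence.txt columns the DeepSCP feature list expects
REQUIRED = ['Sequence', 'Charge', 'Retention time', 'PEP', 'Score',
            'Leading razor protein', 'Experiment']

def missing_columns(df, required=REQUIRED):
    """Return the required columns that are absent from a MaxQuant table."""
    return [c for c in required if c not in df.columns]

# Usage: ev = pd.read_csv('evidence.txt', sep='\t', low_memory=False)
#        print(missing_columns(ev))
```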

$ python DeepSCP.py -h
usage: DeepSCP.py [-h] [-e E] [-m M] [-lbm LBM]

DeepSCP: utilizing deep learning to boost single-cell proteome coverage

optional arguments:
  -h, --help            show this help message and exit
  -e E, --evidence E    SCP SampleSet, evidence.txt, which records information
                        about the identified peptides by MaxQuant with FDR set
                        to 1 at both PSM and protein levels
  -m M, --msms M        SCP SampleSet, msms.txt, which records fragment ion
                        information about the identified peptides by MaxQuant
                        with FDR set to 1 at both PSM and protein levels
  -lbm LBM, --lbmsms LBM
                        LibrarySet, msms.txt, which records fragment ion
                        information about the identified peptides by MaxQuant
                        with FDR set to 0.01 at both PSM and protein levels

Run command:

If your computer isn't up to it, don't even bother!!!

python DeepSCP.py -e evidence.txt -m msms.txt -lbm lbmsms.txt

My own computer is too weak as well, so I don't have final results yet; once I switch to a better machine and get results, I'll share them in a follow-up post.
That's it for today. If you are interested in the DeepSCP tool, you can contact the admin to walk you through the analysis: kriswcyYQ.
