Why is nobody looking at such good single-cell proteomics data analysis? | The DeepSCP tool

Day 21 of keeping a learning log. As the saying goes, 21 days is enough to form a habit.

Paper: https://academic.oup.com/bib/article/23/4/bbac214/6598882?login=false

Source code: https://github.com/XuejiangGuo/DeepSCP

Today I continue the series on open-source FDR-calculation tools. I previously covered the Dart-ID algorithm; today's protagonist is DeepSCP. A few days ago I asked the code's author for the relevant data, so today I can actually run the whole thing. Below is the workflow diagram from the author's repository. It is quite clear: the pipeline is divided into five analysis modules, namely Raw data processing, SampleRT, DeepSpec, LgbBayes, and FDR Estimation. The code is organized around this same framework; have a look if you are interested.
[Figure: DeepSCP workflow diagram from the author's repository, showing the five analysis modules]

The goal of DeepSCP: "utilizing deep learning to boost SCP coverage" — DeepSCP identifies more confident peptides and proteins by controlling the q-value at 0.01 using the target-decoy competition method. In short, it uses deep learning to increase the number of peptides and proteins identified in single-cell proteomics (SCP) data.
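Target-decoy competition in a nutshell: at any score threshold, the number of decoy hits passing it estimates the number of false target hits, so FDR ≈ #decoys / #targets. Here is a minimal sketch of q-value estimation by this method — my simplification for illustration, not DeepSCP's actual implementation:

```python
import pandas as pd

def tdc_qvalues(scores, is_target):
    """Estimate PSM q-values by target-decoy competition:
    FDR at a score threshold is approximated by #decoys / #targets above it."""
    df = pd.DataFrame({'score': scores, 'target': is_target})
    df = df.sort_values('score', ascending=False).reset_index(drop=True)
    n_decoys = (~df['target']).cumsum()
    n_targets = df['target'].cumsum().clip(lower=1)
    fdr = n_decoys / n_targets
    # q-value: the lowest FDR achievable at this threshold or any looser one
    df['qvalue'] = fdr[::-1].cummin()[::-1]
    return df
```

Filtering PSMs at `qvalue < 0.01` then corresponds to the 1% FDR control mentioned above.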

Code excerpt:

Only part of the code is shown here; if you are interested in the full source, go take a look at the repository.

##### Imports the excerpt needs (MQ_SampleRT, DeepSpec, LgbBayes and PSM2ProPep
##### are classes/functions defined elsewhere in DeepSCP.py)
import argparse
from copy import deepcopy
from time import time
import pandas as pd

##### main() is defined first so the entry point below can call it;
##### it runs the analysis modules in order: SampleRT -> DeepSpec -> LgbBayes -> FDR filtering
def main(evidence_file, msms_file, lbmsms_file):
    print('###################SampleRT###################')
    evidence = pd.read_csv(evidence_file, sep='\t', low_memory=False)
    sampleRT = MQ_SampleRT()
    dfRT = sampleRT.fit_tranform(evidence)  # (sic: method name as spelled in the DeepSCP source)
    del evidence
    print('###################DeepSpec###################')
    msms = pd.read_csv(msms_file, sep='\t', low_memory=False)
    lbmsms = pd.read_csv(lbmsms_file, sep='\t', low_memory=False)
    deepspec = DeepSpec()
    deepspec.fit(lbmsms)  # train on the high-confidence LibrarySet
    dfSP = deepspec.predict(dfRT, msms)
    del msms, lbmsms
    print('###################LgbBayes###################')
    dfdb = deepcopy(dfSP)
    del dfSP
    feature_columns = ['Length', 'Acetyl (Protein N-term)', 'Oxidation (M)', 'Missed cleavages',
                       'Charge', 'm/z', 'Mass', 'Mass error [ppm]', 'Retention length', 'PEP',
                       'MS/MS scan number', 'Score', 'Delta score', 'PIF', 'Intensity',
                       'Retention time', 'RT(*|rev)', 'RT(*|tag)', 'DeltaRT', 'PEPRT', 'ScoreRT',
                       'Cosine', 'PEPCosine', 'ScoreCosine']
    target_column = 'label'
    file_column = 'Experiment'
    protein_column = 'Leading razor protein'
    lgs = LgbBayes()
    data_set = lgs.fit_tranform(data=dfdb,
                                feature_columns=feature_columns,
                                target_column=target_column,
                                file_column=file_column,
                                protein_column=protein_column)

    # Keep only target PSMs with q-value < 0.01 at both the PSM and protein level
    data = data_set[(data_set.psm_qvalue < 0.01) & (data_set.protein_qvalue < 0.01) &
                    (data_set.label == 1)]

    peptide_column = 'Sequence'
    intensity_columns = [i for i in data.columns if 'Reporter intensity corrected' in i]

    # Aggregate the filtered PSMs into protein- and peptide-level quantification tables
    df_pro, df_pep = PSM2ProPep(data, file_column=file_column,
                                protein_column=protein_column,
                                peptide_column=peptide_column,
                                intensity_columns=intensity_columns)

    data_set.to_csv('DeepSCP_evidence.txt', sep='\t', index=False)
    df_pro.to_csv('DeepSCP_pro.csv')
    df_pep.to_csv('DeepSCP_pep.csv', index=False)

##### Entry point: parse the command-line arguments, then run main()
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="DeepSCP: utilizing deep learning to boost single-cell proteome coverage")
    parser.add_argument("-e", "--evidence", dest='e', type=str,
                        help="SCP SampleSet, evidence.txt, which records information about the identified "
                             "peptides by MaxQuant with FDR set to 1 at both PSM and protein levels")
    parser.add_argument("-m", "--msms", dest='m', type=str,
                        help="SCP SampleSet, msms.txt, which records fragment ion information about the "
                             "identified peptides by MaxQuant with FDR set to 1 at both PSM and protein levels")
    parser.add_argument("-lbm", "--lbmsms", dest='lbm', type=str,
                        help="LibrarySet, msms.txt, which records fragment ion information about the "
                             "identified peptides by MaxQuant with FDR set to 0.01 at both PSM and protein levels")
    args = parser.parse_args()
    evidence_file = args.e
    msms_file = args.m
    lbmsms_file = args.lbm
    t0 = time()
    main(evidence_file, msms_file, lbmsms_file)
    print('Finished in %.1f s' % (time() - t0))
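PSM2ProPep comes from the DeepSCP package itself; conceptually it rolls the filtered PSM table up to peptide- and protein-level intensity matrices. A rough sketch of the protein-level roll-up — my simplification, under the assumption that reporter intensities of PSMs mapping to the same protein are summed (the real function also handles the peptide level and per-file grouping):

```python
import pandas as pd

def psm_to_protein(psms, protein_column, intensity_columns):
    # Sum reporter intensities over all PSMs assigned to the same protein
    return psms.groupby(protein_column)[intensity_columns].sum()

# Toy example with made-up values
psms = pd.DataFrame({
    'Leading razor protein': ['P1', 'P1', 'P2'],
    'Reporter intensity corrected 1': [100.0, 50.0, 30.0],
})
pro = psm_to_protein(psms, 'Leading razor protein',
                     ['Reporter intensity corrected 1'])
```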

Environment setup:

numpy
pandas
scipy
lightgbm
networkx
matplotlib
joblib
bayes_opt
triqler
scikit-learn
torch

(copy, time, argparse and warnings are Python standard-library modules and need no installation.)
Python environment: I ran into problems testing under Python 3.9.
Error message:
Error: `np.float_` was removed in the NumPy 2.0 release. Use `np.float64` instead.
Fix:
The error comes from NumPy 2.x, so force-downgrade NumPy (and pandas to match): pin NumPy to a 1.x release of at least 1.23.3 (e.g. numpy==1.23.3) together with pandas of at least 1.4.0 (e.g. pandas==1.4.0), then rerun the analysis.
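A requirements pin along these lines should reproduce a working setup; the exact upper bounds are my guess based on the error above, not versions stated by the DeepSCP authors:

```text
numpy>=1.23.3,<2.0
pandas>=1.4.0,<2.0
```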

Data preparation:

Prepare the input files the tool asks for. Of course, you can also contact me and I will send you example data — provided you follow me first!!!
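Before running, it can save time to check that your MaxQuant tables actually contain the columns the script's feature list relies on. A small hypothetical helper (not part of DeepSCP) for that sanity check:

```python
import pandas as pd

# A few of the evidence.txt columns the DeepSCP feature list expects
REQUIRED = ['Sequence', 'Charge', 'Retention time', 'PEP', 'Score',
            'Leading razor protein', 'Experiment']

def missing_columns(df, required=REQUIRED):
    """Return the required columns that are absent from a MaxQuant table."""
    return [c for c in required if c not in df.columns]

# Usage: ev = pd.read_csv('evidence.txt', sep='\t', low_memory=False)
#        print(missing_columns(ev))
```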

$ python DeepSCP.py -h
usage: DeepSCP.py [-h] [-e E] [-m M] [-lbm LBM]

DeepSCP: utilizing deep learning to boost single-cell proteome coverage

optional arguments:
  -h, --help            show this help message and exit
  -e E, --evidence E    SCP SampleSet, evidence.txt, which records information
                        about the identified peptides by MaxQuant with FDR set
                        to 1 at both PSM and protein levels
  -m M, --msms M        SCP SampleSet, msms.txt, which records fragment ion
                        information about the identified peptides by MaxQuant
                        with FDR set to 1 at both PSM and protein levels
  -lbm LBM, --lbmsms LBM
                        LibrarySet, msms.txt, which records fragment ion
                        information about the identified peptides by MaxQuant
                        with FDR set to 0.01 at both PSM and protein levels

Run command:

If your computer isn't up to it, don't even bother!!!

python DeepSCP.py -e evidence.txt -m msms.txt -lbm lbmsms.txt

My own computer is too weak as well, so I don't have final results yet; once I switch to a better machine and get results, I'll share them in a follow-up post.
That's it for today. If you are interested in the DeepSCP tool, you can contact the admin to walk you through the analysis: kriswcyYQ.
