Day 21 of keeping this learning log. As the saying goes, it takes 21 days to build a habit.
Paper: https://academic.oup.com/bib/article/23/4/bbac214/6598882?login=false
Source code: https://github.com/XuejiangGuo/DeepSCP
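Today I'm continuing the series on open-source tools for FDR calculation. I covered the DART-ID algorithm earlier; today's subject is DeepSCP. A few days ago I asked the code's author for the relevant data, so today I can actually run through it. The flowchart in the author's source repository is clear and easy to follow: the workflow is split into five analysis modules, namely Raw data processing, SampleRT, DeepSpec, LgbBayes, and FDR Estimation. The code is organized along the same lines, so take a look if you're interested.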
Purpose of DeepSCP: utilizing deep learning to boost SCP coverage. DeepSCP identified more confident peptides and proteins by controlling q-value at 0.01 using target-decoy competition method. In short, it uses deep learning to increase the number of peptides and proteins identified from SCP (single-cell proteome) data.
Code excerpt:
Only part of the code is shown here; see the repository if you want the full source.
# Entry point: with a __main__ guard the script can be run directly to call and test its functions.
# Note: main() has to be defined above this block in the file for the call at the end to work.
# Imports needed by this excerpt
import argparse
from time import time

if __name__ == '__main__':
    # Command-line arguments
    parser = argparse.ArgumentParser(
        description="DeepSCP: utilizing deep learning to boost single-cell proteome coverage")
    parser.add_argument("-e",
                        "--evidence",
                        dest='e',
                        type=str,
                        help="SCP SampleSet, evidence.txt, which recorde information about the identified peptides \
                        by MaxQuant with setting FDR to 1 at both PSM and protein levels")
    parser.add_argument("-m",
                        "--msms",
                        dest='m',
                        type=str,
                        help="SCP SampleSet, msms.txt, which recorde fragment ion information about the identified peptides \
                        by MaxQuant with setting FDR to 1 at both PSM and protein levels")
    parser.add_argument("-lbm",
                        "--lbmsms",
                        dest='lbm',
                        type=str,
                        help="LibrarySet, msms.txt, which recorde fragment ion information about the identified peptides \
                        by MaxQuant with setting FDR to 0.01 at both PSM and protein levels")
    args = parser.parse_args()
    evidenve_file = args.e
    msms_file = args.m
    lbmsms_file = args.lbm
    t0 = time()
    # Run the main pipeline
    main(evidenve_file, msms_file, lbmsms_file)
# Definition of main(), organized by analysis module.
# MQ_SampleRT, DeepSpec, LgbBayes and PSM2ProPep come from the DeepSCP source and are not shown here.
import pandas as pd
from copy import deepcopy

def main(evidenve_file, msms_file, lbmsms_file):
    print(' ###################SampleRT###################')
    evidence = pd.read_csv(evidenve_file, sep='\t', low_memory=False)
    sampleRT = MQ_SampleRT()
    dfRT = sampleRT.fit_tranform(evidence)
    del evidence

    print('###################DeepSpec###################')
    msms = pd.read_csv(msms_file, sep='\t', low_memory=False)
    lbmsms = pd.read_csv(lbmsms_file, sep='\t', low_memory=False)
    deepspec = DeepSpec()
    deepspec.fit(lbmsms)
    dfSP = deepspec.predict(dfRT, msms)
    del msms, lbmsms

    print('###################LgbBayses###################')
    dfdb = deepcopy(dfSP)
    del dfSP
    feature_columns = ['Length', 'Acetyl (Protein N-term)', 'Oxidation (M)', 'Missed cleavages',
                       'Charge', 'm/z', 'Mass', 'Mass error [ppm]', 'Retention length', 'PEP',
                       'MS/MS scan number', 'Score', 'Delta score', 'PIF', 'Intensity',
                       'Retention time', 'RT(*|rev)', 'RT(*|tag)', 'DeltaRT', 'PEPRT', 'ScoreRT',
                       'Cosine', 'PEPCosine', 'ScoreCosine']
    target_column = 'label'
    file_column = 'Experiment'
    protein_column = 'Leading razor protein'
    lgs = LgbBayes()
    data_set = lgs.fit_tranform(data=dfdb,
                                feature_columns=feature_columns,
                                target_column=target_column,
                                file_column=file_column,
                                protein_column=protein_column)
    # Keep target PSMs that pass 1% q-value at both PSM and protein level
    data = data_set[(data_set.psm_qvalue < 0.01) & (data_set.protein_qvalue < 0.01) &
                    (data_set.label == 1)]
    peptide_column = 'Sequence'
    intensity_columns = [i for i in data.columns if 'Reporter intensity corrected' in i]
    df_pro, df_pep = PSM2ProPep(data, file_column=file_column,
                                protein_column=protein_column,
                                peptide_column=peptide_column,
                                intensity_columns=intensity_columns)
    # Write the rescored PSM table plus protein- and peptide-level tables
    data_set.to_csv('DeepSCP_evidence.txt', sep='\t', index=False)
    df_pro.to_csv('DeepSCP_pro.csv')
    df_pep.to_csv('DeepSCP_pep.csv', index=False)
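The filter on psm_qvalue and protein_qvalue is where the "q-value at 0.01" from the paper shows up in the code. To make the idea concrete, here is a minimal pure-Python sketch of a generic target-decoy q-value calculation. It is my own illustration, not DeepSCP's implementation: the function name tdc_qvalues, the simple decoys/targets FDR estimate, and the toy scores are all assumptions, and the real tool may use a different estimator.

```python
def tdc_qvalues(scores, is_target):
    """Generic target-decoy q-values (illustrative sketch, not DeepSCP's code).

    scores    : list of numbers, higher means a better match
    is_target : list of bools, True for target hits, False for decoy hits
    Returns q-values in the same order as the input.
    """
    # Rank all hits from best to worst score (ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # FDR estimate at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_target[i]:
            targets += 1
        else:
            decoys += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any worse rank,
    # which makes q-values non-decreasing as the score gets worse.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues


# Toy input, invented for illustration only.
toy_scores = [10, 9, 8, 7, 6]
toy_labels = [True, True, False, True, False]
toy_q = tdc_qvalues(toy_scores, toy_labels)

# Keep target hits below a 1% q-value, mirroring the filter in main().
accepted = [i for i, q in enumerate(toy_q) if q < 0.01 and toy_labels[i]]
print(toy_q, accepted)
```

With this toy input only the two top-scoring targets survive the 1% cut, because the first decoy pushes the estimated FDR up for everything ranked at or below it.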
Environment setup (copy, time, argparse and warnings ship with Python; the rest need to be installed):
numpy
pandas
scipy
lightgbm
networkx
matplotlib
copy
time
joblib
bayes_opt
triqler
scikit-learn
torch
argparse
warnings
Python environment: my test under Python 3.9 currently runs into a problem.
Error message:
Error:`np.float_` was removed in the NumPy 2.0 release. Use `np.float64` instead.
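Fix:
Force-downgrade NumPy, and downgrade pandas to match.
The version floors are numpy>=1.23.3 and pandas>=1.4.0. Since the error says np.float_ was removed in NumPy 2.0, NumPy also has to stay below 2.0. Pin compatible versions first, then run the analysis.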
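Before launching anything, it is easy to check which side of the NumPy 2.0 line the current environment is on. The snippet below is a small sketch using only the standard library; the helper name numpy_is_pre_2 is made up for this post, and it only looks at the major version number.

```python
from importlib import metadata


def numpy_is_pre_2(version: str) -> bool:
    """Return True if a NumPy version string has a major version below 2."""
    major = int(version.split(".")[0])
    return major < 2


# Look up the installed NumPy version without importing NumPy itself.
try:
    installed = metadata.version("numpy")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("NumPy is not installed in this environment")
elif numpy_is_pre_2(installed):
    print(f"NumPy {installed}: np.float_ is still available")
else:
    print(f"NumPy {installed}: np.float_ has been removed, install a 1.x release")
```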
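Data preparation:
Prepare the input files according to the inputs the tool requires. You can also contact me and I'll send you example data, provided you follow me first!!!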
$ python DeepSCP.py -h
usage: DeepSCP.py [-h] [-e E] [-m M] [-lbm LBM]

DeepSCP: utilizing deep learning to boost single-cell proteome coverage

optional arguments:
  -h, --help            show this help message and exit
  -e E, --evidence E    SCP SampleSet, evidence.txt, which recorde information
                        about the identified peptides by MaxQuant with setting
                        FDR to 1 at both PSM and protein levels
  -m M, --msms M        SCP SampleSet, msms.txt, which recorde fragment ion
                        information about the identified peptides by MaxQuant
                        with setting FDR to 1 at both PSM and protein levels
  -lbm LBM, --lbmsms LBM
                        LibrarySet, msms.txt, which recorde fragment ion
                        information about the identified peptides by MaxQuant
                        with setting FDR to 0.01 at both PSM and protein
                        levels
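Since a full run takes a long time, a quick sanity check on the input tables can save a wasted attempt. The sketch below reads only the header row of a tab-separated file and reports which expected columns are absent. The column list is a small subset of names that appear in the code excerpt; whether every one of them is already present in a raw evidence.txt is my assumption, and the helper name missing_columns is hypothetical.

```python
import csv
import io

# Assumed subset of columns. The names appear in the code excerpt, but
# whether each one exists in a raw evidence.txt is an assumption.
EXPECTED_COLUMNS = [
    "Sequence",
    "Experiment",
    "Leading razor protein",
    "PEP",
    "Score",
    "Retention time",
]


def missing_columns(handle, expected=EXPECTED_COLUMNS):
    """Read the header row of a tab-separated table and list absent columns."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    present = set(header)
    return [name for name in expected if name not in present]


# Toy in-memory table standing in for evidence.txt (not real data).
toy = io.StringIO("Sequence\tExperiment\tPEP\tScore\nPEPTIDEK\trun1\t0.01\t88.2\n")
print(missing_columns(toy))  # ['Leading razor protein', 'Retention time']
```

For a real file, open it with open("evidence.txt", newline="") and pass the handle to missing_columns; an empty list means every expected column was found.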
Analysis command:
Don't even try this on an underpowered machine!!!
python DeepSCP.py -e evidence.txt -m msms.txt -lbm lbmsms.txt
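If you would rather drive the run from a Python script (for example to loop over several datasets), the same command can be assembled programmatically. This is only a sketch: the helper names build_deepscp_command and missing_inputs are invented here, the flags are the ones from the help text, and the actual launch is left commented out because it is long-running.

```python
import sys
from pathlib import Path


def build_deepscp_command(evidence, msms, lbmsms, script="DeepSCP.py"):
    """Return the argument list for the DeepSCP command line shown above."""
    return [sys.executable, script,
            "-e", str(evidence),
            "-m", str(msms),
            "-lbm", str(lbmsms)]


def missing_inputs(*paths):
    """Return the given paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


cmd = build_deepscp_command("evidence.txt", "msms.txt", "lbmsms.txt")
print(" ".join(cmd[1:]))

# To actually launch the run:
# import subprocess
# if not missing_inputs("evidence.txt", "msms.txt", "lbmsms.txt"):
#     subprocess.run(cmd, check=True)
```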
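My own computer is too weak for this at the moment, so I don't have final results yet. Once I've switched to another machine and the run finishes, I'll share the results in a follow-up post.
That's all for today. If you're interested in DeepSCP, contact the admin, who can walk you through the analysis: kriswcyYQ.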