Shandong University, Class of 2019, Software Engineering Application and Practice: AI-Based Peptide Drug Analysis (Part 2)

Article link: https://blog.csdn.net/ChloeS0/article/details/120617341

2021SC@SDUSC

AI-Based Peptide Drug Analysis

Topic: Predicting Peptide-HLA Binding (2)

Code analysis

Code structure

Core code

The deephlapan script under bin/ is the entry point of the whole framework.

deephlapan

#!/usr/bin/python

from deephlapan.deephlapan_main import *
from deephlapan.parse_args import *


def main():
    (opt,_)=CommandLineParser()
    deephlapan_main(opt)


if __name__ == '__main__':
    main()

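The main function contains two statements:

CommandLineParser(), defined in parse_args.py, parses the command line. It returns the pair produced by parser.parse_args(): the parsed options and the leftover positional arguments. Only the options object opt is kept; the leftovers are discarded via _.

deephlapan_main(), defined in deephlapan_main.py, runs the prediction with those options.

The two parts are analyzed in turn below: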

parse_args.py

from optparse import OptionParser

def CommandLineParser():
    
    parser=OptionParser()

    '''
        =====================================================================
        DeepHLApan is a deep learning approach used for predicting high-confidence 
        neoantigens by considering both the presentation possibilities of 
        mutant peptides and the potential immunogenicity of pMHC.

        Usage:

        Single peptide and HLA:

            deephlapan -P LNIMNKLNI -H HLA-A02:01 

        List of peptides and HLA alleles in a file:

            deephlapan -F [file] -O [output directory]  

            (see 1.csv in demo/ for the detailed format of input file)
        =====================================================================
        '''

    parser.add_option("-P","--peptide",dest="sequence",help="single peptide for prediction",default="")
    parser.add_option("-H","--HLA allele",dest="hla",help="single hla for prediction, used with -P",default="")
    parser.add_option("-F","--file",dest="file",help="Input file with peptides and HLA alleles : if given, overwrite -P, -H option",default="")
    parser.add_option('-O','--OutputDirectory',dest="WD",default="",help="Directory to store predicted results. User must have write privilege. If omitted, the current directory will be applied.")
    return parser.parse_args()

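optparse handles command-line arguments. It is easy to use and generates usage and help messages that follow Unix/POSIX conventions. Note that the triple-quoted usage text inside CommandLineParser() is a bare string expression: it is never passed to OptionParser, so it serves only as an in-source comment and does not appear in the -h output.

To use the module, import the OptionParser class and create an instance of it.

Options are registered with parser.add_option(); parser.parse_args() then parses the command line (sys.argv[1:] by default) and returns an (options, args) pair.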

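Parameters of add_option():
- action: what to do when the option is seen (default "store", which saves the option's argument)
- type: the type the argument is converted to (default "string")
- dest: name of the attribute on the options object that receives the value
- default: value used when the option is not given
- help: description of the option shown in the help message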
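The parameters above can be tried out in a minimal sketch. It rebuilds a reduced version of the parser rather than the DeepHLApan one: the long name --hla replaces the original "--HLA allele" (which contains a space, so in practice users pass -H), and the -n/--num-models option is purely hypothetical, added only to show type and default.

```python
from optparse import OptionParser

def build_parser():
    """Reduced, illustrative version of the parser discussed above."""
    parser = OptionParser()
    # dest names the attribute on the returned options object.
    parser.add_option("-P", "--peptide", dest="sequence", default="",
                      help="single peptide for prediction")
    parser.add_option("-H", "--hla", dest="hla", default="",
                      help="single HLA allele, used with -P")
    # Hypothetical option (not in DeepHLApan): type="int" converts the argument;
    # action="store" is the default and is therefore omitted.
    parser.add_option("-n", "--num-models", dest="num_models", type="int",
                      default=5, help="hypothetical option for illustration")
    return parser

parser = build_parser()
# Passing an explicit list avoids reading sys.argv, which is handy for testing.
opts, leftover = parser.parse_args(["-P", "LNIMNKLNI", "-H", "HLA-A02:01", "extra"])
print(opts.sequence, opts.hla, opts.num_models, leftover)
# prints: LNIMNKLNI HLA-A02:01 5 ['extra']
```

The second element of the returned pair holds positional arguments that no option consumed; this is the value the entry script throws away with _.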

deephlapan_main

def deephlapan_main(opt):
    i = datetime.datetime.now()
    print (str(i) + ' Prediction starting.....\n')
    peptide=opt.sequence
    hla=opt.hla
    WD=opt.WD
    if len(WD)==0:
        WD='.'
    
    fname=peptide+'_'+hla
    if (opt.file):
        fname=opt.file.split('/')[-1]
        fname=fname.split('.')[0]
        df=pd.read_csv(opt.file)
        X_test = read_and_prepare(opt.file)
    else:
        X_test = read_and_prepare_single(peptide,hla)

The code above processes the parsed command-line options. There are two input modes:

Single peptide and HLA:

        deephlapan -P LNIMNKLNI -H HLA-A02:01

List of peptides and HLA alleles in a file:

        deephlapan -F [file] -O [output directory]

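That is, either a single peptide with a single HLA allele, or a file listing multiple peptides and HLA alleles. In file mode the input is loaded by read_and_prepare() and the output name is the input file's base name up to its first dot; in single mode the input is built by read_and_prepare_single() and the output name is the peptide and the allele joined by an underscore. If -O is omitted, WD falls back to the current directory '.'.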
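How the result file name is derived from the two input modes can be sketched as follows. The helper names output_basename and result_path are hypothetical; they only mirror the string handling in the excerpt above and are not functions of DeepHLApan.

```python
def output_basename(peptide, hla, file_path=""):
    """Mirror how deephlapan_main picks the stem of the result file name."""
    if file_path:
        # Last path component, then everything before the first dot.
        return file_path.split("/")[-1].split(".")[0]
    return peptide + "_" + hla

def result_path(work_dir, basename):
    # An empty -O value falls back to the current directory.
    return (work_dir or ".") + "/" + basename + "_predicted_result.csv"

print(result_path("", output_basename("LNIMNKLNI", "HLA-A02:01")))
# prints: ./LNIMNKLNI_HLA-A02:01_predicted_result.csv
print(result_path("out", output_basename("", "", "demo/1.csv")))
# prints: out/1_predicted_result.csv
```

Because the stem is cut at the first dot, an input named data.v2.csv would be written to data_predicted_result.csv.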

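Data preparation therefore comes in two variants, read_and_prepare() and read_and_prepare_single(). Prediction, by contrast, always runs both run_model and run_model1 on the same X_test, whatever the input mode. Judging from the header of the output file written later, their averaged results appear to be reported as the binding score and the immunogenic score respectively.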

    pool=mp.Pool(mp.cpu_count())
    for i in range(5):
        pool.apply_async(run_model,args=(i,X_test),callback=collect_result)
        pool.apply_async(run_model1,args=(i,X_test),callback=collect_result1)
    pool.close()
    pool.join()

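The code above uses multiprocessing, Python's standard-library multi-process module, to run the predictions in parallel across CPU cores. The loop submits ten tasks in total: five calls to run_model and five to run_model1, each receiving the index i (0 to 4) and the same X_test.

Core count: cpu_count() returns the number of CPUs of the local machine.

Process pool: Pool() creates a pool object that manages the lifetime of the worker processes and distributes tasks among them. Its argument is the number of worker processes, i.e. the maximum number of tasks that can run at the same time; here it is set to the core count.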

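Asynchronous submission: apply_async() schedules one task on the pool. Its first argument is the function to run, which is defined in this file and analyzed later. args is the tuple of arguments passed to that function, here the index i and the test data X_test. callback is a function invoked in the parent process with the task's return value once the task has completed. "Asynchronous" means that apply_async() returns immediately, without waiting for the task to finish.

Submission result: apply_async() returns an object documented as AsyncResult (the implementing class in CPython is named ApplyResult), a handle for a result that may not be ready yet. This code discards the handle and relies on the callbacks instead.

Fetching results: an AsyncResult offers get(), which blocks until the task has finished and then returns the function's return value. DeepHLApan does not call get() here; it collects results through the callbacks collect_result and collect_result1, which presumably append each return value to predScores and predScores1, the lists that are averaged afterwards.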

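pool.close(): no further tasks can be submitted to the pool; the worker processes exit once all submitted tasks are done.

pool.join(): waits for the worker processes to exit. close() or terminate() must be called before join().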
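The submit, collect, close and join sequence described above can be reproduced in a small self-contained sketch. fake_model and collect are hypothetical stand-ins for run_model and collect_result, and the arithmetic inside fake_model is made up; only the Pool calls match the excerpt.

```python
import multiprocessing as mp

scores = []  # filled by the callback, in completion order

def fake_model(index, samples):
    """Stand-in for run_model: one made-up score per sample."""
    return [round(0.1 * (index + 1) * s, 4) for s in samples]

def collect(result):
    # Invoked in the parent process each time a task finishes.
    scores.append(result)

def main():
    samples = [1, 2, 3]
    pool = mp.Pool(mp.cpu_count())
    for i in range(5):
        # Returns immediately; the AsyncResult handle is not kept.
        pool.apply_async(fake_model, args=(i, samples), callback=collect)
    pool.close()  # no more tasks may be submitted
    pool.join()   # wait for the workers to finish and exit
    print(len(scores), "results collected")

# The guard keeps worker processes from re-running main() on platforms
# that start workers by re-importing this module.
if __name__ == "__main__":
    main()
```

Results arrive in completion order, not submission order, which does not matter when they are only averaged. One caveat of the callback-only style: an exception raised inside a task is silently dropped unless error_callback is also passed to apply_async() or get() is called on the returned handle.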

    result = np.average(predScores, axis=0)
    result1 = np.average(predScores1, axis=0)
    with open(WD + '/' + fname + '_predicted_result.csv','w') as f:
        f.write('Annotation,HLA,Peptide,binding score,immunogenic score\n')
        if (opt.file):
            for i in range(len(result)):
                result[i]=("%.4f" % result[i])
                result1[i]=("%.4f" % result1[i])
                f.write(str(df.Annotation[i]) + ',' + str(df.HLA[i]) + ',' + str(df.peptide[i]) + ',' + str(result[i]) + ',' + str(result1[i]) + '\n')
        else:
            f.write('single peptide,' + str(hla) + ',' + str(peptide) + ',' + str(result[0]) + ',' + str(result1[0]) + '\n')
    f.close()
    if (opt.file):
        command = 'perl ' + curDir + '/model/rank.pl ' + WD + '/' + fname + '_predicted_result.csv'
        os.system(command)
    j = datetime.datetime.now()
    print (str(j) + ' Prediction end\n')
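The final step averages the per-model scores sample by sample and writes one CSV row per peptide. The sketch below illustrates that step with made-up numbers; it uses plain Python instead of NumPy to stay dependency-free, so column_means is only an assumed equivalent of np.average(..., axis=0) for one-dimensional runs, and write_results is a hypothetical helper. The annotations and the second peptide are invented for the example.

```python
import io

def column_means(runs):
    """Per-sample mean over several runs (like np.average(runs, axis=0) for 1-D runs)."""
    return [sum(col) / len(col) for col in zip(*runs)]

def write_results(out, rows, binding, immuno):
    """Write the header and one line per (annotation, HLA, peptide) row."""
    out.write("Annotation,HLA,Peptide,binding score,immunogenic score\n")
    for (annotation, hla, peptide), b, s in zip(rows, binding, immuno):
        # Four decimals, matching the "%.4f" formatting used in file mode.
        out.write("%s,%s,%s,%.4f,%.4f\n" % (annotation, hla, peptide, b, s))

# Made-up scores: 3 runs x 2 samples for each of the two models.
binding_runs = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.3]]
immuno_runs = [[0.5, 0.1], [0.6, 0.2], [0.4, 0.3]]
rows = [("a1", "HLA-A02:01", "LNIMNKLNI"), ("a2", "HLA-A02:01", "ACDEFGHIK")]

buf = io.StringIO()  # stands in for the opened result file
write_results(buf, rows, column_means(binding_runs), column_means(immuno_runs))
print(buf.getvalue(), end="")
```

Two details of the original excerpt are worth noting: the single-peptide branch writes the averaged values without the four-decimal rounding, and the f.close() after the with block is redundant because the with statement already closes the file. The rank.pl post-processing is invoked only in file mode.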