Code version
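def readFa(fa):
    '''
    @msg: Read a FASTA file whose records may also carry dot-bracket structure lines
    @param: fa {str} path to the FASTA file
    @return: {generator} yields (seqName, seq, fold) for every record in the file
    '''
    with open(fa, 'r') as FA:
        seqName, seq, fold = '', '', ''
        while True:
            line = FA.readline()
            eof = not line  # readline() returns '' only at end of file
            line = line.rstrip('\r\n')
            # a new header (or end of file) closes the previous record
            if (line.startswith('>') or eof) and seqName:
                yield (seqName, seq, fold)
            if eof:
                break
            if line.startswith('>'):
                seqName = line[1:]
                seq = ''
                fold = ''
            elif line.startswith('(') or line.startswith('.'):
                # dot-bracket secondary-structure line
                fold += line
            else:
                seq += line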
training_seq = []
training_fold = []
training_labels = []
fa = "RBP-24/CLIPSEQ_ELAVL1.train.positives.fa"
# read the FASTA file
for seqName, seq, fold in readFa(fa):
    training_seq.append(seq)
    # keep only the first 501 characters of the structure
    training_fold.append(fold[0:501])
    # positives file: every record is labeled 1
    training_labels.append(1)
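A minimal sketch of the same idea, assuming the same record layout (a `>` header, sequence lines, then dot-bracket lines starting with `(` or `.`). The helper names `parse_fold_fasta` and `load_labeled` are hypothetical, not from the original. Parsing is separated from opening the file, so the logic can be checked on an in-memory sample; blank lines are skipped, and a `label` argument lets any file be loaded the same way.

```python
import io
from typing import Iterable, Iterator, List, Tuple


def parse_fold_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, fold) records from any iterable of FASTA lines.

    Lines starting with '(' or '.' are collected as dot-bracket structure;
    blank lines are ignored rather than ending the parse.
    """
    name, seq, fold = None, [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(seq), "".join(fold)
            name, seq, fold = line[1:], [], []
        elif line.startswith(("(", ".")):
            fold.append(line)
        else:
            seq.append(line)
    if name is not None:  # emit the last record at end of input
        yield name, "".join(seq), "".join(fold)


def load_labeled(lines: Iterable[str], label: int,
                 max_len: int = 501) -> Tuple[List[str], List[str], List[int]]:
    """Return parallel lists (seqs, folds, labels); each fold is cut to max_len."""
    seqs, folds, labels = [], [], []
    for _name, seq, fold in parse_fold_fasta(lines):
        seqs.append(seq)
        folds.append(fold[:max_len])
        labels.append(label)
    return seqs, folds, labels


# Demo on an in-memory "file"; with real data, pass open(path) instead.
sample = io.StringIO(
    ">seq1\nACGU\nACG\n((..))\n.\n"
    "\n"
    ">seq2\nGGGAAACCC\n(((...)))\n"
)
seqs, folds, labels = load_labeled(sample, label=1)
print(seqs)    # ['ACGUACG', 'GGGAAACCC']
print(folds)   # ['((..)).', '(((...)))']
print(labels)  # [1, 1]
```

The original only reads the positives file. If a matching negatives file sits next to it (for example a `CLIPSEQ_ELAVL1.train.negatives.fa`, which is an assumption here), it could be loaded with `label=0` and the two sets of lists concatenated.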
Library version (very comprehensive)
Learning Python: pyfastx, a powerful tool for FASTA/FASTQ processing (恒诺新知): https://www.weinformatics.cn/89bddcbc14/