I've fixed the indentation and removed the redundancy:

#!/usr/bin/python
"""
This script reads the sequences of the desert areas (fasta files) and calculates the percentage of the Ns and the repeats.
2014-10-05 v1.0 by Vasilis
2014-10-05 v1.1 by Llopis
2015-02-27 v1.2 by Cees Timmerman
"""
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("fasta_file", help="The fasta file to be processed.", type=str)
args = parser.parse_args()

with open(args.fasta_file, "r") as f:
    for line in f:  # iterate lazily instead of loading the whole file with readlines()
        line = line.strip()
        if not line:
            continue  # skip blank lines; line[0] would raise IndexError on them
        if line[0] == '>':
            # Header line: print the name (everything after '>') and stay on the same output line
            name = line[1:]
            print name,
        else:
            # Sequence line: 'N' = unknown base, lowercase = soft-masked repeat
            numberOfN = line.count('N')
            allChar = len(line)
            lowerChars = sum(1 for c in line if c.islower())
            Ns_percentage = 100 * (numberOfN / float(allChar))
            lower_percentage = 100 * (lowerChars / float(allChar))
            waste = Ns_percentage + lower_percentage
            print "\t", round(waste)  # Note: https://docs.python.org/2/library/functions.html#round
Fed with:
^{pr2}$
Gives:
C:\Python27\python.exe -u "dna.py" fasta.txt
Process started >>>
chr14_Gap_2 29.0
chr14_Gap_3 29.0
<<< Process finished. (Exit code 0)
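The output shows `29.0` because in Python 2, `round()` returns a float and rounds exact halves away from zero (see the docs link in the script). In Python 3, one-argument `round()` returns an int and rounds halves to the nearest even number, so a straight port would print `29` and could differ on exact `.5` values. If the Python 2 behaviour matters, here is a minimal Python 3 sketch using the standard `decimal` module (the helper name `round_half_away` is hypothetical, not part of the script):

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(x):
    """Round x to a whole number with ties away from zero, returned as a float
    (the behaviour of Python 2's one-argument round()).

    Decimal(x) uses the exact binary value of the float, so only values that
    really are .5 are treated as ties.
    """
    return float(Decimal(x).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Compare Python 3's built-in round() with the helper
for value in (28.5, 29.5, 29.2, -28.5):
    print(value, round(value), round_half_away(value))
```

The first line printed is `28.5 28 29.0`: Python 3's `round()` gives 28, while the helper gives 29.0 as Python 2 would. For the script's percentages (0 to 100) the default 28-digit decimal context is more than enough.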
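This Python script reads the sequences of desert regions from a FASTA file and calculates the percentage of Ns (unknown bases) and of repeats, whose sum it reports as the "waste" percentage. Vasilis and Llopis wrote versions 1.0 and 1.1 in October 2014, and Cees Timmerman updated it to version 1.2 in February 2015. The script opens the given FASTA file and reads it line by line: for a line starting with '>', it extracts the sequence name; for every other line, it counts the Ns and the lowercase letters (repeats), converts each count to a percentage of the line's total characters, and prints their sum rounded to the nearest whole number.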
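The script assumes each record's sequence sits on a single line after its header. A FASTA file wrapped at a fixed width (60 or 80 characters per line is common) would print one percentage per wrapped line instead of one per record. Below is a minimal Python 3 sketch that joins wrapped lines before computing the same percentage. The function names `read_fasta`, `waste_percentage` and `report` are hypothetical, not part of the original script.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining the wrapped sequence lines of each record."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # finish the previous record
            name, chunks = line[1:], []  # start a new record (lines before the first header are dropped)
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)  # finish the last record


def waste_percentage(seq):
    """Percentage of uppercase 'N' (unknown) plus lowercase (repeat) characters in seq."""
    n_count = seq.count("N")
    lower_count = sum(1 for c in seq if c.islower())
    return 100.0 * (n_count + lower_count) / len(seq)


def report(lines):
    """Return (name, rounded waste) for every record that has a sequence."""
    return [(name, round(waste_percentage(seq)))
            for name, seq in read_fasta(lines) if seq]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Run only when a FASTA path is given, e.g. (hypothetical filename): python3 dna3.py fasta.txt
    with open(sys.argv[1]) as f:
        for name, waste in report(f):
            print(name, waste, sep="\t")
```

As in the original, uppercase `N` counts as unknown and any lowercase character (including `n`) counts as a repeat. The two sets don't overlap, so adding the counts gives the same result as adding the two percentages. Python 3's `round()` prints `29` where the Python 2 script printed `29.0`.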
