[bioinfo] Interpreting the hisat2/bowtie2 alignment summary file

青灯照颦微

Last modified 2023-03-10 13:17:54

First published 2022-10-17 15:11:38

Article link: https://blog.csdn.net/sinat_32872729/article/details/127363872

This post explains how to read the summary file of basic alignment statistics that hisat2/bowtie2 report after alignment.

References:

https://www.cnblogs.com/leezx/p/8540862.html
A Q&A thread on the SEQanswers forum

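The example uses a result file posted on SEQanswers (shown as an image in the original post), along with the corresponding answer from that thread.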

The table below lists what each count and percentage in the summary file means, together with the formula behind it:

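| Count | Ratio | Summary text | Meaning | Label / formula | Ratio formula |
|---|---|---|---|---|---|
| 16182999 | - | reads; of these: | Total number of read pairs | `T` | - |
| 16182999 | 100.00% | were paired; of these: | Read pairs that were paired | `T` | T/T |
| 5731231 | 35.42% | aligned concordantly 0 times | Pairs with no concordant alignment | `a0` | a0/T |
| 4522376 | 27.95% | aligned concordantly exactly 1 time | Pairs aligned concordantly exactly once | `a1` | a1/T |
| 5929392 | 36.64% | aligned concordantly >1 times | Pairs aligned concordantly more than once | `a1a` | a1a/T |
| ---- | | | | | |
| 5731231 | - | pairs aligned concordantly 0 times; of these: | Pairs with no concordant alignment | `a0` | - |
| 2381431 | 41.55% | aligned discordantly 1 time | Of those, pairs aligned discordantly once | `a01` | a01/a0 |
| ---- | | | | | |
| 3349800 | - | pairs aligned 0 times concordantly or discordantly; of these: | Pairs with neither a concordant nor a discordant alignment | `Cd0` | - |
| 6699600 | - | mates make up the pairs; of these: | Individual reads (mates) in those pairs | `Cd0s` = Cd0 × 2 | - |
| 3814736 | 56.94% | aligned 0 times | Mates that did not align | `Cm0` | Cm0/Cd0s |
| 1883429 | 28.11% | aligned exactly 1 time | Mates aligned exactly once | `Cm1` | Cm1/Cd0s |
| 1001435 | 14.95% | aligned >1 times | Mates aligned more than once | `Cm1a` | Cm1a/Cd0s |
| - | 88.21% | overall alignment rate | Overall alignment rate | - | `R` |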

where:

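$\mathrm{Cd0} = T - (\mathrm{a1} + \mathrm{a1a} + \mathrm{a01}) = \mathrm{a0} - \mathrm{a01}$

$R = \dfrac{(\mathrm{a1} + \mathrm{a1a}) \times 2 + \mathrm{a01} \times 2 + \mathrm{Cm1} + \mathrm{Cm1a}}{T \times 2} = 1 - \dfrac{\mathrm{Cm0}}{T \times 2}$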
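
As a quick sanity check, the two formulas can be evaluated with the counts from the example. This is only a sketch: the variable names mirror the labels used in the table, and the numbers are copied from the example summary rather than parsed from a file.

```python
# Counts copied from the example summary
T = 16182999      # total read pairs
a0 = 5731231      # pairs aligned concordantly 0 times
a1 = 4522376      # pairs aligned concordantly exactly 1 time
a1a = 5929392     # pairs aligned concordantly >1 times
a01 = 2381431     # pairs aligned discordantly 1 time
Cm0 = 3814736     # mates aligned 0 times
Cm1 = 1883429     # mates aligned exactly 1 time
Cm1a = 1001435    # mates aligned >1 times

# Pairs with neither a concordant nor a discordant alignment
Cd0 = a0 - a01
assert Cd0 == T - (a1 + a1a + a01)

# Overall alignment rate, counted per mate (2 mates per pair)
aligned_mates = (a1 + a1a) * 2 + a01 * 2 + Cm1 + Cm1a
assert aligned_mates == T * 2 - Cm0
R = aligned_mates / (T * 2)

print(Cd0)           # 3349800
print(f"{R:.2%}")    # 88.21%
```

Both forms of each formula agree, and the results match the `3349800 pairs` line and the `88.21% overall alignment rate` line of the summary.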

Appendix:

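The hisat2 summary file is awkward to work with in its original layout, so the script below converts it into a two-line, tab-separated table (one header row, one value row):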

python trans_hisat2sum.py ${hisat2_summary} ${hisat2_summary_trs}

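# trans_hisat2sum.py
# Convert a hisat2/bowtie2 alignment summary into a two-line, tab-separated table:
# a header row of field names and a row of the corresponding values.
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

values = []
keys = []
with open(infile, 'r') as f:
    for line in f:
        # Skip the "----" separator lines
        if "----" in line:
            continue
        fields = line.split()
        # Skip blank or malformed lines, which would otherwise raise IndexError
        if len(fields) < 2:
            continue
        values.append(fields[0])  # the count (or the overall alignment rate)
        info = fields[1]
        if info.startswith('('):
            # e.g. "4522376 (27.95%) aligned concordantly exactly 1 time":
            # emit two columns, one for the count and one for the percentage
            values.append(info.strip('()'))
            keys.append(' '.join(fields[2:]))
            keys.append('ratio:' + ' '.join(fields[2:]))
        else:
            keys.append(' '.join(fields[1:]))

with open(outfile, 'w') as pf:
    pf.write('\t'.join(keys) + '\n')
    pf.write('\t'.join(values) + '\n')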
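
To use the converted file downstream, the header row and the value row can be zipped back into a dictionary. The following is a minimal sketch: `load_summary_table` is a hypothetical helper name (not part of hisat2 or bowtie2), and it assumes the file has exactly the layout written by the script above, that is, one tab-separated header line followed by one tab-separated value line.

```python
def load_summary_table(path):
    """Read the two-line TSV written by trans_hisat2sum.py into a dict.

    Keys are the header fields; values are kept as strings
    (for example "3814736" or "88.21%").
    """
    with open(path, 'r') as f:
        keys = f.readline().rstrip('\n').split('\t')
        values = f.readline().rstrip('\n').split('\t')
    if len(keys) != len(values):
        raise ValueError("header and value rows have different lengths")
    return dict(zip(keys, values))
```

For a converted paired-end summary, `load_summary_table(path)["overall alignment rate"]` would then return the rate as a string. Values are left unconverted on purpose, since the columns mix integer counts and percentages.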