How to read the summary file of basic alignment statistics that hisat2/bowtie2 writes after alignment.
References:
- https://www.cnblogs.com/leezx/p/8540862.html
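- A question-and-answer thread on the seqanswers forum
The walkthrough uses an example result file posted on seqanswers, together with the answer given there.
The table below lists what each count and percentage in that summary file means and how it is computed: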
Count | Ratio | Summary text | Meaning | Label / formula | Ratio formula |
---|---|---|---|---|---|
16182999 | - | reads; of these: | Total number of read pairs | T | - |
16182999 | 100.00% | were paired; of these: | Read pairs that were paired | T | T/T |
5731231 | 35.42% | aligned concordantly 0 times | Pairs with no concordant alignment | a0 | a0/T |
4522376 | 27.95% | aligned concordantly exactly 1 time | Pairs aligned concordantly exactly once | a1 | a1/T |
5929392 | 36.64% | aligned concordantly >1 times | Pairs aligned concordantly more than once | a1a | a1a/T |
---- | ---- | | | | |
5731231 | - | pairs aligned concordantly 0 times; of these: | Pairs with no concordant alignment (the same a0 as above) | a0 | - |
2381431 | 41.55% | aligned discordantly 1 time | Of those, pairs aligned discordantly once | a01 | a01/a0 |
---- | ---- | | | | |
3349800 | - | pairs aligned 0 times concordantly or discordantly; of these: | Pairs with neither a concordant nor a discordant alignment | Cd0 | - |
6699600 | - | mates make up the pairs; of these: | Individual reads (mates) in those pairs | Cd0s=Cd0*2 | - |
3814736 | 56.94% | aligned 0 times | Mates that did not align | Cm0 | Cm0/Cd0s |
1883429 | 28.11% | aligned exactly 1 time | Mates aligned exactly once | Cm1 | Cm1/Cd0s |
1001435 | 14.95% | aligned >1 times | Mates aligned more than once | Cm1a | Cm1a/Cd0s |
- | 88.21% | overall alignment rate | Overall alignment rate | R | - |
where:
- $Cd0 = T - (a1 + a1a + a01) = a0 - a01$
- $R = [(a1 + a1a) \times 2 + a01 \times 2 + Cm1 + Cm1a] / (T \times 2) = 1 - Cm0 / (T \times 2)$
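As a quick arithmetic check, the two formulas can be evaluated on the counts from the example table. This is only an illustrative sketch: the function names `unaligned_pairs` and `overall_rate` are made up for this example, and the variable names simply mirror the labels in the table.

```python
# Counts copied from the example table; variable names mirror the table labels.
T = 16182999     # total read pairs
a0 = 5731231     # pairs aligned concordantly 0 times
a1 = 4522376     # pairs aligned concordantly exactly 1 time
a1a = 5929392    # pairs aligned concordantly >1 times
a01 = 2381431    # pairs aligned discordantly 1 time
Cm0 = 3814736    # mates aligned 0 times
Cm1 = 1883429    # mates aligned exactly 1 time
Cm1a = 1001435   # mates aligned >1 times


def unaligned_pairs(T, a1, a1a, a01):
    """Pairs with neither a concordant nor a discordant alignment (Cd0)."""
    return T - (a1 + a1a + a01)


def overall_rate(T, a1, a1a, a01, Cm1, Cm1a):
    """Overall alignment rate (R): aligned mates divided by all mates."""
    aligned_mates = (a1 + a1a) * 2 + a01 * 2 + Cm1 + Cm1a
    return aligned_mates / (T * 2)


Cd0 = unaligned_pairs(T, a1, a1a, a01)
R = overall_rate(T, a1, a1a, a01, Cm1, Cm1a)

print(Cd0, Cd0 == a0 - a01)   # both forms of the Cd0 formula agree
print(Cd0 * 2 == Cm0 + Cm1 + Cm1a)  # the mate counts add up to Cd0s
print(f"{R:.2%}")             # compare with the "overall alignment rate" line
```

Both forms of each formula give the same value on this example, and the rate matches the 88.21% reported in the summary.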
Appendix:
The hisat2 summary file format is awkward to work with, so convert it:
python trans_hisat2sum.py ${hisat2_summary} ${hisat2_summary_trs}
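# trans_hisat2sum.py
# Flatten a hisat2/bowtie2 alignment summary into one header row and one data row.
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

lst = []      # values (counts and percentages), in file order
key_lst = []  # column names matching lst

with open(infile, 'r') as f:
    for line in f:
        # Skip the "----" separator lines.
        if "----" in line:
            continue
        litm = line.strip().split(" ")
        # Skip blank or single-token lines, which would break litm[1] below.
        if len(litm) < 2:
            continue
        lst.append(litm[0])
        info = litm[1]
        if info.startswith('('):
            # "<count> (<ratio>) <description>": keep count and ratio as two columns.
            lst.append(info.strip('()'))
            key_lst.append(' '.join(litm[2:]))
            key_lst.append('ratio:' + ' '.join(litm[2:]))
        else:
            # "<count> <description>" or "<ratio> overall alignment rate".
            key_lst.append(' '.join(litm[1:]))

with open(outfile, 'w') as pf:
    pf.write('\t'.join(key_lst) + '\n')
    pf.write('\t'.join(lst) + '\n')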
The output is one header row followed by one data row, which is easier to read when there are many samples.
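With many samples, the per-sample files can be stacked into a single table. The following is a minimal sketch and not part of the original script: the function name `merge_summaries`, the extra `sample` column, and the use of the file path as the sample name are all assumptions made for illustration. It expects every input to have the two-line layout written by `trans_hisat2sum.py`.

```python
def merge_summaries(paths, outfile):
    """Stack two-line (header + data) TSV files into one table, one row per file.

    Hypothetical helper: the first column holds the input path as the sample name.
    Returns the number of data rows written.
    """
    header = None
    rows = []
    for path in paths:
        with open(path) as f:
            file_header = f.readline().rstrip("\n").split("\t")
            values = f.readline().rstrip("\n").split("\t")
        if header is None:
            header = file_header
        elif file_header != header:
            # Refuse to mix files whose columns do not line up.
            raise ValueError(f"{path}: header differs from the first file")
        rows.append([path] + values)

    with open(outfile, "w") as out:
        out.write("\t".join(["sample"] + (header or [])) + "\n")
        for row in rows:
            out.write("\t".join(row) + "\n")
    return len(rows)
```

It could be called as `merge_summaries(["s1.tsv", "s2.tsv"], "all_samples.tsv")`, where the file names are placeholders. Raising an error on mismatched headers is a deliberate choice here, since silently stacking rows with different columns would give a misleading table.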