RNAseq---Hisat2 标准输出中比对率信息解读

RNA-Seq—Hisat2 标准输出中比对率信息解读

本文具体解释部分(一)中内容复制自Biostar内容,后面附上我实际的例子,二者略有不同,整体理解上没大问题,有不合适之处可联系我。

(一)Biostar上解释的例子

HISAT2是速度、敏感性和比对率方面比较优秀的比对软件,通常用于比对二代测序产生的RNA-Seq数据。在实际使用时需要统计HISAT2需要对结果中的比对率统计,HISAT2完成比对后标准输出如下:

HISAT2 summary stats:
            Total pairs: 11587225
                    Aligned concordantly or discordantly 0 time: 4464083 (38.53%)
                    Aligned concordantly 1 time: 2195620 (18.95%)
                    Aligned concordantly >1 times: 4877336 (42.09%)
                    Aligned discordantly 1 time: 50186 (0.43%)
            Total unpaired reads: 8928166
                    Aligned 0 time: 8019048 (89.82%)
                    Aligned 1 time: 304653 (3.41%)
                    Aligned >1 times: 604465 (6.77%)
            Overall alignment rate: 65.40%

具体解释如下:

1. Total pairs: 11587225

Total reads = 11587225 * 2 = 23174450 (matches total number of reads in the sample)

2. Aligned concordantly or discordantly 0 time: 4464083 (38.53%)

These are unmapped reads : 4464083 * 2 (paired end) = 8928166

( 8928166 / 23174450 (Total reads) ) * 100 ~ 38.53%

3. Aligned concordantly 1 time: 2195620 (18.95%)

These are uniquely mapped reads : 2195620 * 2 (paired end) = 4391240

( 4391240 / 23174450 (Total reads) ) * 100 ~ 18.95%

4. Aligned concordantly >1 times: 4877336 (42.09%)

These are multi mapped reads : 4877336 * 2 = 9754672

( 9754672 / 23174450 (Total reads) ) * 100 ~ 42.09%

5.Aligned discordantly 1 time: 50186 (0.43%)

Discordant aligned : 50186 * 2 = 100372

( 100372 / 23174450 (Total reads) ) * 100 ~ 0.43%

6. Total unpaired reads: 8928166

These are not paired reads

Aligned 0 time: 8019048 (89.82%)

(8019048 / 8928166 ) * 100 = 89.82% i.e. 89% of the unpaired reads did not align at all

Aligned 1 time: 304653 (3.41%)

(304653 / 8928166 ) * 100 = 3.41% i.e. 3.41% of the unpaired reads aligned once

Aligned >1 times: 604465 (6.77%)

(604465 / 8928166 ) * 100 = 6.77% i.e. 6.77% of the unpaired reads are multi mapped

7. Overall alignment rate: 65.40%

Calculation as explained below

Paired Reads
Aligned concordantly 1 time: (2195620 * 2 = 4391240)
Aligned concordantly >1 times: (4877336 * 2 = 9754672)
Aligned discordantly 1 time: (50186 * 2 = 100372)

Unpaired Reads
Aligned 1 time: 304653
Aligned >1 times: 604465

Total = 4391240 + 9754672 + 100372 + 304653 + 604465 = 15155402

Overall Alignment Rate = (15155402 / 23174450) * 100 = 65.40%

(二)实际运行的结果

基于上面的解释,以下面的结果为例,二者略有不同,可能是版本的问题:
目前我使用的版本是hisat2 version 2.1.0

(1)双端测序数据比对

23116294 reads; of these:
  23116294 (100.00%) were paired; of these:
    2117146 (9.16%) aligned concordantly 0 times
    18275745 (79.06%) aligned concordantly exactly 1 time
    2723403 (11.78%) aligned concordantly >1 times
    ----
    2117146 pairs aligned concordantly 0 times; of these:
      449895 (21.25%) aligned discordantly 1 time
    ----
    1667251 pairs aligned 0 times concordantly or discordantly; of these:
      3334502 mates make up the pairs; of these:
        2176277 (65.27%) aligned 0 times
        889188 (26.67%) aligned exactly 1 time
        269037 (8.07%) aligned >1 times
95.29% overall alignment rate

我们计算总的比对率0.9529276,与上面结果一致:

Overall alignment rate比例为

(18275745 * 2 + 2723403 * 2 + 449895 * 2 + 889188 + 269037 )/(23116294 * 2) = 0.9529276

Unique mapping 比例为

(18275745 * 2 + 449895 * 2 + 889188)/(23116294 * 2)
[1] 0.8292953

Multiple mapping比例为

(2723403 * 2 + 269037)/(23116294 * 2)
[1] 0.1236323

(2)单端测序数据比对

14479369 reads; of these:
  14479369 (100.00%) were unpaired; of these:
    4838655 (33.42%) aligned 0 times
    8260600 (57.05%) aligned exactly 1 time
    1380114 (9.53%) aligned >1 times
66.58% overall alignment rate

关于上面的Concordant 解释如下(bowtie2 官网):

Concordant pairs match pair expectations, discordant pairs don’t

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align “concordantly”. If both mates have unique alignments, but the alignments do not match paired-end expectations (i.e. the mates aren’t in the expected relative orientation, or aren’t within the expected distance range, or both), the pair is said to align “discordantly”. Discordant alignments may be of particular interest, for instance, when seeking structural variants.

The expected relative orientation of the mates is set using the --ff, --fr, or --rf options. The expected range of inter-mates distances (as measured from the furthest extremes of the mates; also called “outer distance”) is set with the -I and -X options. Note that setting -I and -X far apart makes Bowtie 2 slower. See documentation for -I and -X.

To declare that a pair aligns discordantly, Bowtie 2 requires that both mates align uniquely. This is a conservative threshold, but this is often desirable when seeking structural variants.

By default, Bowtie 2 searches for both concordant and discordant alignments, though searching for discordant alignments can be disabled with the --no-discordantoption.

此外,还得考虑的是如何从SAM/BAM结果文件中将unique mapping reads提取出来,以便于后续分析,具体可以参考链接2。

有一个问题是,看到有的文章是使用concordant unique alignments for paired-end reads进行表达分析,这样的比对数据在我的数据集是19%,不知道这部分数据的实际定量效果如何,目前还没有和全部的数据进行比较。之前我计算表达量都未按SAM tag去提取。

参考:

  1. https://www.biostars.org/p/395017/
  2. https://www.jianshu.com/p/bc3751b72f0c
  3. http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#concordant-pairs-match-pair-expectations-discordant-pairs-dont
  4. https://www.cnblogs.com/leezx/p/8540862.html
  • 3
    点赞
  • 12
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值