How to tell the library type of strand-specific RNA-seq data (for read mapping with TopHat)

Background:

TopHat supports three library types: fr-unstranded, fr-firststrand and fr-secondstrand. Their descriptions in the TopHat documentation are listed below:

Library Type | Examples | Description
fr-unstranded | Standard Illumina | Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand | dUTP, NSR, NNSR | Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand | Ligation, Standard SOLiD | Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

Based on my understanding:

(1) fr-unstranded is for non-strand-specific reads, and the other two are for strand-specific reads;

(2) fr-firststrand: for paired-end reads, the right-most end of the fragment (in transcript coordinates) is sequenced first, as read 1, and the left-most end is sequenced second, as read 2. In other words, read 1 carries the antisense sequence and sits at the 3' end of the fragment, while read 2 carries the sense sequence and sits at the 5' end. For single-end reads, each read is the antisense sequence (the reverse complement of the transcript);

(3) fr-secondstrand: for paired-end reads, the left-most end of the fragment (in transcript coordinates) is sequenced first, as read 1, and the right-most end is sequenced second, as read 2. In other words, read 1 carries the sense sequence and sits at the 5' end of the fragment, while read 2 carries the antisense sequence and sits at the 3' end. For single-end reads, each read is the sense sequence (the same as the transcript);
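
The rule in the TopHat description quoted above can be written down as a small lookup. The following is only a sketch: the function name `expected_read_strand` and its arguments are made up for illustration and are not part of TopHat.

```python
def expected_read_strand(library_type, transcript_strand, read_number=1):
    """Return the genomic strand ('+' or '-') a read is expected to align to.

    Hypothetical helper, not a TopHat API. Returns None for fr-unstranded,
    because the read strand then carries no information about the transcript.
    Single-end reads are treated as read 1.
    """
    if transcript_strand not in ("+", "-"):
        raise ValueError("transcript_strand must be '+' or '-'")
    if read_number not in (1, 2):
        raise ValueError("read_number must be 1 or 2")
    if library_type == "fr-unstranded":
        return None
    if library_type not in ("fr-firststrand", "fr-secondstrand"):
        raise ValueError("unknown library type: " + library_type)

    # fr-secondstrand: read 1 has the same orientation as the transcript.
    # fr-firststrand: read 1 is the reverse complement of the transcript.
    same_as_transcript = library_type == "fr-secondstrand"
    # Read 2 always comes from the other end and the opposite strand.
    if read_number == 2:
        same_as_transcript = not same_as_transcript

    opposite = {"+": "-", "-": "+"}
    return transcript_strand if same_as_transcript else opposite[transcript_strand]
```

For example, `expected_read_strand("fr-firststrand", "+", 1)` gives `"-"`: with a dUTP library, read 1 of a plus-strand gene aligns to the minus strand.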

The following figure shows how the paired-end reads differ among these three types:

P.S.: "/1" marks the read that is sequenced first from the fragment; a read ID ending in "_1", such as "seq***_1", has the same meaning.

How to tell the library type from our raw data?

Approach 1: map our reads (preferably preprocessed reads, i.e. with the adapters removed) to the genome using BLAT in the UCSC Genome Browser or using IGV, and check whether the reads map to the sense strand or the antisense strand of known genes. More details can be found at: http://onetipperday.blogspot.sg/2012/07/how-to-tell-which-library-type-to-use.html
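
The same check can be done on many reads at once instead of by eye. The sketch below assumes you have already collected, for reads overlapping genes of known strand, pairs of (SAM FLAG, gene strand); how those pairs are gathered is left out. The function names and the 0.8 threshold are arbitrary choices for illustration, not part of any tool.

```python
from collections import Counter


def vote_for_read(flag, gene_strand):
    """Say which stranded library type one aligned read is consistent with.

    flag is the SAM FLAG integer; gene_strand is '+' or '-'.
    Returns None for unmapped, secondary or supplementary alignments.
    """
    if flag & 0x4 or flag & 0x100 or flag & 0x800:
        return None
    read_strand = "-" if flag & 0x10 else "+"
    # Read 2 (bit 0x80) lies on the opposite strand from read 1 of the same
    # fragment, so flip it to get the strand read 1 would have reported.
    if flag & 0x80:
        read_strand = "+" if read_strand == "-" else "-"
    if read_strand == gene_strand:
        return "fr-secondstrand"
    return "fr-firststrand"


def guess_library_type(observations, min_fraction=0.8):
    """Tally votes over (flag, gene_strand) pairs and name the likely type.

    min_fraction is an assumed cut-off: if neither stranded type reaches it,
    the data are reported as fr-unstranded.
    """
    votes = Counter()
    for flag, gene_strand in observations:
        vote = vote_for_read(flag, gene_strand)
        if vote is not None:
            votes[vote] += 1
    total = sum(votes.values())
    if total == 0:
        return "undetermined"
    for library_type, count in votes.items():
        if count / total >= min_fraction:
            return library_type
    return "fr-unstranded"
```

A roughly even split between the two votes suggests a non-strand-specific library, while a strong majority for one side points to that stranded type.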

Approach 2: map the reads with TopHat twice, once with fr-firststrand and once with fr-secondstrand, and compare the "junctions.bed" file from each run. Choose the library type that yields more junctions. More details can be found at: http://ccb.jhu.edu/software/tophat/faq.shtml#library_type
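
Counting the junctions from the two runs can be scripted. This is a minimal sketch that assumes each junctions.bed holds one junction per line, plus optional "track", "browser" or "#" header lines; the function names and the output paths in the usage comment are hypothetical.

```python
def count_junctions(bed_path):
    """Count junction records in a BED file, skipping header and blank lines."""
    count = 0
    with open(bed_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(("track", "browser", "#")):
                continue
            count += 1
    return count


def pick_library_type(bed_by_type):
    """Return (library type with the most junctions, all counts).

    bed_by_type maps a library type name to the junctions.bed of that run.
    On a tie, the first entry in the mapping wins.
    """
    counts = {lib: count_junctions(path) for lib, path in bed_by_type.items()}
    best = max(counts, key=counts.get)
    return best, counts


# Usage (paths are placeholders for your own TopHat output directories):
# best, counts = pick_library_type({
#     "fr-firststrand": "tophat_first/junctions.bed",
#     "fr-secondstrand": "tophat_second/junctions.bed",
# })
```

If the two counts are close, the difference alone is weak evidence, and checking individual reads as in Approach 1 is the safer call.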


   
   
   
   
