Experimental design:
1. Assessing biological variation requires biological replicates (technical replicates are not needed): 3 are preferred, 2 are acceptable, and 1 is suitable only for exploratory assays (not adequate for publication).
Note: In general, RNA-seq is highly reproducible, so a large number of replicates is not needed.
2. For differential expression, don't pool RNA from multiple biological replicates.
3. Process all samples consistently, ideally at the same time, to avoid batch effects.
Note: Before the experiment: careful design. After the experiment: batch effect removal (e.g., ComBat).
4. Ribosomal RNA depletion (rRNA-minus): removes overly abundant rRNA transcripts
5. Poly(A) selection (enriches mRNA)
6. Strand-specific libraries (e.g., to detect antisense lncRNA)
7. For expression analysis: paired-end (PE) or single-end (SE) reads are both OK.
8. For splicing and novel transcripts: PE
9. Depth: 30-50M reads for differential expression; deeper for transcript assembly
10. Read length: longer reads for transcript assembly
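Note: Point 3 (avoiding batch effects by design) can be checked before any sample is processed. The following is a minimal Python sketch, not a standard tool: the sample sheet layout (sample ID, condition, batch) and the function names are assumptions made for illustration. It flags designs in which some batch does not contain every condition, so that batch and condition are partly or fully confounded.

```python
from collections import defaultdict


def batch_condition_table(samples):
    """Count samples per (batch, condition) pair.

    `samples` is a list of (sample_id, condition, batch) tuples;
    this layout is an assumption made for the example.
    """
    table = defaultdict(lambda: defaultdict(int))
    for _sample_id, condition, batch in samples:
        table[batch][condition] += 1
    return {batch: dict(counts) for batch, counts in table.items()}


def is_confounded(samples):
    """Return True if any batch lacks at least one of the conditions.

    A design with a single condition cannot be confounded in this sense.
    """
    conditions = {condition for _, condition, _ in samples}
    if len(conditions) < 2:
        return False
    table = batch_condition_table(samples)
    return any(len(counts) < len(conditions) for counts in table.values())
```

If is_confounded returns True, the batch effect cannot be cleanly separated from the condition effect afterwards, so it is better to redistribute samples across batches before starting.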
Quality control: RSeQC (one available computational tool)
1. Read quality: base quality typically decreases as the position within the read increases (toward the 3' end).
2. Nucleotide composition:
Before position 15, and especially before position 6, the proportions of A, T, G and C show a very strong bias.
After position 15, the proportions of A, T, G and C are stable.
Note: This bias is caused by random hexamer priming.
3. Read count distribution and GC content:
Note: In most cases, the GC content of reads follows an approximately normal distribution.
4. Read count distribution across the gene body and different regions of genes:
Note: Read counts in the middle of the gene body are higher than at the 5' end and the 3' end.
5. Insert size distribution (PE reads):
Note: In most cases, the median insert size is between 100 and 200 bp.
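Note: The nucleotide composition and GC content checks (points 2 and 3) can be illustrated with a small Python sketch. This is not how RSeQC is implemented; the function names and the input format (a plain list of read sequences) are assumptions made for the example.

```python
from collections import Counter


def per_position_composition(reads):
    """Fraction of A, C, G and T at each read position.

    `reads` is an iterable of sequence strings (assumed input format).
    Returns a list with one dict per position; other characters such
    as N are ignored when computing the fractions.
    """
    counts = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c[b] for b in "ACGT")
        result.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return result


def gc_content(seq):
    """GC fraction of one read, ignoring non-ACGT characters such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0
```

Plotting the per-position fractions should show the strong bias in the first positions described above, and a histogram of gc_content over all reads should look roughly bell-shaped for a typical library.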
Alignment:
1. Prefer splice-aware aligners.
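2. Common aligners: TopHat and STAR (splice-aware); BWA (not splice-aware)
3. Sometimes the first bases of each read need to be trimmed (e.g., because of the composition bias at the start of reads).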
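Note: Trimming the beginning bases (point 3) means removing the first n bases of every read together with the matching quality characters. In practice this is usually done with a dedicated trimming tool or an aligner option. The Python sketch below only illustrates the operation; the function name is hypothetical, and it assumes a simple FASTQ file with exactly four lines per record.

```python
def trim_fastq_start(lines, n):
    """Remove the first `n` bases (and matching quality characters)
    from every record of a 4-line-per-record FASTQ stream.

    `lines` is an iterable of text lines; trimmed lines are yielded
    without trailing newlines.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if len(seq) != len(qual):
                raise ValueError(
                    f"sequence/quality length mismatch in {header}"
                )
            yield header
            yield seq[n:]   # drop the first n bases
            yield plus
            yield qual[n:]  # keep qualities aligned with the bases
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

Usage: passing an open file handle as `lines` and writing each yielded line followed by a newline produces the trimmed FASTQ file.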