Sequencing bias/errors
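1. Causes
454: bases are read from emitted light signals, so homopolymer run lengths are hard to tell apart.
Illumina: when too few clusters form, the fluorescent signal cannot be captured sensitively; signals can also interfere with one another; and coverage of high-GC regions is relatively low.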
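To make the two problem cases above concrete, here is a minimal sketch that flags reads or regions likely to be affected: it reports the GC fraction and the longest homopolymer run of a sequence. The function names `gc_content` and `longest_homopolymer` are illustrative, not taken from any tool cited in these notes.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    # groupby yields one group per maximal run of identical characters.
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


if __name__ == "__main__":
    read = "ACGTTTTTAGGCC"
    print(gc_content(read), longest_homopolymer(read))
```

Any cutoff for "high GC" or "long homopolymer" would have to be chosen per platform and dataset; none is implied here.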
2. Solutions (Correcting errors in short reads by multiple alignments; Quake: quality-aware detection and correction of sequencing errors; ECHO: A reference-free short-read error correction algorithm)
(1) Deep sequencing
(2) Statistical evaluation
(3) Error correction
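The three steps can be illustrated with a simplified k-mer spectrum sketch. It is not the actual algorithm of Quake, ECHO, or the multiple-alignment method cited above (Quake, for instance, also uses base quality values); it only shows the shared idea: with deep coverage, true k-mers occur many times while k-mers containing an error are rare, so a count threshold separates them and a substitution that restores "solid" k-mers is a plausible correction. The names `count_kmers`, `is_solid`, `correct_read` and the parameters `k` and `min_count` are assumptions made for this example.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (deep sequencing gives high counts)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def is_solid(read, counts, k, min_count):
    """True if every k-mer in the read was seen at least min_count times."""
    return all(counts[read[i:i + k]] >= min_count
               for i in range(len(read) - k + 1))


def correct_read(read, counts, k, min_count):
    """Try all single-base substitutions; apply one only if it is unique.

    Returns the read unchanged when it is already solid, or when zero or
    several substitutions would make it solid (ambiguous cases are skipped).
    """
    if is_solid(read, counts, k, min_count):
        return read
    candidates = []
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if is_solid(candidate, counts, k, min_count):
                candidates.append(candidate)
    return candidates[0] if len(candidates) == 1 else read


if __name__ == "__main__":
    # Five error-free copies plus one read with a single substitution.
    reads = ["ACGTACGGTCA"] * 5 + ["ACGTACGATCA"]
    counts = count_kmers(reads, k=4)
    print(correct_read("ACGTACGATCA", counts, k=4, min_count=2))  # ACGTACGGTCA
```

The brute-force search over all positions is quadratic in read length and handles substitutions only; real correctors restrict the search to positions covered by weak k-mers and choose the threshold from the observed k-mer count distribution.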
Speed and RAM
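- High-throughput data analysis usually needs multiple compute nodes (CPU-intensive jobs: read mapping, metagenomics) and a large amount of memory (RAM-intensive jobs: genome assembly). The CPU accesses RAM far faster than it accesses disk; assembling a human genome takes roughly 512 GB of RAM.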
- Performance on Amazon EC2 (http://bowtie-bio.sourceforge.net/crossbow)
Serchi