Aligning RNA-Seq Data with TopHat

qq_27390023

Tags: bioinformatics

Article link: https://blog.csdn.net/qq_27390023/article/details/120640490

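TopHat aligns transcriptome sequencing (RNA-Seq) reads to a reference genome. It is built on Bowtie and is no longer being updated. The latest release is v2.1.1, which runs under Python 2, so when installing with conda, first create a Python 2 environment and then install TopHat inside it with conda install tophat (the package is provided by the bioconda channel).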
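The installation steps described above can be sketched as the following shell commands. The environment name tophat_py2 is a placeholder chosen for illustration, and the sketch assumes the conda-forge and bioconda channels are reachable. By default it only prints the commands so they can be reviewed first; setting RUN=1 executes them.

```shell
# Sketch: create a Python 2 conda environment and install TopHat into it.
# ENV_NAME is a hypothetical name; change it freely.
ENV_NAME="tophat_py2"

# Print each command; execute it only when RUN=1 is set in the environment.
run() {
    printf '+ %s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

setup_tophat_env() {
    # Python 2.7 environment, since TopHat v2.1.1 needs Python 2.
    run conda create -y -n "$ENV_NAME" python=2.7
    # Install TopHat from bioconda into that environment.
    run conda install -y -n "$ENV_NAME" -c conda-forge -c bioconda tophat
}

setup_tophat_env
```

After a real run, conda activate tophat_py2 makes the tophat command available in the current shell.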

Usage: tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]

1. Aligning single-end reads

# Default parameters
tophat test_genome test_read.fq

# With explicit parameters
tophat -p 20 -o output -G genome.gtf genome test_read.fq

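Parameters:

-p sets the number of CPU threads TopHat uses. Choose it according to the total number of threads available on the server.

-o sets the output directory. Creating a separate folder for each sample is generally recommended.

-G is followed by the annotation file (GTF) for the reference genome. At run time TopHat2 first has Bowtie2 build an index from it, which takes some time.

genome is the base name of the Bowtie2 index for the sample's species. The index can be built from the species' genome.fa file with bowtie2-build, using the command: bowtie2-build genome.fa genome. Building this index also takes a fairly long time.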
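Because TopHat locates the Bowtie 2 index by its base name, it can help to confirm that the index files are present before starting a long run. A minimal sketch: the function names bt2_index_exists and ensure_index are made up for illustration, and only the first index file is checked (.1.bt2, or .1.bt2l for large indexes).

```shell
# Return 0 if a Bowtie 2 index with the given base name appears to exist.
# Only the first index file is checked: <base>.1.bt2 (small index)
# or <base>.1.bt2l (large index).
bt2_index_exists() {
    base="$1"
    [ -f "${base}.1.bt2" ] || [ -f "${base}.1.bt2l" ]
}

# Build the index from a FASTA file only when it is missing.
# $1 = genome FASTA (e.g. genome.fa), $2 = index base name (e.g. genome)
ensure_index() {
    if bt2_index_exists "$2"; then
        echo "index '$2' already present, skipping bowtie2-build"
    else
        bowtie2-build "$1" "$2"
    fi
}
```

With the file names used in the text, this would be called as ensure_index genome.fa genome.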

2. Aligning paired-end reads

tophat -p 20 -o output -G genome.gtf genome test_reads_1.fq test_reads_2.fq
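A paired-end run needs both mate files, passed in the same order. The sketch below assumes the _1.fq / _2.fq naming used in the example above (other naming schemes would need a different substitution); mate2_for and paired_cmd are hypothetical helper names, and the function only prints the command rather than running it.

```shell
# Derive the mate-2 file name from a mate-1 file name ending in _1.fq.
# (A name without that suffix simply gets _2.fq appended.)
mate2_for() {
    printf '%s\n' "${1%_1.fq}_2.fq"
}

# Print the tophat command for one pair; fail when either mate is missing.
paired_cmd() {
    r1="$1"
    r2="$(mate2_for "$r1")"
    if [ ! -f "$r1" ] || [ ! -f "$r2" ]; then
        echo "missing mate file for $r1" >&2
        return 1
    fi
    printf 'tophat -p 20 -o output -G genome.gtf genome %s %s\n' "$r1" "$r2"
}
```

For example, paired_cmd test_reads_1.fq prints the command shown above when both files exist.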

Note: Starting with version 2.0.10, TopHat accepts mixed input file formats (FASTA/FASTQ).

3. Mixing paired-end and single-end reads

Usage: tophat [options]* <genome_index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]

‐ or ‐

tophat [options]* <genome_index_base> PE_reads_1.fq.gz,SE_reads.fa PE_reads_2.fq.gz
‐ or ‐
tophat [options]* <genome_index_base> PE_reads_1.fq.gz PE_reads_2.fq.gz,SE_reads.fa
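In the mixed-input forms above, the single-end file is appended with a comma to one of the two paired lists. When several files are involved, the comma-separated arguments can be assembled with a small helper. This is only a sketch, and join_comma is a hypothetical name.

```shell
# Join all arguments with commas, the separator TopHat uses
# for multiple read files in one argument.
join_comma() {
    (IFS=,; printf '%s\n' "$*")
}

# Single-end file appended to the first paired list, as in the first form above.
left="$(join_comma PE_reads_1.fq.gz SE_reads.fa)"
right="$(join_comma PE_reads_2.fq.gz)"

# Print the resulting command line for inspection.
echo "tophat [options] <genome_index_base> $left $right"
```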

4. Building the GTF index in advance to reduce run time

# Build the annotation (transcriptome) index
tophat -G genome.gtf --transcriptome-index=transcriptome_data/genome.gtf.index genome
# Use the prebuilt annotation index; the -G option is no longer needed
tophat -p 20 -o output --transcriptome-index=transcriptome_data/genome.gtf.index genome reads_1.fa reads_2.fa
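Once the transcriptome index exists, it can be reused for every sample, with each sample writing to its own output folder. A dry-run sketch: the sample names sampleA and sampleB, the tophat_out directory, and the <sample>_1.fa / <sample>_2.fa naming are assumptions for illustration, and the function only prints the commands.

```shell
# Prebuilt transcriptome index prefix, as in the example above.
TX_INDEX="transcriptome_data/genome.gtf.index"

# Print one tophat command per sample, each with its own output directory.
print_sample_cmds() {
    for s in "$@"; do
        printf 'tophat -p 20 -o %s --transcriptome-index=%s genome %s_1.fa %s_2.fa\n' \
            "tophat_out/$s" "$TX_INDEX" "$s" "$s"
    done
}

print_sample_cmds sampleA sampleB
```

After checking the printed lines, they can be piped to sh to execute them one after another.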

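Note: with TopHat v2.1.1 installed through conda, running it with options such as -p and -G produced the error below. The character in front of G in the message is an en dash (–), not an ASCII hyphen (-), which suggests the option was not recognized as a flag and was taken as the genome index base name instead. Retyping the options with plain ASCII hyphens is the likely fix.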

Error: Could not find Bowtie 2 index files (–G.*.bt2l)
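The –G in the message above is written with an en dash rather than an ASCII hyphen, which often happens when a command is copied from a web page or a word processor. The following is a small sketch for spotting and repairing such characters before running a pasted command. The function names are made up for illustration, and only the Unicode hyphen (U+2010) and en dash (U+2013) are handled.

```shell
# Return 0 if the text contains a Unicode hyphen (U+2010) or en dash (U+2013).
has_unicode_dash() {
    case "$1" in
        *‐*|*–*) return 0 ;;
        *) return 1 ;;
    esac
}

# Replace those characters with an ASCII hyphen and print the result.
fix_dashes() {
    printf '%s\n' "$1" | sed -e 's/‐/-/g' -e 's/–/-/g'
}

# A pasted command whose option dashes are en dashes.
cmd='tophat –p 20 –o output –G genome.gtf genome test_read.fq'
if has_unicode_dash "$cmd"; then
    echo "non-ASCII dash found; corrected command:"
    fix_dashes "$cmd"
fi
```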