HISAT2 Sequence Alignment

qq_27390023

Tags: bioinformatics

Article link: https://blog.csdn.net/qq_27390023/article/details/120682636

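HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against a population of human genomes as well as against a single reference genome.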

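1. Building the index

Building an index takes a long time, so you usually do not need to build one yourself: prebuilt indexes for common genomes are available for download.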
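Whether an index was downloaded or built locally, it is a set of files that share one base name. A minimal sketch of a completeness check follows, assuming a standard (non-large) index made of eight files named <base>.1.ht2 through <base>.8.ht2; the function name index_complete is hypothetical, not part of HISAT2.

```shell
#!/usr/bin/env bash
# Return 0 if all eight files <base>.1.ht2 ... <base>.8.ht2 exist.
# Assumption: a standard small index; large indexes use the .ht2l suffix
# and are not handled by this sketch.
index_complete() {
  local base=$1 i
  for i in 1 2 3 4 5 6 7 8; do
    if [ ! -f "${base}.${i}.ht2" ]; then
      echo "missing: ${base}.${i}.ht2" >&2
      return 1
    fi
  done
  return 0
}

# Example (not executed here):
#   index_complete ht2_hg38 && echo "index looks complete"
```

The check only looks at file names; it does not verify that the files are intact.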

Usage: hisat2-build [options]* <reference_in> <ht2_index_base>

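# Build a genome index
hisat2-build hg38.fa ht2_hg38

# Build a genome + transcriptome + SNP index (options placed before the positional arguments, as in the usage line)
hisat2-build -p 8 --snp genome.snp --ss genome.ss --exon genome.exon genome.fa genome_snp_tran_index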

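Note: -p sets the number of threads; the files passed to --snp, --ss, and --exon are generated with the Python scripts that ship with HISAT2.

For example:

hisat2_extract_exons.py hg19.refGene.gtf > hg19.exon

hisat2_extract_splice_sites.py hg19.refGene.gtf > hg19.ss

hisat2_extract_snps_haplotypes_UCSC.py hg19.fa hg19_snp151.txt hg19

Unlike the first two scripts, the SNP script does not write to standard output: it takes the genome FASTA, the SNP table, and an output base name, and writes hg19.snp and hg19.haplotype itself. Here hg19.fa is assumed to be the genome FASTA that matches the SNP table.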

Use hisat2_extract_snps_haplotypes_UCSC.py (in the HISAT2 package) to extract SNPs and haplotypes from a dbSNP file (e.g. http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/snp144Common.txt.gz), or hisat2_extract_snps_haplotypes_VCF.py to extract SNPs and haplotypes from a VCF file (e.g. ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/GRCh38_positions/ALL.chr22.phase3_shapeit2_mvncall_integrated_v3plus_nounphased.rsID.genotypes.GRCh38_dbSNP_no_SVs.vcf.gz).

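If you use --snp, --ss, and/or --exon, hisat2-build needs about 200 GB of memory for a genome the size of the human genome, because building the index involves constructing a graph. Without these options, you can build the index on a desktop machine with 8 GB of RAM.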
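Because the two kinds of build differ so much in memory needs, it can help to check the available RAM before choosing one. The sketch below is Linux-only (it reads /proc/meminfo) and treats the 200 GB figure as 200 x 1024 x 1024 kB; the function name has_enough_ram is hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if total RAM is at least <required_gb> gigabytes.
# $1: required amount in GB.
# $2: total memory in kB (optional; read from /proc/meminfo when omitted,
#     which is an assumption that only holds on Linux).
has_enough_ram() {
  local required_gb=$1
  local mem_kb=${2:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}
  [ "$mem_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Example (not executed here):
#   if has_enough_ram 200; then
#     hisat2-build -p 8 --snp genome.snp --ss genome.ss --exon genome.exon genome.fa genome_snp_tran_index
#   else
#     hisat2-build -p 8 genome.fa genome_index
#   fi
```

Total RAM is only a rough proxy, since other processes on the machine also consume memory.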

2. Inspecting the index

Usage: hisat2-inspect [options]* <ht2_base>

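hisat2-inspect ht2_hg38 # by default, print the indexed sequences as FASTA

hisat2-inspect -n ht2_hg38 # print the reference sequence names

hisat2-inspect ht2_hg38 > hg38.fa # write the genome sequences to a file (-a takes a line width in characters, not an index name)

hisat2-inspect --exon ht2_hg38 # print exons

hisat2-inspect --ss ht2_hg38 # print splice sites

hisat2-inspect --ss-all ht2_hg38 # print all splice sites, including those not in the global index

hisat2-inspect --snp ht2_hg38 # print SNPs

hisat2-inspect -s ht2_hg38 # print a summary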
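One use of the -n output is to confirm that an index was built from the FASTA file you expect. The helper below is a sketch that lists sequence names from a FASTA file so they can be compared with the names reported by the index. The function name fasta_names is hypothetical, and truncating names at the first whitespace is an assumption that may not match how a given index stores them.

```shell
#!/usr/bin/env bash
# Print one sequence name per line from a FASTA file: the text after '>'
# up to (not including) the first whitespace character.
fasta_names() {
  sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1"
}

# Example (not executed here; requires bash for process substitution):
#   diff <(fasta_names hg38.fa) <(hisat2-inspect -n ht2_hg38 | awk '{print $1}')
```

An empty diff suggests the index and the FASTA file contain the same sequences in the same order.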

3. Alignment

Usage:

hisat2 [options]* -x <ht2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]

# Single-end (SE)
hisat2 -p 4 -x genome_index -U test_reads.fq -S eg1.sam
# Paired-end (PE)
hisat2 -p 4 -x genome_index -1 test_reads_1.fq -2 test_reads_2.fq -S eg2.sam
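With many samples, the paired-end command is usually generated per sample rather than typed by hand. The sketch below only prints the commands so they can be reviewed before running. It assumes mate files are named <sample>_1.fq and <sample>_2.fq and contain no spaces; the function names mate2_of and pe_command are hypothetical.

```shell
#!/usr/bin/env bash
# Derive the mate-2 file name from a mate-1 file name.
# Assumption: files follow the <sample>_1.fq / <sample>_2.fq convention.
mate2_of() {
  printf '%s_2.fq\n' "${1%_1.fq}"
}

# Print (do not run) the hisat2 command for one paired-end sample.
# $1: index base name, $2: mate-1 FASTQ, $3: thread count (default 4).
pe_command() {
  local index=$1 r1=$2 threads=${3:-4}
  local r2 sample
  r2=$(mate2_of "$r1")
  sample=$(basename "${r1%_1.fq}")
  printf 'hisat2 -p %s -x %s -1 %s -2 %s -S %s.sam\n' \
    "$threads" "$index" "$r1" "$r2" "$sample"
}

# Example (not executed here):
#   for r1 in *_1.fq; do pe_command genome_index "$r1" 8; done
```

Once the printed commands look right, the loop output can be piped to bash to run them.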

Reference:

HISAT2 manual
