1. Build the genome index with HISAT2:
First, use the Python scripts included in the HISAT2 package to extract splice-site and exon information from the gene annotation file:
$ extract_splice_sites.py genome.gtf > genome.ss   # splice-site information
$ extract_exons.py genome.gtf > genome.exon        # exon information
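To sanity-check the extracted splice-site file before building the index, the sketch below counts splice sites per chromosome. It assumes the tab-separated layout described in the HISAT2 manual (chromosome, left position, right position, strand); the function name `count_splice_sites` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count splice sites per chromosome in a HISAT2 .ss file.
# Assumes tab-separated columns: chrom, left, right, strand.
count_splice_sites() {
  local ss=$1
  # Tally column 1 (chromosome) for well-formed rows, print "chrom<TAB>count",
  # then sort by chromosome name.
  awk -F'\t' 'NF >= 4 { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$ss" |
    sort -k1,1
}
```

Usage: `count_splice_sites genome.ss`. A chromosome missing from the output usually means the GTF and the FASTA use different chromosome names.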
Second, build a HISAT2 index:
$ hisat2-build --ss genome.ss --exon genome.exon genome.fa genome
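hisat2-build writes the index as a set of files that share the basename given as its last argument (`genome` here). For a standard-size genome these are `genome.1.ht2` through `genome.8.ht2`; very large genomes get the `.ht2l` suffix instead. A minimal check, with a hypothetical function name, that all eight files exist and are non-empty before moving on to alignment:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if all eight HISAT2 index files exist and
# are non-empty for the given basename (.ht2 first, then large-index .ht2l).
hisat2_index_ok() {
  local base=$1 ext i
  for ext in ht2 ht2l; do
    for i in 1 2 3 4 5 6 7 8; do
      # Missing or empty file: give up on this suffix, try the next one
      [[ -s "$base.$i.$ext" ]] || continue 2
    done
    return 0
  done
  return 1
}
```

Usage: `hisat2_index_ok genome || echo "index genome is incomplete" >&2`.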
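Note: extract_splice_sites.py and extract_exons.py ship with the HISAT2 package. These two steps are optional; they only supply known splice sites and exons to the index. You can also build the index directly: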
$ hisat2-build genome.fa genome
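Both ways of building the index can be wrapped in one function. In this sketch the function names and the `DRY_RUN` switch are hypothetical, not part of HISAT2: passing a GTF as the third argument runs the two extraction scripts and adds `--ss`/`--exon`, while omitting it builds a plain index. With `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# run OUT CMD...: execute CMD, redirecting stdout to OUT unless OUT is "-".
# With DRY_RUN=1 (hypothetical switch) the command line is printed instead.
run() {
  local out=$1; shift
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    if [[ $out == - ]]; then echo "$*"; else echo "$* > $out"; fi
  elif [[ $out == - ]]; then
    "$@"
  else
    "$@" > "$out"
  fi
}

# build_index FASTA BASENAME [GTF]: hypothetical wrapper around hisat2-build
build_index() {
  local fasta=$1 base=$2 gtf=${3:-}
  if [[ -n $gtf ]]; then
    # Optional: known splice sites and exons taken from the annotation
    run "$base.ss" extract_splice_sites.py "$gtf"
    run "$base.exon" extract_exons.py "$gtf"
    run - hisat2-build --ss "$base.ss" --exon "$base.exon" "$fasta" "$base"
  else
    run - hisat2-build "$fasta" "$base"
  fi
}
```

For example, `DRY_RUN=1 build_index genome.fa genome genome.gtf` prints the two extraction commands followed by the `hisat2-build --ss ... --exon ...` call; drop `DRY_RUN=1` to actually run them.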
2. Align the reads to the genome with HISAT2:
$ hisat2 -p 8 --dta -x genome -1 file1_1.fastq.gz -2 file1_2.fastq.gz -S file1.sam
$ hisat2 -p 8 --dta -x genome -1 file2_1.fastq.gz -2 file2_2.fastq.gz -S file2.sam
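Notes:
--dta: report alignments tailored for transcript assemblers such as StringTie
-x: basename of the genome index
-S: output SAM file
-p: number of threads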
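With many samples, writing one command per sample gets error-prone, and a loop can generate them instead. A minimal sketch, assuming every sample's mates are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` in the current directory (that naming convention, the function name and the `DRY_RUN` switch are assumptions); it uses the same `-p`, `--dta`, `-x`, `-1`/`-2` and `-S` options explained above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: align every paired-end sample in the current directory.
# Assumes mates are named <sample>_1.fastq.gz / <sample>_2.fastq.gz.
align_all() {
  local index=$1 threads=${2:-8} r1 sample
  local -a cmd
  for r1 in *_1.fastq.gz; do
    [[ -e $r1 ]] || continue              # no match: the glob stays literal
    sample=${r1%_1.fastq.gz}
    if [[ ! -e ${sample}_2.fastq.gz ]]; then
      echo "skipping $sample: mate file missing" >&2
      continue
    fi
    cmd=(hisat2 -p "$threads" --dta -x "$index"
         -1 "$r1" -2 "${sample}_2.fastq.gz" -S "$sample.sam")
    # DRY_RUN=1 prints the command line instead of running it
    if [[ ${DRY_RUN:-0} == 1 ]]; then echo "${cmd[*]}"; else "${cmd[@]}"; fi
  done
}
```

Usage: `align_all genome 8` writes one `<sample>.sam` per sample; samples whose second mate is missing are reported on stderr and skipped.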
Other options:
Input:
  -q                 query input files are FASTQ .fq/.fastq (default)
  --qseq             query input files are in Illumina's qseq format
  -f                 query input files are (multi-)FASTA .fa/.mfa
  -r                 query input files are raw one-sequence-per-line
  -c                 <m1>, <m2>, <r> are sequences themselves, not files
  -s/--skip <int>    skip the first <int> reads/pairs in the input (none)
  -u/--upto <int>    stop after first <int> reads/pairs (no limit)
  -5/--trim5 <int>   trim <int> bases from 5'/left end of reads (0)
  -3/--trim3 <int>   trim <int> bases from 3'/right end of reads (0)
  --phred33          qualities are Phred+33 (default)
  --phred64          qualities are Phred+64
  --int-quals        qualities encoded as space-delimited integers
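Choosing between `--phred33` and `--phred64` requires knowing how the FASTQ qualities are encoded. A common heuristic, sketched below with a hypothetical function name, scans the quality lines of the first reads: a character below `;` (ASCII 59) can only occur in Phred+33, while a lowest character of `@` (ASCII 64) or higher suggests Phred+64. It is a guess, not a guarantee; most modern Illumina data are Phred+33.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the FASTQ quality offset from the first N reads.
# Prints phred33, phred64 or unknown. Accepts plain or gzipped FASTQ.
guess_phred() {
  local fq=$1 n=${2:-10000}
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf -- "$fq" | head -n $((4 * n)) | awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i; min = 127 }
    NR % 4 == 0 {                       # every 4th line is a quality string
      for (j = 1; j <= length($0); j++) {
        c = substr($0, j, 1)
        if ((c in ord) && ord[c] < min) min = ord[c]
      }
    }
    END {
      if (min < 59) print "phred33"
      else if (min >= 64 && min < 127) print "phred64"
      else print "unknown"
    }'
}
```

Usage: `guess_phred file1_1.fastq.gz`; pass `--phred64` to hisat2 only if this prints `phred64`.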
Alignment:
  -N <int>           max # mismatches in seed alignment; can be 0 or 1 (0)
  -L <int>           length of seed substrings; must be >3, <32 (22)
  -i <func>          interval between seed substrings w/r/t read len (S