下载GTF注释文件
对NCBI:
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.3/GFF/ref_GRCh37.p5_top_level.gff3.gz ## hg19
对于ensembl:
wget ftp://ftp.ensembl.org/pub/current_gtf/homo_sapiens/Homo_sapiens.GRCh38.90.gtf.gz## hg38
wget ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz## hg19
变换中间的release就可以拿到所有版本信息:ftp://ftp.ensembl.org/pub/
对于UCSC
需要选择一系列参数:
1. Navigate to http://genome.ucsc.edu/cgi-bin/hgTables
2. Select the following options:
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: Select "genome" for the entire genome.
output format: GTF - gene transfer format
output file: enter a file name to save your results to a file, or leave blank to display results in the browser
3. Click 'get output'.
下载参考基因组序列
UCSC里面下载:
wegt http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz ##hg19
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz ##hg38
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/chromFa.tar.gz ##hg38
tar -zxvf chromFa.tar.gz
cd chroms
cat *.fa >hg19.fa
##hg38同样处理
NCBI:建议用迅雷或ncbi官网提供的下载器下载速度更快
##hg19:
do wget ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.3/Assembled_chromosomes/seq/hs_ref_GRCh37.p5_chr${i}.fa.gz;
gzip -d hs_ref_GRCh37.p5_chr${i}.fa.gz;
cat hs_ref_GRCh37.p5_chr${i}.fa.gz >>hg19.fa;
sleep 10s;
done;
##hg38
:
同理,只要找到下载地址,改动一下即可。