Download links for common and notable human FASTA reference genomes

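FASTA is a text-based format for representing nucleotide or peptide sequences. Reference genomes in this format can usually be downloaded from the FTP sites of the major international genome databases. The file extension is typically ".fasta", ".fa", or ".fna", and some files are compressed. Assemblies older than NCBI36/hg18 are too outdated and are not recommended here. The links below are mainly for Homo sapiens.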
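A downloaded reference can be checked quickly by scanning it once. This shows the sequence names, their lengths, and whether the names carry the "chr" prefix, which is the main naming difference between the references below. The following is a minimal standard-library sketch. The function names are hypothetical, and the whole file is read, so a full genome takes some time. .zip and .7z downloads must be extracted first.

```python
import gzip
from typing import Dict, Iterable, TextIO


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text; .gz files (gzip or bgzip) are decompressed on the fly."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number, shared by bgzip output
        return gzip.open(path, "rt")
    return open(path, "rt")


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence name (first word after '>') to its length in bases."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


def uses_chr_prefix(names: Iterable[str]) -> bool:
    """True for UCSC-style names ('chr1'), False for Ensembl/1000 Genomes style ('1')."""
    return "chr1" in set(names)
```

For example, you could run `with open_fasta("hs37d5.fa.gz") as fh: lengths = sequence_lengths(fh)`. Given the description of hs37d5 below, `uses_chr_prefix(lengths)` should then return False.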


I. NCBI36 / hg18

1. human_b36

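Chromosome names in this reference have no "chr" prefix. The 1000 Genomes Project used it in the past. It includes the EBV type 1 sequence (NC_007605) but no ALT contigs (alternate loci). It is now deprecated and comes in a female and a male version.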

(1) human_b36_female

https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_female.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_female.fa.gz

(2) human_b36_male

https://ftp.ncbi.nlm.nih.gov/1000genomes/ftp/technical/retired_reference/human_b36_male.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/retired_reference/human_b36_male.fa.gz

2. Ensembl release 54

Homo_sapiens.NCBI36.54.dna.toplevel. Chromosome names in this reference have no "chr" prefix.

https://ftp.ensembl.org/pub/release-54/fasta/homo_sapiens/dna/Homo_sapiens.NCBI36.54.dna.toplevel.fa.gz


II. GRCh37 / hg19

1. human_g1k_v37 (alias: hs37-1kg)

Human g1k v37 is the base reference of the GRCh37 family. Its chromosome names have no "chr" prefix.

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz

2. Homo_sapiens_assembly19 (alias: hs37)

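A GRCh37-like reference used by the Broad Institute, sitting between human g1k v37 and hs37d5. Compared with human g1k v37, it adds the EBV sequence NC_007605. It does not include the concatenated decoy sequence of hs37d5.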

https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta

3. hs37d5

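This reference builds on human g1k v37 and adds two things. The first is the Broad Institute's concatenated decoy sequence named hs37d5. It is derived from HuRef, BAC or fosmid clones, and NA12878, and it can improve alignment accuracy. The second is the human herpesvirus 4 type 1 sequence (NC_007605). hs37d5 is also the reference currently used for Dante Labs whole-genome sequencing. Chromosome names have no "chr" prefix.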

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz

https://www.yfish.org/static/hs37d5.7z

4. hg19

(1) The reference currently used for YSEQ whole-genome sequencing. It uses the standard 16569 bp rCRS mitochondrial sequence:

https://genomes.yseq.net/WGS/ref/hg19/hg19.zip

(2) The original UCSC version. It uses the older 16571 bp Yoruba mitochondrial sequence and is not recommended for general use:

https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
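The two hg19 variants above differ mainly in their mitochondrial sequence. The length of the chrM/MT record is therefore a quick way to tell which model a file uses. The sketch below takes a name-to-length mapping, for example one collected by scanning the FASTA headers. The helper name and the list of mitochondrial names it tries are assumptions. Only the two lengths quoted above are recognized: 16569 for rCRS and 16571 for the older Yoruba sequence.

```python
from typing import Dict, Optional

# Lengths as given in the hg19 descriptions above.
MT_MODELS = {16569: "rCRS", 16571: "Yoruba (original UCSC hg19 chrM)"}
# Common spellings of the mitochondrial record name (assumed, not exhaustive).
MT_NAMES = ("chrM", "MT", "chrMT", "M")


def mt_model(lengths: Dict[str, int]) -> Optional[str]:
    """Return the mitochondrial model implied by the chrM/MT length, or None if no such record."""
    for name in MT_NAMES:
        if name in lengths:
            return MT_MODELS.get(lengths[name], f"unknown ({lengths[name]} bp)")
    return None
```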

5. Ensembl release 75

(1) Homo_sapiens.GRCh37.75.dna.primary_assembly: chromosome names have no "chr" prefix, and the sequence names (SN) are essentially the same as in human g1k v37.

https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

(2) Homo_sapiens.GRCh37.75.dna.toplevel: chromosome names have no "chr" prefix.

https://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.toplevel.fa.gz

6. build37_used_by_cg

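The chromosomes in this reference come from the 1000 Genomes Project and use UCSC-style names ("chr" + number). It contains only the sequences assembled into the main chromosomes and has no unlocalized or unplaced sequences. It is therefore not recommended for general use.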

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/cg_alignment_reference/build37_used_by_cg.fa.gz


III. GRCh38 / hg38

1. GCA_000001405.15_GRCh38_no_alt_analysis_set (alias: hs38)

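Chromosome names in this reference carry the "chr" prefix. Compared with GCA_000001405.15_GRCh38_full_analysis_set, it omits the ALT contigs (alternate loci), which can interfere with read mappers. Compared with the GRCh38 primary assembly, it adds the EBV sequence as a decoy. This makes it the better choice for general use. It is also the reference currently used for Nebula whole-genome sequencing.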

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

2. GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set (alias: hs38d1)

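Compared with GCA_000001405.15 GRCh38 no alt analysis set, this reference adds the hs38d1 decoy sequences submitted to NCBI by Harvard Medical School. These are contigs not included in the human reference assembly, derived from whole-genome shotgun sequencing of 254 public SGDP samples.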

(1) The version on the NCBI site:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna.gz

(2) Another source, which adds several more viral sequences compared with the NCBI version:

https://www.yfish.org/static/hs38d1.7z

3. GCA_000001405.15_GRCh38_full_analysis_set (alias: hs38a)

Compared with UCSC hg38, this reference adds the EBV sequence (chrEBV). Compared with GCA_000001405.15 GRCh38 no alt analysis set, it adds the ALT contigs.

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
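As described above, the full analysis set is the no-alt analysis set plus the ALT contigs. In the UCSC-style naming, ALT contig names end in "_alt" (for example chr22_KI270879v1_alt). Skipping those records should therefore give roughly the no-alt set. The streaming filter below is a sketch built on that naming assumption. It has not been checked base-for-base against the official no_alt file, which remains the safer download.

```python
from typing import Iterable, Iterator


def drop_alt_contigs(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines unchanged, but skip every record whose name ends in '_alt'."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = not name.endswith("_alt")
        if keep:
            yield line
```

Usage: open the input and output as text files `src` and `out`, then call `out.writelines(drop_alt_contigs(src))`. Because it works line by line, the filter never holds a whole chromosome in memory.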

4. GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set

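Compared with GCA_000001405.15 GRCh38 full analysis set, this reference adds the hs38d1 decoy sequences. Compared with GCA_000001405.15 GRCh38 no alt plus hs38d1 analysis set, it adds the ALT contigs. At the Broad Institute it is also called Homo_sapiens_assembly38.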

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_full_plus_hs38d1_analysis_set.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.fna.gz

5. GRCh38_full_analysis_set_plus_decoy_hla (aliases: hs38DH, GRCh38DH)

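Compared with GCA_000001405.15 GRCh38 full plus hs38d1 analysis set, this reference adds a large number of HLA allele sequences. Compared with GCA_000001405.15 GRCh38 no alt analysis set, it adds the ALT contigs, the hs38d1 decoy sequences, and the HLA sequences. It is also used as the reference for ancient DNA (aDNA) CRAM data.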

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
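The GRCh38 analysis sets above differ mainly in which kinds of extra sequences they carry: ALT contigs, decoys, HLA alleles, and EBV. In their UCSC-style naming, these kinds can be told apart by name alone. The sketch below counts the sequences of each kind using the patterns seen in this family: the suffixes _alt, _random, and _decoy, the prefixes chrUn_ and HLA-, and chrEBV. These patterns are an assumption drawn from that convention, not an official specification. Ensembl-style or hs37d5-style names will mostly fall into "other".

```python
from collections import Counter
from typing import Iterable

PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def contig_kind(name: str) -> str:
    """Classify a UCSC-style GRCh38 analysis-set sequence name by its naming pattern."""
    if name in PRIMARY:
        return "primary"
    if name == "chrEBV":
        return "ebv"
    if name.startswith("HLA-"):
        return "hla"
    if name.endswith("_decoy"):  # checked before chrUn_, since decoys also start with chrUn_
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    return "other"


def summarize(names: Iterable[str]) -> Counter:
    """Count how many sequences of each kind a reference contains."""
    return Counter(contig_kind(n) for n in names)
```

If the names in a downloaded file are fed to `summarize`, a reference described as "no alt" should report no "alt" entries, and one described as hs38DH should report alt, decoy, and hla entries.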

6. hg38

https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz

https://genomes.yseq.net/WGS/ref/hg38/hg38.fa

7. Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC

This reference builds on GCA_000001405.15_GRCh38_no_alt_analysis_set and adds the ERCC (External RNA Controls Consortium) spike-in control sequences.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGDP_transcriptome/working/HGDP_transcriptome_GRCh38/reference/Homo_sapiens_assembly38_noALT_noHLA_noDecoy_ERCC.fasta

8. Homo_sapiens_assembly38 (aliases: hs38DH, GRCh38DH)

An hg38-like reference used by the Broad Institute. Its bases are essentially identical to GRCh38 full analysis set plus decoy hla.

https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta

9. Ensembl release 106

(1) Homo_sapiens.GRCh38.dna.primary_assembly: the sequence names (SN) do not include the EBV sequence, and the chromosome names have no "chr" prefix. Otherwise it is broadly consistent with GCA_000001405.15 GRCh38 no alt analysis set.

https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

https://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

(2) Homo_sapiens.GRCh38.dna.toplevel: chromosome names have no "chr" prefix.

https://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

10. hs38s

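This reference builds on GCA_000001405.15 GRCh38 no alt plus hs38d1 analysis set and adds the 22_KI270879v1_alt sequence, which contains the GSTT1 gene. The other sequence names are those of the hs38d1 reference with the "chr" prefix removed, and the mitochondrion is named MT. Sequencing.com uses this reference genome.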

https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgR0QualUlHx53-0U/root/content
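The hs38s naming scheme keeps the hs38d1 sequence names but drops the "chr" prefix and calls the mitochondrion MT. The sketch below applies that scheme to the header lines of a FASTA stream. Two behaviours are design choices rather than anything stated in the source: names without "chr" are left alone, and any description after the first space is kept unchanged.

```python
from typing import Iterable, Iterator


def to_hs38s_name(name: str) -> str:
    """chrM -> MT; chr<anything> -> <anything>; other names are returned unchanged."""
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name


def rename_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each header's sequence name rewritten by to_hs38s_name."""
    for line in lines:
        if line.startswith(">"):
            body = line[1:].rstrip("\r\n")
            name, sep, rest = body.partition(" ")  # keep any description after the name
            yield ">" + to_hs38s_name(name) + sep + rest + "\n"
        else:
            yield line
```

Sequence lines are passed through untouched, so the bases and line widths stay identical to the input.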


 
IV. T2T-CHM13

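Unlike the other references, the Telomere-to-Telomere (T2T) consortium has sequenced the genome completely from telomere to telomere, filling the gaps left by earlier assemblies. T2T is still experimental, however, and may contain problems such as single-base errors; consult the primary literature for details. In April 2022 the paper describing the complete T2T-CHM13 human reference was published in Science.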

1. CHM13_v1.1

The CHM13 T2T v1.1 reference with chromosomes named "chr" + number, plus the mitochondrion. It has no Y chromosome.

https://processing.open-genomes.org/reference/CP086569.1-CHM13/CHM13_v1.1.fa

https://processing.open-genomes.org/reference/CM034974.1-CHM13/CHM13_v1.1.fa

https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa

2. CP086569.1-CHM13_v1.1

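This reference builds on CHM13 T2T v1.1 and adds the Y chromosome of the Ashkenazi Jewish sample NA24385 as the Y reference. The default paternal haplogroup is therefore J1-ZS2712. The autosomes, the X chromosome, and the mitochondrion carry the "chr" prefix, and the Y chromosome is named CP086569.1.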

https://processing.open-genomes.org/reference/CP086569.2-CHM13/CHM13_v1.1.fa

3. T2T-v2.0

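The complete official CHM13 T2T v2.0 reference genome. Its Y chromosome is the second version of the NA24385 sequence (CP086569.2), and the chromosome and mitochondrion names carry the "chr" prefix.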

https://processing.open-genomes.org/reference/CP086569.2-CHM13/T2T-v2.0.fa

4. CM034974.1-CHM13_v1.1

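This unofficial reference builds on CHM13 T2T v1.1 and adds the Y chromosome of sample HG01243 as the Y reference. The default paternal haplogroup is therefore R1b-DF27. The autosomes, the X chromosome, and the mitochondrion carry the "chr" prefix. The Y chromosome is named CM034974.1 and is closer to GRCh38.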

https://processing.open-genomes.org/reference/CM034974.1-CHM13/CM034974.1-CHM13_v1.1.fa

5. T2T-CHM13v2.0 (Genome Informatics Section version)

(1) CHM13v2.0

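The T2T-CHM13v2.0 reference itself. Repeats on chromosomes X and Y are soft-masked, and the sequence names have been converted to UCSC style ("chr" + number).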

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0.fa.gz

(2) CHM13v2.0_noY

This reference has no Y chromosome; it corresponds to T2T-CHM13v1.1.

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_noY.fa.gz

(3) CHM13v2.0_maskedY

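In this reference the pseudoautosomal regions (PARs) of the Y chromosome, i.e. the regions homologous to the X chromosome, are hard-masked with long runs of the letter "N".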

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY.fa.gz
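Hard-masking, as in the maskedY files, means overwriting the bases of a region with N while keeping the sequence length. Coordinates therefore stay valid, and reads from the X/Y homologous region can only align to chrX. The sketch below masks 0-based, half-open intervals on an in-memory sequence. The intervals in the usage are placeholders, not the real CHM13 PAR boundaries; those should be taken from the T2T release notes.

```python
from typing import Iterable, Tuple


def hard_mask(seq: str, intervals: Iterable[Tuple[int, int]]) -> str:
    """Replace the bases in each 0-based, half-open [start, end) interval with 'N'.

    Intervals are clipped to the sequence, so the output always has the input's length.
    """
    buf = bytearray(seq, "ascii")  # bytearray keeps memory modest for chromosome-sized strings
    for start, end in intervals:
        start, end = max(0, start), min(len(buf), end)
        if start < end:
            buf[start:end] = b"N" * (end - start)
    return buf.decode("ascii")
```

For example, `hard_mask(chr_y, [(0, par1_end), (par2_start, len(chr_y))])` masks two regions, where `par1_end` and `par2_start` are hypothetical placeholders for the boundaries you would look up.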

(4) CHM13v2.0_maskedY_rCRS

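In this reference the PARs of the Y chromosome are hard-masked with runs of "N". The mitochondrion is also replaced by the rCRS mitochondrial sequence NC_012920.1/J01415.2, the same rCRS used in GRCh37/GRCh38/hg38.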

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_maskedY_rCRS.fa.gz

6. hs1

A version of T2T-CHM13v2.0 with chromosomes and the mitochondrion named "chr" + number, which makes the names easier to read.

https://hgdownload.cse.ucsc.edu/goldenPath/hs1/bigZips/hs1.fa.gz


 

V. Decoy sequences

1. hs37d5cs

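The concatenated decoy sequence of hs37d5, derived from HuRef, BAC or fosmid clones, and NA12878. It is stored as a single sequence named "hs37d5" (SN), with the individual sequences joined by runs of N of varying length. This decoy sequence is included in the hs37d5 reference.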

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/ (directory index)

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5cs.fa.gz
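In hs37d5cs, all decoy pieces sit in one record separated by runs of N. The individual pieces can therefore be recovered approximately by splitting on those runs. In the sketch below, the minimum run length that counts as a separator (`min_gap`, default 100) is an assumption, because the exact spacer length is not stated here. A piece that itself contains a run of N at least that long will also be split.

```python
import re
from typing import List


def split_on_n_runs(seq: str, min_gap: int = 100) -> List[str]:
    """Split a concatenated sequence at runs of at least min_gap N/n characters."""
    if min_gap < 1:
        raise ValueError("min_gap must be >= 1")
    pieces = re.split(f"[Nn]{{{min_gap},}}", seq)  # e.g. pattern "[Nn]{100,}"
    return [p for p in pieces if p]  # drop empty strings from leading/trailing N runs
```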

2. hs37d5ss

The non-concatenated decoy sequences of hs37d5. Each sequence is kept as a separate record, as in the hs38d1 decoy.

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5ss.fa.gz

3. GCA_000786075.2_hs38d1_genomic

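The several thousand separate, non-concatenated hs38d1 decoy sequences. These are contigs not included in the human reference assembly, derived from whole-genome shotgun sequencing of 254 public SGDP samples. Their names have neither the "chr" prefix nor the "decoy" suffix.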

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/786/075/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_genomic.fna.gz

4. GRCh38_full_analysis_set_plus_decoy_hla-extra (alias: hs38DH-extra)

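This set adds HLA-related sequences to the hs38d1 decoy; they play a role similar to ALT contigs (alternate loci). Here the decoy names carry the "chr" prefix and the "decoy" suffix. This sequence set is included in the hs38DH reference.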

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla-extra.fa

5. EBVt1 (aliases: NC_007605, HHV-4)

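NC_007605.1, the human herpesvirus 4 type 1 (Epstein-Barr virus) sequence. It is not part of the human genome. Including it gives EBV-derived reads their own target, so they are not forced onto human sequence. This can improve the accuracy of whole-genome results, especially for saliva samples.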

https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz

https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/EBVt1.fa.gz


VI. Other

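The references below are less commonly used, so they are grouped together. Two examples show how their sequence names differ. In GCA_000001405.28_GRCh38.p13 genomic, chromosome 2 is named by its GenBank accession "CM000664.2" rather than the usual "2" or "chr2". In GCF_000001405.39 GRCh38.p13 genomic, chromosome 2 is named by its RefSeq accession "NC_000002.12", and "NT_187361.1" stands for "chr1_KI270706v1_random". The group also includes reference genomes outside the NCBI GRC and UCSC families, such as "CHM13" and "NA12878_prelim". Only some of the links are listed below: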

1. hg38_CP086569

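This hybrid reference uses the hg38 sequences for the autosomes (chromosomes 1-22), the X chromosome, and the mitochondrion. For the Y chromosome it uses the T2T CP086569.1 sequence. It contains none of the hg38 fragments that are not placed on the main chromosomes.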

https://ybrowse.org/gbrowse2/gff/CP086569.1/hg38_CP086569.fasta

2. NeandertalizedReference

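A "Neanderthalized" Homo sapiens reference genome. Its non-mitochondrial sequences have the same lengths as in hs37d5, but the reference bases have been changed to match the archaic Neanderthal genome. Its mitochondrial sequence has a different length (17569), and the Enterobacteria phage phiX sequence has been added.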

https://cdna.eva.mpg.de/neandertal/Hohlenstein-Stadel/NeandertalizedReference.fa

3. HG01243_v3

https://api.onedrive.com/v1.0/shares/s!AgorjTSMFYpjgT_cFUVNNMz6QoTX/root/content

4. NCBI assemblies: GCF (RefSeq)

(1) GCF_000001405.25_GRCh37.p13_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz

(2) GCF_000001405.40_GRCh38.p14_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.fna.gz

(3) GCF_009914755.1_T2T-CHM13v2.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/009/914/755/GCF_009914755.1_T2T-CHM13v2.0/GCF_009914755.1_T2T-CHM13v2.0_genomic.fna.gz

(4) CHM1_1.1

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/306/695/GCF_000306695.2_CHM1_1.1/GCF_000306695.2_CHM1_1.1_genomic.fna.gz

(5) HuRef

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/125/GCF_000002125.1_HuRef/GCF_000002125.1_HuRef_genomic.fna.gz

5. NCBI assemblies: GCA (GenBank)

(1) T2T_CHM13_v2.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.4_T2T-CHM13v2.0/GCA_009914755.4_T2T-CHM13v2.0_genomic.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.4_CHM13_T2T_v2.0/GCA_009914755.4_CHM13_T2T_v2.0_genomic.fna.gz

(2) T2T_CHM13_v1.1

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.3_T2T-CHM13v1.1/GCA_009914755.3_T2T-CHM13v1.1_genomic.fna.gz

(3) T2T_CHM13_v1.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/914/755/GCA_009914755.2_T2T-CHM13v1.0/GCA_009914755.2_T2T-CHM13v1.0_genomic.fna.gz

(4) GCA_000001405.29_GRCh38.p14_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.29_GRCh38.p14/GCA_000001405.29_GRCh38.p14_genomic.fna.gz

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.28_GRCh38.p13/GCA_000001405.28_GRCh38.p13_genomic.fna.gz

(5) GCA_000001405.15_GRCh38_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/GCA_000001405.15_GRCh38_genomic.fna.gz

(6) CHM1_1.1

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/306/695/GCA_000306695.2_CHM1_1.1/GCA_000306695.2_CHM1_1.1_genomic.fna.gz

(7) HuRef

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/125/GCA_000002125.2_HuRef/GCA_000002125.2_HuRef_genomic.fna.gz

(8) NA12878_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/077/035/GCA_002077035.3_NA12878_prelim_3.0/GCA_002077035.3_NA12878_prelim_3.0_genomic.fna.gz

(9) NA19240_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/524/155/GCA_001524155.4_NA19240_prelim_3.0/GCA_001524155.4_NA19240_prelim_3.0_genomic.fna.gz

(10) HG00514_prelim_3.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/180/035/GCA_002180035.3_HG00514_prelim_3.0/GCA_002180035.3_HG00514_prelim_3.0_genomic.fna.gz

(11) HG00733_prelim_1.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/208/065/GCA_002208065.1_HG00733_prelim_1.0/GCA_002208065.1_HG00733_prelim_1.0_genomic.fna.gz

(12) YH_2.0 (Yanhuang)

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/004/845/GCA_000004845.2_YH_2.0/GCA_000004845.2_YH_2.0_genomic.fna.gz

(13) KOREF1.0

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/712/695/GCA_001712695.1_KOREF1.0/GCA_001712695.1_KOREF1.0_genomic.fna.gz

(14) GCA_018873775.2_hg01243.v3.0_genomic

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/018/873/775/GCA_018873775.2_hg01243.v3.0/GCA_018873775.2_hg01243.v3.0_genomic.fna.gz

…………
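The accession-style names mentioned at the start of this section, such as NC_000002.12 for chromosome 2, can be mapped back to UCSC-style names for the primary chromosomes. RefSeq numbers those chromosomes NC_000001 to NC_000024, with 23 = X and 24 = Y, and uses NC_012920 for the rCRS mitochondrion. The sketch below relies only on that numbering. Every other name is returned as None: NT_/NW_ scaffolds such as NT_187361.1 and GenBank CM accessions need the assembly report NCBI publishes alongside each assembly.

```python
import re
from typing import Optional

_NC_CHROM = re.compile(r"^NC_0000(\d{2})\.\d+$")  # NC_000001.x .. NC_000024.x


def refseq_to_ucsc(accession: str) -> Optional[str]:
    """Map a RefSeq primary-chromosome accession to a UCSC-style name, else None."""
    if accession.split(".")[0] == "NC_012920":
        return "chrM"
    m = _NC_CHROM.match(accession)
    if not m:
        return None
    number = int(m.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    return {23: "chrX", 24: "chrY"}.get(number)
```

The version suffix (.12, .11, and so on) differs between assemblies, so it is ignored here. Check it separately if you need to confirm which build a file belongs to.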


【Notes】

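1. The Ensembl references listed here are only the final releases for NCBI36 and GRCh37 and the latest GRCh38 release at the time of writing. To download other Ensembl releases, browse this directory: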

https://ftp.ensembl.org/pub/

2. The following references have also served as the main references for 1000 Genomes Project WGS data. The last three are recommended for general use:

human_b36 (retired)

human_g1k_v37

hs37d5

hs38 (GCA_000001405.15_GRCh38_no_alt_analysis_set; the official EBI link has been removed)

hs38DH (GRCh38_full_analysis_set_plus_decoy_hla)

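3. A FASTA reference can be used either uncompressed or as a bgzip-compressed .gz file. A file compressed with ordinary gzip must be decompressed and recompressed with bgzip before tools such as samtools faidx can index it.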
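You can check from the header whether a downloaded .gz file is bgzip (BGZF) or plain gzip. A BGZF block is a gzip member with an extra field containing a subfield whose identifier is "BC". The sketch below inspects only the first block. If it returns False for a .gz reference, recompress the file with bgzip before indexing it.

```python
import struct


def is_bgzf(path: str) -> bool:
    """Return True if the file starts with a BGZF block (gzip header + 'BC' extra subfield)."""
    with open(path, "rb") as fh:
        header = fh.read(12)
        if len(header) < 12:
            return False
        # gzip magic 1f 8b, deflate (CM=8), and the FEXTRA flag bit must be set
        if header[0] != 0x1F or header[1] != 0x8B or header[2] != 8 or not header[3] & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = fh.read(xlen)
        if len(extra) < xlen:
            return False
    # Walk the extra subfields: SI1, SI2, little-endian SLEN, then SLEN bytes of data
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        (slen,) = struct.unpack("<H", extra[pos + 2:pos + 4])
        if si1 == ord("B") and si2 == ord("C") and slen == 2:
            return True
        pos += 4 + slen
    return False
```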

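4. Less common FASTA references can also be found by browsing down the directory tree below by GCA or GCF accession. To find the accession, enter Homo Sapiens as the Source Organism in the NCBI Genome Remapping Service: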

https://ftp.ncbi.nlm.nih.gov/genomes/all/

https://www.ncbi.nlm.nih.gov/genome/tools/remap
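The links above show how the directory tree under genomes/all/ follows the accession. The nine digits of an accession such as GCA_000001405.15 are split into three-digit folders (GCA/000/001/405/). The assembly folder and the file are both named accession_assemblyname. The sketch below rebuilds the genomic FASTA URL from that pattern. The pattern is inferred from the links in this article rather than from NCBI documentation, and the assembly name must be spelled exactly as NCBI spells it.

```python
import re


def ncbi_genomic_fasta_url(accession: str, assembly_name: str) -> str:
    """Build the genomes/all URL of the *_genomic.fna.gz file for a GCA_/GCF_ accession."""
    m = re.fullmatch(r"(GC[AF])_(\d{3})(\d{3})(\d{3})\.\d+", accession)
    if not m:
        raise ValueError(f"not a GCA_/GCF_ accession: {accession!r}")
    prefix, a, b, c = m.groups()
    folder = f"{accession}_{assembly_name}"
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{a}/{b}/{c}/{folder}/{folder}_genomic.fna.gz"
    )
```

For example, `ncbi_genomic_fasta_url("GCA_000001405.15", "GRCh38")` reproduces the GCA_000001405.15_GRCh38_genomic link listed in section VI.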

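5. The complete hg18 reference has been removed from the official site. Only the per-chromosome files remain, at the link below, and they can be concatenated manually into a complete reference: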

http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes/
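Concatenating the per-chromosome hg18 files amounts to writing them one after another in a consistent order. The sketch below assumes the files have already been downloaded into one directory and are named chr*.fa.gz; check the actual file names in that directory. The order is chr1 to chr22, then chrX, chrY, chrM, then everything else alphabetically. That order is a convention chosen here, not a requirement.

```python
import gzip
import re
from pathlib import Path
from typing import List, Tuple

_SPECIAL = {"chrX": 23, "chrY": 24, "chrM": 25}


def karyotype_key(path: Path) -> Tuple[int, str]:
    """Sort key: chr1..chr22, chrX, chrY, chrM, then any other files by name."""
    name = path.name.split(".")[0]  # 'chr1.fa.gz' -> 'chr1'
    m = re.fullmatch(r"chr(\d+)", name)
    if m:
        return (int(m.group(1)), name)
    return (_SPECIAL.get(name, 100), name)


def concatenate_chromosomes(directory: str, output: str) -> List[str]:
    """Decompress every chr*.fa.gz in `directory` into one FASTA file; return the order used."""
    files = sorted(Path(directory).glob("chr*.fa.gz"), key=karyotype_key)
    with open(output, "wb") as out:
        for f in files:
            last = b""
            with gzip.open(f, "rb") as src:
                while chunk := src.read(1 << 20):  # stream 1 MiB at a time
                    out.write(chunk)
                    last = chunk[-1:]
            if last and last != b"\n":
                out.write(b"\n")  # keep the next '>' header on its own line
    return [f.name for f in files]
```

The output is uncompressed. Compress it with bgzip afterwards if you want a .gz reference that can still be indexed.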

6. The Illumina website also hosts some reference genome files. Locate Homo sapiens and download what you need:

iGenomes: https://support.illumina.com.cn/sequencing/sequencing_software/igenome.html

【Sources and further reading】

Which human reference genome to use?

Introduction to and differences between hg19, GRCh37, b37, and hs37d5 | Zhongxu's website

Accurity - Welcome -

WGS Extract Version 3 Beta | wgsextract.github.io

NCBI Genome Remapping Service

概普生信: T2T genome sequencing, a cutting-edge technology

T2T Experiment

The decoy genome

The role of adding PhiX in sequencing runs

It's finally finished! - Genome Informatics Section
