13 November 2017
TL;DR: If you map reads to GRCh37 or hg19, use hs37-1kg:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
If you map reads to GRCh37 and believe decoy sequences improve variant calling, use hs37d5:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
If you map reads to GRCh38 or hg38, use the following:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
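The three recommendations above amount to a small lookup table. The sketch below encodes them as a Python function, for example to choose a reference in a pipeline config. The function name `recommended_reference` and its `use_decoy` flag are hypothetical. Only the URLs come from this post. The post states the decoy rule for GRCh37 only, so applying it to hg19 as well is an assumption.

```python
# Minimal sketch of the TL;DR decision rule; the function and flag names are hypothetical.
HS37_1KG = "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz"
HS37D5 = ("ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/"
          "phase2_reference_assembly_sequence/hs37d5.fa.gz")
GRCH38_NO_ALT = ("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/"
                 "GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/"
                 "GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz")


def recommended_reference(assembly: str, use_decoy: bool = False) -> str:
    """Return the recommended reference FASTA URL for a target assembly name."""
    name = assembly.strip().lower()
    if name in ("grch37", "hg19"):
        # Assumption: hg19 is treated like GRCh37 for the decoy choice.
        return HS37D5 if use_decoy else HS37_1KG
    if name in ("grch38", "hg38"):
        # The post gives a single recommendation for GRCh38, so use_decoy is ignored here.
        return GRCH38_NO_ALT
    raise ValueError(f"no recommendation for assembly {assembly!r}")


if __name__ == "__main__":
    print(recommended_reference("hg38"))
```

Keeping the URLs as module-level constants keeps the mapping in one place. A caller that needs a local path could then download and cache the file separately.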
There are several other versions of GRCh37/GRCh38. What's wrong with them? Here is a collection of potential issues:
- Inclusion of AL