hg19
ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz
Convert GRCh37 (b37) coordinates to hg19 with the Broad chain file:
ftp://gsapubftp-anonymous@ftp.broadinstitute.org/Liftover_Chain_Files/b37tohg19.chain
gatk --java-options "-Djava.io.tmpdir=./ -Xmx60G" LiftoverVcf -I query.vcf.gz -O query.hg19.vcf -R ucsc.hg19.fasta --REJECT unmapped.vcf -C b37tohg19.chain
bgzip -c query.hg19.vcf > query.hg19.vcf.gz
tabix -p vcf query.hg19.vcf.gz
UCSC liftOver chain files: http://hgdownload.cse.ucsc.edu/gbdb/hg19/liftOver/
GATK resource files (mirror): http://bioinfo5pilm46.mit.edu/software/GATK/resources/
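A quick sanity check after liftover is to compare how many records reached query.hg19.vcf.gz with how many LiftoverVcf wrote to unmapped.vcf. The Python sketch below assumes that each input record ends up in exactly one of the two files. The function names are illustrative and are not part of GATK.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (non-header, non-empty) in a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def liftover_rate(lifted_path, rejected_path):
    """Fraction of records that were lifted over (assumes lifted + rejected = input)."""
    lifted = count_vcf_records(lifted_path)
    rejected = count_vcf_records(rejected_path)
    total = lifted + rejected
    return lifted / total if total else 0.0
```

For example, `liftover_rate("query.hg19.vcf.gz", "unmapped.vcf")` returns the lifted fraction. Python's `gzip` module can read bgzip output because BGZF is a series of ordinary gzip members.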
gnomAD (v2.1.1)
axel -n20 http://www.openbioinformatics.org/annovar/download/hg19_gnomad211_exome.txt.gz
axel -n20 http://www.openbioinformatics.org/annovar/download/hg19_gnomad211_genome.txt.gz
ANNOVAR writes identical column headers for gnomad211_exome and gnomad211_genome, for example:
AF AF_popmax AF_male AF_female AF_raw AF_afr AF_sas AF_amr AF_eas AF_nfe AF_fin AF_asj AF_oth non_topmed_AF_popmax non_neuro_AF_popmax non_cancer_AF_popmax controls_AF_popmax
To keep the two sets of columns apart when both tables are annotated in the same run, rename the columns in the gnomad211_exome table header (e.g. AF_male to exome_AF_male)
and in the gnomad211_genome table header (e.g. AF_male to genome_AF_male).
Then rebuild the index file with index_annovar.pl (downloaded from ANNOVAR).
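The header rename can be scripted. This sketch assumes the ANNOVAR table is tab-delimited and that its first five columns are the coordinate columns #Chr, Start, End, Ref and Alt. Those keep their names, and only the annotation columns get the prefix. The helper names are hypothetical.

```python
import gzip
import shutil

# Assumption: ANNOVAR filter-based tables start with #Chr, Start, End, Ref, Alt.
COORD_COLUMNS = 5


def prefix_header(header_line, prefix):
    """Prefix every annotation column name, leaving the coordinate columns unchanged."""
    fields = header_line.rstrip("\n").split("\t")
    renamed = fields[:COORD_COLUMNS] + [prefix + name for name in fields[COORD_COLUMNS:]]
    return "\t".join(renamed) + "\n"


def rename_table_header(src, dst, prefix):
    """Copy an ANNOVAR table (plain or .gz) to an uncompressed file, rewriting only the header line."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.write(prefix_header(fin.readline(), prefix))
        shutil.copyfileobj(fin, fout)  # data lines are copied unchanged
```

For example, `rename_table_header("hg19_gnomad211_exome.txt.gz", "hg19_gnomad211_exome.txt", "exome_")`. After that, the .idx file has to be regenerated, because the byte offsets of the data lines shift when the header gets longer.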
COSMIC (v88)
Mutations found in at least 50 samples in the COSMIC database are treated as hotspots [1].
The sample count is stored in the CNT INFO field of CosmicCodingMuts.vcf (downloaded from COSMIC):
##INFO=<ID=CNT,Number=1,Type=Integer,Description="How many samples have this mutation">
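To pull hotspots out of CosmicCodingMuts.vcf, you only need to read CNT from the INFO column. The following is a minimal Python sketch that assumes a standard VCF with at least 8 columns and applies the "at least 50 samples" threshold from the hotspot definition. The function names are hypothetical.

```python
import gzip

HOTSPOT_MIN_SAMPLES = 50  # "at least 50 samples" from the hotspot definition


def parse_info(info_field):
    """Turn a VCF INFO string like 'GENE=KRAS;CNT=3;SNP' into a dict (flags map to True)."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def cosmic_hotspots(vcf_path, min_samples=HOTSPOT_MIN_SAMPLES):
    """Yield (chrom, pos, ref, alt, cnt) for COSMIC records with CNT >= min_samples."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
            cnt = int(parse_info(info).get("CNT", 0))
            if cnt >= min_samples:
                yield chrom, int(pos), ref, alt, cnt
```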
ClinVar
axel -n20 http://www.openbioinformatics.org/annovar/download/hg19_clinvar_20190305.txt.gz
axel -n20 http://www.openbioinformatics.org/annovar/download/hg19_clinvar_20190305.txt.idx.gz
Common SNPs
A common SNP is one for which at least one 1000 Genomes population has a minor allele frequency (MAF) >= 1%, with 2 or more founders contributing to that minor allele frequency.
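The definition becomes a two-part check applied to each population. `PopulationFreq` is a hypothetical record type used only for illustration. Real values would come from the 1000 Genomes or dbSNP annotation files.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PopulationFreq:
    """Minor-allele statistics for one 1000 Genomes population (hypothetical record type)."""
    population: str
    minor_allele_freq: float
    founders_with_minor_allele: int


def is_common_snp(pops: Iterable[PopulationFreq], min_maf=0.01, min_founders=2) -> bool:
    """Common if any population has MAF >= 1% with >= 2 founders carrying the minor allele."""
    return any(
        p.minor_allele_freq >= min_maf and p.founders_with_minor_allele >= min_founders
        for p in pops
    )
```

Both conditions must hold within the same population. A 2% MAF carried by a single founder does not count.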
DNA fusion
FACTERA: https://factera.stanford.edu/download.php
GeneFuse: https://github.com/OpenGene/GeneFuse
2. Filtering principles
- Discard synonymous SNVs and intronic, intergenic, and UTR variants.
- Retain sites classified as “Pathogenic”, “Likely_pathogenic”, or “drug_response” in ClinVar and InterVar_automated [4].
- Retain COSMIC hotspots (CNT >= 50).
- Filter out common SNPs found in 1000 Genomes, ExAC, ESP, and gnomAD.
- Annotate non-synonymous variants with 'SIFT_pred', 'Polyphen2_HDIV_pred', 'CADD_phred', 'FATHMM_pred', 'MutationAssessor_pred'.
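The rules above can be combined into a single per-variant decision on rows of a table_annovar.pl output, where each row is read as a dict. This is a sketch that rests on several assumptions:

- Column names are Func.refGene, ExonicFunc.refGene, CLNSIG and InterVar_automated.
- COSMIC_CNT is a hypothetical column joined in from COSMIC.
- The population columns are 1000g2015aug_all, ExAC_ALL, esp6500siv2_all, and the renamed exome_AF/genome_AF.
- A whitelisted site is kept even if a later rule would drop it.
- An alternate-allele AF >= 1% counts as common, which simplifies the MAF definition.

```python
import re

# Assumed column names; adjust to the databases actually passed to table_annovar.pl.
DISCARD_FUNC = {"intronic", "intergenic", "UTR5", "UTR3"}
KEEP_SIGNIFICANCE = {"Pathogenic", "Likely_pathogenic", "drug_response"}
POPULATION_AF_COLUMNS = ["1000g2015aug_all", "ExAC_ALL", "esp6500siv2_all", "exome_AF", "genome_AF"]
PREDICTION_COLUMNS = ["SIFT_pred", "Polyphen2_HDIV_pred", "CADD_phred", "FATHMM_pred", "MutationAssessor_pred"]
COMMON_AF = 0.01
HOTSPOT_MIN_SAMPLES = 50


def _terms(value):
    """Split multi-valued fields such as 'Pathogenic/Likely_pathogenic' or 'UTR5;UTR3'."""
    parts = re.split(r"[/,;|]", value or "")
    return {p.strip().strip("_").replace(" ", "_") for p in parts if p.strip()}


def _to_float(value):
    """ANNOVAR writes '.' for missing values; treat those (and None) as absent."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_whitelisted(row):
    """Pathogenic, likely pathogenic or drug response in ClinVar/InterVar, or a COSMIC hotspot."""
    significance = _terms(row.get("CLNSIG")) | _terms(row.get("InterVar_automated"))
    cnt = _to_float(row.get("COSMIC_CNT"))
    return bool(significance & KEEP_SIGNIFICANCE) or (cnt is not None and cnt >= HOTSPOT_MIN_SAMPLES)


def is_discarded_region(row):
    """Synonymous SNVs, or variants whose every Func.refGene term is intronic/intergenic/UTR."""
    if row.get("ExonicFunc.refGene") == "synonymous SNV":
        return True
    funcs = _terms(row.get("Func.refGene"))
    return bool(funcs) and funcs <= DISCARD_FUNC


def is_common(row):
    """True if any population allele frequency reaches COMMON_AF."""
    freqs = (_to_float(row.get(col)) for col in POPULATION_AF_COLUMNS)
    return any(f is not None and f >= COMMON_AF for f in freqs)


def keep_variant(row):
    # Assumption: the ClinVar/InterVar/COSMIC whitelist overrides the other filters.
    if is_whitelisted(row):
        return True
    return not is_discarded_region(row) and not is_common(row)


def predictions(row):
    """Collect the in-silico prediction columns for a retained variant ('.' if absent)."""
    return {col: row.get(col, ".") for col in PREDICTION_COLUMNS}
```

Whether the whitelist should override the region and frequency filters is a design choice. Making it override errs toward keeping known pathogenic or hotspot sites.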
References
1. Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data[J]. Computational and Structural Biotechnology Journal, 2018, 16: 15-24.
2. Sallevelt S C E H, De Koning B, Szklarczyk R, et al. A comprehensive strategy for exome-based preconception carrier screening[J]. Genetics in Medicine, 2017, 19(5): 583.
3. Strom S P. Current practices and guidelines for clinical next-generation sequencing oncology testing[J]. Cancer Biology & Medicine, 2016, 13(1): 3.
4. Sukhai M A, Misyura M, Thomas M, et al. Somatic Tumor Variant Filtration Strategies to Optimize Tumor-Only Molecular Profiling Using Targeted Next-Generation Sequencing Panels[J]. The Journal of Molecular Diagnostics, 2019, 21(2): 261-273.
5. Mandelker D, Donoghue M T A, Talukdar S, et al. Germline-Focused Analysis of Tumour-Only Sequencing: Recommendations from the ESMO Precision Medicine Working Group[J]. Annals of Oncology, 2019.