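1. Installation
1.1 Official website:
https://pcingola.github.io/SnpEff/
1.2 Download
- On Windows: click Download on the official website above, then upload the archive to the server.
- From the command line: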
wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip
1.3 Unpack
unzip snpEff_latest_core.zip
cd snpEff/
ls -l # list the extracted files
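Before moving on, it can help to confirm that the archive unpacked as expected. The sketch below is a minimal check, assuming the unpacked directory contains snpEff.jar and snpEff.config side by side; the function name check_snpeff_dir is made up for this example.

```shell
# Hypothetical helper: verify that a directory looks like an unpacked SnpEff install.
# Assumption: the core archive ships snpEff.jar and snpEff.config in the same folder.
check_snpeff_dir() {
    snpeff_dir="${1:-.}"
    for required in snpEff.jar snpEff.config; do
        if [ ! -f "$snpeff_dir/$required" ]; then
            echo "missing: $snpeff_dir/$required" >&2
            return 1
        fi
    done
    echo "ok: $snpeff_dir"
}
```

Run from the directory that contains snpEff/, `check_snpeff_dir snpEff` should print `ok: snpEff` when both files are present.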
2. Configuring databases
2.1 Listing databases
To list the available species and their database versions, use snpEff's databases command:
java -jar ./snpEff.jar databases | grep -E "Homo|Human|hg19|hg38|GRCh37|GRCh38" | grep -v -E "test|mane|kg|Bacillus"
GRCh37.75 Homo_sapiens [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.75.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.75.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.75.zip]
GRCh37.87 Human genome GRCh37 using transcripts [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.87.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.87.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.87.zip]
GRCh37.p13 Human genome GRCh37 using RefSeq transcripts [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.p13.zip]
GRCh38.86 GRCh38.86 [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.86.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.86.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.86.zip]
GRCh38.99 Homo_sapiens [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.99.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.99.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.99.zip]
GRCh38.p13 Human genome GRCh38 using RefSeq transcripts [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.p13.zip]
GRCh38.p14 Human genome GRCh38 using RefSeq transcripts [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.p14.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.p14.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.p14.zip]
hg19 Homo_sapiens (UCSC) [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_hg19.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg19.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_hg19.zip]
hg38 Homo_sapiens (UCSC) [https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_hg38.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg38.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_hg38.zip]
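The first column of each row is the genome name that the download and annotation commands expect. As a small sketch, the filter below reads the databases listing on standard input and prints only those names. list_genomes is a hypothetical helper, and it assumes the name is the first whitespace-delimited field, as in the output above.

```shell
# Hypothetical helper: print the genome name (first field) of every row
# whose name matches the given extended regular expression.
list_genomes() {
    awk -v pat="$1" '$1 ~ pat { print $1 }'
}
```

For example, `java -jar ./snpEff.jar databases | list_genomes '^(GRCh3[78]|hg(19|38))'` keeps only rows whose name starts with GRCh37, GRCh38, hg19, or hg38.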
2.2 Download
java -jar snpEff.jar download GRCh37.75
java -jar snpEff.jar download GRCh37.87
java -jar snpEff.jar download GRCh38.86
java -jar snpEff.jar download GRCh38.99
java -jar snpEff.jar download hg19
java -jar snpEff.jar download hg38
# or fetch the archives directly with wget (URLs as listed by the databases command):
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.75.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.87.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.86.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.99.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg19.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg38.zip
# Unpack
unzip snpEff_v5_0_GRCh37.75.zip
unzip snpEff_v5_0_GRCh37.87.zip
unzip snpEff_v5_0_GRCh38.86.zip
unzip snpEff_v5_0_GRCh38.99.zip
unzip snpEff_v5_0_hg19.zip
unzip snpEff_v5_0_hg38.zip
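The manually downloaded archives are expected to unpack into a data/<genome>/ folder, so running unzip inside the snpEff/ directory should place them where snpEff.jar looks by default. A rough way to see which databases are in place is sketched below. It assumes the default data directory (data/ next to snpEff.jar) and that each database folder contains a snpEffectPredictor.bin file; installed_dbs is a hypothetical helper.

```shell
# Hypothetical helper: list genomes that have a database file under <snpEff dir>/data.
# Assumptions: default data directory (./data/) and one snpEffectPredictor.bin per genome.
installed_dbs() {
    data_dir="${1:-.}/data"
    for bin in "$data_dir"/*/snpEffectPredictor.bin; do
        [ -f "$bin" ] || continue      # the glob matched nothing, or not a regular file
        basename "$(dirname "$bin")"   # folder name equals the genome name
    done
}
```

Calling `installed_dbs .` from inside snpEff/ prints one genome name per line; an empty result suggests the archives were unpacked somewhere else.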
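3. Basic usage
3.1 Annotation
java -Xmx20g -jar ~/software/download/snpEff_new/snpEff/snpEff.jar -v hg19 test.vcf.gz > test.anno2.vcf
# Parameters:
# -Xmx20g sets the maximum Java heap size to 20 GB for the analysis
# -v verbose mode: prints detailed progress of the analysis
# hg19 annotate using the hg19 database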
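For repeated runs it can be convenient to derive the output name from the input. The following is only a sketch: annotate_vcf and the SNPEFF_JAR and DRY_RUN variables are invented for this example, and the jar path must be adjusted to your installation. With DRY_RUN=1 the command is printed instead of executed.

```shell
# Hypothetical wrapper around the annotation command shown above.
# SNPEFF_JAR and DRY_RUN are illustrative variables, not SnpEff settings.
SNPEFF_JAR="${SNPEFF_JAR:-$HOME/software/download/snpEff_new/snpEff/snpEff.jar}"

annotate_vcf() {
    genome="$1"
    input="$2"
    # Strip a trailing .gz and then .vcf to build the output name.
    stem="${input%.gz}"
    stem="${stem%.vcf}"
    output="${stem}.anno.vcf"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only show what would be run.
        echo "java -Xmx20g -jar $SNPEFF_JAR -v $genome $input > $output"
    else
        java -Xmx20g -jar "$SNPEFF_JAR" -v "$genome" "$input" > "$output"
    fi
}
```

For instance, `DRY_RUN=1 annotate_vcf hg19 test.vcf.gz` prints the full command with test.anno.vcf as the output file, which makes it easy to check paths before starting a long run.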
3.2 Results
Partial output:
##source=GenotypeGVCFs
##bcftools_viewVersion=1.15.1+htslib-1.15.1
##SnpEffVersion="5.2c (build 2024-04-09 12:24), by Pablo Cingolani"
##SnpEffCmd="SnpEff hg19 test.vcf.gz "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA18863_CACTGCTT_3_DL
chr7 55260440 . C T 460.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=0;DP=41;ExcessHet=0;FS=11.992;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=11.24;ReadPosRankSum=-0.847;SOR=0.061;ANN=T|upstream_gene_variant|MODIFIER|EGFR-AS1|EGFR-AS1|transcript|NR_047551.1|pseudogene||n.-3798G>A|||||3798|,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_005228.5|protein_coding|21/27|c.2626-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|20/25|c.2491-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|21/26|c.2626-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|15/21|c.1825-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|20/26|c.2491-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|21/27|c.2467-19C>T|||||| GT:AD:AF:DP:GQ:PL 0/1:24,17:0.415:41:99:468,0,711
chr7 55266417 . T C 2095.06 PASS AC=2;AF=1;AN=2;DP=65;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=34.35;SOR=2.119;ANN=C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_005228.5|protein_coding|23/28|c.2709T>C|p.Thr903Thr|2970/9905|2709/3633|903/1210||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|22/26|c.2574T>C|p.Thr858Thr|2835/3848|2574/3276|858/1091||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|23/27|c.2709T>C|p.Thr903Thr|2970/3983|2709/3411|903/1136||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|17/22|c.1908T>C|p.Thr636Thr|2169/9104|1908/2832|636/943||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|22/27|c.2574T>C|p.Thr858Thr|2831/6218|2574/3498|858/1165||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|23/28|c.2550T>C|p.Thr850Thr|2741/9676|2550/3474|850/1157|| GT:AD:AF:DP:GQ:PL 1/1:0,61:1:61:99:2109,184,0
chr7 55268897 . A C 684.64 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=1.06;DP=51;ExcessHet=0;FS=1.404;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=14.57;ReadPosRankSum=0.011;SOR=0.379;ANN=C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.5|protein_coding|25/28|c.2963A>C|p.His988Pro|3224/9905|2963/3633|988/1210||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|24/26|c.2828A>C|p.His943Pro|3089/3848|2828/3276|943/1091||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|25/27|c.2963A>C|p.His988Pro|3224/3983|2963/3411|988/1136||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|19/22|c.2162A>C|p.His721Pro|2423/9104|2162/2832|721/943||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|24/27|c.2828A>C|p.His943Pro|3085/6218|2828/3498|943/1165||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|25/28|c.2804A>C|p.His935Pro|2995/9676|2804/3474|935/1157|| GT:AD:AF:DP:GQ:PL 0/1:23,24:0.511:47:99:692,0,647
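3.3 Interpreting the results
SnpEff adds its functional annotations to the INFO column, in the ANN field. For details on each subfield, see https://pcingola.github.io/SnpEff/adds/VCFannotationformat_v1.0.pdf or https://pcingola.github.io/SnpEff/snpeff/inputoutput/#ann-field-vcf-output-files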
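Because ANN packs one comma-separated entry per transcript, with subfields separated by |, it is often easier to read after flattening. The awk sketch below prints one tab-separated row per entry (CHROM, POS, allele, annotation, impact, gene name, feature ID, HGVS.c, HGVS.p), following the subfield order declared in the ##INFO=<ID=ANN header line of the output. ann_to_tsv is a hypothetical helper; it assumes whitespace-separated columns with INFO in column 8 and an uncompressed VCF on standard input.

```shell
# Hypothetical helper: flatten the ANN field into one row per annotated transcript.
ann_to_tsv() {
    awk '
    /^#/ { next }                                 # skip header lines
    {
        n = split($8, info, ";")                  # INFO is the 8th column
        for (i = 1; i <= n; i++) {
            if (info[i] !~ /^ANN=/) continue      # only the ANN key is of interest
            m = split(substr(info[i], 5), entries, ",")   # drop the "ANN=" prefix
            for (j = 1; j <= m; j++) {
                split(entries[j], f, "|")         # subfields in header order
                printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, f[1], f[2], f[3], f[4], f[7], f[10], f[11]
            }
        }
    }'
}
```

Usage: `ann_to_tsv < test.anno2.vcf`. Records without an ANN key produce no rows, and subfields that SnpEff leaves empty (for example HGVS.p on intron variants) come out as empty columns.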