Installing and using snpEff, an annotation tool for variant VCF files

1. Installation

1.1 Official website

https://pcingola.github.io/SnpEff/
[Figure: the snpEff website]

1.2 Download

  1. On Windows: click Download on the website shown in the figure above, then upload the zip file to the server.
  2. From the command line:
wget -c https://snpeff.blob.core.windows.net/versions/snpEff_latest_core.zip

1.3 Unzip

unzip snpEff_latest_core.zip
cd snpEff/
ls -l   # check the unpacked contents

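2. Configure the databases

2.1 List the available databases

To see which species and genome versions are available, use the snpEff databases command. The output here is filtered for human genomes: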

java -jar ./snpEff.jar databases |grep -E "Homo|Human|hg19|hg38|GRCh37|GRCh38"|grep -v -E "test|mane|kg|Bacillus"
GRCh37.75                                                   	Homo_sapiens                                                	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.75.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.75.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.75.zip]
GRCh37.87                                                   	Human genome GRCh37 using transcripts                       	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.87.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.87.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.87.zip]
GRCh37.p13                                                  	Human genome GRCh37 using RefSeq transcripts                	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh37.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh37.p13.zip]
GRCh38.86                                                   	GRCh38.86                                                   	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.86.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.86.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.86.zip]
GRCh38.99                                                   	Homo_sapiens                                                	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.99.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.99.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.99.zip]
GRCh38.p13                                                  	Human genome GRCh38 using RefSeq transcripts                	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.p13.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.p13.zip]
GRCh38.p14                                                  	Human genome GRCh38 using RefSeq transcripts                	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_GRCh38.p14.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.p14.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_GRCh38.p14.zip]
hg19                                                        	Homo_sapiens (UCSC)                                         	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_hg19.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg19.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_hg19.zip]
hg38                                                        	Homo_sapiens (UCSC)                                         	          	                              	[https://snpeff.blob.core.windows.net/databases/v5_2/snpEff_v5_2_hg38.zip, https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg38.zip, https://snpeff.blob.core.windows.net/databases/v5_1/snpEff_v5_1_hg38.zip]
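Only the first column of this listing is needed when passing names to the download command. The sketch below assumes the tab-separated layout shown above, where each genome name is padded with spaces and then followed by a tab. list_genomes is a hypothetical helper name and is not part of snpEff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "snpEff databases" output on stdin and print only
# the genome name (first tab-separated column, trailing space padding removed).
list_genomes() {
    awk -F'\t' '
        NF > 1 {
            name = $1
            sub(/ +$/, "", name)    # strip the padding after the name
            # skip empty names and a possible header/separator line
            if (name != "" && name != "Genome" && name !~ /^-+$/) print name
        }'
}

# Usage (requires snpEff):
# java -jar ./snpEff.jar databases | grep -E "GRCh37|GRCh38" | list_genomes
```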

2.2 Download

java -jar snpEff.jar download GRCh37.75
java -jar snpEff.jar download GRCh37.87
java -jar snpEff.jar download GRCh38.86
java -jar snpEff.jar download GRCh38.99
java -jar snpEff.jar download hg19
java -jar snpEff.jar download hg38
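# or download the zip files manually (v5_0 builds below; the listing above also offers v5_1 and v5_2 links)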
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.75.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh37.87.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.86.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_GRCh38.99.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg19.zip
wget -c https://snpeff.blob.core.windows.net/databases/v5_0/snpEff_v5_0_hg38.zip
# Unzip
unzip snpEff_v5_0_GRCh37.75.zip
unzip snpEff_v5_0_GRCh37.87.zip
unzip snpEff_v5_0_GRCh38.86.zip
unzip snpEff_v5_0_GRCh38.99.zip
unzip snpEff_v5_0_hg19.zip
unzip snpEff_v5_0_hg38.zip
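The manual download repeats the same URL pattern for every genome, so it can be written as a loop. This sketch assumes two things: the URL pattern visible in the databases listing stays the same, and wget and unzip are installed. snpeff_db_url and download_all are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a manual download URL following the pattern seen
# in the "databases" listing:
#   https://snpeff.blob.core.windows.net/databases/<ver>/snpEff_<ver>_<genome>.zip
snpeff_db_url() {
    local ver="$1" genome="$2"
    printf 'https://snpeff.blob.core.windows.net/databases/%s/snpEff_%s_%s.zip\n' \
        "$ver" "$ver" "$genome"
}

# Hypothetical helper: download (resumable) and unzip each genome in turn.
# Arguments: database version, then one or more genome names.
download_all() {
    local ver="$1"; shift
    local genome url
    for genome in "$@"; do
        url=$(snpeff_db_url "$ver" "$genome")
        wget -c "$url" && unzip -o "$(basename "$url")"
    done
}

# Example (not run here):
# download_all v5_0 GRCh37.75 GRCh37.87 GRCh38.86 GRCh38.99 hg19 hg38
```

The example call reproduces the manual wget and unzip commands above. The listing in 2.1 also offers v5_2 links, and those match the snpEff 5.2c build reported in the result header in 3.2.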

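[Figure: manually downloaded files]
[Figure: directories under data/ after unzipping or automatic download]
[Figure: contents of the GRCh37.75 database]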

3. Basic usage

3.1 Annotation

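java -Xmx20g -jar ~/software/download/snpEff_new/snpEff/snpEff.jar -v hg19 test.vcf.gz > test.anno2.vcf
# Parameters:
# -Xmx20g  JVM option (placed before -jar): allow Java up to 20 GB of heap memory for the analysis
# -v       verbose: print detailed progress
# hg19     annotate with the hg19 database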
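snpEff is expected to annotate records rather than drop them. A quick sanity check after the run is therefore to compare the number of data lines in the input and the output. This sketch assumes the counts should match, which holds for the command above because it uses no filtering options. count_records and check_annotation are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count data (non-header) lines in a VCF.
# gzip -cdf decompresses .gz input and passes plain text through unchanged.
count_records() {
    gzip -cdf "$1" | grep -vc '^#'
}

# Hypothetical helper: compare record counts of input and annotated output.
check_annotation() {
    local n_in n_out
    n_in=$(count_records "$1")
    n_out=$(count_records "$2")
    if [ "$n_in" -eq "$n_out" ]; then
        echo "OK: $n_in records"
    else
        echo "MISMATCH: input $n_in, output $n_out" >&2
        return 1
    fi
}

# Usage: check_annotation test.vcf.gz test.anno2.vcf
```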


3.2 Results

Partial output:

##source=GenotypeGVCFs
##bcftools_viewVersion=1.15.1+htslib-1.15.1
##SnpEffVersion="5.2c (build 2024-04-09 12:24), by Pablo Cingolani"
##SnpEffCmd="SnpEff  hg19 test.vcf.gz "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA18863_CACTGCTT_3_DL
chr7	55260440	.	C	T	460.64	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=0;DP=41;ExcessHet=0;FS=11.992;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=11.24;ReadPosRankSum=-0.847;SOR=0.061;ANN=T|upstream_gene_variant|MODIFIER|EGFR-AS1|EGFR-AS1|transcript|NR_047551.1|pseudogene||n.-3798G>A|||||3798|,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_005228.5|protein_coding|21/27|c.2626-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|20/25|c.2491-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|21/26|c.2626-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|15/21|c.1825-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|20/26|c.2491-19C>T||||||,T|intron_variant|MODIFIER|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|21/27|c.2467-19C>T||||||	GT:AD:AF:DP:GQ:PL	0/1:24,17:0.415:41:99:468,0,711
chr7	55266417	.	T	C	2095.06	PASS	AC=2;AF=1;AN=2;DP=65;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=34.35;SOR=2.119;ANN=C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_005228.5|protein_coding|23/28|c.2709T>C|p.Thr903Thr|2970/9905|2709/3633|903/1210||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|22/26|c.2574T>C|p.Thr858Thr|2835/3848|2574/3276|858/1091||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|23/27|c.2709T>C|p.Thr903Thr|2970/3983|2709/3411|903/1136||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|17/22|c.1908T>C|p.Thr636Thr|2169/9104|1908/2832|636/943||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|22/27|c.2574T>C|p.Thr858Thr|2831/6218|2574/3498|858/1165||,C|synonymous_variant|LOW|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|23/28|c.2550T>C|p.Thr850Thr|2741/9676|2550/3474|850/1157||	GT:AD:AF:DP:GQ:PL	1/1:0,61:1:61:99:2109,184,0
chr7	55268897	.	A	C	684.64	PASS	AC=1;AF=0.5;AN=2;BaseQRankSum=1.06;DP=51;ExcessHet=0;FS=1.404;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=14.57;ReadPosRankSum=0.011;SOR=0.379;ANN=C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.5|protein_coding|25/28|c.2963A>C|p.His988Pro|3224/9905|2963/3633|988/1210||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346897.2|protein_coding|24/26|c.2828A>C|p.His943Pro|3089/3848|2828/3276|943/1091||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346898.2|protein_coding|25/27|c.2963A>C|p.His988Pro|3224/3983|2963/3411|988/1136||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346941.2|protein_coding|19/22|c.2162A>C|p.His721Pro|2423/9104|2162/2832|721/943||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346899.1|protein_coding|24/27|c.2828A>C|p.His943Pro|3085/6218|2828/3498|943/1165||,C|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_001346900.2|protein_coding|25/28|c.2804A>C|p.His935Pro|2995/9676|2804/3474|935/1157||	GT:AD:AF:DP:GQ:PL	0/1:23,24:0.511:47:99:692,0,647
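Most entries in output like this are MODIFIER or LOW impact, so it is often useful to keep only records with at least one HIGH or MODERATE annotation. The awk sketch below assumes the ANN layout shown above, where the impact is the third |-separated field of each comma-separated annotation. filter_impact is a hypothetical helper name. On the three records above, it would keep only the missense variant at chr7:55268897.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print header lines, plus every record whose ANN field
# contains at least one annotation with an impact matching the given pattern
# (impacts: HIGH, MODERATE, LOW, MODIFIER).
filter_impact() {
    local impacts="$1"   # e.g. "HIGH|MODERATE"
    awk -F'\t' -v imp="$impacts" '
        /^#/ { print; next }
        {
            n = split($8, info, ";")             # INFO column: key=value;...
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                m = split(substr(info[i], 5), anns, ",")   # one entry per transcript
                for (j = 1; j <= m; j++) {
                    split(anns[j], f, "|")
                    if (f[3] ~ ("^(" imp ")$")) { print; next }
                }
            }
        }'
}

# Usage: filter_impact "HIGH|MODERATE" < test.anno2.vcf > test.high_moderate.vcf
```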

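3.3 Interpreting the results

snpEff adds its functional annotations to the INFO column: the ANN field, plus LOF and NMD where applicable, as declared in the ##INFO header lines above. For the meaning of each field, see https://pcingola.github.io/SnpEff/adds/VCFannotationformat_v1.0.pdf or https://pcingola.github.io/SnpEff/snpeff/inputoutput/#ann-field-vcf-output-files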
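The ANN field is easier to read with each comma-separated annotation split into its own row, using the field order declared in the ##INFO=<ID=ANN,...> header. Below is a minimal awk sketch. ann_table is a hypothetical helper name, and it prints only a subset of the 16 ANN fields.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one tab-separated row per ANN entry. Field order taken
# from the ##INFO=<ID=ANN,...> header:
#  1 Allele  2 Annotation  3 Annotation_Impact  4 Gene_Name  5 Gene_ID
#  6 Feature_Type  7 Feature_ID  8 Transcript_BioType  9 Rank
# 10 HGVS.c  11 HGVS.p  ...  16 ERRORS / WARNINGS / INFO
ann_table() {
    awk -F'\t' -v OFS='\t' '
        BEGIN { print "CHROM", "POS", "Allele", "Annotation", "Impact", "Gene", "Feature_ID", "HGVS.c", "HGVS.p" }
        /^#/ { next }
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                m = split(substr(info[i], 5), anns, ",")
                for (j = 1; j <= m; j++) {
                    split(anns[j], f, "|")       # empty fields stay empty
                    print $1, $2, f[1], f[2], f[3], f[4], f[7], f[10], f[11]
                }
            }
        }'
}

# Usage: ann_table < test.anno2.vcf | column -t -s $'\t' | less -S
```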

To extract the homozygous ALT variants (GT = 1/1) from a VCF file with GATK:

1. Install GATK and make sure the gatk launcher is on your PATH.
2. Use the GATK SelectVariants tool with a JEXL expression on the sample's genotype. SelectVariants reads the input VCF, applies the selection and writes the selected records in a single step. Run the following in the directory that contains the VCF file:

```bash
gatk SelectVariants \
    -R reference.fasta \
    -V input.vcf \
    -select 'vc.getGenotype("NA18863_CACTGCTT_3_DL").isHomVar()' \
    -O output_homvar.vcf
```

In this command:
- reference.fasta is the reference genome.
- NA18863_CACTGCTT_3_DL is the sample name from the #CHROM header line. Replace it with your own sample name.
- output_homvar.vcf receives the selected records.

isHomVar() is true for homozygous non-reference genotypes such as 1/1. This is only a simple example, and in practice it may need to be adjusted to your data.
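For a single-sample VCF such as the one in 3.2, the same selection can also be done with awk instead of GATK. This is a different technique from the GATK method. It assumes GT is the first FORMAT key, and it only recognizes 1/1 and the phased form 1|1, so other homozygous ALT genotypes such as 2/2 are not matched. select_homvar is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print header lines plus records whose (first and only)
# sample has a homozygous ALT genotype, 1/1 or phased 1|1.
# Assumes GT is the first key in the FORMAT column ($9).
select_homvar() {
    awk -F'\t' '
        /^#/ { print; next }
        {
            split($10, s, ":")                  # s[1] = GT value of the sample
            if ($9 ~ /^GT(:|$)/ && (s[1] == "1/1" || s[1] == "1|1")) print
        }'
}

# Usage: select_homvar < test.anno2.vcf > test.homvar.vcf
```

Applied to the three records in 3.2, it would keep only chr7:55266417, the record with GT 1/1.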