使用annovar注释突变数据

1 篇文章 0 订阅
1 篇文章 0 订阅

这里主要是根据课题需要使用其中的部分功能

1.annovar下载安装

https://doc-openbio.readthedocs.io/projects/annovar/en/latest/
annovar官方网站的的下载需要使用邮箱注册后才可下载,这里为了使用方便我将刚下载的annovar上传到了网盘链接:https://pan.baidu.com/s/1T0PhTP1TMADsoFApTUPUOw
提取码:uw7f

2.安装

annovar解压后可以直接使用,也可以将其加入环境变量

 tar -zxvf annovar.latest.tar.gz 
 cd annovar && ls
 # annotate_variation.pl  coding_change.pl  convert2annovar.pl  example  humandb  retrieve_seq_from_fasta.pl  table_annovar.pl  variants_reduction.pl
 #加入到全局环境变量中
 vi /etc/profile
 #在最后一行加入export PATH="$PATH:/home/huen/packages/annovar"
 source /etc/profile

3.annovar 注释

1:annovar的注释可以分为三种,基于基因的注释,基于区域的注释,基于过2:滤的注释,当然也可以一次性完成三种注释;
3:annovar 注释之前需要对文件格式进行转换,当然table_annovar.pl可以使用vcf文件作为输入文件,只需要使用-vcfinput;
4:需要注意的是,在使用convert2annovar.pl转换格式时,会有一个QC过滤的过程,所以建议使用该程序进行过滤和格式转换

下面进行格式 转换
1.看一下数据

root@huen-PC:/media/huen/Entertainment/mutect_75# ls *.vcf
1159-1.filtered.vcf  1206-1.filtered.vcf  1229-1.filtered.vcf  1283-1.filtered.vcf  1354-1.filtered.vcf  1395-1.filtered.vcf  1535-1.filtered.vcf  1787-1.filtered.vcf  1881-1.filtered.vcf
1160-1.filtered.vcf  1211-1.filtered.vcf  1230-1.filtered.vcf  1284-1.filtered.vcf  1368-1.filtered.vcf  1397-1.filtered.vcf  1564-1.filtered.vcf  1810-1.filtered.vcf
1164-1.filtered.vcf  1215-1.filtered.vcf  1232-1.filtered.vcf  1291-1.filtered.vcf  1379-1.filtered.vcf  1398-1.filtered.vcf  1569-1.filtered.vcf  1819-1.filtered.vcf
1175-1.filtered.vcf  1217-1.filtered.vcf  1241-1.filtered.vcf  1299-1.filtered.vcf  1380-1.filtered.vcf  1405-1.filtered.vcf  1607-1.filtered.vcf  1821-1.filtered.vcf
1185-1.filtered.vcf  1218-1.filtered.vcf  1250-1.filtered.vcf  1315-1.filtered.vcf  1384-1.filtered.vcf  1411-1.filtered.vcf  1684-1.filtered.vcf  1847-1.filtered.vcf
1186-1.filtered.vcf  1219-1.filtered.vcf  1252-1.filtered.vcf  1321-1.filtered.vcf  1387-1.filtered.vcf  1438-1.filtered.vcf  1713-1.filtered.vcf  1848-1.filtered.vcf
1188-1.filtered.vcf  1225-1.filtered.vcf  1263-1.filtered.vcf  1325-1.filtered.vcf  1388-1.filtered.vcf  1456-1.filtered.vcf  1759-1.filtered.vcf  1853-1.filtered.vcf
1191-1.filtered.vcf  1226-1.filtered.vcf  1273-1.filtered.vcf  1326-1.filtered.vcf  1389-1.filtered.vcf  1471-1.filtered.vcf  1765-1.filtered.vcf  1876-1.filtered.vcf
1205-1.filtered.vcf  1227-1.filtered.vcf  1278-1.filtered.vcf  1351-1.filtered.vcf  1393-1.filtered.vcf  1511-1.filtered.vcf  1773-1.filtered.vcf  1879-1.filtered.vcf

2.gatk mutect2的filter会对call到的数据进行过滤,通过过滤会被打上PASS标签,所以接下来首先提取出其中的PASS mutation

mkdir PASS
for i in $(ls *vcf)
do
	grep PASS $i >"./PASS/"${i%%\.*}"PASS.vcf"
done

得到以下文件

1159-1PASS.vcf  1191-1PASS.vcf  1219-1PASS.vcf  1241-1PASS.vcf  1284-1PASS.vcf  1351-1PASS.vcf  1388-1PASS.vcf  1411-1PASS.vcf  1569-1PASS.vcf  1787-1PASS.vcf  1876-1PASS.vcf
1160-1PASS.vcf  1205-1PASS.vcf  1225-1PASS.vcf  1250-1PASS.vcf  1291-1PASS.vcf  1354-1PASS.vcf  1389-1PASS.vcf  1438-1PASS.vcf  1607-1PASS.vcf  1810-1PASS.vcf  1879-1PASS.vcf
1164-1PASS.vcf  1206-1PASS.vcf  1226-1PASS.vcf  1252-1PASS.vcf  1299-1PASS.vcf  1368-1PASS.vcf  1393-1PASS.vcf  1456-1PASS.vcf  1684-1PASS.vcf  1819-1PASS.vcf  1881-1PASS.vcf
1175-1PASS.vcf  1211-1PASS.vcf  1227-1PASS.vcf  1263-1PASS.vcf  1315-1PASS.vcf  1379-1PASS.vcf  1395-1PASS.vcf  1471-1PASS.vcf  1713-1PASS.vcf  1821-1PASS.vcf
1185-1PASS.vcf  1215-1PASS.vcf  1229-1PASS.vcf  1273-1PASS.vcf  1321-1PASS.vcf  1380-1PASS.vcf  1397-1PASS.vcf  1511-1PASS.vcf  1759-1PASS.vcf  1847-1PASS.vcf
1186-1PASS.vcf  1217-1PASS.vcf  1230-1PASS.vcf  1278-1PASS.vcf  1325-1PASS.vcf  1384-1PASS.vcf  1398-1PASS.vcf  1535-1PASS.vcf  1765-1PASS.vcf  1848-1PASS.vcf
1188-1PASS.vcf  1218-1PASS.vcf  1232-1PASS.vcf  1283-1PASS.vcf  1326-1PASS.vcf  1387-1PASS.vcf  1405-1PASS.vcf  1564-1PASS.vcf  1773-1PASS.vcf  1853-1PASS.vcf

3.格式 转换bash anno_vcf4_to_aviinput.sh

root@huen-PC:/media/huen/Entertainment/mutect_75/PASS# cat anno_vcf4_to_aviinput.sh 
all_number=73
count=0
mkdir avinput_files
for i in $(ls *vcf)
do
count=$((count+1))
remain=$((73-count))
outname="./avinput_files/"${i%%\.*}
convert2annovar.pl -format vcf4 $i > ${outname}.avinput 2>convert.log
echo -en $count $remain"\r"
done

4.进行table_annoval的注释,这里使用的是软件自带的refGene,同时使用shell的有名管道进行多线程的操作
这是格式转换后的文件

1159-1PASS.avinput  1205-1PASS.avinput  1226-1PASS.avinput  1263-1PASS.avinput  1321-1PASS.avinput  1384-1PASS.avinput  1405-1PASS.avinput  1569-1PASS.avinput  1810-1PASS.avinput  1881-1PASS.avinput
1160-1PASS.avinput  1206-1PASS.avinput  1227-1PASS.avinput  1273-1PASS.avinput  1325-1PASS.avinput  1387-1PASS.avinput  1411-1PASS.avinput  1607-1PASS.avinput  1819-1PASS.avinput  table_annoval.sh
1164-1PASS.avinput  1211-1PASS.avinput  1229-1PASS.avinput  1278-1PASS.avinput  1326-1PASS.avinput  1388-1PASS.avinput  1438-1PASS.avinput  1684-1PASS.avinput  1821-1PASS.avinput  table_annovar_refGene
1175-1PASS.avinput  1215-1PASS.avinput  1230-1PASS.avinput  1283-1PASS.avinput  1351-1PASS.avinput  1389-1PASS.avinput  1456-1PASS.avinput  1713-1PASS.avinput  1847-1PASS.avinput
1185-1PASS.avinput  1217-1PASS.avinput  1232-1PASS.avinput  1284-1PASS.avinput  1354-1PASS.avinput  1393-1PASS.avinput  1471-1PASS.avinput  1759-1PASS.avinput  1848-1PASS.avinput
1186-1PASS.avinput  1218-1PASS.avinput  1241-1PASS.avinput  1291-1PASS.avinput  1368-1PASS.avinput  1395-1PASS.avinput  1511-1PASS.avinput  1765-1PASS.avinput  1853-1PASS.avinput
1188-1PASS.avinput  1219-1PASS.avinput  1250-1PASS.avinput  1299-1PASS.avinput  1379-1PASS.avinput  1397-1PASS.avinput  1535-1PASS.avinput  1773-1PASS.avinput  1876-1PASS.avinput
1191-1PASS.avinput  1225-1PASS.avinput  1252-1PASS.avinput  1315-1PASS.avinput  1380-1PASS.avinput  1398-1PASS.avinput  1564-1PASS.avinput  1787-1PASS.avinput  1879-1PASS.avinput

注释

mkdir table_annovar_refGene
#使用10个线程去注释
bash table_annoval.sh 10
cat table_annoval.sh 
thread=$1
mkfifo "test"
exec 6<>"test"
rm -f "test"
for ((i=1;i<=${thread};i++))
do
	echo >&6
done
	for i in *avinput
	
do
	read -u6
	{
	table_annovar.pl $i /home/huen/packages/annovar/humandb/ -buildver hg19 -out "./table_annovar_refGene/"${i%%\.*}  -remove -protocol refGene -operation g -nastring .
       echo " ">&6	
}&
done
wait
exec 6>&-

5.将格式转换为maftools需要的格式

bash add_samplename.sh
cat add_samplename.sh
for i in *multianno.txt
do
	sed '1d' $i | sed "s/$/\t${i%%\.*}/" >> all_annovar
done 

wait
grep -P "\texonic\t" all_annovar >all_annovar_exonic
wait
awk -F "\t" '{if(NR==1) {print "Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\tGeneDetail.refGene\tExonicFunc.refGene\tAAChange.refGene\tTumor_Sample_Barcode"}} {for(i=1;i<=10;i++){printf $i"\t"}} {print $11}' all_annovar_exonic >annovar_filter

最终获得annovar_filter文件

#展示部分结果
Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	Tumor_Sample_Barcode
chr1	1426048	1426048	G	A	exonic	ATAD3B	.	stopgain	ATAD3B:NM_001317238:exon13:c.G1473A:p.W491X,ATAD3B:NM_031921:exon15:c.G1611A:p.W537X	1159-1PASS
chr1	2235904	2235904	G	T	exonic	SKI	.	synonymous SNV	SKI:NM_003036:exon5:c.G1647T:p.L549L	1159-1PASS
chr1	19545927	19545927	T	G	exonic	EMC1	.	nonsynonymous SNV	EMC1:NM_001271429:exon22:c.A2786C:p.K929T,EMC1:NM_001271427:exon23:c.A2849C:p.K950T,EMC1:NM_00127142
8:exon23:c.A2849C:p.K950T,EMC1:NM_015047:exon23:c.A2852C:p.K951T	1159-1PASS
chr1	27876433	27876433	C	G	exonic	AHDC1	.	nonsynonymous SNV	AHDC1:NM_001029882:exon6:c.G2194C:p.D732H,AHDC1:NM_001371928:exon8:c.G2194C:p.D732H	1159-1PASS
chr1	46073542	46073542	C	T	exonic	NASP	.	nonsynonymous SNV	NASP:NM_001195193:exon4:c.C767T:p.T256I,NASP:NM_002482:exon6:c.C959T:p.T320I	1159-1PASS
chr1	103379236	103379237	CC	AA	exonic	COL11A1	.	nonframeshift substitution	COL11A1:NM_080630:exon51:c.3640_3641delinsTT:p.G1214F,COL11A1:NM_001190709:exon52:c.3871_387
2delinsTT:p.G1291F,COL11A1:NM_001854:exon53:c.3988_3989delinsTT:p.G1330F,COL11A1:NM_080629:exon53:c.4024_4025delinsTT:p.G1342F	1159-1PASS

接下来就可以用这个结果去R里面进行统计分析啦,我准备使用maftools

  • 7
    点赞
  • 10
    收藏
    觉得还不错? 一键收藏
  • 3
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值