Reference: http://bioconductor.org/packages/release/bioc/vignettes/maftools/inst/doc/maftools.html
Introduction
With advances in cancer genomics, the Mutation Annotation Format (MAF) has become widely accepted for storing detected somatic variants. The Cancer Genome Atlas (TCGA) project has sequenced more than 30 different cancer types, with over 200 samples for each, and the resulting somatic-variant data are stored in MAF. maftools aims to summarize, analyze, annotate and visualize MAF files efficiently, whether they come from TCGA or from in-house studies, as long as the data are in MAF format.
Preparation
Before analysis, the variant calls must be converted to MAF format; VCF files can be converted with vcf2maf.
A MAF file contains:
- Mandatory fields: Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Classification, Variant_Type and Tumor_Sample_Barcode.
- Recommended optional fields: non-MAF-specific fields such as VAF (Variant Allele Frequency) and amino acid change information.
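To confirm that a tab-delimited file (for example the var_annovar_maf table written later in this post) carries the nine mandatory columns listed above, a small check such as the hypothetical `check_maf_columns` below can be run before loading it into maftools. It is only a sketch: it assumes the column names sit on the first line that does not start with `#` (MAF files may begin with `#version` lines), and it relies on `grep -m`, which GNU and BSD grep support.

```shell
#!/bin/sh
# Hypothetical helper (not part of maftools): check_maf_columns FILE
# Prints "OK" if every mandatory MAF column is present in FILE's header,
# otherwise prints the missing names and returns 1.
check_maf_columns() {
  # Header = first line not starting with '#' (MAF files may start with "#version ...")
  header=$(grep -v -m 1 '^#' "$1")
  tab=$(printf '\t')
  missing=""
  for col in Hugo_Symbol Chromosome Start_Position End_Position \
             Reference_Allele Tumor_Seq_Allele2 Variant_Classification \
             Variant_Type Tumor_Sample_Barcode; do
    # Pad with tabs so that only whole column names count as a match
    case "$tab$header$tab" in
      *"$tab$col$tab"*) ;;
      *) missing="$missing $col" ;;
    esac
  done
  if [ -z "$missing" ]; then
    echo "OK"
  else
    echo "missing:$missing"
    return 1
  fi
}
```

Usage: `check_maf_columns var_annovar_maf`. Extra columns (VAF, amino acid change, and so on) are simply ignored.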
Format conversion
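#Annotate each sample's somatic calls (assumed to be in ANNOVAR input format) with table_annovar.pl; -out takes the file-name prefix so every sample gets its own <prefix>.hg19_multianno.txt instead of all samples overwriting one shared output file
for i in *.somatic.anno; do perl ~/software/Desktop/annovar/table_annovar.pl "$i" /home/yang.zou/database/humandb_new/ -buildver hg19 -out "${i%%.*}" -otherinfo -remove -protocol ensGene -operation g -nastring NA; done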
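#Then, for every .hg19_multianno.txt file, drop the header line, append a tab-separated column holding the file-name prefix (the sample name), concatenate everything into one file, and keep only exonic variants
rm -f all_annovar   #start from an empty file, because the loop appends with >>
for i in *.hg19_multianno.txt; do sed '1d' "$i" | sed "s/\$/\t${i%%.*}/" >> all_annovar; done   #GNU sed turns \t into a tab
grep -P "\texonic\t" all_annovar > all_annovar2   #-P lets grep read \t as a tab; rows whose Func is "exonic;splicing" are not kept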
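The `${i%%.*}` expansion keeps everything before the first dot of the file name, so `S01.hg19_multianno.txt` becomes `S01`; a sample name that itself contains a dot (e.g. `S01.T`) would be truncated to `S01`. As an alternative to the two `sed` calls, the hypothetical helper `append_sample_column` below drops the header and appends the tab-separated sample column in a single awk pass, and strips any directory part of the path first. This is a suggested variant, not the workflow the steps above actually used.

```shell
#!/bin/sh
# Hypothetical helper (not part of ANNOVAR or maftools):
# append_sample_column FILE
# Drops FILE's header line and appends the sample name, taken as the file
# name up to its first '.', as an extra tab-separated last column.
append_sample_column() {
  f=$1
  base=${f##*/}        # strip any leading directory: dir/S01.hg19_multianno.txt -> S01.hg19_multianno.txt
  sample=${base%%.*}   # keep the text before the first '.': S01.hg19_multianno.txt -> S01
  awk -v s="$sample" 'BEGIN { OFS = "\t" } NR > 1 { print $0, s }' "$f"
}
```

To process all samples, redirect the whole loop once, so that a rerun overwrites the output instead of appending to it: `for i in *.hg19_multianno.txt; do append_sample_column "$i"; done > all_annovar`.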
#Format conversion
perl to-maftools.pl all_annovar2 #select the columns annovarToMaf needs, add a header line and write all_annovar3
#to-maftools.pl
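use strict;
use warnings;
#Input: the exonic table from the previous step (defaults to all_annovar2); output: all_annovar3
my $in = $ARGV[0] // "all_annovar2";
open(my $fa, "<", $in) or die "Cannot open $in: $!";
open(my $fb, ">", "all_annovar3") or die "Cannot write all_annovar3: $!";
print $fb join("\t", qw(Chr Start End Ref Alt Func.ensGene Gene.ensGene GeneDetail.ensGene ExonicFunc.ensGene AAChange.ensGene Tumor_Sample_Barcode)), "\n";
while (my $line = <$fa>) {
    chomp $line;
    my @l = split /\t/, $line;
    #columns 0-9 are the ANNOVAR annotation columns; the sample name appended in the previous step is the LAST column (it follows any -otherinfo columns)
    print $fb join("\t", @l[0..9], $l[-1]), "\n";
}
close $fa;
close $fb;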
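A quick way to confirm that the Tumor_Sample_Barcode column of all_annovar3 really holds sample names (and not an Otherinfo value, or a name glued onto the previous field) is to count rows per value of the last column. The helper name `samples_in_table` is hypothetical; the sketch assumes a tab-delimited file with one header line and the sample name in the last column, which is how all_annovar3 is laid out.

```shell
#!/bin/sh
# Hypothetical helper: samples_in_table FILE
# Skips the header line and counts rows per value of the LAST tab-separated
# column (Tumor_Sample_Barcode in all_annovar3), printing "sample<TAB>rows"
# sorted by sample name.
samples_in_table() {
  awk -F'\t' 'NR > 1 { n[$NF]++ } END { for (s in n) print s "\t" n[s] }' "$1" |
    LC_ALL=C sort
}
```

Usage: `samples_in_table all_annovar3`. The output should list exactly your sample names, with their exonic variant counts.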
Overall analysis workflow
Installing maftools
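#R < 3.5 (the version used here): legacy Bioconductor installer
source("http://bioconductor.org/biocLite.R")
biocLite("maftools")
#R >= 3.5: biocLite is no longer supported; use BiocManager instead
#install.packages("BiocManager"); BiocManager::install("maftools")
library(maftools)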
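Note: installation was a real hassle and took me several days. maftools needs R 3.3 or later, but do not jump to the very newest R release either, because some dependency packages may not yet have been updated for it. I used: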
version.string R version 3.4.1 (2017-06-30)
Processing
#Read the annovar file and convert it to MAF with annovarToMaf
#convert all_annovar3 to a MAF table and save it as a tab-delimited file
var.annovar.maf = annovarToMaf(annovar = "all_annovar3", Center = 'NA', refBuild = 'hg19', tsbCol = 'Tumor_Sample_Barcode', table = 'ensGene',sep = "\t")
write.table(x = var.annovar.maf, file = "var_annovar_maf", quote = FALSE, sep = "\t", row.names = FALSE)
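Before loading var_annovar_maf into maftools, it can help to see how the ANNOVAR consequences ended up in the MAF columns. The hypothetical `tally_column` helper below finds a column by its header name, so it does not depend on the column order annovarToMaf produces, and counts each distinct value. It is a sketch that assumes a tab-delimited file with a single header line, optionally preceded by `#` comment lines.

```shell
#!/bin/sh
# Hypothetical helper: tally_column FILE COLUMN_NAME
# Looks up COLUMN_NAME in the header (first line not starting with '#')
# and prints "value<TAB>count" for every distinct value in that column.
# If the column is missing, a message goes to stderr and nothing is printed.
tally_column() {
  awk -F'\t' -v col="$2" '
    /^#/ { next }                        # skip "#version" lines, if any
    !seen_header {
      seen_header = 1
      for (i = 1; i <= NF; i++) if ($i == col) k = i
      next
    }
    k { n[$k]++ }
    END {
      if (!k) { print "column not found: " col > "/dev/stderr"; exit 1 }
      for (v in n) print v "\t" n[v]
    }' "$1" | LC_ALL=C sort
}
```

Usage: `tally_column var_annovar_maf Variant_Classification` or `tally_column var_annovar_maf Tumor_Sample_Barcode`.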
annovarToMaf function reference
Description
Converts variant annotations from Annovar into a basic MAF.
Usage
annovarToMaf(annovar, Center = NULL, refBuild = "hg19", tsbCol = NULL,
table = "refGene", basename = NULL, sep = "\t", MAFobj = FALSE,
sampleAnno = NULL)