SV statistics steps
1. Convert the resulting max files to VCF format (Python script)
2. Merge the VCF files of all samples
ls all_sample/*.vcf > sample_files
SURVIVOR merge sample_files 1000 300 1 1 0 30 all_sample_sv.vcf
Parameters, in order (value used above in parentheses):
File with VCF names and paths (sample_files)
Max distance between breakpoints: values of 0-1 are a fraction of the SV length, values of 1 or more are a number of bp (1000)
Minimum number of supporting callers (300; here each input file is one sample)
Take the type into account: 1 = yes, else no (1)
Take the strands of SVs into account: 1 = yes, else no (1)
Disabled (0)
Minimum size of SVs to be taken into account (30)
Output VCF filename (all_sample_sv.vcf)
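The list file and the argument order can also be assembled in Python. This is a minimal sketch, assuming the per-sample files sit in one directory and end in .vcf; build_merge_command is a hypothetical helper and not part of SURVIVOR.

```python
from pathlib import Path


def build_merge_command(vcf_dir, list_file, out_vcf,
                        max_dist=1000, min_support=300,
                        use_type=1, use_strand=1, min_size=30):
    """Write the VCF path list and return the SURVIVOR merge argument list."""
    # One path per line, including the directory, so each file can be opened.
    paths = sorted(str(p) for p in Path(vcf_dir).glob("*.vcf"))
    Path(list_file).write_text("\n".join(paths) + "\n")
    # The sixth positional parameter is the disabled one and stays 0.
    return ["SURVIVOR", "merge", str(list_file),
            str(max_dist), str(min_support), str(use_type),
            str(use_strand), "0", str(min_size), str(out_vcf)]
```

The returned list can be passed to subprocess.run(cmd, check=True) once SURVIVOR is on the PATH.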
The resulting file has the following format:
chr01 256497 INV000SUR . . . PASS SUPP=565;SUPP_VEC=01111011111110111011111100001010000000000000001011000110010;SVLEN=10901;SVTYPE=INV;SVMETHOD=SURVIVOR1.0.7;CHR2=chr01;END=267398;CIPOS=-27,47;CIEND=-415,85;STRANDS=++ GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN12:0,0:++
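The fields of interest (SUPP, SUPP_VEC, SVTYPE, END) all sit in the INFO column, the eighth column of the record. A minimal sketch of pulling them out, with parse_info as a hypothetical helper and the SUPP_VEC string shortened for illustration:

```python
def parse_info(vcf_line):
    """Return the INFO column of one VCF record as a dict."""
    fields = vcf_line.split()
    info = {}
    # INFO is the eighth column; entries are KEY=VALUE pairs joined by ';'.
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


# Shortened copy of the record above (SUPP_VEC cut to four digits).
record = ("chr01 256497 INV000SUR . . . PASS "
          "SUPP=565;SUPP_VEC=0111;SVLEN=10901;SVTYPE=INV;END=267398")
info = parse_info(record)
print(info["SVTYPE"], info["SUPP"], info["END"])  # INV 565 267398
```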
3. Filter the specified groups and count their SVs
perl match.pl all_sample_sv.vcf count0.txt  # extract the required SV types and each sample's 0/1 value
perl spnum.pl count0.txt group0 group1 sum.txt  # map sample names to their values, then select each group's values and sum them
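The two Perl scripts are not shown here, so the following is only a sketch of the per-group summation they are described as doing. It assumes the digits of SUPP_VEC follow the order of the files in sample_files; group_support, the sample names and the group lists are all hypothetical.

```python
def group_support(supp_vec, samples, group_members):
    """Count the samples of one group that carry the SV (digit 1 in SUPP_VEC)."""
    members = set(group_members)
    return sum(1 for name, bit in zip(samples, supp_vec)
               if bit == "1" and name in members)


samples = ["s1", "s2", "s3", "s4"]  # hypothetical sample order
group0 = ["s1", "s2", "s3"]         # hypothetical group membership
group1 = ["s4"]
print(group_support("0111", samples, group0))  # 2
print(group_support("0111", samples, group1))  # 1
```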
4. Obtain sum.txt
20 565 group0 INV
21 399 group0 INV
22 551 group0 INV
23 576 group0 INV
24 381 group0 INV
25 457 group0 DEL
26 388 group0 INV
5. Write a script to count the total number of SVs of each type in each group, or do the tally in Excel
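One possible script for step 5, as a sketch only. It assumes the columns of sum.txt are index, per-group value, group name and SV type, and it reports both the number of records and the summed value, since either could be the wanted total. tally is a hypothetical helper.

```python
from collections import defaultdict


def tally(lines):
    """Return {(group, svtype): [record_count, summed_value]}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or incomplete lines
        _, value, group, svtype = parts[:4]
        totals[(group, svtype)][0] += 1
        totals[(group, svtype)][1] += int(value)
    return dict(totals)


# Three rows copied from the sum.txt excerpt above.
rows = ["20 565 group0 INV", "21 399 group0 INV", "25 457 group0 DEL"]
for (group, svtype), (n_records, summed) in sorted(tally(rows).items()):
    print(group, svtype, n_records, summed)
# group0 DEL 1 457
# group0 INV 2 964
```

For the real file, pass the open file handle instead of the list: tally(open("sum.txt")).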
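CNV statistics steps
1. The workflow is the same as above, but the script in step 1 is different and the merge distance parameter in step 2 must be changed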