SV/CNV tree building: the statistics steps

Latest recommended article published on 2021-09-29 17:31:32

zhen19971124

Article link: https://blog.csdn.net/zhen19971124/article/details/106948966

SV statistics steps

1. Convert the resulting max files to VCF format (Python script).
2. Merge the VCF files of all samples.

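ls all_sample/*.vcf > sample_files  # list the per-sample VCFs with their paths (assumes they sit in all_sample/ with a .vcf extension)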
SURVIVOR merge sample_files 1000 300 1 1 0 30 all_sample_sv.vcf

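Parameter description, in order (file listing the VCF paths; maximum breakpoint distance for merging; minimum number of supporting samples; whether to consider SV type; whether to consider strand; disabled; minimum SV length; output file):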

File with VCF names and paths
max distance between breakpoints (0-1 percent of length, 1- number of bp)
Minimum number of supporting caller
Take the type into account (1yes, else no)
Take the strands of SVs into account (1yes, else no)
Disabled.
Minimum size of SVs to be taken into account.
Output VCF filename
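
Because the arguments are purely positional, it is easy to swap two of them. The following is a minimal Python sketch that assembles the same command from named values. The function name `survivor_merge_command` and its parameter names are hypothetical, chosen only for illustration; the argument order follows the help text above.

```python
def survivor_merge_command(file_list, max_dist, min_support, use_type,
                           use_strand, min_size, out_vcf):
    """Return the argument list for `SURVIVOR merge` in positional order."""
    return [
        "SURVIVOR", "merge",
        file_list,                   # file with VCF names and paths
        str(max_dist),               # max distance between breakpoints
        str(min_support),            # minimum number of supporting callers
        "1" if use_type else "0",    # take the SV type into account
        "1" if use_strand else "0",  # take the strands into account
        "0",                         # disabled parameter, kept as a placeholder
        str(min_size),               # minimum SV size
        out_vcf,                     # output VCF filename
    ]


cmd = survivor_merge_command("sample_files", 1000, 300, True, True, 30,
                             "all_sample_sv.vcf")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` if SURVIVOR is on the `PATH`.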

The resulting file has the following format:

chr01   256497  INV000SUR       .       .       .       PASS    SUPP=565;SUPP_VEC=01111011111110111011111100001010000000000000001011000110010;SVLEN=10901;SVTYPE=INV;SVMETHOD=SURVIVOR1.0.7;CHR2=chr01;END=267398;CIPOS=-27,47;CIEND=-415,85;STRANDS=++   GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO     ./.:NaN:0:0,0:--:NaN:NaN:NaN:NAN:NAN:NAN12:0,0:++
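
The fields needed for the later steps are in the INFO column: `SVTYPE` gives the SV type, and `SUPP_VEC` holds one 0/1 character per input VCF, in the order the files appear in the file list. A minimal Python sketch of extracting them follows. The function name `parse_merged_record` is hypothetical, and the example record is a shortened, made-up line with five samples rather than real output.

```python
def parse_merged_record(line):
    """Extract (chrom, pos, svtype, presence) from one merged VCF line."""
    fields = line.rstrip("\n").split("\t")  # VCF columns are tab-separated
    info = {}
    for item in fields[7].split(";"):       # column 8 is INFO
        key, _, value = item.partition("=")
        info[key] = value
    # One 0/1 value per sample, in file-list order.
    presence = [int(ch) for ch in info["SUPP_VEC"]]
    return fields[0], int(fields[1]), info["SVTYPE"], presence


# Shortened illustrative record (assumption: 5 samples, not real data).
example = "\t".join([
    "chr01", "256497", "INV000SUR", ".", ".", ".", "PASS",
    "SUPP=3;SUPP_VEC=01101;SVLEN=10901;SVTYPE=INV;END=267398", "GT",
])
chrom, pos, svtype, presence = parse_merged_record(example)
print(chrom, pos, svtype, presence, sum(presence))
```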

3. Filter the specified groups and count their SVs.

perl match.pl all_sample_sv.vcf count0.txt  # extract the required SV types and each sample's 0/1 value
perl spnum.pl count0.txt group0 group1 sum.txt  # map sample names to their values, then sum the values within each group
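
The two Perl scripts are not shown in this post. As a rough idea of the second step only, here is a minimal Python sketch that pairs sample names with their 0/1 values and sums them per group. The function name `group_support`, the sample names, and the group memberships are all assumptions made for illustration, not the contents of `spnum.pl`.

```python
def group_support(sample_names, presence, groups):
    """Sum the 0/1 presence values of one SV over each group of samples."""
    value_of = dict(zip(sample_names, presence))  # sample name -> 0/1
    return {
        name: sum(value_of[s] for s in members if s in value_of)
        for name, members in groups.items()
    }


# Hypothetical samples, presence values, and grouping.
samples = ["s1", "s2", "s3", "s4", "s5"]
presence = [0, 1, 1, 0, 1]
groups = {"group0": ["s1", "s2", "s3"], "group1": ["s4", "s5"]}

totals = group_support(samples, presence, groups)
print(totals)
```

Samples listed in a group but absent from the VCF are skipped rather than raising an error; whether that is the desired behavior depends on how the group files are maintained.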

4. Obtain sum.txt:

     20 565 group0 INV
     21 399 group0 INV
     22 551 group0 INV
     23 576 group0 INV
     24 381 group0 INV
     25 457 group0 DEL
     26 388 group0 INV

5. Write a script to total the SVs of each type within each group, or do the tally in Excel.
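
A minimal Python sketch of such a tally script follows. It assumes the last three whitespace-separated columns of each sum.txt row are the count, the group, and the SV type, as in the rows shown above. The leading number on each row is ignored because its meaning is not stated. The function name `tally` is hypothetical.

```python
from collections import defaultdict


def tally(lines):
    """Per (group, type): number of SV rows and the sum of their counts."""
    records = defaultdict(int)
    support = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        count, group, svtype = parts[-3], parts[-2], parts[-1]
        records[(group, svtype)] += 1
        support[(group, svtype)] += int(count)
    return dict(records), dict(support)


# Three rows copied from the sum.txt excerpt above.
rows = [
    "20 565 group0 INV",
    "21 399 group0 INV",
    "25 457 group0 DEL",
]
records, support = tally(rows)
for group, svtype in sorted(records):
    print(group, svtype, records[(group, svtype)], support[(group, svtype)])
```

To run it on the real file, pass an open file handle instead of the list, for example `with open("sum.txt") as fh: records, support = tally(fh)`.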